Previous Row T-SQL (Part2/2)

I started a two part series on linking to the previous rows. In the first part, I’ve showed how to calculate running totals using T-SQL. In this post i will discuss another problem which also requires linking to previous row fron the result set. Calculating change from the previous row.

Test Data:

I use the same sample data that was used for part 1 of this post. It has Yr and SaleAmount for each year.

How to calculate change from previous row?

Suppose you want to calculate the change of SaleAmount from current year to the previous year.

Using CTE

; WITH cteChangefromPrevious
AS
(
SELECT
Yr
, SaleAmount
, ROW_NUMBER() OVER(ORDER BY Yr) Row_Num
FROM
#YrSales
)
SELECT
Cur.Yr
, Cur.SaleAmount
, Cur.SaleAmount - Prev.SaleAmount
FROM
cteChangefromPrevious Cur
LEFT OUTER JOIN
cteChangefromPrevious Prev
ON Cur.Row_Num = Prev.Row_Num + 1

Row_Number function inside the CTE assigns unique row numbers to each year as a new column (Row_Num).

In the outer select query I join the CTE to itself using a left outer join is joining CTE with itself. Join condition current row = (previous row + 1) causes the query to return SaleAmount for previous year. Minus operation between the previous row and current row retrieves the differnece between two values.

Conclusion

You’ve seen three approaches to this problem, linking to previous rows. Correlated sub query, cross join, and CTE. Using sub query is the popularly used method. But it is worth it try the other methods also.

~Sam.

Previous Row T-SQL (Part1/2)

Calculating running totals, difference from the previous row are common problems. The tricky part to resolve these problems is to be able to link to the previous row. This first part of a two part series in which i discuss options to solve this probelm. I start with running totals.

Test Data:

For this post i use the following script to create and populate sample data. Note that i use row constructors introduced in SQL Server 2008. Below script won’t work as is in version prior to 2008 unless inserted using multiple insert statements.

CREATE TABLE #YrSales
(
Yr INT
, SaleAmount INT
)

INSERT #YrSales
VALUES
(2007,15000)
,(
2008,10000)
,(
2009,25000)
,(
2010,20000)
,(
2011,18000)

How to calculate running totals:

Using Sub Query:

Following query uses a correlated sub query inside the select clause. Correlated sub query executes once for each row returned by the outer query. I originally learned this method from an article by Itzik Ben-Gan, who obviously doesn’t need any introduction.

SELECT
Yr, SaleAmount
,RunningTotal = (SELECT SUM(SaleAmount) FROM #YrSales WHERE Yr<= A.Yr)
FROM
#YrSales A

Suppose the first row returned is Yr = 2009; SaleAmount = 25000. Sub query performs a sum of SaleAmount for all the years less than or equals 2009 and returns 50000as RunningTotal. This continues until there are no more rows returned by the outer query.

Using Cross Join:

Using cross join is an efficient way especially when dealing with larger data sets. Cross joining the table with itself causes every row from the left table (a) to be joined to every row in the right table (b). Where clause ensures that join is performed only when Yr from the right table (b) is less than yr from the left table (a). Yr, SaleAmount from left table (a) in the select clause return Yr and SaleAmount. Sum(b.SaleAmount) returns running total.

SELECT a.Yr,
a.SaleAmount
,SUM(b.SaleAmount) AS RunningTotal
FROM #YrSales a
CROSS JOIN #YrSales b
WHERE (b.Yr <= a.Yr)
GROUP BY a.Yr,a.SaleAmount
ORDER BY a.Yr,a.SaleAmount

To better understand, execute the following queries one at a time:

SELECT
a.Yr,
a.SaleAmount
FROM
#YrSales a
CROSS JOIN
#YrSales b

SELECT
a.Yr,
a.SaleAmount
FROM
#YrSales a
CROSS JOIN
#YrSales b
GROUP BY
a.Yr,a.SaleAmount

SELECT
a.Yr,
a.SaleAmount
,SUM(b.SaleAmount) RunningTotal
FROM
#YrSales a
CROSS JOIN
#YrSales b
WHERE (b.Yr <= a.Yr)
GROUP BY
a.Yr,a.SaleAmount
ORDER BY
a.Yr,a.SaleAmount

SQL Server Views

This post is first of yet to come long series on basic concepts of SQL Server. I’ll begin with basic 101 introduction to SQL Server views. I wouldn’t recommend reading any further if you are an experienced sql server’er. I value your time 🙂

What are views?

  • virtual tables
  • named select statements

Where are views used?

  • to simplify underlying data model for users
  • to implement security mechanism
  • anywhere a table is expected

How to create views?

Create View t-sql command is used to create a view. Following is a sample from AdventureWorks database. Views are defined by the underlying query (select statement that creates it). Views contain rows and columns that were returned by the underlying query.

CREATE VIEW dbo.[vEmployee]
AS
SELECT
c.[FirstName]
, c.[LastName]
, [StateProvinceName] = sp.[Name]
, [CountryRegionName] = cr.[Name]
FROM
[HumanResources].[Employee] e
INNER JOIN [Person].[Contact] c
ON c.[ContactID] = e.[ContactID]
INNER JOIN [HumanResources].[EmployeeAddress] ea
ON e.[EmployeeID] = ea.[EmployeeID]
INNER JOIN [Person].[Address] a
ON ea.[AddressID] = a.[AddressID]
INNER JOIN [Person].[StateProvince] sp
ON sp.[StateProvinceID] = a.[StateProvinceID]
INNER JOIN [Person].[CountryRegion] cr
ON cr.[CountryRegionCode] = sp.[CountryRegionCode]

A select query can be used to retrieve data from the view.

SELECT * FROM dbo.vEmployee

If you are following along and executing the scripts so far you will notice that selecting from a view will return a result set (rows and columns).

How are views stored?

Unlike tables, contents of a view aren’t physically stored. Only the underlying query is given a name and saved on the server. Okay Mr.Genius, if the contents of a view are not saved how did selecting a view return data? I am glad you asked.

SQL Server internally replaces view with the underlying query and reads                                                                                                        select * from <view name> as select * from <underlying query>.

Uses of Views:

Most of the OLTP databases are highly normalized; which means data is spread across multiple tables and naturally joins will have to be used to retrieve this data making it difficult for report writers and other users. Views like dbo.vEmployee can be used to hide all the complexity from users. They form a simplified layer between database and users.

Views can also be used to implement security mechanism. Imagine a student table with ID, Name, SSN, and DOB. Not all the users should have access to SSN of students, a view can be created with only ID, Name, and DOB and users be granted access to this view limiting them from viewing SSN.

~Sam

SCRIPT: Find the row count of all tables and views

Today I’m sharing another script that I often find useful – Queries to find the count of rows from tables and views in a SQL Server database.

Count Rows In All Tables:

This query returns the count of rows from all tables in a database. The results can be seen instantaneously without causing any blocking or locking. It is safe to run this in production.

; WITH    A AS ( SELECT   TableName = O.Name ,
                       
SchemaName = SCHEMA_NAME(O.Schema_id) ,
                       
[Rows] = P.Rows
              
FROM     sys.objects O
                       
INNER JOIN sys.partitions P ON O.OBJECT_ID = P.OBJECT_ID
              
WHERE    TYPE = 'U'
            
)
   
SELECT  SchemaName ,
           
TableName ,
           
SUM([Rows])
   
FROM    A
   
GROUP BY SchemaName ,
           
TableName ;

 

Count Rows In All Views:

Unlike for tables, there is no way to query any DMV to get the row count. The only way is to perform COUNT() on all the views. The following query outputs a select for each view. Copy the results and execute in a query window.

SELECT fullname+' UNION ALL'
FROM sys.views v
INNER JOIN
sys.schemas s
ON s.schema_id = v.schema_id
CROSS APPLY (SELECT 'select '+''''+s.name+'.'+v.name+''''+','+ 'count(*) from '+s.name+'.'+v.name AS fullname) AS a

UNION ALL

SELECT 'SELECT '+'''DUMMY'''+', '+'0'

Caution: The query for views can be dangerously slow depending on the view’s definition. Please, please, please test it in your development before running it in production.

@SamuelVanga

SCRIPT: Basic Information About Indexes

I’m having a little fun with documenting basic information about indexes in my current project. I’m posting the scripts here mostly for me to come back when I need them in the future, but hopefully some of you might find them useful.

Find tables without any index:

This script gives you the list of tables that don’t have any index.

; WITH A
AS
(
SELECT SchemaName = OBJECT_SCHEMA_NAME(t.OBJECT_ID) ,
TableName = t.name
FROM   sys.tables t
WHERE  OBJECTPROPERTY(t.OBJECT_ID,'TableHasIndex') = 0
)
SELECT SchemaName ,
TableName ,
TwoPartName = SchemaName+'.'+TableName
FROM A

Find tables without a clustered index:

This script gives the list of tables without a clustered index.

; WITH B
AS
(
SELECT SchemaName = OBJECT_SCHEMA_NAME(OBJECT_ID) ,
TableName = OBJECT_NAME(OBJECT_ID)
FROM sys.indexes
WHERE index_id = 0
AND OBJECTPROPERTY(OBJECT_ID, 'IsUserTable') = 1
)
SELECT SchemaName ,
TableName ,
TwoPartName = SchemaName+'.'+TableName
FROM B
ORDER BY SchemaName

Find indexes on a table along with columns covered:

Use this script to get the index names and a comma separated list of columns included in the index.

; WITH C
AS
(
SELECT TableName = t.name
, IndexName = i.name
, ColumnName = c.name
FROM   sys.tables t
INNER JOIN
sys.indexes i
ON t.OBJECT_ID = i.OBJECT_ID
INNER JOIN
sys.index_columns ic
ON ic.OBJECT_ID = i.OBJECT_ID
AND ic.index_id = i.index_id
INNER JOIN
sys.columns c
ON c.OBJECT_ID = t.OBJECT_ID
AND c.column_id = ic.column_id
)
SELECT DISTINCT TableName, IndexName, ColumnNames
FROM cte A
CROSS APPLY
(
SELECT ColumnName + ', '
FROM cte B
WHERE A.TableName = B.TableName AND A.IndexName = B.IndexName
ORDER BY TableName, IndexName
FOR XML PATH('')
)
D (ColumnNames)

Notice that I generously used CTE’s here. I like to reuse column aliases instead of repeating length expressions everywhere. CTE allows you to do that.

@SamuelVanga

Be careful with two digit years

I recently had a scary conversation with a user, which can be broken down to:

User: Why would the data have customers with a birth date of MM/DD/2020?

Me: What are you talking about? Lemme take a look.

Me again, this time with a changed voice: Umm, Umm, i plead guilty with an explanation your honor!

User: Let’s hear your explanation son.

Me: We receive a raw file with customer data. Date of birth is a field in that file, and it is supposed to contain data in ‘DD-Month-YYYY’ format. But, apparently, one fine day the file contained two digits for year instead of four. My import process decided that the two digit year 20 is 2020 and not 1920.

User: Okay, I’ll let you go this time, but fix it. Won’t you?

<end of story>.

Turns out, SQL Server automatically uses current century for two digit years less than 50, and previous century for years greater than or equal to 50.

SELECT CAST('01-January-49' AS DATETIME) [<50]

Result:

<50
———————–
2049-01-01 00:00:00.000

(1 row(s) affected)

SELECT CAST('01-January-50' AS DATETIME) [>50]

Result:

<50
———————–
1950-01-01 00:00:00.000

(1 row(s) affected)

Here is the books online explanation for the two digit year cutoff. Benjamin Nevarez (t|b) pointed me to this link when i asked for help on twitter using the #sqlhelp hashtag. Thanks Benjamin.

Cheers!

@SamuelVanga.

You will also like If i learn something, sorry, but you have to read it!

LAST_VALUE() Analytic Function–SQL Server 2012

Earlier, i discussed FIRST_VALUE() function. FIRST_VALUE() returns the first value from an ordered set of records. Similarly LAST_VALUE() returns the last value for an ordered result set, but you are in for a bit of a surprise if you aren’t familiar with the enhancements to the SQL windows functions.

I’ll use the same example data i used earlier.

CREATE TABLE Customers
(CustId INT, OrderDate DATE, OrderAmount MONEY) ;

INSERT Customers VALUES
(1, ‘2011-12-03’, 12500)
, (
1, ‘2011-10-07’, 15000)
, (
1, ‘2011-09-12’, 16000)
, (
2, ‘2011-07-11’, 900)
, (
2, ‘2011-09-13’, 18000)
, (
2, ‘2011-11-03’, 7000)
, (
3, ‘2011-01-29’, 19000)
, (
3, ‘2011-04-19’, 1000)
, (
3, ‘2011-07-09’, 12000) ;

We wrote a query using FIRST_VALUE() to calculate the date of first purchase for each customer.

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, FIRST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM
Customers c

Now, let’s say we want to know the recent purchase date for each customer. Similar to FIRST_VALUE(), there is a LAST_VALUE() function. LAST_VALUE() allows you to return the last value from an ordered result set.

I’ll follow my killer instincts and replace FIRST_VALUE() with LAST_VALUE() in the above query. It should work. Right?

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, LAST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM Customers C

Result Set:

… but, oops, this is not the result i was expecting.

There are two points to note here. Window and Frame.

A window is a set of records that are defined by the OVER() clause. Because of the PARTITION BY Custid, you can safely think records for each Custid form a window in this example.

A Frame is a range of rows within a window; a subset of a window in other words, defined by the ROWS clause. A range has boundaries – a begin position and an end position.

Look back at the query using LAST_VALUE(). You’ll notice there wasn’t a ROWS clause. We haven’t defined a frame. In this case SQL Server uses the default frame which is, between the first row without any boundaries (first row of the window partition) and the current row. Last value from this default frame will always be the current row.

Correct usage of the function should instead be:

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, LAST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate ROWS BETWEEN  CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM Customers C

Given all this, FIRST_VALUE() is providing expected results even with the default frame because the first row in the default frame is the first row of the window partition.

Here is an interesting article from Jeremiah Peschka (blog) for further reading: Leaving the Windows Open.

~Sam

Follow Sam on Twitter – @SamuelVanga


FIRST_VALUE() Analytic Function–SQL Server 2012

Given a customer table:

CREATE TABLE Customers
(CustId INT, OrderDate DATE, OrderAmount MONEY) ;

INSERT Customers VALUES
(1, ‘2011-12-03’, 12500)
, (
1, ‘2011-10-07’, 15000)
, (
1, ‘2011-09-12’, 16000)
, (
2, ‘2011-07-11’, 900)
, (
2, ‘2011-09-13’, 18000)
, (
2, ‘2011-11-03’, 7000)
, (
3, ‘2011-01-29’, 19000)
, (
3, ‘2011-04-19’, 1000)
, (
3, ‘2011-07-09’, 12000) ;

Let’s say, we want to retrieve all records along with a column showing the customer’s date of first purchase.

We could start by computing the date of first purchase for each customer.

SELECT CustId, MIN(OrderDate) FirstPurchaseDate
FROM   Customers
GROUP BY
CustId

Result set has one row per each customer.

image

Joining this to the original table will return the expected results.

SELECT c.*, firstD.FirstPurchaseDate
FROM   Customers c
INNER JOIN
(
SELECT CustId, MIN(OrderDate) FirstPurchaseDate
FROM   Customers
GROUP BY
CustId
)firstD
ON firstD.CustId = c.CustId

Result set:

image

The query plan uses a nested join because of the join we were using the query above.

image

We can achieve the same results using the FIRST_VALUE() function.

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, FIRST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM
Customers c

[PARTITON BY CustId], creates a partiton for each customer. [ORDER BY OrderDate] sorts each partition by the OrderDate. Now, from this ordered result set, FIRST_VALUE() returns the first value for each partition.

The query is easier to read and a join is not needed. Result: the nested loop is eliminated in the query plan. This could improve performance.

image

I’ll conclude with a question for you to understand LAST_VALUE(). From the same fictitious customer data, let’s say we want to compute the recent purchase date (as opposed to first purchase date from above example) of every customer. What would be the query? I welcome your response in the comments below.

@SamuelVanga

CONCAT() string function–SQL Server 2012

As the name indicates, CONCAT() will concatenate two or more strings. This is one of the two string functions introduced in SQL Server 2012, FORMAT() being the other one.

New? Wait. Concatenating isn’t new. It’s always been there. Yeah. you are right. No, concatenating isn’t new.Yes, CONCAT() function is new.

“+” is used to concatenate strings, prior to 2012. As you can see in the query below, two strings are being concatenated.

SELECT 'HELLO'+SPACE(1)+'WORLD'

It gets tricky when you deal with non varchar data, and null values.

SELECT 1+SPACE(1)+'World' ;

SELECT ‘John’+NULL+‘Doe’ ;

These queries don’t return expected results. To fix that, select query should contain logic to replace null and convert int to string. On the left side of the below image is the corrected query.

image

CONCAT(), as can be seen in the right side, simplifies this process. It concatenates 2 or more strings, and automatically (implicitly) converts null to blank space and integer to string.

One more example to better explain the use of CONCAT().

SELECT CONCAT(1,1) ConcatInt
, CONCAT('John',NULL,'Doe') ConcatNull
, CONCAT(1,SPACE(1),'John') ConcatIntString

Lets wrap up…

Concat() is clearly a simpler way to concatenate strings. It improves code readability and is one of my favorite T-SQL enhancements.

~Sam.

T-SQL Tuesday #025 – Tips and Tricks

And the trick i share for this month’s T-SQL Tuesday is… Distraction free management studio.

If you’re an avid writer, you might be using Writeroom. It’s a full screen distraction free writing application, which gets your computer out-of-the-way so that you can focus on your work. Dark Room is a similar app for Windows.

Distraction free! Focus on your work! Isn’t that amazing?

So how about a similar app for management studio where you spend most of your time, writing queries? Wouldn’t it be nice to focus on queries alone and not worry about new email pop-ups or any other windows on your computer?

I did a quick search, and results, nada. But, better yet, this experience can be achieved by modifying a few options.

  • Add line numbers. From management studio, choose Tools, options. Expand Text Editor and expand Transact-SQL. In the general properties check Line numbers under Display and click ok.
  • Change background color. Again from options expand Environment and choose Fonts and Colors. Click ok.
  • View query window in full screen. Click view and choose full screen when writing queries. This will bring SSMS to the foreground and closes object explorer too. To exit from full screen again click view and deselect full screen.
  • Hide results window. Finally, press Ctrl+R to hide the results window. Press Ctrl+R again to display results window.

image

Give it a try, you’ll know what i am talking about!

@SamuelVanga