LAST_VALUE() Analytic Function–SQL Server 2012

Earlier, i discussed FIRST_VALUE() function. FIRST_VALUE() returns the first value from an ordered set of records. Similarly LAST_VALUE() returns the last value for an ordered result set, but you are in for a bit of a surprise if you aren’t familiar with the enhancements to the SQL windows functions.

I’ll use the same example data i used earlier.

CREATE TABLE Customers
(CustId INT, OrderDate DATE, OrderAmount MONEY) ;

INSERT Customers VALUES
(1, ‘2011-12-03’, 12500)
, (
1, ‘2011-10-07’, 15000)
, (
1, ‘2011-09-12’, 16000)
, (
2, ‘2011-07-11’, 900)
, (
2, ‘2011-09-13’, 18000)
, (
2, ‘2011-11-03’, 7000)
, (
3, ‘2011-01-29’, 19000)
, (
3, ‘2011-04-19’, 1000)
, (
3, ‘2011-07-09’, 12000) ;

We wrote a query using FIRST_VALUE() to calculate the date of first purchase for each customer.

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, FIRST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM
Customers c

Now, let’s say we want to know the recent purchase date for each customer. Similar to FIRST_VALUE(), there is a LAST_VALUE() function. LAST_VALUE() allows you to return the last value from an ordered result set.

I’ll follow my killer instincts and replace FIRST_VALUE() with LAST_VALUE() in the above query. It should work. Right?

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, LAST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM Customers C

Result Set:

… but, oops, this is not the result i was expecting.

There are two points to note here. Window and Frame.

A window is a set of records that are defined by the OVER() clause. Because of the PARTITION BY Custid, you can safely think records for each Custid form a window in this example.

A Frame is a range of rows within a window; a subset of a window in other words, defined by the ROWS clause. A range has boundaries – a begin position and an end position.

Look back at the query using LAST_VALUE(). You’ll notice there wasn’t a ROWS clause. We haven’t defined a frame. In this case SQL Server uses the default frame which is, between the first row without any boundaries (first row of the window partition) and the current row. Last value from this default frame will always be the current row.

Correct usage of the function should instead be:

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, LAST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate ROWS BETWEEN  CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM Customers C

Given all this, FIRST_VALUE() is providing expected results even with the default frame because the first row in the default frame is the first row of the window partition.

Here is an interesting article from Jeremiah Peschka (blog) for further reading: Leaving the Windows Open.

~Sam

Follow Sam on Twitter – @SamuelVanga


Advertisements

FIRST_VALUE() Analytic Function–SQL Server 2012

Given a customer table:

CREATE TABLE Customers
(CustId INT, OrderDate DATE, OrderAmount MONEY) ;

INSERT Customers VALUES
(1, ‘2011-12-03’, 12500)
, (
1, ‘2011-10-07’, 15000)
, (
1, ‘2011-09-12’, 16000)
, (
2, ‘2011-07-11’, 900)
, (
2, ‘2011-09-13’, 18000)
, (
2, ‘2011-11-03’, 7000)
, (
3, ‘2011-01-29’, 19000)
, (
3, ‘2011-04-19’, 1000)
, (
3, ‘2011-07-09’, 12000) ;

Let’s say, we want to retrieve all records along with a column showing the customer’s date of first purchase.

We could start by computing the date of first purchase for each customer.

SELECT CustId, MIN(OrderDate) FirstPurchaseDate
FROM   Customers
GROUP BY
CustId

Result set has one row per each customer.

image

Joining this to the original table will return the expected results.

SELECT c.*, firstD.FirstPurchaseDate
FROM   Customers c
INNER JOIN
(
SELECT CustId, MIN(OrderDate) FirstPurchaseDate
FROM   Customers
GROUP BY
CustId
)firstD
ON firstD.CustId = c.CustId

Result set:

image

The query plan uses a nested join because of the join we were using the query above.

image

We can achieve the same results using the FIRST_VALUE() function.

SELECT
c.CustId, c.OrderDate, c.OrderAmount
, FIRST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM
Customers c

[PARTITON BY CustId], creates a partiton for each customer. [ORDER BY OrderDate] sorts each partition by the OrderDate. Now, from this ordered result set, FIRST_VALUE() returns the first value for each partition.

The query is easier to read and a join is not needed. Result: the nested loop is eliminated in the query plan. This could improve performance.

image

I’ll conclude with a question for you to understand LAST_VALUE(). From the same fictitious customer data, let’s say we want to compute the recent purchase date (as opposed to first purchase date from above example) of every customer. What would be the query? I welcome your response in the comments below.

@SamuelVanga