Generating Surrogate Keys With SSIS

A surrogate key is an auto generated value, usually integer, in the dimension table. It is made the primary key of the table and is used to join a dimension to a fact table. Among other benefits, surrogate keys allow you to maintain history in a dimension table. Despite of the their popularity, SSIS doesn’t have a built in solution for generating surrogate keys. Let’s take a look at a few alternatives in this post.

First, create the following table. We will import data from Person.Person table (AdventureWorks sample) into this table. Note PersonSK is the surrogate key.
CREATE TABLE [dbo].Person
[FirstName] NVARCHAR(50) NULL
[LastName] NVARCHAR(50) NULL
) ;


Drag a data flow task on to the control flow and configure the OLE DB source.

Next, drag OLE DB destination and connect it to the source. Specify the connection manager, and choose Table or view – fast load as the data access mode. This performs a bulk insert and is the fastest of all.

Destination table has three columns, but source has only two columns. In the mappings page, map input and output columns for FirstName and LastName and ignore the mapping for PersonSK.

When you run the package, becasue PersonSK is an identity column in the table, SQL Server will automatically generate values for you. This solution is easy and fast, but sometimes depending on your ETL methodology, you can’t rely on IDENTITY().

With Script Component

I frequently use Script Transformation. The steps are nicely written in this post by Phil Brammer (b).  This is simply a script used as a transformation. The script generates a row number for each row passed through the data flow.


SSIS (data flow) engine generated the new row number when using script transformation. Instead, you can use ROW_NUMBER() when working with SQL Server data source to let the database engine do the work for you. This can be faster.

If you’re doing an incremental load, first find the maximum key value from the destination. I’ll use the following query. It’ll return zero if there were no rows, else it returns the maximum value.

FROM    dbo.Person

Add an Execute SQL Task to the control flow and set the result property to single row. Then, add a variable to hold the return value.

Next, connect a data flow task to the execute sql task. We will use the following SQL statement in the OLE DB source editor. In addition to the LastName and FirstName columns, we are using ROW_NUMBER() function to generate a unique number for every row.

, [LastName]
, [FirstName]

The query will generate numbers starting from 1 for each row, but while loading to destination we don’t want to start from 1. We want to find the maximum value in the destination and start from the next highest value. So, I’m adding the max value to every row number using parameters in the OLE DB source.

In the OLE DB destination, check the box that says Keep Identity. By do this we are asking SSIS to keep the identity values that are  generated by the source. In the mappings page, you’ll see a new input that was created in the OLE DB source. Map it to PersonSK surrogate key.

Go ahead and run the package. If it’s all good you will see cute little green tick marks like below.


In this example, we looked at different options to generate surrogate keys while loading dimensions using SSIS. We used IDENTITY() constraint in SQL Server. We talked about Script Component. Finally, we saw making use ROW_NUMBER() function. Last option is twice as fast as using the Script Component with around 20,000 rows and an index on LastName column.

5 thoughts on “Generating Surrogate Keys With SSIS

  1. Pingback: Surrogate Keys | James Serra's Blog

  2. The standard that we use is to encapsulate the INSERT into a stored procedure with an output parameter of the @@IDENTITY value.

    INSERT INTO Table ...
    -- The Next Statement Selects The Generated Identity Value

    This handles a couple of problems like security, internal coding standards and errors that happen when the identity is not immediately read.

  3. It seems the oledb source doesn’t like adding parameter in the ROW_NUMBER() clause. I get the following error

    “Parameters cannot be extracted from the SQL command. The provider might not help to parse parameter information from the command. In that case, use the “SQL command from variable” access mode, in which the entire SQL command is stored in a variable.”

    Have you came across this?

  4. Joyal – I have parameters option in my source editor. I promise 🙂

    Siv – I’m truly sorry for the late response. I was caught up in other things.

    I have no idea why you are seeing the error you mentioned. The only difference i can think of is i used SSIS 2012 for this example. Not sure if there is any change, but will find out If you are working on previous versions, check out the below link. It has a very good example on setting up parameters in OLED source.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s