Create Several Shell SSIS Packages In Minutes Using BIML

What do i mean by shell packages? You’re spot on – that’s an excellent question to begin with. For the purpose of this post, I define shell package as an empty package that is renamed appropriately and contains all the required connection managers.

It is a common pattern among SSIS developers to create separate packages for each unit of work. For example, you will create a package for each table that needs to be loaded.

The only problem: manually creating hundreds of packages, sometimes even more, is time consuming and boring. If you think about it, first steps of creating all these packages is always the same: Right click on SSIS Packages, select New SSIS Package, rename the package, and create Connection Managers.

Wouldn’t it be nice if there is a way to automate these boring steps? I hear you screaming yes.

Package templates help you to some extent. You create a package with the required connection managers and save it as a template. The subsequent packages will automatically contain all connection managers the template has, you don’t have to add them manually to all packages. You still have to add and rename packages though.

Let’s take a look at a much better and time saving method!

I use BIML to dynamically create multiple SSIS packages for each table that needs to be loaded. All packages will be renamed consistently and will contain connection managers.

Demo:

Execute the following SQL to create source and destination databases.

CREATE DATABASE SrcDatabase ;

CREATE DATABASE DestDatabase ;

Next, create some tables in the destination database.

USE DestDatabase;
GO

CREATE TABLE dbo.DimProduct (
ProductKey INT IDENTITY(1, 1) PRIMARY KEY
,ProductCode VARCHAR(10) UNIQUE
) ;

CREATE TABLE dbo.DimCustomer (
CustomerKey INT IDENTITY(1, 1) PRIMARY KEY
,CustomerCode VARCHAR(10) UNIQUE
) ;

CREATE TABLE dbo.DimEmployee (
EmployeeKey INT IDENTITY(1, 1) PRIMARY KEY
,EmployeeCode VARCHAR(10) UNIQUE
) ;

Again, our objective is to create one package for each of the tables in the destination database.

Open BIDS, I’m using SSIS 2008R2 for this example. You’ll have to install BIDS Helper from here. Right click on the project and add New BIML File. Your solution explorer look like this:

image

Double click the BIML file and paste the following script. If you see issues while copy pasting, check this link.

<#@ template language=”C#” hostspecific=”true”#>
<#@ import namespace=”System.Data” #>

<Biml xmlns=”http://schemas.varigence.com/biml.xsd”>
<Connections>
<Connection Name=”SrcDatabase” ConnectionString=”Data Source=(local);Initial Catalog=SrcDatabase;Provider=SQLNCLI10.1;Integrated Security=SSPI” RetainSameConnection =”true” DelayValidation =”false”/>
<Connection Name=”DestDatabase” ConnectionString=”Data Source=(local);Initial Catalog=DestDatabase;Provider=SQLNCLI10.1;Integrated Security=SSPI;” RetainSameConnection =”true” DelayValidation =”false”/>
</Connections>
<Packages>

<# string DestCnStr =”Data Source=(local);Initial Catalog=DestDatabase;Provider=SQLNCLI10.1;Integrated Security=SSPI”;
DataTable tables = ExternalDataAccess.GetDataTable(DestCnStr,”SELECT t.name FROM sys.tables t”) ;
foreach (DataRow row in tables.Rows)
{ #>
<Package Name=”Load_<#=row[0]#>” ConstraintMode=”Linear” >
<Tasks>
<Container Name=”Dummy — remove container” ConstraintMode=”Linear”>
<Tasks>
<ExecuteSQL Name=”Dummy — remove task” ConnectionName=”SrcDatabase”>
<DirectInput>select 1</DirectInput>
</ExecuteSQL>

<ExecuteSQL Name=”Dummy 1 — remove task” ConnectionName=”DestDatabase”>
<DirectInput>select 1</DirectInput>
</ExecuteSQL>
</Tasks>
</Container>
</Tasks>

</Package>
<# } #>
</Packages>
</Biml>

After copying the code, right click on the BIML file and select Generate SSIS Packages.

image

Bingo!

Multiple packages with connections managers have been created in a matter of minutes.

image

You just have to delete the sequence container and proceed with package development. Why?  We created connection managers in the BIML script, but since we are not using them anywhere in the package, they won’t be included in the packages. So, I tricked it by creating a dummy task. This is still easier compared to the traditional process of creating multiple packages.

image

Don’t get me wrong. This example is probably the most basic of what BIML has to offer. I’m spending some time lately with BIML and I hope to do more posts as I learn new things.

Reference: http://bimlscript.com/Browse/Snippets

@SamuelVanga

Generate Uniqueidentifier with SSIS

If you are trying to generate Uniqueidentifier/Newid() in SSIS data flow, you will soon realize that there is no out-of-the-box transformation you could use. Fear not. Script Component can be used to create Uniqueidentifier columns.

In the below package, I’ve an OLE DB source. Then, i dragged script component to the data flow. I selected transformation as the script component type and connected green arrow from source to script transformation.

Next, open the script transformation editor. In the Inputs and Outputs page, expand Output 0, select output columns and click add column in the bottom, give it a name, select unique identifier [DT_GUID] as the data type, and click ok. I named mine UniqueId.

SSIS NEWID() Script Add Column data type

In the script page, make sure Microsoft Visual Basic 2008 is selected as the scripting language. I just like working with VB. Click edit script, then copy paste the following script.

' Microsoft SQL Server Integration Services Script Component
' Write scripts using Microsoft Visual Basic 2008.
' ScriptMain is the entry point class of the script.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<microsoft.sqlserver.dts.pipeline.ssisscriptcomponententrypointattribute()> _
<clscompliant(False)> _
Public Class ScriptMain
Inherits UserComponent

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.UniqueId = System.Guid.NewGuid()
End Sub
End Class

 

When you add a destination, UniqueId will be available as a new column in the mappings page.

SSIS UniqueIdentifier Mappins page New Column

@SamuelVanga

Parallel Execution in SSIS with MaxConcurrentExecutables

MaxConcurrentExecutables, a package level property in SSIS determines the number of control flow items that can be executed in parallel. The default value is -1. This is equivalent to number of processors (logical and physical) plus 2.

For example, in the below package running on my machine with 4 processors and MaxConcurrentExecutables = -1, you can see 6 tasks have completed execution and 6 are currently running. It’s executing 6 at a time because 4 processors + 2 = 6 threads.

maxconcurrentexecutables ssis

This applies to all versions of SSIS. Parallelism is powerful when your goal is to complete a process as quickly as possible, specially when the tasks in a control flow are independent of each other.

If you’re thinking of increasing this setting to an infinity hoping to achieve a Nobel prize in performance tuning… slow down. If the words throughput, threading, multi-tasking scares you, you should be careful with this property. In most cases, the default setting can get the job done just fine.

Follow me on Twitter.

@SamuelVanga

Generating Surrogate Keys With SSIS

A surrogate key is an auto generated value, usually integer, in the dimension table. It is made the primary key of the table and is used to join a dimension to a fact table. Among other benefits, surrogate keys allow you to maintain history in a dimension table. Despite of the their popularity, SSIS doesn’t have a built in solution for generating surrogate keys. Let’s take a look at a few alternatives in this post.

First, create the following table. We will import data from Person.Person table (AdventureWorks sample) into this table. Note PersonSK is the surrogate key.
CREATE TABLE [dbo].Person
(
[PersonSK] INT IDENTITY(1,1) NOT NULL
,
[FirstName] NVARCHAR(50) NULL
,
[LastName] NVARCHAR(50) NULL
,
CONSTRAINT PK_PersonSK PRIMARY KEY [PersonSK]
) ;

With IDENTITY()

Drag a data flow task on to the control flow and configure the OLE DB source.

Next, drag OLE DB destination and connect it to the source. Specify the connection manager, and choose Table or view – fast load as the data access mode. This performs a bulk insert and is the fastest of all.

Destination table has three columns, but source has only two columns. In the mappings page, map input and output columns for FirstName and LastName and ignore the mapping for PersonSK.

When you run the package, becasue PersonSK is an identity column in the table, SQL Server will automatically generate values for you. This solution is easy and fast, but sometimes depending on your ETL methodology, you can’t rely on IDENTITY().

With Script Component

I frequently use Script Transformation. The steps are nicely written in this post by Phil Brammer (b).  This is simply a script used as a transformation. The script generates a row number for each row passed through the data flow.

With ROW_NUMBER()

SSIS (data flow) engine generated the new row number when using script transformation. Instead, you can use ROW_NUMBER() when working with SQL Server data source to let the database engine do the work for you. This can be faster.

If you’re doing an incremental load, first find the maximum key value from the destination. I’ll use the following query. It’ll return zero if there were no rows, else it returns the maximum value.

SELECT  ISNULL(MAX(PersonSK),0) SK
FROM    dbo.Person

Add an Execute SQL Task to the control flow and set the result property to single row. Then, add a variable to hold the return value.

Next, connect a data flow task to the execute sql task. We will use the following SQL statement in the OLE DB source editor. In addition to the LastName and FirstName columns, we are using ROW_NUMBER() function to generate a unique number for every row.

SELECT
ROW_NUMBER() OVER(ORDER BY [LastName]) SK
, [LastName]
, [FirstName]
FROM
Person.Person

The query will generate numbers starting from 1 for each row, but while loading to destination we don’t want to start from 1. We want to find the maximum value in the destination and start from the next highest value. So, I’m adding the max value to every row number using parameters in the OLE DB source.


In the OLE DB destination, check the box that says Keep Identity. By do this we are asking SSIS to keep the identity values that are  generated by the source. In the mappings page, you’ll see a new input that was created in the OLE DB source. Map it to PersonSK surrogate key.

Go ahead and run the package. If it’s all good you will see cute little green tick marks like below.

Summary

In this example, we looked at different options to generate surrogate keys while loading dimensions using SSIS. We used IDENTITY() constraint in SQL Server. We talked about Script Component. Finally, we saw making use ROW_NUMBER() function. Last option is twice as fast as using the Script Component with around 20,000 rows and an index on LastName column.

Status Column in SSISDB Catalog Database

I occasionally review the search terms people use to get to this blog and secretly celebrate knowing my post helped them. If i know it didn’t, i add it to the list of items i should write about. This post is a result of one such thing.

Note: If you read my previous post about Logging in SSIS2012, please stop reading here. This doesn’t have any new information. I wrote this one first, but published the Logging post ahead of this and still publishing this to reach Googlers.

I assume you know about the SSISDB catalog database. Several tables/views have a column called [status] showing the status of an execution. It is an integer column with 1 through 9 as the possible values.

Description for these numbers isn’t available anywhere in the database but buried in books online like this.

Here is an example query you can use to translate the status numbers to descriptions.

USE SSISDB ;

SELECT
    
project_name  
     [status description]
    
= CASE [status]
              
WHEN 1 THEN 'created'
              
WHEN 2 THEN 'Running'
              
WHEN 3 THEN 'canceled'
              
WHEN 4 THEN 'failed'
              
WHEN 5 THEN 'pending'
              
WHEN 6 THEN 'ended unexpectedly'
              
WHEN 7 THEN 'succeeded'
              
WHEN 8 THEN 'stopping'
              
WHEN 9 THEN 'completed'
    
END
FROM  
[catalog] .[executions]
WHERE  execution_id = 30032 -- replace with your execution_id
;

 

SSISDB catalog execute status

Another example is with [Catalog].[Executable_Statistics]. Status column in this view shows the execution status of an executable. An executable is any package or a container or a task.

SELECT  e.package_name ,
e.[executable_name] ,
es.[execution_path] ,
es.[execution_duration] ,
[execution_result] = CASE es.[execution_result]
WHEN 0 THEN 'Success'
WHEN 1 THEN 'Failure'
WHEN 2 THEN 'Completion'
WHEN 3 THEN 'Cancelled'
END
FROM   
[catalog].[executables] e
INNER JOIN [catalog].[executable_statistics] es ON es.[executable_id] = e.[executable_id]
WHERE   e.execution_id = 40044

 

@SamuelVanga

T-SQL Tuesday 31: SSIS 2012 Logging

Topic for this month’s T-SQL Tuesday, hosted by Aaron Nelson (@SQLVariant) is Logging.

I like to think of logging as a Flight Data Recorder (FDR) to an aircraft. FDR records data about functioning of an aircraft. No one cares what is in there as long as the aircraft operates safely – but when bad things happen, every investigation boils down to finding and analyzing FDR data. Investigation results are used to understand what went wrong and make improvements to prevent bad things from reoccurring.

What are the similarities between FDR and Logging? No one will never spend time in reviewing logs as long as the software application is running smoothly, but when accidents happen (think performance degrades and errors) – details collected in the logs become vital for debugging the application. Logging process should collect any data that helps in troubleshooting.

Integration Services 2012 Logging

SQL Server Integration Services prior to 2012 had decent ability in implementing logging. Developers have gone above and beyond to extend the built-in functionality to come up with custom solutions, popularly known as custom logging or frameworks.

image

One of the bazillion improvements introduced in SQL Server Integration Services 2012 is Logging. Most of the custom logging we implemented earlier is now built into SSIS project deployment model. As a developer you don’t have to do any thing different.

Example

The below example project has two child packages – one for each table and a parent package executes the child packages.

image

image

Event Handlers are empty for all the three packages. In addition, there are no red arrows (failure precedence constraints, if that’s how you prefer) in any of the control flows. Showing this will take up quite a bit of space here so you’ll have to trust me on.

This means I’m not implementing any error handling or logging techniques in the packages.

Next, I executed the ETL Master package with logging level as Verbose. SSIS provides different logging levels. Matt Masson wrote an article explaining what events are included in different log levels. Verbose log level logs everything.

image

Let’s run a few queries against the SSISDB database to see what’s logged.

Execution

Every execution gets a unique execution_id. The following query retrieves the status and duration of the execution.

SELECT  [project_name] ,
[package_name] ,
[status description] = CASE [status]
WHEN 1 THEN 'created'
WHEN 2 THEN 'Running'
WHEN 3 THEN 'canceled'
WHEN 4 THEN 'failed'
WHEN 5 THEN 'pending'
WHEN 6 THEN 'ended unexpectedly'
WHEN 7 THEN 'succeeded'
WHEN 8 THEN 'stopping'
WHEN 9 THEN 'completed'
END ,
Duration = DATEDIFF(ms, start_time, end_time)
FROM    [catalog].[executions]
WHERE   [execution_id] = 40044

image

Executable and Executable Statistics

Below query gives you duration and status of all the executables.

SELECT  e.package_name ,
e.[executable_name] ,
es.[execution_path] ,
es.[execution_duration] ,
[execution_result] = CASE es.[execution_result]
WHEN 0 THEN 'Success'
WHEN 1 THEN 'Failure'
WHEN 2 THEN 'Completion'
WHEN 3 THEN 'Cancelled'
END
FROM   
[catalog].[executables] e
INNER JOIN [catalog].[executable_statistics] es ON es.[executable_id] = e.[executable_id]
WHERE   e.execution_id = 40044

image

Component Phases

This applies to data flow tasks. See which component phases take up most of the time.

WITH    cte
AS ( SELECT   [package_name] ,
[task_name] ,
[subcomponent_name] ,
[phase] ,
Duration = DATEDIFF(ms, start_time, [end_time])
FROM     [catalog].[execution_component_phases]
WHERE    [execution_id] = 40044
)
SELECT  [package_name] ,
[task_name] ,
[subcomponent_name] ,
[phase] ,
Duration = SUM(Duration)
FROM    cte
GROUP BY [package_name] ,
[task_name] ,
[subcomponent_name] ,
[phase]
ORDER BY [package_name] ,
Duration DESC

image

Execution Data Statistics

How many rows are transferred? How many buffers are used?

SELECT  [package_name] ,
[task_name] ,
[dataflow_path_id_string] ,
rows_sent = SUM([rows_sent]) ,
buffer_count = COUNT(*)
FROM    [catalog].[execution_data_statistics]
WHERE   [execution_id] = 40044
GROUP BY [package_name] ,
[task_name] ,
[dataflow_path_id_string]

image

And the list goes on…

I first read about SSIS Logging in Denali from an excellent article written by Jamie Thomson (b|twitter). Jamie also put together a bunch of reports called SSIS Reporting Pack on CodePlex. They are free and you must use them in conjunction with SSIS 2012.

I’m @SamuelVanga on Twitter.

How to execute a package from another package?

I hear you. You’ll use the Execute Package Task. This mechanism of executing one package from another is popularly knows as Parent Child paradigm. Let me tell you, this isn’t new in 2012. There are, however, a few changes to the Execute Package Task. Let’s take a look.

Demoland!

I added two packages to the project and conveniently named them Parent.dtsx and Child.dtsx.

Parent Child Packages SSIS 2012

Child Package

In the child package, I added a script task and used the following script. This will display a message box and tells that it is executing.

MsgBox(“Howdy! I’m the child package”)

image

Parent Package

In the parent package, I added an Execute Package Task and renamed it to Run Child Package.

image

In the Package page of the Execute Package Task editor, there is a new property called reference type. It is set to Project Reference by default. This means you can point the execute package task to another package within the same project. I selected Child.dtsx from the drop down.

image

The following is the output when I execute the parent package.

image

In prior versions, you’ll have to choose either file system or SQL Server as the source for child package and have connection managers to all the child packages in the parent. For example, if you have one parent package calling 50 child packages, the parent needs 50 connection managers to each of those child packages. This is still supported for legacy packages – change the reference type property to External Reference.

Passing a variable from parent to child

You often pass variables from parent package to the child package; connection strings, parent package start time, parent package name (ok. May be). Again, in the previous versions, you would use a parent package configurations to do this.

In this example, I’ll pass parent package name to the child package.

Another change to the Execute Package Task is parameter binding. You can map a parameter or a variable from the parent package to a parameter in the child package.

In the below image, I created a parameter in the child package. I wrote about SSIS parameters in an earlier post.

image

From the parent package, open the Execute Package Task Editor. In the parameter bindings page, I mapped the child parameter to the parent’s PackageName system variable.

image

Then I changed the script in the child package as follows:

MsgBox(“Howdy! I’m the Child Package. I was called by ” & Dts.Variables(“$Package::ParentName”).Value.ToString)

When I execute the parent package, I see…

image

Zip It

In this post, I looked at using Execute Package Task to call a package from another and pass variable from parent to child.

@SamuelVanga

Change Package Properties Using Parameters–SSIS 2012

In DTS, changing package properties was close to impossible (the less we talk about DTS, the better). Package configurations introduced in 2005 solved most of the problems. In 2012, SQL Server Integration Services introduces parameters to make this process even easier. Parameters allow you to change package properties without the need for configuration files.

In this example, we will create a variable and change it’s value at run time using parameters.

Let’s enter the Demoland!

In a SSIS package, I created a variable called PromptValue. I assigned the value “I’m Variable” to it.

SSIS Parameters 2012

Next, I added a script task to the package, entered PromptValue as a read only variable, and used the following script.

MsgBox(Dts.Variables(“PromptValue”).Value.ToString)

I see the following message box when I execute the package.

SSIS show messgae box

Now, we want to be able to change this value during execution. Here is where we would create a package configuration and add this variable to it, so we can simply change the configuration file without editing the package.

But this time we will use parameters. I added a parameter called ParmPromptValue and assigned “I’m Parameter” as the value. Parameters are of two types: Package and Project. Package parameters can be used within the package. Project parameters can be used by all the packages in a project.

I created a package parameter.

ssis package parameters

Next, we’ve to pass the value of this parameter to the variable we created earlier. We will use expressions to do that. We know we can use other variables in the expressions builder. In addition to that, we can use parameters. I dragged the package parameter @[$Package::ParmPromptValue] to the expression window.

image

What should we see when I execute the package? Should it be [I’m Variable] or should it be [I’m Parameter]?

Since we used an expression to the variable, the value of the parameter overwrites the value of the variable.

image

We’ve so far executed the package from Visual Studio. In the below image, I deployed the package to SSIS catalog and assigned a different value to the parameter. More about deployment in the next post.

image

This is what I see when I run package from SSIS Catalog (Server).

image

Why in the world would you want to change the value like in this example, you might ask. I agree. This is a fictitious example, but think of connection strings, database names, user name/passwords that you can’t live without changing when moving packages through environments.

I’ve provided an introduction to parameters in this post. Be sure to read the following for more information. Parameters in Denali by Jamie Thomson and this from TechNet Wiki.

Resources

Download the sample package from here.

~Sam.

Follow Sam on Twitter – @SamuelVanga

Why Isn’t There A Deployment Server Edition For SSIS Projects?

Andy Leonard blogged about a gotcha when creating SSIS projects using SSDT yesterday. He showed that you can use any component in SSDT whether or not your SQL Server supports that component. For example, you can create a package with Fuzzy Lookup and deploy to a server running Business Intelligence edition (Fuzzy Lookup isn’t supported on this edition), you won’t even be warned until the package bombs when executed from command line or SQL Agent.

Rafi Asraf made a comment on that post about how it plays well with SSAS projects. I’ll try to elaborate that comment here.

There is a property called Deployment Server Edition for SSAS projects.

SSAS Multidimensional.

DeploymentEdition_SSASMulti

SSAS Tabular.

DeploymentEdition_Version

I selected Standard edition. Perspectives aren’t supported in this edition. When I try to create them, SSDT shows a warning message. This is friendly.

image

Now, why a similar property isn’t available for SSIS projects? That, my friend, is beyond the scope of my brain. Adding it will definitely save a lot of time and reduce confusion for developers.

~Sam.

Follow Sam on Twitter – @SamuelVanga

Execute T-SQL Statement Task

I spotted a new task in SSIS 2012Execute T-SQL Statement Task. It executes T-SQL statements, only Transact-SQL statements against SQL Server databases. Cool, but don’t get too excited. Why? Because you can neither parameterize the SQL nor save the results to a variable.

image

The task editor looks like below:

image

Only plain and static scripts can be executed, for which Execute SQL Task can be used anyway. So, I don’t see a specific need for this task and I don’t plan on using it in the near future. Do you agree?

~Sam.

Update: Please note, this task isn’t new in 2012. It existed in previous versions. I apologize for the incorrect information. Thanks to Devin Knight (b|t) for correcting me.