Inserting Records and Getting Identity in SSIS

Hi ,

When one of my friend asked about how can we catch a Identity value after inserting the rows in to table  having a Identity column and using that result in the same package

well there is no out of box solution in SSIS  once the data is fed in to the OLEDB destination or SQL server destination it becomes a impossible to find inserted Identity values (Unless a trigger is used on the destination table) for these components outputs are ERROR redirection, which we can not use

Problem :

During the Loading phase of ETL it is so common to have identity columns , but it becomes to tough to find the Identity values after inserting and using them in the same Data flow

Solutions :

There are many work around solutions to perform this which I will list them , I will try to demonstrate the one way which I like

1) Using trigger on the destination table catching the inserted values in another table

2) Using the OLEDB command and a stored procedure which uses Output clause to find all Identity values or we can use Identity_scope (problem: dataflow will be terminated which makes the procedure to be the control flow of data (which makes no sense of using SSIS))

Famous MVP Todd Mcdermid  tries to explain the same problem in his view point please take a look

http://toddmcdermid.blogspot.com/2011/08/inserting-records-and-getting-identity.html

3)add a unique number to a comment field to find the rows just inserted and in other data flow task use the unique number to process them as required

As it is a RBAR(row by agonizing row) which is time consuming and it degrades the package performance when the number of rows are more

4)  Using the Record-set destination and taking the rows back to the control flow , and using the for each loop enumerator and catch the inserted Identity columns and use the values for further processing in SSIS

lets look at the example

In this session we will also learn how to use  Recordset destination

How to perform cusrsor action in Dataflow

how to use Foreach loop (Foreach ADO enumerator)

how to transfer data between control flow and dataflow

so looks like nice information session on SSIS so please stay with me

Parallelism in ETL Process :: SSIS 2008 and SSIS 2012

Hi, Today we will talk about achieving  parallelism in ETL using SSIS 2008 and 2012 We can generalize this topic two ways is by
 
1) Using maximum concurrent executable s in SSIS control flow and data flow( easy way to achieve parallelism and not so dynamic everythng should be configured in design)
 
2)Partitioning the source and destination ( little bit complicated but most power full way to achieve parallelism, everything is dynamic)
    1) There is a setting in the SSIS which allows the user to specify the maximum number of executable’s it can execute in a given package Max Concurrent executable
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2)Partitioning the source and destination ( little bit complicated but most power full way to achieve parallelism, everything is dynamic) 
In order achieve parallelism in ETL
 
1) Source and destination should be partitioned according the Data load
 
2) Entire work load should be divided to individual tasks
3) Each task should be sub divided into multiple tasks
4) Single unit of task should not not depend up on the other task(dead locks, I/O waits..)
 
5) before using the maximum threshold CPU for parallelism proper testing should be done, so there are always resources available for system processes Some times Parallelism may degrade the performance
 
6) If work load is divided in to to many small tasks which may also degrade the performance (ex: Time spent by the Worker thread to wait for child threads may exceed single  execution of the process ) Declare variables  Conclusion: Parallelism can be save lot of time in ETL process if utilized properly

SSIS package logging – Custom logging 2008 R2 and 2012

Logging is an important part of SSIS package development , I personaly prefer to have logging in all production level packages

Logging can be used as more than debugging errors in a scheduled package execution environment but also performs the audit functions it needs and we can also measure

performance, Out of the box SSIS 2008 and SSIS 2012 have some nice logging capabilities

The following table describes the predefined events that can be enabled to write log entries when run-time events occur. These log entries apply to executables, the package, and the tasks and containers that the package includes. The name of the log entry is the same as the name of the run-time event that was raised and caused the log entry to be written.

Events Description
OnError Writes a log entry when an error occurs.
OnExecStatusChanged Writes a log entry when the execution status of the executable changes.
OnInformation Writes a log entry during the validation and execution of an executable to report information.
OnPostExecute Writes a log entry immediately after the executable has finished running.
OnPostValidate Writes a log entry when the validation of the executable finishes.
OnPreExecute Writes a log entry immediately before the executable runs.
OnPreValidate Writes a log entry when the validation of the executable starts.
OnProgress Writes a log entry when measurable progress is made by the executable.
OnQueryCancel Writes a log entry at any juncture in the task processing where it is feasible to cancel execution.
OnTaskFailed Writes a log entry when a task fails.
OnVariableValueChanged Writes a log entry when the value of a variable changes.
OnWarning Writes a log entry when a warning occurs.
PipelineComponentTime For each data flow component, writes a log entry for each phase of validation and execution. The log entry specifies the processing time for each phase.
Diagnostic Writes a log entry that provides diagnostic information.For example, you can log a message before and after every call to an external data provider

there are some custom logging messages to some of the control flow tasks and data flow tasks

Data Flow Task

The following table lists the custom log entries for the Data Flow task.

Log entry Description
BufferSizeTuning Indicates that the Data Flow task changed the size of the buffer. The log entry describes the reasons for the size change and lists the temporary new buffer size.
OnPipelinePostEndOfRowset Denotes that a component has been given its end-of-rowset signal, which is set by the last call of the ProcessInput method. An entry is written for each component in the data flow that processes input. The entry includes the name of the component.
OnPipelinePostPrimeOutput Indicates that the component has completed its last call to the PrimeOutput method. Depending on the data flow, multiple log entries may be written. If the component is a source, this means that the component has finished processing rows.
OnPipelinePreEndOfRowset Indicates that a component is about to receive its end-of-rowset signal, which is set by the last call of the ProcessInput method. An entry is written for each component in the data flow that processes input. The entry includes the name of the component.
OnPipelinePrePrimeOutput Indicates that the component is about to receive its call from the PrimeOutput method. Depending on the data flow, multiple log entries may be written.
OnPipelineRowsSent Reports the number of rows provided to a component input by a call to the ProcessInput method. The log entry includes the component name.
PipelineBufferLeak Provides information about any component that kept buffers alive after the buffer manager goes away. This means that buffers resources were not released and may cause memory leaks. The log entry provides the name of the component and the ID of the buffer.
PipelineExecutionPlan Reports the execution plan of the data flow. It provides information about how buffers will be sent to components. This information, in combination with the PipelineExecutionTrees entry, describes what is occurring in the task.
PipelineExecutionTrees Reports the execution trees of the layout in the data flow. The scheduler of the data flow engine use the trees to build the execution plan of the data flow.
PipelineInitialization Provides initialization information about the task. This information includes the directories to use for temporary storage of BLOB data, the default buffer size, and the number of rows in a buffer. Depending on the configuration of the Data Flow task, multiple log entries may be written.

for example in a data flow task you can also log some performance counters to measure the buffer allocation and buffer leakage during the transformations

to know more about custom logging messages in SSIS click the link below

http://msdn.microsoft.com/en-us/library/ms345174.aspx

Why do we need custom logging using event handlers?

Some times it becomes overhead by using the default log providers and it often ends up using the event handlers and system variables to log some key information which some times difficult to analyze using default logging

in order to perform the logging using event handlers we need to better understand the system variables and information they can provide


The following table describes the system variables that Integration Services provides for packages.

System variable Data type Description
CancelEvent Int32 The handle to a Windows Event object that the task can signal to indicate that the task should stop running.
CreationDate DateTime The date that the package was created.
CreatorComputerName String The computer on which the package was created.
CreatorName String The name of the person who built the package.
ExecutionInstanceGUID String The unique identifier of the executing instance of a package.
InteractiveMode Boolean Indicates whether the package is run in interactive mode. If a package is running in SSIS Designer, this property is set to True. If a package is running using the DTExec command prompt utility, the property is set toFalse.
LocaleId Int32 The locale that the package uses.
MachineName String The name of the computer on which the package is running.
OfflineMode Boolean Indicates whether the package is in offline mode. Offline mode does not acquire connections to data sources.
PackageID String The unique identifier of the package.
PackageName String The name of the package.
StartTime DateTime The time that the package started to run.
UserName String The account of the user who started the package. The user name is qualified by the domain name.
VersionBuild Int32 The package version.
VersionComment String Comments about the package version.
VersionGUID String The unique identifier of the version.
VersionMajor Int32 The major version of the package.
VersionMinor Int32 The minor version of the package.

please use the below link to better understand the system variables

http://msdn.microsoft.com/en-us/library/ms141788(v=SQL.105).aspx

lets do some  custom logging using event handlers for a simple package

before digging into more details lets create the table to log the details

as

CREATE TABLE [DBO].[PACKAGELOG](
[LOGID] [INT] IDENTITY(1,1) NOT NULL,
[EXECUTIONID] [UNIQUEIDENTIFIER] NULL,
[PACKAGENAME] [NVARCHAR](255) NULL,
[PACKAGEID] [VARCHAR](50) NULL,
[USERNAME] [NVARCHAR](100) NULL,
[MACHINENAME] [NVARCHAR](255) NULL,
[STARTDATETIME] [DATETIME] NULL,
[ENDDATETIME] [DATETIME] NULL,
[SOURCENAME] [VARCHAR](255) NULL,
[MESSAGE] [NVARCHAR](MAX) NULL,
[MESSAGECODE] [INT] NULL,
[LOGDATETIME] [DATETIME] NULL,
[INTERACTIVEMODE] [BIT] NULL,
[SOURCEDESCRIPTION] [VARCHAR](255) NULL,
[TASKNAME] [VARCHAR](255) NULL,
[TASKID] [NVARCHAR](255) NULL,
[EXECSTATUS] [BIT] NULL,
[SERVERNAME] [NVARCHAR](255) NULL,
CONSTRAINT [PK__PACKAGEL__5E5499A820C44329] PRIMARY KEY CLUSTERED
(
[LOGID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [DBO].[PACKAGELOG] ADD CONSTRAINT [DF_PACKAGELOG_EXECSTATUS] DEFAULT ((0)) FOR [EXECSTATUS]
GO

Note : there are some noted issues while using the system:: start time system variable and mapping to any SSIS date time data type

The reason is the value of the system start time looks like  this SYSTEM::STARTTIME  {18/10/2010 14:30:27}

which can not be mapped to any SSIS date time data types we will be getting a error

“The type is not supported.DBTYPE_DBDATE”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly

To avoid this error we have a work around is to create a variable with string and use a expression and setting the variable value to be evaluate as expression and get the parsed System start time from the variable and map that variable to the execute SQL task

as

(DT_WSTR, 4)YEAR(@[System::StartTime])

+ “-” +
RIGHT(“0” + (DT_WSTR, 2)MONTH(@[System::StartTime]), 2)

+ “-” +

RIGHT(“0″ + (DT_WSTR, 2)DAY(@[System::StartTime]), 2)

+ ” ” +

RIGHT(“0” + (DT_WSTR, 2)DATEPART(“Hour”, @[System::StartTime]), 2)

+ “:” +

RIGHT(“0” + (DT_WSTR, 2)DATEPART(“Minute”, @[System::StartTime]), 2)

+ “:” +

RIGHT(“0” + (DT_WSTR, 2)DATEPART(“Second”, @[System::StartTime]), 2)

+ “.” +

RIGHT(“00” + (DT_WSTR, 3)DATEPART(“Millisecond”,@[System::StartTime]), 3)

insert into dbo.packagelog (ExecutionID,packagename,packageid,username,machinename,interactivemode,sourcename,taskname,taskid,Message,MessageCode,execstatus,servername,startdatetime,enddatetime)
values (?,?,?,?,?,?,?,?,?,?,?,((0)),@@SERVERNAME,?,getdate())

as

log

success log

there are some nice free third party tools to dig the information from the log graphically like log analyzer

http://ssisloganalyzer.codeplex.com/

SSIS Script task and Components (Tool Kit) using – C#

With the introduction of C#, and the embedding of the new Visual Studio Tools for Applications into SSIS, you can ’ t think of using the Script Task and Script Component as scripting anymore; now it ’ s all – out programming. Typically, you ’ d code logic into these Script Tasks to control the
execution and logic flow within a package or to perform some specialized business validation.

The two Scripting Components provide access into a new scripting development environment using Microsoft Visual Studio Tools for Applications (VSTA). This change finally allows us to script logic into packages using Visual Basic 2008 or Visual C# 2008 .NET code.

It is not necessary to know the C# universe in order to write valuable C# script tasks and components . Keep in mind that it is being used as a scripting language, not a full-fledged application development language.

The ability to perform the following basic operations will allow you to do quite a lot.

  • Access SSIS variables
  • Modifying SSIS variables
  • File validation Task (File properties task)
  • Read/write ASCII files
  • Parse strings with the split function
  • Create and loop through list arrays
  • Copy, move and delete files
  • Capture a listing of specified files in a sub directory
  • Create a database connections to SQL Server
  • Execute T-SQL queries and stored procedures and capture any results

Our first task is  accessing  variables

  1. Access SSIS variables

create a variable

drag and drop a script task in control flow

Double click on the variables and use the below shown settings

click on edit script ,once you click the edit script button a window pops pup (Microsoft Visual Studio Tools for Applications (VSTA))

Result

as

Code:

string str = null;

Dts.VariableDispenser.LockForRead(“User::user”);

str = (string) Dts.Variables[“user”].Value;

MessageBox.Show(str);
Dts.TaskResult = (int)ScriptResults.Success;

2. Modifying variables in SSIS using script task

click on the script task and remove the variable “user” as read only variable and assign it to the read write variable

Modifying the variable

string str = null;

Dts.VariableDispenser.LockForWrite(“User::user”);

str = (string)Dts.Variables[“user”].Value;

MessageBox.Show(str,”variable value before changing “);

Dts.Variables[“user”].Value = “Sql junkie share”;

str = (string)Dts.Variables[“user”].Value;

MessageBox.Show(str,”variable value after changing “);

3. File Validation Task 

In any of the ETL processes when dealing with flat files, some times there will be a need to check the existence of the file so lets create a simple file validation task using C#

create a  variable “file exits” in  control flow and assign default value as False

as

Click the button Edit Script

Script

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
namespace ST_ad163f48be91459c9c185a38bccd8436.csproj
{
    [System.AddIn.AddIn(“ScriptMain”, Version = “1.0”, Publisher = “”, Description = “”)]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
        public void Main()
        {
            string filelocation = “D:\\New Text Document.txt”;
            if (File.Exists(filelocation) == true)
            {
                Dts.Variables[“fileexists”].Value = true;
            }
            else
            {
                Dts.Variables[“fileexists”].Value = false;
            }
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

File exists

File did not exist

Success

Script Component

  1. Read/write ASCII files
  2. Parse strings with the split function

we will use the earlier package , in this section  we will learn how to create a flat file source connection using script component and read the contents of the flat file and push the columns into the data flow task and  parse the contents of the text using split function

Replace the file exists part with a Data flow task , in the data flow task drag and drop a script component , click on the script component and select the component type as Source

and the flat file we will be dealing with here  looks like this

1,john,apt#3 – California USA\
2,tom,st 112 ca;lifornia USA

Task 

  • Read the flat file
  • Separate the columns which are delimited by ‘,’
  • and parse the address column and remove the special characters (apt#3 – California USA\) to (apt 3  California USA)

back to the source component

and in the script section select the scripting language as C# and click the button edit script

Code:
using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Text;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

public override void CreateNewOutputRows()
{

StreamReader reader = new StreamReader(“D:\\New Text Document.txt”);

string line;

while ((line = reader.ReadLine()) != null)
{
string[] columns = line.Split(‘,’);

this.Output0Buffer.AddRow();
if (columns.Length > 0)
{
Output0Buffer.id = (columns[0]);
}
if (columns.Length > 1)
{
Output0Buffer.name = (columns[1]);
}
if (columns.Length > 2)
{

string[] split = columns[2].Split(new Char[] { ‘#’, ‘,’, ‘.’, ‘:’, ‘-‘, ‘\\’, ‘;’ });

Output0Buffer.address = ConvertStringArrayToString(split);

}
}
}
static string ConvertStringArrayToString(string[] array)
{
//
// Concatenate all the elements into a StringBuilder.
//
StringBuilder builder = new StringBuilder();
foreach (string value in array)
{
builder.Append(value);
}
return builder.ToString();
}
}

Output

CDC – Change Data Capture (zero cost ETL solution)

 

What is CDC ? 

Change data capture records insert, update, and delete activity that is applied to a SQL Server table

How it does ?

 

 

 

 

 

 

 

 

 

 

 

 

 

 

  • The source of change data for change data capture is the SQL Server transaction log.
  • As inserts, updates, and deletes are applied to tracked source tables, entries that describe those changes are added to the log. The log serves as input to the change data capture capture process.
  • This reads the log and adds information about changes to the tracked table’s associated change table.
  • Functions are provided to enumerate the changes that appear in the change tables over a specified range, returning the information in the form of a filtered result set.

Where it is used ?

A good example of a data consumer that is targeted by this technology is an extraction, transformation, and loading (ETL) application. An ETL application incrementally      loads change data from SQL Server source tables to a data warehouse or data mart.

we can  eliminate the use the of after update/delete/insert trigger

Note:Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.

Advantages of using CDC :

  • Minimal impact on the database (even more so if one uses log shipping to process the logs on a dedicated host).
  • No need for programmatic changes to the applications that use the database.
  • Low latency in acquiring changes.
  • Transactional integrity: log scanning can produce a change stream that replays the original transactions in the order they were committed. Such a change stream include changes made to all tables participating in the captured transaction.
  • No need to change the database schema

is CDC (Change Data Capture)  available in other off the shelf products)?

products perform change data capture using database transaction log files. These include:

  • Attunity Stream
  • Centerprise Data Integrator from Astera
  • DatabaseSync from WisdomForce
  • GoldenGate Transactional Data Integration
  • HVR from HVR Software
  • DBMoto from HiT Software
  • Shadowbase from Gravic
  • IBM InfoSphere Change Data Capture (previously DataMirror Transformation Server)
  • Informatica PowerExchange CDC Option (previously Striva)
  • Oracle Streams
  • Oracle Data Guard
  • Replicate1 from Vision Solutions
  • SharePlex from Quest Software
  • FlexCDC, part of Flexviews for MySQL

How to implement CDC ?

Lets get into the details of how to implement the CDC in SQL server 2008

Before enabling the cdc on the tables we need to enable the cdc on the Database.

To determine weather the database is enabled or not check the sys.databases table

SELECT NAME,IS_CDC_ENABLED 

FROM SYS.DATABASES

WHERE NAME = 'SQLJUNKIESHARE'

Use the following system stored procedure to enable CDC on a database

USE SQLJUNKIESHARE

EXEC SYS.SP_CDC_ENABLE_DB
GO

after CDC is enabled on the database above listed tables are created in system tables

  • cdc.captured_columns Returns the columns tracked for a specific capture instance.
  • cdc.change_tables Returns tables created when CDC is enabled for a table. Use sys.sp_cdc_help_change_data_capture to query this informationrather than query this table directly.
  • cdc.ddl_history Returns rows for each DDL change made to the table, once CDE is enabled. Use sys.sp_cdc_get_ddl_history instead of querying this table directly.
  • cdc.index_columns Returns index columns associated with the CDC-enabled table. Query
  • sys.sp_cdc_help_change_data_capture to retrieve this information rather than querying this table directly.
  • cdc.lsn_time_mapping Helps you map the log sequence number to transaction begin and end times. Again, avoid querying the table directly, and instead use the functions sys.fn_cdc_map_lsn_to_time and sys.fn_cdc_map_time_to_lsn.

after enabling CDC on database lets enable CDC on table

Now that Change Data Capture is enabled, I can proceed with capturing changes for tables in
the database by using the sys.sp_cdc_enable_table system stored procedure. The parameters of
this stored procedure are described below
sp_cdc_enable_table Parameters
Parameter Description

  • @source_schema       This parameter defines the schema of the object.
  • @source_name           This parameter specifies the table name.
  • @role_name                This option allows you to select the name of the user-defined role that will have permissions to access the CDC data.
  • @capture_instance     You can designate up to two capture instances for a single table. This comes in handy if you plan on altering the schema of a table already                                                 captured by CDC. You can alter the schema without affecting theoriginal CDC (unless it is a data type change), create a new capture instance,                                        track changes in two tables, and then drop the original  capture instance once you are sure the new schema capture fits your requirements. If                                        you don’t designate the name, the default value is schema_source.
  • @supports_net_changes     When enabled, this option allows you to show just the latest change to the data within the LSN range selected. This option requires a                                                      primary key be defined on the table. If no primary key is defined, you can alsodesignate a unique key in the @index_name option.
  • @index_name             This parameter allows you to designate the unique key on the table to be used by CDC if a primary key doesn’t exist.
  • @captured_column_list If you aren’t interested in tracking all column changes, this option allows you to narrow down the list.
  • @filegroup_name       This option allows you to designate where the CDC data will be stored. For very large data sets, isolation on a separate filegroup may yield                                              better manageability and performance.
  • @partition_switch       This parameter takes a TRUE or FALSE value designating whether or not a ALTER TABLE…SWITCH PARTITION command will be                                                    allowed against the  CDC table (default is FALSE)here how its like before enabling the cdc on a particular4when we set the @supports_net_changes to 1 either we need to have a primary key or we need to specify a non clustered index name
    after making those changes when we execute the stored procedure result will beJob ‘cdc.sqljunkieshare_capture’ started successfully.
    Job ‘cdc.sqljunkieshare_cleanup’ started successfully.

    Capture and Cleanup Jobs

    Besides the Change Data Capture tables and query functions that have been created in our example, two SQL Server Agent jobs are created: a Capture and a Cleanup Job.

    The Capture job generally runs continuously and is used to move changed data to the CDC tables from the transaction log.

    The Cleanup job runs on a scheduled basis to remove older data from the CDC tables so that they don’t get too large. By default, data older than three days is automatically removed from CDC tables by this job.

    we can also validate the settings of your newly configured capture instance using the
    sys.sp_cdc_help_change_data_capture stored procedure:
    EXEC sys.sp_cdc_help_change_data_capture ‘dbo’, ‘EMPLOYEE’        

  • The commit LSN both identifies changes that were committed within the same transaction, and orders those transactions.
  • The column __$start_lsn identifies the commit log sequence number (LSN) that was assigned to the change.
  • The column __$seqval can be used to order more changes that occur in the same transaction.
  • The column __$operation records the operation that is associated with the change:
  • 1 = delete,
  • 2 = insert,
  • 3 = update (before image)
  • 4 = update (after image).
  • The column __$update_mask is a variable bit mask with one defined bit for each captured column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only have those bits set that correspond to changed columns.
Before going in to more details  lets look at some CDC functions

cdc.fn_cdc_get_all_changes_capture_instance ( from_lsn , to_lsn , ‘<row_filter_option>’ )

<row_filter_option> ::= { all | all update old }

it is prefreble to use cdc function to query the change tables

DECLARE @from_lsn binary(10), @to_lsn binary(10)
SET @from_lsn =
 sys.fn_cdc_get_min_lsn('dbo_EMPLOYEE')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_EMPLOYEE (@from_lsn, @to_lsn, 'all');
 
<row_filter_option> ::= { all | all update old } 
Option 

‘all’ is used to filter the result set so that to get all the changes on that table or capture instance but one row for each change

(i.e one row for insert ,one row for delete one row for update)

‘all update old’ is used to filter the result set so that to get all the changes on that table or capture instance with two rows for update
(i.e one containing the values of the captured columns before the update and another containing the values of the captured columns after the update).
another important function
cdc.fn_cdc_get_net_changes_capture_instance ( from_lsn , to_lsn , ‘<row_filter_option>’ )
<row_filter_option> ::= { all | all with mask | all with merge }
You might have confused what is the difference between cdc.fn_cdc_get_net_changes_capture_instance and cdc.fn_cdc_get_all_changes_capture_instance  when to use net changes and all changes
 cdc.fn_cdc_get_net_changes_capture_instance  returns one net change row for each source row changed within the specified LSN range. That is, when a source row has multiple changes during the LSN range, a single row that reflects the final content of the row is returned by the function. For example, if a transaction inserts a row in the source table and a subsequent transaction within the LSN range updates one or more columns in that row, the function returns only one row, which includes the updated column values.
lets say we need to get all the net changes in the source the table during a period of 24 hours
DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);
-- Obtain the beginning of the time interval.
SET @begin_time = GETDATE() -1;
-- DML statements to produce changes in the DBO.Employee table.

-- Obtain the end of the time interval.
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

-- Return the net changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn, @to_lsn, 'all');
 
a new function we used above is sys.fn_cdc_map_time_to_lsn() ,function returns the LSN when supplied with time and relational operator

sys.fn_cdc_map_time_to_lsn ( ‘<relational_operator>’, tracking_time )

<relational_operator>::= { largest less than | largest less than or equal | smallest greater than | smallest greater than or equal }

please use the below link to know some more  CDC functions

http://msdn.microsoft.com/en-us/library/bb510744.aspx

now lets create a  dimension table which servers as a destination table

CREATE TABLE DBO.DIMEMPLOYEE
(ID INT IDENTITY (1,1) PRIMARY KEY, COUSTEMERID INT UNIQUE ,FIRST_NAME VARCHAR(50),LAST_NAME VARCHAR(50))

INSERT INTO DBO.DIMEMPLOYEE (COUSTEMERID,FIRST_NAME,LAST_NAME)

SELECT ID,FIRST_NAME,LAST_NAME FROM EMPLOYEE

now lets update some rows and delete some rows  in the dbo.employee table 

delete from dbo.employee where id between 50 and 100
update dbo.employee
set last_name = 'no lastname'
where id between 150 and 200 

now lets use the below query to get the changes in the table after updating and deleting 

DECLARE @begin_time datetime, @end_time datetime,
 @from_lsn binary(10), @to_lsn binary(10);
-- Obtain the beginning of the time interval.
SET @begin_time = DATEADD(HH,-1,GETDATE());
-- DML statements to produce changes in the Db.employee  table.

-- Obtain the end of the time interval.
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal'
, @begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal'
, @end_time);

-- Return the net changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn, @to_lsn, 'all');
Now lets create a simple package in SSIS which query’s these functions and updates the dimension table
Use the below query in the OLEDB source  , we can also query the change tables directly
SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE
(sys.fn_cdc_map_time_to_lsn
('smallest greater than or equal'
, DATEADD(HH,-1,GETDATE()))
, sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE()), 'all')

write simple update and delete queries in and map the id to coustemerid column in DBO.DIMEMPLOYEE table
Using CDC with out effecting the source table we were able to log those changes and using simple package we updated the data warehouse environment

Executing SSIS packages from a stored procedure

hi,

lets learn how to execute SSIS packages from a stored procedure

This can be done in two ways

a) Creating a job and calling the job from procedure

    b)Using Xp_cmdshell Extended stored procedure

Lets look at the first option

Step 1 : Create job in Sql server

 Job name : SSIS

go steps  on the left side of  job menu

Create new step

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Select the type and account to run the job

select the package location

use the simple procedure below

CREATE  PROCEDURE SSIS_EXE (@JOB_NAME VARCHAR(100))

AS

BEGIN 

EXEC MSDB..SP_START_JOB @JOB_NAME

END 

EXEC SSIS_EXE 'SSIS'
Job 'SSIS' started successfully.
 Go to job activity monitor and refresh depending upon the size of package wait for the success message


or user the below query to find the job status
SELECT S.NAME,H.JOB_ID,H.MESSAGE 

FROM MSDB..SYSJOBHISTORY H 

INNER JOIN MSDB..SYSJOBS S ON H.JOB_ID = S.JOB_ID 

WHERE S.NAME = 'SSIS'



Using Xp_cmdshell system stored procedure:
This is the best and the easy way 

we pass the parameter as the package location, package path, and server name to the 
stored procedure Before jumping in to the code let's make sure  the 
XP_Cmdshell extended procedure is turned on you machine 

Use the below mentioned procedure to turn on XP_CMDSHELL

-- To allow advanced options to be changed.
EXEC sp_configure 'show advanced options', 1
GO
-- To update the currently configured value for advanced options.
RECONFIGURE
GO

-- To enable the feature.
EXEC sp_configure 'xp_cmdshell', 1
GO
-- To update the currently configured value for this feature.
RECONFIGURE
GO
 
and here's the procedure

CREATE PROCEDURE SSIS_XP 
(@LOCATION VARCHAR(1000),@PATH VARCHAR(100),@SERVERNAME VARCHAR(40))

AS

DECLARE @CMD VARCHAR(1000)

BEGIN

IF @LOCATION = 'FILESYSTEM'

 IF(@SERVERNAME IS NULL OR PATINDEX('%[a-z]%',@SERVERNAME)= 0 )

 PRINT 'PLEASE PROVIDE VALID SERVER NAME'

 ELSE 

    BEGIN

           SET @CMD = 'DTEXEC /DTS "\FILE SYSTEM\'+@PATH +'" /SERVER "'+ @SERVERNAME 
                        + '" /CHECKPOINTING OFF  /REPORTING V'

    IF @LOCATION = 'MSDB' 

       IF(@SERVERNAME IS NULL OR PATINDEX('%[A-Z]%',@SERVERNAME)= 0 )

 PRINT 'PLEASE PROVIDE VALID SERVER NAME'

 ELSE 

   BEGIN

           SET @CMD = @CMD + 'DTEXEC /DTS "\MSDB'+@PATH +'" /SERVER "'+ @SERVERNAME
                      + '" /CHECKPOINTING OFF  /REPORTING V'

IF @LOCATION = 'FILE'  

 BEGIN

 SET @CMD = @CMD + 'DTEXEC /FILE'+@PATH +'.DTSX" /CHECKPOINTING OFF  /REPORTING V'

END

END

END

ELSE

BEGIN

PRINT 'PLEASE SPECIFY CORRECT PACKAGE LOCATION'

END 

--

EXEC MASTER..XP_CMDSHELL  @CMD

END

how to use
 EXEC SSIS_XP 'FILESYSTEM' ,'PACKAGES\PACKAGE' ,'--servername--' 

or
 EXEC SSIS_XP 'MSDB' ,'SSIS\PACKAGE1' ,'--servername--'
or

EXEC SSIS_XP 'FILE' ,'C:\Programfiles\mydocuments\SSIS\PACKAGE1' ,'--servername--' 


 
We can still have lot parameters like reporting , configurations, variable values 
to make it complicated

 Data driven package execution 

A) CURSOR

 Create a table 

CREATE TABLE [dbo].[PACKAGES](
	[PATH] [varchar](100) NULL,
	[LOCATION] [varchar](1000) NULL,
	[SERVERNAME] [varchar](40) NULL
) ON [PRIMARY]

Insert the records as required by the packages
crate a cursor to loop through the table for packages
DECLARE @PATH VARCHAR(1000)

DECLARE @LOCATION VARCHAR(50)

DECLARE @SERVERNAME VARCHAR(40) 

DECLARE DATA_PACKAGE CURSOR

FOR SELECT [PATH],LOCATION,SERVERNAME  FROM DBO.PACKAGES

OPEN DATA_PACKAGE

 FETCH NEXT FROM DATA_PACKAGE 

 INTO @PATH,@LOCATION,@SERVERNAME

WHILE @@FETCH_STATUS = 0 

BEGIN

 EXEC SSIS_XP @LOCATION,@PATH,@SERVERNAME

 FETCH NEXT FROM DATA_PACKAGE 

  INTO @PATH,@LOCATION,@SERVERNAME

END

 CLOSE DATA_PACKAGE

 DEALLOCATE DATA_PACKAGE

B) we can create a simple package to do above cursor action in SSIS if you don't 
like cursor

 step1) drag and drop a data flow task in control flow


and a OLEDB source and OLEDB command in data flow

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Map the parameters to the procedure
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
any questions please comment below
 
 

How to load a variable value into a Execute SQL Task

Lets take a look at  a simple example of how to load a variable value into execute sql task in the control flow

We  need to update a table called Employee in the database according to the value supplied to the variable

so lets create  a  variable called ID  , Data type Int32

lets drag and drop a execute sql task into control flow and make the necessary connections

Query we used in the sql task is a simple  update query, in order to map the ID value of the variable to the id value in the where clause of the update query we use a question

once we written the query now we need to do the parameter mapping for the variable

To the left side of the Execute SQL Task there is a option called parameter mapping once we click it it will show the mapping list which will be empty

Click the add button

Browse the available parameters in the parameter name  and select the variable  User:ID

and now lets execute the package

lets see the result

That’s it! for today Thank you

Variables in Integration Services

let’s  take a look at the simple definition of variables in SSIS

Variables store values which can be used  to update tasks in control flow , data flow and in event handlers,   dynamically during the run time of the package

Types of Variables

1) System variables (system defined)

2) User defined Variables

How they are accessed(scope)  :  system variables can be accessed from any task(if it allows) in the package and these can only be read

User Defined Variables :    are created by the users  and data types include    string, int16, int32, int64, double,  dbnull, object, single,  Boolean, char, Uint32, Uint64, datetime, sbyte

how they accessed in the package is kind of different from system defined variables and now we can start using the word called scope

if a variable is created in data flow  (scope)  it can not be accessed from control flow

so lets consider different scopes here

Package level :  a variable created with out highlighting any of the task in  control flow

Task level : a variable created with highlighting any of the tasks in control flow

Data flow :  variable  created in data flow by selecting or not selecting any tasks are transformations in data flow

Important : variables are case sensitive

a variable as Var1 is not same as var1