System Functions – $partition

Returns the partition number into which a set of partitioning column values would be mapped for any specified partition function.


[ database_name. ] $PARTITION.partition_function_name(expression)

It returns the partition number that a given column value maps to under a partition function.

This is useful when you want to find the rows that fall in a particular partition.

So let's explain this with a simple example.


Create a partition function:
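The original code sample did not survive here; below is a minimal sketch of a partition function and a matching partition scheme. The names `pf_demo`/`ps_demo` and the boundary values are assumptions for illustration only.

```sql
-- Assumed example: a RANGE LEFT partition function on an int column,
-- creating four partitions with boundaries 100, 200, 300
CREATE PARTITION FUNCTION pf_demo (int)
AS RANGE LEFT FOR VALUES (100, 200, 300);

-- A partition scheme mapping all partitions to the PRIMARY filegroup
CREATE PARTITION SCHEME ps_demo
AS PARTITION pf_demo ALL TO ([PRIMARY]);
```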

Now let's create a partitioned table:
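Again the original code is missing; a sketch, assuming the partition scheme is named `ps_demo` and partitioning on an `[id]` column (both illustrative names):

```sql
-- Creating the table ON the partition scheme places each row in a
-- partition according to the value of the partitioning column [id]
CREATE TABLE dbo.PartitionDemo
(
    id   int         NOT NULL,
    name varchar(50) NULL
) ON ps_demo (id);
```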






Insert some sample data:
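A sketch, with id values chosen (an assumption) so that each row lands in a different partition of the example boundaries 100, 200, 300:

```sql
INSERT INTO dbo.PartitionDemo (id, name)
VALUES (50,  'row in partition 1'),
       (150, 'row in partition 2'),
       (250, 'row in partition 3'),
       (350, 'row in partition 4');
```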

Now let's see some examples of the $PARTITION function.
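Assuming the illustrative names above (a partition function `pf_demo` with RANGE LEFT boundaries 100, 200, 300, and a table `dbo.PartitionDemo` partitioned on `[id]`), typical $PARTITION queries look like this:

```sql
-- Which partition would the value 150 map to?
-- With the assumed RANGE LEFT boundaries this returns 2
SELECT $PARTITION.pf_demo(150) AS partition_number;

-- Count the rows stored in each partition of the table
SELECT $PARTITION.pf_demo(id) AS partition_number,
       COUNT(*)               AS row_count
FROM dbo.PartitionDemo
GROUP BY $PARTITION.pf_demo(id);

-- Return only the rows stored in partition 2
SELECT *
FROM dbo.PartitionDemo
WHERE $PARTITION.pf_demo(id) = 2;
```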


How to use Import Column and Export Column in SSIS 2008/2012


In this demo we will learn about the Import Column and Export Column transformations in SSIS.

Import Column :

MSDN says:

The Import Column transformation reads data from files and adds the data to columns in a data flow. Using this transformation, a package can add text and images stored in separate files to a data flow

The data type of the output column must be DT_TEXT, DT_NTEXT, or DT_IMAGE.

So the Import Column transformation reads any file from the file system (.jpeg, .doc, .pdf, …), converts it into a text or image stream, and adds it to the data flow as a column.

Export Column Transformation:

The Export Column transformation does exactly the reverse of the Import Column transformation.

The Export Column transformation reads data in a data flow and inserts the data into a file. For example, if the data flow contains product information, such as a picture of each product, you could use the Export Column transformation to save the images to files.

You can configure the Export Column transformation in the following ways:

  • Specify the data columns and the columns that contain the path of files to which to write the data.
  • Specify whether the data-insertion operation appends or truncates existing files.
  • Specify whether a byte-order mark (BOM) is written to the file.

Let's understand both transformations in detail using a simple example.



    /****** Object: Table [dbo].[file] Script Date: 1/25/2012 7:04:55 PM ******/







    CREATE TABLE [dbo].[file](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [location] [varchar](100) NOT NULL,
    [textfile] [varchar](max) NULL
    )




    So first we will use a Foreach Loop container with a File enumerator to loop over all the files in the source folder and insert the file locations into the dbo.[file] table.

    So let's execute the package so far and see the results.

    Result of the Import Column transformation:

    Now let's export the text-file column to a file location using the Export Column transformation.

    Drag and drop a Data Flow Task and add an OLE DB source, the same as for Import Column, with the table set to dbo.[file].

    Now that we have configured the Export Column transformation, let's run the package.

    Before running, the destination looks like this:

    The above post is also explained in this video.


This wait occurs while a task is waiting for I/O operations to complete that are related to non-data-page I/Os.
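You can check how much of this wait has accumulated on an instance by querying sys.dm_os_wait_stats. The description above matches the IO_COMPLETION wait type; treating that as the wait type in question is an assumption here.

```sql
-- Accumulated wait time for non-data-page I/O waits
-- (IO_COMPLETION is assumed to be the wait type being discussed)
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'IO_COMPLETION';
```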


There are several reasons why this wait type can accumulate over time.

Now that we have a wait time, we need to investigate why it is happening. To do that, we need to go through the PerfMon physical disk counters, because most of this wait type accumulates due to an improperly designed I/O subsystem or overall system design.

Disk Performance Counters

The Disk Performance counters help you to evaluate the performance of the disk subsystem. The disk subsystem is more than the disk itself: it includes the disk controller card, the I/O bus of the system, and the disk. When measuring disk performance, it is usually better to have a good baseline for performance than simply to try to evaluate disk performance on a case-by-case basis.

There are two objects for the disk: PhysicalDisk and LogicalDisk. The counters for the two are identical; however, in some cases they may lead to slightly different conclusions. The PhysicalDisk object is used for the analysis of the overall disk, regardless of the partitions that may be on the disk. When evaluating overall disk performance, this would be the one to select. The LogicalDisk object analyzes information for a single partition, so its values are isolated to activity occurring on that partition and are not necessarily representative of the entire load the disk carries. The LogicalDisk object is useful primarily when looking at the effects of a particular application, like SQL Server, on disk performance. Again, PhysicalDisk is primarily for looking at the performance of the entire disk subsystem. In the list that follows, the favored object is indicated with the counter. When the LogicalDisk and PhysicalDisk objects are especially different, the counter is listed twice and the difference specifically mentioned.

  • PhysicalDisk : Current Disk Queue Length. This counter provides a primary measure of disk congestion. Just as the processor queue was an indication of waiting threads, the disk queue is an indication of the number of transactions that are waiting to be processed. Recall that the queue is an important measure for services that operate on a transaction basis. Just like the line at the supermarket, the queue will be representative of not only the number of transactions, but also the length and frequency of each transaction.
  • PhysicalDisk : % Disk Time. Much like % Processor time, this counter is a general mark of how busy the disk is. You will see many similarities between the disk and processor since they are both transaction-based services. This counter indicates a disk problem, but must be observed in conjunction with the Current Disk Queue Length counter to be truly informative. Recall also that the disk could be a bottleneck prior to the % Disk Time reaching 100%.
  • PhysicalDisk : Avg. Disk Queue Length. This counter is actually strongly related to the %Disk Time counter. This counter converts the %Disk Time to a decimal value and displays it. This counter will be needed in times when the disk configuration employs multiple controllers for multiple physical disks. In these cases, the overall performance of the disk I/O system, which consists of two controllers, could exceed that of an individual disk. Thus, if you were looking at the %Disk Time counter, you would only see a value of 100%, which wouldn’t represent the total potential of the entire system, but only that it had reached the potential of a single disk on a single controller. The real value may be 120% which the Avg. Disk Queue Length counter would display as 1.2.
  • PhysicalDisk : Disk Reads/sec. This counter is used to compare to the Memory: Page Inputs/sec counter. You need to compare the two counters to determine how much of the Disk Reads are actually attributed to satisfying page faults.
  • LogicalDisk : Disk Reads/sec. When observing an individual application (rather a partition) this counter will be an indication of how often the applications on the partition are reading from the disk. This will provide you with a more exact measure of the contribution of the various processes on the partition that are affecting the disk.
  • PhysicalDisk : Disk Reads Bytes/sec. Primarily, you’ll use this counter to describe the performance of disk throughput for the disk subsystem. Remember that you are generally measuring the capability of the entire disk hardware subsystem to respond to requests for information.
  • LogicalDisk : Disk Reads Bytes/sec. For the partition, this will be an indication of the rate that data is being transferred. This will be an indication of what type of activity the partition is experiencing. A smaller value will indicate more random reads of smaller sections.
  • PhysicalDisk : Avg. Disk Bytes/Read. This counter is used primarily to let you know the average number of bytes transferred per read of the disk system. This helps distinguish between random reads of the disk and the more efficient sequential file reads. A smaller value generally indicates random reads. The value for this counter can also be an indicator of file fragmentation.
  • PhysicalDisk : Avg. Disk sec/Read. The value for this counter is generally the number of seconds it takes to do each read. On less-complex disk subsystems involving controllers that do not have intelligent management of the I/O, this value is a multiple of the disk’s rotation per minute. This does not negate the rule that the entire system is being observed. The rotational speed of the hard drive will be the predominant factor in the value with the delays imposed by the controller card and support bus system.
  • PhysicalDisk: Disk Reads/sec. The value for this counter is the number of reads that the disk was able to accomplish per second. Changes in this value indicate the amount of random access to the disk. The disk is a mechanical device that is capable of only so much activity. When files are closer together, the disk is permitted to get to the files quicker than if the files are spread throughout the disk. In addition, disk fragmentation can contribute to an increased value here.

Physical disk counters in the context of SQL Server:

  • PhysicalDisk\Avg. Disk Queue Length

Threshold: Should not be higher than the number of spindles plus two.

Significance: This counter indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

  • PhysicalDisk\Avg. Disk Read Queue Length

Threshold: Should be less than two.

Significance: This counter indicates the average number of read requests that were queued for the selected disk during the sample interval.

  • PhysicalDisk\Avg. Disk Write Queue Length

Threshold: Should be less than two.

Significance: This counter indicates the average number of write requests that were queued for the selected disk during the sample interval.

  • PhysicalDisk\Avg. Disk sec/Read

Threshold: No specific value.

Less than 10 ms – very good

Between 10-20 ms – okay

Between 20-50 ms – slow, needs attention

Greater than 50 ms – Serious I/O bottleneck

Significance: This counter indicates the average time, in seconds, of a read of data from the disk.

  • PhysicalDisk\Avg. Disk sec/Transfer

Threshold: Should not be more than 18 milliseconds.

Significance: This counter indicates the time, in seconds, of the average disk transfer. This may indicate a large amount of disk fragmentation, slow disks, or disk failures. Multiply the values of the Physical Disk\Avg. Disk sec/Transfer and Memory\Pages/sec counters. If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, so you need more RAM.

  • PhysicalDisk\Disk Writes/sec

Threshold: Depends on manufacturer’s specification.

Significance: This counter indicates the rate of write operations on the disk.

  • PhysicalDisk\% Disk Time

Threshold: Greater than 50 percent represents an I/O bottleneck.

Significance: Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.

  • PhysicalDisk\Disk Reads/sec and PhysicalDisk\Disk Writes/sec

Threshold: Should be less than 85% of the disk capacity.

Significance: These represent the rates of read and write operations on the disk.

When using above counters, you may need to adjust the values for RAID configurations using the following formulas.

Raid 0 — I/Os per disk = (reads + writes) / number of disks

Raid 1 — I/Os per disk = [reads + (2 * writes)] / 2

Raid 5 — I/Os per disk = [reads + (4 * writes)] / number of disks

Raid 10 — I/Os per disk = [reads + (2 * writes)] / number of disks

For example, you have a RAID-1 system with two physical disks with the following values of the counters.

Disk Reads/sec            80

Disk Writes/sec           70

Avg. Disk Queue Length    5

In that case, you are encountering (80 + (2 * 70))/2 = 110 I/Os per disk, and your disk queue length = 5/2 = 2.5, which indicates a borderline I/O bottleneck.
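The RAID-1 arithmetic above can be scripted as a quick sanity check; this is a throwaway T-SQL sketch using the sample counter values from the text:

```sql
DECLARE @reads  float = 80,   -- Disk Reads/sec from PerfMon
        @writes float = 70,   -- Disk Writes/sec from PerfMon
        @queue  float = 5,    -- Avg. Disk Queue Length
        @disks  float = 2;    -- physical disks in the RAID-1 set

-- RAID 1: I/Os per disk = (reads + 2 * writes) / 2
SELECT (@reads + 2 * @writes) / 2 AS ios_per_disk,   -- 110
       @queue / @disks            AS queue_per_disk; -- 2.5
```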

Disk I/O issues can be minimized by having larger, sequential I/O activities. To achieve that, you should consider partitioning. Partitioning can be done in many ways: database partitioning, tempdb partitioning, table partitioning, index partitioning. At a minimum, it is highly recommended to partition your database in such a way that the clustered indexes, non-clustered indexes, and transaction logs are each on a separate physical drive, spread across multiple logical filegroups/files within that drive.

Physical Disk: % Disk Time

This counter measures how busy a physical array is (not a logical partition or individual disks in an array). It provides a good relative measure of how busy your arrays are.
As a rule of thumb, the % Disk Time counter should run less than 55%. If this counter exceeds 55% for continuous periods (over 10 minutes or so during your 24 hour monitoring period), then your SQL Server may be experiencing an I/O bottleneck. If you see this behavior only occasionally in your 24 hour monitoring period, I wouldn’t worry too much, but if it happens often (say, several times an hour), then I would start looking into finding ways to increase the I/O performance on the server, or to reduce the load on the server. Some ways to boost disk I/O include adding drives to an array (if you can), getting faster drives, adding cache memory to the controller card (if you can), using a different version of RAID, or getting a faster controller.

Refer :

HBA queue depth

perfmon counters

predeployment best practices

Introduction to wait stats, part 2: SOS_SCHEDULER_YIELD

In this post you will learn:

What multitasking is, and what preemptive and non-preemptive scheduling are

How to know whether your CPU is under pressure (and why)

In the above posts we learned how to understand waits and how to monitor them, especially CXPACKET, which comes into the picture when achieving parallelism.

I have been researching the specific wait type SOS_SCHEDULER_YIELD, and I have some interesting things to share, along with views from well-known MVPs and SQL gurus.

Before going into details we need to understand the concept of multitasking; please use the link below to understand multitasking in detail.

I'll try to highlight the main points from that post.


What is Multitasking ?

On a single-processor multitasking system, multiple processes don’t actually run at the same time since there’s only one processor. Instead, the processor switches among the processes that are active at any given time. Because computers are so fast compared with people, however, it appears to the user as though the computer is executing all of the tasks at once. Multitasking also allows the computer to make good use of the time it would otherwise spend waiting for I/O devices and user input–that time can be used for some other task that doesn’t need I/O at the moment.

So the CPU switches its time between the multiple processes so fast that, to the end user, it looks like the processor is multitasking.

In order to switch between processes or tasks there must be a scheduler to manage the switches. The way the scheduler stores the context of a process or thread into registers, L1/L2 cache, or memory is called context switching.

There are two major types of scheduling algorithms: preemptive and non-preemptive.

Preemptive: non-cooperative mode. If the CPU time (quantum) allocated to a particular thread is exceeded, or if a higher-priority task needs to be executed, the OS interrupts the running process and executes the higher-priority task, maintaining a queue of suspended or runnable tasks.

Non-preemptive: cooperative mode. Here the OS does not interrupt the currently running process; instead, it waits until the process voluntarily yields.

Now we will learn how these scheduling modes affect SQL Server performance.

SQLOS (the layer between the OS and SQL Server) uses non-preemptive scheduling. As we learned, the active thread voluntarily yields to the next runnable thread, and SQLOS is designed so that the active thread will not starve other runnable threads. SQLOS decides when the active thread should yield, and during this process it records SOS_SCHEDULER_YIELD as the last wait type in the DMVs.

Most operating systems use preemptive scheduling, but all scheduling done by SQLOS is non-preemptive. However, a thread or process can switch from non-preemptive to preemptive mode, e.g. when using xp_cmdshell to run batch commands, accessing files and directories, calling extended stored procedures, or querying linked servers. When these occur, the time is recorded in the DMV sys.dm_os_wait_stats.

All the preemptive wait stats are recorded in the format PREEMPTIVE_<category>_<function>; in the above case, PREEMPTIVE_OS_%.

There are 190 preemptive wait types that SQLOS records in the DMV.
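To see which preemptive waits an instance has actually accumulated, you can filter sys.dm_os_wait_stats by the PREEMPTIVE prefix:

```sql
-- List the preemptive wait types recorded on this instance,
-- highest accumulated wait time first
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PREEMPTIVE%'
ORDER BY wait_time_ms DESC;
```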


For example, consider the Windows API function WaitForSingleObject, which waits until the specified object is in the signaled state or the time-out interval elapses.

The WaitForSingleObject function checks the current state of the specified object. If the object’s state is nonsignaled, the calling thread enters the wait state until the object is signaled or the time-out interval elapses.

The function modifies the state of some types of synchronization objects. Modification occurs only for the object whose signaled state caused the function to return. For example, the count of a semaphore object is decreased by one.

The WaitForSingleObject function can wait for the following objects:

  • Change notification
  • Console input
  • Event
  • Memory resource notification
  • Mutex
  • Process
  • Semaphore
  • Thread
  • Waitable timer

In our case it could be any of these: the OS may preempt the thread while waiting on any of the above objects.

The wait types below are documented in MSDN:

PREEMPTIVE_ABR Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
PREEMPTIVE_AUDIT_ACCESS_EVENTLOG Occurs when the SQL Server Operating System (SQLOS) scheduler switches to preemptive mode to write an audit event to the Windows event log.
PREEMPTIVE_AUDIT_ACCESS_SECLOG Occurs when the SQLOS scheduler switches to preemptive mode to write an audit event to the Windows Security log.
PREEMPTIVE_CLOSEBACKUPMEDIA Occurs when the SQLOS scheduler switches to preemptive mode to close backup media.
PREEMPTIVE_CLOSEBACKUPTAPE Occurs when the SQLOS scheduler switches to preemptive mode to close a tape backup device.
PREEMPTIVE_CLOSEBACKUPVDIDEVICE Occurs when the SQLOS scheduler switches to preemptive mode to close a virtual backup device.
PREEMPTIVE_CLUSAPI_CLUSTERRESOURCECONTROL Occurs when the SQLOS scheduler switches to preemptive mode to perform Windows failover cluster operations.
PREEMPTIVE_COM_COCREATEINSTANCE Occurs when the SQLOS scheduler switches to preemptive mode to create a COM object.

Please use the links below to better understand waits.

Once you have an idea of what switching is happening in SQL Server, the next step is to find out whether the CPU is under pressure.

The best DMV for this information is sys.dm_os_schedulers.

Schedulers basically schedule and manage tasks across the processors: in this DMV there is one scheduler per processor. Not all schedulers are available for user query processing; some are used internally by the database engine, or may be disabled for SQL Server's use via affinity masks. Each scheduler has a number of active workers, which take care of using their quantum for processing.

So each scheduler exposes the number of tasks waiting for the scheduler (CPU) in runnable_tasks_count,

and each scheduler has a queue of work waiting for a worker thread to pick up in work_queue_count.

runnable_tasks_count should not reach double digits, and on average it should be less than one.

Once you have found that there is CPU pressure, you have to investigate why.

The best way is to look at the execution plans of the top queries by total_worker_time in sys.dm_exec_query_stats.

Worker time is the amount of time spent executing; combined with the execution count, it gives a clear picture of which queries use the most CPU, resulting in PREEMPTIVE_OS_% waits and non-preemptive SOS_SCHEDULER_YIELD waits.

SELECT TOP 10 st.text,
       qp.query_plan,
       qs.total_worker_time,
       qs.execution_count
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC

SELECT scheduler_id, current_tasks_count, runnable_tasks_count, work_queue_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

(Source: MSDN and Slava Oks's blog)


  1. Q. Do I need to buy more CPUs?

To answer this question you have to find out whether your load is really CPU bound. Your load is really CPU bound if the number of runnable tasks per scheduler is always greater than 1 and all of your queries have correct plans. The latter point is very important: your load can be CPU bound because the optimizer somehow generated a bad plan, which can happen if your statistics are out of date or you tried to perform hand-crafted optimization. In that case you don't want to run to Circuit City to buy more CPUs right away; you want to fix the plan. Here is the query to find the average length of the runnable queue on the system:


SELECT AVG (runnable_tasks_count)
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'





Buying more CPUs also has to do with capacity planning. You have to be very careful when performing capacity planning on hardware with HT enabled: remember, you don't have extra physical CPUs. Keep in mind that if your load runs at 60% CPU utilization, it doesn't mean that you have 40% of extra CPU capacity. You will be very surprised how fast CPU load jumps from 60% to 80%, and then even faster to 100%, once you apply more and more load.

  2. Q. What is the affinity of my schedulers to CPUs?



      SELECT scheduler_id,
      CAST (cpu_id as varbinary) AS scheduler_affinity_mask
      FROM sys.dm_os_schedulers



  3. Q. Does my machine have either hard or soft NUMA configuration enabled?


      SELECT CASE count( DISTINCT parent_node_id)
      WHEN 1 THEN 'NUMA disabled'
      ELSE 'NUMA enabled'
      END
      FROM sys.dm_os_schedulers
      WHERE parent_node_id <> 32

  4. Q. Should I configure SQL Server to use more threads (sp_configure 'max worker threads')?

You can answer this question by looking at the work queue length for each scheduler. If on average this value is above 1, then you might benefit from adding more threads to the system, but only if:

A. Your load is not currently CPU bound (see above for how to find out whether your load is CPU bound).

B. Your load doesn't currently experience any other heavy waits (if you add more threads in this case, they will just end up waiting like everyone else).


SELECT AVG (work_queue_count)
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'





  5. Q. Is my system I/O bound?

You can answer this question by monitoring the length of the I/O queues.
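The original query did not survive here; one way to measure per-scheduler I/O queues (a sketch) is the pending_disk_io_count column of sys.dm_os_schedulers:

```sql
-- Pending I/O requests per scheduler; sample this over time.
-- Growing or persistently high numbers suggest an I/O-bound load.
SELECT scheduler_id, pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```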





If over time they keep growing, or you see periodic jumps, or the numbers stay relatively high, your system is most likely I/O bound. To identify the cause you will have to dive deeper.

Preemptive Vs Non Preemptive and Multitasking vs Multithreading

 What Is Multitasking?

Multitasking is the ability of a computer to run more than one program, or task, at the same time. Multitasking contrasts with single-tasking, where one process must entirely finish before another can begin. MS-DOS is primarily a single-tasking environment, while Windows 3.1 and Windows NT are both multitasking environments.

On a single-processor multitasking system, multiple processes don’t actually run at the same time since there’s only one processor. Instead, the processor switches among the processes that are active at any given time. Because computers are so fast compared with people, however, it appears to the user as though the computer is executing all of the tasks at once. Multitasking also allows the computer to make good use of the time it would otherwise spend waiting for I/O devices and user input–that time can be used for some other task that doesn’t need I/O at the moment.

Multitasking on a multiple-processor system still involves the processors switching between tasks because there are almost always more tasks to run than there are processors. Note, however, that there can be as many tasks running simultaneously as there are processors in the system. For the moment, we’ll discuss multitasking on a single-processor system.

Review: Preemptive and Non-Preemptive Multitasking

Within the category of multitasking, there are two major sub-categories: preemptive and non-preemptive (or cooperative). In non-preemptive multitasking, use of the processor is never taken from a task; rather, a task must voluntarily yield control of the processor before any other task can run. Windows 3.1 uses non-preemptive multitasking for Windows applications.

Programs running under a non-preemptive operating system must be specially written to cooperate in multitasking by yielding control of the processor at frequent intervals. Programs that do not yield sufficiently often cause non-preemptive systems to stay “locked” in that program until it does yield. An example of failed non-preemptive multitasking is the inability to do anything else while printing a document in Microsoft Word for Windows 2.0a. This happens because Word does not give up control of the processor often enough while printing your document. The worst case of a program not yielding is when a program crashes. Sometimes, programs which crash in Windows 3.1 will crash the whole system simply because no other tasks can run until the crashed program yields.

Preemptive multitasking differs from non-preemptive multitasking in that the operating system can take control of the processor without the task's cooperation. (A task can also give it up voluntarily, as in non-preemptive multitasking.) The process of a task having control taken from it is called preemption. Windows NT uses preemptive multitasking for all processes except 16-bit Windows 3.1 programs. As a result, a Windows NT application cannot take over the processor in the same way that a Windows 3.1 application can.

A preemptive operating system takes control of the processor from a task in two ways:

  • When a task’s time quantum (or time slice) runs out. Any given task is only given control for a set amount of time before the operating system interrupts it and schedules another task to run.
  • When a task that has higher priority becomes ready to run. The currently running task loses control of the processor when a task with higher priority is ready to run regardless of whether it has time left in its quantum or not.


Switching Among Tasks

At any given time, a processor (CPU) is executing in a specific context. This context is made up of the contents of its registers and the memory (including stack, data, and code) that it is addressing. When the processor needs to switch to a different task, it must save its current context (so it can later restore the context and continue execution where it left off) and switch to the context of the new task. This process is called context switching.

When Windows NT switches tasks, it saves all of the registers of the task it’s leaving and re-loads the registers of the task to which it’s switching. This process also enables the proper address space for the task to which Windows NT is switching.

Processes and Threads

In addition to being a preemptive multitasking operating system, Windows NT is also multithreaded, meaning that more than one thread of execution (or thread) can execute in a single task at once.

A process comprises:

  • A private memory address space in which the process’s code and data are stored.
  • An access token against which Windows NT makes security checks.
  • System resources such as files and windows (represented as object handles).
  • At least one thread to execute the code.

A thread comprises:

  • A processor state including the current instruction pointer.
  • A stack for use when running in user mode.
  • A stack for use when running in kernel mode.

Since processes (not threads) own the access token, system resource handles, and address space, threads do NOT have their own address spaces nor do they have their own access token or system resource handles. Therefore, all of the threads in a process SHARE the same memory, access token, and system resources (including quota limits) on a “per-process” rather than a “per-thread” basis. In a multithreaded program, the programmer is responsible for making sure that the different threads don’t interfere with each other by using these shared resources in a way that conflicts with another thread’s use of the same resource. (As you might suspect, this can get a little tricky.)

Why Use Multithreading?

Multithreading provides a way to have more than one thread executing in the same process while allowing every thread access to the same memory address space. This allows very fast communication among threads. Threads are also easier to create than processes since they don’t require a separate address space.

Inside Windows NT, processes and threads are represented as objects that are created, maintained, and destroyed by the Process Manager. These Process Manager process and thread objects contain simpler kernel process and thread objects.

Some typical examples of the use of multiple threads are using a background thread to print a document in a word processor and to recalculate a spreadsheet. When a new thread is created to do these tasks, the main thread can continue responding to user input. A single-threaded application can’t respond to user input until it’s done printing or recalculating or whatever.

On a uniprocessor platform, the use of multiple threads allows a user to continue using a program even while another thread is doing some lengthy procedure. But only one thread executes at a time.

On a multiprocessor platform, more than one processor may be running different threads in the same process. This has the potential for very significantly speeding up the execution of your program.

Sharing A Single Address Space–Synchronizing Access To Data

Running each process in its own address space has the advantage of reliability, since no process can modify another process's memory. However, all of a process's threads run in the same address space and have unrestricted access to all of the same resources, including memory. While this makes it easy to share data among threads, it also makes it easy for threads to step on each other. As mentioned before, multithreaded programs must be specially programmed to ensure that threads don't step on each other.

A section of code that modifies data structures shared by multiple threads is called a critical section. It is important that when a critical section is running in one thread, no other thread be able to access that data structure. Synchronization is necessary to ensure that only one thread can execute in a critical section at a time. This synchronization is accomplished through the use of some type of Windows NT synchronization object. Programs use Windows NT synchronization objects rather than writing their own synchronization both to save coding effort and for efficiency: when you wait on a Windows NT synchronization object, you do NOT use any CPU time testing the object to see when it's ready.

Windows NT provides a variety of different types of synchronization objects that programs can use to coordinate threads’ access to shared data structures. Synchronization objects remember their states and can be set and tested in one uninterruptable step. They also cause the thread to be suspended while waiting on an object and to automatically restart when the other thread signals that it’s done.

During the initialization of a program, the program creates a synchronization object for each data structure or object that will be shared among threads.

EVERY critical section will have the following structure:

  1. Wait on the synchronization object before accessing the data structure. The Windows NT waiting API ensures that your thread is suspended until the synchronization object becomes unlocked. As soon as the synchronization object becomes unlocked, Windows NT sets the synchronization object to “locked” and restarts your thread.
  2. Access the data structure. (This is the critical section.)
  3. Unlock the synchronization object so that the data can be accessed by other threads.

The first step is critical because if it’s omitted, any thread can access the data structure while you’re accessing it. The last step is also critical: if it’s omitted, no thread will be able to access the data even after you’re done.

Using this technique on every critical section ensures that only one thread can access the data at a time.
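The same wait/access/unlock pattern appears in SQL Server itself through application locks. As a minimal sketch (the resource name and the dbo.Counter table are hypothetical), a T-SQL critical section might look like:

```sql
BEGIN TRAN;

-- Step 1: wait on the synchronization object (blocks until the lock is free)
EXEC sp_getapplock @Resource = 'SharedCounter', @LockMode = 'Exclusive';

-- Step 2: the critical section -- only one session at a time runs this
UPDATE dbo.Counter SET Value = Value + 1;

-- Step 3: unlock so other sessions can proceed
EXEC sp_releaseapplock @Resource = 'SharedCounter';

COMMIT;
```

By default sp_getapplock takes the lock on behalf of the current transaction, so even if step 3 were skipped, the lock would be released at COMMIT or ROLLBACK.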

The Life Cycle Of A Thread

Each thread has a dispatcher state that changes throughout its lifetime.

The most important dispatcher states are:

  • Running: only one thread per processor can be running at any time.
  • Ready: threads that are in the Ready state may be scheduled for execution the next time the kernel dispatches a thread. Which Ready thread executes is determined by their priorities.
  • Waiting: threads that are waiting for some event to occur before they become Ready are said to be waiting. Examples of events include waiting for I/O, waiting for a message, and waiting for a synchronization object to become unlocked.

The Kernel’s Dispatcher

The kernel’s dispatcher performs scheduling and context switching.

Thread scheduling is the act of determining which thread runs on each processor at a given time.

Context switching is the act of saving one thread’s volatile state (CPU register contents) and restoring another thread’s state so it can continue running where it previously left off.

How Thread Priorities Affect Scheduling

The kernel’s dispatcher schedules threads to run based on a 32-level priority scheme. Windows NT guarantees that the ready threads with the highest priority will be running at any given time. (That’s one thread on a single-processor system.) Threads with a priority of 31 will be run before any others, while threads with a priority of 0 will run only if no other threads are ready. The range of priorities is divided in half, with the upper 16 reserved for real-time threads and the lower 16 reserved for variable priority threads.

Real-time threads run at the same priority for their entire lifetime. They are commonly used to monitor or control systems that require action to be taken at very precise intervals. These threads run at higher priorities than all variable priority threads, which means that they must be used sparingly.

Variable priority threads are assigned a base priority when they are created. (A thread’s base priority is determined by the process to which the thread belongs.) The priority of such threads can be adjusted dynamically by the kernel’s dispatcher. A thread’s dynamic priority can vary up to two priority levels above or below its base priority.

The dispatcher maintains a priority queue of Ready threads. When prompted to reschedule, it changes the state of the highest priority thread to Standby. When the conditions are right, a context switch is performed to begin the thread’s execution, and the thread enters the Running state.

Lower priority threads will always be preempted when a higher priority thread enters the Ready state. This is true even if the lower priority thread has time remaining in its quantum, or if the lower priority thread is running on a different processor.

Performance Tuning

In order to get the computer system to perform as users expect, Windows NT changes the priorities of threads over time.

Each process has a base priority. Threads in a process can alter their base priority by up to two levels up or down.

Depending on the type of work the thread is doing, Windows NT may also adjust the thread’s dynamic priority upwards from its base priority. For instance:

  • Threads that are waiting for input get a priority boost, as do threads in the foreground process. This makes the system responsive to the user.
  • Threads get a priority boost after completing a voluntary wait.
  • All threads periodically get a priority boost to prevent lower priority threads from holding locks on shared resources that are needed by higher priority threads.
  • Compute-bound threads get their priorities lowered.

Scheduling On Multiprocessor Systems

A multiprocessing operating system is one that can run on computer systems that contain more than one processor. Windows NT is a symmetric multiprocessing (SMP) system, meaning that it assumes that all of the processors are equal and that they all have access to the same physical memory. Therefore, Windows NT can run any thread on any available processor regardless of what process, user or Executive, owns the thread.

There are also asymmetric multiprocessing (ASMP) systems in which processors differ from each other: they may address different physical memory spaces, or they may have other differences. These operating systems run certain processes only on certain processors; for instance, the kernel might always execute on a particular processor.

The design of Windows NT supports processor affinity, whereby a process or thread can specify that it is to run on a particular set of processors, but this facility isn’t supported in the first release.

Windows NT uses the same rules for scheduling on a multiprocessor system as it does on a single processor system, so at any given time the threads that are ready and have the highest priorities are actually running.

Understanding SQL server Memory Grant and Memory Management

SQL Server 2005 and SQL Server 2008 R2

Query memory grant (a.k.a. query work buffer) is a part of server memory used to store temporary row data while sorting and joining rows. It is called a “grant” because the server requires queries to reserve this memory before actually using it. This reservation improves query reliability under server load, because a query with reserved memory is less likely to hit out-of-memory while running, and the server prevents one query from dominating the entire server memory. In the following sections, I will describe how SQL Server estimates query memory requirements, and how it throttles memory grants when multiple queries are competing.

When SQL Server receives a user query, it follows well-defined steps to produce a result for the user. First, it creates a compiled plan, which is a set of logical instructions such as how to join rows. Next, it creates an execution plan based on the compiled plan. This execution plan contains instructions with all logical references translated to actual objects, and tracking mechanisms for query execution. Finally, the server starts execution from the top of the instruction tree. Creating a compiled plan is expensive because the server needs to find the optimal plan out of hundreds of potential candidates. Distinguishing between compile and execution helps overall server performance because each compiled plan can be cached and shared among multiple execution plans. The memory grant estimate follows the same overall sequence: it has parameters saved in the compiled plan, and a mechanism to calculate the actual grant size at execution time.

Memory consumers
Now that I briefly covered the lifetime of query execution, I would like to show where query memory grant fits in overall server memory usage. As hinted in the previous paragraph, a successful query execution involves 3 major memory consumers: compile, cache, and memory grant.

  • Compile (query optimization): Building and searching for the most optimal plan out of hundreds of candidates typically requires a significant amount of memory. The lifetime of this usage is typically short because the optimizer releases memory as soon as the optimal plan is found. Lack of available memory causes delays in compilation, and potentially inefficient (slow) plans.
  • Cache: Finding the optimal plan is costly in terms of CPU and memory usage. SQL Server tries to store compiled plans in cache for later reuse. The lifetime of this memory usage is long-term. Lack of cache memory causes more unnecessary re-compiles.
  • Memory grant: This memory is used to store temporary rows for sort and hash join. The lifetime of memory grant is the same as the lifetime of query. Lack of available memory grant causes a query to use hard disk, which affects query performance.

SQL Server maintains the balance between these 3 consumers with an internal facility called the “memory broker”. Based on usage and the physical memory available, the memory broker sets the usage limit and tells each component to trim its memory usage if it anticipates a shortage. Generally, a well-behaved server would have about the same contributions from these consumers.

Grant parameters
When SQL server creates a compiled plan, it calculates two memory grant parameters called “required memory” and “additional memory”.

  • Required memory: Minimum memory needed to run sort and hash join. It is called required because a query would not start without this memory available. SQL server uses this memory to create internal data structures to handle sort and hash join.
  • Additional memory: Amount needed to store all temporary rows in memory. This depends on the cardinality estimate (expected number of rows and average row size). This is called additional because a query can survive a lack of such memory by storing part of the temporary rows on hard disk. A query is not guaranteed to get the full amount if the total exceeds the preset limit.

For example, let’s consider a simple query which needs to sort 1 million rows of 10 bytes each in size. The required memory for this query is 512KB because this is the minimum amount SQL Server needs to construct internal data structures to handle one sort. Since it would take 10MB to store all rows, the additional memory would be 10MB (slightly higher when overhead is included). This calculation becomes complicated if the compiled plan has multiple sorts and joins, because SQL Server also considers the lifetime of each operator for more efficient memory usage. You would generally see a smaller estimate than the sum of all sorts and joins. If you are interested in the relative memory usage among operators, you can check out the <MemoryFractions> tag in Showplan XML. The following sections show how these parameters are used when calculating the grant size at runtime.

DOP dependence
If SQL server has more than 1 CPU, it can run a query in parallel mode for improved performance by sharing work among parallel workers. These workers run independent of each other, and use “parallelism operator (a.k.a. exchange)” to transfer processed rows. This parallel mode increases memory usage because each worker needs its own copy of sort or hash join, and the parallelism operator needs buffers for temporary storage of transferred rows. Since DOP N would use N parallel workers, the query would need N times more required memory. On the other hand, the total number of rows to handle (and memory cost to store them) does not change with DOP. This means that the additional memory would stay the same regardless of DOP setting. Starting with SQL 2008, the buffer memory used by parallelism operator is also counted as a required part of memory grant.

Memory grant process
In the previous section, we discussed how parallelism affects the query memory requirement. In this section, we will discuss how SQL Server takes server memory and the number of concurrent queries into consideration. The server needs to consider such dynamic factors to avoid committing memory beyond its physical limit. This is done in 2 distinct steps: first, the server calculates how much memory to grant for a given query; then it uses an internal facility called Resource Semaphore to reserve the actual memory, or to throttle the query if too many queries ask for memory. The following steps show how the request size is determined.

  • The server decides parallelism (DOP) based on the plan and the server state.
  • The server checks whether a memory grant is needed. If not, the server can start the query immediately. For example, a simple serial query without “ORDER BY” or “GROUP BY” may not need a memory grant.
  • The server calculates the memory limit for one query. By default, this is 25% (20% on 32bit SQL 2005) of total query memory (which is set by memory broker as about 90% of server memory). This per-query limit helps to prevent one query from dominating the whole server. This percentage is configurable on SQL 2008.
  • The server calculates the ideal query memory as required*DOP + additional (+ exchange on SQL 2008). This is the amount a query would like to have, based on its cardinality estimate.
  • The server checks if the ideal memory exceeds the per-query limit. If it does, the server reduces the additional memory until the total fits within the limit. This revised size is called the requested memory. The server then asks Resource Semaphore to grant the requested memory.
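Plugging in the earlier 1-million-row sort example at DOP 4 (these are the article’s illustrative numbers, not figures from a real server), the ideal-memory formula works out as:

```sql
DECLARE @required_kb   int = 512,    -- required memory per parallel worker
        @additional_kb int = 10240,  -- memory to hold all temporary rows
        @dop           int = 4;

-- ideal memory = required * DOP + additional
SELECT (@required_kb * @dop) + @additional_kb AS ideal_memory_kb;  -- 12288 KB
```

Note how only the required part scales with DOP; the additional part (the rows themselves) stays constant.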

Resource Semaphore
Resource Semaphore is responsible for satisfying memory grant requests while keeping overall memory grant usages within the server limit.

  • Resource Semaphore allows a query to reserve memory only if there is enough free memory available; otherwise, the requesting query is forced to wait in a queue.
  • When Resource Semaphore receives a new request, it first checks whether any query is already waiting. If it finds one, it puts the new query into the wait queue for fairness, because the wait queue is designed on a first-come-first-served basis (with a small weight to favor small queries).
  • If there is no waiting query, Resource Semaphore checks the available free memory. If it finds enough free memory, the requested memory is granted and the query can start running; otherwise, the new query is put into the wait queue.
  • When an existing query returns its reserved memory, or when enough free memory otherwise becomes available, Resource Semaphore wakes up queries in the wait queue and attempts the grant again.

Debugging memory grant related issues
SQL Server supplies a few dynamic management views (DMVs) to help investigate memory grant related issues. Please refer to BOL (Books Online) for detailed information on the DMVs discussed below.

sys.dm_exec_query_resource_semaphores

This DMV shows the current status of the Resource Semaphore mentioned in the previous section. On SQL 2005, you will find 2 of them. The one with a non-null max_target_memory_kb column is called the Regular Resource Semaphore, and the other is called the Small Resource Semaphore. As its name implies, the Regular Resource Semaphore is used by all queries under normal conditions, while the Small Resource Semaphore is used by small queries (less than 5 MB) when they have to wait. This improves the response time of small queries that are expected to finish very fast. Note that the max_target_memory_kb column shows the server memory limit used when calculating the per-query limit described in the previous section.

sys.dm_exec_query_memory_grants

This DMV shows all queries consuming memory grants, including those waiting in the Resource Semaphore queue. Waiting queries have a null grant_time column. Resource Semaphore uses an internal query cost estimate to prioritize memory grants, and the is_next_candidate column shows which query will be woken up when memory becomes available.

sys.dm_os_wait_stats

This DMV shows wait statistics for all server objects. Memory grants use the “RESOURCE_SEMAPHORE” wait type. If you see significant waits of this type, you may have an issue with big queries.
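For example, the following query checks the accumulated waits of that type:

```sql
-- How often, and for how long, queries have waited on memory grants
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'RESOURCE_SEMAPHORE';
```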

Sample queries
The following sample queries show how the memory grant DMVs are used.

Find all queries waiting in the memory queue:

SELECT * FROM sys.dm_exec_query_memory_grants WHERE grant_time IS NULL

Find who uses the most query memory grant:

SELECT mg.granted_memory_kb, mg.session_id, t.text, qp.query_plan
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(mg.plan_handle) AS qp
ORDER BY mg.granted_memory_kb DESC

Search cache for queries with memory grants:

SELECT t.text, cp.objtype,qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
JOIN sys.dm_exec_query_stats AS qs ON cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
WHERE qp.query_plan.exist('declare namespace n="http://schemas.microsoft.com/sqlserver/2004/07/showplan"; //n:MemoryFractions') = 1

Source: MSDN: SQL Server Query Processing Team

@@OPTIONS — Configuration Function

What is @@OPTIONS ?

Returns information about the current SET options.

Why it is used?

SQL Server offers many built-in configuration functions, and one of these, @@OPTIONS, allows you to get the current values that are set for the current session. When each connection is made, default values are established for the connection and remain set unless they are overridden by some other process.

Here is the list of SET options:

Value Configuration Description
1 DISABLE_DEF_CNST_CHK Controls interim or deferred constraint checking.
2 IMPLICIT_TRANSACTIONS For dblib network library connections, controls whether a transaction is started implicitly when a statement is executed. The IMPLICIT_TRANSACTIONS setting has no effect on ODBC or OLEDB connections.
4 CURSOR_CLOSE_ON_COMMIT Controls behavior of cursors after a commit operation has been performed.
8 ANSI_WARNINGS Controls truncation and NULL in aggregate warnings.
16 ANSI_PADDING Controls padding of fixed-length variables.
32 ANSI_NULLS Controls NULL handling when using equality operators.
64 ARITHABORT Terminates a query when an overflow or divide-by-zero error occurs during query execution.
128 ARITHIGNORE Returns NULL when an overflow or divide-by-zero error occurs during a query.
256 QUOTED_IDENTIFIER Differentiates between single and double quotation marks when evaluating an expression.
512 NOCOUNT Turns off the message returned at the end of each statement that states how many rows were affected.
1024 ANSI_NULL_DFLT_ON Alters the session’s behavior to use ANSI compatibility for nullability. New columns defined without explicit nullability are defined to allow nulls.
2048 ANSI_NULL_DFLT_OFF Alters the session’s behavior not to use ANSI compatibility for nullability. New columns defined without explicit nullability do not allow nulls.
4096 CONCAT_NULL_YIELDS_NULL Returns NULL when concatenating a NULL value with a string.
8192 NUMERIC_ROUNDABORT Generates an error when a loss of precision occurs in an expression.
16384 XACT_ABORT Rolls back a transaction if a Transact-SQL statement raises a run-time error.

Source : Books online SQL server “Denali”

Use the user options option to specify global defaults for all users. A list of default query processing options is established for the duration of a user’s work session.

The user options option allows you to change the default values of the SET options (if the server’s default settings are not appropriate).

A user can override these defaults by using the SET statement. You can configure user options dynamically for new logins. After you change the setting of user options, new login sessions use the new setting; current login sessions are not affected.

The bit positions in user options are identical to those in @@OPTIONS. Each connection has its own @@OPTIONS function, which represents the configuration environment. When logging in to Microsoft SQL Server, a user receives a default environment that assigns the current user options value to @@OPTIONS. Executing SET statements for user options affects the corresponding value in the session’s @@OPTIONS function.

All connections created after this setting is changed receive the new value.

Let’s get into the details of how this setting can be changed and how these options affect SQL Server performance.


SELECT @@OPTIONS returns an integer that represents the bit values for each of the options mentioned above.

Use the query below to learn more about the settings for the current session:


IF ( (32 & @@OPTIONS) = 32 ) PRINT 'ANSI_NULLS'
IF ( (512 & @@OPTIONS) = 512 ) PRINT 'NOCOUNT'
IF ( (1024 & @@OPTIONS) = 1024 ) PRINT 'ANSI_NULL_DFLT_ON'
IF ( (2048 & @@OPTIONS) = 2048 ) PRINT 'ANSI_NULL_DFLT_OFF'
IF ( (16384 & @@OPTIONS) = 16384 ) PRINT 'XACT_ABORT'

The integer value returned from @@OPTIONS changes according to which options are turned on or off.
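Extending the idea above, a single query can decode every bit of @@OPTIONS using the value/name mapping from the table earlier (a sketch; requires SQL Server 2008 or later for the VALUES table constructor):

```sql
-- List every SET option that is ON for the current session
SELECT o.name AS enabled_option
FROM (VALUES
        (1,     'DISABLE_DEF_CNST_CHK'),
        (2,     'IMPLICIT_TRANSACTIONS'),
        (4,     'CURSOR_CLOSE_ON_COMMIT'),
        (8,     'ANSI_WARNINGS'),
        (16,    'ANSI_PADDING'),
        (32,    'ANSI_NULLS'),
        (64,    'ARITHABORT'),
        (128,   'ARITHIGNORE'),
        (256,   'QUOTED_IDENTIFIER'),
        (512,   'NOCOUNT'),
        (1024,  'ANSI_NULL_DFLT_ON'),
        (2048,  'ANSI_NULL_DFLT_OFF'),
        (4096,  'CONCAT_NULL_YIELDS_NULL'),
        (8192,  'NUMERIC_ROUNDABORT'),
        (16384, 'XACT_ABORT')
     ) AS o(value, name)
WHERE o.value & @@OPTIONS = o.value;   -- bit test: option is ON
```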

SQL Server Query Execution Plan Performance Problems with user options

Query execution plans are created the first time a stored procedure is executed. Once created, the plans are cached and reused unless the query optimizer decides, based on several factors (schema changes, statistics changes, etc.), to create a new plan. However, different query plans can be created for different connections if they have different SET options (SET ARITHABORT ON/OFF, SET ANSI_NULLS ON/OFF, etc.), so it is possible to have several execution plans for the same stored procedure at the same time.

If the user’s application has different SET options than those of the tools you are using to execute the queries (for example, SQL Server Management Studio), you will very likely get a different execution plan than the one the user is getting, and the stored procedure’s performance will be different.

In order to reproduce the problem in the same database where the performance problem is happening, you have to get the SET options that the application’s connection is using and then set your connection the same way. The easiest way to get this information is by using SQL Profiler and tracing the ExistingConnection event class (it is included in the Standard Template).

Once you get the application’s SET options, you can set up your connection the same way, and you will be able to get the same cached plans that the users are getting from the application. The following steps show you how to do this:

  1. Launch SQL Profiler
  2. In SQL Profiler, select the Standard Template.
  3. Start the trace with the default settings. The very first rows that will appear correspond to the ExistingConnection event class. Each row corresponds to a different database connection.
  4. Select the row that corresponds to the connection being used by the user’s application and copy the TextData column to your query window. The TextData column for the ExistingConnection event class contains the SET commands required to set up any connection the same way as the selected connection.
  5. Execute the user’s stored procedure to reproduce the problem. Now you will get the same execution plan that the user is getting and you can start to analyze the execution plan.

OUTPUT clause (change capture for DML statements)

What is the OUTPUT clause?

  • The OUTPUT clause returns information about all the rows affected by INSERT, UPDATE, DELETE, and MERGE statements.
  • The results can also be inserted into a table or table variable. Additionally, you can capture the results of an OUTPUT clause in a nested INSERT, UPDATE, DELETE, or MERGE statement, and insert those results into a target table or view.

Why is it used?

  • The information returned from the OUTPUT clause can be used for confirmation purposes, archiving, change capture, and other application requirements.

It can be used in INSERT, UPDATE, DELETE, and MERGE statements.


Let’s get into more detail about how to use it.

















Sometimes, when using identity columns in tables, the applications interacting with the database need to know the identity values generated for the inserted rows.
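In that situation, OUTPUT can return the generated identity values straight to the caller without a second round trip. A sketch (the table and column names are hypothetical):

```sql
CREATE TABLE dbo.Orders
( OrderID int IDENTITY(1,1) PRIMARY KEY,
  Product varchar(50) );

-- inserted.OrderID carries the identity value generated for each row
INSERT dbo.Orders (Product)
OUTPUT inserted.OrderID, inserted.Product
VALUES ('Widget'), ('Gadget');
```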






  • Using OUTPUT INTO to log the changes into another table
USE AdventureWorks2008R2;
DECLARE @MyTableVar table( NewScrapReasonID smallint,
                           Name varchar(50),
                           ModifiedDate datetime);
INSERT Production.ScrapReason
    OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
        INTO @MyTableVar
VALUES (N'Operator error', GETDATE());

--Display the result set of the table variable.
SELECT NewScrapReasonID, Name, ModifiedDate FROM @MyTableVar;
--Display the result set of the table.
SELECT ScrapReasonID, Name, ModifiedDate
FROM Production.ScrapReason;
In the same way, we can log changes into another table with DELETE and UPDATE statements.
  • Using OUTPUT clause in MERGE
  •  Inserting the results of the MERGE statement into another table



Below is some useful information about the OUTPUT clause, including where it can and cannot be used:

  • The OUTPUT clause may be useful to retrieve the value of identity or computed columns after an INSERT or UPDATE operation.
  • When a computed column is included in the <dml_select_list>, the corresponding column in the output table or table variable is not a computed column. The values in the new column are the values that were computed at the time the statement was executed.
  • There is no guarantee that the order in which the changes are applied to the table and the order in which the rows are inserted into the output table or table variable will correspond.
  • If parameters or variables are modified as part of an UPDATE statement, the OUTPUT clause always returns the value of the parameter or variable as it was before the statement executed instead of the modified value.

You cannot use OUTPUT with an UPDATE or DELETE statement positioned on a cursor that uses the WHERE CURRENT OF syntax.

The OUTPUT clause is not supported in the following statements:

  • DML statements that reference local partitioned views, distributed partitioned views, or remote tables.
  • INSERT statements that contain an EXECUTE statement.
  • Full-text predicates are not allowed in the OUTPUT clause when the database compatibility level is set to 100.
  • The OUTPUT INTO clause cannot be used to insert into a view, or rowset function.
  • A user-defined function cannot be created if it contains an OUTPUT INTO clause that has a table as its target.

To prevent nondeterministic behavior, the OUTPUT clause cannot contain the following references:

  • Subqueries or user-defined functions that perform user or system data access, or are assumed to perform such access. User-defined functions are assumed to perform data access if they are not schema-bound.
  • A column from a view or inline table-valued function when that column is defined by one of the following methods:
    • A subquery.
    • A user-defined function that performs user or system data access, or is assumed to perform such access.
    • A computed column that contains a user-defined function that performs user or system data access in its definition.

    When SQL Server detects such a column in the OUTPUT clause, error 4186 is raised. For more information, see MSSQLSERVER_4186.

Inserting Data Returned From an OUTPUT Clause Into a Table

When you are capturing the results of an OUTPUT clause in a nested INSERT, UPDATE, DELETE, or MERGE statement and inserting those results into a target table, keep the following information in mind:

  • The whole operation is atomic. Either both the INSERT statement and the nested DML statement that contains the OUTPUT clause execute, or the whole statement fails.
  • The following restrictions apply to the target of the outer INSERT statement:
    • The target cannot be a remote table, view, or common table expression.
    • The target cannot have a FOREIGN KEY constraint, or be referenced by a FOREIGN KEY constraint.
    • Triggers cannot be defined on the target.
    • The target cannot participate in merge replication or updatable subscriptions for transactional replication.
  • The following restrictions apply to the nested DML statement:
    • The target cannot be a remote table or partitioned view.
    • The source itself cannot contain a <dml_table_source> clause.
  • The OUTPUT INTO clause is not supported in INSERT statements that contain a <dml_table_source> clause.
  • @@ROWCOUNT returns the rows inserted only by the outer INSERT statement.
  • @@IDENTITY, SCOPE_IDENTITY, and IDENT_CURRENT return identity values generated only by the nested DML statement, and not those generated by the outer INSERT statement.
  • Query notifications treat the statement as a single entity, and the type of any message that is created will be the type of the nested DML, even if the significant change is from the outer INSERT statement itself.
  • In the <dml_table_source> clause, the SELECT and WHERE clauses cannot include subqueries, aggregate functions, ranking functions, full-text predicates, user-defined functions that perform data access, or the TEXTPTR function.


Columns returned from OUTPUT reflect the data as it is after the INSERT, UPDATE, or DELETE statement has completed but before triggers are executed.

For INSTEAD OF triggers, the returned results are generated as if the INSERT, UPDATE, or DELETE had actually occurred, even if no modifications take place as the result of the trigger operation. If a statement that includes an OUTPUT clause is used inside the body of a trigger, table aliases must be used to reference the trigger inserted and deleted tables to avoid duplicating column references with the INSERTED and DELETED tables associated with OUTPUT.

If the OUTPUT clause is specified without also specifying the INTO keyword, the target of the DML operation cannot have any enabled trigger defined on it for the given DML action. For example, if the OUTPUT clause is defined in an UPDATE statement, the target table cannot have any enabled UPDATE triggers.

If the sp_configure option disallow results from triggers is set, an OUTPUT clause without an INTO clause causes the statement to fail when it is invoked from within a trigger.

Data Types

The OUTPUT clause supports the large object data types: nvarchar(max), varchar(max), varbinary(max), text, ntext, image, and xml. When you use the .WRITE clause in the UPDATE statement to modify an nvarchar(max), varchar(max), or varbinary(max) column, the full before and after images of the values are returned if they are referenced. The TEXTPTR( ) function cannot appear as part of an expression on a text, ntext, or image column in the OUTPUT clause.
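As a sketch of the before/after images mentioned above (dbo.Docs is a hypothetical table with an nvarchar(max) Body column), an UPDATE using .WRITE can return both versions of the value:

```sql
-- deleted.Body is the full value before the update;
-- inserted.Body is the full value after the .WRITE append
UPDATE dbo.Docs
SET Body .WRITE (N' appended text', NULL, 0)   -- NULL offset = append at the end
OUTPUT deleted.Body  AS before_image,
       inserted.Body AS after_image
WHERE DocID = 1;
```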


CDC – Change Data Capture (zero cost ETL solution)


What is CDC ? 

Change data capture records insert, update, and delete activity applied to a SQL Server table.

How does it work?















  • The source of change data for change data capture is the SQL Server transaction log.
  • As inserts, updates, and deletes are applied to tracked source tables, entries that describe those changes are added to the log. The log serves as input to the change data capture process.
  • The capture process reads the log and adds information about changes to the tracked table’s associated change table.
  • Functions are provided to enumerate the changes that appear in the change tables over a specified range, returning the information in the form of a filtered result set.

Where is it used?

A good example of a data consumer that is targeted by this technology is an extraction, transformation, and loading (ETL) application. An ETL application incrementally loads change data from SQL Server source tables to a data warehouse or data mart.

With CDC, we can eliminate the use of AFTER UPDATE/DELETE/INSERT triggers.

Note: Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.

Advantages of using CDC :

  • Minimal impact on the database (even more so if one uses log shipping to process the logs on a dedicated host).
  • No need for programmatic changes to the applications that use the database.
  • Low latency in acquiring changes.
  • Transactional integrity: log scanning can produce a change stream that replays the original transactions in the order they were committed. Such a change stream includes changes made to all tables participating in the captured transaction.
  • No need to change the database schema

Is CDC (Change Data Capture) available in other off-the-shelf products?

Several products perform change data capture using database transaction log files. These include:

  • Attunity Stream
  • Centerprise Data Integrator from Astera
  • DatabaseSync from WisdomForce
  • GoldenGate Transactional Data Integration
  • HVR from HVR Software
  • DBMoto from HiT Software
  • Shadowbase from Gravic
  • IBM InfoSphere Change Data Capture (previously DataMirror Transformation Server)
  • Informatica PowerExchange CDC Option (previously Striva)
  • Oracle Streams
  • Oracle Data Guard
  • Replicate1 from Vision Solutions
  • SharePlex from Quest Software
  • FlexCDC, part of Flexviews for MySQL

How to implement CDC ?

Let’s get into the details of how to implement CDC in SQL Server 2008.

Before enabling CDC on the tables, we need to enable CDC on the database.

To determine whether the database is enabled or not, check the sys.databases table.
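A quick check looks like this (the database name sqljunkieshare is an assumption, based on the job names shown later in this post):

```sql
-- Check whether CDC is enabled on the database
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'sqljunkieshare';
```

A value of 1 in is_cdc_enabled means CDC is already enabled for that database.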




Use the following system stored procedure to enable CDC on a database
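A sketch of enabling CDC at the database level (requires sysadmin membership; the database name is an assumption):

```sql
USE sqljunkieshare;
GO
-- Enables CDC for the current database and creates the cdc schema,
-- the cdc user, and the CDC system tables
EXEC sys.sp_cdc_enable_db;
GO
```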



After CDC is enabled on the database, the tables listed below are created as system tables:

  • cdc.captured_columns Returns the columns tracked for a specific capture instance.
  • cdc.change_tables Returns tables created when CDC is enabled for a table. Use sys.sp_cdc_help_change_data_capture to query this information rather than querying this table directly.
  • cdc.ddl_history Returns rows for each DDL change made to the table once CDC is enabled. Use sys.sp_cdc_get_ddl_history instead of querying this table directly.
  • cdc.index_columns Returns index columns associated with the CDC-enabled table. Use sys.sp_cdc_help_change_data_capture to retrieve this information rather than querying this table directly.
  • cdc.lsn_time_mapping Helps you map the log sequence number to transaction begin and end times. Again, avoid querying the table directly, and instead use the functions sys.fn_cdc_map_lsn_to_time and sys.fn_cdc_map_time_to_lsn.

After enabling CDC on the database, let’s enable CDC on a table.

Now that Change Data Capture is enabled, we can proceed with capturing changes for tables in the database by using the sys.sp_cdc_enable_table system stored procedure. The parameters of this stored procedure are described below.

sys.sp_cdc_enable_table parameters:

  • @source_schema This parameter defines the schema of the object.
  • @source_name This parameter specifies the table name.
  • @role_name This option allows you to select the name of the user-defined role that will have permissions to access the CDC data.
  • @capture_instance You can designate up to two capture instances for a single table. This comes in handy if you plan on altering the schema of a table already captured by CDC. You can alter the schema without affecting the original CDC (unless it is a data type change), create a new capture instance, track changes in two tables, and then drop the original capture instance once you are sure the new schema capture fits your requirements. If you don’t designate the name, the default value is schema_source.
  • @supports_net_changes When enabled, this option allows you to show just the latest change to the data within the LSN range selected. This option requires a primary key be defined on the table. If no primary key is defined, you can also designate a unique key in the @index_name option.
  • @index_name This parameter allows you to designate the unique key on the table to be used by CDC if a primary key doesn’t exist.
  • @captured_column_list If you aren’t interested in tracking all column changes, this option allows you to narrow down the list.
  • @filegroup_name This option allows you to designate where the CDC data will be stored. For very large data sets, isolation on a separate filegroup may yield better manageability and performance.
  • @partition_switch This parameter takes a TRUE or FALSE value designating whether or not an ALTER TABLE…SWITCH PARTITION command will be allowed against the CDC table (the default is FALSE).

When we set @supports_net_changes to 1, we either need to have a primary key on the table or we need to specify a non-clustered index name. After making those changes, when we execute the stored procedure the result will be:

Job ‘cdc.sqljunkieshare_capture’ started successfully.
Job ‘cdc.sqljunkieshare_cleanup’ started successfully.
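A sketch of the call that would produce that output, using the dbo.EMPLOYEE table from the examples that follow (the parameter values are assumptions for this post):

```sql
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'EMPLOYEE',
    @role_name     = NULL,          -- no gating role; assumption for this example
    @supports_net_changes = 1;      -- requires a primary key (or @index_name)
```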

Capture and Cleanup Jobs

Besides the Change Data Capture tables and query functions that have been created in our example, two SQL Server Agent jobs are created: a Capture job and a Cleanup job.

The Capture job generally runs continuously and is used to move changed data to the CDC tables from the transaction log.

The Cleanup job runs on a scheduled basis to remove older data from the CDC tables so that they don’t get too large. By default, data older than three days is automatically removed from CDC tables by this job.
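If three days of retention does not fit your needs, the Cleanup job can be adjusted with sys.sp_cdc_change_job; for example, to keep seven days of change data (retention is specified in minutes):

```sql
-- Keep 7 days of change data instead of the default 3
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080;  -- 7 days * 24 hours * 60 minutes
```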

We can also validate the settings of a newly configured capture instance using the sys.sp_cdc_help_change_data_capture stored procedure:

EXEC sys.sp_cdc_help_change_data_capture 'dbo', 'EMPLOYEE'

  • The column __$start_lsn identifies the commit log sequence number (LSN) that was assigned to the change. The commit LSN both identifies changes that were committed within the same transaction and orders those transactions.
  • The column __$seqval can be used to order additional changes that occur in the same transaction.
  • The column __$operation records the operation that is associated with the change: 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image).
  • The column __$update_mask is a variable bit mask with one defined bit for each captured column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only have those bits set that correspond to changed columns.
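To test individual bits of __$update_mask, SQL Server provides sys.fn_cdc_get_column_ordinal and sys.fn_cdc_is_bit_set. A sketch against the dbo_EMPLOYEE capture instance (the LAST_NAME column name is taken from the update example later in this post):

```sql
-- Find the bit position of LAST_NAME in the update mask,
-- then test it for every change row
DECLARE @ord int = sys.fn_cdc_get_column_ordinal('dbo_EMPLOYEE', 'LAST_NAME');
SELECT __$operation,
       sys.fn_cdc_is_bit_set(@ord, __$update_mask) AS last_name_changed
FROM cdc.fn_cdc_get_all_changes_dbo_EMPLOYEE(
       sys.fn_cdc_get_min_lsn('dbo_EMPLOYEE'),
       sys.fn_cdc_get_max_lsn(), 'all');
```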
Before going into more details, let’s look at some CDC functions.

cdc.fn_cdc_get_all_changes_capture_instance ( from_lsn , to_lsn , ‘<row_filter_option>’ )

<row_filter_option> ::= { all | all update old }

It is preferable to use the CDC functions to query the change tables:

DECLARE @from_lsn binary(10), @to_lsn binary(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_EMPLOYEE')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_EMPLOYEE (@from_lsn, @to_lsn, 'all');

‘all’ filters the result set so that you get all the changes on that table or capture instance, with one row for each change

(i.e. one row for an insert, one row for a delete, and one row for an update).

‘all update old’ filters the result set so that you get all the changes on that table or capture instance, with two rows for each update
(i.e. one containing the values of the captured columns before the update and another containing the values of the captured columns after the update).
Another important function:

cdc.fn_cdc_get_net_changes_capture_instance ( from_lsn , to_lsn , ‘<row_filter_option>’ )

<row_filter_option> ::= { all | all with mask | all with merge }

You might be confused about the difference between cdc.fn_cdc_get_net_changes_capture_instance and cdc.fn_cdc_get_all_changes_capture_instance, and when to use net changes versus all changes.

cdc.fn_cdc_get_net_changes_capture_instance returns one net change row for each source row changed within the specified LSN range. That is, when a source row has multiple changes during the LSN range, a single row that reflects the final content of the row is returned by the function. For example, if a transaction inserts a row in the source table and a subsequent transaction within the LSN range updates one or more columns in that row, the function returns only one row, which includes the updated column values.
Let’s say we need to get all the net changes in the source table during a period of 24 hours:
DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);
-- Obtain the beginning of the time interval.
SET @begin_time = GETDATE() -1;
-- DML statements to produce changes in the DBO.Employee table.

-- Obtain the end of the time interval.
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

-- Return the net changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn, @to_lsn, 'all');
A new function we used above is sys.fn_cdc_map_time_to_lsn(), which returns the LSN when supplied with a time and a relational operator:

sys.fn_cdc_map_time_to_lsn ( ‘<relational_operator>’, tracking_time )

<relational_operator> ::= { largest less than | largest less than or equal | smallest greater than | smallest greater than or equal }

Please use the link below to learn about some more CDC functions.

Now let’s create a dimension table which serves as a destination table.
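The original CREATE TABLE script is not shown here, so this is only a minimal sketch of what the destination table could look like (the column names are assumptions, based on the id/last_name columns and the CustomerId mapping mentioned at the end of this post):

```sql
-- Hypothetical destination dimension table
CREATE TABLE dbo.DimEmployee (
    CustomerId  int PRIMARY KEY,   -- mapped from dbo.Employee.id
    FIRST_NAME  varchar(50),
    LAST_NAME   varchar(50)
);
```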




Now let’s update some rows and delete some rows in the dbo.Employee table:

delete from dbo.employee where id between 50 and 100
update dbo.employee
set last_name = 'no lastname'
where id between 150 and 200 

Now let’s use the below query to get the changes in the table after updating and deleting:

DECLARE @begin_time datetime, @end_time datetime,
 @from_lsn binary(10), @to_lsn binary(10);
-- Obtain the beginning of the time interval.
SET @begin_time = DATEADD(HH,-1,GETDATE());
-- DML statements to produce changes in the dbo.Employee table.

-- Obtain the end of the time interval.
SET @end_time = GETDATE();
-- Map the time interval to a change data capture query range.
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal'
, @begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal'
, @end_time);

-- Return the net changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE(@from_lsn, @to_lsn, 'all');
Now let’s create a simple package in SSIS which queries these functions and updates the dimension table.
Use the below query in the OLE DB source; we can also query the change tables directly:
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_EMPLOYEE
(sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', DATEADD(HH,-1,GETDATE()))
, sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE()), 'all')

Write simple update and delete queries and map the id to the CustomerId column in the dbo.DimEmployee table.
Using CDC, without affecting the source table, we were able to log those changes, and with a simple package we updated the data warehouse environment.