Wait stats I/O_COMPLETION

This wait occurs while a task is waiting for I/O operations to complete that are related to non data page I/O’s

 

 

 

 

 

 

 

 

 

 

There are several reasons why this wait type can be accumulated over time

Now we have a wait time we need to investigate why this is happening in order to do that we need to go through the perfmon physical disk counters

because most of the this wait type can be accumulated due to improper I/O sub system design , and improper system design

Disk Performance Counters

The Disk Performance counters help you to evaluate the performance of the disk subsystem. The disk subsystem is more than the disk itself. It will include to disk controller card, the I/O bus of the system, and the disk. When measuring disk performance it is usually better to have a good baseline for performance than simply to try and evaluate the disk performance on a case by case basis.

There are two objects for the disk—PhysicalDisk and LogicalDisk. The counters for the two are identical. However, in some cases they may lead to slightly different conclusions. The PhysicalDisk object is used for the analysis of the overall disk, despite the partitions that may be on the disk. When evaluating overall disk performance this would be the one to select. The LogicalDisk object analyzes information for a single partition. Thus the values will be isolated to activity that is particularly occurring on a single partition and not necessarily representative of the entire load that the disk is burdened with. The LogicalDisk object is useful primarily when looking at the affects or a particular application, like SQL Server, on the disk performance. Again the PhysicalDisk is primarily for looking at the performance of the entire disk subsystem. In the list that follows, the favored object is indicated with the counter. When the LogicalDisk and PhysicalDisk objects are especially different, the counter will be listed twice and the difference specifically mentioned.

  • PhysicalDisk : Current Disk Queue Length. This counter provides a primary measure of disk congestion. Just as the processor queue was an indication of waiting threads, the disk queue is an indication of the number of transactions that are waiting to be processed. Recall that the queue is an important measure for services that operate on a transaction basis. Just like the line at the supermarket, the queue will be representative of not only the number of transactions, but also the length and frequency of each transaction.
  • PhysicalDisk : % Disk Time. Much like % Processor time, this counter is a general mark of how busy the disk is. You will see many similarities between the disk and processor since they are both transaction-based services. This counter indicates a disk problem, but must be observed in conjunction with the Current Disk Queue Length counter to be truly informative. Recall also that the disk could be a bottleneck prior to the % Disk Time reaching 100%.
  • PhysicalDisk : Avg. Disk Queue Length. This counter is actually strongly related to the %Disk Time counter. This counter converts the %Disk Time to a decimal value and displays it. This counter will be needed in times when the disk configuration employs multiple controllers for multiple physical disks. In these cases, the overall performance of the disk I/O system, which consists of two controllers, could exceed that of an individual disk. Thus, if you were looking at the %Disk Time counter, you would only see a value of 100%, which wouldn’t represent the total potential of the entire system, but only that it had reached the potential of a single disk on a single controller. The real value may be 120% which the Avg. Disk Queue Length counter would display as 1.2.
  • PhysicalDisk : Disk Reads/sec. This counter is used to compare to the Memory: Page Inputs/sec counter. You need to compare the two counters to determine how much of the Disk Reads are actually attributed to satisfying page faults.
  • LogicalDisk : Disk Reads/sec. When observing an individual application (rather a partition) this counter will be an indication of how often the applications on the partition are reading from the disk. This will provide you with a more exact measure of the contribution of the various processes on the partition that are affecting the disk.
  • PhysicalDisk : Disk Reads Bytes/sec. Primarily, you’ll use this counter to describe the performance of disk throughput for the disk subsystem. Remember that you are generally measuring the capability of the entire disk hardware subsystem to respond to requests for information.
  • LogicalDisk : Disk Reads Bytes/sec. For the partition, this will be an indication of the rate that data is being transferred. This will be an indication of what type of activity the partition is experiencing. A smaller value will indicate more random reads of smaller sections.
  • PhysicalDisk : Avg. Disk Bytes/Read. This counter is used primarily to let you know the average number of bytes transferred per read of the disk system. This helps distinguish between random reads of the disk and the more efficient sequential file reads. A smaller value generally indicates random reads. The value for this counter can also be an indicator of file fragmentation.
  • PhysicalDisk : Avg. Disk sec/Read. The value for this counter is generally the number of seconds it takes to do each read. On less-complex disk subsystems involving controllers that do not have intelligent management of the I/O, this value is a multiple of the disk’s rotation per minute. This does not negate the rule that the entire system is being observed. The rotational speed of the hard drive will be the predominant factor in the value with the delays imposed by the controller card and support bus system.
  • PhysicalDisk: Disk Reads/sec. The value for this counter is the number of reads that the disk was able to accomplish per second. Changes in this value indicate the amount of random access to the disk. The disk is a mechanical device that is capable of only so much activity. When files are closer together, the disk is permitted to get to the files quicker than if the files are spread throughout the disk. In addition, disk fragmentation can contribute to an increased value here.

physical disk counters in context to the SQL server

PhysicalDisk\Avg. Disk Queue Length

Threshold: Should not be higher than the number of spindles plus two.

Significance: This counter indicates the average number of both read and writes requests that were queued for the selected disk during the sample interval.

  • PhysicalDisk\Avg. Disk Read Queue Length

Threshold: Should be less than two.

Significance: This counter indicates the average number of read requests that were queued for the selected disk during the sample interval.

  • PhysicalDisk\Avg. Disk Write Queue Length

Threshold: Should be less than two.

Significance: This counter indicates the average number of write requests that were queued for the selected disk during the sample interval.

  • PhysicalDisk\Avg. Disk sec/Read

Threshold: No specific value.

Less than 10 ms – very good

Between 10-20 ms – okay

Between 20-50 ms – slow, needs attention

Greater than 50 ms – Serious I/O bottleneck

Significance: This counter indicates the average time, in seconds, of a read of data from the disk.

  • PhysicalDisk\Avg. Disk sec/Transfer

Threshold: Should not be more than 18 milliseconds.

Significance: This counter indicates the time, in seconds, of the average disk transfer. This may indicate a large amount of disk fragmentation, slow disks, or disk failures. Multiply the values of the Physical Disk\Avg. Disk sec/Transfer and Memory\Pages/sec counters. If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, so you need more RAM.

  • PhysicalDisk\Disk Writes/sec

Threshold: Depends on manufacturer’s specification.

Significance: This counter indicates the rate of write operations on the disk.

  • ·         Physical Disk: %Disk Time

Threshold:  Greater than 50 percent, it represents an I/O bottleneck

Significance: Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.

  • ·         Physical Disk\Avg. Disk Reads/Sec  and Physical Disk\Avg. Disk Writes/Sec

Threshold: It should be less than 85% of the disk capacity

Significance: It represents the rate of read operations on the disk.

When using above counters, you may need to adjust the values for RAID configurations using the following formulas.

Raid 0 — I/Os per disk = (reads + writes) / number of disks

Raid 1 — I/Os per disk = [reads + (2 * writes)] / 2

Raid 5 — I/Os per disk = [reads + (4 * writes)] / number of disks

Raid 10 — I/Os per disk = [reads + (2 * writes)] / number of disks

For example, you have a RAID-1 system with two physical disks with the following values of the counters.

Disk Reads/sec            80

Disk Writes/sec           70

Avg. Disk Queue Length    5

In that case, you are encountering (80 + (2 * 70))/2 = 110 I/Os per disk and your disk queue length = 5/2 = 2.5 which indicates a border line I/O bottleneck.

Disk I/O issues can be minimized by having larger and sequential I/O activities.    To achieve than you should consider partitioning.    Partitioning can be done in many ways – Database partitioning, Tempdb partitioning, Table partitioning, index partitioning.    At a minimum, it is highly recommended to partition your database in such a way that clustered index, non-clustered index, and transaction logs are each on a separate physical drive and spread across multiple logical filegroups/files within that drive.

Physical Disk: % Disk Time

This counter measures how busy a physical array is (not a logical partition or individual disks in an array). It provides a good relative measure of how busy your arrays are.
As a rule of thumb, the % Disk Time counter should run less than 55%. If this counter exceeds 55% for continuous periods (over 10 minutes or so during your 24 hour monitoring period), then your SQL Server may be experiencing an I/O bottleneck. If you see this behavior only occasionally in your 24 hour monitoring period, I wouldn’t worry too much, but if it happens often (say, several times an hour), then I would start looking into finding ways to increase the I/O performance on the server, or to reduce the load on the server. Some ways to boost disk I/O include adding drives to an array (if you can), getting faster drives, adding cache memory to the controller card (if you can), using a different version of RAID, or getting a faster controller.

Refer :

http://sqlblog.com/blogs/joe_chang/archive/2008/03/04/storage-performance-for-sql-server.aspx

http://www.sql-server-performance.com/2005/hardware-bottlenecks/

HBA queue depth

http://blogs.msdn.com/b/joesack/archive/2009/01/28/sql-server-and-hba-queue-depth-mashup.aspx

perfmon counters

http://blogs.technet.com/b/vipulshah/archive/2006/11/30/understanding-perfmon-counters-while-troubleshooting-sql-server-performance-issues.aspx

predeployment best practices

http://technet.microsoft.com/en-us/library/cc966412.aspx

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s