What is parallel query execution in SQL server
Lets go in to the details of parllel execution and how it is done behind the scenes
The default value of max degree of parallelism in SQL Server is 0, which is interpreted as follows: If MAXDOP = 0 is used for a query, either via the default or through explicit use of a MAXDOP query hint, SQL Server can potentially use all the processor cores to parallelize a query up to a maximum of 64. For example:
a) If MAXDOP is set to 0 on a 32 logical processor machine, the maximum degree of parallelism used is 32 unless another value is explicitly specified.
b) If MAXDOP is set to 0 on a 256 logical processor machine, the maximum degree of parallelism used is 64 unless another value is explicitly specified.
In Step 1, if the MAXDOP value is equal to 1 or if the cost of a query plan is lower than the cost threshold of parallelism, the query optimizer generates a serial plan and the query is executed serially. In the case of a parallel plan, if MAXDOP is set to 0 or to a value greater than 1, the SQL Server query execution engine uses the information from the query plan to calculate an expected degree of parallelism value (Step 2). This degree of parallelism calculation is dependent on the plan shape, the availability of system resources, and the query cost, while honoring the specified MAXDOP value if MAXDOP is greater than 1.
In Step 3, the query execution engine in SQL Server calculates the resources required for a query to execute such as memory and the number of threads. In Step 4, SQL Server uses an internal facility called the resource semaphore to reserve the threads and the memory resources that are required for execution. The resource semaphore ensures that queries are able to satisfy the respective resource reservation values while keeping the overall resource usage within the set limits. First, the memory reservation values for a query are checked to make sure the query needs memory. For example: Simple serial queries without ORDER BY or GROUP BY clauses may not need a memory grant, and in such cases the query is dispatched for execution immediately. Next, SQL Server examines the maximum memory limit for a particular query. This value can be configured through the Resource Governor workload group REQUEST_MAX_MEMORY_GRANT_PERCENT value. If the memory grant request can be met (that is, if the memory grant request value is less that the per-query maximum limit) and the required thread resources can be obtained, the query is executed immediately. After a query is dispatched for execution, the degree of parallelism cannot change.
If there are not enough resources readily available, the query is put into a resource semaphore queue to wait for resources to become available. The time-out value associated with a query waiting in the resource semaphore queue is proportional to the query cost. The resource semaphore uses an internal query cost estimate to prioritize memory grants and decide which query to run next when memory is available. While the query is waiting in the resource semaphore queue, if the query times out, an error is returned (8657 or 8645). If the query is picked as a candidate for execution from the resource semaphore queue but cannot obtain the required resources, SQL Server reduces the estimated degree of parallelism (DOP) by half (Step 5), and then it recomputes the memory and thread resources for the query (Step 3). SQL Server may ultimately downgrade the parallel plan to a serial plan. Downgrading to a serial plan causes removal of parallelism operators from the plan prior to execution.
Lets do some tests with max dop
Now lets disable the MAX DOP and see the diffrence
when using the parallelism in the execution another important is the number of threads
In SQL Server, the max worker threads sp_configure configuration value determines the maximum number of worker threads created for handling all simultaneous SQL Server requests. The default value of max worker threads in SQL Server 2008 is 0. If the default value (0) is used for max worker threads, SQL Server automatically configures the number of worker threads at startup. In this case, the value depends on the machine architecture (32-bit vs. 64-bit) and on the number of logical processors.
Depending on the degree of parallelism value and the different possibilities of data flow and execution order in a parallel query plan, the number of threads calculated by the query execution engine for running a query is estimated
Estimated number of threads = MAXDOP * Number of parallel data flow tasks
so in the above case = 2 * 1 + 1 (cordinator thread) = 3
as we know every coin has two sides somtimes parallelism will not improve performance it may degrade when compared to the serial execution
lets lists the problems
1) Insufficiant memory during execution of parallel query plans
Memory requred for exchange operator and memory buffers in parallel execution is proportional to (DOP)² , so memory grant request for parallel execution can be large
2) Synchronizing overhead of parallel operators
in case of exchange operators (exchange row data between threads) synchronization is required between producer and consumer threads as they are accessing same exchange buffers
CXPACKET lock is used for exclusive acces of buffers for requesting producer and consumer threads
The synchronization between producer and consumer threads seems to perform well. However, there are cases (for example, when either the producer or consumer thread is waiting for a resource such as I/O, CPU, or memory grants) in which excessive CXPACKET waits can occur. Other possible causes for high CXPACKET waits are underestimation of the number of rows flowing through an operator and highly skewed data
3) Statistical estimation errors
4) Data skew errors
Reference(MSDN Parallel query processing )
Start with the default value of MAXDOP and reduce it only if you identify performance problems. A good approach is to progressively test with powers of two values. If you only have one or a few concurrent queries running, the default for MAXDOP (0) will often be optimal. Following are some techniques you can use to understand problems related to parallel processing in SQL Server and tips for mitigating them.
(1) If you observe excessive CXPACKET waits, reduce the value of MAXDOP:
- · In an OLTP system, use a MAXDOP value of 1 (serial execution), if that is sufficient. Consider explicitly setting it to a higher value for individual queries that require parallelism.
- · In a data warehousing system, where you need high parallelism for better overall response time, setting MAXDOP between ¼ the number of logical processors and ½ the number of logical processors generally works well. Experiment in preproduction environments to decide the MAXDOP value that gives you the best combination of throughput and response time for your environment.
(2) Higher degree of parallelism queries generally require more memory to run. If several concurrently running complex queries each execute in parallel with a degree of parallelism that is greater than 1, the memory requirement may be significant. Consider using Resource Governor to throttle the degree of parallelism and total number of parallel queries by workload using the guidelines in .
(3) If there is a lack of worker threads:
- · Reduce MAXDOP.
- · Use Resource Governor to throttle the degree of parallelism and total number of parallel queries by workload using the guidelines in .
(4) If you are observing what appears to be a suboptimal query plan due to cardinality estimation errors, consider updating table statistics using the guidelines in .
(5) Consider rewriting some queries so that they perform well in parallel environments using the guidelines in .
(6) Remember that parallel query performance is multifaceted; there is no silver-bullet solution that can solve all parallel performance issues.