How to achieve Initcap functionality in SSIS , and how to use FINDSTRING(),

How to use FINDSTRING() SSIS

FINDSTRING() works as the same as CHARINDEX() in TSQL

it returns the position of character or the string in side the string that we want to search

Syntax : FINDSTRING( “search string”,”searching term”,occurrence)

to better understand we will use a simple example in validating a email address, so for the email to be valid it needs to have at least one @ character

in the email address , if not its considered as not a valid address

As we learned how to use FINDSTRING() now we will use this functionality to get the initcap functionality in SSIS , Initcap is a cool feature in oracle when this function used,

The first letter of each word into uppercase, all other letters in lowercase. Words are delimited by white space or characters that are not alphanumeric

This is how it is used in Oracle example

The following example capitalizes each word in the string:

\\***** PL\SQl****\\\
SELECT INITCAP(‘the soap’) “Capitals” FROM DUAL;

Capitals
———
The Soap
\\*** PL/SQl*****\\

Now we will achieve this in SSIS using FINDSTRING, TRIM, UPPER, and LOWER functions in derived columns

In the derived column we are using

(UPPER(SUBSTRING(TRIM(Name),1,1)) + TRIM(LOWER(SUBSTRING(TRIM(Name),2,FINDSTRING(TRIM(Name),” “,1) – 1 < 0 ? LEN(TRIM(Name)) : FINDSTRING(TRIM(Name)," ",1) – 1)))) + " " + (FINDSTRING(TRIM(Name)," ",1) – 1 < 0 ? " " : UPPER(SUBSTRING(TRIM(SUBSTRING(TRIM(Name),FINDSTRING(TRIM(Name)," ",1),LEN(TRIM(Name)))),1,1)) + TRIM(LOWER(SUBSTRING(TRIM(Name),FINDSTRING(TRIM(Name)," ",1) + 2,LEN(TRIM(Name))))))

The above expression looks big but it is very simple

In this we are taking the first letter of the word and making it upper case , if there is a empty in the string we are considering as a second word and then we are selecting the

first letter of the world to upper case and all other characters are mapped to lower case

How to insert explicit values into the identity columns in SQL server


SET IDENTITY_INSERT [ database. [ owner. ] ] { table } { ON | OFF }




Msg 544, Level 16, State 1, Line 1
Cannot insert explicit value for identity column in table ‘CHECKIDENT’ when IDENTITY_INSERT is set to OFF.

USE [SQLJUNKIESHARE]
GO

CREATE TABLE [dbo].[CHECKIDENT](
[ID] [int] IDENTITY(1,1) NOT NULL,
[date] [date] NULL,
[descp] [varchar](10) NULL
) ON [PRIMARY]

GO

-- How to insert explicit values in identity columns with out altering table

INSERT INTO DBO.CHECKIDENT(ID,descp)

VALUES(1,'CHECK')

-- WHEN SET TO ON WE CAN INSERT EXPLICIT VALUES IN TO THE TABLE

SET IDENTITY_INSERT DBO.CHECKIDENT ON

INSERT INTO DBO.CHECKIDENT(ID,descp)

VALUES(1,'CHECK')

INSERT INTO DBO.CHECKIDENT(ID,descp)

VALUES(1,'CHECK')

SELECT * FROM DBO.CHECKIDENT

SET IDENTITY_INSERT DBO.CHECKIDENT OFF

How find the last inserted identity value or values in the SQL server

CREATE TABLE [dbo].[CHECKIDENT](
[ID] [int] IDENTITY(1,1) NOT NULL,
[date] [date] NULL,
[descp] [varchar](10) NULL
) ON [PRIMARY]

GO

INSERT INTO DBO.CHECKIDENT(DESCP)

VALUES (‘CHECK’)

SELECT @@IDENTITY AS ‘LASTINSERTED IDENTITY VALUE’

INSERT INTO DBO.CHECKIDENT(DESCP)

VALUES (‘CHECK’),
(‘CHECK1’)

SELECT @@IDENTITY AS ‘LASTINSERTED IDENTITY VALUE’

–OR

— find all identity values insereted in the previous satement

declare @t table (ID int)

INSERT INTO DBO.CHECKIDENT(DESCP)

OUTPUT inserted.ID into @t(ID)

VALUES (‘CHECK3’),
(‘CHECK4’)

select * from @t

Set Operators : Except and Intersect -Tsql

SQL Server provides the following set operators. Set operators combine results from two or more queries into a single result set

1) Intersect and Except

2) Union

Today in this post we will learn Intersect and Except set operators

Returns distinct values by comparing the results of two queries.

EXCEPT returns any distinct values from the left query that are not also found on the right query.

INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.

The basic rules for combining the result sets of two queries that use EXCEPT or INTERSECT are the following:

  • The number and the order of the columns must be the same in all queries.
  • The data types must be compatible.

 

Lets take a simple example and you will better understand the set operators

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

create table #newproducts(ProductID int, name varchar(20), size varchar(20),price varchar(20))
insert into #newproducts(ProductID,name,size,price)

values(1,’productx’,’small’,20),
(2,’producty’,’small’,40),
(3,’productz’,’large’,50),
(1,’productxy’,’small’,60)
create table #oldproducts(ProductID int, name varchar(20), size varchar(20),price varchar(20))
insert into #oldproducts(ProductID,name,size,price)

values(4,’productq’,’small’,20),
(5,’productk’,’small’,40),
(6,’productz’,’large’,50),
(1,’productxy’,’small’,60),
(1,’productzy’,’small’,80)
select * from #newproducts

select * from #oldproducts
;with oldonly as
(
select * from #oldproducts

except

select * from #newproducts

),

newonly as (select * from #newproducts

except

select * from #oldproducts)
select o.ProductID,o.name as oldname, n.name as newname

from oldonly o inner join newonly n on o.ProductID = n.ProductID

where o.name <> n.name

Understanding and Controlling Parallel query processing – MAXDOP,Cost threshold,Parallel execution plans,Operators

What is parallel query execution in SQL server

Lets go in to the details of parllel execution and how it is done behind the scenes

The default value of max degree of parallelism in SQL Server is 0, which is interpreted as follows: If MAXDOP = 0 is used for a query, either via the default or through explicit use of a MAXDOP query hint, SQL Server can potentially use all the processor cores to parallelize a query up to a maximum of 64. For example:

a)      If MAXDOP is set to 0 on a 32 logical processor machine, the maximum degree of parallelism used is 32 unless another value is explicitly specified.

b)      If MAXDOP is set to 0 on a 256 logical processor machine, the maximum degree of parallelism used is 64 unless another value is explicitly specified. 

In Step 1, if the MAXDOP value is equal to 1 or if the cost of a query plan is lower than the cost threshold of parallelism, the query optimizer generates a serial plan and the query is executed serially. In the case of a parallel plan, if MAXDOP is set to 0 or to a value greater than 1, the SQL Server query execution engine uses the information from the query plan to calculate an expected degree of parallelism value (Step 2). This degree of parallelism calculation is dependent on the plan shape, the availability of system resources, and the query cost, while honoring the specified MAXDOP value if MAXDOP is greater than 1.

In Step 3, the query execution engine in SQL Server calculates the resources required for a query to execute such as memory and the number of threads. In Step 4, SQL Server uses an internal facility called the resource semaphore to reserve the threads and the memory resources that are required for execution. The resource semaphore ensures that queries are able to satisfy the respective resource reservation values while keeping the overall resource usage within the set limits. First, the memory reservation values for a query are checked to make sure the query needs memory. For example: Simple serial queries without ORDER BY or GROUP BY clauses may not need a memory grant, and in such cases the query is dispatched for execution immediately. Next, SQL Server examines the maximum memory limit for a particular query. This value can be configured through the Resource Governor workload group REQUEST_MAX_MEMORY_GRANT_PERCENT value. If the memory grant request can be met (that is, if the memory grant request value is less that the per-query maximum limit) and the required thread resources can be obtained, the query is executed immediately. After a query is dispatched for execution, the degree of parallelism cannot change. 

If there are not enough resources readily available, the query is put into a resource semaphore queue to wait for resources to become available. The time-out value associated with a query waiting in the resource semaphore queue is proportional to the query cost. The resource semaphore uses an internal query cost estimate to prioritize memory grants and decide which query to run next when memory is available. While the query is waiting in the resource semaphore queue, if the query times out, an error is returned (8657 or 8645). If the query is picked as a candidate for execution from the resource semaphore queue but cannot obtain the required resources, SQL Server reduces the estimated degree of parallelism (DOP) by half (Step 5), and then it recomputes the memory and thread resources for the query (Step 3). SQL Server may ultimately downgrade the parallel plan to a serial plan. Downgrading to a serial plan causes removal of parallelism operators from the plan prior to execution.

Lets do some tests with max dop

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Now lets disable the MAX DOP and see the diffrence

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

when using the parallelism in the execution another important is the number of threads

In SQL Server, the max worker threads sp_configure configuration value determines the maximum number of worker threads created for handling all simultaneous SQL Server requests. The default value of max worker threads in SQL Server 2008 is 0. If the default value (0) is used for max worker threads, SQL Server automatically configures the number of worker threads at startup. In this case, the value depends on the machine architecture (32-bit vs. 64-bit) and on the number of logical processors.

Depending on the degree of parallelism value and the different possibilities of data flow and execution order in a parallel query plan, the number of threads calculated by the query execution engine for running a query is estimated

Estimated number of threads = MAXDOP * Number of parallel data flow tasks

  so in the above case     =  2 * 1  + 1  (cordinator thread) = 3

as we know every coin has two sides somtimes parallelism will not improve performance it may degrade when compared to the serial execution

lets lists the problems

1) Insufficiant memory during execution of parallel query plans

Memory requred for exchange operator and memory buffers in parallel execution is proportional to (DOP)² , so memory grant request for parallel execution can be large

2) Synchronizing overhead of parallel operators

in case of exchange operators (exchange row data between threads) synchronization is required between producer and consumer threads as they are accessing same exchange buffers  

CXPACKET lock is used for exclusive acces of buffers for  requesting producer and consumer threads

The synchronization between producer and consumer threads seems to perform well. However, there are cases (for example, when either the producer or consumer thread is waiting for a resource such as I/O, CPU, or memory grants) in which excessive CXPACKET waits can occur. Other possible causes for high CXPACKET waits are underestimation of the number of rows flowing through an operator and highly skewed data

3) Statistical estimation errors

4) Data skew errors

Reference(MSDN Parallel query processing )

Recommendations

Start with the default value of MAXDOP and reduce it only if you identify performance problems. A good approach is to progressively test with powers of two values. If you only have one or a few concurrent queries running, the default for MAXDOP (0) will often be optimal. Following are some techniques you can use to understand problems related to parallel processing in SQL Server and tips for mitigating them.

(1)    If you observe excessive CXPACKET waits, reduce the value of MAXDOP:

  • ·         In an OLTP system, use a MAXDOP value of 1 (serial execution), if that is sufficient. Consider explicitly setting it to a higher value for individual queries that require parallelism.
  • ·         In a data warehousing system, where you need high parallelism for better overall response time, setting MAXDOP between ¼ the number of logical processors and ½ the number of logical processors generally works well. Experiment in preproduction environments to decide the MAXDOP value that gives you the best combination of throughput and response time for your environment.

(2)    Higher degree of parallelism queries generally require more memory to run. If several concurrently running complex queries each execute in parallel with a degree of parallelism that is greater than 1, the memory requirement may be significant. Consider using Resource Governor to throttle the degree of parallelism and total number of parallel queries by workload using the guidelines in [8].

(3)    If there is a lack of worker threads:

  • ·         Reduce MAXDOP.
  • ·         Use Resource Governor to throttle the degree of parallelism and total number of parallel queries by workload using the guidelines in [8].

(4)    If you are observing what appears to be a suboptimal query plan due to cardinality estimation errors, consider updating table statistics using the guidelines in [4].

(5)    Consider rewriting some queries so that they perform well in parallel environments using the guidelines in [2].

(6)    Remember that parallel query performance is multifaceted; there is no silver-bullet solution that can solve all parallel performance issues.