Data Cube: A Relational Aggregation Operator : Microsoft research paper

Very nice article on group by clause , how it works and what are its limitations and how can we overcome with a CUBE operator

Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or 1-dimensional answers. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube, or simply cube. The cube operator generalizes the histogram, cross-tabulation, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points form an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an “infinite value”, ALL. For example, the point would represent the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation

Source :

http://research.microsoft.com/apps/pubs/default.aspx?id=69578

 

Download PDF file by clicking below link

MIcrosoft cube research

 

How to summarize your data using ROLLUP and CUBE in TSQL

We all know when reporiting any financial or sales information adding sub totals and grand totals to report makes it more effective but most offten develop0rs dont realize TSQL provides a effective way in doing this is using rollup and cube operators in group by clause.

So for this demo I am using Adventure works 2008 OLTP sample database

Three tables

  1. Production.Product
  2. Production.ProductCategory
  3. Production.ProductSubcategory

here is a simple querying joining three tables and grouping by product category and subcategory

select

pc.ProductCategoryID,

s.ProductSubcategoryID,

AVG(ListPrice)as Avglistprice,

MAX (ListPrice)as MAxlistprice,

MAX(StandardCost)as Maxstandardcost

from Production.Product p

inner join Production.ProductSubcategory s

on  p.ProductSubcategoryID = s.ProductSubcategoryID

inner join Production.ProductCategory pc

on  pc.ProductCategoryID = s.ProductCategoryID

group by pc.ProductCategoryID,s.ProductSubcategoryID

SAMPLE RESULT

You can see we are missing some things in the above result i.e  subtotals and grand totals

change group by cluase to include rollup operator like this

group by ROLLUP(pc.ProductCategoryID,s.ProductSubcategoryID)

trying running it with out errors

Result looks like this

CUBE is used as same way but it gives us a little different result , CUBE summarizes all the combinations of  columns in group by clause and it also gives us grandtotal which does not make sense for above example because each sub category exists once in each category.

you will ‘%_[^LIKE]%’ like

Like cluase allows you to match a character string found in a column to a specified pattern in the where clause

lets take an example to better understand its wild cards

How to use like in where clause

DECLARE@TTABLE (charcvarchar(20))

INSERTINTO@T(charc)

VALUES(‘Cat’)

,(‘hat’)

,(‘that’)

,(‘at’)

,(‘c’)

,(‘a’)

 

 

Percent % :

Underscore _ :

Square Brackets [] :

Caret:

 

 

 

 

 

 

 

watch out for nulls .. know ur environment

ANSI_NULLS is a user option in sql server , which is like a env variable when a user establishes a connection to sqlserver

ANSI_NULLS specifies the behavior of nulls when using equality and inequality comparision against null

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

as ANSI_NULLS is turned on , any query having comparing nulls will result in unknown result

Once we turn it off we will be able to query

 

 

 

Worst enemy to a developer (nolock) , Why shouldn’t you use

As I said before this is the exact same session that I learned from Randy Knight , when I saw his presentation I’ve decide not to use any where unless required especially in fianancial and critical decsion making reports , here we will not be going in to details of different isolation levels but we will be specifically focusing on  readuncommited isolation level or using nolock as tablehint , dirty reads and pagesplits

My first time at SQL saturday (#129 Rochester)

Hi,

Unfortunately I didn’t had enough time to update my experience at Sql Saturday #129 Rochester but here it is

First as I live in Pennsylvania so it’s a 3 hour journey to the north I packed my bags and reserved a hotel near Rochester institute of Technology that’s where SQL Saturday was held

I was very excited, normally I like travelling so anyway , I started here on Friday evening and reached there around 11 O’clock and had a good sleep and woke up around 7 O clock got my things together took a snapshot of the map and was at RIT by 8 30 , registration supposed to close around 8 45 , as soon as I got out of parking I met a guy who said he was also heading to SQL Saturday and this was his first time too and then we went inside took our badges and schedule, as I already made my schedule using schedule manager in sql saturday , so I pretty much knew what I am gona listen on that day , atmosphere was good and there were quite a few vendors for that event like redgate, confio…. for marketing there DBMS tools and other stuff where we can have a chance to get some freebies like eBooks, and books

 

Here  is my schedule for that day

What to look for Execution plans

Grant Fitchey

http://www.scarydba.com

also a author of a book called SQL Server Execution plans , free eBook available to download on redgate

My first session , I was very excited because I was reading his book for a while (First chapter) lol, any way his presentation was pretty bold concentrating more on the first operator (Select insert update delete) and important properties of first operator

Compile time

Compile CPU

Cache plan size

Estimated number of rows

Estimated Subtree cost

Reason for early termination

Optimization Level

Parameters

and we also looked at why seek or scan , and how to suppress key lookup , on the whole pretty good learned some new things

Next session

Top Tips writing better TSQL queries (beginer level)

How to format tsql for better readability

Naming standards (procedures(get update delete), naming variables)

Error handling

Unnecessary Explicit data conversion

Improper use of functions, query hints

On the whole pretty good topic to discuss , I’ve added a link to download his presentation

Top Tips for Better TSQL

Getting started with MDX

Willima E. Pearson III

Believe me most toughest thing to understand in SQL BI stack is MDX (Multi dimensional Expression language), its was the best presentation I’ve seen on the MDXeasy to understand for beginners , and we went through some beginner stuff like Dimensions, facts, members , SET, TUPLE . I havent had a chance to email him for asking his presentation once I get it, I will upload here so hold on and other two sessions you can see is from Randy Knight , well known sql guru I will post his session information in an another post because those are worth spending more time.

so that’s all from the sessions , at the end of the day they do this raffle drawing and I did not won anything and we finished our sql saturday there, and I made some new friends we went to movie it was fun , learned lot of stuff and am looking forward to attend the next sql saturday at Philadelphia soon.

Thank you

Implementing Error handling raising User Defined Errors ( part 3):

In the last post we seen how to manage UDF errors (Add , Update , delete)  UDF error messages

in SYS.MESSAGES catalog. In this post we will look in-detail in to the Use RAISERROR function

in SQL Server to raise UDF errors. When I decided to write this, I then remembered that I written

a post for SQL server 2o12 Throw statement which will explain the differences of  raiserror and throw

, disadvantages of raiserrors and how to implement raiserror and throw, and how to raise UDF

errors , so have look at this post

http://sqljunkieshare.com/2011/12/10/throw-statement-tsql-2012/

 

but I will go through  some aspects of Raiserror that I didn’t cover  in the above post

lets start with syntax of raiseerror

RAISERROR ( { msg_id | msg_str | @local_variable }
{,severity,state }
[,argument [,…n ] ] )
[ WITH option [,…n ] g

 

We talked about Msg_id in the earlier post, we will look at the msg_str

msg_str
Is a user-defined message with formatting similar to the printf function in the C standard library. The error message can have a maximum of 2,047 characters. If the message contains 2,048 or more characters, only the first 2,044 are displayed and an ellipsis is added to indicate that the message has been truncated.

 

msg_str is a string of characters with optional embedded conversion specifications. Each conversion specification defines how a value in the argument list is formatted and placed into a field at the location of the conversion specification in msg_str. Conversion specifications have this format:

% [[flag] [width] [. precision] [{h | l}]] type

flag

Is a code that determines the spacing and justification of the substituted value.

Code Prefix or justification Description
- (minus) Left-justified Left-justify the argument value within the given field width.
+ (plus) Sign prefix Preface the argument value with a plus (+) or minus (-) if the value is of a signed type.
0 (zero) Zero padding Preface the output with zeros until the minimum width is reached. When 0 and the minus sign (-) appear, 0 is ignored.
# (number) 0x prefix for hexadecimal type of x or X When used with the o, x, or X format, the number sign (#) flag prefaces any nonzero value with 0, 0x, or 0X, respectively. When d, i, or u are prefaced by the number sign (#) flag, the flag is ignored.
‘ ‘ (blank) Space padding Preface the output value with blank spaces if the value is signed and positive. This is ignored when included with the plus sign (+) flag.

width

Is an integer that defines the minimum width for the field into which the argument value is placed. If the length of the argument value is equal to or longer than width, the value is printed with no padding. If the value is shorter than width, the value is padded to the length specified in width.

An asterisk (*) means that the width is specified by the associated argument in the argument list, which must be an integer value.

precision
Is the maximum number of characters taken from the argument value for string values. For example, if a string has five characters and precision is 3, only the first three characters of the string value are used.

For integer values, precision is the minimum number of digits printed.

An asterisk (*) means that the precision is specified by the associated argument in the argument list, which must be an integer value.

{h | l} type

Is used with character types d, i, o, s, x, X, or u, and creates shortint (h) or longint (l) values.

Type specification Represents
d or i Signed integer
o Unsigned octal
s String
u Unsigned integer
x or X Unsigned hexadecimal

 

option
Is a custom option for the error and can be one of the values in the following table.

Value Description
LOG Logs the error in the error log and the application log for the instance of the Microsoft SQL Server Database Engine. Errors logged in the error log are currently limited to a maximum of 440 bytes. Only a member of the sysadmin fixed server role or a user with ALTER TRACE permissions can specify WITH LOG.
NOWAIT Sends messages immediately to the client.
SETERROR Sets the @@ERROR and ERROR_NUMBER values to msg_id or 50000, regardless of the severity level.

 

 

Note:

Raiserror will not work if it is used

  • Outside the scope of any TRY block.
  • With a severity of 10 or lower in a TRY block.
  • With a severity of 20 or higher that terminates the database connection.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Code:

DECLARE @LOGINNAME VARCHAR(50) = ‘testuser’
DECLARE @SERVER VARCHAR(50) = ‘testserver’
–SELECT @LOGINNAME = ORIGINAL_LOGIN()
–A custom error message using arguments
RAISERROR (‘This is a custom error message.
Login: %s,
Language: %s,
SPID: %u,
Server Name: %s’, 5,1,
@LOGINNAME, @@LANGUAGE, @@SPID, @SERVER) WITH LOG

Managing User-Defined Errors – Part 2

in the earlier post we learned about error messages in SQL Server

http://sqljunkieshare.com/2012/03/22/implimenting-error-handling-in-sql-server-day-1/

In this post we will look at how to Implement and Manage User Defined errors

as we learned in the first post that it is very important for any organization to Implement User defined error messages in any business logic  or for any front end application as some times user may not understand the cryptic system error messages

We can solve this problem by adding and managing our own error messages to the SYS.MESSAGES Catalog using a

system stored procedures  SP_ADDMESSAGE, SP_ALTERMESSAGE, SP_DROPMESSAGE

To add User defined error message to SYS.MESSAGES , we use the SP_ADDMESSAGE

sp_addmessage [ @msgnum = ] msg_id

,[ @severity = ] severity

,[ @msgtext = ] ‘msg’
[, [ @lang = ] ‘language’ ]
[, [ @with_log = ] { ‘TRUE’ | ‘FALSE’ } ]
[, [ @replace = ] ‘replace’ ]

SP_ADDMESSAGE has six input parameters

@msgnum is the Message_id (int) column in SYS.MESSAGES , @msgnum can be any number between 50,001 to 2,147,483,647

1 to 50,000 are reserved for system error messages

@severity is the level of severity scope of the error message , 1 to 25

@msgtxt is a NVARCHAR(255)  text column and it supports arguments , we will talk about arguments later in this post

the above three parameters are mandatory to add a message

Message_id and langid in the sys.messages form a composite key , so there can be entries in SYS.MESSAFES

with same messageid’s in different langauges

@lang is the type of language the message will be in , sys.syslanguages has the list of languages the sql server supports , SQL Server supports 32

different languages.

@with_log this parameter accepts either ‘TRUE’ or ‘FALSE’

For a user to specify this parameter user should be a member of SYSADMIN role

There two logs that we will deal with is one windows application log and the second one is database engine log

When this parameter is set to true the error is written to windows application log and database engine log

When this parameter is set to false the error is written to only database engine log

@replace  is used to alter the existing messages

we can Severity , Message text and With log option of error messages

SP_ALTERMESSAGE

CODE:

– To add a UDF error message to sys.messages
–@Msgnum : range 50,001 to 2,147,483,647
–@Severity: 1 to 25
–@Msgtext : NVARCHAR(255) , accpets arguments
–[optional]
–@with_log : TRUE/FALSE
–@lang     : by default it’s the sessions default lanuguage or language in sys.syslangusges
–@replace  : is used to alter the messages, we can alter Severity,Message text,With_log
–columns

sp_addmessage @msgnum =70000, @severity = 24, @msgtext = ‘test error log’

– Using @replace option we can update Severity
– ,with_log and message text

sp_addmessage @msgnum = 70000, @severity = 20
, @msgtext = ‘updated error log’,@with_log =’true’, @replace =’replace’

Select * from sys.messages where message_id = 70000
– SP_ALTERMESSAGE is used only to update the WITH_lOG option
– Access to only members of SYSADMIN roles

sp_altermessage 70000,’WITH_LOG’,'FALSE’
Select * from sys.messages where message_id = 70000

sp_dropmessage @msgnum = 70000 , @lang = ‘us_english’

–@lang is optional , if a message with different languages should be
– droped then ALL is specified in the lang parameter

Implimenting Error Handling in SQL Server – Part 1

There are two types of Messages SQL Server returns

  1. Information Message
  2. Error Message

An informational Message as the name suggests SQL server sends the messages to the End user or client,

this can be anything including the output of DBCC commands or status of the server or it can also be a

output from the print command An Error message can be a warning message or error message from

SYS.MESSAGES catalog or from a RAISEEROR statement

As we go ahead we will be concentrating mostly on Error Messages in SQL server

There two types of Error Messages in SQL server

  1. System
  2. User Defined

if there is any exception during the execution,

The database engine picks up the information in the SYS.MESSAGES and sends it to the end user.

As there is need to customize the error messages so that errors are better understood at business

level these are called user defined error.

All the  System error messages and User defined error messages are stored in SYS.MESSAGES

SQL Server stores system and user-defined errors inside a catalog view called sys.messages.

This catalog contains all system error messages, which are available in many languages.

 

Each message has five properties arranged in columns:

■■ message_id:

This column stores the ID of a message and its value.  Together with language_id, this is unique across the instance. Messages with
IDs less than 50000 are system error messages.
■■ language_id:

This column indicates the language used in the message text, for example English or Spanish, as defined in the name column of the
sys.syslanguages system table. This is unique for a specified message_id.
■■ severity:

This column contains the severity level of the message. Whe you are creating errors, keep in mind that the severity must be the same for
all message languages within the same message_id.
■■ is_event_logged:

When you want to log the event when an error israised, set the value of this column equal to1. Like severity, this column must
be the same for all message languages within the same message_id.

■■ text: 

This column stores the text of the message. This text is written in the language indicated by the language_id property.

Severity Levels :  This indicates the severity of the error and this will help us the scope of the problem and how quick you need to respond

these levels are categorized into three main levels

  1.   Information Messages:  Level (0 to 10 )

Errors in this group are purely informational and are not severe. Status information or error reports are returned with a specific error code. SQL Server doesn’t use levels 1–9 these levels are available for user-defined errors.

2.     User Errors Level (11- 16)

These do not effect service or System or Server, most of these are corrected by the user and these are errors are raised when database Engine parses the query or Tsql statement for execution

3.  Software Errors (Levels 17–19):

Errors in this group are severe and require system administrator attention as well as yours. They are related to
problems in the Database Engine service and can’t be solved by users.

4.System Errors (Levels 20–25):

Errors in this group are critical since they indicate system problems. They are fatal errors that force the end of
the statement or batch in the Database Engine. Errors in this group record information about what occurred and then terminates. You must be aware that these errors can close the application connection to the instance of SQL Server. Error messages in this severity level group are written to the error log.

Mr  Pinal Dave had explained it in detail take a look at this post

http://blog.sqlauthority.com/2007/04/25/sql-server-error-messages-sysmessages-error-severity-level/

More detailed explanation from BOL

Severity Levels 0 through 19

Error messages with a severity level of 10 are informational. Error messages with severity levels from 11 through 16 are generated by user and can be corrected by the user. Severity levels from 17 and 18 are generated by resource or system errors; the user’s session is not interrupted.

Using sp_addmessage, user-defined messages with severities from 1 through 25 can be added to sysmessages. Only the system administrator can add messages with severities from 19 through 25.

Error messages with severity levels 17 and higher should be reported to the system administrator.

Severity Level 10: Status Information

Severity level 10 is an informational message and indicates a problem caused by mistakes in the information you have entered. Severity level 10 is not visible in SQL Server 7.0.

Severity Levels 11 through 16

These messages indicate errors that can be corrected by the user.

Severity Level 17: Insufficient Resources

These messages indicate that the statement caused SQL Server to run out of resources (such as locks or disk space for the database) or to exceed some limit set by the system administrator.

Severity Level 18: Nonfatal Internal Error Detected

These messages indicate that there is some type of internal software problem, but the statement finishes, and the connection to SQL Server is maintained. For example, a level 18 message occurs when the SQL Server query processor detects an internal error during query optimization.The system administrator should be informed every time a severity level 18 message occurs.

Severity Level 19: SQL Server Error in Resource

These messages indicate that some nonconfigurable internal limit has been exceeded and the current batch process is terminated. Level 19 errors occur rarely; however, they must be corrected by the system administrator or your primary support provider.The administrator should be informed every time a severity level 19 message occurs.

Severity Levels 20 through 25

Severity levels from 20 through 25 indicate system problems. These are fatal errors, which means that the process (the program code that accomplishes the task specified in your statement) is no longer running. The process freezes before it stops, records information about what occurred, and then terminates. The client connection to SQL Server closes, and depending on the problem, the client might not be able to reconnect.

Error messages with a severity level of 19 or higher stop the current batch. Errors messages with a severity level of 20 or higher are considered fatal errors and terminate the client connection. Errors messages in this range may affect all of the processes in the database, and may indicate that a database or object is damaged. Error messages with a severity level from 19 through 25 are written to the error log.

Severity Level 20: SQL Server Fatal Error in Current Process

These messages indicate that a statement has encountered a problem. Because the problem has affected only the current process, it is unlikely that the database itself has been damaged.

Severity Level 21: SQL Server Fatal Error in Database (dbid) Processes

These messages indicate that you have encountered a problem that affects all processes in the current database. However, it is unlikely that the database itself has been damaged.

Severity Level 22: SQL Server Fatal Error Table Integrity Suspect

These messages indicate that the table or index specified in the message has been damaged by a software or hardware problem.Level 22 errors occur rarely; however, if you should encounter one, run DBCC CHECKDB to determine if other objects in the database are also damaged. It is possible that the problem is in the cache only and not on the disk itself. If so, restarting SQL Server corrects the problem. To continue working, you must reconnect to SQL Server. Otherwise, use DBCC to repair the problem. In some cases, it may be necessary to restore the database.If restarting does not help, the problem is on the disk. Sometimes it can be solved by destroying the object specified in the error message. For example, if the message tells you that SQL Server has found a row with a length of 0 in a nonclustered index, delete the index and rebuild it.

Severity Level 23: SQL Server Fatal Error: Database Integrity Suspect

These messages indicate that the integrity of the entire database is suspect due to damage caused by a hardware or software problem.Level 23 errors occur rarely; however, if you should encounter one, run DBCC CHECKDB to determine the extent of the damage. It is possible that the problem is in the cache only and not on the disk itself. If so, restarting SQL Server corrects the problem. To continue working, you must reconnect to SQL Server. Otherwise, use DBCC to repair the problem. In some cases, it may be necessary to restore the database.

Severity Level 24: Hardware Error

These messages indicate some type of media failure. The system administrator might have to reload the database. It might also be necessary to call your hardware vendor.

Implementing CLR User defined Aggregates in SQL Server

In this post we will learn how to implement UDF CLR Aggregates ,

In order to implement CLR UDF aggegate assembly we need to follow set of rules  or requirements in technical terms, if any of rules are broken then it is not a valid

CLR UDF

  1. Like any other CLR assembles in SQL Server, a CLR UDF  Aggregate should have a attribute called SqlUserDefinedAggregat

  2.    In order to save the Intermediate state of Aggregation we use Four methods in the Aggregation

    1. Init
    2. Accumulate
    3. Merge
    4. Terminate

Init : 

Syntax :  public void Init();

Once this method is invoked the Query processor Initializes the aggregation , for every aggregation we need to invoke this method.

Accumulate:

Syntax: public void Accumulate ( input-type value[, input-type value, ...]);

Input type is any managed sql server data types

This where actual data aggregation occurs , it may be summing or multiplying , counting.

Merge:

This method can be used to merge another instance of this aggregate class with the current instance. The query processor uses this method to merge multiple partial computations of an aggregation.

Terminate:

public return_type Terminate();

This method completes the aggregate computation and returns the result of the aggregation. The return_type should be a managed SQL Server data type.

The basic aggregate also has four attributes which are described in Andy Novak’s article.

Attribute Comment
Format This is a compulsory enum with options Native, Unknown and UserDefined.
IsInvariantToNulls Setting this to true tells the aggregate that it doesn’t care whether or not values are null or not, it will always produce the same result.
An example of an aggregate where the value for this is false is COUNT(field) which only counts instances that are NOT NULL
IsInvariantToDuplicates Setting this to true tells the aggregate that it will produce the same results even if there are duplicate entries.
MAX() is a good example of this as no matter how many duplicates there are the MAX value will still be the MAX value.
IsInvariantToOrder Setting this to true affects how SQL Server calls the Merge method of the aggregate.
If you had a string concatenation aggregate that was parallelised then it is vitally important that the streams merge back together in the correct order.

 

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.Native)]
public struct CountVowels
{
    // count only the vowels in the passed-in strings
    private SqlInt32 countOfVowels;

    public void Init()
    {
        countOfVowels = 0;
    }

    public void Accumulate(SqlString value)
    {
        // list of vowels to look for
        string vowels = "aeiou";

        // for each character in the given parameter
        for (int i=0; i < value.ToString().Length; i++)
        {
            // for each character in the vowels string
            for (int j=0; j < vowels.Length; j++)
            {
                // convert parameter character to lowercase and compare to vowel
                if (value.Value.Substring(i,1).ToLower() == vowels.Substring(j,1))
                {
                    // it is a vowel, increment the count
                    countOfVowels+=1;
                }
            }
        }
    }

    public void Merge(CountVowels value)
    {
        Accumulate(value.Terminate());
    }

    public SqlString Terminate()
    {
        return countOfVowels.ToString();
    }
}

After deploying the aggregate, test it by executing it on the SQL Server and verifying the correct data is returned.

This query returns a result set of the vowel count for all the values in the LastNames column in the Contact table.

SELECT LastName, COUNT(LastName) AS CountOfLastName, dbo.CountVowels(LastName) AS CountOfVowels
FROM Person.Contact
GROUP BY LastName
ORDER BY LastName