Data Cube: A Relational Aggregation Operator : Microsoft research paper

A very nice article on the GROUP BY clause: how it works, what its limitations are, and how the CUBE operator overcomes them.

Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or 1-dimensional answers. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube, or simply cube. The cube operator generalizes the histogram, cross-tabulation, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an “infinite value”, ALL. For example, the point (ALL, ALL, ..., ALL, sum(*)) would represent the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.

Source :


Download the PDF by clicking the link below

Microsoft cube research


How to summarize your data using ROLLUP and CUBE in TSQL

We all know that when reporting financial or sales information, adding subtotals and grand totals makes a report more effective. Most often, though, developers don't realize that T-SQL provides an effective way of doing this: the ROLLUP and CUBE operators in the GROUP BY clause.

For this demo I am using the AdventureWorks 2008 OLTP sample database.

Three tables

  1. Production.Product
  2. Production.ProductCategory
  3. Production.ProductSubcategory

Here is a simple query joining the three tables and grouping by product category and subcategory:




select pc.ProductCategoryID,
       s.ProductSubcategoryID,
       AVG(ListPrice) as Avglistprice,
       MAX(ListPrice) as Maxlistprice,
       MAX(StandardCost) as Maxstandardcost
from Production.Product p
inner join Production.ProductSubcategory s
    on p.ProductSubcategoryID = s.ProductSubcategoryID
inner join Production.ProductCategory pc
    on pc.ProductCategoryID = s.ProductCategoryID
group by pc.ProductCategoryID, s.ProductSubcategoryID


You can see that the result above is missing two things: subtotals and grand totals.

Change the GROUP BY clause to include the ROLLUP operator like this:

group by ROLLUP(pc.ProductCategoryID,s.ProductSubcategoryID)

Try running it; it should run without errors.

The result looks like this

CUBE is used the same way but gives a slightly different result. CUBE summarizes every combination of the columns in the GROUP BY clause, and it also gives us the grand total. The extra subcategory-only totals do not make much sense for the example above, because each subcategory belongs to exactly one category.
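To see the difference between ROLLUP and CUBE without the AdventureWorks database, here is a minimal sketch. The @Sales table variable and its values are made up for illustration; they are not from the demo above. It assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data (not from AdventureWorks).
DECLARE @Sales TABLE (Category varchar(20), Subcategory varchar(20), ListPrice money);

INSERT INTO @Sales (Category, Subcategory, ListPrice) VALUES
    ('Bikes',    'Mountain', 100),
    ('Bikes',    'Mountain', 200),
    ('Bikes',    'Road',     300),
    ('Clothing', 'Gloves',    20);

-- ROLLUP: 3 detail groups + 2 category subtotals + 1 grand total = 6 rows.
SELECT Category, Subcategory,
       AVG(ListPrice) AS AvgListPrice,
       MAX(ListPrice) AS MaxListPrice
FROM @Sales
GROUP BY ROLLUP (Category, Subcategory);

-- CUBE: the 6 ROLLUP rows + 3 subcategory-only totals = 9 rows.
-- GROUPING() returns 1 when the column is aggregated away, which
-- tells a subtotal NULL apart from a real NULL in the data.
SELECT Category, Subcategory,
       AVG(ListPrice)        AS AvgListPrice,
       GROUPING(Category)    AS CategoryIsTotal,
       GROUPING(Subcategory) AS SubcategoryIsTotal
FROM @Sales
GROUP BY CUBE (Category, Subcategory);
```

In the subtotal and grand total rows the rolled-up columns come back as NULL, so GROUPING() is a handy way to label those rows in a report.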

you will ‘%_[^LIKE]%’ like

The LIKE clause lets you match a character string in a column against a specified pattern in the WHERE clause.

Let's take an example to better understand its wildcards.

How to use LIKE in the WHERE clause

DECLARE @T TABLE (charc varchar(20))










Percent % :

Underscore _ :

Square Brackets [] :
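The original screenshots for these wildcards are not available here, so below is a small sketch of how each one behaves. The sample values in @T are my own and chosen only to make the matches easy to check.

```sql
DECLARE @T TABLE (charc varchar(20));
INSERT INTO @T (charc) VALUES ('cat'), ('cut'), ('cart'), ('bat'), ('c%t');

-- Percent %: any string of zero or more characters.
SELECT charc FROM @T WHERE charc LIKE 'c%';      -- cat, cut, cart, c%t

-- Underscore _: exactly one character.
SELECT charc FROM @T WHERE charc LIKE 'c_t';     -- cat, cut, c%t

-- Square brackets []: one character from the given set or range.
SELECT charc FROM @T WHERE charc LIKE '[bc]at';  -- cat, bat

-- [^]: one character NOT in the given set or range.
SELECT charc FROM @T WHERE charc LIKE '[^b]at';  -- cat

-- Brackets also escape a wildcard so it is matched literally.
SELECT charc FROM @T WHERE charc LIKE 'c[%]t';   -- c%t
```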









Watch out for NULLs: know your environment

ANSI_NULLS is a user option in SQL Server. It works like an environment variable that is set when a user establishes a connection to SQL Server.

ANSI_NULLS specifies how equality (=) and inequality (<>) comparisons against NULL behave.



























Because ANSI_NULLS is turned on, any comparison against NULL evaluates to UNKNOWN, so the query returns no rows.

Once we turn it off, the same comparison matches the NULL rows and the query returns them.
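The original screenshots are missing here, so this is a minimal sketch of the behavior described above. The #Demo temp table is hypothetical. GO is the batch separator used by SSMS and sqlcmd; I use it so that each SET takes effect before the queries that follow. Keep in mind that Microsoft documents ANSI_NULLS OFF as deprecated, so IS NULL and IS NOT NULL are the safer habit.

```sql
-- Hypothetical demo table: one row with a value, one with NULL.
CREATE TABLE #Demo (id int NOT NULL, val varchar(10) NULL);
INSERT INTO #Demo (id, val) VALUES (1, 'a'), (2, NULL);
GO

SET ANSI_NULLS ON;
GO
SELECT id FROM #Demo WHERE val = NULL;    -- no rows: the comparison is UNKNOWN
SELECT id FROM #Demo WHERE val <> NULL;   -- no rows
SELECT id FROM #Demo WHERE val IS NULL;   -- id 2: works under either setting
GO

SET ANSI_NULLS OFF;
GO
SELECT id FROM #Demo WHERE val = NULL;    -- id 2
SELECT id FROM #Demo WHERE val <> NULL;   -- id 1
GO

-- Restore the default and clean up.
SET ANSI_NULLS ON;
GO
DROP TABLE #Demo;
```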




Worst enemy of a developer (NOLOCK): why you shouldn't use it

As I said before, this is the same session I attended from Randy Knight. After seeing his presentation I decided not to use NOLOCK anywhere unless required, especially in financial and critical decision-making reports. Here we will not go into the details of the different isolation levels. We will focus specifically on the READ UNCOMMITTED isolation level (or NOLOCK as a table hint), dirty reads and page splits.
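A rough sketch of a dirty read, assuming a made-up dbo.Account table (it is not from Randy Knight's session). It needs two separate connections, so run each part in its own query window.

```sql
-- Setup (run once). dbo.Account is a hypothetical table for this sketch.
CREATE TABLE dbo.Account (AccountID int PRIMARY KEY, Balance money NOT NULL);
INSERT INTO dbo.Account (AccountID, Balance) VALUES (1, 500);

-- Session 1: change the row but leave the transaction open.
BEGIN TRANSACTION;
UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1;
-- (do not commit yet)

-- Session 2: both forms below read the uncommitted 400 (a dirty read).
SELECT Balance FROM dbo.Account WITH (NOLOCK) WHERE AccountID = 1;

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Account WHERE AccountID = 1;

-- Session 1: undo the change. Session 2 has already reported a balance
-- that, as far as the database is concerned, never existed.
ROLLBACK TRANSACTION;
```

That is why a report built on NOLOCK can show numbers that were never committed.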

My first time at SQL Saturday (#129 Rochester)


Unfortunately I didn't have enough time to write up my experience at SQL Saturday #129 Rochester earlier, but here it is.

I live in Pennsylvania, so it's a 3-hour journey north. I packed my bags and reserved a hotel near the Rochester Institute of Technology, where SQL Saturday was held.

I was very excited, since I normally like travelling. I started out on Friday evening, got there around 11 o'clock and had a good sleep. I woke up around 7 o'clock, got my things together, took a snapshot of the map and was at RIT by 8:30. Registration was supposed to close around 8:45. As soon as I got out of the parking lot I met a guy who said he was also heading to SQL Saturday and that this was his first time too. We went inside and picked up our badges and schedules. I had already built my schedule using the schedule manager on the SQL Saturday site, so I pretty much knew what I was going to listen to that day. The atmosphere was good, and there were quite a few vendors at the event, like Redgate and Confio, marketing their DBMS tools and other products, with a chance to get some freebies like eBooks and books.


Here is my schedule for that day

What to Look for in Execution Plans

Grant Fritchey

He is also the author of the book SQL Server Execution Plans, available as a free eBook download from Redgate.

This was my first session. I was very excited because I had been reading his book for a while (first chapter) lol. Anyway, his presentation was pretty bold, concentrating on the first operator (SELECT, INSERT, UPDATE, DELETE) and its important properties:

Compile time

Compile CPU

Cache plan size

Estimated number of rows

Estimated Subtree cost

Reason for early termination

Optimization Level


We also looked at why the optimizer chooses a seek or a scan, and how to suppress key lookups. On the whole it was pretty good and I learned some new things.

Next session

Top Tips for Writing Better TSQL Queries (beginner level)

How to format tsql for better readability

Naming standards (procedures(get update delete), naming variables)

Error handling

Unnecessary Explicit data conversion

Improper use of functions, query hints

On the whole it was a pretty good topic to discuss. I've added a link to download his presentation.

Top Tips for Better TSQL

Getting started with MDX

William E. Pearson III

Believe me, the toughest thing to understand in the SQL BI stack is MDX (Multidimensional Expressions). This was the best presentation I've seen on MDX and it was easy to understand for beginners. We went through some beginner material such as dimensions, facts, members, sets and tuples. I haven't had a chance to email him to ask for his presentation; once I get it, I will upload it here, so hold on. The other two sessions on my schedule were from Randy Knight, a well-known SQL guru. I will post his session information in another post because those are worth spending more time on.

So that's all from the sessions. At the end of the day they did a raffle drawing, and I did not win anything. We finished our SQL Saturday there, I made some new friends and we went to a movie. It was fun, I learned a lot, and I am looking forward to attending the next SQL Saturday in Philadelphia soon.

Thank you

Implementing error handling: raising user-defined errors (part 3):

In the last post we saw how to manage (add, update, delete) user-defined error messages in the sys.messages catalog view. In this post we will look in detail at using the RAISERROR statement in SQL Server to raise user-defined errors. When I decided to write this, I remembered that I had already written a post on the SQL Server 2012 THROW statement. It explains the differences between RAISERROR and THROW, the disadvantages of RAISERROR, and how to use both to raise user-defined errors, so have a look at this post


Here I will go through some aspects of RAISERROR that I didn't cover in the above post.

Let's start with the syntax of RAISERROR:

RAISERROR ( { msg_id | msg_str | @local_variable }
{,severity,state }
[,argument [,…n ] ] )
[ WITH option [,…n ] ]


We talked about msg_id in the earlier post, so here we will look at msg_str.

msg_str

Is a user-defined message with formatting similar to the printf function in the C standard library. The error message can have a maximum of 2,047 characters. If the message contains 2,048 or more characters, only the first 2,044 are displayed and an ellipsis is added to indicate that the message has been truncated.


msg_str is a string of characters with optional embedded conversion specifications. Each conversion specification defines how a value in the argument list is formatted and placed into a field at the location of the conversion specification in msg_str. Conversion specifications have this format:

% [[flag] [width] [. precision] [{h | l}]] type


flag

Is a code that determines the spacing and justification of the substituted value.

Code | Prefix or justification | Description
- (minus) | Left-justified | Left-justify the argument value within the given field width.
+ (plus) | Sign prefix | Preface the argument value with a plus (+) or minus (-) if the value is of a signed type.
0 (zero) | Zero padding | Preface the output with zeros until the minimum width is reached. When 0 and the minus sign (-) appear, 0 is ignored.
# (number) | 0x prefix for hexadecimal type of x or X | When used with the o, x, or X format, the number sign (#) flag prefaces any nonzero value with 0, 0x, or 0X, respectively. When d, i, or u are prefaced by the number sign (#) flag, the flag is ignored.
' ' (blank) | Space padding | Preface the output value with blank spaces if the value is signed and positive. This is ignored when included with the plus sign (+) flag.


width

Is an integer that defines the minimum width for the field into which the argument value is placed. If the length of the argument value is equal to or longer than width, the value is printed with no padding. If the value is shorter than width, the value is padded to the length specified in width.

An asterisk (*) means that the width is specified by the associated argument in the argument list, which must be an integer value.

precision

Is the maximum number of characters taken from the argument value for string values. For example, if a string has five characters and precision is 3, only the first three characters of the string value are used.

For integer values, precision is the minimum number of digits printed.

An asterisk (*) means that the precision is specified by the associated argument in the argument list, which must be an integer value.

{h | l} type

Is used with character types d, i, o, s, x, X, or u, and creates shortint (h) or longint (l) values.

Type specification | Represents
d or i | Signed integer
o | Unsigned octal
s | String
u | Unsigned integer
x or X | Unsigned hexadecimal

option

Is a custom option for the error and can be one of the values in the following table.

Value | Description
LOG | Logs the error in the error log and the application log for the instance of the Microsoft SQL Server Database Engine. Errors logged in the error log are currently limited to a maximum of 440 bytes. Only a member of the sysadmin fixed server role or a user with ALTER TRACE permissions can specify WITH LOG.
NOWAIT | Sends messages immediately to the client.
SETERROR | Sets the @@ERROR and ERROR_NUMBER values to msg_id or 50000, regardless of the severity level.
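As a small illustration of the NOWAIT option, here is a sketch of the common progress-message pattern. The message text and the delay are made up. Without NOWAIT, low-severity messages may be buffered and show up together at the end of the batch.

```sql
-- Severity 0 makes this an informational message, not an error.
RAISERROR('Step 1 finished', 0, 1) WITH NOWAIT;

-- Stand-in for some long-running work.
WAITFOR DELAY '00:00:02';

RAISERROR('Step 2 finished', 0, 1) WITH NOWAIT;
```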




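RAISERROR with a severity of 11 or higher inside a TRY block transfers control to the associated CATCH block. The error is returned to the caller instead if RAISERROR is run:

  • Outside the scope of any TRY block.
  • With a severity of 10 or lower in a TRY block.
  • With a severity of 20 or higher that terminates the database connection.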
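The severity rules above can be sketched like this. The message texts are made up for illustration.

```sql
BEGIN TRY
    -- Severity 10 or lower: informational only, execution continues in TRY.
    RAISERROR('Severity 10: just a message.', 10, 1);
    PRINT 'Still inside the TRY block.';

    -- Severity 11 to 19: control transfers to the CATCH block.
    RAISERROR('Severity 16: jumping to CATCH.', 16, 1);
    PRINT 'This line is never reached.';
END TRY
BEGIN CATCH
    PRINT 'Caught: ' + ERROR_MESSAGE();
    PRINT 'Severity: ' + CAST(ERROR_SEVERITY() AS varchar(10));
END CATCH;
```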

















DECLARE @SERVER VARCHAR(50) = 'testserver'
-- A custom error message using arguments
RAISERROR ('This is a custom error message.
Login: %s,
Language: %s,
SPID: %u,
Server Name: %s', 5,1,