grouping sets rollup, cube,
grouping_id(),grouping()
GROUP BY and CUBE(), ROLLUP(), GROUPING() and
GROUPING_ID() functions
Posted by decipherinfosys
on November 2, 2007
You must already be aware of the GROUP BY clause. It is used for grouping
the rows by a selected set of attributes and is typically done to get some
aggregated information for that group of columns./*********************************************************************************************************
Creation of a TEST table and population of some dummy data
**********************************************************************************************************/
CREATE TABLE TEST_TBL (COL1 INT, COL2 VARCHAR(10), COL3 INT)
GO
INSERT INTO TEST_TBL VALUES (1, ‘A’, 10)
INSERT INTO TEST_TBL VALUES (1, ‘A’, 20)
INSERT INTO TEST_TBL VALUES (1, ‘A’, 30)
INSERT INTO TEST_TBL VALUES (1, ‘B’, 90)
INSERT INTO TEST_TBL VALUES (2, ‘A’, 30)
INSERT INTO TEST_TBL VALUES (2, ‘A’, 100)
INSERT INTO TEST_TBL VALUES (3, ‘C’, 110)
INSERT INTO TEST_TBL VALUES (3, ‘C’, 120)
GO
/*********************************************************************************************************
Before starting, let’s take a look at the data that exists in the table
**********************************************************************************************************/
SELECT * FROM TEST_TBL
GO
COL1 COL2 COL3
----------- ---------- -----------
1 A 10
1 A 20
1 A 30
1 B 90
2 A 30
2 A 100
3 C 110
3 C 120/*********************************************************************************************************
Simple Example of a GROUP BY Clause. The grouping is being done over COL1 and COL2
and the aggregate functions are used to display the total, the average, the max and min values, etc.
**********************************************************************************************************/
SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL, AVG(COL3) AVG_VAL, MAX(COL3) MAX_VAL, MIN(COL3) MIN_VAL
FROM TEST_TBL
GROUP BY COL1, COL2
GO
COL1 COL2 CNT_RECORD TOTAL_VAL AVG_VAL MAX_VAL MIN_VAL
----------- ---------- ----------- ----------- ----------- ----------- -----------
1 A 3 60 20 30 10
2 A 2 130 65 100 30
1 B 1 90 90 90 90
3 C 2 230 115 120 110/*********************************************************************************************************
Now, let us take the same SQL and use the RollUP() function in addition to the GROUP BY clause:
**********************************************************************************************************/
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO
Usage of the ROLLUP() function generates the GROUP BY aggregate rows PLUS super-aggregate (cumulative) rows and a final grand total row as well. If you see below, one row with a sub-total is generated for each unique combination of values of (COL1, COL2), and (COL1). As should be clear from the preceding statement, the order of the columns in the ROLLUP() function can change the output as well as the number of rows in the final result set.
COL1 COL2 TOTAL_VAL
----------- ---------- -----------
1 A 60 ==> Same as in the representation above
1 B 90 ==> Same as in the representation above
1 NULL 150 ==> Aggregation of the records from above
2 A 130 ==> Same as in the representation above
2 NULL 130 ==> Aggregation of the record from above
3 C 230 ==> Same as in the representation above
3 NULL 230 ==> Aggregation of the record from above
NULL NULL 510 ==> The grand totalThe above SQL can also be written as:
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH ROLLUP
/*********************************************************************************************************
Now, let us take the same SQL and use the CUBE() function in addition to the GROUP BY clause:
**********************************************************************************************************/
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY CUBE (COL1, COL2)
GO
A CUBE() as the name suggests generates data for the grouping of all permutations of expressions i.e. since we have 2 columns in our example that we are using the CUBE() function on, we have 2^2 which means 4 grouping sets:
COL1, COL2
COL2, COL1
COL1
COL2
Thus, one row will be produced for each unique grouping set from above and in addition, a sub-total row is generated for each row and an aggregated grand total row is produced with NULL values in all the other columns. You can see the output below:
COL1 COL2 TOTAL_VAL
----------- ---------- -----------
1 A 60
2 A 130
NULL A 190
1 B 90
NULL B 90
3 C 230
NULL C 230
NULL NULL 510
1 NULL 150
2 NULL 130
3 NULL 230And as is obvious from the output, the order of the columns has no bearing on the output in the case of the CUBE() function.
The above SQL can also be written as:
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH CUBE
In order to distinguish the NULL values that are returned because of the usage of the ROLLUP() and/or CUBE() functions versus actual NULL values in the tables, one can make use of the GROUPING function. Example:
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL2) AS GRP_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO
COL1 COL2 TOTAL_VAL GRP_VALUE
----------- ---------- ----------- ---------
1 A 60 0
1 B 90 0
1 NULL 150 1
2 A 130 0
2 NULL 130 1
3 C 230 0
3 NULL 230 1
NULL NULL 510 1As seen from above, whereever the GRP_VALUE is marked as 1, those are the records that were generated because of the ROLLUP() function usage. Since the GROUPING() function takes in only one argument, we made usage of the outermost column in order to filter out all those records where the NULL value was being generated because of the usage of the function. In the case of the CUBE() function, you can use the GROUPING() function multiple times to filter out those records.
Another function to be aware of is the GROUPING_ID() function. This function can be used to compute the level of grouping. So, if there are two columns like we have in our example in this post, the GROUPING_ID() will be computed as:
Column(s) GROUPING_ID(COL1, COL2) = GROUPING(COL1) + GROUPING(COL2) GROUPING_ID() Output
COL1 10 2
COL2 01 1
COL1, COL2 11 3
COL2, COL1 11 3Now, let us see this with the actual SQL execution:
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL1) AS C1, GROUPING(COL2) AS C2, GROUPING_ID(COL1, COL2) AS GRP_ID_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO
COL1 COL2 TOTAL_VAL C1 C2 GRP_ID_VALUE
----------- ---------- ----------- ---- ---- ------------
1 A 60 0 0 0
1 B 90 0 0 0
1 NULL 150 0 1 1
2 A 130 0 0 0
2 NULL 130 0 1 1
3 C 230 0 0 0
3 NULL 230 0 1 1
NULL NULL 510 1 1 3In the next post, we will talk about the new feature of GROUPING SETS that has been introduced in SQL Server 2008. Using the GROUPING SETS, one can aggregate only the specified groups instead of the full set of aggregations as generated by the CUBE() or ROLLUP() functions
No comments:
Post a Comment