Nathan Griffiths

Posted on Sep 15, 2019 • Edited on Sep 16, 2019

SQL: ROLLUP Like A Boss

#sql #database #cube #dataanalysis

So, after reading Helen's great post here:

Article No Longer Available

I decided it was high time I should contribute a post or two to DEV, so here goes with my first effort - a look at an underused feature of SQL, the ability to add totals and subtotals to results using the ROLLUP and CUBE operators.

First, a disclaimer. The SQL language is defined by an ISO standard (9075), but as with many "standards" few implementations adhere completely to the standard. Some database systems omit some features, others add their own extensions. That said, what follows is applicable to many of the common database platforms you might use, such as Microsoft SQL Server, Oracle, PostgreSQL and Presto/AWS Athena.

SQL GROUP BY 101

A simple SELECT * FROM <table>; SQL statement will return rows of data from a table in a form that conceptually matches how those rows and columns of data are stored in the table. When designing an application that works with a database, you'll typically want to work with the data like this, querying rows for display or adding or updating rows with new information.

However, when using SQL for analytics we often don't want the information in every single row of a table. We want to aggregate it, for example retrieving the SUM or AVERAGE of the values in one of the columns:



SELECT SUM(UnitsSold) AS TotalUnitsSold
FROM Orders;

TotalUnitsSold
550

Great! So now we have an aggregate value for the whole table. But more realistically we don't want just the total for the table, we want the total broken down for each value in one of the other columns, such as Supplier. So we do something like this:



SELECT Supplier, SUM(UnitsSold)  AS TotalUnitsSold
FROM Orders
GROUP BY Supplier;

Supplier	TotalUnitsSold
Wonka's Chocolate Factory	100
Trotters Independent Trading	200
Kwik-E-Mart	250

We've used the GROUP BY clause of the SQL statement to tell the database to SUM the values of the UnitsSold column for each unique value of Supplier.

We can extend this to group by two or more columns, so we get a total for each unique combination of values in the grouping columns:



SELECT Supplier, City, SUM(UnitsSold)  AS TotalUnitsSold
FROM Orders
GROUP BY Supplier, City;

Supplier	City	TotalUnitsSold
Wonka's Chocolate Factory	Munich	100
Trotters Independent Trading	New York	10
Trotters Independent Trading	Paris	20
Trotters Independent Trading	Peckham	170
Kwik-E-Mart	Springfield	150
Kwik-E-Mart	Shelbyville	100

Nice. Now we've got the basis of a quick report or data extract.

Hold on though, how many units did "Trotters Independent Trading" sell? Now we've added multiple levels of grouping (Supplier -> City) we have to add those up again to get totals for the higher levels of grouping (e.g. Supplier).

If you're going to put this query into a reporting or visualisation tool such as Tableau, Power BI or MicroStrategy then your work here is probably done, as those tools can be configured to calculate totals for you when the data gets displayed.

However, if you're just trying to get some answers quickly out of a database (the boss wants those monthly sales figures stat!) then you're either going to export the data to another tool like Excel to work that out, or, horror of horrors, manually calculate the totals yourself to append to the results (I've seen it done). If you're a bit more confident you might try creating multiple SQL statements with different levels of grouping an appending them together using UNION ALL.

ROLLUP, ROLLUP! The GROUP BY circus is in town

Don't panic, SQL is here to save your sanity with the often overlooked ROLLUP and CUBE clauses. Essentially what these SQL statement clauses allow you to do is get the database to do the heavy lifting for you and calculate totals and subtotals in the same set of results.

Let's go back to our example and say we want to get subtotals for each Supplier and also an overall total for all orders, so we add ROLLUP to our GROUP BY clause:



SELECT Supplier, City, SUM(UnitsSold)  AS TotalUnitsSold
FROM Orders
GROUP BY ROLLUP(Supplier, City);

Supplier	City	TotalUnitsSold
Wonka's Chocolate Factory	Munich	100
Wonka's Chocolate Factory	NULL	100
Trotters Independent Trading	New York	10
Trotters Independent Trading	Paris	20
Trotters Independent Trading	Peckham	170
Trotters Independent Trading	NULL	200
Kwik-E-Mart	Springfield	150
Kwik-E-Mart	Shelbyville	100
Kwik-E-Mart	NULL	250
NULL	NULL	550

Notice we now have an extra row for each Supplier, with a NULL value in the City column. This represents the sum of all UnitsSold for each Supplier. There's also a row at the bottom that has a NULL value for both Supplier and City, this is the total UnitsSold for all Suppliers and all Cities (i.e. the total for the whole table) - notice this matches our total from the first simple SUM query we wrote.

So that's it?

Well that's great in that we now have our subtotals, but all those NULL values make this a bit hard to read (and confusing to any non-SQL people you show this to). Luckily most database platforms also implement a function called GROUPING (sorry Presto/Athena users, you're in the unlucky camp on this one). This function accepts a column name and simply returns 1 (true) or 0 (false) to indicate if the current row contains a subtotal for that column e.g. GROUPING(Supplier) will return 1 if the row contains a subtotal for Supplier or 0 if it's a regular row.

One of the things we can do with this GROUPING function is to make our results look at bit clearer by replacing those NULL values with something a bit more meaningful:



SELECT 
   CASE 
      WHEN GROUPING(Supplier) = 1 
         THEN '-All Suppliers-' 
      ELSE Supplier 
   END AS Supplier
   , CASE 
      WHEN GROUPING(City) = 1
         THEN '-All Cities-'
      ELSE City
    END AS City
   , SUM(UnitsSold)  AS TotalUnitsSold
FROM Orders
GROUP BY ROLLUP(Supplier, City);

Supplier	City	TotalUnitsSold
Wonka's Chocolate Factory	Munich	100
Wonka's Chocolate Factory	-All Cities-	100
Trotters Independent Trading	New York	10
Trotters Independent Trading	Paris	20
Trotters Independent Trading	Peckham	170
Trotters Independent Trading	-All Cities-	200
Kwik-E-Mart	Springfield	150
Kwik-E-Mart	Shelbyville	100
Kwik-E-Mart	-All Cities-	250
-All Suppliers-	-All Cities-	550

Now that is much easier to read! Time to email the numbers to the boss and nip down to the pub for a celebratory pint.

Wait, didn't you mention something about CUBE?

Ah yes, CUBE. To be honest, the use cases for CUBE are rarer than for ROLLUP but it's good to know about it just in case.

CUBE is basically ROLLUP on steroids - ROLLUP creates subtotals for each sublevel of grouping based on the order of the grouping columns (i.e. in our example there are subtotals for Supplier and then "all rows", but not City because that's the "bottom" level of the grouping).

CUBE on other hand creates subtotals for every combination of values in the grouping columns.

How is this useful? Let's look at our example again. Up to now these examples have been assuming that there's a natural hierarchy between Supplier and City i.e. one supplier to many cities (a one-to-many or 1:N relationship), but let's say that actually these are independent attributes of an order so that one Supplier can have many Cities and one City can be related to many Suppliers (a many-to-many or M:N relationship). In that case we don't just want subtotals for Supplier, we want subtotals for each City as well.

Simply replace ROLLUP with CUBE in our SQL statement and we now get this result:



SELECT 
   CASE 
      WHEN GROUPING(Supplier) = 1 
         THEN '-All Suppliers-' 
      ELSE Supplier 
   END AS Supplier
   , CASE 
      WHEN GROUPING(City) = 1
         THEN '-All Cities-'
      ELSE City
    END AS City
   , SUM(UnitsSold)  AS TotalUnitsSold
FROM Orders
GROUP BY CUBE(Supplier, City);

Supplier	City	TotalUnitsSold
Wonka's Chocolate Factory	Munich	100
Wonka's Chocolate Factory	-All Cities-	100
Trotters Independent Trading	New York	10
Trotters Independent Trading	Paris	20
Trotters Independent Trading	Peckham	170
Trotters Independent Trading	-All Cities-	200
Kwik-E-Mart	Springfield	150
Kwik-E-Mart	Shelbyville	100
Kwik-E-Mart	-All Cities-	250
-All Suppliers-	Munich	100
-All Suppliers-	New York	10
-All Suppliers-	Paris	20
-All Suppliers-	Peckham	170
-All Suppliers-	Springfield	150
-All Suppliers-	Shelbyville	100
-All Suppliers-	-All Cities-	550

Now we have subtotals for each Supplier and also separate subtotals for each City, plus the overall total!

Fantastic! Now I'm going to add ROLLUP to all my queries!

Woah there cowboy! So as I briefly mentioned, if you're writing a query that's going to be used by a reporting tool like Tableau or whatever then this technique isn't really helpful as those tools will usually do the totalling for you. In fact, this is usually an anti-pattern and not advisable because your reporting tool won't be able to differentiate the subtotal rows from the regular rows and that will make formatting your report a lot harder.

However, if you're working directly in SQL and just want to generate some data for a quick ad-hoc report or analysis then ROLLUP or CUBE can save you some time and effort in getting the results you need.

Top comments (5)

DevWorksSimone • Sep 15 '19

I love sql tut like this and @helen Anderson. I started digging into sql as I did for excel and VBA to make myself faster and help collegue who are not tech familiar with loads of data. Even if I am not going use this is the next future, is always good to see there is a vanilla way to do these things. Thank you for share!