OLAP: CUBE, ROLLUP, AND GROUPING SETS

#database #mysql #sql #postgres

In the realm of data analysis and business intelligence, OLAP (Online Analytical Processing) stands as a cornerstone technology, the facilitating multidimensional analysis of vast datasets. OLAP enables users to gain insights from data by analyzing it across multiple dimensions, providing a comprehensive view of business metrics and performance.

So by now we are going to discuss the three key features of OLAP which are the CUBE, ROLLUP, AND GROUPING SETS operators. That will play a pivotal roles in summarizing and aggregating data in various way.

Let's first discuss the first one which is:

CUBE

The cube is a powerful tool in OLAP, it can generate all possible combinations of dimension and their aggregate. It provides a comprehensive multidimensional view of the data by computing subtotals and totals for every combination of dimensions, enabling users to explore data from different perspectives and levels of granularity.

Okay it's kind of technical on first but it will surely you will understand it when we are going to use it in query.

Let's say we want to know the count of each person who are renting in a country, and what gender they are.

SELECT country, 
       gender,
       COUNT(*)
FROM rentals
GROUP BY CUBE(country, gender) -- This one will group them by its country and gender.

So the output of that will be this

| country    | gender   | COUNT(*) |
|------------|----------|----------|
| USA        | Male     | 100      |
| USA        | Female   | 150      |
| UK         | Male     | 120      |
| UK         | Female   | 130      |
| NULL       | Male     | 220      |  <-- Subtotal for Male across all countries
| NULL       | Female   | 280      |  <-- Subtotal for Female across all countries
| USA        | NULL     | 250      |  <-- Subtotal for USA across all genders
| UK         | NULL     | 250      |  <-- Subtotal for UK across all genders
| NULL       | NULL     | 500      |  <-- Grand total

See each row represents a unique combination of "country" and "gender". That's the uses of CUBE it will generate all possible combination base on what you aggregate. we have subtotal for each category and the grandtotal.

ROLLUP

ROLLUP operator is used to generate hierarchical roll-up data. It computes subtotals for a predefined hierarchy of dimensions, typically moving from the most detailed level of granularity to the least detailed. It generates subtotals for each level of the hierarchy, producing a result set that includes subtotals for each combination of the specified columns, as well as grand totals. Its like more organize then lets make an example again.

SELECT 
    country, 
    gender,
    COUNT(*) AS rental_count
FROM 
    rentals
GROUP BY 
    ROLLUP (country, gender);

Then the output will be this

| country  | gender  | rental_count |
|----------|---------|--------------|
| NULL     | NULL    | 8            |  <-- Grand total
| NULL     | Male    | 3            |  <-- Subtotal for Male across all countries
| NULL     | Female  | 5            |  <-- Subtotal for Female across all countries
| USA      | NULL    | 3            |  <-- Subtotal for USA across all genders
| USA      | Male    | 2            |
| USA      | Female  | 1            |
| UK       | NULL    | 3            |  <-- Subtotal for UK across all genders
| UK       | Male    | 1            |
| UK       | Female  | 2            |

you see its organize after the count of each category it will print the sub total of that then next will be the grand total. You may notice that the subtotals for each category often appear adjacent to the corresponding category rows. This is because the ROLLUP operator is designed to provide intuitive and readable output, placing subtotals in proximity to the data they summarize.

GROUPING SETS

We use GROUPING SETS operator to group the groupings by the arguments we put inside its like grouping but the difference is that we can put as many group inside of that like
GROUP BY GROUPING SET ((country, gender), (country), (gender)) We can group the data into combination of country and gender in the first argument and in second arguments we want to also group it in country only and so in the third argument.

It's like we are combining 3 groups in 1 query which is first one is (country, gender), then the second is (country), then the third is (gender). so if you want to achieve the same result of that without using the grouping sets you need 3 query for that.

Let's proceed to example of this grouping sets so we can understand it.

SELECT 
    country, 
    gender,
    COUNT(*) AS rental_count
FROM 
    rentals
GROUP BY 
    GROUPING SETS (country, gender, ());

Then the output would be

| country  | gender  | rental_count |
|----------|---------|--------------|
| NULL     | NULL    | 8            |  <-- Grand total
| NULL     | Male    | 3            |  <-- Subtotal for Male across all countries
| NULL     | Female  | 5            |  <-- Subtotal for Female across all countries
| USA      | NULL    | 3            |  <-- Subtotal for USA across all genders
| USA      | Male    | 2            |
| USA      | Female  | 1            |
| UK       | NULL    | 3            |  <-- Subtotal for UK across all genders
| UK       | Male    | 1            |
| UK       | Female  | 2            |

So the explanation here is each unique combination of country, and gender is shown in the table then you are wondering why there is () then that's a empty bracket meaning it will total all the count in the output but if we remove that it will only shown the sub total of each category.

You are wondering why the CUBE and GROUPING SETS have a similarity then you're right but the cube is getting all the possible combination but the grouping sets is we can group as many as we want in one query.

I know guys that's a lot to learn but if you keep using this in your query over the time you will understand each of this. Thanks a lot :)