zchtodd

Posted on Oct 9, 2019

#SQL30 Day 7: Grouping Sets

#sql #database #postgres #challenge

Welcome to the SQL showdown series!

What is this and how does it work?

I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.

Write your own solution in the comments! Let's see who can come up with the most creative solutions.

I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!

Challenge #7: Grouping Sets

In this challenge we'll explore the idea of grouping sets by rolling up data in several different ways.

The question for today is:

Given supermarket sales history, can you display sales not only by location and category, but also at the month and yearly level?

Here's an example to give you a better idea of the output you're after:

As you can see above, sales are summed up at the month level, but each location/product combination also has a subtotal row for the year.

The day7 schema contains only the supermarket table. Here's a sample of that table:

Sandbox Connection Details

I have a PostgreSQL database ready for you to play with.

The password is the same as the username! To query the **** table:

SELECT * FROM day7.supermarket;

Solution for Challenge #6

This is the question we were trying to answer with yesterday's SQL challenge:

Given supermarket sales history by location and category, can you produce a report without gaps? In other words, if the Mandalay location had no sales in the home and lifestyle category on a certain day, we'd still like to see a row with zero.

If you've done data analysis for a while, you've probably had this requirement come up. Understandably, users would rather see a row with a default zero value than a missing time range on a report.

My solution is to generate all of the possible combinations first, and then join to the actual sales data to display that when it's available.

In this case, we have locations, categories, and dates. There are 3 stores and 6 categories, so that should produce 18 rows. Generating the dates is probably the trickiest part of this, but PostgreSQL provides a handy function that makes life easier in that regard.

First, let's take a look at the solution and then we'll work through it.

SELECT
  keys.city,
  keys.product_line,
  coalesce(total, 0) AS total,
  keys.sale_date
FROM
  (
    WITH date_range AS (
      SELECT
        min(sale_date) AS min_date,
        max(sale_date) AS max_date
      FROM
        day6.supermarket
    )
    SELECT
      *
    FROM
      (
        SELECT
          distinct city
        FROM
          day6.supermarket
      ) s1
      CROSS JOIN (
        SELECT
          distinct product_line
        FROM
          day6.supermarket
      ) AS s2
      CROSS JOIN (
        SELECT
          generate_series(min_date, max_date, INTERVAL '1 day') AS sale_date
        FROM
          date_range
      ) AS s3
  ) AS keys
  LEFT JOIN day6.supermarket s ON keys.city = s.city
  AND keys.product_line = s.product_line
  AND keys.sale_date = s.sale_date;

The final select uses coalesce to turn NULL into zero where there was no actual sales data. I then use a common table expression to get the first and last day of sales, which is fed into generate_series to produce every day between those dates.

That gives us every day, but we still need to pair that up with store locations and categories. This is where the reclusive, rarely seen CROSS JOIN comes into play. If you've never seen one, it's equivalent to writing a JOIN with an ON clause of 1=1 or true, which gives us every possible combination.

Now that we have every combination, we can LEFT JOIN to the sales data. I'm using a LEFT JOIN because I want a row even where there was no matching sale.

Good luck!

Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.

Top comments (1)

Sam Mason • Oct 31 '19

couple of variants here, one using old style GROUP BY statements:

(
  SELECT city, product_line,
    date_part('year', sale_date)::INT AS year, date_part('month', sale_date)::INT AS month,
    SUM(total) AS sales
  FROM supermarket
  GROUP BY 1, 2, 3, 4
UNION ALL
  SELECT city, product_line,
    date_part('year', sale_date)::INT, NULL,
    SUM(total) AS sales
  FROM supermarket
  GROUP BY 1, 2, 3
)
ORDER BY 1, 2, 3, 4;

and a version (after actually googling about GROUPING SETS) that probably does what you were expecting:

SELECT city, product_line,
  date_part('year',  sale_date)::INT AS year,
  date_part('month', sale_date)::INT AS month,
  SUM(total) AS sales
FROM supermarket
GROUP BY GROUPING SETS (
  (1, 2, 3),
  (1, 2, 3, 4)
)
ORDER BY 1, 2, 3, 4;

be interesting to see if you did anything different!