zchtodd

Posted on Oct 8, 2019

#SQL30 Day 6: Supermarket Sales

#sql #database #postgres #challenge

Welcome to the SQL showdown series!

What is this and how does it work?

I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.

Write your own solution in the comments! Let's see who can come up with the most creative solutions.

I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!

Challenge #6: Supermarket Sales

In this challenge we'll take sales data from three supermarkets and produce a report that includes default values wherever there are gaps in the data. This is a pretty common requirement, and there are many ways to go about solving it!

The question for today is:

Given supermarket sales history by location and category, can you produce a report without gaps? In other words, if the Mandalay location had no sales in the home and lifestyle category on a certain day, we'd still like to see a row with zero.

Here's an example to give you a better idea of the output you're after:

(These are all rows that would not have appeared otherwise, because there were no sales for that city, category, and date.)

Sandbox Connection Details

I have a PostgreSQL database ready for you to play with.

The password is the same as the username! To query the supermarket table:

SELECT * FROM day6.supermarket;

Solution for Challenge #5

This is the question we were trying to answer with yesterday's SQL challenge:

Can you take a specific employee ID and return a path up the org chart to the CEO, listing all of the employees on that path?

My initial reaction to the question is to think of joins. If asked to start with a certain employee, you could begin self-joining to the employee table to work your way up the tree.

That approach, however, does have a few problems. The first issue is that we don't know how big this tree is -- we can't possibly write enough joins to ensure that our technique will work for all possible data.

The other issue is that joining is going to produce more columns, and not more rows like in the format of the example output.

Recursion would definitely be the way to approach this if I were writing a function to solve the problem in a language like Python.

Can we do recursion in SQL? It turns out you can!

First, here's my solution for the challenge:

WITH RECURSIVE manager AS (
  SELECT * FROM day5.employee WHERE employee_id = 1
  UNION ALL
  SELECT e.* FROM day5.employee e JOIN manager m ON e.employee_id = m.manager_id
)
SELECT * FROM manager;

This is a common table expression, albeit kind of a special one. The RECURSIVE keyword indicates that this CTE will end up calling itself.

Let's step through how this works, starting from the select outside of the CTE.

The database will first execute the initial query inside the CTE (i.e. WHERE employee_id = 1) to kick things off. The result of that query will be added to the final output, but it's also fed into the second query as manager.

The result of the second query is then added to the final output, and fed into the CTE to run the second query again, and so on until the second query returns nothing.

The initial query would return Rudy, then the employee table joins to manager on Rudy's manager_id, which returns Julieann.

Good luck!

Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.

Top comments (1)

Sam Mason • Oct 30 '19 • Edited

another day, another query! few different ways to do this one, the tidiest version I could come up with is:

WITH loc_cat(city, product_line) AS (
  VALUES ('Mandalay', 'Home and lifestyle')
), dates(sale_date) AS (
  SELECT generate_series('2019-01-01', CURRENT_DATE, '1 day')::DATE
)
SELECT city, product_line, sale_date, COALESCE(total, 0) AS total
FROM (loc_cat CROSS JOIN dates)
  LEFT JOIN supermarket USING (city, product_line, sale_date)
ORDER BY 1, 2, 3

it's not the shortest, but means the location and category values only need to be written once and it's easy to extend to multiple combinations

edit: have now read the "solution" and think I misinterpreted the task! the single city/product was just an example, you actually wanted the cartesian product of all cities, products and the whole date range. I still think I'd do something similar to what I had, e.g:

WITH loc AS (
  SELECT DISTINCT city FROM supermarket
), cat AS (
  SELECT DISTINCT product_line FROM supermarket
), dates AS (
  SELECT generate_series(dstart, dend, '1 day')::DATE AS sale_date FROM (
    SELECT MIN(sale_date) AS dstart, MAX(sale_date) AS dend FROM supermarket
  ) x
)
SELECT city, product_line, sale_date, COALESCE(total, 0) AS total
FROM (loc CROSS JOIN cat CROSS JOIN dates)
  LEFT JOIN supermarket USING (city, product_line, sale_date)
ORDER BY 1, 2, 3;

which is pretty similar to your solution, but less spaced out

DEV Community

#SQL30 Day 6: Supermarket Sales

What is this and how does it work?

Challenge #6: Supermarket Sales

Sandbox Connection Details

Solution for Challenge #5

Good luck!

Top comments (1)

Read next

Best Practices for Bulk Optimization of Queries in PostgreSQL

A Playground for SQL RDBMS, custom Ollama API Interaction with RAG on Timescale DB and download of query plus result

FastAPI, Pydantic, Psycopg3: the holy trinity for Python web APIs

Load Testing PostgreSQL on Kubernetes: A YAML-Only Approach