SQL30 (16 Part Series)
Welcome to the SQL showdown series!
I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.
Write your own solution in the comments! Let's see who can come up with the most creative solutions.
I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!
In this challenge we'll take sales data from three supermarkets and produce a report that includes default values wherever there are gaps in the data. This is a pretty common requirement, and there are many ways to go about solving it!
The question for today is:
Given supermarket sales history by location and category, can you produce a report without gaps? In other words, if the Mandalay location had no sales in the home and lifestyle category on a certain day, we'd still like to see a row with zero.
Here's an example to give you a better idea of the output you're after:
(These are all rows that would not have appeared otherwise, because there were no sales for that city, category, and date.)
I have a PostgreSQL database ready for you to play with.
The password is the same as the username! To query the supermarket table:
SELECT * FROM day6.supermarket;
This is the question we were trying to answer with yesterday's SQL challenge:
Can you take a specific employee ID and return a path up the org chart to the CEO, listing all of the employees on that path?
My initial reaction to the question is to think of joins. If asked to start with a certain employee, you could begin self-joining to the employee table to work your way up the tree.
That approach, however, does have a few problems. The first issue is that we don't know how big this tree is -- we can't possibly write enough joins to ensure that our technique will work for all possible data.
The other issue is that joining is going to produce more columns, and not more rows like in the format of the example output.
Recursion would definitely be the way to approach this if I were writing a function to solve the problem in a language like Python.
Can we do recursion in SQL? It turns out you can!
First, here's my solution for the challenge:
WITH RECURSIVE manager AS ( SELECT * FROM day5.employee WHERE employee_id = 1 UNION ALL SELECT e.* FROM day5.employee e JOIN manager m ON e.employee_id = m.manager_id ) SELECT * FROM manager;
This is a common table expression, albeit kind of a special one. The RECURSIVE keyword indicates that this CTE will end up calling itself.
Let's step through how this works, starting from the select outside of the CTE.
The database will first execute the initial query inside the CTE (i.e. WHERE employee_id = 1) to kick things off. The result of that query will be added to the final output, but it's also fed into the second query as manager.
The result of the second query is then added to the final output, and fed into the CTE to run the second query again, and so on until the second query returns nothing.
The initial query would return Rudy, then the employee table joins to manager on Rudy's manager_id, which returns Julieann.
Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.