Welcome to the SQL showdown series!
What is this and how does it work?
I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.
Write your own solution in the comments! Let's see who can come up with the most creative solutions.
I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!
Challenge #5: Hierarchies
Today we're going to tackle dealing with hierarchical or graph-like data. Imagine something like the Oracle of Bacon that finds the most direct relationship from one actor to Kevin Bacon. In our case, we'll use a made-up corporate org chart to represent a hierarchy of relationships.
Here's the challenge:
Can you take a specific employee ID and return a path up the org chart to the CEO, listing all of the employees on that path?
The only table in this challenge is the employee table. These are the columns that make up the employee table:
- employee_id
- name
- manager_id
Here's an example to give you a better idea of the output you're after:
In this case, we start with Rudy and proceed up the chain until we reach Tonie, who has no manager, and who we assume is the CEO.
Sandbox Connection Details
I have a PostgreSQL database ready for you to play with.
The password is the same as the username! To query the employee table:
SELECT * FROM day5.employee;
Solution for Challenge #4
This is the question we were trying to answer with yesterday's SQL challenge about video game sales:
Can you produce a report that displays one year per row, and the aggregated global sales by genre for that year as columns?
The task is to turn rows into columns, so that we end up with every genre as a column. This is kind of like an Excel pivot table.
Some databases, such as SQL Server, have a PIVOT function that does exactly this. PostgreSQL, however, does not have the PIVOT function. The tablefunc extension module provides a CROSSTAB function for accomplishing the same thing, but let's stick with a vanilla PostgreSQL installation.
Here's how I went about solving this challenge:
SELECT
year AS "Year",
SUM(CASE WHEN genre = 'Action' THEN global_sales END) AS "Action",
SUM(CASE WHEN genre = 'Adventure' THEN global_sales END) AS "Adventure",
SUM(CASE WHEN genre = 'Fighting' THEN global_sales END) AS "Fighting",
SUM(CASE WHEN genre = 'Misc' THEN global_sales END) AS "Misc",
SUM(CASE WHEN genre = 'Platform' THEN global_sales END) AS "Platform",
SUM(CASE WHEN genre = 'Puzzle' THEN global_sales END) AS "Puzzle",
SUM(CASE WHEN genre = 'Racing' THEN global_sales END) AS "Racing",
SUM(CASE WHEN genre = 'Role-Playing' THEN global_sales END) AS "Role-Playing",
SUM(CASE WHEN genre = 'Shooter' THEN global_sales END) AS "Shooter",
SUM(CASE WHEN genre = 'Simulation' THEN global_sales END) AS "Simulation",
SUM(CASE WHEN genre = 'Sports' THEN global_sales END) AS "Sports",
SUM(CASE WHEN genre = 'Strategy' THEN global_sales END) AS "Strategy"
FROM day4.videogame GROUP BY year ORDER BY year;
If you've never seen CASE statements before, they're similar to ternaries in a language like JavaScript. If we take the first column, it would be similar to the following in JavaScript:
genre == "Action" ? global_sales : 0;
Usually you'll see CASE statements that have an ELSE clause, but here I'm relying on the fact that the default ELSE clause returns NULL, which when passed to the SUM function is equivalent to zero.
By grouping on year, each of the SUM functions is only adding up the sales for one genre over that year after the CASE statement filters out other genres.
This approach is obviously a little tedious, and only works when you know your desired columns ahead of time.
Good luck!
Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.
Top comments (1)
found a bit more time! have known about recursive queries for a while but never had to use them outside of exercises like this.
the direct answer to your question would be something like:
but it's more interesting to handle the case of more than one employee, and how you do something sensible with the results. if you're expecting this to be handled by code that's good with trees/graphs like this then, the following seems nice:
the only meaningful change is to use a
UNION
instead of aUNION ALL
, but suppose a plainUNION
could have been used in the first query.if you wanted something that's directly nice for presentation then it might be useful to track the path up the tree: