Welcome to the SQL showdown series!
I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.
Write your own solution in the comments! Let's see who can come up with the most creative solutions.
I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!
This challenge uses a data-set that contains aggregated video game sales dating back to 1980. This data-set comes from the Kaggle open data-sets archive.
Here's the challenge:
Can you produce a report that displays one year per row, and the aggregated global sales by genre for that year as columns?
There is only one table in the day4 schema. The videogame table contains info on the game title, published year, genre, and sales for that game.
The videogame table has the following columns:
- global_sales (in millions)
Here's an example to give you a better idea of the output you're after:
I have a PostgreSQL database ready for you to play with.
The password is the same as the username! To query the videogame table:
SELECT * FROM day4.videogame;
This is the question we were trying to answer with yesterday's SQL challenge:
Can you find the average temperature per month for every county in the United States, as well as the coldest and hottest temperatures that county has experienced over the year?
Part of this question sounds like a candidate for window functions, but how would we do this by county? The station and measurement tables can be joined through the idx column, but there doesn't appear to be a way to join to county.
Or is there?
WITH cte AS ( SELECT geo_id, s.idx, NAME, value, taken, date_trunc('month', taken) AS month FROM day3.county c JOIN day3.station s ON ST_Contains(c.wkb_geometry, ST_SetSRID(St_MakePoint(s.lat, s.long), 4326)) JOIN day3.measurement m ON s.idx = m.idx ) SELECT cte.*, avg(value) OVER geo_m, min(value) OVER geo_y, max(value) OVER geo_y FROM cte WINDOW geo_m AS (PARTITION BY (geo_id, month)), geo_y AS (PARTITION BY geo_id) ORDER BY geo_id, taken;
The interesting part is the spatial join that connects weather polling stations in the station table with their respective county. PostGIS provides the ST_Contains function, and I use it here to determine if the county polygon contains the point defined by the station latitude and longitude.
The ST_SetSRID function basically tells PostGIS what coordinate system the point we just created is going to be in. 4326 is known as the World Geodetic System and is one of the most common SRID values that you'll see.
GIS in SQL is a topic that I've barely touched on myself, but there are quite a few great resources out there on the subject.
Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.