Welcome to the SQL showdown series!
I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.
Write your own solution in the comments! Let's see who can come up with the most creative solutions.
I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!
In this challenge we'll use a small part of the huge stock market data-set from Kaggle. I've loaded the daily trading history for a single company to keep things reasonable.
The question for today is:
Can you output the day's closing price, the closing price from yesterday, the variance between the two, as well as the average closing price for the current month?
The stockprice table has the following columns:
Here's an example to give you a better idea of the output you're after:
I have a PostgreSQL database ready for you to play with.
The password is the same as the username! To query the stockprice table:
SELECT * FROM day2.stockprice;
This is the question we were trying to answer with yesterday's SQL challenge:
What are the details of the largest recorded wildfires (by acreage) for each potential cause of wildfires?
At first you might think, let's do a GROUP BY and a sum, right? Job's done boss. Well, almost. We also wanted to return the details for each individual fire.
I'm sure you've seen this issue before if you're an old hand at SQL. The problem is that we can't do this:
SELECT w.*, sum(w.acres) FROM day1.wildfire w GROUP BY cause;
At least in PostgreSQL, we're greeted with this lovely message.
[Code: 0, SQL State: 42803] ERROR: column "w.fire_name" must appear in the GROUP BY clause or be used in an aggregate function
This issue is brought up so much that StackOverflow has created the greatest-n-per-group tag just for it.
One possible solution involves joining to a subquery that finds the max value.
SELECT * FROM day1.wildfire w1 JOIN ( SELECT cause, max(acres) AS max_acres FROM day1.wildfire GROUP BY cause ) w2 ON w1.cause = w2.cause AND w1.acres = w2.max_acres ORDER BY w1.acres desc;
There are some kind of interesting exceptions to the rules around GROUP BY. Prior to MySQL 8.0, you could arbitrarily select columns that aren't part of an aggregate or the group by clause. Which row does it pull the rest of the columns from? I'm not sure anyone knows.
SQLite will also allow you to do the same, but only if the aggregate is a min or max, in which case the values come from the corresponding row.
Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.