Welcome to the SQL showdown series!
What is this and how does it work?
I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.
Write your own solution in the comments! Let's see who can come up with the most creative solutions.
I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!
Challenge #1: Wildfire Causes
In this challenge we'll use the 1.88 Million Wildfires data-set from Kaggle.
The question is:
What are the details of the largest recorded wildfires (by acreage) for each potential cause of wildfires?
We want the name of the fire, its cause, the state it happened in, the date it was discovered, and the date at which it was considered contained. Each output row should be the largest fire that has ever occurred for that particular cause of wildfire.
The wildfire table has the following columns:
- name
- cause
- acres
- disc_date
- cont_date
- state
Sandbox Connection Details
I have a PostgreSQL database ready for you to play with.
The password is the same as the username! To query the wildfire table:
SELECT * FROM day1.wildfire;
Good luck!
Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.
Top comments (8)
What a great idea, thanks for setting this up :D
This may be a little convoluted but ....
I would check out how many causes there were first:
Then use a Window Function to order the fires by their cause and assign a rank in descending order.
Then use a CTE to isolate the Window Function results and return all rows where the rank = 1.
Then do some checks to make sure I've got the right results
That's an awesome solution, thanks for writing that. I don't think it's convoluted at all.
In my solution I went without window functions and did a join to a subquery. The interesting thing is that I could potentially join to the wrong row if there were more than 1 fire with the same cause and max acres burned, whereas here I think any ties would end up with the same rank, but it's guaranteed that there's no mixing and matching of data.
I hope you like the rest of the series! I started it because I was one of those developers that thought they knew SQL, when really I knew SQL-92, and modern SQL has so much more to offer.
bit late to the party! google pointed me to here and it seems interesting, hopefully I can get to the end...
given this is Postgres I'd use the
DISTINCT ON
operator, as it seems perfectly suited to the task:in this case you can also use an
IN
query:which gives me the same thing. but has different semantics so does different things in the presence of non-unique values. for some reason I thought it could do unexpected things when
NULL
values were involved, but a quick read suggests that's justNOT IN
I think my query is quite a bit easier than WINDOWing and CTE:
SELECT w1.*
FROM day1.wildfire w1
INNER JOIN ( SELECT cause, MAX(acres) as max_acres FROM day1.wildfire GROUP BY cause ) w2
ON w1.cause = w2.cause AND w1.acres = w2.max_acres;
Nice! I think of window functions as the heavy artillery... not always necessary when a group by might do the job.
Very well explained and helpful!
Thanks!
The database didnt work for me
Is that still work ?