Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the third puzzle.
- Doing (Datasette)
- Doing (Pandas)
Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).
In the third puzzle, we learn that the rug passed from a contractor to someone in his neighborhood:
“At last I couldn’t deal with the rug taking up my whole bathtub, so I gave it to this guy who lived in my neighborhood. He said that he was naturally assertive because he was a Aries born in the year of the Dog, so maybe he was able to clean it.
Extracting some facts about the guy in the contractor's neighborhood:
- Birth date between March 21 and April 19
- Born in the year of the dog
- 1958, 1970, etc
- Lives in the contractor's neighborhood
- To start, look at the same zip code
VisiData would work great for this puzzle, but I was having fun practicing with the combo of Datasette and Pandas so I kept rolling with it for the rest of the challenge.
Joining and Selecting
Here we're only looking at customer data, so there are no joins required. A select/from block like this works just fine:
select c.name, c.birthdate, c.address, c.citystatezip, c.phone from customers c
We're looking for someone in the same neighborhood as the contractor from the previous puzzle. So let's look in that zip code:
where c.citystatezip like '%11420'
...who was born in the year of the dog. A search took me to this page which says that's every 12 years, most recently 2018. In a Python REPL,
2018 % 12 == 2 so:
and strftime('%Y', c.birthdate) % 12 = 2
The last bit looking for an Aries is a nice touch - a fun subtle bit of puzzle design from the Devottys. Because horoscope borders cross months, it makes things just a bit trickier than looking at birth month alone. My first attempt was to convert the birthdate to "date of the year" (with a bit of padding to cover leap years), but that felt a bit quirky afterward. Tweaking it to use this felt a little clearer and more explicit:
-- March 21 - April 19 and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
All Together Now
That combines into a tidier/simpler than Puzzle 2:
select distinct c.name, c.birthdate, c.address, c.citystatezip, c.phone from customers c join orders o on c.customerid = o.customerid join orders_items i on o.orderid = i.orderid join products p on i.sku = p.sku where c.citystatezip like '%11420' and strftime('%Y', c.birthdate) % 12 = 2 and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
After doing all the thinking and building a SQL query, the Pandas piece feels a little anti-climactic:
df[ (df.birthdate.dt.day_of_year.between(79,111)) & (df.birthdate.dt.year % 12 == 2) & (df.citystatezip.str.contains('11420')) ]
Using the arguably clearer "month + day" formulation felt a bit more awkward in Pandas, but maybe that's because there's a better way to do this than faithfully porting my SQL 🤔:
df[ (df.birthdate.dt.strftime('%m%d').transform(int).between(321,419)) & (df.birthdate.dt.year % 12 == 2) & (df.citystatezip.str.contains('11420')) ]
Top comments (0)