Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the third puzzle.
Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).
Thinking
In the third puzzle, we learn that the rug passed from a contractor to someone in his neighborhood:
“At last I couldn’t deal with the rug taking up my whole bathtub, so I gave it to this guy who lived in my neighborhood. He said that he was naturally assertive because he was a Aries born in the year of the Dog, so maybe he was able to clean it.
Extracting some facts about the guy in the contractor's neighborhood:
- Aries
- Birth date between March 21 and April 19
- Born in the year of the dog
- 1958, 1970, etc
- Lives in the contractor's neighborhood
- To start, look at the same zip code
Tool Choices
VisiData would work great for this puzzle, but I was having fun practicing with the combo of Datasette and Pandas so I kept rolling with it for the rest of the challenge.
Doing (Datasette)
Joining and Selecting
Here we're only looking at customer data, so there are no joins required. A select/from block like this works just fine:
select
c.name,
c.birthdate,
c.address,
c.citystatezip,
c.phone
from
customers c
Filtering
We're looking for someone in the same neighborhood as the contractor from the previous puzzle. So let's look in that zip code:
where
c.citystatezip like '%11420'
...who was born in the year of the dog. A search took me to this page which says that's every 12 years, most recently 2018. In a Python REPL, 2018 % 12 == 2
so:
and strftime('%Y', c.birthdate) % 12 = 2
The last bit looking for an Aries is a nice touch - a fun subtle bit of puzzle design from the Devottys. Because horoscope borders cross months, it makes things just a bit trickier than looking at birth month alone. My first attempt was to convert the birthdate to "date of the year" (with a bit of padding to cover leap years), but that felt a bit quirky afterward. Tweaking it to use this felt a little clearer and more explicit:
-- March 21 - April 19
and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
All Together Now
That combines into a tidier/simpler than Puzzle 2:
select
distinct c.name,
c.birthdate,
c.address,
c.citystatezip,
c.phone
from
customers c
join orders o on c.customerid = o.customerid
join orders_items i on o.orderid = i.orderid
join products p on i.sku = p.sku
where
c.citystatezip like '%11420'
and strftime('%Y', c.birthdate) % 12 = 2
and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
Doing (Pandas)
After doing all the thinking and building a SQL query, the Pandas piece feels a little anti-climactic:
df[
(df.birthdate.dt.day_of_year.between(79,111))
& (df.birthdate.dt.year % 12 == 2)
& (df.citystatezip.str.contains('11420'))
]
Using the arguably clearer "month + day" formulation felt a bit more awkward in Pandas, but maybe that's because there's a better way to do this than faithfully porting my SQL 🤔:
df[
(df.birthdate.dt.strftime('%m%d').transform(int).between(321,419))
& (df.birthdate.dt.year % 12 == 2)
& (df.citystatezip.str.contains('11420'))
]
Top comments (0)