AJ Kerrigan

Posted on

# Hanukkah of Data 2022 - Puzzle 3

Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the third puzzle.

Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).

## Thinking

In the third puzzle, we learn that the rug passed from a contractor to someone in his neighborhood:

“At last I couldn’t deal with the rug taking up my whole bathtub, so I gave it to this guy who lived in my neighborhood. He said that he was naturally assertive because he was a Aries born in the year of the Dog, so maybe he was able to clean it.

Extracting some facts about the guy in the contractor's neighborhood:

• Aries
• Birth date between March 21 and April 19
• Born in the year of the dog
• 1958, 1970, etc
• Lives in the contractor's neighborhood
• To start, look at the same zip code

### Tool Choices

VisiData would work great for this puzzle, but I was having fun practicing with the combo of Datasette and Pandas so I kept rolling with it for the rest of the challenge.

## Doing (Datasette)

### Joining and Selecting

Here we're only looking at customer data, so there are no joins required. A select/from block like this works just fine:

``````select
c.name,
c.birthdate,
c.citystatezip,
c.phone
from
customers c
``````

### Filtering

We're looking for someone in the same neighborhood as the contractor from the previous puzzle. So let's look in that zip code:

``````where
c.citystatezip like '%11420'
``````

...who was born in the year of the dog. A search took me to this page which says that's every 12 years, most recently 2018. In a Python REPL, `2018 % 12 == 2` so:

``````  and strftime('%Y', c.birthdate) % 12 = 2
``````

The last bit looking for an Aries is a nice touch - a fun subtle bit of puzzle design from the Devottys. Because horoscope borders cross months, it makes things just a bit trickier than looking at birth month alone. My first attempt was to convert the birthdate to "date of the year" (with a bit of padding to cover leap years), but that felt a bit quirky afterward. Tweaking it to use this felt a little clearer and more explicit:

``````  -- March 21 - April 19
and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
``````

### All Together Now

That combines into a tidier/simpler than Puzzle 2:

``````select
distinct c.name,
c.birthdate,
c.citystatezip,
c.phone
from
customers c
join orders o on c.customerid = o.customerid
join orders_items i on o.orderid = i.orderid
join products p on i.sku = p.sku
where
c.citystatezip like '%11420'
and strftime('%Y', c.birthdate) % 12 = 2
and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
``````

## Doing (Pandas)

After doing all the thinking and building a SQL query, the Pandas piece feels a little anti-climactic:

``````df[
(df.birthdate.dt.day_of_year.between(79,111))
& (df.birthdate.dt.year % 12 == 2)
& (df.citystatezip.str.contains('11420'))
]
``````

Using the arguably clearer "month + day" formulation felt a bit more awkward in Pandas, but maybe that's because there's a better way to do this than faithfully porting my SQL 🤔:

``````df[
(df.birthdate.dt.strftime('%m%d').transform(int).between(321,419))
& (df.birthdate.dt.year % 12 == 2)
& (df.citystatezip.str.contains('11420'))
]
``````