DEV Community

AJ Kerrigan
AJ Kerrigan

Posted on

Hanukkah of Data 2022 - Puzzle 3

Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the third puzzle.

Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).

Thinking

In the third puzzle, we learn that the rug passed from a contractor to someone in his neighborhood:

“At last I couldn’t deal with the rug taking up my whole bathtub, so I gave it to this guy who lived in my neighborhood. He said that he was naturally assertive because he was a Aries born in the year of the Dog, so maybe he was able to clean it.

Extracting some facts about the guy in the contractor's neighborhood:

  • Aries
    • Birth date between March 21 and April 19
  • Born in the year of the dog
    • 1958, 1970, etc
  • Lives in the contractor's neighborhood
    • To start, look at the same zip code

Tool Choices

VisiData would work great for this puzzle, but I was having fun practicing with the combo of Datasette and Pandas so I kept rolling with it for the rest of the challenge.

Doing (Datasette)

Joining and Selecting

Here we're only looking at customer data, so there are no joins required. A select/from block like this works just fine:

select
  c.name,
  c.birthdate,
  c.address,
  c.citystatezip,
  c.phone
from
  customers c
Enter fullscreen mode Exit fullscreen mode

Filtering

We're looking for someone in the same neighborhood as the contractor from the previous puzzle. So let's look in that zip code:

where
  c.citystatezip like '%11420'
Enter fullscreen mode Exit fullscreen mode

...who was born in the year of the dog. A search took me to this page which says that's every 12 years, most recently 2018. In a Python REPL, 2018 % 12 == 2 so:

  and strftime('%Y', c.birthdate) % 12 = 2
Enter fullscreen mode Exit fullscreen mode

The last bit looking for an Aries is a nice touch - a fun subtle bit of puzzle design from the Devottys. Because horoscope borders cross months, it makes things just a bit trickier than looking at birth month alone. My first attempt was to convert the birthdate to "date of the year" (with a bit of padding to cover leap years), but that felt a bit quirky afterward. Tweaking it to use this felt a little clearer and more explicit:

  -- March 21 - April 19
  and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
Enter fullscreen mode Exit fullscreen mode

All Together Now

That combines into a tidier/simpler than Puzzle 2:

select
  distinct c.name,
  c.birthdate,
  c.address,
  c.citystatezip,
  c.phone
from
  customers c
  join orders o on c.customerid = o.customerid
  join orders_items i on o.orderid = i.orderid
  join products p on i.sku = p.sku
where
  c.citystatezip like '%11420'
  and strftime('%Y', c.birthdate) % 12 = 2
  and cast(strftime('%m%d', c.birthdate) as int) between 321 and 419
Enter fullscreen mode Exit fullscreen mode

Doing (Pandas)

After doing all the thinking and building a SQL query, the Pandas piece feels a little anti-climactic:

df[
    (df.birthdate.dt.day_of_year.between(79,111))
    & (df.birthdate.dt.year % 12 == 2)
    & (df.citystatezip.str.contains('11420'))
]
Enter fullscreen mode Exit fullscreen mode

Using the arguably clearer "month + day" formulation felt a bit more awkward in Pandas, but maybe that's because there's a better way to do this than faithfully porting my SQL 🤔:

df[
    (df.birthdate.dt.strftime('%m%d').transform(int).between(321,419))
    & (df.birthdate.dt.year % 12 == 2)
    & (df.citystatezip.str.contains('11420'))
]
Enter fullscreen mode Exit fullscreen mode

Latest comments (0)