DEV Community

AJ Kerrigan
AJ Kerrigan

Posted on

Hanukkah of Data 2022 - Puzzle 8 and Wrap-Up

Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the eighth puzzle.

Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).

Thinking

For the last puzzle, we're looking for the person who still probably has the rug! Turns out he's a bit of a pack rat:

She wound up getting a newer and more expensive rug, so she gave it to an acquaintance of hers who collects all sorts of junk. Apparently he owns an entire set of Noah’s collectibles! He probably still has the rug, even.

The plan here will be find what counts as a Noah's collectible and see who has the most of them.

Doing (Datasette)

Based on a peek at the products table, it looks like collectibles all have a 'COL' sku prefix. So we should be able to find customers who have ordered the most distinct items with that prefix:

select
  name,
  phone,
  count(distinct sku) as colcount
from
  noahs_order_detail
where
  sku like 'COL%'
group by
  name,
  phone
order by
  colcount desc
Enter fullscreen mode Exit fullscreen mode

And yes, Travis has more than 2x the collectible count of any other customer!

Doing (Pandas)

A pretty straight SQL-->pandas translation seems to work fine here:

df[
  df.sku.str.contains('^COL')
].groupby(
  ['name','phone']
).agg(
  {'sku':'nunique'}
).sort_values(by='sku').tail()
Enter fullscreen mode Exit fullscreen mode

Wrap-Up

This was a lot of fun! Some specific takeaways:

  • Huge thanks to The Devottys for putting this challenge together
    • Saul & Anja pour a lot of heart and smarts into VisiData and the data/terminal-loving community
    • I'm consistently impressed by Dwimmer's artwork, he's magic at the terminal
  • There's something dangerous or funky about using category dtypes in pandas with groupby()
  • Working through problems with multiple tools can sometimes help you rethink your approach in all of them
  • You don't have to be Simon Willison to get a lot out of SQLite

I should also point out the official wrap-up post here, which links to other community posts.

Oldest comments (0)