DEV Community

AJ Kerrigan
AJ Kerrigan

Posted on

Hanukkah of Data 2022 - Puzzle 5

Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the fifth puzzle.

Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).

Thinking

You know what Hanukkah of Data was missing? A cat lady! And that's where the rug went next:

“I listed it on Freecycle, and a woman in Queens Village came to pick it up. She was wearing a ‘Noah’s Market’ sweatshirt, and it was just covered in cat hair. When I suggested that a clowder of cats might ruin such a fine tapestry, she looked at me funny and said she only had ten or eleven cats and they were getting quite old and had cataracts now so they probably wouldn’t notice some old rug anyway.

Extracting some facts

  • The next rug owner had a bunch of cats
  • The cats are old
  • She had a Noah's Market sweatshirt (seemed relevant initially)
  • Lives in Queens Village

Doing (Pandas)

So who's buying cat stuff?

df[(df.desc.str.contains(r'cat',case=False))]
Enter fullscreen mode Exit fullscreen mode

Lots of people, cool. But who from Queens Village?

df[
  (df.desc.str.contains(r'cat',case=False))
  & (df.citystatezip.str.contains('Queens Village'))
]
Enter fullscreen mode Exit fullscreen mode

Yikes Anita, that's a ton of cat stuff. Eyeballing the results it also looks like she's specifically buying senior cat food, which jives with the clues. We can bake that into the query and roll up by quantity though, to make things even clearer:

df[
  (df.desc.str.contains(r'senior.*cat', case=False, regex=True))
  & (df.citystatezip.str.contains('Queens Village'))
].groupby(['name', 'citystatezip', 'phone']).agg({'qty': sum})
Enter fullscreen mode Exit fullscreen mode

Doing (Datasette)

Again the second tool feels more straightforward. The pandas work turns into this SQL query:

select
  c.name,
  c.citystatezip,
  c.phone,
  sum(i.qty) as total_items
from
  customers c
  join orders o on c.customerid = o.customerid
  join orders_items i on o.orderid = i.orderid
  join products p on i.sku = p.sku
where
  c.citystatezip like 'Queens Village%'
  and p.desc like '%senior%cat%'
group by
  c.name,
  c.citystatezip,
  c.phone
order by
  total_items desc
Enter fullscreen mode Exit fullscreen mode

Though I did bump my head on this at first, as I was using count() rather than sum(). Whoops!

Refining / Reusing

By this point I was reusing the same core query with the same joins a lot, and realized I probably should have created a view for it already. So:

create view noahs_order_detail as
select
  c.*,
  o.orderid,
  o.ordered,
  o.shipped,
  o.total,
  i.qty,
  i.unit_price,
  p.sku,
  p.desc,
  p.wholesale_cost
from
  customers c
  join orders o on c.customerid = o.customerid
  join orders_items i on o.orderid = i.orderid
  join products p on i.sku = p.sku
Enter fullscreen mode Exit fullscreen mode

Oldest comments (0)