Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the fourth puzzle.
Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).
On the fourth day, we find a woman who likes to eat pastries early in the morning:
“A few weeks later my bike chain broke on the way home, and I needed to get it fixed before work the next day. Thankfully, this woman I met on Tinder came over at 5am with her bike chain repair kit and some pastries from Noah’s. Apparently she liked to get up before dawn and claim the first pastries that came out of the oven.
This time I turned things around and started with Pandas. Because I had explored the data a bit prior to starting puzzles, I noticed that the
sku column started with a category/department prefix. Baked goods all seemed to have a bakery prefix of
BKY, so finding orders with that sku prefix seemed like a good start:
That was far too many results (as expected), so the next step was to find bakery orders early in the morning. Let's say between 4am and 9am, which should cover cases where someone got up before dawn and got the first pastries out of the oven:
df[(df.sku.str.contains('BKY')) & (df.ordered.dt.hour.between(4,9))]
That was still a bunch of rows! But the clues make it sound like this is a habit. So how about looking at which customers get those early morning pastries most often?
df[ (df.sku.str.contains('BKY')) & (df.ordered.dt.hour.between(4,9)) ].groupby(['name','phone']).size().sort_values().tail()
That suggests Cristina Booker as the clear winner, and narrowing that time range from 4-9am to 4-6am makes it even clearer.
Adapting the pandas logic to SQL left me with this query:
select c.name, c.phone, count(*) as ordercount from customers c join orders o on c.customerid = o.customerid join orders_items i on o.orderid = i.orderid join products p on i.sku = p.sku where p.sku like 'BKY%' and cast(strftime('%H', o.ordered) as int) <= 5 group by c.name, c.phone order by ordercount desc
Top comments (0)