Massy

Posted on

Problem solving with SQL: Case Study #1 — Danny’s Diner

Thank you Danny Ma for the excellent case study! You can find it here and try it yourself. While at it, you should give Danny Ma a follow on LinkedIn and support his posts if you aren’t already doing so!

I’ve posted the solution to this case study as a raw SQL script file on GitHub too.

Introduction

Danny seriously loves Japanese food so at the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.

Danny’s Diner needs your assistance to help the restaurant stay afloat — the restaurant has captured some very basic data from its few months of operation but has no idea how to use its data to help them run the business.

Problem Statement

Danny wants to use the data to answer a few simple questions about his customers, especially about their

• visiting patterns,
• how much money they’ve spent, and
• which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.

He plans on using these insights to help him decide whether he should expand the existing customer loyalty program — additionally, he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.

The data set contains the following 3 tables which you may refer to the relationship diagram below to understand the connection.

sales
members

Case Study Questions

1. What is the total amount each customer spent at the restaurant?
2. How many days has each customer visited the restaurant?
3. What was the first item from the menu purchased by each customer?
4. What is the most purchased item on the menu and how many times was it purchased by all customers?
5. Which item was the most popular for each customer?
6. Which item was purchased first by the customer after they became a member?
7. Which item was purchased just before the customer became a member?
8. What are the total items and amount spent for each member before they became a member?
9. If each \$1 spent equates to 10 points and sushi has a 2x points multiplier — how many points would each customer have?
10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi — how many points do customers A and B have at the end of January?

Solution

I used MySQL Workbench and these are the particular functions I employed:

• Aggregate functions — SUM, COUNT
• Joins — Inner join, left join
• Temp tables (CTE)
• Window function

Before attempting the questions I used the Entity Relationship Diagram as a guide to determine the logical structure of this database. I then went ahead to create a schema and tables to which tables I inserted the respective values

CREATE SCHEMA dannys_diner;
USE dannys_diner;

product_id INT NOT NULL,
product_name VARCHAR(5),
price INT,
PRIMARY KEY (product_id)
);

(product_id, product_name, price)
VALUES
('1', 'sushi', '10'),
('2', 'curry', '15'),
('3', 'ramen', '12');

CREATE TABLE members (
customer_id VARCHAR(1) NOT NULL,
join_date DATE,
PRIMARY KEY (customer_id)
);

INSERT INTO members
(customer_id, join_date)
VALUES
('A', '2021-01-07'),
('B', '2021-01-09');

CREATE TABLE sales (
customer_id VARCHAR(1) NOT NULL,
order_date DATE,
product_id INTEGER NOT NULL
);

INSERT INTO sales
(customer_id, order_date, product_id)
VALUES
('A', '2021-01-01', '1'),
('A', '2021-01-01', '2'),
('A', '2021-01-07', '2'),
('A', '2021-01-10', '3'),
('A', '2021-01-11', '3'),
('A', '2021-01-11', '3'),
('B', '2021-01-01', '2'),
('B', '2021-01-02', '2'),
('B', '2021-01-04', '1'),
('B', '2021-01-11', '1'),
('B', '2021-01-16', '3'),
('B', '2021-02-01', '3'),
('C', '2021-01-01', '3'),
('C', '2021-01-01', '3'),
('C', '2021-01-07', '3');

Questions deep dive

Q1. What is the total amount each customer spent at the restaurant?
I use the SUM and GROUP BY functions to find out the total amount spent for each customer. I added the JOIN function because customer_id is from sales table and price is from menu table.

SELECT customer_id , SUM(price) amount_spent
FROM sales
GROUP BY customer_id;

• Customer A spent \$76.
• Customer B spent \$74.
• Customer C spent \$36.

Q2. How many days has each customer visited the restaurant?
I wrapped the COUNT function around the DISTINCTfunction to find out the number of days each customer visited the restaurant.

If I did not use DISTINCT for order_date, the number of days could be repeated. For example, if customer A visited the restaurant twice on ‘2021–01–07’, then the number of days may have counted as 2 instead of 1 day.

SELECT customer_id, COUNT(DISTINCT(order_date)) no_of_visits
FROM sales
GROUP BY customer_id;

• Customer A visited 4 times.
• Customer B visited 6 times.
• Customer C visited 2 times.

Q3. What was the first item from the menu purchased by each customer?
I first ran a query to find out the earliest order_date and used the answer to filter for only purchases on that date.

SELECT customer_id , product_name, order_date
FROM sales
WHERE order_date = '2021-01-01'
GROUP BY customer_id;

• Customer A’s first order was sushi.
• Customer B’s first order was curry.
• Customer C’s first order was ramen.

Q4. What is the most purchased item on the menu and how many times was it purchased by all customers?

SELECT product_name, COUNT(product_name) times_purchased
FROM sales
GROUP BY product_name
ORDER BY times_purchased DESC
LIMIT 1;

The most purchased item on the menu is ramen.

Q5. Which item was the most popular for each customer?

SELECT customer_id, product_name, COUNT(product_name) times_purchased
FROM sales
GROUP BY customer_id, product_name
ORDER BY times_purchased DESC;

• Customer A and C’s favourite item was ramen.
• Customer B equally enjoyed all items on the menu.

Q6. Which item was purchased first by the customer after they became a member?
Only two customers were members. I ran independent queries to find out the first item they purchased.

-- Customer A
SELECT customer_id, order_date, product_name
FROM sales
WHERE customer_id = 'A' AND order_date > '2021-01-07' -- date after membership
ORDER BY order_date
LIMIT 1

-- Customer B
SELECT customer_id, order_date, product_name
FROM sales
WHERE customer_id = 'B' AND order_date > '    2021-01-09' -- date after membership
ORDER BY order_date
LIMIT 1;

After Customer A became a member, his/her first order was ramen, whereas it was sushi for Customer B.

Q7. Which item was purchased just before the customer became a member?
I also did the same here. I ran independent queries to find out the first item they purchased because only two customers were members.

-- Customer A
SELECT customer_id, order_date, product_name
FROM sales
WHERE customer_id = 'A' AND order_date < '2021-01-07' -- dates before membership
ORDER BY order_date DESC

-- Customer B?
SELECT customer_id, order_date, product_name
FROM sales
WHERE customer_id = 'B' AND order_date < '2021-01-09' -- get dates before membership
ORDER BY order_date DESC -- to capture closest date before membership
LIMIT 1;

Customer A’s order before he/she became a member was sushi and curry whereas Customer B’s order was sushi.

Q8. What are the total items and amount spent for each member before they became a member?

-- Customer A
SELECT customer_id, order_date, COUNT(product_name) total_items,             SUM(price) amount_spent
FROM sales
WHERE customer_id = 'A' AND order_date < '2021-01-07' -- get dates before membership
GROUP BY customer_id
ORDER BY order_date;

-- Customer B
SELECT customer_id, order_date, COUNT(product_name) total_items, SUM(price) amount_spent
FROM sales
WHERE customer_id = 'B' AND order_date < '2021-01-09' -- dates before membership
GROUP BY customer_id
ORDER BY order_date;

Customer A spent \$ 25 on 2 items.
Customer B spent \$40 on 3 items.

Q9. If each \$1 spent equates to 10 points and sushi has a 2x points multiplier — how many points would each customer have?

Let’s break down the question.

Each \$1 spent = 10 points.
But, sushi (product_id 1) gets 2x points, meaning each \$1 spent = 20 points
So, we use CASE WHEN to create conditional statements

If product_id = 1, then every \$1 price multiply by 20 points
All other product_id that is not 1, multiply \$1 by 10 points
So, you can see the table below with the new column, total_points.

SELECT customer_id,
SUM(CASE
WHEN product_name = 'sushi' THEN 20 * price
ELSE 10 * price
END) total_points
FROM sales
GROUP BY customer_id;

Total points for Customer A, B and C are 860, 940 and 360 respectively.

Q10. If the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi — how many points do customers A and B have at the end of January?

The build up to my final query

1. Found out the customer’s validity date (which is 6 days after join_date and inclusive of join_date) and the last day of Jan 2021 (‘2021–01–21’). I made the result of this query a CTE because I was going to query further from this result in the following CASE WHEN statement .
2. Used CASE WHEN to allocate points by dates and product_name.
3. Filtered by the first day of February to get only points that apply to January.
4. Wrapped the CASE WHEN statement into the SUM function to add up the points for each customer. It’s at this point I dropped all the columns that were originally present in my CTE except for the customer_id column. This is because retaining those other columns was not going to display their actual 'group' representation while I grouped by the customer_id only which I was interested in. And also I had retained them previously to help me check if my query results were right.
WITH cte_OfferValidity AS
(SELECT s.customer_id, m.join_date, s.order_date,
FROM sales s
LEFT JOIN members m
ON s.customer_id = m.customer_id
SELECT customer_id,
SUM(CASE
WHEN order_date BETWEEN join_date AND firstweek_ends THEN 20 * price
WHEN (order_date NOT BETWEEN join_date AND firstweek_ends) AND product_name = 'sushi' THEN 20 * price
ELSE 10 * price
END) points
FROM cte_OfferValidity
WHERE order_date < '2021-02-01' -- filter jan points only
GROUP BY customer_id;

Customer A and Customer B have 1370 points and 820 points respectively by the end of January 2021.

Bonus Questions

Join All The Things
Recreate the table with: customer_id, order_date, product_name, price, member (Y/N)

CASE
WHEN s.order_date >= '2021-01-07' AND m.join_date IS NOT NULL THEN 'Y'
WHEN s.order_date >= '2021-01-09' AND m.join_date IS NOT NULL THEN 'Y'
ELSE 'N'
END AS member
FROM sales s
LEFT JOIN members m
ON s.customer_id = m.customer_id;

Rank All The Things
Danny also requires further information about the ranking of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ranking values for the records when customers are not yet part of the loyalty program.

WITH cte AS
CASE
WHEN s.order_date >= '2021-01-07' AND m.join_date IS NOT NULL THEN 'Y'
WHEN s.order_date >= '2021-01-09' AND m.join_date IS NOT NULL THEN 'Y'
ELSE 'N'
END AS member
FROM sales s
LEFT JOIN members m
ON s.customer_id = m.customer_id)
SELECT *,
CASE
WHEN member = 'N' THEN NULL
ELSE RANK() OVER w
END AS ranking
FROM cte
WINDOW w AS (PARTITION BY s.customer_id, member ORDER BY s.order_date)

Summary of insights

From the analysis, I discovered a few interesting insights that would be certainly useful for Danny.

• Customer B is the most frequent visitor with 6 visits in Jan 2021.
• Danny’s Diner’s most popular item is ramen.
• Customer A and C loves ramen whereas Customer B seems to enjoy sushi, curry and ramen equally.
• Customer A is the 1st member of Danny’s Diner and his first order is curry.
• Before they became members, Customer A and Customer B spent \$25 and \$40 respectively.
• Throughout Jan 2021, Customer A, Customer B and Customer C had 860 points, 940 points and Customer C: 360 points respectively.
• Assuming that members can earn 2x points a week from the day they became a member — not just sushi, Customer A has 1370 points and Customer B has 820 points by the end of Jan 2021.

It’s a wrap!

Feel free to share your opinion about my analysis in the comments. Suggestions on how to optimize my SQL code for performance are also welcome.

Happy querying folks 👋