Introduction
Danny was scrolling through his Instagram feed when something really caught his eye — “80s Retro Styling 🎸 and Pizza 🍕 Is The Future!”
Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire — so he had one more genius idea to combine with it — he was going to Uberize it — and so Pizza Runner was launched!
Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.
Table Relationship
- customer_orders — Customers’ pizza orders with 1 row each for individual pizza with topping exclusions and extras, and order time.
- runner_orders — Orders assigned to runners documenting the pickup time, distance and duration from Pizza Runner HQ to customer, and cancellation remark. runners — Runner IDs and registration date
- pizza_names — Pizza IDs and name
- pizza_recipes — Pizza IDs and topping names
- pizza_toppings — Topping IDs and name
Case Study Questions
This case study has LOTS of questions — they are broken up by area of focus including:
- A. Pizza Metrics
- B. Runner and Customer Experience
- C. Ingredient Optimisation
- D. Pricing and Ratings
- E. Bonus DML Challenges (DML = Data Manipulation Language)
Data Cleaning and Transformation
Before I start with the solutions, I investigate the data and found that there are some cleaning and transformation to do, specifically on the
-
null
values and data types in thecustomer_orders
table -
null
values and data types in therunner_orders
table - Alter data type in
pizza_names
table
Firstly, to clean up exclusions
and extras
in the customer_orders
— we create TEMP TABLE #customer_orders
and use CASE WHEN.
SELECT order_id, customer_id, pizza_id,
CASE
WHEN exclusions IS null OR exclusions LIKE 'null' THEN ' '
ELSE exclusions
END AS exclusions,
CASE
WHEN extras IS NULL or extras LIKE 'null' THEN ' '
ELSE extras
END AS extras,
order_time
INTO #customer_orders -- create TEMP TABLE
FROM customer_orders;
Then, we clean the runner_orders
table with CASE WHEN and TRIM and create TEMP TABLE #runner_orders
.
In summary,
- pickup_time — Remove nulls and replace with ‘ ‘
- distance — Remove ‘km’ and nulls
- duration — Remove ‘minutes’ and nulls
- cancellation — Remove NULL and null and replace with ‘ ‘
SELECT order_id, runner_id,
CASE
WHEN pickup_time LIKE 'null' THEN ' '
ELSE pickup_time
END AS pickup_time,
CASE
WHEN distance LIKE 'null' THEN ' '
WHEN distance LIKE '%km' THEN TRIM('km' from distance)
ELSE distance END AS distance,
CASE
WHEN duration LIKE 'null' THEN ' '
WHEN duration LIKE '%mins' THEN TRIM('mins' from duration)
WHEN duration LIKE '%minute' THEN TRIM('minute' from duration)
WHEN duration LIKE '%minutes' THEN TRIM('minutes' from duration)
ELSE duration END AS duration,
CASE
WHEN cancellation IS NULL or cancellation LIKE 'null' THEN ''
ELSE cancellation END AS cancellation
INTO #runner_orders
FROM runner_orders;
Then, we alter the date according to its correct data type.
- pickup_time to DATETIME type
- distance to FLOAT type
- duration to INT type
ALTER TABLE #runner_orders
ALTER COLUMN pickup_time DATETIME,
ALTER COLUMN distance FLOAT,
ALTER COLUMN duration INT;
Now that the data has been cleaned and transformed, let’s move on solving the questions! 😉
Solution
A. Pizza Metrics
How many pizzas were ordered?
SELECT COUNT(*) AS pizza_order_count
FROM #customer_orders;
Answer:
- Total pizzas ordered are 14.
- How many unique customer orders were made?
SELECT COUNT(DISTINCT order_id) AS unique_order_count
FROM #customer_orders;
Answer:
- There are 10 unique customer orders made.
- How many successful orders were delivered by each runner?
SELECT runner_id, COUNT(order_id) AS successful_orders
FROM #runner_orders
WHERE distance != 0
GROUP BY runner_id;
Answer:
- Runner 1 has 4 successful delivered orders.
- Runner 2 has 3 successful delivered orders.
- Runner 3 has 1 successful delivered order.
- How many of each type of pizza was delivered?
SELECT p.pizza_name, COUNT(c.pizza_id) AS delivered_pizza_count
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
JOIN pizza_names AS p
ON c.pizza_id = p.pizza_id
WHERE r.distance != 0
GROUP BY p.pizza_name;
Answer:
- There are 9 delivered Meatlovers pizzas.
- There are 3 delivered Vegetarian pizzas.
- How many Vegetarian and Meatlovers were ordered by each customer?
SELECT c.customer_id, p.pizza_name, COUNT(p.pizza_name) AS order_count
FROM #customer_orders AS c
JOIN pizza_names AS p
ON c.pizza_id= p.pizza_id
GROUP BY c.customer_id, p.pizza_name
ORDER BY c.customer_id;
Answer:
- Customer 101 ordered 2 Meatlovers pizzas and 1 Vegetarian pizza.
- Customer 102 ordered 2 Meatlovers pizzas and 2 Vegetarian pizzas.
- Customer 103 ordered 3 Meatlovers pizzas and 1 Vegetarian pizza.
- Customer 104 ordered 1 Meatlovers pizza.
- Customer 105 ordered 1 Vegetarian pizza.
- What was the maximum number of pizzas delivered in a single order?
WITH pizza_count_cte AS
(
SELECT c.order_id, COUNT(c.pizza_id) AS pizza_per_order
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.order_id
)
SELECT MAX(pizza_per_order) AS pizza_count
FROM pizza_count_cte;
Answer:
- Maximum number of pizza delivered in a single order is 3 pizzas.
- For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
SELECT c.customer_id,
SUM(CASE
WHEN c.exclusions <> ' ' OR c.extras <> ' ' THEN 1
ELSE 0
END) AS at_least_1_change,
SUM(CASE
WHEN c.exclusions = ' ' AND c.extras = ' ' THEN 1
ELSE 0
END) AS no_change
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.customer_id
ORDER BY c.customer_id;
Answer:
- Customer 101 and 102 likes his/her pizzas per the original recipe.
- Customer 103, 104 and 105 have their own preference for pizza topping and requested at least 1 change (extra or exclusion topping) on their pizza.
- How many pizzas were delivered that had both exclusions and extras?
SELECT
SUM(CASE
WHEN exclusions IS NOT NULL AND extras IS NOT NULL THEN 1
ELSE 0
END) AS pizza_count_w_exclusions_extras
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance >= 1
AND exclusions <> ' '
AND extras <> ' ';
Answer:
- Only 1 pizza delivered that had both extra and exclusion topping. That’s one fussy customer!
- What was the total volume of pizzas ordered for each hour of the day?
SELECT DATEPART(HOUR, [order_time]) AS hour_of_day,
COUNT(order_id) AS pizza_count
FROM #customer_orders
GROUP BY DATEPART(HOUR, [order_time]);
Answer:
- Highest volume of pizza ordered is at 13 (1:00 pm), 18 (6:00 pm) and 21 (9:00 pm).
- Lowest volume of pizza ordered is at 11 (11:00 am), 19 (7:00 pm) and 23 (11:00 pm).
- What was the volume of orders for each day of the week?
SELECT FORMAT(DATEADD(DAY, 2, order_time),'dddd') AS day_of_week,
-- add 2 to adjust 1st day of the week as Monday
COUNT(order_id) AS total_pizzas_ordered
FROM #customer_orders
GROUP BY FORMAT(DATEADD(DAY, 2, order_time),'dddd');
Answer:
- There are 5 pizzas ordered on Friday and Monday.
- There are 3 pizzas ordered on Saturday.
- There is 1 pizza ordered on Sunday.
B. Runner and Customer Experience
How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
SELECT DATEPART(WEEK, registration_date) AS registration_week,
COUNT(runner_id) AS runner_signup
FROM runners
GROUP BY DATEPART(WEEK, registration_date);
Answer:
- On Week 1 of Jan 2021, 2 new runners signed up.
- On Week 2 and 3 of Jan 2021, 1 new runner signed up.
- What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
WITH time_taken_cte AS
(
SELECT c.order_id, c.order_time, r.pickup_time,
DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS pickup_minutes
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.order_id, c.order_time, r.pickup_time
)
SELECT AVG(pickup_minutes) AS avg_pickup_minutes
FROM time_taken_cte
WHERE pickup_minutes > 1;
Answer:
- The average time taken in minutes by runners to arrive at Pizza Runner HQ to pick up the order is 15 minutes.
- Is there any relationship between the number of pizzas and how long the order takes to prepare?
WITH prep_time_cte AS
(
SELECT c.order_id, COUNT(c.order_id) AS pizza_order,
c.order_time, r.pickup_time,
DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS prep_time_minutes
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.order_id, c.order_time, r.pickup_time
)
SELECT pizza_order, AVG(prep_time_minutes) AS avg_prep_time_minutes
FROM prep_time_cte
WHERE prep_time_minutes > 1
GROUP BY pizza_order;
Answer:
- On average, a single pizza order takes 12 minutes to prepare.
- An order with 3 pizzas takes 30 minutes at an average of 10 minutes per pizza.
- It takes 16 minutes to prepare an order with 2 pizzas which is 8 minutes per pizza — making 2 pizzas in a single order the ultimate efficiency rate.
- What was the average distance travelled for each customer?
SELECT c.customer_id, AVG(r.distance) AS avg_distance
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.duration != 0
GROUP BY c.customer_id;
Answer:
(Assuming that distance is calculated from Pizza Runner HQ to customer’s place)
- Customer 104 stays the nearest to Pizza Runner HQ at average distance of 10km, whereas Customer 105 stays the furthest at 25km.
- What was the difference between the longest and shortest delivery times for all orders?
Firstly, let’s see all the durations for the orders.
SELECT order_id, duration
FROM #runner_orders
WHERE duration not like ' ';
Then, we find the difference by deducting the shortest (MIN) from the longest (MAX) delivery times.
SELECT
MAX(duration::NUMERIC) - MIN(duration::NUMERIC) AS delivery_time_difference
FROM #runner_orders
WHERE duration not like '% %'
Answer:
- The difference between longest (40 minutes) and shortest (10 minutes) delivery time for all orders is 30 minutes.
- What was the average speed for each runner for each delivery and do you notice any trend for these values?
SELECT r.runner_id, c.customer_id, c.order_id,
COUNT(c.order_id) AS pizza_count,
r.distance, (r.duration / 60) AS duration_hr ,
ROUND((r.distance/r.duration * 60), 2) AS avg_speed
FROM #runner_orders AS r
JOIN #customer_orders AS c
ON r.order_id = c.order_id
WHERE distance != 0
GROUP BY r.runner_id, c.customer_id, c.order_id, r.distance, r.duration
ORDER BY c.order_id;
Answer:
(Average speed = Distance in km / Duration in hour)
- Runner 1’s average speed runs from 37.5km/h to 60km/h.
- Runner 2’s average speed runs from 35.1km/h to 93.6km/h. Danny should investigate Runner 2 as the average speed has a 300% fluctuation rate!
- Runner 3’s average speed is 40km/h
- What is the successful delivery percentage for each runner?
SELECT runner_id,
ROUND(100 * SUM
(CASE WHEN distance = 0 THEN 0
ELSE 1
END) / COUNT(*), 0) AS success_perc
FROM #runner_orders
GROUP BY runner_id;
Answer:
- Runner 1 has 100% successful delivery.
- Runner 2 has 75% successful delivery.
Runner 3 has 50% successful delivery
(It’s not right to attribute successful delivery to runners as order cancellations are out of the runner’s control.)-
I will continue with Part A, B and C soon!
C. Ingredient Optimisation
What are the standard ingredients for each pizza?
What was the most commonly added extra?
What was the most common exclusion?
Generate an order item for each record in the customers_orders table in the format of one of the following:
Meat Lovers
Meat Lovers
-Exclude Beef
Meat Lovers - Extra Bacon
Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
- Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
- What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
D. Pricing and Ratings
If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes — how much money has Pizza Runner made so far if there are no delivery fees?
What if there was an additional $1 charge for any pizza extras?
- Add cheese is $1 extra
The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset — generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
Using your newly generated table — can you join all of the information together to form a table which has the following information for successful deliveries?
customer_id
order_id
-
runner_id
-rating
-order_time
pickup_time
- Time between order and pickup
- Delivery duration
- Average speed
- Total number of pizzas
- If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre travelled — how much money does Pizza Runner have left over after these deliveries?
E. Bonus Questions
If Danny wants to expand his range of pizzas — how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?
Top comments (2)
I've not read the whole article (too long!) But your first two code blocks could have been achieved using ISNULL(), REPLACE() and CAST() and avoided the CASE statements and ALTER column types. Cleaner code and less steps.
Hi Aaron thanks for your suggestion, i'm just started MySql .