DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

yaswanthteja
yaswanthteja

Posted on

8 Week SQL Challenge: Case Study #2 Pizza Runner

Image description

Introduction

Danny was scrolling through his Instagram feed when something really caught his eye β€” β€œ80s Retro Styling 🎸 and Pizza πŸ• Is The Future!”

Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire β€” so he had one more genius idea to combine with it β€” he was going to Uberize it β€” and so Pizza Runner was launched!

Danny started by recruiting β€œrunners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

Table Relationship

  • customer_orders β€” Customers’ pizza orders with 1 row each for individual pizza with topping exclusions and extras, and order time.
  • runner_orders β€” Orders assigned to runners documenting the pickup time, distance and duration from Pizza Runner HQ to customer, and cancellation remark. runners β€” Runner IDs and registration date
  • pizza_names β€” Pizza IDs and name
  • pizza_recipes β€” Pizza IDs and topping names
  • pizza_toppings β€” Topping IDs and name

Image description

Case Study Questions

This case study has LOTS of questions β€” they are broken up by area of focus including:

  • A. Pizza Metrics
  • B. Runner and Customer Experience
  • C. Ingredient Optimisation
  • D. Pricing and Ratings
  • E. Bonus DML Challenges (DML = Data Manipulation Language)

Data Cleaning and Transformation

Before I start with the solutions, I investigate the data and found that there are some cleaning and transformation to do, specifically on the

  • null values and data types in the customer_orders table
  • null values and data types in the runner_orders table
  • Alter data type in pizza_names table

Firstly, to clean up exclusions and extras in the customer_orders β€” we create TEMP TABLE #customer_orders and use CASE WHEN.

SELECT order_id, customer_id, pizza_id, 
  CASE 
    WHEN exclusions IS null OR exclusions LIKE 'null' THEN ' '
    ELSE exclusions
    END AS exclusions,
  CASE 
    WHEN extras IS NULL or extras LIKE 'null' THEN ' '
    ELSE extras 
    END AS extras, 
  order_time
INTO #customer_orders -- create TEMP TABLE
FROM customer_orders;
Enter fullscreen mode Exit fullscreen mode

Then, we clean the runner_orders table with CASE WHEN and TRIM and create TEMP TABLE #runner_orders.

In summary,

  • pickup_time β€” Remove nulls and replace with β€˜ β€˜
  • distance β€” Remove β€˜km’ and nulls
  • duration β€” Remove β€˜minutes’ and nulls
  • cancellation β€” Remove NULL and null and replace with β€˜ β€˜
SELECT order_id, runner_id,
  CASE 
    WHEN pickup_time LIKE 'null' THEN ' '
    ELSE pickup_time 
    END AS pickup_time,
  CASE 
    WHEN distance LIKE 'null' THEN ' '
    WHEN distance LIKE '%km' THEN TRIM('km' from distance) 
    ELSE distance END AS distance,
  CASE 
    WHEN duration LIKE 'null' THEN ' ' 
    WHEN duration LIKE '%mins' THEN TRIM('mins' from duration) 
    WHEN duration LIKE '%minute' THEN TRIM('minute' from duration)        
    WHEN duration LIKE '%minutes' THEN TRIM('minutes' from duration)       
    ELSE duration END AS duration,
  CASE 
    WHEN cancellation IS NULL or cancellation LIKE 'null' THEN ''
    ELSE cancellation END AS cancellation
INTO #runner_orders
FROM runner_orders;
Enter fullscreen mode Exit fullscreen mode

Then, we alter the date according to its correct data type.

  • pickup_time to DATETIME type
  • distance to FLOAT type
  • duration to INT type
ALTER TABLE #runner_orders
ALTER COLUMN pickup_time DATETIME,
ALTER COLUMN distance FLOAT, 
ALTER COLUMN duration INT;
Enter fullscreen mode Exit fullscreen mode

Now that the data has been cleaned and transformed, let’s move on solving the questions! πŸ˜‰

Solution

A. Pizza Metrics

How many pizzas were ordered?

SELECT COUNT(*) AS pizza_order_count
FROM #customer_orders;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Total pizzas ordered are 14.
  • How many unique customer orders were made?
SELECT COUNT(DISTINCT order_id) AS unique_order_count
FROM #customer_orders;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • There are 10 unique customer orders made.
  1. How many successful orders were delivered by each runner?
SELECT runner_id, COUNT(order_id) AS successful_orders
FROM #runner_orders
WHERE distance != 0
GROUP BY runner_id;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Runner 1 has 4 successful delivered orders.
  • Runner 2 has 3 successful delivered orders.
  • Runner 3 has 1 successful delivered order.
  1. How many of each type of pizza was delivered?
SELECT p.pizza_name, COUNT(c.pizza_id) AS delivered_pizza_count
FROM #customer_orders AS c
JOIN #runner_orders AS r
 ON c.order_id = r.order_id
JOIN pizza_names AS p
 ON c.pizza_id = p.pizza_id
WHERE r.distance != 0
GROUP BY p.pizza_name;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • There are 9 delivered Meatlovers pizzas.
  • There are 3 delivered Vegetarian pizzas.
  1. How many Vegetarian and Meatlovers were ordered by each customer?
SELECT c.customer_id, p.pizza_name, COUNT(p.pizza_name) AS order_count
FROM #customer_orders AS c
JOIN pizza_names AS p
 ON c.pizza_id= p.pizza_id
GROUP BY c.customer_id, p.pizza_name
ORDER BY c.customer_id;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Customer 101 ordered 2 Meatlovers pizzas and 1 Vegetarian pizza.
  • Customer 102 ordered 2 Meatlovers pizzas and 2 Vegetarian pizzas.
  • Customer 103 ordered 3 Meatlovers pizzas and 1 Vegetarian pizza.
  • Customer 104 ordered 1 Meatlovers pizza.
  • Customer 105 ordered 1 Vegetarian pizza.
  1. What was the maximum number of pizzas delivered in a single order?
WITH pizza_count_cte AS
(
 SELECT c.order_id, COUNT(c.pizza_id) AS pizza_per_order
 FROM #customer_orders AS c
 JOIN #runner_orders AS r
  ON c.order_id = r.order_id
 WHERE r.distance != 0
 GROUP BY c.order_id
)
Enter fullscreen mode Exit fullscreen mode

Image description

SELECT MAX(pizza_per_order) AS pizza_count
FROM pizza_count_cte;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Maximum number of pizza delivered in a single order is 3 pizzas.
  1. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
SELECT c.customer_id,
 SUM(CASE 
  WHEN c.exclusions <> ' ' OR c.extras <> ' ' THEN 1
  ELSE 0
  END) AS at_least_1_change,
 SUM(CASE 
  WHEN c.exclusions = ' ' AND c.extras = ' ' THEN 1 
  ELSE 0
  END) AS no_change
FROM #customer_orders AS c
JOIN #runner_orders AS r
 ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.customer_id
ORDER BY c.customer_id;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Customer 101 and 102 likes his/her pizzas per the original recipe.
  • Customer 103, 104 and 105 have their own preference for pizza topping and requested at least 1 change (extra or exclusion topping) on their pizza.
  1. How many pizzas were delivered that had both exclusions and extras?
SELECT  
 SUM(CASE
  WHEN exclusions IS NOT NULL AND extras IS NOT NULL THEN 1
  ELSE 0
  END) AS pizza_count_w_exclusions_extras
FROM #customer_orders AS c
JOIN #runner_orders AS r
 ON c.order_id = r.order_id
WHERE r.distance >= 1 
 AND exclusions <> ' ' 
 AND extras <> ' ';
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Only 1 pizza delivered that had both extra and exclusion topping. That’s one fussy customer!
  1. What was the total volume of pizzas ordered for each hour of the day?
SELECT DATEPART(HOUR, [order_time]) AS hour_of_day, 
 COUNT(order_id) AS pizza_count
FROM #customer_orders
GROUP BY DATEPART(HOUR, [order_time]);
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Highest volume of pizza ordered is at 13 (1:00 pm), 18 (6:00 pm) and 21 (9:00 pm).
  • Lowest volume of pizza ordered is at 11 (11:00 am), 19 (7:00 pm) and 23 (11:00 pm).
  • What was the volume of orders for each day of the week?
SELECT FORMAT(DATEADD(DAY, 2, order_time),'dddd') AS day_of_week, 
-- add 2 to adjust 1st day of the week as Monday
 COUNT(order_id) AS total_pizzas_ordered
FROM #customer_orders
GROUP BY FORMAT(DATEADD(DAY, 2, order_time),'dddd');
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • There are 5 pizzas ordered on Friday and Monday.
  • There are 3 pizzas ordered on Saturday.
  • There is 1 pizza ordered on Sunday.

B. Runner and Customer Experience

How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)

SELECT DATEPART(WEEK, registration_date) AS registration_week,
 COUNT(runner_id) AS runner_signup
FROM runners
GROUP BY DATEPART(WEEK, registration_date);
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • On Week 1 of Jan 2021, 2 new runners signed up.
  • On Week 2 and 3 of Jan 2021, 1 new runner signed up.
  1. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
WITH time_taken_cte AS
(
 SELECT c.order_id, c.order_time, r.pickup_time, 
  DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS pickup_minutes
 FROM #customer_orders AS c
 JOIN #runner_orders AS r
  ON c.order_id = r.order_id
 WHERE r.distance != 0
 GROUP BY c.order_id, c.order_time, r.pickup_time
)
Enter fullscreen mode Exit fullscreen mode

Image description

SELECT AVG(pickup_minutes) AS avg_pickup_minutes
FROM time_taken_cte
WHERE pickup_minutes > 1;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • The average time taken in minutes by runners to arrive at Pizza Runner HQ to pick up the order is 15 minutes.
  1. Is there any relationship between the number of pizzas and how long the order takes to prepare?
WITH prep_time_cte AS
(
 SELECT c.order_id, COUNT(c.order_id) AS pizza_order, 
  c.order_time, r.pickup_time, 
  DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS prep_time_minutes
 FROM #customer_orders AS c
 JOIN #runner_orders AS r
  ON c.order_id = r.order_id
 WHERE r.distance != 0
 GROUP BY c.order_id, c.order_time, r.pickup_time
)
Enter fullscreen mode Exit fullscreen mode

Image description

SELECT pizza_order, AVG(prep_time_minutes) AS avg_prep_time_minutes
FROM prep_time_cte
WHERE prep_time_minutes > 1
GROUP BY pizza_order;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • On average, a single pizza order takes 12 minutes to prepare.
  • An order with 3 pizzas takes 30 minutes at an average of 10 minutes per pizza.
  • It takes 16 minutes to prepare an order with 2 pizzas which is 8 minutes per pizza β€” making 2 pizzas in a single order the ultimate efficiency rate.
  1. What was the average distance travelled for each customer?
SELECT c.customer_id, AVG(r.distance) AS avg_distance
FROM #customer_orders AS c
JOIN #runner_orders AS r
 ON c.order_id = r.order_id
WHERE r.duration != 0
GROUP BY c.customer_id;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

(Assuming that distance is calculated from Pizza Runner HQ to customer’s place)

  • Customer 104 stays the nearest to Pizza Runner HQ at average distance of 10km, whereas Customer 105 stays the furthest at 25km.
  1. What was the difference between the longest and shortest delivery times for all orders?

Firstly, let’s see all the durations for the orders.

SELECT order_id, duration
FROM #runner_orders
WHERE duration not like ' ';
Enter fullscreen mode Exit fullscreen mode

Image description

Then, we find the difference by deducting the shortest (MIN) from the longest (MAX) delivery times.

SELECT 
    MAX(duration::NUMERIC) - MIN(duration::NUMERIC) AS delivery_time_difference
FROM #runner_orders
WHERE duration not like '% %'
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • The difference between longest (40 minutes) and shortest (10 minutes) delivery time for all orders is 30 minutes.
  1. What was the average speed for each runner for each delivery and do you notice any trend for these values?
SELECT r.runner_id, c.customer_id, c.order_id, 
 COUNT(c.order_id) AS pizza_count, 
 r.distance, (r.duration / 60) AS duration_hr , 
 ROUND((r.distance/r.duration * 60), 2) AS avg_speed
FROM #runner_orders AS r
JOIN #customer_orders AS c
 ON r.order_id = c.order_id
WHERE distance != 0
GROUP BY r.runner_id, c.customer_id, c.order_id, r.distance, r.duration
ORDER BY c.order_id;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

(Average speed = Distance in km / Duration in hour)

  • Runner 1’s average speed runs from 37.5km/h to 60km/h.
  • Runner 2’s average speed runs from 35.1km/h to 93.6km/h. Danny should investigate Runner 2 as the average speed has a 300% fluctuation rate!
  • Runner 3’s average speed is 40km/h
  1. What is the successful delivery percentage for each runner?
SELECT runner_id, 
 ROUND(100 * SUM
  (CASE WHEN distance = 0 THEN 0
  ELSE 1
  END) / COUNT(*), 0) AS success_perc
FROM #runner_orders
GROUP BY runner_id;
Enter fullscreen mode Exit fullscreen mode

Image description

Answer:

  • Runner 1 has 100% successful delivery.
  • Runner 2 has 75% successful delivery.
  • Runner 3 has 50% successful delivery
    (It’s not right to attribute successful delivery to runners as order cancellations are out of the runner’s control.)

  • I will continue with Part A, B and C soon!

    C. Ingredient Optimisation

  • What are the standard ingredients for each pizza?

  • What was the most commonly added extra?

  • What was the most common exclusion?

  • Generate an order item for each record in the customers_orders table in the format of one of the following:

  • Meat Lovers

  • Meat Lovers - Exclude Beef

  • Meat Lovers - Extra Bacon

  • Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers

  1. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients

For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"

  1. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?

D. Pricing and Ratings

If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes β€” how much money has Pizza Runner made so far if there are no delivery fees?
What if there was an additional $1 charge for any pizza extras?

  • Add cheese is $1 extra
  1. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset β€” generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.

  2. Using your newly generated table β€” can you join all of the information together to form a table which has the following information for successful deliveries?

  • customer_id
  • order_id
  • runner_id -rating -order_time
  • pickup_time
  • Time between order and pickup
  • Delivery duration
  • Average speed
  • Total number of pizzas
  1. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre travelled β€” how much money does Pizza Runner have left over after these deliveries?

E. Bonus Questions
If Danny wants to expand his range of pizzas β€” how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?

Top comments (2)

Collapse
 
aarone4 profile image
Aaron Reese

I've not read the whole article (too long!) But your first two code blocks could have been achieved using ISNULL(), REPLACE() and CAST() and avoided the CASE statements and ALTER column types. Cleaner code and less steps.

Collapse
 
yaswanthteja profile image
yaswanthteja • Edited on

Hi Aaron thanks for your suggestion, i'm just started MySql .

πŸ‘‹ New to DEV?

Head over to our Welcome Thread and tell us a bit about yourself!