This case study focuses on utilizing PostgreSQL, a popular relational database management system, to analyze sales data for Tiny Shop. By leveraging various SQL functionalities such as aggregations, CASE WHEN statements, window functions, joins, date-time functions, and Common Table Expressions (CTEs), we aim to extract valuable insights and answer key questions related to product prices, customer orders, revenue, and more.
Overview of the Data:
The Tiny Shop sales database contains tables such as Products, Customers, Orders, and Order_Items. The Products table includes information about the products, including their prices. The Customers table stores customer details, while the Orders table contains order-specific information. The Order_Items table links the orders with the products and includes quantity information.
The challenge for today is provided by Data in Motion and you can find the dataset here SQL Case Study 1: Tiny Shop Sales — Data in Motion (d-i-motion.com)
CREATE DATABASE SalesDB;
Lets create table in our database:
-- Customer Table
CREATE TABLE customers (
customer_id integer PRIMARY KEY,
first_name varchar(100),
last_name varchar(100),
email varchar(100)
);
-- Products Table
CREATE TABLE products (
product_id integer PRIMARY KEY,
product_name varchar(100),
price decimal
);
-- Orders Table
CREATE TABLE orders (
order_id integer PRIMARY KEY,
customer_id integer,
order_date date
);
-- Order Items Table
CREATE TABLE order_items (
order_id integer,
product_id integer,
quantity integer
);
Lets insert data into table
-- Customers Table
INSERT INTO customers (customer_id, first_name, last_name, email) VALUES
(1, 'John', 'Doe', 'johndoe@email.com'),
(2, 'Jane', 'Smith', 'janesmith@email.com'),
(3, 'Bob', 'Johnson', 'bobjohnson@email.com'),
(4, 'Alice', 'Brown', 'alicebrown@email.com'),
(5, 'Charlie', 'Davis', 'charliedavis@email.com'),
(6, 'Eva', 'Fisher', 'evafisher@email.com'),
(7, 'George', 'Harris', 'georgeharris@email.com'),
(8, 'Ivy', 'Jones', 'ivyjones@email.com'),
(9, 'Kevin', 'Miller', 'kevinmiller@email.com'),
(10, 'Lily', 'Nelson', 'lilynelson@email.com'),
(11, 'Oliver', 'Patterson', 'oliverpatterson@email.com'),
(12, 'Quinn', 'Roberts', 'quinnroberts@email.com'),
(13, 'Sophia', 'Thomas', 'sophiathomas@email.com');
-- Products Table
INSERT INTO products (product_id, product_name, price) VALUES
(1, 'Product A', 10.00),
(2, 'Product B', 15.00),
(3, 'Product C', 20.00),
(4, 'Product D', 25.00),
(5, 'Product E', 30.00),
(6, 'Product F', 35.00),
(7, 'Product G', 40.00),
(8, 'Product H', 45.00),
(9, 'Product I', 50.00),
(10, 'Product J', 55.00),
(11, 'Product K', 60.00),
(12, 'Product L', 65.00),
(13, 'Product M', 70.00);
-- Orders Table
INSERT INTO orders (order_id, customer_id, order_date) VALUES
(1, 1, '2023-05-01'),
(2, 2, '2023-05-02'),
(3, 3, '2023-05-03'),
(4, 1, '2023-05-04'),
(5, 2, '2023-05-05'),
(6, 3, '2023-05-06'),
(7, 4, '2023-05-07'),
(8, 5, '2023-05-08'),
(9, 6, '2023-05-09'),
(10, 7, '2023-05-10'),
(11, 8, '2023-05-11'),
(12, 9, '2023-05-12'),
(13, 10, '2023-05-13'),
(14, 11, '2023-05-14'),
(15, 12, '2023-05-15'),
(16, 13, '2023-05-16');
-- Order Items
INSERT INTO order_items (order_id, product_id, quantity) VALUES
(1, 1, 2),
(1, 2, 1),
(2, 2, 1),
(2, 3, 3),
(3, 1, 1),
(3, 3, 2),
(4, 2, 4),
(4, 3, 1),
(5, 1, 1),
(5, 3, 2),
(6, 2, 3),
(6, 1, 1),
(7, 4, 1),
(7, 5, 2),
(8, 6, 3),
(8, 7, 1),
(9, 8, 2),
(9, 9, 1),
(10, 10, 3),
(10, 11, 2),
(11, 12, 1),
(11, 13, 3),
(12, 4, 2),
(12, 5, 1),
(13, 6, 3),
(13, 7, 2),
(14, 8, 1),
(14, 9, 2),
(15, 10, 3),
(15, 11, 1),
(16, 12, 2),
(16, 13, 3);
QUESTIONS
--1. Which product has the highest price? Only return a single row.
To determine the product with the highest price, a simple query using the MAX function on the price column of the Products table can provide the desired result.
SELECT
product_name,
price
FROM products
ORDER BY price DESC
LIMIT 1;
--2. Which customer has made the most orders?
By utilizing an aggregation function like COUNT, we can group the orders by customer and retrieve the customer with the highest count.
SELECT
customers.first_name,
customers.last_name,
COUNT(orders.order_id) AS order_count
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
GROUP BY customers.first_name, customers.last_name
ORDER BY order_count DESC
LIMIT 1;
--3. What's the total revenue per product?
To calculate the total revenue per product, we can use joins and aggregations to combine the necessary tables and calculate the sum of the revenue for each product.
SELECT
pro.product_name,
SUM(pro.price * items.quantity) as total_revenue
FROM products pro
JOIN order_items items ON pro.product_id = items.product_id
GROUP BY pro.product_name
ORDER BY total_revenue DESC;
--4. Find the day with the highest revenue.
The SQL query retrieves the order date and the total revenue for each order, calculated by multiplying the price of each product with its corresponding quantity.
SELECT
ord.order_date,
SUM(pro.price * items.quantity) total_revenue
FROM products pro
JOIN order_items items ON pro.product_id = items.product_id
JOIN orders ord ON items.order_id = ord.order_id
GROUP BY ord.order_date
ORDER BY total_revenue DESC
LIMIT 1;
--5. Find the first order (by date) for each customer.
The SQL query retrieves the customer details of each customer, along with the minimum order date (min(ord.order_date)) for each customer. The results are then grouped by the customer's first name, last name, and order date. Finally, the results are sorted in ascending order based on the first order date.
SELECT
cus.first_name,
cus.last_name,
min(ord.order_date) first_order
FROM customers cus
JOIN orders ord ON cus.customer_id = ord.customer_id
GROUP BY cus.first_name, cus.last_name, ord.order_date
ORDER BY first_order;
--6) Find the top 3 customers who have ordered the most distinct products
The SQL query retrieves the customer details of each customer, along with the count of distinct product names ordered by each customer.
SELECT
cust.first_name,
cust.last_name,
COUNT(DISTINCT product_name) unique_products
FROM customers cust
JOIN orders ord ON cust.customer_id = ord.customer_id
JOIN order_items items ON ord.order_id = items.order_id
JOIN products prod ON items.product_id = prod.product_id
GROUP BY cust.first_name, cust.last_name
ORDER BY unique_products DESC
LIMIT 3;
--7. Which product has been bought the least in terms of quantity?
The SQL query retrieves the product ID and the sum of quantities for each product. The results are then grouped by the product ID and ordered in ascending order based on the total quantities.
SELECT
prod.product_id,
SUM(items.quantity) Total_Quantities
FROM order_items items
JOIN products prod ON items.product_id = prod.product_id
GROUP BY prod.product_id
ORDER BY Total_Quantities
LIMIT 3;
--8) What is the median order total?
Using window functions and aggregations, we can calculate the median order total by sorting the orders by total and selecting the middle value.
WITH order_totals AS (
SELECT
ord.order_id,
SUM(prod.price * items.quantity) AS total
FROM orders ord
JOIN order_items items ON ord.order_id = items.order_id
JOIN products prod ON items.product_id = prod.product_id
GROUP BY ord.order_id
)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total) AS median_order_total
FROM order_totals;
--9) For each order, determine if it was ‘Expensive’ (total over 300), ‘Affordable’ (total over 100), or ‘Cheap’.
Using CASE WHEN statements, we can classify each order based on its total amount into categories like 'Expensive', 'Affordable', or 'Cheap'.
SELECT
order_id,
CASE
WHEN revenue > 300 THEN 'Expensive'
WHEN revenue > 100 THEN 'Affordable'
ELSE 'Cheap'
END AS price_bracket
FROM (
SELECT
order_id,
sum((price * quantity)) as revenue
FROM products prod
LEFT JOIN order_items items ON prod.product_id = items.product_id
GROUP BY order_id
) as total_order;
--10) Find customers who have ordered the product with the highest price.
Through joins and subqueries, we can retrieve the customers who have ordered the product with the highest price by matching the product's price with the orders.
SELECT
customer_id,
cust.first_name,
cust.last_name
FROM customers cust
LEFT JOIN orders
USING (customer_id)
LEFT JOIN order_items items
USING (order_id)
LEFT JOIN products prod
USING (product_id)
WHERE price = (
SELECT MAX(price)
FROM products
);
Conclusion:
This case study highlights the practical application of PostgreSQL and SQL functionalities in analyzing sales data for Tiny Shop. By utilizing various SQL techniques, we can gain valuable insights into product pricing, customer behavior, revenue analysis, and order categorization. The versatility of PostgreSQL empowers businesses to extract meaningful information from their data, enabling informed decision-making and facilitating the growth and success of their operations.
Top comments (2)
Nice article. Good explanation. You can try this SQL on SQLIZE.online
Thank you for reading the article and sharing the resource. I'll try it.