This case study focuses on utilizing PostgreSQL, a popular relational database management system, to analyze sales data for Tiny Shop. By leveraging various SQL functionalities such as aggregations, CASE WHEN statements, window functions, joins, date-time functions, and Common Table Expressions (CTEs), we aim to extract valuable insights and answer key questions related to product prices, customer orders, revenue, and more.
Overview of the Data:
The Tiny Shop sales database contains tables such as Products, Customers, Orders, and Order_Items. The Products table includes information about the products, including their prices. The Customers table stores customer details, while the Orders table contains order-specific information. The Order_Items table links the orders with the products and includes quantity information.
The challenge for today is provided by Data in Motion and you can find the dataset here SQL Case Study 1: Tiny Shop Sales — Data in Motion (d-i-motion.com)
CREATE DATABASE SalesDB;
Lets create table in our database:
-- Customer Table CREATE TABLE customers ( customer_id integer PRIMARY KEY, first_name varchar(100), last_name varchar(100), email varchar(100) );
-- Products Table CREATE TABLE products ( product_id integer PRIMARY KEY, product_name varchar(100), price decimal );
-- Orders Table CREATE TABLE orders ( order_id integer PRIMARY KEY, customer_id integer, order_date date );
-- Order Items Table CREATE TABLE order_items ( order_id integer, product_id integer, quantity integer );
Lets insert data into table
-- Customers Table INSERT INTO customers (customer_id, first_name, last_name, email) VALUES (1, 'John', 'Doe', 'email@example.com'), (2, 'Jane', 'Smith', 'firstname.lastname@example.org'), (3, 'Bob', 'Johnson', 'email@example.com'), (4, 'Alice', 'Brown', 'firstname.lastname@example.org'), (5, 'Charlie', 'Davis', 'email@example.com'), (6, 'Eva', 'Fisher', 'firstname.lastname@example.org'), (7, 'George', 'Harris', 'email@example.com'), (8, 'Ivy', 'Jones', 'firstname.lastname@example.org'), (9, 'Kevin', 'Miller', 'email@example.com'), (10, 'Lily', 'Nelson', 'firstname.lastname@example.org'), (11, 'Oliver', 'Patterson', 'email@example.com'), (12, 'Quinn', 'Roberts', 'firstname.lastname@example.org'), (13, 'Sophia', 'Thomas', 'email@example.com');
-- Products Table INSERT INTO products (product_id, product_name, price) VALUES (1, 'Product A', 10.00), (2, 'Product B', 15.00), (3, 'Product C', 20.00), (4, 'Product D', 25.00), (5, 'Product E', 30.00), (6, 'Product F', 35.00), (7, 'Product G', 40.00), (8, 'Product H', 45.00), (9, 'Product I', 50.00), (10, 'Product J', 55.00), (11, 'Product K', 60.00), (12, 'Product L', 65.00), (13, 'Product M', 70.00);
-- Orders Table INSERT INTO orders (order_id, customer_id, order_date) VALUES (1, 1, '2023-05-01'), (2, 2, '2023-05-02'), (3, 3, '2023-05-03'), (4, 1, '2023-05-04'), (5, 2, '2023-05-05'), (6, 3, '2023-05-06'), (7, 4, '2023-05-07'), (8, 5, '2023-05-08'), (9, 6, '2023-05-09'), (10, 7, '2023-05-10'), (11, 8, '2023-05-11'), (12, 9, '2023-05-12'), (13, 10, '2023-05-13'), (14, 11, '2023-05-14'), (15, 12, '2023-05-15'), (16, 13, '2023-05-16');
-- Order Items INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 1, 2), (1, 2, 1), (2, 2, 1), (2, 3, 3), (3, 1, 1), (3, 3, 2), (4, 2, 4), (4, 3, 1), (5, 1, 1), (5, 3, 2), (6, 2, 3), (6, 1, 1), (7, 4, 1), (7, 5, 2), (8, 6, 3), (8, 7, 1), (9, 8, 2), (9, 9, 1), (10, 10, 3), (10, 11, 2), (11, 12, 1), (11, 13, 3), (12, 4, 2), (12, 5, 1), (13, 6, 3), (13, 7, 2), (14, 8, 1), (14, 9, 2), (15, 10, 3), (15, 11, 1), (16, 12, 2), (16, 13, 3);
--1. Which product has the highest price? Only return a single row.
To determine the product with the highest price, a simple query using the MAX function on the price column of the Products table can provide the desired result.
SELECT product_name, price FROM products ORDER BY price DESC LIMIT 1;
--2. Which customer has made the most orders?
By utilizing an aggregation function like COUNT, we can group the orders by customer and retrieve the customer with the highest count.
SELECT customers.first_name, customers.last_name, COUNT(orders.order_id) AS order_count FROM customers JOIN orders ON customers.customer_id = orders.customer_id GROUP BY customers.first_name, customers.last_name ORDER BY order_count DESC LIMIT 1;
--3. What's the total revenue per product?
To calculate the total revenue per product, we can use joins and aggregations to combine the necessary tables and calculate the sum of the revenue for each product.
SELECT pro.product_name, SUM(pro.price * items.quantity) as total_revenue FROM products pro JOIN order_items items ON pro.product_id = items.product_id GROUP BY pro.product_name ORDER BY total_revenue DESC;
--4. Find the day with the highest revenue.
The SQL query retrieves the order date and the total revenue for each order, calculated by multiplying the price of each product with its corresponding quantity.
SELECT ord.order_date, SUM(pro.price * items.quantity) total_revenue FROM products pro JOIN order_items items ON pro.product_id = items.product_id JOIN orders ord ON items.order_id = ord.order_id GROUP BY ord.order_date ORDER BY total_revenue DESC LIMIT 1;
--5. Find the first order (by date) for each customer.
The SQL query retrieves the customer details of each customer, along with the minimum order date (min(ord.order_date)) for each customer. The results are then grouped by the customer's first name, last name, and order date. Finally, the results are sorted in ascending order based on the first order date.
SELECT cus.first_name, cus.last_name, min(ord.order_date) first_order FROM customers cus JOIN orders ord ON cus.customer_id = ord.customer_id GROUP BY cus.first_name, cus.last_name, ord.order_date ORDER BY first_order;
--6) Find the top 3 customers who have ordered the most distinct products
The SQL query retrieves the customer details of each customer, along with the count of distinct product names ordered by each customer.
SELECT cust.first_name, cust.last_name, COUNT(DISTINCT product_name) unique_products FROM customers cust JOIN orders ord ON cust.customer_id = ord.customer_id JOIN order_items items ON ord.order_id = items.order_id JOIN products prod ON items.product_id = prod.product_id GROUP BY cust.first_name, cust.last_name ORDER BY unique_products DESC LIMIT 3;
--7. Which product has been bought the least in terms of quantity?
The SQL query retrieves the product ID and the sum of quantities for each product. The results are then grouped by the product ID and ordered in ascending order based on the total quantities.
SELECT prod.product_id, SUM(items.quantity) Total_Quantities FROM order_items items JOIN products prod ON items.product_id = prod.product_id GROUP BY prod.product_id ORDER BY Total_Quantities LIMIT 3;
--8) What is the median order total?
Using window functions and aggregations, we can calculate the median order total by sorting the orders by total and selecting the middle value.
WITH order_totals AS ( SELECT ord.order_id, SUM(prod.price * items.quantity) AS total FROM orders ord JOIN order_items items ON ord.order_id = items.order_id JOIN products prod ON items.product_id = prod.product_id GROUP BY ord.order_id ) SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total) AS median_order_total FROM order_totals;
--9) For each order, determine if it was ‘Expensive’ (total over 300), ‘Affordable’ (total over 100), or ‘Cheap’.
Using CASE WHEN statements, we can classify each order based on its total amount into categories like 'Expensive', 'Affordable', or 'Cheap'.
SELECT order_id, CASE WHEN revenue > 300 THEN 'Expensive' WHEN revenue > 100 THEN 'Affordable' ELSE 'Cheap' END AS price_bracket FROM ( SELECT order_id, sum((price * quantity)) as revenue FROM products prod LEFT JOIN order_items items ON prod.product_id = items.product_id GROUP BY order_id ) as total_order;
--10) Find customers who have ordered the product with the highest price.
Through joins and subqueries, we can retrieve the customers who have ordered the product with the highest price by matching the product's price with the orders.
SELECT customer_id, cust.first_name, cust.last_name FROM customers cust LEFT JOIN orders USING (customer_id) LEFT JOIN order_items items USING (order_id) LEFT JOIN products prod USING (product_id) WHERE price = ( SELECT MAX(price) FROM products );
This case study highlights the practical application of PostgreSQL and SQL functionalities in analyzing sales data for Tiny Shop. By utilizing various SQL techniques, we can gain valuable insights into product pricing, customer behavior, revenue analysis, and order categorization. The versatility of PostgreSQL empowers businesses to extract meaningful information from their data, enabling informed decision-making and facilitating the growth and success of their operations.