The Problem
The problem at hand involves two tables, Products
and Orders
.
Table: Products
Column Name | Type |
---|---|
product_id | int |
product_name | varchar |
product_category | varchar |
In this table, product_id
is the primary key, and it contains data about the company's products.
Table: Orders
Column Name | Type |
---|---|
product_id | int |
order_date | date |
unit | int |
The Orders
table does not have a primary key, which means it can contain duplicate rows. product_id
in this table is a foreign key referencing the Products
table. The unit
column represents the number of products ordered on order_date
.
The task is to write an SQL query to get the names of products that have at least 100 units ordered in February 2020 and their amount. The result can be returned in any order.
Explanation
Let's consider the following input data:
Products table:
product_id | product_name | product_category |
---|---|---|
1 | Leetcode Solutions | Book |
2 | Jewels of Stringology | Book |
3 | HP | Laptop |
4 | Lenovo | Laptop |
5 | Leetcode Kit | T-shirt |
Orders table:
product_id | order_date | unit |
---|---|---|
1 | 2020-02-05 | 60 |
1 | 2020-02-10 | 70 |
2 | 2020-01-18 | 30 |
2 | 2020-02-11 | 80 |
3 | 2020-02-17 | 2 |
3 | 2020-02-24 | 3 |
4 | 2020-03-01 | 20 |
4 | 2020-03-04 | 30 |
4 | 2020-03-04 | 60 |
5 | 2020-02-25 | 50 |
5 | 2020-02-27 | 50 |
5 | 2020-03-01 | 50 |
From the above input, we can deduce the following:
- Products with
product_id = 1
are ordered in February a total of (60 + 70) = 130. - Products with
product_id = 2
are ordered in February a total of 80. - Products with
product_id = 3
are ordered in February a total of (2 + 3) = 5. - Products with
product_id = 4
were not ordered in February 2020. - Products with
product_id = 5
are ordered in February a total of (50 + 50) = 100.
Thus, the expected output is:
product_name | unit |
---|---|
Leetcode Solutions | 130 |
Leetcode Kit | 100 |
The Solution
There are several ways to solve this problem. We will be discussing three different approaches: basic JOIN and aggregate functions, using the DISTINCT keyword with window function, and using a subquery. Each method has its own advantages and disadvantages in terms of readability, scalability, and execution speed.
Source Code 1
The first method uses a simple JOIN operation between the two tables on product_id
, filters the Orders
based on the month and year of order_date
, groups by product_name
, and sums the unit
values. Products with a total of 100 units or more are included in the result set.
SELECT
p.product_name,
SUM(o.unit) [unit]
FROM Products p JOIN Orders o ON p.product_id = o.product_id
WHERE MONTH(o.order_date) = 2
AND YEAR(o.order_date) = 2020
GROUP BY
p.product_name
HAVING SUM(o.unit) >= 100
The runtime for this solution is 856ms, beating 19.90% of submissions on LeetCode.
Source Code 2
The second approach uses the DISTINCT keyword along with a window function, providing a more sophisticated solution that can be useful in more complex scenarios.
SELECT
p.product_name,
sum_unit [unit]
FROM Products p JOIN (
SELECT DISTINCT
product_id,
SUM(unit) OVER (PARTITION BY product_id) [sum_unit]
FROM Orders
WHERE MONTH(order_date) = 2
AND YEAR(order_date) = 2020
) o ON p.product_id = o.product_id
WHERE sum_unit >= 100
This solution performs faster than the previous one, with a runtime of 613ms, beating 62.56% of submissions on LeetCode.
Source Code 3
The third approach uses a subquery in the JOIN operation. This method makes it easier to read and understand the code, but it can lead to slower execution in certain cases.
SELECT
p.product_name,
sum_unit [unit]
FROM Products p JOIN (
SELECT
product_id,
SUM(unit) [sum_unit]
FROM Orders
WHERE MONTH(order_date) = 2
AND YEAR(order_date) = 2020
GROUP BY product_id
) o ON p.product_id = o.product_id
WHERE sum_unit >= 100
This solution has a runtime of 1231ms, beating 7.42% of submissions on LeetCode.
Conclusion
Each of the above solutions provides a valid answer to the problem, albeit with different performance results on LeetCode.
According to the LeetCode metrics, the second solution (Source Code 2) performs the best, followed by the first solution (Source Code 1), and then the third solution (Source Code 3). However, it's important to note that in real-world scenarios, the performance might vary based on the specific RDBMS and the database structure and data volume.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)