We are given two tables:
Prices table structure:
(product_id, start_date, end_date) is the primary key for this table. Each row indicates the price of the
product_id in the period from
end_date. There are no overlapping periods for the same
UnitsSold table structure:
This table may contain duplicates. Each row indicates the
product_id of each product sold.
The task is to write an SQL query to find the average selling price for each product, with
average_price rounded to 2 decimal places.
The average selling price is calculated by dividing the total price of a product by the number of products sold. For example, the average selling price for
product_id 1 is calculated as ((100 * 5) + (15 * 20)) / 115 = 6.96.
We'll discuss three SQL solutions with varying degrees of complexity. The primary differences among these solutions lie in the SQL features they use, such as aggregate functions, window functions, and subqueries. Each strategy has its strengths and weaknesses.
This approach uses a simple JOIN operation combined with aggregate functions to calculate the average selling price.
SELECT p.product_id, ROUND(SUM(p.price*u.units) * 1.0 / SUM(u.units), 2) AS average_price FROM Prices p JOIN UnitsSold u ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date) GROUP BY p.product_id
In this solution, we join the
UnitsSold tables on
product_id and the condition that
purchase_date lies within the
end_date. We then group the results by
product_id and calculate the average selling price.
Runtime: 995ms, beats 7.10% of submissions on LeetCode.
This approach makes use of window functions, which allow operations on a set of rows related to the current row. This technique can simplify complex queries.
SELECT DISTINCT p.product_id, ROUND( SUM(p.price*u.units) OVER (PARTITION BY p.product_id) * 1.0 / SUM(u.units) OVER (PARTITION BY p.product_id) , 2) AS average_price FROM Prices p JOIN UnitsSold u ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
OVER (PARTITION BY p.product_id) clause divides the result set into partitions based on
product_id, and the aggregate function
SUM is applied to each partition separately. This strategy enhances performance by avoiding repetitive calculations.
Runtime: 690ms, beats 25.97% of submissions on LeetCode.
This approach uses subqueries to calculate the total price and total units separately before joining the results to compute the average price.
SELECT prices.product_id, ROUND(total_price * 1.0 / total_units, 2) AS average_price FROM ( SELECT p.product_id, SUM(price * units) AS total_price FROM Prices p JOIN UnitsSold u ON p.product_id = u.product_id AND u.purchase_date BETWEEN p.start_date AND p.end_date GROUP BY p.product_id ) prices JOIN ( SELECT product_id, SUM(units) AS total_units FROM UnitsSold GROUP BY product_id ) units ON prices.product_id = units.product_id
While subqueries can enhance readability, they might impact performance, as seen in this solution's slower runtime.
Runtime: 3141ms, beats 5.8% of submissions on LeetCode.
These solutions offer different ways to approach the same problem, with trade-offs between readability and performance. Based on the LeetCode environment, Source Code 2 has the best performance, followed by Source Code 1, and then Source Code 3. However, real-world RDBMS performance may vary due to factors such as data distribution and indexing.
Ranked from best to worst based on LeetCode performance:
- Source Code 2
- Source Code 1
- Source Code 3
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.