## The Problem

We are given two tables: `Prices`

and `UnitsSold`

.

`Prices`

table structure:

Column Name | Type |
---|---|

product_id | int |

start_date | date |

end_date | date |

price | int |

The `(product_id, start_date, end_date)`

is the primary key for this table. Each row indicates the price of the `product_id`

in the period from `start_date`

to `end_date`

. There are no overlapping periods for the same `product_id`

.

`UnitsSold`

table structure:

Column Name | Type |
---|---|

product_id | int |

purchase_date | date |

units | int |

This table may contain duplicates. Each row indicates the `purchase_date`

, `units`

, and `product_id`

of each product sold.

The task is to write an SQL query to find the average selling price for each product, with `average_price`

rounded to 2 decimal places.

## Explanation

For example:

`Prices`

table:

product_id | start_date | end_date | price |
---|---|---|---|

1 | 2019-02-17 | 2019-02-28 | 5 |

1 | 2019-03-01 | 2019-03-22 | 20 |

2 | 2019-02-01 | 2019-02-20 | 15 |

2 | 2019-02-21 | 2019-03-31 | 30 |

`UnitsSold`

table:

product_id | purchase_date | units |
---|---|---|

1 | 2019-02-25 | 100 |

1 | 2019-03-01 | 15 |

2 | 2019-02-10 | 200 |

2 | 2019-03-22 | 30 |

Expected Output:

product_id | average_price |
---|---|

1 | 6.96 |

2 | 16.96 |

The average selling price is calculated by dividing the total price of a product by the number of products sold. For example, the average selling price for `product_id 1`

is calculated as ((100 * 5) + (15 * 20)) / 115 = 6.96.

## The Solution

We'll discuss three SQL solutions with varying degrees of complexity. The primary differences among these solutions lie in the SQL features they use, such as aggregate functions, window functions, and subqueries. Each strategy has its strengths and weaknesses.

### Source Code 1: Join with Aggregate Functions

This approach uses a simple JOIN operation combined with aggregate functions to calculate the average selling price.

```
SELECT
p.product_id,
ROUND(SUM(p.price*u.units) * 1.0 / SUM(u.units), 2) AS average_price
FROM Prices p JOIN UnitsSold u
ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
GROUP BY p.product_id
```

In this solution, we join the `Prices`

and `UnitsSold`

tables on `product_id`

and the condition that `purchase_date`

lies within the `start_date`

and `end_date`

. We then group the results by `product_id`

and calculate the average selling price.

Runtime: 995ms, beats 7.10% of submissions on LeetCode.

### Source Code 2: Window Functions

This approach makes use of window functions, which allow operations on a set of rows related to the current row. This technique can simplify complex queries.

```
SELECT DISTINCT
p.product_id,
ROUND(
SUM(p.price*u.units) OVER (PARTITION BY p.product_id)
* 1.0 / SUM(u.units) OVER (PARTITION BY p.product_id)
, 2) AS average_price
FROM Prices p JOIN UnitsSold u
ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
```

Here, the `OVER (PARTITION BY p.product_id)`

clause divides the result set into partitions based on `product_id`

, and the aggregate function `SUM`

is applied to each partition separately. This strategy enhances performance by avoiding repetitive calculations.

Runtime: 690ms, beats 25.97% of submissions on LeetCode.

### Source Code 3: Subqueries

This approach uses subqueries to calculate the total price and total units separately before joining the results to compute the average price.

```
SELECT
prices.product_id,
ROUND(total_price * 1.0 / total_units, 2) AS average_price
FROM
(
SELECT
p.product_id,
SUM(price * units) AS total_price
FROM
Prices p
JOIN
UnitsSold u
ON
p.product_id = u.product_id
AND u.purchase_date BETWEEN p.start_date AND p.end_date
GROUP BY
p.product_id
) prices
JOIN
(
SELECT
product_id,
SUM(units) AS total_units
FROM
UnitsSold
GROUP BY
product_id
) units
ON
prices.product_id = units.product_id
```

While subqueries can enhance readability, they might impact performance, as seen in this solution's slower runtime.

Runtime: 3141ms, beats 5.8% of submissions on LeetCode.

## Conclusion

These solutions offer different ways to approach the same problem, with trade-offs between readability and performance. Based on the LeetCode environment, Source Code 2 has the best performance, followed by Source Code 1, and then Source Code 3. However, real-world RDBMS performance may vary due to factors such as data distribution and indexing.

Ranked from best to worst based on LeetCode performance:

- Source Code 2
- Source Code 1
- Source Code 3

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

## Top comments (0)