Retiago Drago

Posted on

# Daily Leads and Partners | LeetCode | MSSQL

## The Problem

Consider a table `DailySales` with the following schema:

Column Name Type
date_id date
make_name varchar
partner_id int

The table contains the date and the name of the product sold as well as the IDs of the lead and partner to whom it was sold. The name consists of only lowercase English letters. For each `date_id` and `make_name`, you need to find the number of distinct `lead_id`'s and distinct `partner_id`'s. The result table should be returned in any order.

## Explanation

Consider the following sample `DailySales` table:

2020-12-8 toyota 0 1
2020-12-8 toyota 1 0
2020-12-8 toyota 1 2
2020-12-7 toyota 0 2
2020-12-7 toyota 0 1
2020-12-8 honda 1 2
2020-12-8 honda 2 1
2020-12-7 honda 0 1
2020-12-7 honda 1 2
2020-12-7 honda 2 1

For this input, the expected output would be:

2020-12-8 toyota 2 3
2020-12-7 toyota 1 2
2020-12-8 honda 2 2
2020-12-7 honda 3 2

Here, each row represents a distinct date and make_name, and the count of unique leads and partners for each combination.

## The Solution

Let's discuss two different approaches to solve this problem, highlight their strengths and weaknesses, and explain their underlying structures.

### Using COUNT(DISTINCT)

This approach directly counts the distinct `lead_id` and `partner_id` for each combination of `date_id` and `make_name`.

``````SELECT
date_id,
make_name,
COUNT(DISTINCT partner_id) [unique_partners]
FROM DailySales
GROUP BY
date_id,
make_name
``````

This query is fairly straightforward and easy to understand. It works by grouping the data by `date_id` and `make_name`, and then it counts the distinct leads and partners within each group. However, `COUNT(DISTINCT)` can be slow on large data sets. This query runtime is 1073ms, beating 22.3% of other submissions on LeetCode.

### Using DENSE_RANK()

This approach first ranks `lead_id` and `partner_id` within each group defined by `date_id` and `make_name`, and then it selects the maximum rank for each group as the count of distinct leads or partners.

``````WITH ranks AS (
SELECT
date_id,
make_name,
DENSE_RANK() OVER (PARTITION BY date_id, make_name ORDER BY partner_id) [partner_rank]
FROM DailySales
)
SELECT
date_id,
make_name,
(SELECT MAX(lead_rank) FROM ranks r WHERE r.date_id = ds.date_id AND r.make_name = ds.make_name) [unique_leads],
(SELECT MAX(partner_rank) FROM ranks r WHERE r.date_id = ds.date_id AND r.make_name = ds.make_name) [unique_partners]
FROM DailySales ds
GROUP BY
date_id,
make_name
``````

This approach, while more complex, eliminates the need for `COUNT(DISTINCT)`, which can improve performance in some situations. However, the inclusion of the subqueries in the SELECT clause may cause performance degradation when handling large datasets. This query runtime is 2467ms, beating 5.17% of other submissions on LeetCode.

## Conclusion

Both methods solve the problem, but their performances can vary based on the data set size and distribution. The `COUNT(DISTINCT)` method is simpler and performed better on LeetCode's platform, making it the preferred solution in this case. However, the performance may vary in real-world RDBMS depending on several factors such as data distribution and the database engine's optimizations.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.