The task is to identify all duplicate emails from the
Person table where each row contains an email that has no uppercase letters. The
Person table has the following structure:
id is the primary key, and it's guaranteed that the
Consider the following sample input for the
In this case, the output will be:
email@example.com is a duplicate email as it occurs twice.
We present two SQL solutions with varying strategies and trade-offs.
The first solution creates a Common Table Expression (CTE) called
duplicate, which assigns a row number to each email, partitioning by the email itself. This effectively counts the instances of each email. The outer query then selects distinct emails where the row number (i.e., the count) is greater than one.
WITH duplicate AS ( SELECT email AS Email, ROW_NUMBER() OVER (PARTITION BY email ORDER BY email) AS rn FROM Person ) SELECT DISTINCT Email FROM duplicate WHERE rn > 1
This solution is relatively complex due to the use of window function and may be less performant, with a runtime of 790ms, which ranks in the 40.90% percentile.
The second solution uses
GROUP BY to group all emails and
HAVING to filter those with a count greater than one. It's simpler and more performant than the first, with a runtime of 641ms, which ranks in the 78.49% percentile.
SELECT email AS Email FROM Person GROUP BY email HAVING COUNT(*) > 1
Both solutions successfully identify duplicate emails. The first solution is more complex and potentially less performant but demonstrates the use of window functions. The second solution is simpler, more readable, and more performant, showing the power of grouping and aggregation in SQL. When writing SQL queries, consider trade-offs between complexity, performance, and readability.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.