DEV Community

Abhay Singh Kathayat

Mastering SQL DISTINCT: Removing Duplicates Made Simple

What is the Purpose of the DISTINCT Keyword in SQL?

The DISTINCT keyword in SQL removes duplicate rows from the result set of a query. It ensures that each unique combination of values in the selected columns is returned only once. It affects only the query output; the rows stored in the table are not changed.


How Does DISTINCT Work?

When a SELECT query retrieves data, there may be duplicate rows in the output. By adding the DISTINCT keyword, SQL filters out these duplicates, keeping only one occurrence of each unique combination of values in the specified columns.


Syntax:

SELECT DISTINCT column1, column2, ...
FROM table_name;

Examples:

1. Removing Duplicates

Consider a table Employees:

EmployeeID  Department
1           HR
2           IT
3           HR
4           Sales

Query:

SELECT DISTINCT Department
FROM Employees;

Result:

Department
HR
IT
Sales

Here, duplicates in the Department column are removed.
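
To try this query without a database server, here is a minimal sketch that assumes SQLite through Python's standard sqlite3 module; the article itself does not name a database engine, so treat that choice as an illustration only. The ORDER BY clause is an addition to make the output order predictable, since DISTINCT on its own does not guarantee any ordering.

```python
import sqlite3

# In-memory SQLite database; the rows mirror the Employees table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (EmployeeID, Department) VALUES (?, ?)",
    [(1, "HR"), (2, "IT"), (3, "HR"), (4, "Sales")],
)

# Without DISTINCT every row is returned, so HR appears twice.
all_departments = [
    row[0]
    for row in conn.execute("SELECT Department FROM Employees ORDER BY Department")
]

# With DISTINCT each department appears once.
unique_departments = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Department FROM Employees ORDER BY Department"
    )
]

print(all_departments)     # ['HR', 'HR', 'IT', 'Sales']
print(unique_departments)  # ['HR', 'IT', 'Sales']
```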


2. Selecting Unique Combinations

Consider another table Orders:

OrderID  CustomerID  ProductID
101      1           A
102      1           B
103      1           A
104      2           C

Query:

SELECT DISTINCT CustomerID, ProductID
FROM Orders;

Result:

CustomerID  ProductID
1           A
1           B
2           C

Here, DISTINCT filters out duplicate rows based on the combination of CustomerID and ProductID.
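
The sketch below, again assuming SQLite via Python's sqlite3 module, shows that DISTINCT applies to the whole list of selected columns rather than to any single one. The ORDER BY clauses are added only for predictable output. Note the last query: once OrderID is included, every row is already unique, so DISTINCT removes nothing.

```python
import sqlite3

# In-memory SQLite database; the rows mirror the Orders table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ProductID TEXT)"
)
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID, ProductID) VALUES (?, ?, ?)",
    [(101, 1, "A"), (102, 1, "B"), (103, 1, "A"), (104, 2, "C")],
)

# Unique (CustomerID, ProductID) pairs: orders 101 and 103 collapse into one row.
pairs = conn.execute(
    "SELECT DISTINCT CustomerID, ProductID FROM Orders "
    "ORDER BY CustomerID, ProductID"
).fetchall()

# Unique customers only: a shorter column list gives fewer distinct rows.
customers = conn.execute(
    "SELECT DISTINCT CustomerID FROM Orders ORDER BY CustomerID"
).fetchall()

# Adding the unique OrderID column makes every row distinct already.
with_ids = conn.execute(
    "SELECT DISTINCT OrderID, CustomerID, ProductID FROM Orders ORDER BY OrderID"
).fetchall()

print(pairs)          # [(1, 'A'), (1, 'B'), (2, 'C')]
print(customers)      # [(1,), (2,)]
print(len(with_ids))  # 4
```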


When to Use DISTINCT

  1. To Find Unique Values:

    When you want to know all the unique values in a column or combination of columns.

    Example: Listing all distinct product categories in an inventory.

  2. To Eliminate Redundancy:

    When duplicate rows in the result set are not needed for analysis or reporting.

    Example: Fetching unique department names from an employee database.

  3. For Data Cleanup:

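    Helps prepare clean datasets by removing duplicates from query output. The source table itself is not modified, so the deduplicated result has to be saved or exported separately.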
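
As one possible cleanup workflow, the sketch below copies the deduplicated output of a SELECT DISTINCT into a new table and leaves the source untouched. The table names ImportedEmails and CleanEmails and the sample addresses are hypothetical, and SQLite via Python's sqlite3 module is assumed; other databases use slightly different syntax for creating a table from a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw import that contains exact duplicate rows.
conn.execute("CREATE TABLE ImportedEmails (Email TEXT)")
conn.executemany(
    "INSERT INTO ImportedEmails (Email) VALUES (?)",
    [
        ("ann@example.com",),
        ("bob@example.com",),
        ("ann@example.com",),
        ("bob@example.com",),
    ],
)

# Save the deduplicated result in a new table (hypothetical name CleanEmails).
conn.execute("CREATE TABLE CleanEmails AS SELECT DISTINCT Email FROM ImportedEmails")

# The source table still holds all four rows; only the copy is deduplicated.
source_count = conn.execute("SELECT COUNT(*) FROM ImportedEmails").fetchone()[0]
clean_rows = [
    row[0] for row in conn.execute("SELECT Email FROM CleanEmails ORDER BY Email")
]

print(source_count)  # 4
print(clean_rows)    # ['ann@example.com', 'bob@example.com']
```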


Limitations of DISTINCT

  1. Performance Impact:

    Using DISTINCT can slow down queries, especially on large datasets, because the database typically has to sort or hash the rows in order to find and discard duplicates.

  2. Not for Conditional Deduplication:

    DISTINCT cannot choose which of several duplicate rows to keep. For conditional deduplication (e.g., keeping the latest row for each unique value), you need other techniques, such as the ROW_NUMBER() window function.
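
A minimal sketch of conditional deduplication with ROW_NUMBER(), reusing the Orders data from the earlier example. Two assumptions are made for illustration: a higher OrderID is treated as a later order (the table has no date column), and the engine is SQLite 3.25 or newer, which is when window functions became available there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ProductID TEXT)"
)
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID, ProductID) VALUES (?, ?, ?)",
    [(101, 1, "A"), (102, 1, "B"), (103, 1, "A"), (104, 2, "C")],
)

# Number each customer's orders from newest to oldest (assumption: a higher
# OrderID means a later order), then keep only the row numbered 1.
latest_per_customer = conn.execute(
    """
    SELECT OrderID, CustomerID, ProductID
    FROM (
        SELECT OrderID, CustomerID, ProductID,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID
                   ORDER BY OrderID DESC
               ) AS rn
        FROM Orders
    ) AS ranked
    WHERE rn = 1
    ORDER BY CustomerID
    """
).fetchall()

print(latest_per_customer)  # [(103, 1, 'A'), (104, 2, 'C')]
```

SELECT DISTINCT CustomerID would also return one row per customer, but it could not carry along the matching OrderID and ProductID of the latest order.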


Tips for Using DISTINCT

  1. Use DISTINCT only when necessary, as it can add overhead to query execution.
  2. For complex filtering, consider alternatives like aggregations (GROUP BY) or analytic functions.
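
To illustrate the GROUP BY alternative, the sketch below (SQLite via Python's sqlite3 module, assumed for illustration) first reproduces the DISTINCT result with GROUP BY and then adds a per-group count, which DISTINCT alone cannot provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (EmployeeID, Department) VALUES (?, ?)",
    [(1, "HR"), (2, "IT"), (3, "HR"), (4, "Sales")],
)

# DISTINCT and a plain GROUP BY on the same column return the same rows.
via_distinct = conn.execute(
    "SELECT DISTINCT Department FROM Employees ORDER BY Department"
).fetchall()
via_group_by = conn.execute(
    "SELECT Department FROM Employees GROUP BY Department ORDER BY Department"
).fetchall()

# GROUP BY can also aggregate within each group, here counting employees.
headcount = conn.execute(
    "SELECT Department, COUNT(*) FROM Employees "
    "GROUP BY Department ORDER BY Department"
).fetchall()

print(via_distinct == via_group_by)  # True
print(headcount)                     # [('HR', 2), ('IT', 1), ('Sales', 1)]
```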

Conclusion

The DISTINCT keyword is a simple yet powerful tool to eliminate duplicate rows in SQL query results. It ensures that your output contains only unique values, making it ideal for data reporting and analysis tasks. However, use it judiciously to balance performance and accuracy.
