What is the Purpose of the DISTINCT Keyword in SQL?
The DISTINCT keyword in SQL removes duplicate rows from the result set of a query. It applies to the entire select list, so the query returns only one row for each unique combination of the selected columns.
How Does DISTINCT Work?
When a SELECT query retrieves data, there may be duplicate rows in the output. By adding the DISTINCT keyword, SQL filters out these duplicates, keeping only one occurrence of each unique combination of values in the specified columns.
Syntax:
SELECT DISTINCT column1, column2, ...
FROM table_name;
Examples:
1. Removing Duplicates
Consider a table Employees:
EmployeeID | Department |
---|---|
1 | HR |
2 | IT |
3 | HR |
4 | Sales |
Query:
SELECT DISTINCT Department
FROM Employees;
Result:
Department |
---|
HR |
IT |
Sales |
Here, duplicates in the Department column are removed.
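To try this without a database server, the example can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the in-memory database and the ORDER BY clause (added only so the output order is predictable) are assumptions for illustration, not part of the query above.

```python
import sqlite3

# In-memory SQLite database holding the Employees table from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (EmployeeID, Department) VALUES (?, ?)",
    [(1, "HR"), (2, "IT"), (3, "HR"), (4, "Sales")],
)

# Without DISTINCT: one row per employee, so HR appears twice.
all_departments = [
    row[0]
    for row in conn.execute("SELECT Department FROM Employees ORDER BY Department")
]

# With DISTINCT: each department appears once.
unique_departments = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Department FROM Employees ORDER BY Department"
    )
]

print(all_departments)     # ['HR', 'HR', 'IT', 'Sales']
print(unique_departments)  # ['HR', 'IT', 'Sales']
```

Without an ORDER BY, the order of the rows returned by a DISTINCT query is not guaranteed.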
2. Selecting Unique Combinations
Consider another table Orders:
OrderID | CustomerID | ProductID |
---|---|---|
101 | 1 | A |
102 | 1 | B |
103 | 1 | A |
104 | 2 | C |
Query:
SELECT DISTINCT CustomerID, ProductID
FROM Orders;
Result:
CustomerID | ProductID |
---|---|
1 | A |
1 | B |
2 | C |
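Here, DISTINCT filters out duplicate rows based on the combination of CustomerID and ProductID. Orders 101 and 103 share the same pair (CustomerID 1, ProductID A), so they collapse into a single row.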
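The difference between deduplicating on one column and on a combination of columns can be checked with a short sqlite3 sketch. The in-memory database and the ORDER BY clauses are assumptions added for a reproducible run; the table contents match the Orders example.

```python
import sqlite3

# In-memory SQLite database holding the Orders table from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ProductID TEXT)"
)
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID, ProductID) VALUES (?, ?, ?)",
    [(101, 1, "A"), (102, 1, "B"), (103, 1, "A"), (104, 2, "C")],
)

# DISTINCT on a single column: one row per customer.
customers = conn.execute(
    "SELECT DISTINCT CustomerID FROM Orders ORDER BY CustomerID"
).fetchall()

# DISTINCT on two columns: one row per (customer, product) pair.
pairs = conn.execute(
    "SELECT DISTINCT CustomerID, ProductID FROM Orders "
    "ORDER BY CustomerID, ProductID"
).fetchall()

print(customers)  # [(1,), (2,)]
print(pairs)      # [(1, 'A'), (1, 'B'), (2, 'C')]
```

Adding a column to the select list can increase the number of rows returned, because DISTINCT compares whole rows rather than only the first column.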
When to Use DISTINCT
- To Find Unique Values: When you want to know all the unique values in a column or combination of columns. Example: Listing all distinct product categories in an inventory.
- To Eliminate Redundancy: When duplicate rows in the result set are not needed for analysis or reporting. Example: Fetching unique department names from an employee database.
- For Data Cleanup: Helps prepare clean datasets by leaving duplicate rows out of the query output. The underlying table is not modified.
Limitations of DISTINCT
- Performance Impact: Using DISTINCT can slow down queries, especially on large datasets, because the database typically has to sort or hash the rows to detect duplicates.
- Not for Conditional Deduplication: If you want conditional deduplication (e.g., keeping the latest row for each unique value), you need other techniques like ROW_NUMBER().
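As a rough illustration of conditional deduplication, the sketch below keeps one order per customer using ROW_NUMBER(). It assumes that the highest OrderID is the "latest" order (the Orders table has no date column, so this is a stand-in), and it runs on Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# In-memory SQLite database holding the Orders table from the earlier example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ProductID TEXT)"
)
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID, ProductID) VALUES (?, ?, ?)",
    [(101, 1, "A"), (102, 1, "B"), (103, 1, "A"), (104, 2, "C")],
)

# Number each customer's orders from newest to oldest, then keep row 1.
# Assumption: a higher OrderID means a more recent order.
latest_orders = conn.execute(
    """
    SELECT OrderID, CustomerID, ProductID
    FROM (
        SELECT OrderID, CustomerID, ProductID,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID
                   ORDER BY OrderID DESC
               ) AS rn
        FROM Orders
    ) AS ranked
    WHERE rn = 1
    ORDER BY CustomerID
    """
).fetchall()

print(latest_orders)  # [(103, 1, 'A'), (104, 2, 'C')]
```

DISTINCT cannot express this, because it has no way to choose which of several rows for the same customer should survive.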
Tips for Using DISTINCT
- Use DISTINCT only when necessary, as it can add overhead to query execution.
- For complex filtering, consider alternatives like aggregations (GROUP BY) or analytic functions.
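The GROUP BY alternative can be sketched against the Employees table from the first example. The in-memory sqlite3 setup and the ORDER BY clauses are assumptions for a reproducible run; the point is that GROUP BY yields the same unique values and can also carry an aggregate such as a row count.

```python
import sqlite3

# In-memory SQLite database holding the Employees table from the first example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (EmployeeID, Department) VALUES (?, ?)",
    [(1, "HR"), (2, "IT"), (3, "HR"), (4, "Sales")],
)

# DISTINCT and GROUP BY on the same column return the same unique values.
distinct_rows = conn.execute(
    "SELECT DISTINCT Department FROM Employees ORDER BY Department"
).fetchall()
grouped_rows = conn.execute(
    "SELECT Department FROM Employees GROUP BY Department ORDER BY Department"
).fetchall()

# GROUP BY can additionally report how many rows fell into each group.
department_counts = conn.execute(
    "SELECT Department, COUNT(*) FROM Employees "
    "GROUP BY Department ORDER BY Department"
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(department_counts)              # [('HR', 2), ('IT', 1), ('Sales', 1)]
```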
Conclusion
The DISTINCT keyword is a simple yet powerful tool to eliminate duplicate rows in SQL query results. It ensures that your output contains only unique values, making it ideal for data reporting and analysis tasks. However, use it judiciously to balance performance and accuracy.