Deleting Duplicates in SQL

In SQL handlingn duplicate records is essential for maintaining data accuracy, optimizing query performance, and ensuring consistent results.This article explores some practicle techniques to identify and delete duplicate rows using SQL queries.

Delete Duplicates Using a Unique Identifier

Consider the above code block

DELETE FROM cars
WHERE id IN (
    SELECT MAX(id)
    FROM cars
    GROUP BY model, brand
    HAVING COUNT(1) > 1
);

The above code selects the records from cars and in the above code we want to delete the record that are duplicate in the model and brand we use the id to identify the MAX(we can only get one max) and we delete the only max record

Delete Duplicates Using a Self-Join

In this approach, we use a self-join to identify and delete duplicate rows in the cars table, keeping only the row with the smallest id for each unique combination of model and brand

DELETE FROM cars
WHERE id IN (
    SELECT c2.id
    FROM cars c1
    JOIN cars c2 ON c1.model = c2.model
                 AND c1.brand = c2.brand
    WHERE c1.id < c2.id
);

Delete Duplicates Using a Window Function

DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY model, brand ORDER BY id) AS rn
        FROM cars
    ) AS x
    WHERE x.rn > 1
);

In the above case in the inner subquery, the ROW_NUMBER() function assigns a unique row number to each row within each group of duplicates (defined by model and brand)

Using MIN function

delete from cars
where id not in ( select min(id)
                  from cars
                  group by model, brand);

The inner subquery SELECT MIN(id) FROM cars GROUP BY model, brand finds the lowest id for each unique combination of model and brand. This ensures that only one record for each car model and brand pair is retained
The DELETE FROM cars WHERE id NOT IN (...) statement removes records with IDs that aren't the minimum for their model and brand group. Essentially, this keeps only the oldest record and removes duplicates.

DEV Community

Deleting Duplicates in SQL

Delete Duplicates Using a Unique Identifier

Delete Duplicates Using a Self-Join

Delete Duplicates Using a Window Function

Using MIN function

Top comments (0)

Read next

MBA em Desenvolvimento de Software Avançado com foco em Ruby on Rails

Build a Conversational Generative AI Assistant Using Amazon Q Business – A Step-by-Step Guide

Is ChatGPT Losing Its Mind? The Curious Case of Declining LLM Performance

TEST