Simple query to remove duplicates from table

#sql #mysql #postgres #tutorial

In this small post I want to share the simple way to remove duplicates from table. The query works in MySQL, MariaDB and PostgreSQL databases. If you interested in such query for other RDBMS, please write me in comments.
Let's start. Assume we have simple table with two columns: id - is primary key and v simple integer value:

create table t (
    id int primary key,
    v int
);

insert into t(id, v) values 
(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 1),(7, 2),(8, 3),(9, 2),(10, 4),(11, 3);

The code above create the table and insert couple of values. As you can see id have unique values, v have several duplicates:

+====+===+
| id | v |
+====+===+
| 1  | 1 |
| 2  | 1 |
| 3  | 2 |
| 4  | 2 |
| 5  | 1 |
| 6  | 1 |
| 7  | 2 |
| 8  | 3 |
| 9  | 2 |
| 10 | 4 |
| 11 | 3 |
+----+---+

Our mission is remove rows this duplicates in column v and rest unique values with minimal id value.
How we can find the duplicates? We can use simple LEFT JOIN on field v with additional condition for prevent joins rows itself:

select * 
from t
left join t t1 on t.v = t1.v and t.id > t1.id;

The query give us next result:

+====+===+========+========+
| id | v | id     | v      |
+====+===+========+========+
| 1  | 1 | (null) | (null) |
| 2  | 1 | 1      | 1      |
| 3  | 2 | (null) | (null) |
| 4  | 2 | 3      | 2      |
| 5  | 1 | 1      | 1      |
| 5  | 1 | 2      | 1      |
| 6  | 1 | 1      | 1      |
| 6  | 1 | 2      | 1      |
| 6  | 1 | 5      | 1      |
| 7  | 2 | 3      | 2      |
| 7  | 2 | 4      | 2      |
| 8  | 3 | (null) | (null) |
| 9  | 2 | 3      | 2      |
| 9  | 2 | 4      | 2      |
| 9  | 2 | 7      | 2      |
| 10 | 4 | (null) | (null) |
| 11 | 3 | 8      | 3      |
+----+---+--------+--------+

We can see the unique rows with minimal id have (null) values in last columns. So we need to remove rest. We can done this in next simple query:

delete t.*
from t
left join t t1 on t.v = t1.v and t.id > t1.id
where t1.id is not null;

Just run in on SQLize.online and see the result

P.S. After this publication my colleague @Akina suggested a shorter version:

delete t.*
from t
join t t1 on t.v = t1.v and t.id > t1.id;

DEV Community

Simple query to remove duplicates from table

Top comments (0)