DEV Community

Cover image for Remove duplicates rows with SQL
Michel
Michel

Posted on • Originally published at blog.pagesd.info

Remove duplicates rows with SQL

Last week I made a small update error on my application and I ended up with duplicates values in a table. Of course, this would not have happened if I had a unique key, but as I check before inserting, I thought I was safe.

Unfortunately, as I couldn't delete everything and just start updating data again, I had to figure out how to delete duplicates rows.

As a first step, I run a simple query to find out how much I was in trouble.

SELECT Place_ID, Event_ID, StartDate, COUNT(*)
FROM   Showings
GROUP BY Place_ID, Event_ID, StartDate
HAVING COUNT(*) > 1
Enter fullscreen mode Exit fullscreen mode

Good news first: there are no triplets :)

Less good news: I have more than a thousand rows to delete. So no way to do this by running one request after the other...

Good thing: since my table has a primary key, I can identify duplicate data:

SELECT Place_ID, Event_ID, StartDate, MAX(Showing_ID) AS ID
FROM   Showings
GROUP BY Place_ID, Event_ID, StartDate
HAVING COUNT(*) > 1
Enter fullscreen mode Exit fullscreen mode

This way, I find the IDs of all the rows added when there was already a record with the same data (Place_ID, Event_ID and StartDate). I only have to delete these useless values (since the others were there first) :

DELETE
FROM   Showings
WHERE  Showing_ID IN (
                       SELECT MAX(Showing_ID)
                       FROM   Showings
                       GROUP BY Place_ID, Event_ID, StartDate
                       HAVING COUNT(*) > 1
                      )
Enter fullscreen mode Exit fullscreen mode

Sometimes, IT is not that complicated.


This post was originally published on my blog.
Cover image : The Lady from Shanghai - Rita Hayworth

Top comments (0)