Selecting duplicate records

#rails #postgres #sql

Selecting duplicate records might be useful in cases when you want to introduce new multi column index on existing data.

duplicates = ActiveRecord::Base.connection.execute('
  SELECT items.id, items.name
    FROM items
    WHERE items.id IN (
      SELECT
        id
        FROM (
          SELECT
          id,
          ROW_NUMBER() OVER w AS rnum
          FROM items
          WINDOW w AS (
            PARTITION BY entity_id, barcode
            ORDER BY id
            )
          ) t
          WHERE t.rnum > 1
       );
    ')

Our exported data set will have id and name, so that we can tell which rows we want to delete. And it will only show rows where entity_id and barcode would not be unique. That's an example, yours will be different.

After the cleanup, when the query above would return an empty result set we should be able to create following index without any issues:

add_index(:items, [:entity_id, :barcode])

Inspiration taken from the article Rails: migration: Adding a unique Index and deleting Duplicates

Image generated via Midjourney prompt: Heap of items where every object is represented twice; --ar 16:9

Top comments (1)

NullVoxPopuli • Apr 29 '23

A comment, by request

DEV Community

Selecting duplicate records

Top comments (1)

Read next

Rails' Partial Features You (didn't) Know

PostgreSQL as a Celery Broker

Aurora Limitless - Creation

Ensuring Data Integrity in Financial Transactions: The PostgreSQL Transaction Solution