Cover image Credit: @thesollers
- you understand rails framework
- you have duplicates in your database already
- you have existing records in your database that are duplicates and in violation of
We found that a method in one of our controllers was using
find_or_create_by , and according to this article this method does not use locking, so what can happen if two writes are happening simultaneously, then duplicates are created in the database.
This was conflicting with our business logic, but were harmless, but nonetheless bloating our tables
# for brevity this table will be called table_1 # Indexes # # index_on_specific_table_1 (something_id) # # Foreign Keys # # fk_rails_... (something_id => random_record.id) #
As you can see in this contrived example we have a index already on the column
something_id , as well as an index.
The solution as laid out by my boss was two fold.
a. create a migration that would delete all duplicate instances
b. add a
UNIQUE index field to the record
rescue the exception that was thrown for
ActiveRecord::RecordNotUnique, and retry
Inside of the migration the first task was to find a way to destroy all duplicates before running applying the
UNIQUE index key to the column in the table, because if we tried to remove the index , the existing duplicates would throw an error and fail the migration.
so in order to clean up duplicates in the tables , i found this nifty sQL command that took care of all the heavy lifting, from a Wiki Page of all places!
ActiveRecord::Base.connection.execute(' DELETE FROM table_1 WHERE id IN ( SELECT id FROM ( SELECT id, ROW_NUMBER() OVER w AS rnum FROM table_1 WINDOW w AS ( PARTITION BY something_id ORDER BY id ) ) t WHERE t.rnum > 1); ')
This SQL statement will delete all rows in violation of not having a unique
something_id field. you can also add more columns to this by comma seperation, if you have multiple fields, it would be :
PARTITION BY something_id, something2_id, something3_id
Now the next step is to add a unique index, but we have some issues , because
episode_id already has an index on place on it. Not only that but it also has a foreign key assignment.
You cannot modify an index on a table if a foreign key is on that column , and if you currently have an index on a column, you have to remove it before adding a new
so in order to add the unique index we must first remove the foreign key assignment from the column, which we will add back later:
remove_foreign_key :table_1, column: :something_id
then after the foreign key is removed we can now remove the existing (non-unique) index that is on
But instead of just removing it, we will rename it to be extra safe.
rename_index :table_1, :index_on_specific_table_1, :temp_non_unique_index
so in the above code you can see that we are renaming
the reason why we want to rename, instead of straight removing at this point is because the
add_index will take a long time, and it could sometimes fail depending on the size of the table.
if the table size is huge it is possible it could
add_index could fail during migration, and if we had used
remove_index before it , then we would be left with a column with no index on it at all, and this could change a request that takes only milliseconds, to possible taking minutes.
then, after we rename the table, we add an index to the table:
add_index :table_1, :something_id, unique: true, name: 'unque_index_table_1`
this will add a unique index to the specific column in the specific table and then it will give it . a specific name:
After this is all said and done, we will then remove our temporary index:
remove_index(:table_1, :name => 'temp_non_uniq_index') add_index :table_1, :episode_id, unique: true
now all togther it should looks something like this:
ActiveRecord::Base.connection.execute(' DELETE FROM table_1 WHERE id IN ( SELECT id FROM ( SELECT id, ROW_NUMBER() OVER w AS rnum FROM table_1 WINDOW w AS ( PARTITION BY something_id ORDER BY id ) ) t WHERE t.rnum > 1); ') remove_foreign_key :table_i1, column: :something_id rename_index :table_1, :index_on_episode_id, :temp_non_uniq_index add_index :table_1, :something_id, unique: true, name: 'unique_index_on_table_1_episode_id' remove_index(:table_1, :name => 'temp_non_uniq_index') add_foreign_key :table_1, :episodes add_index :table_1, :something_id, unique: true