Data migration is a common part of working with databases, and Ruby on Rails developers have many tools at their disposal. Still, it can be a challenge to get it right in every situation. Here are some tips for making sure your migrations go smoothly.
The easiest way is just to have data migrations alongside with schema migrations. But as your code base grows this might become a problem.
Data migrations depend on the existing code(models, service objects, etc.) over time code changes potentially breaking the migrations. Let's say you have a
Order model and want to update a recently added attribute
true for all orders created last year. You can do it with a simple migration:
class UpdateFulfilledIn2021 < ActiveRecord::Migration def up Order.all.where("created_at < '2022-01-01'") .update_all(fulfilled: true) end end
The above migration might fail:
Ordermodel gets renamed or removed completely.
- If a new validation was added to the model.
- A new side effect via callback were added to the model. For example if an email gets sent to a customer after the
Orderrecord is updated with
fulfilled = 't', and you definitely don't want to bother customers from last year with emails they don't need.
Possibly long execution times
Data migrations might take a lot of time to execute, depending on the amount of records you need to update, and the complexity of calculations to perform for each record. Long execution times can potentially break the deployment pipeline, or lead to some down time of the application.
Hard to test
It's hard to write tests for migrations ensuring that the data is updated in a correct way, since a single error might lead to whole application data corruption requiring some down time to recover DB back to correct state.
Inline model stubs
Just define models class stubs with minimal required definitions. That way original model changes won't affect the migration. For the example from above:
class UpdateFulfilledIn2021 < ActiveRecord::Migration class OrderStub < ApplicationRecord self.table_name = 'orders' end def up OrderStub.all.where("created_at < '2022-01-01'") .update_all(fulfilled: true) end end
- Is not affected by actual model changes;
- Model logic and association definitions are duplicated;
Alternatively you can use pure SQL to modify your data.
def up execute( "UPDATE user_orders SET fulfilled = 't' WHERE created_at < '2022-01-01'" ) end
- Does not depend on the code;
- Does not require any additional setup;
- Get complex fast for non-trivial data changes;
- Duplicates association logic when need to join associated records;
- No progress indication for long running updates;
task update_fulfilled_in_2021: :environment do OrderStub.all.where("created_at < '2022-01-01'") .update_all(fulfilled: true) end
- Can be run on demand any time in background without affecting deployment process;
- No clear indication that migration has already performed, so it's better be idempotent;
Data migration gem
There are a couple of gems that helps organize your data migrations in the same way as schema migrations, one of the most known data-migrate. You can simply generate a new migration:
rails g data_migration update_fulfilled_in_2021
Generated data migrations are stored in
db/data and then could be run with:
- Has versioning the same way as schema migrations;
- Can be easily run during deployment;
- Testable: via either extracting logic into dedicated classes, or setting up a test helper to run the actual migration during tests;
- Additional setup;
Each solution is good in it's own circumstances.
For easy one time updates you can utilize Rails schema migrations or execute pure SQL.
If you need to perform more complex data updates once in a month or so, then rake tasks might help you with that.
But as your code base grows and migrations get bigger & more complex it's better to utilize full data migration automation gem with version control and a way to test the updates before they hit the production database.