I don't know why, but this task is easy to remember. Maybe because it is a rare task for programmers in general, or because it was a big challenge for me.
Either way, I wrote down the details of this experience so that I would not forget them, and so they might be helpful to anyone who works on a similar task.
The reason for the migration
To be honest, neither I nor the rest of the development team at the company I worked for knew anything about MongoDB.
In this specific app someone had used an old version of MongoDB, and we didn't know how to upgrade it or how to handle it if something went wrong.
That is why we decided to migrate the data: we used PostgreSQL for almost all of our projects and had enough experience to deal with that kind of database.
The challenges I faced
1- Complete the task with zero downtime
2- Convert MongoDB ObjectIds to UUIDs
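The second challenge is what the bson-objectid-to-uuid gem is for: a BSON ObjectId is 12 bytes (24 hex characters), while a UUID is 16 bytes (32 hex characters), so the ObjectId has to be padded into UUID shape. Here is a minimal sketch of the idea in plain Ruby; the padding scheme shown (zero-padding at the end) is illustrative, and the gem's actual byte layout may differ:

```ruby
# A BSON ObjectId is 24 hex chars; a UUID is 32 hex chars.
# One deterministic mapping: right-pad with 8 zero hex chars,
# then insert dashes at the standard UUID positions.
def object_id_to_uuid(object_id_hex)
  raise ArgumentError, 'expected 24 hex chars' unless object_id_hex =~ /\A\h{24}\z/

  padded = object_id_hex + '0' * 8
  [padded[0, 8], padded[8, 4], padded[12, 4], padded[16, 4], padded[20, 12]].join('-')
end

puts object_id_to_uuid('507f1f77bcf86cd799439011')
# => "507f1f77-bcf8-6cd7-9943-901100000000"
```

Because the mapping is deterministic, re-running the sync always maps a given Mongo record to the same PostgreSQL row, which is what makes the idempotent upsert in the rake task below possible.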
Implementation steps
The task was divided into three steps so it could be done with zero downtime.
Step 1
1-1 Install required gems
```ruby
gem 'sinatra-activerecord'
gem 'bson-objectid-to-uuid'
gem 'pg', '~> 0.20.0'
```
The application I worked on was built with Sinatra, which is why I used sinatra-activerecord to get ActiveRecord support.
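For anyone unfamiliar with sinatra-activerecord, the wiring is mostly configuration. A minimal sketch (the database URL and file names here are placeholders, not from the original app):

```ruby
# app.rb
require 'sinatra'
require 'sinatra/activerecord'

# Point ActiveRecord at PostgreSQL; DATABASE_URL wins in production.
set :database, ENV.fetch('DATABASE_URL', 'postgres://localhost/myapp_development')
```

Adding `require 'sinatra/activerecord/rake'` to the Rakefile exposes the usual `db:create_migration`, `db:migrate`, etc. tasks.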
1-2 Create PostgreSQL Schema
I needed to create a migration for every Mongoid::Document.
For example, one of the Mongoid documents looked like this:
```ruby
class BannedCustomer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :account_id, type: Integer
  field :notes, type: String
  field :system_banned, type: Boolean, default: false
  field :deleted, type: Boolean, default: false
end
```
The matching ActiveRecord migration was:
```ruby
class CreateBannedCustomers < ActiveRecord::Migration
  def change
    create_table :banned_customers, id: :uuid, default: 'gen_random_uuid()' do |t|
      t.integer :account_id
      t.text :notes
      t.boolean :deleted, default: false
      t.boolean :system_banned, default: false
      t.timestamps null: false
    end
  end
end
```
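One gotcha with `gen_random_uuid()`: it comes from PostgreSQL's `pgcrypto` extension, which is not enabled by default, so a migration along these lines has to run first (assuming your database user may create extensions):

```ruby
class EnablePgcrypto < ActiveRecord::Migration
  def change
    # gen_random_uuid() is provided by pgcrypto
    enable_extension 'pgcrypto'
  end
end
```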
1-3 Create ActiveRecord model for each table
Here I used a PG namespace so that the ActiveRecord and Mongoid models could be told apart:
```
root
├── models
│   ├── banned_customers.rb
│   ├── another_model.rb
│   └── pg
│       ├── banned_customers.rb
│       └── another_model.rb
```
```ruby
module PG
  class BannedCustomer < ActiveRecord::Base
  end
end
```
This way, BannedCustomer refers to the MongoDB collection and PG::BannedCustomer refers to the PostgreSQL table.
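With that layout, both stores can be queried side by side while the migration is in flight, which is handy for spot-checking (an illustrative console session, not from the original app):

```ruby
BannedCustomer.count      # MongoDB count, via Mongoid
PG::BannedCustomer.count  # PostgreSQL count, via ActiveRecord
```

One caveat worth knowing: the pg gem also defines a top-level `PG` module (`PG::Error`, `PG::Connection`, ...), so `module PG` reopens it. It works, but a differently named namespace would avoid surprises.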
1-4 Write a rake task to move and sync data between the two databases
```ruby
namespace :db do
  namespace :sync_postgresql do
    sync_desc = <<-DESC.gsub(/^\s+/, '')
      Sync data between Mongoid and PostgreSQL for a specific Mongoid model
      $ rake db:sync_postgresql:model MODEL='MyModel'
    DESC

    desc sync_desc
    task :model do
      batch_size = 100
      models = ["BannedCustomer", ....]
      not_synced_ids = []
      last_record_processed = nil
      updated_records_ids = []
      created_records_ids = []

      if ENV['MODEL'].to_s == ''
        puts '=' * 90, 'USAGE', '=' * 90, sync_desc, ""
        exit(1)
      else
        if !models.include?(ENV['MODEL'].to_s)
          puts '=' * 90, 'USAGE', '=' * 90, sync_desc, ""
          exit(1)
        end

        mongoid_klass = eval(ENV['MODEL'])
        pg_klass = eval("PG::#{ENV['MODEL'].to_s}")
        after_id = ENV['AFTER_ID'].present? ? ENV['AFTER_ID'].to_s : nil

        # AFTER_ID=0 forces a full resync; otherwise resume from the stored state.
        if after_id.eql?("0")
          last_record_synced_id = nil
        else
          last_record_synced_id = after_id || PG::MigrationState.find_by(key: "last_#{ENV['MODEL'].to_s.underscore}_id").try(:value)
        end
        puts "last_record_synced_id: #{last_record_synced_id}"

        if last_record_synced_id.present?
          last_record_synced = mongoid_klass.unscoped.find(last_record_synced_id) rescue nil
        end
        last_record_stored = mongoid_klass.unscoped.asc(:created_at).last

        if last_record_synced && (last_record_synced.id == last_record_stored.id)
          puts "All #{ENV['MODEL']} Records Have Been Synced"
        else
          if last_record_synced
            count = mongoid_klass.unscoped.between(created_at: (last_record_synced.created_at..last_record_stored.created_at)).count.to_f
          else
            count = mongoid_klass.unscoped.all.count.to_f
          end
          puts "count: #{count}"

          1.upto((count / batch_size).ceil) do |page|
            if last_record_synced
              @mongodb_records = mongoid_klass.unscoped.between(created_at: (last_record_synced.created_at..last_record_stored.created_at)).page(page).per(batch_size)
            else
              @mongodb_records = mongoid_klass.unscoped.all.page(page).per(batch_size)
            end

            @mongodb_records.each do |record|
              # Derive the deterministic UUID and swap it in for Mongo's _id.
              uuid = record._id.to_uuid
              mongo_attributes = JSON.parse(record.to_json)
              mongo_attributes = mongo_attributes.merge("id" => uuid)
              mongo_attributes.delete("_id")

              begin
                pg_record = pg_klass.unscoped.find_by(id: uuid)
                if pg_record
                  if pg_record.attributes != mongo_attributes
                    pg_record.update!(mongo_attributes)
                    updated_records_ids << record._id
                  end
                else
                  pg_klass.create!(mongo_attributes)
                  created_records_ids << record._id
                end
                last_record_processed = record._id
              rescue => e
                Raven.capture_exception(e)
                puts e
                not_synced_ids << record._id
              end
            end

            unless @mongodb_records.empty?
              sync_state = PG::MigrationState.find_or_create_by!(key: "last_#{ENV['MODEL'].to_s.underscore}_id")
              sync_state.update!(value: last_record_processed)
            end
            puts "#{batch_size} #{mongoid_klass} are processed."
          end

          not_synced_state = PG::MigrationState.find_or_create_by!(key: "not_synced_#{ENV['MODEL'].to_s.underscore}_ids")
          not_synced_state.update(value: not_synced_ids.empty? ? "" : not_synced_ids.join(','))
        end
      end
    end

    all_sync_desc = <<-DESC.gsub(/^\s+/, '')
      Sync data between Mongoid and PostgreSQL for all Mongoid models
      $ rake db:sync_postgresql:all
    DESC

    desc all_sync_desc
    task :all do
      all_models = ["BannedCustomer", "BannedPaypal", "CreditCardCheck", "PaypalCheck"]
      all_models.each do |klass|
        puts "#" * 40
        puts "Processing model: #{klass}..."
        ENV['MODEL'] = klass.to_s
        Rake::Task["db:sync_postgresql:model"].invoke
        Rake::Task["db:sync_postgresql:model"].reenable
      end
    end
  end
end
```
PG::MigrationState was a table I used to save the sync state.
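The post doesn't show PG::MigrationState itself, but from how it is used above, a simple key/value table is enough. A sketch, with the column names assumed from the `find_by(key: ...)` and `update(value: ...)` calls:

```ruby
class CreateMigrationStates < ActiveRecord::Migration
  def change
    create_table :migration_states, id: :uuid, default: 'gen_random_uuid()' do |t|
      t.string :key, null: false   # e.g. "last_banned_customer_id"
      t.text :value                # last synced ObjectId, or a CSV of failed ids
      t.timestamps null: false
    end
    add_index :migration_states, :key, unique: true
  end
end

module PG
  class MigrationState < ActiveRecord::Base
  end
end
```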
1-5 Keep writing to and reading from MongoDB
I deployed all these changes to production, ran the sync task, and added it to cron so it ran automatically every 15 minutes.
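The cron entry looked roughly like this (the application path and log file are placeholders; the whenever gem would work just as well):

```shell
# Sync all models every 15 minutes and append output to a log
*/15 * * * * cd /var/www/myapp && bundle exec rake db:sync_postgresql:all >> log/sync.log 2>&1
```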
Step 2
In this step, after I had made sure everything was working as expected,
I switched reading and writing from MongoDB to PostgreSQL and deployed to production. Of course, I ran the sync task one last time to pick up any data written to MongoDB during the deployment.
Last Step
Once I had made sure that all data had been migrated without missing anything, I dropped MongoDB and everything related to it.
Conclusion
Large data migrations can be challenging, but with proper and thorough planning, they can be accomplished without any users or other stakeholders noticing.
I bet plenty of people have already had to deal with this problem, so let me know about your experience and the solutions you worked out :)