Hasan Basheer

Migrating Data from MongoDB to PostgreSQL

This task is easy for me to remember, and I don't know exactly why.

Maybe because it is a rare task for programmers in general, or because it was a big challenge for me.

Anyway, I wrote down the details of this experience so that I don't forget it in the future, and it may be helpful for anyone who works on a similar task.

The reason for this migration

To be honest, neither I nor the rest of the development team at the company I worked for knew anything about MongoDB at all.

In this specific app, someone had used an old version of MongoDB, and we didn't know how to upgrade it or how to handle it if something went wrong.

That is why we decided to migrate the data: we used PostgreSQL for almost all of our projects and had enough experience with that kind of database.

The challenges I had

1- Complete the task with zero downtime
2- Convert MongoDB ObjectIds to UUIDs

Implementation steps

I divided the task into three steps so it could be done with zero downtime.

Step 1

1-1 Install required gems

gem 'sinatra-activerecord'
gem 'bson-objectid-to-uuid'
gem 'pg', '~> 0.20.0'

The application I worked on was built with Sinatra, which is why I used sinatra-activerecord to bring ActiveRecord into the project.
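For reference, the wiring looks roughly like this. It is only a sketch: the database_file path and the Rakefile layout here are assumptions, not the project's actual setup.

# app.rb -- hook ActiveRecord into a classic Sinatra app
require 'sinatra'
require 'sinatra/activerecord'

# Point the gem at a standard ActiveRecord config file
# (alternatively, set DATABASE_URL in the environment).
set :database_file, 'config/database.yml'

# Rakefile -- pulls in db:create, db:migrate, db:rollback and friends
require './app'
require 'sinatra/activerecord/rake'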

1-2 Create PostgreSQL Schema

I needed to create a migration for every Mongoid::Document.
For example, one of the Mongoid::Document classes looked like this:

class BannedCustomer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :account_id, type: Integer
  field :notes, type: String
  field :system_banned, type: Boolean, default: false
  field :deleted, type: Boolean, default: false

end



The corresponding ActiveRecord migration was:

class CreateBannedCustomers < ActiveRecord::Migration
  def change
    # gen_random_uuid() comes from the pgcrypto extension (it is built in only
    # from PostgreSQL 13 onwards), so enable it before creating the table.
    enable_extension 'pgcrypto'

    create_table :banned_customers, id: :uuid, default: 'gen_random_uuid()' do |t|
      t.integer :account_id
      t.text :notes
      t.boolean :deleted, default: false
      t.boolean :system_banned, default: false
      t.timestamps null: false
    end
  end
end

1-3 Create an ActiveRecord model for each table

Here I needed to put the ActiveRecord models under a PG namespace so I could tell them apart from the Mongoid models.

root
├── models
│   └── banned_customers.rb
│   └── another_model.rb
│   └── pg
│      └── banned_customers.rb
│      └── another_model.rb
module PG
  class BannedCustomer < ActiveRecord::Base
  end
end

This way, BannedCustomer refers to the MongoDB model and PG::BannedCustomer refers to the PostgreSQL one.
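This also covers the second challenge: the bson-objectid-to-uuid gem adds a to_uuid method to Mongo ObjectIds, so every Mongo document's _id maps to the UUID that becomes the primary key of its PostgreSQL counterpart. A quick illustrative sketch (not code from the project):

# The Mongoid document's ObjectId...
record = BannedCustomer.first
uuid   = record._id.to_uuid

# ...becomes the UUID primary key of the matching PostgreSQL row,
# so the synced record can be looked up directly by id.
PG::BannedCustomer.find_by(id: uuid)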

1-4 Write a rake task to move and sync data between the two databases

namespace :db do
  namespace :sync_postgresql do

    sync_desc = <<-DESC.gsub(/    /, '')
      Sync data between Mongoid and PostgreSQL for a specific Mongoid model
        $ rake db:sync_postgresql:model MODEL='MyModel'
    DESC
    desc sync_desc
    task :model do
      batch_size = 100
      models = ["BannedCustomer", ....]
      not_synced_ids = []
      last_record_processed = nil
      updated_records_ids = []
      created_records_ids = []

      if ENV['MODEL'].to_s == ''
        puts '='*90, 'USAGE', '='*90, sync_desc, ""
        exit(1)
      else
        if !models.include?(ENV['MODEL'].to_s)
          puts '='*90, 'USAGE', '='*90, sync_desc, ""
          exit(1)
        end
        mongoid_klass  = eval(ENV['MODEL'])
        pg_klass = eval("PG::#{ENV['MODEL'].to_s}")

        after_id = ENV['AFTER_ID'].present? ? ENV['AFTER_ID'].to_s : nil

        # AFTER_ID=0 forces a full resync from the first record; otherwise the
        # task resumes from the last record id stored in PG::MigrationState.
        if after_id.eql?("0")
          last_record_synced_id = nil
        else
          last_record_synced_id = after_id || PG::MigrationState.find_by(key: "last_#{ENV['MODEL'].to_s.underscore}_id").try(:value)
        end

        puts "last_record_synced_id: #{last_record_synced_id}"

        if last_record_synced_id.present?
          last_record_synced = mongoid_klass.unscoped.find(last_record_synced_id) rescue nil
        end

        last_record_stored = mongoid_klass.unscoped.asc(:created_at).last

        if last_record_synced && (last_record_synced.id == last_record_stored.id)
          puts "All #{ENV['MODEL']} Records Have Been Synced"
        else
          if last_record_synced
            count = mongoid_klass.unscoped.between(created_at: (last_record_synced.created_at..last_record_stored.created_at)).count.to_f
          else
            count = mongoid_klass.unscoped.all.count.to_f
          end

          puts "count: #{count}"

          1.upto((count / batch_size).ceil) do |page|
            if last_record_synced
              @mongodb_records = mongoid_klass.unscoped.between(created_at: (last_record_synced.created_at..last_record_stored.created_at)).page(page).per(batch_size)
            else
              @mongodb_records = mongoid_klass.unscoped.all.page(page).per(batch_size)
            end
            @mongodb_records.each do |record|
              # bson-objectid-to-uuid turns the Mongo ObjectId into a UUID,
              # which becomes the primary key on the PostgreSQL side.
              uuid = record._id.to_uuid

              # Serialize the Mongoid document to a plain hash and swap the
              # Mongo "_id" for the UUID "id" column.
              mongo_json = record.to_json
              mongo_attributes = JSON.parse(mongo_json)
              mongo_attributes = mongo_attributes.merge({"id" => uuid})

              mongo_attributes.delete("_id")

              begin
                pg_record = pg_klass.unscoped.find_by(id: uuid)
                if pg_record
                  pg_attributes = pg_record.attributes

                  if pg_attributes != mongo_attributes
                    pg_record.update!(mongo_attributes)
                    updated_records_ids << record._id
                  end
                else
                  pg_klass.create!(mongo_attributes)
                  created_records_ids << record._id
                end
                last_record_processed = record._id
              rescue => e
                Raven.capture_exception(e)
                puts e
                not_synced_ids << record._id
              end
            end

            unless @mongodb_records.empty?
              # Remember the last processed record so the next run can resume from it.
              sync_state = PG::MigrationState.find_or_create_by!(key: "last_#{ENV['MODEL'].to_s.underscore}_id")
              sync_state.update!(value: last_record_processed)
            end

            puts "#{batch_size} #{mongoid_klass} are processed."
          end

          not_synced_state = PG::MigrationState.find_or_create_by!(key: "not_synced_#{ENV['MODEL'].to_s.underscore}_ids")
          not_synced_state_value = not_synced_ids.empty? ? "" : not_synced_ids.join(',')
          not_synced_state.update(value: not_synced_state_value)
        end
      end
    end

    all_sync_desc = <<-DESC.gsub(/    /, '')
      Sync data between Mongoid and PostgreSQL for all Mongoid models
        $ rake db:sync_postgresql:all
    DESC
    desc all_sync_desc
    task :all do
      all_models = ["BannedCustomer", "BannedPaypal", "CreditCardCheck", "PaypalCheck"]
      all_models.each do |klass|
        puts "#" * 40
        puts "Processing model: #{klass}..."
        ENV['MODEL'] = klass.to_s
        Rake::Task["db:sync_postgresql:model"].invoke
        Rake::Task["db:sync_postgresql:model"].reenable
      end
    end
  end
end
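Running it looked like this. Passing AFTER_ID=0 skips the stored sync state and forces a full resync of the model from the first record:

# Sync one model, resuming from the last synced record
$ rake db:sync_postgresql:model MODEL='BannedCustomer'

# Force a full resync of that model
$ rake db:sync_postgresql:model MODEL='BannedCustomer' AFTER_ID=0

# Sync every model in the list
$ rake db:sync_postgresql:all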

PG::MigrationState is a table I used to save the sync state.
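The post doesn't show that table, so here is a minimal sketch of what it could look like: a plain key/value table plus a model in the same PG namespace. The column names are assumptions based on how the rake task uses it (find_by(key: ...), .value, update!(value: ...)).

# Assumed migration for the sync-state table
class CreateMigrationStates < ActiveRecord::Migration
  def change
    create_table :migration_states do |t|
      t.string :key, null: false
      t.text :value
      t.timestamps null: false
    end
    add_index :migration_states, :key, unique: true
  end
end

# models/pg/migration_state.rb
module PG
  class MigrationState < ActiveRecord::Base
  end
end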

1-5 Keep writing/reading from MongoDB

I deployed all these changes to production, ran the sync task, and added it to cron so it would run automatically every 15 minutes.
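The cron entry was something along these lines; the application path and log file here are placeholders, not the real ones:

*/15 * * * * cd /path/to/app && bundle exec rake db:sync_postgresql:all >> log/sync_postgresql.log 2>&1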

Step 2

In this step, after I made sure everything was working as expected, I switched writing and reading from MongoDB to PostgreSQL and deployed to production. Of course, I then ran the sync task again to copy any data that had been written to MongoDB during the deployment.
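In practice, the switch mostly meant pointing the application code at the PG:: models instead of the Mongoid ones. A made-up example of the kind of change involved (the banned? helper is hypothetical, not code from the app):

# Before: reads and writes went through the Mongoid model
def banned?(account_id)
  BannedCustomer.where(account_id: account_id, deleted: false).exists?
end

# After: the same query goes to PostgreSQL via the namespaced model
def banned?(account_id)
  PG::BannedCustomer.where(account_id: account_id, deleted: false).exists?
end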

Last Step

Once I made sure that all the data had been migrated without missing anything, I dropped MongoDB and everything related to it.
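Before dropping anything, a quick console check along these lines helps confirm nothing was missed; it reuses the model list and the not_synced_* keys that the rake task writes to PG::MigrationState (a sketch of one way to check, not the exact commands I ran):

# Sanity check in a console before dropping MongoDB
models = ["BannedCustomer", "BannedPaypal", "CreditCardCheck", "PaypalCheck"]

models.each do |name|
  mongo_count = Object.const_get(name).unscoped.count
  pg_count    = Object.const_get("PG::#{name}").unscoped.count
  puts "#{name}: mongo=#{mongo_count} pg=#{pg_count}#{' MISMATCH!' if mongo_count != pg_count}"
end

# Any ids the sync task failed to copy are recorded in PG::MigrationState:
PG::MigrationState.where("key LIKE 'not_synced_%'").each do |state|
  puts "#{state.key}: #{state.value}" if state.value.present?
end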

Conclusion

Large data migrations can be challenging, but with proper and thorough planning, they can be accomplished without any users or other stakeholders noticing.

I bet there are more people who have already had to deal with this problem, so let me know about your experience and the solutions you worked out :)
