DEV Community

abbazs
abbazs

Posted on • Updated on

How to migrate Microsoft Sql database to postgresql database using pandas?

Recently had to migrate a database in mssql to postgresql. It is a flask based dashboard application. We were developing the application in a trial environment where the hosting and db instance will expire every 120 days. And we used to take back up of the application near to the trial expiry and restore the application in a new trial instance. So to take back up the database and restore it we used flask_alchemydumps. It worked fine, restoring millions of records used to take forever from local machine, but from a trial VM it was fairly quick, within an hour the db was restored.

We got the budget approvals for paid gcp instance and this time we chose postgresql for the db instead of mssql for the cost factor. mssql instance was at least twice expensive than postgresql.

Here comes the challenge, trial VM's were stopped, no more trial VM's. So restoring the db from local machine was not happening at all. And the VPN to connect to db will frequently disconnect and the restoration of db using flask_alchemydumps became impossible.

The web application is hosted in a cloud foundry instance and hence I tried with various options cloud foundry provided (creating worker only instance, a sidecar process, running a task etc..) to restore the data from flask_alchemydumps to postgresql. It failed in all the process with some or the other error message. Most frequent error message was Exit status 137 (out of memory).

Then I tried to restore the data from yet to expire mssql instance using MSSQL Server Management Studio following stackoverflow.com answers migrate data from MS SQL to PostgreSQL? and Passing ssl certificates paths to PostgreSQL ODBC driver. It worked but it was slow and stop after restoring few thousand records.

So what actually worked?

pandas read_sql and to_sql
Here is the code that did the data base migration:

# Imports first!
import pymssql
import pandas as pd
# Probably context manager is not helping much in this case
from contextlib import contextmanager
from sqlalchemy import create_engine

# Postgres instance can only be accessed with sslmode
ssl_args = {
    "sslcert": r"C:\.postgresql\postgresql.crt",
    "sslkey": r"C:\.postgresql\postgresql.key",
    "sslmode": "require",
}

# create a sqlalchemy engine
sqlengine = create_engine(
"postgresql+psycopg2://admin:password@x.x.x.x:5432/thedatabase",
connect_args=ssl_args,
)

# The mssql database connection
@contextmanager
def get_old_db_connection():
    creds = {
        "server": "x.x.x.x",
        "user": "admin",
        "password": "password",
        "database": "old_database",
        "port": "1433",
    }
    conn = pymssql.connect(**creds)
    try:
        yield conn
    finally:
        conn.close()
# The migration code!
with get_old_db_connection() as conn:
    # read_sql with chunksize will return a iterator
    df = pd.read_sql(
        "SELECT * FROM table_1 ORDER BY id;",
        con=conn,
        chunksize=100000,
        parse_dates=["created_date"],
    )
    # Record the completed row index, in case if the 
    # process stopped before completing start again
    # with last updated index 
    # `SELECT * FROM table_1 WHERE id > 1000 ORDER BY id;`  
    for di in df:
        di.to_sql(
           name="table_1", 
           if_exists="append", 
           con=sqlengine, 
           index=False
        )
        print(f"Rows completed = {di['id'].iloc[-1]}")
Enter fullscreen mode Exit fullscreen mode

While flask_alchemydumps is a good choice for backup and restore it didn't work in this case. Ideally it shall have worked.

After the updating the db you most probably need to do this...

Top comments (0)