In this blog post I am going to show you how to migrate from SQLServer to Aurora PostgreSQL using the Babelfish extension.
This blog post assumes that you are familiar with the RDS service.
The first thing you need to do is to create an Aurora PostgreSQL custom cluster parameter group with the Babelfish setting "rds.babelfish_status" set to on.
I then created a new security group that allows connections from my VPC CIDR range on ports 1433 and 5432. This allows both me and the DMS service to connect to either endpoint.
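A rough CLI sketch of that security group setup; the group name, VPC ID and CIDR range below are placeholders for your own values.

aws ec2 create-security-group \
  --group-name babelfish-dms-sg \
  --description "Allow 1433 and 5432 from the VPC" \
  --vpc-id <VPC ID>

aws ec2 authorize-security-group-ingress --group-id <SG ID> --protocol tcp --port 1433 --cidr <VPC CIDR>
aws ec2 authorize-security-group-ingress --group-id <SG ID> --protocol tcp --port 5432 --cidr <VPC CIDR>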
From here, I created an Aurora PostgreSQL cluster using this parameter group. I opted to use Serverless v2, but this is not mandatory. Once your cluster has been created, you should see that the two available endpoints (writer/reader) are listening on both TCP 5432 and 1433.
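For reference, a hedged CLI sketch of creating such a cluster with a Serverless v2 writer; the identifiers are placeholders, and the engine version needs to match the parameter group family created above.

aws rds create-db-cluster \
  --db-cluster-identifier babelfish-cluster \
  --engine aurora-postgresql \
  --engine-version <engine version matching the parameter group family> \
  --db-cluster-parameter-group-name babelfish-pg \
  --master-username <username> \
  --master-user-password <password> \
  --vpc-security-group-ids <SG ID created above> \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=8

aws rds create-db-instance \
  --db-instance-identifier babelfish-writer \
  --db-cluster-identifier babelfish-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql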
You then need to create two IAM roles for the DMS service to use.
These are: dms-vpc-role and dms-cloudwatch-logs-role
arn:aws:iam::<accountnumber>:role/dms-cloudwatch-logs-role
arn:aws:iam::<accountnumber>:role/dms-vpc-role
They need to be created with exactly these names so that the ARNs match the ones above, otherwise the DMS service will not recognise them.
https://docs.aws.amazon.com/dms/latest/userguide/security-iam.html
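A sketch of creating those roles from the CLI. The trust policy content and the two AWS managed policies are the ones described in the DMS documentation linked above; the file name dms-trust-policy.json is just an assumption.

# dms-trust-policy.json - allows the DMS service to assume the roles:
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     { "Effect": "Allow", "Principal": { "Service": "dms.amazonaws.com" }, "Action": "sts:AssumeRole" }
#   ]
# }

aws iam create-role --role-name dms-vpc-role \
  --assume-role-policy-document file://dms-trust-policy.json
aws iam attach-role-policy --role-name dms-vpc-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole

aws iam create-role --role-name dms-cloudwatch-logs-role \
  --assume-role-policy-document file://dms-trust-policy.json
aws iam attach-role-policy --role-name dms-cloudwatch-logs-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole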
Navigate to the DMS service.
From here, navigate to "Subnet groups"
You then need to create a subnet group for the DMS replication server to run out of. This needs to be a CIDR range that can communicate with your source (SQLServer) instance and your target Aurora PostgreSQL (APG) cluster.
https://docs.aws.amazon.com/dms/latest/userguide/subnet-group.html
The CLI can also be used:
aws dms create-replication-subnet-group \
--replication-subnet-group-identifier SQLServerToAPG \
--replication-subnet-group-description SQLServerToAPG \
--subnet-ids <SUBNETA> <SUBNETB> <SUBNETC> >/dev/null
You are then in a position to create the DMS replication instance
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.Creating.html
The CLI can also be used:
aws dms create-replication-instance \
--replication-instance-identifier SQLServerToAuroraPostgreSQL \
--allocated-storage 50 \
--replication-instance-class dms.t3.medium \
--vpc-security-group-ids <SG ID created above> \
--no-multi-az \
--replication-subnet-group-identifier <Subnet Group created above> \
--engine-version 3.5.3 \
--no-publicly-accessible >/dev/null
The next step is to create both source and target DMS endpoints
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Endpoints.Creating.html
The CLI can also be used:
Source:
aws dms create-endpoint \
--endpoint-identifier SQLServer-source \
--endpoint-type source \
--engine-name sqlserver \
--ssl-mode require \
--username <username> \
--server-name sqlserversourcerds.abcdef.eu-west-1.rds.amazonaws.com \
--password <password of the SQLServer instance> \
--port 1433 \
--database-name <Local Database you wish to migrate> >/dev/null
Target:
aws dms create-endpoint \
--endpoint-identifier APG-Target \
--endpoint-type target \
--engine-name aurora-postgresql \
--ssl-mode require \
--username <username> \
--server-name aurorapostgresqltarget.cluster-abcdef.eu-west-1.rds.amazonaws.com \
--password <password of the APG cluster> \
--port 5432 \
--database-name babelfish_db \
--postgre-sql-settings '{"MaxFileSize": 512, "ExecuteTimeout": 3600}' >/dev/null
You then need to test the endpoints to ensure that the replication instance can communicate with both the source and target endpoints
https://docs.aws.amazon.com/dms/latest/sbs/chap-mariadb2auroramysql.testendpoints.html
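The CLI can also be used to kick off and check the connection tests; the ARNs below are placeholders for the resources created above.

aws dms test-connection \
  --replication-instance-arn <replication instance ARN> \
  --endpoint-arn <endpoint ARN>

# repeat for the other endpoint, then check the results
aws dms describe-connections --filters Name=endpoint-arn,Values=<endpoint ARN>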
From here, connect to the source SQLServer instance. I am using SSMS in this example, but any tooling will work.
The dataset was populated using this document
https://www.sqlservercentral.com/articles/generating-dummy-data
This creates some dummy tables and populates a dataset
We need to gather the DDL from this instance next
In SSMS, right-click the database you want to export and choose Tasks > Generate Scripts.
Ensure 'Script entire database and all database objects' is selected.
On the next page, under Advanced, ensure that 'Script Full-Text Indexes' and 'Script Triggers' are both set to True.
Save the output to a SQL file.
I then tend to split the large DDL file into multiple separate files, for pre- and post-migration steps. For example:
tables-ss.sql -- This will include all CREATE TABLE statements
FK-ss.sql -- This will include all FOREIGN KEY constraints
functions-ss.sql -- This will include all CREATE FUNCTION statements
defaults-ss.sql -- This will include all DEFAULT constraints
procedure-ss.sql -- This will include all CREATE PROCEDURE statements
triggers-ss.sql -- This will include all CREATE TRIGGER statements
views-ss.sql -- This will include all CREATE VIEW statements
indexes-ss.sql -- This will include all CREATE INDEX statements
You now need to enable CDC on the source SQLServer instance. This is done via the commands below, using my example local database:
exec msdb.dbo.rds_cdc_enable_db 'commbuilderdemo';
GO
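Depending on your setup, the DMS documentation also describes enabling CDC at table level on the source. A hedged example is below; the schema and table names are placeholders rather than something taken from this walkthrough.

exec sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name = N'<table name>',
  @role_name = NULL,
  @supports_net_changes = 1;
GO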
From here, I connect to the APG cluster, but via port 1433. This can be done using a native SQLServer UI tool or the sqlcmd command line tool.
Once connected to the master database, I will create the target local database.
On the PostgreSQL side, all this actually does is create a new schema.
From there, I run the tables-ss.sql script created above to create all the empty tables and any applicable primary keys.
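A sketch of those two steps with sqlcmd; the endpoint, credentials and database name are placeholders (I used commbuilders in this walkthrough).

sqlcmd -S <APG cluster writer endpoint>,1433 -U <username> -P <password> -d master \
  -Q "CREATE DATABASE <local database name>"

sqlcmd -S <APG cluster writer endpoint>,1433 -U <username> -P <password> -d <local database name> \
  -i tables-ss.sql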
As you can see, we've now connected to the Aurora PostgreSQL cluster using a native SQLServer UI tool, and the tables we previously created are visible.
To access this instance from the PostgreSQL endpoint you would run
psql -h <hostname> -U <username> -d babelfish_db << EOF
set search_path=<LocalDatabaseCreatedAbove>_dbo;
\dt
EOF
An example of the above with populated variables is here:
psql -h <hostname> -U <username> -d babelfish_db
babelfish_db=> set search_path=commbuilders_dbo;
SET
babelfish_db=> \dt
List of relations
Schema | Name | Type | Owner
------------------+------------------+-------+------------------
commbuilders_dbo | customer_data | table | commbuilders_dbo
commbuilders_dbo | first_names | table | commbuilders_dbo
commbuilders_dbo | last_names | table | commbuilders_dbo
commbuilders_dbo | numbers | table | commbuilders_dbo
commbuilders_dbo | street_type | table | commbuilders_dbo
commbuilders_dbo | transaction_data | table | commbuilders_dbo
commbuilders_dbo | transactions | table | commbuilders_dbo
(7 rows)
babelfish_db=>
From here, navigate back to the DMS service. This is where we will create a replication task to move the data over from source to target.
Ensure you set the DMS task to 'Do nothing' for 'Target table preparation mode'. This is to ensure that DMS does not drop and re-create the tables. If it did, the tables would be re-created with DMS-mapped PostgreSQL data types rather than the types Babelfish expects, and the SQLServer (1433) endpoint cannot always see such tables.
Here I am selecting all tables; I'm not excluding anything.
You need to rename the dbo schema to localdatabase_dbo, with localdatabase being the database name used in the "CREATE DATABASE" command above.
Here is a summary of the transformation rules that I created:
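Expressed as a CLI call with a table-mappings file, the equivalent would look roughly like this. It is a sketch rather than exactly what I ran: the ARNs and identifiers are placeholders, and the schema names match my commbuilders example.

aws dms create-replication-task \
  --replication-task-identifier SQLServerToAPG-task \
  --source-endpoint-arn <source endpoint ARN> \
  --target-endpoint-arn <target endpoint ARN> \
  --replication-instance-arn <replication instance ARN> \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json \
  --replication-task-settings '{"FullLoadSettings": {"TargetTablePrepMode": "DO_NOTHING"}}'

table-mappings.json:

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-all-tables",
      "object-locator": { "schema-name": "dbo", "table-name": "%" },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "rename-schema",
      "rule-target": "schema",
      "object-locator": { "schema-name": "dbo" },
      "rule-action": "rename",
      "value": "commbuilders_dbo"
    }
  ]
}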
Once the full load has completed, the task will stay in CDC mode until you are ready to switch over.
You can check the status of the task via CloudWatch Logs, or inside the task itself, which shows how many rows it has migrated over.
When you are ready to switch over, stop the source application, and once all connections have dropped you are in a position to stop the replication task.
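Stopping the task can also be done from the CLI (the ARN is a placeholder):

aws dms stop-replication-task --replication-task-arn <replication task ARN>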
You can now run the post scripts that you created before.
FK-ss.sql
functions-ss.sql
defaults-ss.sql
procedure-ss.sql
triggers-ss.sql
views-ss.sql
indexes-ss.sql
Once you have created all the database resources, you now need to run data validation steps.
This will include row count validation, object count validation (foreign keys, indexes, constraints etc.) and column count validation (to ensure all columns on the tables have replicated).
For row counts, I simply ran a row count on all tables and compared.
SELECT COUNT(*) FROM X;
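Because both instances listen on 1433, the same query can be run against each from the shell. A sketch with sqlcmd, using the customer_data table from the earlier listing; the endpoints, credentials and database name are placeholders.

sqlcmd -S <source SQLServer endpoint>,1433 -U <username> -P <password> -d <local database name> \
  -Q "SELECT COUNT(*) FROM dbo.customer_data;"

sqlcmd -S <APG cluster writer endpoint>,1433 -U <username> -P <password> -d <local database name> \
  -Q "SELECT COUNT(*) FROM dbo.customer_data;"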
For object counts, I compared sys.objects in both instances, connecting to the 1433 endpoints so I could run the exact same SQL on both.
SELECT type_desc,count(*) from sys.objects
where schema_id=(SELECT SCHEMA_ID('dbo'))
group by type_desc
order by type_desc;
GO
That way you can easily compare objects and drill down into any differences, if applicable
Once all of those steps have passed and you are happy, you will need to instruct your application team(s) to change their endpoint to the Aurora PostgreSQL endpoint and restart their application(s)
Congratulations - you've managed to migrate from SQLServer to Aurora PostgreSQL.
You can see a walkthrough of this process, which I presented earlier this year, on the AWS Twitch site:
https://www.twitch.tv/videos/2043017167?collection=BEUpZw5lUxeLJA
I hope this blog post helped, and happy building
Feel free to connect