In this blog post I am going to show you how to migrate from SQLServer to Aurora PostgreSQL using the Babelfish extension.
This blog post assumes that you are familiar with the RDS service.
The first thing you need to do is to create an Aurora PostgreSQL custom cluster parameter group with the Babelfish setting "rds.babelfish_status" set to on.
I then created a new security group that allows connections from my VPC CIDR range on ports 1433 and 5432. This allows both me and the DMS service to connect to either endpoint.
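A rough CLI sketch of that security group setup; the group name, VPC ID and CIDR range below are placeholders for your own values.

aws ec2 create-security-group \
  --group-name babelfish-dms-sg \
  --description "Allow 1433 and 5432 from the VPC" \
  --vpc-id <VPC ID>

aws ec2 authorize-security-group-ingress --group-id <SG ID> --protocol tcp --port 1433 --cidr <VPC CIDR>
aws ec2 authorize-security-group-ingress --group-id <SG ID> --protocol tcp --port 5432 --cidr <VPC CIDR>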
From here, I created an Aurora PostgreSQL cluster using this parameter group. I opted to use Serverless v2, but this is not mandatory. Once your cluster has been created, you should see that the two available endpoints (writer/reader) are listening on both TCP 5432 and 1433.
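For reference, a hedged CLI sketch of creating such a cluster with a Serverless v2 writer; the identifiers are placeholders, and the engine version needs to match the parameter group family created above.

aws rds create-db-cluster \
  --db-cluster-identifier babelfish-cluster \
  --engine aurora-postgresql \
  --engine-version <engine version matching the parameter group family> \
  --db-cluster-parameter-group-name babelfish-pg \
  --master-username <username> \
  --master-user-password <password> \
  --vpc-security-group-ids <SG ID created above> \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=8

aws rds create-db-instance \
  --db-instance-identifier babelfish-writer \
  --db-cluster-identifier babelfish-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql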
You then need to create two IAM roles for the DMS service to use.
These are: dms-vpc-role and dms-cloudwatch-logs-role
arn:aws:iam::<accountnumber>:role/dms-cloudwatch-logs-role
arn:aws:iam::<accountnumber>:role/dms-vpc-role
They need to be created with exactly these names so that the ARNs match the ones above, otherwise the DMS service will not recognise them.
https://docs.aws.amazon.com/dms/latest/userguide/security-iam.html
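A sketch of creating those roles from the CLI. The trust policy content and the two AWS managed policies are the ones described in the DMS documentation linked above; the file name dms-trust-policy.json is just an assumption.

# dms-trust-policy.json - allows the DMS service to assume the roles:
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     { "Effect": "Allow", "Principal": { "Service": "dms.amazonaws.com" }, "Action": "sts:AssumeRole" }
#   ]
# }

aws iam create-role --role-name dms-vpc-role \
  --assume-role-policy-document file://dms-trust-policy.json
aws iam attach-role-policy --role-name dms-vpc-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole

aws iam create-role --role-name dms-cloudwatch-logs-role \
  --assume-role-policy-document file://dms-trust-policy.json
aws iam attach-role-policy --role-name dms-cloudwatch-logs-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole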
Navigate to the DMS service.
From here, navigate to "Subnet groups"
You then need to create a subnet group for the DMS replication server to run out of. This needs to be a CIDR range that can communicate with your source (SQLServer) instance and your target Aurora PostgreSQL (APG) cluster.
https://docs.aws.amazon.com/dms/latest/userguide/subnet-group.html
The CLI can also be used:
aws dms create-replication-subnet-group \
--replication-subnet-group-identifier SQLServerToAPG \
--replication-subnet-group-description SQLServerToAPG \
--subnet-ids <SUBNETA> <SUBNETB> <SUBNETC> >/dev/null
You are then in a position to create the DMS replication instance
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.Creating.html
The CLI can also be used:
aws dms create-replication-instance \
--replication-instance-identifier SQLServerToAuroraPostgreSQL \
--allocated-storage 50 \
--replication-instance-class dms.t3.medium \
--vpc-security-group-ids <SG ID created above> \
--no-multi-az \
--replication-subnet-group-identifier <Subnet Group created above> \
--engine-version 3.5.3 \
--no-publicly-accessible >/dev/null
The next step is to create both source and target DMS endpoints
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Endpoints.Creating.html
The CLI can also be used:
Source:
aws dms create-endpoint \
--endpoint-identifier SQLServer-source \
--endpoint-type source \
--engine-name sqlserver \
--ssl-mode require \
--username <username> \
--server-name sqlserversourcerds.abcdef.eu-west-1.rds.amazonaws.com \
--password <password of the SQLServer instance> \
--port 1433 \
--database-name <Local Database you wish to migrate> >/dev/null
Target:
aws dms create-endpoint \
--endpoint-identifier APG-Target \
--endpoint-type target \
--engine-name aurora-postgresql \
--ssl-mode require \
--username <username> \
--server-name aurorapostgresqltarget.cluster-abcdef.eu-west-1.rds.amazonaws.com \
--password <password of the APG cluster> \
--port 5432 \
--database-name babelfish_db \
--postgre-sql-settings '{"MaxFileSize": 512, "ExecuteTimeout": 3600}' >/dev/null
You then need to test the endpoints to ensure that the replication instance can communicate with both the source and target endpoints
https://docs.aws.amazon.com/dms/latest/sbs/chap-mariadb2auroramysql.testendpoints.html
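The CLI can also be used to kick off and check the connection tests; the ARNs below are placeholders for the resources created above.

aws dms test-connection \
  --replication-instance-arn <replication instance ARN> \
  --endpoint-arn <endpoint ARN>

# repeat for the other endpoint, then check the results
aws dms describe-connections --filters Name=endpoint-arn,Values=<endpoint ARN>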
From here, connect to the source SQLServer instance. I am using SSMS in this example, but any tooling will work.
The dataset was populated using this document
https://www.sqlservercentral.com/articles/generating-dummy-data
This creates some dummy tables and populates a dataset
We need to gather the DDL from this instance next
In SSMS, right-click the database you want to export and choose Tasks > Generate Scripts.
Ensure 'Script entire database and all database objects' is selected.
On the next page, under Advanced, ensure that 'Script Full-Text Indexes' and 'Script Triggers' are both set to True.
Save the output to a SQL file.
I then tend to split the large DDL file into multiple separate files, for pre- and post-migration steps. For example:
tables-ss.sql -- This will include all CREATE TABLE statements
FK-ss.sql -- This will include all FOREIGN KEY constraints
functions-ss.sql -- This will include all CREATE FUNCTION statements
defaults-ss.sql -- This will include all DEFAULT constraints
procedure-ss.sql -- This will include all CREATE PROCEDURE statements
triggers-ss.sql -- This will include all CREATE TRIGGER statements
views-ss.sql -- This will include all CREATE VIEW statements
indexes-ss.sql -- This will include all CREATE INDEX statements
You now need to enable CDC on the source SQLServer instance. This is done via the commands below, using my example local database:
exec msdb.dbo.rds_cdc_enable_db 'commbuilderdemo';
GO
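Depending on your setup, the DMS documentation also describes enabling CDC at table level on the source. A hedged example is below; the schema and table names are placeholders rather than something taken from this walkthrough.

exec sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name = N'<table name>',
  @role_name = NULL,
  @supports_net_changes = 1;
GO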
From here, I connect to the APG cluster, but via port 1433. This can be done using a native SQLServer UI tool or the sqlcmd command line tool.
Once connected to the master database, I will create the target local database.
On the PostgreSQL side, all this actually does is create a new schema.
From there, I run the tables-ss.sql script created above to create all the empty tables and any applicable primary keys.
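A sketch of those two steps with sqlcmd; the endpoint, credentials and database name are placeholders (I used commbuilders in this walkthrough).

sqlcmd -S <APG cluster writer endpoint>,1433 -U <username> -P <password> -d master \
  -Q "CREATE DATABASE <local database name>"

sqlcmd -S <APG cluster writer endpoint>,1433 -U <username> -P <password> -d <local database name> \
  -i tables-ss.sql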
As you can see, we've now connected to the Aurora PostgreSQL cluster using a native SQLServer UI tool, and the tables we previously created are visible.
To access this instance from the PostgreSQL endpoint you would run
psql -h <hostname> -U <username> -d babelfish_db << EOF
set search_path=<LocalDatabaseCreatedAbove>_dbo;
\dt
EOF
An example of the above with populated variables is here:
psql -h <hostname> -U <username> -d babelfish_db
babelfish_db=> set search_path=commbuilders_dbo;
SET
babelfish_db=> \dt
List of relations
Schema | Name | Type | Owner
------------------+------------------+-------+------------------
commbuilders_dbo | customer_data | table | commbuilders_dbo
commbuilders_dbo | first_names | table | commbuilders_dbo
commbuilders_dbo | last_names | table | commbuilders_dbo
commbuilders_dbo | numbers | table | commbuilders_dbo
commbuilders_dbo | street_type | table | commbuilders_dbo
commbuilders_dbo | transaction_data | table | commbuilders_dbo
commbuilders_dbo | transactions | table | commbuilders_dbo
(7 rows)
babelfish_db=>
From here, navigate back to the DMS service. This is where we will create a replication task to move the data over from source to target.
Ensure you set the DMS task to 'Do nothing' for 'Target table preparation mode'. This is to ensure that DMS does not drop and re-create the tables. If it did, the tables would be re-created with DMS-mapped PostgreSQL data types rather than the types Babelfish expects, and the SQLServer (1433) endpoint cannot always see such tables.
Here I am selecting all tables; I'm not excluding anything.
You need to rename the dbo schema to localdatabase_dbo, with localdatabase being the database name used in the "CREATE DATABASE" command above.
Here is a summary of the transformation rules that I created:
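Expressed as a CLI call with a table-mappings file, the equivalent would look roughly like this. It is a sketch rather than exactly what I ran: the ARNs and identifiers are placeholders, and the schema names match my commbuilders example.

aws dms create-replication-task \
  --replication-task-identifier SQLServerToAPG-task \
  --source-endpoint-arn <source endpoint ARN> \
  --target-endpoint-arn <target endpoint ARN> \
  --replication-instance-arn <replication instance ARN> \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json \
  --replication-task-settings '{"FullLoadSettings": {"TargetTablePrepMode": "DO_NOTHING"}}'

table-mappings.json:

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-all-tables",
      "object-locator": { "schema-name": "dbo", "table-name": "%" },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "rename-schema",
      "rule-target": "schema",
      "object-locator": { "schema-name": "dbo" },
      "rule-action": "rename",
      "value": "commbuilders_dbo"
    }
  ]
}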
Once the full load has completed, the task will stay in CDC mode until you are ready to switch over.
You can check the status of the task via CloudWatch Logs, or inside the task itself, which shows how many rows it has migrated over.
When you are ready to switch over, stop the source application, and once all connections have dropped you are in a position to stop the replication task.
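Stopping the task can also be done from the CLI (the ARN is a placeholder):

aws dms stop-replication-task --replication-task-arn <replication task ARN>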
You can now run the post scripts that you created before.
FK-ss.sql
functions-ss.sql
defaults-ss.sql
procedure-ss.sql
triggers-ss.sql
views-ss.sql
indexes-ss.sql
Once you have created all the database resources, you now need to run data validation steps.
This will include row count validation, object count validation (foreign keys, indexes, constraints etc.) and column count validation (to ensure all columns on the tables have replicated).
For row counts, I simply ran a row count on all tables and compared.
SELECT COUNT(*) FROM X;
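Because both instances listen on 1433, the same query can be run against each from the shell. A sketch with sqlcmd, using the customer_data table from the earlier listing; the endpoints, credentials and database name are placeholders.

sqlcmd -S <source SQLServer endpoint>,1433 -U <username> -P <password> -d <local database name> \
  -Q "SELECT COUNT(*) FROM dbo.customer_data;"

sqlcmd -S <APG cluster writer endpoint>,1433 -U <username> -P <password> -d <local database name> \
  -Q "SELECT COUNT(*) FROM dbo.customer_data;"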
For object counts, I compared sys.objects in both instances, connecting to the 1433 endpoints so I could run the exact same SQL on both.
SELECT type_desc,count(*) from sys.objects
where schema_id=(SELECT SCHEMA_ID('dbo'))
group by type_desc
order by type_desc;
GO
That way you can easily compare objects and drill down into any differences, if applicable
Once all of those steps have passed and you are happy, you will need to instruct your application team(s) to change their endpoint to the Aurora PostgreSQL endpoint and restart their application(s)
Congratulations - you've managed to migrate from SQLServer to Aurora PostgreSQL.
You can see a walkthrough of this process, which I presented earlier this year, on the AWS Twitch site:
https://www.twitch.tv/videos/2043017167?collection=BEUpZw5lUxeLJA
I hope this blog post helped, and happy building
Feel free to connect