The Great Migration: From SQL Server to AWS Aurora

Helen Anderson on April 09, 2019

Over the last nine months, I've been leading a project to migrate 60 Data Analyst users and all their objects from a legacy SQL Server database to ...
 

"""
But, there was a lot more manual intervention needed in this process... sometimes it would pick some and sometimes none at all.

A lot of the time this resulted in multiple tables landing that needed to be cleaned up.
"""

Could you please give examples of what "manual intervention" was needed?

When were the times it picked "none at all"? How was the problem discovered, and how was the cross-database data comparison done?

And when "needed to be cleaned up", what clean up is needed and how did you do it?

 

Hi, thanks for the questions 😊

I found that when I used more than one transformation rule of the same type, e.g. two rules for a schema, I would sometimes see two tables in my destination database: one with the rules applied (upper-case the name, add a suffix) and one without. After testing multiple times, it wasn't clear why this was happening. To clean up, I simply deleted the table I didn't need, or copied the data into the correctly named one at the database level. Not a deal-breaker, but more manual intervention than was intended.
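For anyone hitting the same thing: the transformation rules live in the task's table-mapping JSON. A minimal sketch of the kind of mapping I'm describing, via boto3 (the schema name, suffix, task identifier, and ARNs are placeholders, not our real setup):

```python
import json

import boto3

# Two transformation rules of the same type (both targeting the schema),
# as described above: upper-case the schema name and add a suffix.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-dbo",
            "object-locator": {"schema-name": "dbo", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "uppercase-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": "dbo"},
            "rule-action": "convert-uppercase",
        },
        {
            "rule-type": "transformation",
            "rule-id": "3",
            "rule-name": "suffix-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": "dbo"},
            "rule-action": "add-suffix",
            "value": "_analytics",
        },
    ]
}

dms = boto3.client("dms")
dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-aurora",    # placeholder
    SourceEndpointArn="arn:aws:dms:...:endpoint:src",   # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:tgt",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:inst",  # placeholder
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)
```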

 

Great read! My company has had a lot of good experiences with Postgres recently. I'm even converting stuff from MongoDB (well, really Mongoose) to Postgres because we can use JSONB if we need to go unstructured. Articles like yours give me extra confidence! :)
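For example, the JSONB pattern is roughly this, via psycopg2 (the table and values are made up):

```python
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=appdb")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Keep the relational shape, with a JSONB column for the unstructured part.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id      serial PRIMARY KEY,
            payload jsonb NOT NULL
        )
    """)
    # Json() adapts a Python dict to a Postgres jsonb value.
    cur.execute(
        "INSERT INTO events (payload) VALUES (%s)",
        [Json({"source": "mongo", "tags": ["beta"], "score": 42})],
    )
    # ->> pulls a field out as text; @> tests JSONB containment.
    cur.execute(
        "SELECT payload ->> 'source' FROM events WHERE payload @> %s",
        [Json({"tags": ["beta"]})],
    )
    print(cur.fetchone())  # ('mongo',)
```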

 

That's great to hear, you'll have to write about your experiences here to keep us updated on your progress :)

 

The end architecture seems a bit redundant. Why not use Redshift for BI, instead of migrating data from there to Aurora? The language used by the Analysts would be the same...

 

Redshift is used as the BI platform for data modelling by the Data Services team.

We move the data to Aurora so any data analyst in the business can use our models without interrupting what's happening on Redshift and without us interrupting what they do.

 

The ease of provisioning in the cloud can sometimes lead us to spin up instances we don't really need and thus incur avoidable charges.

I don't know how many concurrent connections you expect, or what exactly you mean when you say "We move the data to Aurora so any data analyst in the business can use our models without interrupting what's happening on Redshift and without us interrupting what they do". But know ye that Redshift is designed to scale and to handle massively parallel query execution.

Fair comment on the ease of spinning up new services.

The reason we keep things separate isn't just down to scale. We keep a level of abstraction between Redshift, which is used for our team's models and for landing source tables, and the analyst DB, where any analyst in the business can run queries.

Our visualisation tool uses Redshift as a data source to provide operational reports for all parts of the business. We found in the past, when our team shared the same platform as the analysts, that queries would hit during update windows, tables would (accidentally) get locked or dropped, and the other teams didn't have the same kind of release processes as ours, so it was hard to keep track of who was doing what. We were using the platform as a Production space with Operational reports and couldn't risk our service being interrupted.

We didn't want to police 60 users all over the world, and they didn't want to be restricted by our team's release processes if they were using the space as a sandbox.

So we keep things separate.

 

Exactly my thoughts. Redshift is OLAP oriented unlike Aurora which leans more towards OLTP. I see a lot of capacity under-utilization in this design.

 

Hey Helen, if you have information on the size, the cost, and the speed of the migration from X to Aurora via DMS for your use cases, I know that's high-value knowledge to share, since the docs don't really give you examples.

I recently assisted with migrating a 10-million-row database from Postgres to MySQL Aurora Serverless, and the migration was done in under 10 minutes. I believe the cost was negligible, but I honestly don't remember.

DMS is, as you say, a fiddly tool.

 

Hey Andrew

That is an excellent question. From my understanding, the DMS service is free if you are moving to Aurora, so the only costs were the CloudWatch logging that went along with it - aws.amazon.com/dms/free-dms/

I'll check back and let you know if that's not the case

 

Just a quick check for transfer:

All data transfer into AWS Database Migration Service is free, and data transferred between AWS Database Migration Service and databases in Amazon RDS and Amazon EC2 Instances in the same Availability Zone also is free. Standard AWS data transfer rates apply when you migrate your source database to a target database in a different Availability Zone, Region, or outside of AWS. Standard AWS data transfer rates are listed on the EC2 instance pricing page.

The cost comes down to the server used and the duration it ran, so knowing what size and how long would be of help.
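As a rough illustration of the shape of that calculation (the hourly rate here is made up; the real rates are on the pricing pages):

```python
# cost ~= replication instance hourly rate * hours it ran
hourly_rate_usd = 0.036   # hypothetical rate for a small replication instance
runtime_hours = 10 / 60   # e.g. the ten-minute migration mentioned above
print(f"~${hourly_rate_usd * runtime_hours:.4f}")  # ~$0.0060, i.e. negligible
```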

Awesome, I'll do some digging and check it out

 

"In SQL Server everyone could see everything, so everyone could potentially delete anything

We made the conscious decision to create a schema for each function so they have a dedicated place to write their tables. Analysts then grant access to tables they want to share because everyone shouldn’t have access to everything"
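In Postgres terms, the quoted pattern boils down to something like this sketch (role, schema, and table names are all hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=analystdb")  # placeholder DSN
with conn, conn.cursor() as cur:
    # One role and one schema per business function, so each team has a
    # dedicated place to write its tables.
    cur.execute("CREATE ROLE marketing_analysts")
    cur.execute("CREATE ROLE finance_analysts")
    cur.execute("CREATE SCHEMA marketing AUTHORIZATION marketing_analysts")
    cur.execute("CREATE TABLE marketing.campaign_results (id int)")
    # Nothing is world-readable by default; a team shares a table by
    # granting access explicitly.
    cur.execute("GRANT USAGE ON SCHEMA marketing TO finance_analysts")
    cur.execute("GRANT SELECT ON marketing.campaign_results TO finance_analysts")
```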

That's a critique of your original database design, but it reads like a limitation of the SQL platform.

I'd love to get my teeth into some cloud database projects, but all the companies I work for cannot allow their data to be hosted in the cloud and insist on in-house SQL Servers segregated from the internet.

 

Great look behind the scenes at a migration of this scale and the performance improvements you've found.

Slightly tangential, but would you recommend the AWS ecosystem for someone looking at the different cloud solutions certifications?

 

Thanks Alan,

I'm (slowly) working on the AWS Solutions Architect Associate Certification. It's taking a bit longer than expected, as I don't know how things like networking work in a traditional sense, let alone in the AWS ecosystem :)

So far, so good. I don't have anything to compare it to, like Azure and the MS offerings, to be honest. But there are loads of resources if you are looking to start, which is a great help for noobs like me.

 

Very interesting post. Recently I had to migrate some tables from Vertica to Aurora Serverless because we were building an internal tool and needed more separation between those tables and the rest of the DWH. Unfortunately, I did not have the time to try the Data Migration Service from AWS, and since the tables were quite small anyway, we went for a naive manual process. I will definitely have a deeper read of your other posts!
But I must say that the setup and use of Aurora Serverless, especially via serverless.yml, has been a breeze!
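The serverless.yml itself isn't shown here, but provisioning the cluster boils down to something like this boto3 sketch (identifiers and credentials are placeholders, and the engine name can vary by Aurora version):

```python
import boto3

rds = boto3.client("rds")
rds.create_db_cluster(
    DBClusterIdentifier="internal-tool-cluster",  # placeholder
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",          # placeholder
    MasterUserPassword="change-me",  # placeholder
    ScalingConfiguration={
        "MinCapacity": 1,
        "MaxCapacity": 4,
        "AutoPause": True,  # pause when idle to save cost
        "SecondsUntilAutoPause": 300,
    },
)
```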

 

Good to know, good to know ...

The team here are pretty excited for Serverless too.

 

Hi Helen
Thank you for the interesting post,
Could you please explain how you monitor the whole process, for example how the task scheduler is monitored for failures?
And if it fails, what happens?

 

Hi Mario,

Sure thing. We use New Relic as our tool to monitor most processes. My team have a roster where we monitor the dashboards during the day and over the weekends.

The scheduled procs that the Analysts build are a little bit different. Our DevOps team built a process that takes care of logs and alerts. Each Analyst creates their proc, then adds the relevant details, like when it's ideal to run, and the service account then securely runs the task. If a task fails, an alert is triggered in the Analyst Slack channel, tagging the owner of the proc.
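The DevOps process itself is internal, but the failure-alert part works along these lines (the webhook URL, database, proc name, and owner ID are placeholders):

```python
import subprocess

import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def run_scheduled_proc(proc_name: str, owner_slack_id: str) -> None:
    """Run an analyst's proc; on failure, alert the channel and tag the owner."""
    try:
        subprocess.run(
            ["psql", "-d", "analystdb", "-c", f"CALL {proc_name}()"],
            check=True, capture_output=True, text=True,
        )
    except subprocess.CalledProcessError as err:
        requests.post(SLACK_WEBHOOK, json={
            "text": f"<@{owner_slack_id}> scheduled proc `{proc_name}` failed:\n{err.stderr}"
        })
        raise

run_scheduled_proc("refresh_sales_summary", "U12345")  # hypothetical proc/owner
```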

 

I haven't had to convert an entire database before -- bookmarking this for one day when I might. Thanks!

 

It's quite the undertaking for sure. But that's the reality of working in IT and BI, so I've taken it on positively :)

 

Was looking forward to hearing about this, glad it's going well!

 

Thanks Jared, the video is now available if you want to listen to me talk about it rather than reading all that :)
