
Move PostgreSQL AWS RDS to Google Cloud SQL

Mario (@mfahlandt) ・ 3 min read

We had the problem that we needed to move a large PostgreSQL database away from Amazon's AWS to Google's GCP.

The problems were:

A large database (160 GB+), and we only had the AWS RDS snapshots.

Get a snapshot out of RDS into Storage

To do this we created a new Compute Engine instance and connected to it via SSH. We want to write the dump file directly to a new bucket in Cloud Storage, so we have to mount the bucket as a volume on the machine:

gcloud init

You can either log in with your own account or use a service account, but keep in mind that the service account needs the rights to create a bucket.

gsutil mb gs://my-new-bucket/

Now we have to mount the bucket on our machine. For this we use Cloud Storage FUSE; to install it we need the following steps:

export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse

And now we can finally mount it (create the mount point first, and use the name of the bucket you created above):

mkdir -p /mnt/gcs-bucket
gcsfuse my-new-bucket /mnt/gcs-bucket
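
A quick sanity check that the mount actually works before starting a multi-hour dump (optional, not part of the original steps):

ls /mnt/gcs-bucket
touch /mnt/gcs-bucket/test-write && rm /mnt/gcs-bucket/test-write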

So we have a place to store the dump, what next? We have to install the same PostgreSQL version on the machine as the remote server runs, to get a working pg_dump:

echo "deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main 9.5" | sudo tee /etc/apt/sources.list.d/postgresql.list
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install postgresql-9.5

Now we can finally do the dump:

pg_dump -h yourRDS.rds.amazonaws.com -p 5432 -F c -O -U postgres DATABASE > /mnt/gcs-bucket/db.dump
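
Since the dump runs over the network and can take hours, it may be safer to run it detached so a dropped SSH session does not kill it. A minimal sketch (not from the original article; screen or tmux would work just as well):

nohup pg_dump -h yourRDS.rds.amazonaws.com -p 5432 -F c -O -U postgres DATABASE > /mnt/gcs-bucket/db.dump 2> pg_dump.log &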

Depending on how large your database is, this will take quite a while. What's next? Create your SQL instance on GCP. There is an import function for SQL files from a bucket, but sadly not for custom-format dumps, so we have to do the restore the hard way.
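
If you prefer the command line over the console, creating the target instance and database might look roughly like this (a minimal sketch; the instance name, version, machine size and region are assumptions you have to adapt):

gcloud sql instances create my-cloudsql-instance --database-version=POSTGRES_9_6 --cpu=2 --memory=7680MiB --region=europe-west3
gcloud sql databases create DATABASE --instance=my-cloudsql-instance

Once the instance is reachable, the restore itself is: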

pg_restore -h YourNewSQLInstanceIP -n public -U postgres-user -d DATABASE -1 /mnt/gcs-bucket/db.dump

This will take even longer. Be sure to whitelist the IP of the Compute Engine instance so that it can access the SQL instance.
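
Whitelisting can be done in the console or, for example, with gcloud (a sketch; the instance name and IP address are placeholders for your own values):

gcloud sql instances patch my-cloudsql-instance --authorized-networks=203.0.113.10/32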

I did everything like you told me, but I receive weird errors.

Something like this?

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 4198; 0 0 ACL children_of(integer) postgres
pg_restore: [archiver (db)] could not execute query: ERROR: role "user" does not exist
    Command was: REVOKE ALL ON FUNCTION children_of(root_id integer) FROM PUBLIC;

Easy to answer: there are users referenced in the dump that are missing on your new database.

How to avoid this?

Easy to answer: create the users. Sadly you can't export them, because RDS restrictions make it impossible to run a pg_dumpall -g (which dumps only users and roles):

pg_dumpall -h yourRDS.cd8cncmdv7f0.eu-central-1.rds.amazonaws.com -g  -p 5432  -U postgres > /mnt/gcs-bucket/db_roles.dump

This does not work, and you will receive the error:

pg_dumpall: query failed: ERROR:  permission denied for relation pg_authid
pg_dumpall: query was: SELECT oid, rolname, rolsuper, rolinherit, rolcreaterole, rolcreatedb, rolcanlogin, rolconnlimit, rolpassword, rolvaliduntil, rolreplication, rolbypassrls, pg_catalog.shobj_description(oid, 'pg_authid') as rolcomment, rolname = current_user AS is_current_user FROM pg_authid ORDER BY 2

Because AWS RDS does not run the query as the superuser, you cannot export the roles this way. However, if you create them manually on the new instance, the restore will work fine.
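
Creating them by hand before running pg_restore can be as simple as this (a sketch; the role name is taken from the error above, and the password is a placeholder):

psql -h YourNewSQLInstanceIP -U postgres -d postgres -c "CREATE ROLE \"user\" WITH LOGIN PASSWORD 'change-me';"

Repeat for every role the restore complains about.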

till next time


Discussion

 

May I ask why you moved to Google Cloud SQL? Are there any major advantages over AWS RDS (for your use case)? Just asking out of curiosity, thanks :-)

 

First reason: in our case it is cheaper.
A db.t3.large is roughly a db-n1-standard-2, and you save $0.005/hour, and on it goes.

We got Credits @Google ;)

You might want an alternative to avoid vendor lock-in.

Multicloud is also a thing. As bigger platforms like Netflix have shown us last year, it is a bad idea to rely on one cloud.

And you can also reuse most of these steps to put your database into a Kubernetes cluster that is spread across clouds via Istio.

 

I did a similar move from AWS RDS to Google Cloud SQL (for MySQL) at my previous workplace. Although the instance cost was cheaper (~30% less), the support was awful. We had a few issues with network bandwidth, query optimizer settings, size of database reported on console vs that shown in the DB itself. This was 3 years ago, so hopefully they have improved.

3 years ago Cloud SQL was freshly out of Alpha into Beta. It sure did improve and change a lot.
Beta products in Google Cloud are always a bit rough. Don't use the official support there. Use the official GCP Slack and ask in the corresponding channel; usually there are some POs from Google around to help or to advise on whom to contact. This is way better than the support, especially for Alpha & Beta products. It's also a good way to get into Alphas.

Hey Mario could you post a link to the official GCP Slack? I can't seem to find it on the GCP support pages.

 
 

Have you looked at Bucardo as a means for seamless replication?

 
Comment marked as low quality/non-constructive by the community:

Why in all hell are you changing RDS for the poorly supported, with poor features and extensions, stuck in 9.6 and with important HA issues GoogleSQL? It is a total no-sense in practical applications.

 

First of all: next time, your first contribution upon joining a community should not be a toxic comment. That would be nice.

Also, it is not the point of the article to explain WHY you would move; sometimes you just have to.

There are multiple reasons:
price, credits, a multicloud setup, and avoiding being locked to a single vendor. Most of the post is about how you get your data out of AWS in the first place, for example to move it to a much better multicloud setup inside multiple Kubernetes clusters handled by Istio. Because single-cloud applications are, as you would put it, "a total non-sense".

For details, see the other comment above.