RDS stands for Relational Database Service.
Relational databases are organized into tables, where rows hold the data items and columns are the fields.
These tables are structured so that they relate to each other. Each type of domain data is stored in its own table.
I still remember using Visual Paradigm in a university class to design these diagrams, but you can just go to draw.io and easily draw a relational database diagram there.
Here is an example of a table:
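A minimal sketch of such tables, using Python's built-in sqlite3 module as a local stand-in for a relational engine. The table and column names (customers, orders, name, email, total) are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database as a local stand-in for a relational engine.
conn = sqlite3.connect(":memory:")

# Columns are the fields; each inserted row is one data item.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)
# A second table that relates to the first through customer_id.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [("Alice", "alice@example.com"), ("Bob", "bob@example.com")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 25.0), (1, 10.0), (2, 40.0)],
)
conn.commit()

# Follow the relationship between the two tables with a join.
orders_with_names = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
for row in orders_with_names:
    print(row)
```

Each type of data lives in its own table, and the customer_id column is what ties an order back to its customer.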
When should we use RDS databases?
Typically, RDS is used for OLTP (Online Transaction Processing) workloads.
What are the databases available with RDS?
MS SQL Server
Oracle
MySQL
PostgreSQL
MariaDB
Aurora is AWS’s own database engine for RDS. It’s compatible with MySQL and PostgreSQL, but it is designed to be more performant and its storage scales automatically.
With RDS, you can deploy across multiple availability zones, and you can also configure failover capability and automated backups.
How long does it usually take to install Oracle on brand new servers, configure a basic cluster with replication to another data center, and also configure daily backups? Usually, that would take several days to get everything up and running in production.
But with AWS RDS you can get it up and running in a matter of minutes.
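To give a feel for how little there is to configure, here is a hedged sketch that builds the keyword arguments for boto3's `create_db_instance` call. The parameter names are the ones boto3 uses, but every value (identifier, engine, instance class, username, retention) is a hypothetical choice for illustration. The actual AWS call is left as a comment so the sketch runs without credentials.

```python
def build_create_params(identifier: str, password: str) -> dict:
    """Build keyword arguments for boto3's rds.create_db_instance call."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",              # hypothetical engine choice
        "DBInstanceClass": "db.t3.micro",  # hypothetical instance size
        "AllocatedStorage": 20,            # storage in GiB
        "MasterUsername": "appadmin",      # hypothetical username
        "MasterUserPassword": password,
        "MultiAZ": True,                   # standby in another AZ
        "BackupRetentionPeriod": 7,        # days of automated backups
    }


params = build_create_params("demo-db", "change-me-please")

# With AWS credentials configured, the call would look like:
#   import boto3
#   boto3.client("rds").create_db_instance(**params)
for key in sorted(params):
    print(key)
```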
OLTP vs OLAP
OLTP processes data from transactions in real time. The focus is on large numbers of small transactions.
OLAP processes complex queries to analyze historical data. The focus is on data analysis: large amounts of data and complex queries that take a long time to complete.
For OLTP, it makes sense to store customer orders in a table like the customer table shown above in this document.
OLAP can be used, for example, for a net profit analysis of car sales over the last 3 years, across different countries, looking at costs, sales prices, and sales prices compared to the unit cost.
For OLAP in AWS, you should use something like Redshift, which is a data warehouse.
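The difference between the two workload styles can be sketched locally with Python's built-in sqlite3 module. This is only an illustration of the query shapes, not of RDS or Redshift themselves, and the car_sales table and its figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car_sales ("
    "id INTEGER PRIMARY KEY, country TEXT, year INTEGER, "
    "unit_cost REAL, sale_price REAL)"
)

# OLTP style: many small writes, each committed as its own transaction.
sales = [
    ("DE", 2021, 20000.0, 25000.0),
    ("DE", 2022, 21000.0, 27000.0),
    ("US", 2021, 18000.0, 24000.0),
    ("US", 2022, 19000.0, 23000.0),
]
for sale in sales:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO car_sales (country, year, unit_cost, sale_price) "
            "VALUES (?, ?, ?, ?)",
            sale,
        )

# OLAP style: one query that scans and aggregates the whole history.
net_profit_by_country = conn.execute(
    "SELECT country, SUM(sale_price - unit_cost) FROM car_sales "
    "GROUP BY country ORDER BY country"
).fetchall()
print(net_profit_by_country)
```

The inserts are short and touch one row each, while the analysis query reads every row, which is why the two workloads are usually served by different systems.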
RDS Multi-AZ and Read Replicas
What is Multi-AZ?
With Multi-AZ, RDS keeps an exact copy of your production database in another availability zone.
Imagine you’re running an application on some EC2 instances behind an Elastic Load Balancer, and they are storing their data on an RDS database with Multi-AZ enabled.
With Multi-AZ we have a primary RDS instance, and in this example our primary is located in one specific AZ, let’s say us-east-1a.
And we have a standby RDS instance, located in a different availability zone.
And in this case, our standby is located in us-east-1b.
RDS will replicate the data from the primary instance to the standby.
Now under normal circumstances with everything operating as expected, the standby RDS instance is not visible or accessible to the application servers.
But if something goes wrong with our primary database instance, whether it’s a hardware issue or even a problem with the entire availability zone, we still have another database instance in the standby location. And RDS will automatically fail over to the standby database instance.
And AWS handles all the replication between primary and secondary, so you don’t have to configure anything yourself.
And when you write to your production database, this write is automatically replicated synchronously to the standby database.
All types of RDS databases can be configured as Multi-AZ.
The main purpose of Multi-AZ is to provide resilience and keep your application up and running.
During planned maintenance on your primary RDS instance, or in the event of an unplanned failure, RDS will automatically fail over to the standby so that database operations can resume quickly, without any administrative intervention.
Your application can keep on running. And it’s really important to understand that Multi-AZ is for disaster recovery.
It’s for DR and it is not for scaling out or improving performance. That means you cannot have your database clients or your application servers connecting to both the primary and the standby simultaneously.
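The failover behavior described above can be sketched as a toy model in plain Python. This is not how RDS is implemented internally. It only illustrates that clients see a single endpoint and that the endpoint moves to the standby when the primary fails. The class names, instance names, and AZ names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    az: str
    healthy: bool = True


class MultiAZDeployment:
    """Toy model: one endpoint that always resolves to the current primary."""

    def __init__(self, primary: Instance, standby: Instance) -> None:
        self.primary = primary
        self.standby = standby

    def endpoint_target(self) -> Instance:
        # Clients only ever reach the primary; the standby is not addressable.
        return self.primary

    def failover_if_needed(self) -> bool:
        """Swap roles when the primary is unhealthy; return True if swapped."""
        if self.primary.healthy:
            return False
        self.primary, self.standby = self.standby, self.primary
        return True


db = MultiAZDeployment(
    Instance("db-a", "us-east-1a"), Instance("db-b", "us-east-1b")
)
before = db.endpoint_target().az
db.primary.healthy = False  # simulate a hardware or AZ failure
failed_over = db.failover_if_needed()
after = db.endpoint_target().az
print(before, "->", after)
```

Notice there is no method for reading from the standby, which mirrors the point that Multi-AZ adds resilience but no extra read capacity.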
And you might be thinking if Multi-AZ can’t be used for improving performance, then what can I use to improve performance?
And that’s a good question because the main thing you can do to improve performance and particularly read performance is to add Read Replicas.
What is a Read Replica?
A Read Replica is a read-only copy of your primary database.
So imagine you’ve got a couple of application servers and they are reading and writing data to an RDS instance.
And you’ve also got a business intelligence application, and this application needs to access the same data, but it only needs to read the data.
So maybe your team needs to run reports and forecasts using the data but they don’t need to write to the database, they only need read access.
Well, this is a really good use case for Read Replicas. You can add a Read Replica, which is a read-only copy, and it will allow the sales team to run all their reports without using up capacity on the primary database and without impacting the customer-facing application in any way. So this is great for read-heavy workloads because it takes the read load off your primary database.
A Read Replica can be located in the same availability zone as your primary database.
It can also be cross-AZ, so located in a completely different availability zone.
Or it can even be cross-region and located in a completely different region.
And each Read Replica has its own DNS endpoint, which is different and independent from the primary database. So we have one endpoint for the Read Replica and one for the primary database.
And Read Replicas can even be promoted to become their own independent databases.
However, if we do that, it’s going to break the replication from the original database, but it will give us two completely independent databases, both allowing read and write access.
So with Read Replicas it’s important to understand that these are used for scaling read performance. They’re primarily used for scaling and not for disaster recovery.
And in order to configure a Read Replica, you will need to have automatic backups enabled.
Automatic backups are of course enabled by default, but if for some reason you’ve disabled backups, then you won’t be able to deploy a Read Replica.
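Because each Read Replica has its own endpoint, it is the application that decides where each query goes. Here is a minimal sketch of that routing in plain Python. The hostnames are placeholders (real endpoints are assigned by RDS), and the "starts with SELECT" check is a deliberately naive assumption for illustration.

```python
import itertools

# Placeholder hostnames; real endpoints are assigned by RDS.
PRIMARY = "primary.db.example.internal"
REPLICAS = [
    "replica-1.db.example.internal",
    "replica-2.db.example.internal",
]
_replica_cycle = itertools.cycle(REPLICAS)


def endpoint_for(sql: str) -> str:
    """Send SELECTs to a replica (round robin); everything else to primary."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return next(_replica_cycle) if is_read else PRIMARY


read_target = endpoint_for("SELECT * FROM orders")
write_target = endpoint_for("INSERT INTO orders VALUES (1)")
print("read  ->", read_target)
print("write ->", write_target)
```

Reporting queries land on the replicas while writes keep going to the primary, which is how the read load comes off the primary database.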
RDS Backups and Snapshots
What are Database Snapshots?
A database snapshot is a point-in-time copy of the storage volume attached to your database instance. With automated backups, RDS creates daily backups, or snapshots, which run during a backup window that you define.
And in addition to this daily backup, it also generates transaction logs, which are used to replay transactions when you come to restore the database.
Automated backups give you the ability to perform a point-in-time recovery and recover your database to any point in time within a retention period of one to 35 days.
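The idea behind point-in-time recovery (start from the daily snapshot, then replay transaction logs up to the chosen moment) can be sketched as a toy model. This illustrates the concept only, not RDS internals, and the balance values, timestamps, and function name are made up.

```python
from datetime import datetime

# Daily snapshot: the full state at the time the snapshot was taken.
snapshot_time = datetime(2024, 1, 1, 3, 0)
snapshot_state = {"balance": 100}

# Transaction log: (commit time, change to balance), in commit order.
transaction_log = [
    (datetime(2024, 1, 1, 9, 0), +50),
    (datetime(2024, 1, 1, 12, 0), -30),
    (datetime(2024, 1, 1, 15, 0), +10),
]


def restore_to(target: datetime) -> dict:
    """Start from the snapshot, then replay logged changes up to target."""
    if target < snapshot_time:
        raise ValueError("target is earlier than the snapshot")
    state = dict(snapshot_state)
    for committed_at, delta in transaction_log:
        if committed_at > target:
            break  # later transactions are not part of the restored state
        state["balance"] += delta
    return state


restored = restore_to(datetime(2024, 1, 1, 13, 0))
print(restored)
```

Restoring to 13:00 applies the 09:00 and 12:00 transactions but not the 15:00 one.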
How does it work?
For example, you can define a backup window between 2:00 AM and 6:00 AM. RDS will automatically take a daily snapshot during that backup window, and the snapshots are stored in S3. It also creates transaction logs.
What about snapshots?
Well, these are not automated. Database snapshots are taken manually and are user-initiated.
Now when you come to restore an RDS database, whether you restore from an automated backup or from a manual snapshot, the restored version of the database will always be a completely new RDS instance with a new DNS endpoint.
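A small sketch of what that means in practice: a restore request always names a new target instance. The parameter names below are the ones boto3's `restore_db_instance_to_point_in_time` accepts, while the helper function and the identifiers are hypothetical. The AWS call itself is left as a comment so the sketch runs without credentials.

```python
def build_restore_params(source_id: str, target_id: str) -> dict:
    """Keyword arguments for rds.restore_db_instance_to_point_in_time."""
    if source_id == target_id:
        # A restore never overwrites the source; it creates a new instance.
        raise ValueError("target must be a new instance identifier")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "UseLatestRestorableTime": True,
    }


restore_params = build_restore_params("demo-db", "demo-db-restored")

# With AWS credentials configured, the call would look like:
#   import boto3
#   boto3.client("rds").restore_db_instance_to_point_in_time(**restore_params)
print(restore_params)
```

Since the restored instance gets a new DNS endpoint, the application has to be pointed at it after the restore.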
Encryption and Database Snapshots
You can enable encryption with RDS, and you can enable it at creation time by just selecting the encryption option in the console.
And RDS is completely integrated with KMS. So encryption is done using KMS, and it uses the industry-standard AES-256 encryption.
Now when we enable encryption on an RDS database, then RDS is going to encrypt all of the database storage.
So that includes all of the underlying storage associated with your RDS database, including any automated backups, any manual snapshots, any logs, and read replicas as well.
Encryption can only be enabled when you first create the database. You cannot enable encryption later on an existing unencrypted RDS database instance.
So what do you do if you have an existing unencrypted database and you are suddenly asked to encrypt that data?
If you’ve already created an unencrypted database and you need to encrypt that data, you can take a snapshot, and that snapshot will also be unencrypted.
Then you can copy that unencrypted snapshot with encryption enabled to create an encrypted snapshot, and perform a database restore using that encrypted snapshot.
And in that way, you will get an encrypted database.
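The three steps above can be sketched as an ordered plan of boto3 RDS calls. The method and parameter names are the ones boto3 uses, but the helper function, the identifiers, and the KMS key alias are hypothetical. The plan is only printed, not executed, so the sketch runs without credentials.

```python
def encryption_plan(db_id: str, kms_key_id: str) -> list:
    """Return (boto3 RDS method name, kwargs) pairs in the order to run them."""
    plain_snapshot = f"{db_id}-plain"
    encrypted_snapshot = f"{db_id}-encrypted"
    return [
        # 1. Snapshot the unencrypted database (snapshot is unencrypted too).
        ("create_db_snapshot", {
            "DBInstanceIdentifier": db_id,
            "DBSnapshotIdentifier": plain_snapshot,
        }),
        # 2. Copy the snapshot, supplying a KMS key so the copy is encrypted.
        ("copy_db_snapshot", {
            "SourceDBSnapshotIdentifier": plain_snapshot,
            "TargetDBSnapshotIdentifier": encrypted_snapshot,
            "KmsKeyId": kms_key_id,
        }),
        # 3. Restore a new, encrypted instance from the encrypted copy.
        ("restore_db_instance_from_db_snapshot", {
            "DBInstanceIdentifier": f"{db_id}-enc",
            "DBSnapshotIdentifier": encrypted_snapshot,
        }),
    ]


plan = encryption_plan("legacy-db", "alias/my-rds-key")
for method, kwargs in plan:
    print(method, kwargs)

# With credentials, each step would run as
#   getattr(boto3.client("rds"), method)(**kwargs)
# after waiting for the previous snapshot to become available.
```

The result is a new instance with a new endpoint, so the application needs to be switched over to it once the restore finishes.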
Let me know your thoughts on AWS RDS!