In this post, I'm not trying to explain every point in detail, but rather giving an overview with a brief description of some important aspects that we need to keep in mind while working with databases.
Data are structured using tables with columns and rows. The relationship between tables is predefined. SQL is used to query the data in the database.
E.g. PostgreSQL, MySQL, MariaDB, SQLite.
Data are not organized in tables but rather have a more flexible structure. These databases are designed for easy scaling horizontally and ease of use. There are some main types of Non-Relational databases:
Document databases store each item in a JSON, BSON, or XML encoded object called a document.
E.g. MongoDB, CouchDB,
Each item has a key and a value. E.g. name=John, last_name=Doe.
E.g., Redis, Memcached
Data is stored in columns. It is normally used when our data items have a huge amount of properties (each property is stored in one column), and we just want to query some of the properties to avoid unnecessary data.
E.g., Cassandra, HBase
Data is held in nodes and edges. The main focus of this type of database is the connection between nodes (using edges).
E.g., Neo4j, Amazon Neptune
Atomic, Consistent, Isolated, Durable
Basic Availability, Soft-state, Eventual consistent
The data has its copies on multiple interconnected machines. These machines can be in different locations. To reduce access latency, increase availability and read throughput. Some typical database replication techniques are:
New data is written on only one single machine called Leader. The data will then be replicated to other machines. These machines are called Replicas. Reading data can be done on both Leader and Replicas.
Same as Single-Leader, but there are multiple Leaders. Each data center has one Leader. Data written to one Leader will be replicated to the other Leaders.
There is no Leader. The Replicas receive read and write requests directly.
We use transactions when we want to execute a set of operations that can either be successful altogether or not at all.
An index is a data structure that helps read data faster with a charge of slower write operations and more space needed.
Database backup is a way to make a copy of the data. In case of a disaster that causes data loss, we can use this copy to restore our important data.
Only backup what was changed since the last full backup
Backup what was changed since the last incremental backup
Our software often requires changes in the code and the database schema. When we deploy new code to production, the database should be updated as well. This process is called database migration. We have to make sure that during the migration progress, the database can always be rolled back to its original state if something wrong happens.
Cloud providers like Amazon or Google will manage everything (provisioning, scaling, maintenance, etc.) for us. We just need to pay for what we use (pay-per-use).