At WebSummit 2019 I attended a really nice workshop from AWS called "Choosing the right database for your application", by Steven Byer. I'll try to explain the things I found more interesting and add some more information that I personally wanted to understand better.
The volume of data we are handling today is higher than ever before, which makes maintaining databases more challenging than ever before. The challenge begins when choosing what type of database you are going to use in your application.
In the past, the answer would be really clear and simple: a relational database will solve it. Nowadays, even though a relational database will still possibly solve your problem, we have other options that might suit our use case better and make our lives easier. We just need to know the options and, of course, really understand the needs of our use case.
So, what is there in the modern world of databases? The options I will cover here are the same ones Steven Byer mentioned in his talk:
Let’s start then.
Examples: Mysql and Oracle
Relational databases are structured as tables that contain data, and these tables relate to each other using keys that can identify their connection, for example an ID. Relational databases are easy to visualize, maybe because we are so used to them that it is just automatically understandable when you see an image like this:
In the example above we have database for musical albums. An album has a name, a release date, an artist and a genre, but the artist’s and genre’s detailed information belong in their own tables.
First because their data actually belongs to their own entity, and not to the album itself.
Second, we want to normalize our database, so we can connect the same artist and genre to multiple albums without repeating data.
We create this relation by giving them an identifier to every artist and genre, and in our album, we store the identifiers, respectively. Later on we can connect all the pieces we want by querying it with the help of a SQL statement.
Relational databases are consistent, the schema you pre-define must be respected in order to store your data in it, and the data you get out of it is guaranteed to be in the norm of your schema; Also, all major relational databases adhere to the ACID principles and guarantee referential integrity.
Some use cases: ERPs and CRMs are good real life examples where using a relational database is the perfect fit. Applications where you have a highly structured data schema that needs to be respected; if you need to perform actions when the data is manipulated (through triggers); when consistency is at the top high priority.
Examples: DynamoDB, Cassandra and Redis
Opposed to relational databases, where we need to pre-define the data structure, key-value databases (also called key-value stores) treat data as a single collection. It’s a simple key-value method, where a key, that can be anything (but must be unique), links to a value, that can also be anything. Each record may contain different fields. There is no schema to abide by, because the schema is defined by the item being inserted. A unique key linking to a particular set of data. That’s all.
The example above is provided by AWS in their section about key-value stores. As you can see, each item contains a different type of data: books, albums, movies, all in the same store, and each type of data, doesn’t necessarily contain the same structure.
The simplicity of a key-value store is one of the advantages of using it, but there’s more: key-value databases are really easy to partition, therefore it’s easier to horizontally scale them.
Some use cases: Shopping carts, product catalogs, customer preferences applications. Applications where you have flexible data with no complex relationships and need to be able to easily scale horizontally.
Examples: MongoDB, Couchbase, DynamoDB again and Redis (Enterprise with Redis Modules)
Document databases are an extension from key-value stores. In these databases you still give the data a unique key, but the values are stored in a structured format that the database can understand, usually a JSON-like document (hence the name). The values are structured in a defined format, but storing it is the same as in a key-value database: no schema, the data can be anything, as long as it respects the format.
In a key-value store, the database doesn’t know anything about the data, just stores it in whatever way you insert it. You can even store multiple types of data, and the database will treat everything the same way. In a document database, the database can read and understand what is stored, therefore, it gives you some extra powers when querying the values. Aside from the key, you can also define other indexes and write more complex queries than you would in a simple key-value store.
Some use cases: CMS applications, catalogs and since document databases are an extension from key-value databases, the examples listed in the previous section also apply here.
Examples: Neptune, Neo4j and Dgraph
A graph database organizes data in nodes and edges. The nodes represent the entities and the edges represent the relationship between nodes.
In a relational database, the relationship between tables are only that, a link between records. But in a graph database, the relationship (edges) tells us more than just the link. The edges have metadata that explains better what type of relationship a record (node) has with the other.
Graph databases are also schemaless, as key-value and document databases, giving us the same flexibility mentioned before.
In graph databases it’s much easier (and faster) to find specific connections between entities, since these connections are the key concept in this type of database. See the example:
In the example above you can see a really simple social network, you see the users, which are the nodes and the edges connecting them. Each edge has a defined type of connection set as metadata, which then can be used to query the nodes. If we want to, for example, find possible friends in common from Angela and Anthony, we could find all Anthony’s friends who are not Angela and who Angela is still not friends with, which would show us Debby.
Some use cases: recommendation engines, social networking and fraud detection. All this is possible in a simple relational database, but a graph database is optimised for when the connections are the most important part of the data.
Examples: AWS Timestream and Graphite
Time-series databases are optimised for storing data collected sequentially over time. A timestamp is the primary axis in this type of databases, and they are designed to store and process huge volumes of data.
Some use cases: Monitoring applications, live stock information, weather forecast. Whenever time is the main key.
Examples: AWS Quantum Ledger Database (QLDB)
QLDB is designed specially for blockchain applications, but there are many use cases in which we need to store historical or audit data. AWS created QLDB with the idea of solving the problems you might encounter when implementing these applications.
In QLDB a ledger is a database that contains a journal, a current state and the indexed history tables.
The journal is where each event that happens to an entity is appended as a record in an immutable, cryptographically verifiable block in the journal. On top of the journal containing all the transactions, there is the current state table, which contains what the name already tells us: the latest state of the entity, based on the sequence of changes that happened to it, up to now. And last but not least, the indexed history table, which contains the sequential state changes, stored as a history.
QLDB is much more than just saving historical data (and much more than I can explain), if you want to know all the details, I recommend watching this talk from Chris de Kadt and Andrew Certain at AWS re:invent 2018.
Some use cases: Blockchain, event sourcing applications, basically everywhere you need to keep history and/or this history needs to be verifiable.
Key-value, document, graph, time-series and ledger are types of noSQL databases, meaning that they are non-relational. They don’t store data in a tabular traditional format. NoSQL databases exist since the 60s,but they only got the fancy name a while ago, and they are getting more popular everyday.
There was a time where going for a traditional relational database was the obvious choice. Today, in the cloud era, even though there is absolutely nothing wrong with going for a relational database, you do have other options that might suit you better, and are not as hard and expensive to implement as it was a couple of years ago.
All databases have their advantages and disadvantages, and the answer will always depend on your use case and your priorities.