There are hundreds of SQL and NoSQL databases. Some are popular, others are ignored. Some are simple and well documented, and some are difficult to use. Some are open source, while others are proprietary. Perhaps most importantly, some are scalable, optimized, highly available, and some are difficult to scale or maintain.
A natural question arises: which database to choose from? To answer it, we need to decide what we want to achieve with the database. To form an idea, you need to answer the following questions::
- Do we need analytical access to the database?
- Do we need to write or read in real-time?
- How many tables/records do we want to keep?
- What kind of accessibility do we need?
- Do we need columns?
Will we be able to access tables filtered by columns or by rows?
When making a decision, you need to remember what a particular database can offer. The specifics of each database may differ, but in general, there are only a few types within which we can achieve basically the same goals. Let's look at them in more detail.
Relational SQL databases
If you've ever worked with databases, most likely you started with this type, because it's the most popular and common. Such databases allow you to store data in relational tables with certain columns of a certain type. Relational tables are good for normalization and merging.
Advantages:
- SQL support
- ACID transactions (atomicity, consistency, isolation, and durability)
- Indexing and partitioning support
Disadvantages:
- Poor support for unstructured data / complex types
- Poor event handling optimization
- Complex / expensive scaling
Examples: Oracle DB, MySQL, PostgreSQL.
Document-oriented databases
If we don't want to combine multiple tables to get the data we need, we can look at document-oriented databases. They allow you to store records in JSON format. In this format, we can create a complex value for any key and immediately include the entire data structure in a single record.
Advantages:
- The scheme is free of charge
- There is no need to always write all the fields in each record
- Good support for complex types
- Suitable for OLTP
Disadvantages:
- Poor transaction support
- Weak analytical support
- Complex / expensive scaling
Examples: MongoDB.
In-memory databases
Databases of this type can provide a real-time response for selecting and inserting specific records. Most of them mostly store data in RAM, but in some cases, they also offer persistent storage on hard drives or solid-state drives. Most of these databases work with key-value records, so the values can be stored in a document-oriented format. But some databases also work with columns and allow secondary indexing of the same table. Using RAM allows you to process data quickly, but makes it more unstable and expensive.
Advantages:
- Quick Writing
- Quick Read
Disadvantages:
- High reliability
- Expensive scaling
Examples: Redis, Tarantool, Apache Ignite.
Databases with wide columns
These databases store data as key/value records on a hard disk or solid-state drive. These solutions are designed to scale well enough to manage petabytes of data on thousands of shared servers in a distributed system. They represent Stable architecture. This architecture was designed for two use cases: fast key access and fast write with high availability.
Advantages:
- Quick entry line by line
- Quick Key Reading
- Good scalability
- High availability
Disadvantages:
- Key-value format»
- No analytics support
Examples: Cassandra, HBase.
Columnar databases
Sometimes we need to quickly access data not with specific keys, but with specific columns. In this case, it is better to abandon the line-by-line insertion and switch to the batch recording. Batch insertion allows columnar databases to prepare data for fast column-by-column reads.
Advantages:
- Quick read column by column
- Good analytical support
- Good scalability
Disadvantages:
- Only suitable for batch inserts
Examples: Vertica, Clickhouse.
Search Engine
If we want to access the data by filtering by any value and even by any word in the column, we need to remember the search engines. These databases index each word in columns and allow a full-text search. They are ideal for storing and analyzing logs or large text values.
Advantages:
- Quick access by any word
- Good scalability
Disadvantages:
- Only suitable for batch inserts
- Poor analytical support
Examples: Elastic.
Graph databases
For some cases, graph data structures are suitable. If your tasks require working with graphs, there are special databases that will meet your needs.
Advantages:
- Graph data structure
- Managed relationships between entities
- Flexible designs
Disadvantages:
- Special query language
- Difficult to scale
Examples: Neo4j.
Conclusion
Almost any task can be performed with almost any type of database. The question is how expensive and optimized it will be. Choosing a tool that you are used to can reduce your time to market. But it can also cost you a huge amount of money to maintain and expand your equipment, which can be used inefficiently. Always try to use the database as intended. Perhaps a solution that meets your needs already exists.
Top comments (7)
Almost everything listed in this article can actually be achieved within MariaDB, the modern fork of MySQL. It has native JSON support, it support ColumnStore as a storage engine now, it can do recursive (graph) querying, has the MEMORY storage engine, and also supports various methods to scale out read/write operations, including sharding across multiple nodes.
Do you recommend MariaDB over Postgres?
I don't have an answer to that question. I'm sure tons of people will want to swoop in with the "I use X therefor I recommend it" answer.
But I'll be 100% honest. I have a crapload of experience with MariaDB, and almost zero experience with PostgreSQL. I absolutely love MariaDB, and I have friends that absolutely love PostreSQL. We actively acknowledge the advantages and disadvantages of each, and how those have narrowed over the years.
In the end, don't pick an application based on popularity or if other people like it or use it. Figure out what your problem domain is, and then pick the tool that best matches your problem domain.
One missed great search database is
Meilisearch
. Its an alternative to AlgoliaThere are also distributed SQL databases like YugabyteDB.
Did you take inspiration from Fireship's 7 Database Paradigms? :)
Nope :)