What is NoSQL?
NoSQL is a general term for databases that usually don't support SQL (Structured Query Language) because they store data differently than relational databases do. The term became popular in the 2000s, when big companies like Google, Amazon, and Facebook needed databases that could meet their scalability demands. Relational databases, by comparison, have been around since the 1970s. Because the term is so general, NoSQL can refer to many types of databases, categorized by how they store their data. The most popular types are document, key-value, wide-column, and graph stores. Document stores seem to be by far the most popular, and many of the tutorials you might find on this platform use one as a general-purpose database.
Document Stores
Document stores make use of unique documents, usually a type of JSON object with fields and values. These documents support many of the primitive data types we're used to from our programming languages, making it easy to get up and running with your language of choice. Instead of normalizing data and building relations, documents let you store all the information related to an entity in a single document.
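To make the point about data types concrete, here is a minimal sketch that parses a JSON document with Python's standard json module and prints the native type each field maps to. The document itself is made up for illustration (the age and newsletter fields in particular are assumptions), and no real database is involved.

```python
import json

# A made-up JSON document; the fields are illustrative only.
raw = """
{
  "_id": 1,
  "name": {"first": "Alex", "last": "Shelley"},
  "age": 30,
  "newsletter": true,
  "hobbies": ["Swimming", "Music"],
  "nickname": null
}
"""

doc = json.loads(raw)

# Map every top-level field to the name of the Python type it was parsed into.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
# {'_id': 'int', 'name': 'dict', 'age': 'int', 'newsletter': 'bool', 'hobbies': 'list', 'nickname': 'NoneType'}
```

Nested objects become dictionaries and arrays become lists, so the document can be used directly without a mapping layer.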
Relational Databases vs. Document Stores
Relational
Suppose we want to build an application that has users, and those users can track their hobbies. Using a relational database, we would define the structure of our two basic entities to the level of normalization (the act of breaking up tables to reduce redundancy and increase data integrity) we like. A basic structure for our users and hobbies looks like this:
Users Table
id | first_name | last_name | email | hashed_password |
---|---|---|---|---|
1 | Alex | Shelley | s.alex@mail.com | 7Fy5vQhehSBAuJaKBJLC |
2 | Manuela | Avery | mavery@cool.org | GFUtPYL3vyKZAQqJeVmA |
Hobbies Table
id | hobby |
---|---|
1 | Swimming |
2 | Music |
We can build a relationship between users and hobbies with a third table, which we'll call user_hobbies:
user_hobbies Table
id | user_id | hobby_id |
---|---|---|
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
We can interpret it to mean that user 1 both swims and enjoys music, while user 2 only recorded enjoying swimming. A query with a SQL operator like JOIN allows us to combine the users and hobbies tables using the IDs listed in the user_hobbies table. This table structure allows us to assign any number of hobbies to any user. We can also add users or hobbies individually without making new connections between them. Relational databases also allow us to alter the values and columns of a table.
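As a rough illustration of the JOIN described above, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names follow the tables in this post. The column types, the choice of SQLite, and the hobbies_by_user helper are assumptions made for the example.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT,
        hashed_password TEXT
    );
    CREATE TABLE hobbies (
        id INTEGER PRIMARY KEY,
        hobby TEXT
    );
    CREATE TABLE user_hobbies (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        hobby_id INTEGER REFERENCES hobbies(id)
    );
    INSERT INTO users VALUES
        (1, 'Alex', 'Shelley', 's.alex@mail.com', '7Fy5vQhehSBAuJaKBJLC'),
        (2, 'Manuela', 'Avery', 'mavery@cool.org', 'GFUtPYL3vyKZAQqJeVmA');
    INSERT INTO hobbies VALUES (1, 'Swimming'), (2, 'Music');
    INSERT INTO user_hobbies VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")


def hobbies_by_user(connection):
    """Resolve each user_hobbies row into a (first_name, hobby) pair."""
    return connection.execute("""
        SELECT u.first_name, h.hobby
        FROM user_hobbies AS uh
        JOIN users AS u ON u.id = uh.user_id
        JOIN hobbies AS h ON h.id = uh.hobby_id
        ORDER BY uh.id
    """).fetchall()


rows = hobbies_by_user(conn)
for first_name, hobby in rows:
    print(first_name, hobby)
# Alex Swimming
# Alex Music
# Manuela Swimming
```

Each row in user_hobbies is only a pair of IDs. The two JOINs look up the matching user and hobby rows to turn those IDs back into readable values.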
Document-Oriented
To capture the same data in a document-oriented database, we can store our user information and their hobbies in the same document.
{
"_id": 1,
"name": {
"first": "Alex",
"last": "Shelley"
},
"email": "s.alex@mail.com",
"hashed_password": "7Fy5vQhehSBAuJaKBJLC",
"hobbies": [{ "hobby": "Swimming" }, { "hobby": "Music" }]
}
{
"_id": 2,
"name": {
"first": "Manuela",
"last": "Avery"
},
"email": "mavery@cool.org",
"hashed_password": "GFUtPYL3vyKZAQqJeVmA",
"hobbies": [{ "hobby": "Swimming" }]
}
This way of storing data allows us to be more flexible with our data. For example, we can change the structure of user 2 without changing the structure of user 1.
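Here is a minimal sketch of that flexibility, using a plain Python dictionary as a stand-in for a collection of documents. This is not a real document-store API, and the favorite_color field is a made-up addition used only to show that one document can change shape on its own.

```python
# A plain dict keyed by _id stands in for a document collection.
collection = {
    1: {
        "_id": 1,
        "name": {"first": "Alex", "last": "Shelley"},
        "email": "s.alex@mail.com",
        "hashed_password": "7Fy5vQhehSBAuJaKBJLC",
        "hobbies": [{"hobby": "Swimming"}, {"hobby": "Music"}],
    },
    2: {
        "_id": 2,
        "name": {"first": "Manuela", "last": "Avery"},
        "email": "mavery@cool.org",
        "hashed_password": "GFUtPYL3vyKZAQqJeVmA",
        "hobbies": [{"hobby": "Swimming"}],
    },
}

# Add a hypothetical field to user 2 only; user 1 is left untouched.
collection[2]["favorite_color"] = "green"

# Fields that exist on user 2 but not on user 1.
new_fields = set(collection[2]) - set(collection[1])
print(new_fields)
# {'favorite_color'}
```

In a relational table, the equivalent change would mean adding a column that every row then carries, whether or not it has a value for it.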
Scalability
Scaling a relational database is usually done with a combination of three techniques: functional partitioning, data partitioning, and replication. Briefly, functional partitioning means delegating different data to different databases based on purpose or function, with an emphasis on grouping together data that has similar read or write throughput needs. For example, we might store user data in one database and videos in another, which allows each service to be scaled independently. Within each service, data partitioning divides a body of data among different machines so that no single server holds all the information and operations can be shared among them. Lastly, replication means keeping copies of the same data on multiple servers. All writes go through a primary server, and secondary servers replicate from the primary and serve read-only operations. All of these methods present significant challenges, such as replication lag and cross-partition queries. Luckily, when the situation calls for it, we have alternatives.
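The three techniques above can be sketched as simple routing decisions. This is a toy illustration under stated assumptions, not how any particular database implements them: all server names are hypothetical, the hash-modulo scheme is just one common way to partition data, and the round-robin read policy is an assumption.

```python
import hashlib
from itertools import cycle

# Functional partitioning: each kind of data lives in its own database.
# The cluster names are hypothetical.
DATABASE_FOR = {"users": "users-cluster", "videos": "videos-cluster"}

# Data partitioning: the users data is split across several servers.
SHARDS = ["users-db-0", "users-db-1", "users-db-2"]


def shard_for(user_id, shards=SHARDS):
    """Pick a shard for a key with a stable hash, so a key always maps to the same server."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


class ReplicatedShard:
    """Replication: writes go to the primary, reads rotate over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas)

    def route(self, operation):
        if operation == "write":
            return self.primary
        return next(self._replicas)


shard = ReplicatedShard("primary", ["replica-a", "replica-b"])
print(DATABASE_FOR["videos"])                  # videos-cluster
print(shard_for(1) == shard_for(1))            # True
print(shard.route("write"))                    # primary
print(shard.route("read"), shard.route("read"))  # replica-a replica-b
```

Even in this toy version the challenges show up. A query that needs users from several shards has to visit each one, and a read sent to a replica can return stale data until the replica catches up with the primary.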
NoSQL databases like document-oriented ones were designed to solve the problems that arise when we try to scale a relational database. They were built with horizontal scalability in mind, the holy grail of scalability. Typically, the more servers we add, the more our throughput increases, without many of the complications mentioned above. No choice is without trade-offs, though: even truly horizontally scalable databases can still struggle with particular workloads. As always, it is about choosing the right tool for the job. Next time you find yourself wondering which one to use, think about the workload you expect for your database and see how the different types may help you. It is okay, and common, to use multiple types of databases for the same product, so there is no need to try to make one particular database a magical tool for all situations.
Thanks for reading! I hope to soon write about other types of NoSQL databases and familiarize myself with what's out there.
Cover photo by Pixabay on Pexels.
Latest comments (2)
NoSQL is poorly named, it has little to do with SQL but more to do with relational databases. There are NoSQL systems which use structured query languages, some even quite similar to SQL.
Additionally NoSQL databases existed long before 2000. It's just that this term of non-relational databases was introduced and became popular. For example graph databases predate even relational databases.
Loved this explanation. Graph databases would be a good topic that I would love to see.