Enmanuel de la Nuez

Posted on Jul 9, 2020

NoSQL Document Stores

#mongodb #sql #webdev #database

What is NoSQL?

NoSQL is a general term that encompasses databases that usually don't support the use of SQL (Structured Query Language) because they store data differently. NoSQL databases came about in the 2000s to meet the scalability demands of big companies like Google, Amazon, and Facebook. Traditional relational databases, by comparison, have been around since the 1970s. Because the term is so general, it covers several types of database, distinguished by how they store their data. The most popular types are document, key-value, wide-column, and graph stores. Document stores seem to be the most popular of these, and they show up as the general-purpose database in many of the tutorials you might find on this platform.

Document Stores

Document stores keep data in individual documents, usually a type of JSON object with fields and values. These documents support many of the primitive data types we're used to from our programming languages, making it easy to get up and running with your language of choice. Instead of normalizing data and building relations between tables, you can store all the information related to an entity in a single document.
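As a small illustration of how JSON field values line up with a language's own types, here is a sketch in Python using only the standard library. The sample document and its fields (age, rating, active, nickname) are made up for the example, and a real document store may add types of its own on top of plain JSON.

```python
import json

# Hypothetical JSON document; the field names and values are illustrative only.
raw = (
    '{"name": "Alex", "age": 30, "rating": 4.5, "active": true, '
    '"nickname": null, "hobbies": ["Swimming", "Music"]}'
)

# json.loads converts JSON values into the matching built-in Python types.
document = json.loads(raw)

# Record the Python type name that each field ended up with.
field_types = {field: type(value).__name__ for field, value in document.items()}

for field, type_name in field_types.items():
    print(f"{field}: {type_name}")
```

Strings, numbers, booleans, nulls, and arrays all arrive as ordinary Python values, so no mapping layer is needed before working with the document.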

Relational Databases vs. Document Stores

Relational

Suppose we want to build an application that has users, and those users can track their hobbies. Using a relational database, we would define the structure of our two basic entities to whatever level of normalization (the act of breaking up tables to reduce redundancy and increase data integrity) we like. A basic structure for our users and hobbies looks like this:

Users Table

id	first_name	last_name	email	hashed_password
1	Alex	Shelley	s.alex@mail.com	7Fy5vQhehSBAuJaKBJLC
2	Manuela	Avery	mavery@cool.org	GFUtPYL3vyKZAQqJeVmA

Hobbies Table

id	hobby
1	Swimming
2	Music

We can build a relationship between users and hobbies with another table, let's call it user_hobbies:

user_hobbies

id	user_id	hobby_id
1	1	1
2	1	2
3	2	1

We can read this table to mean that user 1 both swims and enjoys music, while user 2 has only recorded swimming. A query with a SQL JOIN lets us look up rows in the users and hobbies tables by the IDs listed in the user_hobbies table. This table structure allows us to assign any number of hobbies to any of the users. We can also add users or hobbies individually without making new connections between them. Relational databases also allow us to alter any of the values and columns on a table.
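To make the JOIN concrete, here is a minimal sketch that rebuilds the three tables and joins through user_hobbies. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for whatever relational database you use. The column types and the hobbies_by_user helper are choices made for this example, not part of the original schema.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in
# for any relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT,
        hashed_password TEXT
    );
    CREATE TABLE hobbies (
        id INTEGER PRIMARY KEY,
        hobby TEXT
    );
    CREATE TABLE user_hobbies (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        hobby_id INTEGER REFERENCES hobbies(id)
    );
    INSERT INTO users VALUES
        (1, 'Alex', 'Shelley', 's.alex@mail.com', '7Fy5vQhehSBAuJaKBJLC'),
        (2, 'Manuela', 'Avery', 'mavery@cool.org', 'GFUtPYL3vyKZAQqJeVmA');
    INSERT INTO hobbies VALUES (1, 'Swimming'), (2, 'Music');
    INSERT INTO user_hobbies VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")


def hobbies_by_user(connection):
    """Return (first_name, hobby) pairs by joining through user_hobbies."""
    query = """
        SELECT u.first_name, h.hobby
        FROM user_hobbies AS uh
        JOIN users AS u ON u.id = uh.user_id
        JOIN hobbies AS h ON h.id = uh.hobby_id
        ORDER BY u.id, h.id
    """
    return connection.execute(query).fetchall()


rows = hobbies_by_user(conn)
for first_name, hobby in rows:
    print(first_name, hobby)
```

Each row of user_hobbies becomes one line of output, with the two IDs replaced by the first name and hobby they point to.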

Document-Oriented

To capture the same data in a document-oriented database, we can store our user information and their hobbies in the same document.

{
  "_id": 1,
  "name": {
    "first": "Alex",
    "last": "Shelley"
  },
  "email": "s.alex@mail.com",
  "hashed_password": "7Fy5vQhehSBAuJaKBJLC",
  "hobbies": [{ "hobby": "Swimming" }, { "hobby": "Music" }]
}

{
  "_id": 2,
  "name": {
    "first": "Manuela",
    "last": "Avery"
  },
  "email": "mavery@cool.org",
  "hashed_password": "GFUtPYL3vyKZAQqJeVmA",
  "hobbies": [{ "hobby": "Swimming" }]
}

This way of storing data allows us to be more flexible with our data. For example, we can change the structure of user 2 without changing the structure of user 1.
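A rough sketch of that flexibility follows, with plain Python dictionaries standing in for the two documents rather than a real document store. The location field and the find_by_hobby helper are hypothetical and exist only for this illustration.

```python
# Plain dicts stand in for the two user documents in a collection.
users = [
    {
        "_id": 1,
        "name": {"first": "Alex", "last": "Shelley"},
        "email": "s.alex@mail.com",
        "hashed_password": "7Fy5vQhehSBAuJaKBJLC",
        "hobbies": [{"hobby": "Swimming"}, {"hobby": "Music"}],
    },
    {
        "_id": 2,
        "name": {"first": "Manuela", "last": "Avery"},
        "email": "mavery@cool.org",
        "hashed_password": "GFUtPYL3vyKZAQqJeVmA",
        "hobbies": [{"hobby": "Swimming"}],
    },
]


def find_by_hobby(collection, hobby):
    """Return the _id of every document that lists the given hobby."""
    return [
        doc["_id"]
        for doc in collection
        if any(entry["hobby"] == hobby for entry in doc.get("hobbies", []))
    ]


# Give user 2 a new field (a hypothetical "location").
# User 1 keeps its original shape; no schema change is involved.
users[1]["location"] = "Miami"

print(find_by_hobby(users, "Swimming"))
print(find_by_hobby(users, "Music"))
```

Because the hobbies live inside each user document, answering "who swims?" needs no join, only a look inside each document.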

Scalability

Scaling a relational database is usually done with a combination of three techniques: functional partitioning, data partitioning, and replication.

Functional partitioning means delegating different data to different databases based on purpose or function, with an emphasis on grouping together data that has similar read or write throughput needs. For example, user data might live in one database while videos live in another, which allows each service to be scaled independently.

Within each service, data partitioning divides a body of data among different machines so that no single server holds all of the information and the work of serving operations is shared.

Lastly, replication means running additional servers that each keep a copy of the data. All writes go through a primary server, and the secondary servers replicate from it and serve read-only operations.

All of these methods present significant challenges, such as replication lag and cross-partition queries. Luckily, when the situation calls for it, we have alternatives.
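Data partitioning and replication can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular database implements them: the Shard class, the shard_index function, the choice of three shards with two replicas each, and the hash-based routing are all assumptions made for the illustration.

```python
import hashlib


class Shard:
    """One partition: a primary that takes writes and replicas that serve reads."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = 0

    def write(self, key, value):
        # All writes go through the primary.
        self.primary[key] = value

    def replicate(self):
        # Replicas copy the primary's state. Until this runs, reads can
        # return stale data, which is what replication lag looks like.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key):
        # Reads rotate across the replicas (round-robin).
        replica = self.replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self.replicas)
        return replica.get(key)


def shard_index(key, shard_count):
    """Map a key to a shard using a stable hash (data partitioning)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


shards = [Shard() for _ in range(3)]


def put(key, value):
    shards[shard_index(key, len(shards))].write(key, value)


def get(key):
    return shards[shard_index(key, len(shards))].read(key)


put("user:1", "Alex")
stale = get("user:1")  # the replicas have not caught up yet
for shard in shards:
    shard.replicate()
fresh = get("user:1")  # visible once replication has run
print(stale, fresh)
```

The first read misses the value because the write has only reached the primary. A query that needs keys from several shards would have to visit each of them, which is the cross-partition problem in miniature.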

NoSQL databases like document-oriented ones were invented to solve problems that arise when we want to scale a relational database. They were built with horizontal scalability in mind, the holy grail of scalability. Typically, the more servers we add, the more throughput increases, without many of the complications mentioned above. No choice is without trade-offs, though: truly horizontal databases can still struggle with particular workloads. As always, it is about choosing the right tool for the job. Next time you find yourself wondering which one to use, think about the workload you expect for your database and see how different types may help you. It is also okay, and common, to use multiple types of databases for the same product, so there is no need to try to turn any particular one into a magical tool for all situations.

Thanks for reading! I hope to soon write about other types of NoSQL databases and familiarize myself with what's out there.

Cover photo by Pixabay on Pexels.

Top comments (2)

Michiel Hendriks • Jul 11 '20

"NoSQL is a general term that encompasses databases that usually don't support the use of SQL (Structured Query Language) because they store data differently. NoSQL databases came about in the 2000s to meet the scalability demands of big companies like Google, Amazon, and Facebook."

NoSQL is poorly named: it has little to do with SQL and more to do with relational databases. There are NoSQL systems which use structured query languages, some even quite similar to SQL.

Additionally, NoSQL databases existed long before 2000. It's just that this term for non-relational databases was introduced and became popular then. For example, graph databases predate even relational databases.

Sahil Arora • Jul 11 '20

Loved this explanation. Graph databases would be a good topic that I would love to see.