So you know SQL and want to get started with NoSQL? This starter tutorial will be perfect for you 😊.
What we'll cover:
- Comparison: The strengths of SQL and NoSQL
- 4 NoSQL families: What NoSQL types exist?
- Selling points for NoSQL: Dive deeper into NoSQL advantages
Most developers know SQL, while NoSQL stood in its shadow for a long time. Over the past decade, though, more and more business cases have emerged that SQL solutions can't handle easily. Let's go ahead and compare the two database concepts:
| Criterion | Stronger side | Why |
|---|---|---|
| Scalability | NoSQL | NoSQL solutions are at home in the cloud, so scalability and big data applications are well supported. |
| Flexibility | NoSQL | NoSQL can deal with changing schemas without any explicit update scripts. |
| Cost | NoSQL | Many NoSQL databases are open source, so there are no expensive licensing terms. What's more, NoSQL runs well on commodity machines, which saves further costs. |
| Replication | NoSQL | Most NoSQL solutions have built-in support for sharding and replica sets. This lets you split huge amounts of data across many servers while replicating it at the same time, which increases availability. |
| Transactions | SQL | Querying and updating data across multiple tables atomically is better supported in SQL. |
| Consistency | SQL | In a single SQL database, committed updates are immediately visible to all clients. Many NoSQL databases use eventual consistency instead: all nodes will converge on the same data, but it might take, say, half a second until every node holds an update. |
| Support and maturity | SQL | Most SQL databases have been around for a long time and are therefore very stable. Moreover, if you choose an enterprise solution like Oracle DB, you get professional support. |
| Data normalization | SQL | In a normalized SQL schema, each data item is stored just once, which keeps updates simple. In NoSQL, data is often duplicated across aggregates, so a single update can require writes to multiple aggregates. |
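To make the consistency row more concrete, here is a toy Python simulation of asynchronous replication, the mechanism behind eventual consistency in many distributed stores. It is a minimal sketch, not how any particular database works: the `AsyncReplicatedStore` class and its explicit `replicate()` step are hypothetical, and in a real system replication runs continuously in the background.

```python
from collections import deque


class Node:
    """A toy storage node holding an in-memory key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncReplicatedStore:
    """Hypothetical toy store: writes land on a primary and are shipped to
    replicas later, so a replica can return stale data until it catches up."""

    def __init__(self, replica_count=2):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.pending = deque()  # replication log not yet applied to replicas

    def write(self, key, value):
        self.primary.data[key] = value     # acknowledged immediately
        self.pending.append((key, value))  # replicated asynchronously

    def read(self, key, node):
        return node.data.get(key)          # no coordination between nodes

    def replicate(self):
        """Apply pending writes to every replica (a real system does this in the background)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


store = AsyncReplicatedStore()
store.write("cart:42", ["book"])
replica = store.replicas[0]

print(store.read("cart:42", store.primary))  # ['book']
print(store.read("cart:42", replica))        # None: the replica is still behind
store.replicate()
print(store.read("cart:42", replica))        # ['book']: the nodes have converged
```

The window between `write()` and `replicate()` is exactly the "might take a moment" part of eventual consistency: clients reading from a lagging node see old data, but all nodes end up in the same state.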
One of the confusing things about non-relational databases is that many quite different database concepts fall under this umbrella. All of them share one property, though: data is NOT arranged into tables with a fixed set of columns. In Key-Value and Document Stores, for example, each entry can have a different length and format.
Here are the 4 families:
| Family | Examples | Description |
|---|---|---|
| Key-Value Store | Dynamo, Redis | A key-value map where the value can be of an arbitrary data type like String, Array or JSON. |
| Document Store | MongoDB, Couchbase | Each data entry is a document (e.g. JSON) with a varying schema. You can query by the elements inside the documents. |
| Column-Family Store | Cassandra, HBase | Stores rows whose columns are grouped into column families; each row can populate different columns. Suited to very large, sparse datasets with high write volumes. |
| Graph Store | Neo4j, Giraph | Stores and processes data as nodes connected by edges (relationships); well suited for queries and computations that follow those relationships. |
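To give a feel for how the four families shape data differently, here is similar information modeled with plain Python dicts and lists. This is an illustrative sketch only: none of it is the actual API of Redis, MongoDB, Cassandra or Neo4j, and names like `find`, `users` and `follows` are hypothetical.

```python
import json

# Key-value store: the value is opaque to the store; you fetch it by key only.
kv_store = {"session:abc123": json.dumps({"user": "alice", "cart": ["book"]})}
session = json.loads(kv_store["session:abc123"])

# Document store: documents in one collection may have different fields,
# and the store can query by those fields.
products = [
    {"_id": 1, "name": "Lamp", "price": 25, "tags": ["home"]},
    {"_id": 2, "name": "Mug", "price": 8},
    {"_id": 3, "name": "Desk", "price": 120, "tags": ["home", "office"]},
]


def find(collection, predicate):
    """Stand-in for a document query: return documents matching the predicate."""
    return [doc for doc in collection if predicate(doc)]


home_items = find(products, lambda doc: "home" in doc.get("tags", []))

# Column-family store: rows are addressed by a row key, columns are grouped
# into families, and each row may populate different columns.
users = {
    "user:1": {"profile": {"name": "Alice", "city": "Berlin"}, "stats": {"logins": 12}},
    "user:2": {"profile": {"name": "Bob"}},
}

# Graph store: data lives in nodes and the edges between them, and queries
# follow relationships, e.g. "whom do the people alice follows follow?"
follows = {"alice": ["bob"], "bob": ["carol", "dave"], "carol": [], "dave": []}
two_hops = {fof for friend in follows["alice"] for fof in follows[friend]}

print(session["cart"])                      # ['book']
print([doc["name"] for doc in home_items])  # ['Lamp', 'Desk']
print(users["user:2"]["profile"]["name"])   # Bob
print(sorted(two_hops))                     # ['carol', 'dave']
```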
If you want to get to know NoSQL better, a great starting point is to learn and use a concrete store. When I started learning about non-relational databases, I simply picked MongoDB and tried out some queries, inserts, etc. This helped me tremendously to grasp the concept of NoSQL.
NoSQL emerged in the mid-2000s, when companies like Google and Amazon felt they needed an alternative to traditional SQL databases. The primary drivers were the tremendous growth of data volume and frequent changes in data schemas. This led to highly influential papers such as Google's Bigtable (2006) and Amazon's Dynamo (2007), which showed unprecedented ways of storing and managing big data.
While traditional SQL databases typically run on a single server (with lots of RAM), NoSQL was designed from the start to run on clusters. Thanks to this, a data pool can grow on demand simply by adding more nodes. For companies like Google, Amazon, Netflix or Facebook, you can easily imagine that their data wouldn't even fit on 100 hard drives. Beyond these giants, a growing number of businesses routinely handle huge data volumes, for example New Relic, which stores log data from your applications and lets you easily query and display it. If you've ever seen a large business application running in verbose logging mode, you know how hard it can be to browse through its logs, so a logging solution like New Relic's can be really helpful.
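Scaling out by adding nodes works because the data is partitioned (sharded) across them. The Python sketch below shows one common partitioning technique, consistent hashing, under simplifying assumptions: the `HashRing` class, the node names and the use of MD5 as a placement hash are illustrative choices, not the scheme of any specific database. Its point is that adding a node only moves the keys the new node takes over, instead of reshuffling everything.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is randomized per process)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring: a key belongs to the first node point
    found clockwise from the key's own position on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual points per node, to spread load evenly
        self._points = []     # sorted positions on the ring
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def node_for(self, key):
        point = stable_hash(key)
        # Wrap around to the first point when the key hashes past the last one.
        idx = bisect.bisect(self._points, point) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by adding a node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

A naive `hash(key) % number_of_nodes` scheme would instead reassign most keys whenever the node count changes, which is why ring-style placement is popular for clusters that grow.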
Another huge selling point for NoSQL is flexible schemas. In fact, you don't need to define any schema up front; you can start storing data right away. Of course, the application working with your database still has to make sense of the bits and bytes in your data store, so implicitly there will be some schema; it simply doesn't live in the database. Nevertheless, being able to quickly change the structure of new data records has tremendous advantages. With SQL, you'd first have to change your table schema with an SQL script that alters the existing tables, and you'd also have to decide what happens to existing records, possibly writing a data migration script. In the non-relational world, all you do is change the logic in your application, and that's it.
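The SQL side of this comparison can be sketched with Python's built-in `sqlite3` module, using SQLite as a stand-in for any SQL database. The `products` table and the `eco_score` column are hypothetical. The sketch shows that the database rejects data for a column it doesn't know about until the schema is altered, and that existing rows then get `NULL` for the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO products (name) VALUES ('Lamp')")

# Storing a value for a column the schema doesn't define fails...
rejected = False
try:
    conn.execute("INSERT INTO products (name, eco_score) VALUES ('Mug', 4)")
except sqlite3.Error as err:
    rejected = True
    print("Rejected:", err)

# ...so the table must be altered first; the existing row gets NULL (None).
conn.execute("ALTER TABLE products ADD COLUMN eco_score INTEGER")
conn.execute("INSERT INTO products (name, eco_score) VALUES ('Mug', 4)")
rows = conn.execute("SELECT name, eco_score FROM products ORDER BY id").fetchall()
print(rows)  # [('Lamp', None), ('Mug', 4)]
```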
As a case in point, let's consider a retail store that keeps product info in a NoSQL database. When a new piece of information gets introduced, like the eco-friendliness of a product, you can simply add it to newly introduced products and leave old products as they are. All you have to do is make sure your application can handle both data formats: products with and without an eco-friendliness score.
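Here is a minimal Python sketch of handling both data formats in the application, assuming products are stored as JSON-like documents (modeled here as dicts). The `eco_score` field, its 1-to-5 scale and the `eco_label` helper are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical product documents: the first was stored before the
# eco-friendliness field existed, the second already carries it.
products = [
    {"sku": "A-100", "name": "Desk lamp", "price": 25.0},
    {"sku": "B-200", "name": "Bamboo mug", "price": 8.5, "eco_score": 5},
]


def eco_label(product: dict) -> str:
    """Render the eco rating, tolerating documents that predate the field."""
    score: Optional[int] = product.get("eco_score")
    if score is None:
        return "not rated"
    return f"{score}/5"


for p in products:
    print(p["name"], "-", eco_label(p))
# Desk lamp - not rated
# Bamboo mug - 5/5
```

Using `dict.get` with a fallback is the whole "migration": old documents stay untouched, and the implicit schema lives in `eco_label`.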
With NoSQL, we use data replication to ensure high availability. While replication comes in different forms (master-slave, peer-to-peer), the result is the same: one node can become unavailable, but queries still work because other nodes take over its traffic. A data server can become unavailable in many ways, such as a hard disk failure, temporary network problems or a power outage. While most of these happen only once a year or less often, there are a number of business cases where such downtime is intolerable. Note, though, that SQL solutions can also offer high availability, with concepts like backup databases.
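The failover behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Replica` class, the `NodeDown` exception and the try-each-replica-in-order strategy are hypothetical simplifications; real drivers and databases use their own mechanisms for detecting failed nodes and routing requests.

```python
class NodeDown(Exception):
    """Raised when a (simulated) node cannot be reached."""


class Replica:
    """A toy node holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # each replica keeps its own copy
        self.up = True

    def get(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try replicas in order; the read succeeds while at least one node is up."""
    for replica in replicas:
        try:
            return replica.get(key), replica.name
        except NodeDown:
            continue  # unreachable node: fall through to the next copy
    raise RuntimeError("all replicas are unavailable")


data = {"order:7": "shipped"}
replicas = [Replica(name, data) for name in ("node-1", "node-2", "node-3")]

replicas[0].up = False  # simulate a disk failure or power outage on node-1
print(read_with_failover(replicas, "order:7"))  # ('shipped', 'node-2')
```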
In the NoSQL world, there are many open source solutions that you can simply install on your own cluster (MongoDB, Cassandra, CouchDB, Hypertable, Neo4j). So when you choose an open source NoSQL solution, you are not at the mercy of a company's licensing terms, which might change unfavorably over time. What's more, non-relational solutions are ideal candidates for commodity servers: you can achieve really good throughput and availability without buying expensive high-end servers. For SQL, the game is different. Here people often scale vertically, meaning an existing server is replaced by one with more RAM and faster CPU cores. This typically incurs higher costs than scaling horizontally.
Bonus points if you're still reading this 😊 We've now covered the most important general concepts of NoSQL.
Note that this post has only scratched the surface of NoSQL. If you're like me and prefer more concrete examples, I'm glad you asked: I'll be posting a tutorial on Document Stores and MongoDB shortly!
Also, if you're new to NoSQL and enjoy learning from books, I highly recommend NoSQL Distilled by Pramod Sadalage and Martin Fowler.
Finally, please remember to like ❤️ this article if you found it useful!