This is post #7 of the IoT series that I am writing. In this article I will talk about databases, in particular the best databases to use in your IoT projects. Note that this is an opinion-based article, so I will draw on my own experience and preferences, but feel free to share your thoughts in the comments section at the end of the post.
Not all IoT applications need a database. It depends on the case, and I will go through some of them.
For example, if you are building a little robot controlled by a microcontroller that communicates with your joystick through IoT protocols, a database is not a requirement. But you might want or need to collect data and analyze it, to generate insights, make machine learning predictions or anything else. In that case, a database lets you organize and hold that data.
On the other hand, suppose you have smart devices attached to industrial machines. These devices collect data from the machines and send it to a main server, where the raw data must be transformed into information and into reports about performance and failures that are sent to customers. In this case a database is mandatory: the system cannot work without one.
So, you will usually need a database in your IoT application when:
Your devices collect massive amounts of data, so they cannot hold it all in memory.
The data collected varies constantly.
Your application has a continuous data flow.
You want to generate insights from data analysis, as in Big Data.
Your application uses machine learning in its business rules.
Your application has low fault tolerance, so it is essential to minimize data loss.
So which one should you choose, SQL or NoSQL? Short answer: it depends on your design.
But I suppose you are not here for a basic answer like that, so let's go deeper. As you might know, NoSQL databases have been on the rise in recent years. They became very popular because of their capacity to handle large amounts of data, even irregular or unstructured data. NoSQL databases are usually fast and easy to work with because they do not enforce a rigid schema, and they are built to scale and to run on cloud providers.
In the IoT field, we often don't know exactly what a device will send to our server, so in this case NoSQL would be better. Another example is when our data varies a lot within a very short time interval: we then have lots of varying data to process, so we need to scale, and consequently NoSQL wins again.
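To make the point about unpredictable payloads concrete, here is a minimal sketch in plain Python. The device names and fields (`thermo-01`, `press-07`, `temperature_c` and so on) are hypothetical, and a list of dictionaries only stands in for a document collection; it is not a real database client.

```python
import json

# Hypothetical payloads: each device reports a different set of fields.
raw_messages = [
    '{"device": "thermo-01", "temperature_c": 21.5}',
    '{"device": "press-07", "pressure_kpa": 101.3, "valve_open": true}',
    '{"device": "thermo-01", "temperature_c": 22.0, "battery_pct": 87}',
]

# A list of dicts stands in for a schemaless document collection:
# nothing forces the documents to share the same fields.
collection = [json.loads(message) for message in raw_messages]


def find_with_field(docs, field):
    """Return the documents that contain the given field."""
    return [doc for doc in docs if field in doc]


temperature_docs = find_with_field(collection, "temperature_c")
print(len(temperature_docs))  # 2
```

A new field such as `battery_pct` can show up at any time without a migration, which is the flexibility described above.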
But what about SQL? Can it be used for IoT? Yes, for sure. SQL is totally compatible with IoT, we just need to take care. When it comes to business intelligence and analysis, it's still easier to work with SQL. The expertise and familiarity that developers and even non-developers have with SQL is also a point in its favor: often the whole infrastructure, DevOps engineers, managers and developers are used to SQL, and changing that takes time and money.
In summary, NoSQL tends to be the better fit when you have:
Large amounts of data
Unstructured or highly variable data
Continuous read and write access to the database
Timestamps that are essential to your business rules, so every timestamped change needs to be stored
SQL tends to be the better fit when:
Your professionals and infrastructure are already adapted to SQL
The company is not open to changes
The data sent by the sensors is not too variable
The data is structured and organized, and you want to run business intelligence analysis more quickly
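As an illustration of the structured case, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `readings` table, its columns and the sample values are assumptions made up for this example; the same idea applies to any SQL database.

```python
import sqlite3

# In-memory database, just for the example.
conn = sqlite3.connect(":memory:")

# Fixed schema: every reading has the same, well-known columns.
conn.execute(
    "CREATE TABLE readings ("
    "machine_id TEXT NOT NULL, "
    "recorded_at TEXT NOT NULL, "
    "temperature_c REAL NOT NULL)"
)

# Hypothetical sensor readings.
rows = [
    ("machine-a", "2024-01-01T10:00:00", 70.0),
    ("machine-a", "2024-01-01T10:01:00", 74.0),
    ("machine-b", "2024-01-01T10:00:00", 55.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# A typical reporting query: average temperature and reading count per machine.
report = conn.execute(
    "SELECT machine_id, AVG(temperature_c), COUNT(*) "
    "FROM readings GROUP BY machine_id ORDER BY machine_id"
).fetchall()

print(report)  # [('machine-a', 72.0, 2), ('machine-b', 55.0, 1)]
```

Because the structure is known in advance, a report like this is a single query, which is why SQL remains comfortable for business intelligence work.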
MongoDB is a cross-platform, document-oriented database that is available for free in its Community edition. It is classified as a NoSQL database. MongoDB stores JSON-style documents with optional schemas. Organizations like it for IoT because it allows them to store data from any context, analyse it in real time, and modify the schema as they go. MongoDB is a really good choice because of its community and popularity, and recently they released a service called MongoDB for IoT.
InfluxDB is a relatively new database, first released in 2013. This NoSQL database is open source and written in the Go programming language. It is a time series database, optimized for storing and querying time series data. InfluxDB gained popularity as the Internet of Things grew, because of the vast and growing amount of data involved. One factor that makes InfluxDB an excellent choice is that its query language has a syntax very similar to SQL, so SQL DBAs and developers will be able to use InfluxDB without any big challenges.
Cassandra was initially developed by Facebook to power its inbox search. It became open source in 2008, and since 2009 it has been maintained by the Apache Software Foundation. Its distribution model is based on Dynamo (developed by Amazon), while its data organization is based on BigTable (developed by Google).
Cassandra is, by design, made to work in a distributed way, and there are no great advantages in working with it using only one machine. By using multiple machines (also called nodes), we see the true potential of the solution.
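The idea of spreading data across nodes can be sketched with a toy consistent-hash ring, the general technique behind Dynamo-style distribution. This is only an illustration under simplifying assumptions: the `HashRing` class, the node names and the use of MD5 are made up for the example, and it ignores replication and everything else a real Cassandra cluster does.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps keys to nodes."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets several positions (virtual nodes) on the ring,
        # which spreads keys more evenly between nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{index}"), node)
            for node in nodes
            for index in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _hash(key):
        # MD5 is used here only as a stable, well-spread hash function.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # Walk clockwise to the first token after the key's hash,
        # wrapping around at the end of the ring.
        index = bisect.bisect_right(self._tokens, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


nodes = ["node-1", "node-2", "node-3"]  # hypothetical cluster
ring = HashRing(nodes)
owners = {
    device: ring.node_for(device)
    for device in ("sensor-1", "sensor-2", "sensor-3", "sensor-4")
}
print(owners)
```

With a single node every key lands on the same machine, which is why the benefits only show up once several nodes share the ring.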
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. Prometheus supports multidimensional data models, time series functionalities and alerting.
RethinkDB is a popular open source database. It is a JSON database built from the ground up for the real-time web. By inverting the conventional database architecture, RethinkDB offers a new access model: instead of polling for changes, a developer can tell the database to continuously push updated query results to applications in real time.
I guess everyone knows PostgreSQL: it is one of the most popular databases in the world. PostgreSQL is a SQL database, but it is very fast and scalable and has a big community, which makes it a good option for an IoT application.
We have reached the end of the article. We talked a little bit about databases in general, about NoSQL and SQL, and about some good databases to use in the IoT sector.
It is important to remember that this post is opinion-based. Keep this in mind, because what I have mentioned here is not absolute and might be wrong. This post was written according to my experience at work and at college, so if you have different opinions, please let me know!