This is the written version of my new YouTube video ✍️ 🙂
In this Redis tutorial, you will learn about Redis and how Redis can be used as a primary database for complex applications that need to store data in multiple formats.
- What Redis is, its use cases, and why it is suitable for modern complex microservice applications
- How Redis supports storing multiple data formats for different purposes through its modules
- How Redis, as an in-memory database, can persist data and recover from data loss
- How to scale and replicate Redis
- Finally, since Kubernetes is one of the most popular platforms for running microservices, and since running stateful applications in Kubernetes is a bit challenging, we will see how you can easily run Redis in Kubernetes
Redis stands for Remote Dictionary Server.
Redis is an in-memory database, so many people have used it as a cache on top of other databases to improve application performance. 🤓
However, what many people don't know is that Redis is a fully fledged primary database that can be used to store and persist multiple data formats for complex applications. 😎
So let's see the use cases for that.
Let's look at a common setup for a microservices application.
Let's say we have a complex social media application with millions of users. For this, we may need to store different data formats in different databases:
- Relational database, like MySQL, to store our structured data
- ElasticSearch for fast search and filtering
- Graph database to represent the connections between users
- Document database, like MongoDB, to store the media content shared by our users daily
- Cache service for better application performance
It's obvious that this is a pretty complex setup.
- ❌ Each data service needs to be deployed and maintained
- ❌ Know-How needed for each data service
- ❌ Different scaling & infrastructure requirements
- ❌ More complex application code for interacting with all these different DBs
- ❌ Higher Latency (Slower), because of more network hops
In comparison, a multi-model database resolves most of these challenges. First of all, you run and maintain just one data service. Your application also talks to a single data store, which requires only one programmatic interface.
In addition, latency is reduced by going to a single data endpoint and eliminating several internal network hops.
So having one database, like Redis, that allows you to store different types of data (basically multiple types of databases in one) while also acting as a cache solves these challenges.
- ✅ Run and maintain just 1 database
- ✅ Simpler
- ✅ Reduced Latency (Faster)
Redis Modules 📦
The way it works is that you have Redis Core, a key-value store that already supports storing multiple types of data, and you extend that core with so-called modules for the different data types your application needs for different purposes. For example, RediSearch for search functionality like ElasticSearch, or RedisGraph for graph data storage, and so on:
And a great thing about this is that it's modular: the different database functionalities are not tightly integrated into one database. Instead, you pick and choose exactly which data service functionality your application needs and add that module.
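As a sketch, loading a module is just one line in redis.conf. The file paths below are illustrative; the actual paths depend on how and where the modules are installed:

```
# redis.conf — extend Redis Core with modules (paths are examples)
loadmodule /opt/redis-modules/redisearch.so
loadmodule /opt/redis-modules/redisgraph.so
```

On restart, the module's commands (e.g. search or graph queries) become available alongside the core key-value commands.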
Out-of-the-box Cache ⚡️
Of course, when using Redis as a primary database you don't need an additional cache, because you get that out of the box with Redis. That again means less complexity in your application, because you don't need to implement the logic for managing, populating, and invalidating the cache.
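For contrast, here is roughly the cache-aside boilerplate you would otherwise write on top of a separate cache. This is a self-contained sketch: a plain dict with expiry timestamps stands in for a Redis client so it runs without a server (with redis-py you would call `r.set(key, value, ex=ttl)` and `r.get(key)`); the function names and data are made up for illustration.

```python
import time

# A dict of key -> (expires_at, value) stands in for the cache service.
_cache = {}

def cache_set(key, value, ttl):
    _cache[key] = (time.time() + ttl, value)

def cache_get(key):
    entry = _cache.get(key)
    if entry is None:
        return None
    expires_at, value = entry
    if time.time() >= expires_at:
        del _cache[key]  # expired, like a Redis key reaching its TTL
        return None
    return value

def get_user_profile(user_id):
    # Read path: try the cache first, fall back to the "database".
    key = f"user:{user_id}:profile"
    cached = cache_get(key)
    if cached is not None:
        return cached                     # cache hit
    profile = f"profile-of-{user_id}"     # stand-in for a slow DB query
    cache_set(key, profile, ttl=60)       # populate the cache, 60s TTL
    return profile
```

With Redis as the primary database, this whole layer (and keeping it consistent with the source of truth) disappears.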
Redis is fast 🚀
As an in-memory (data is stored in RAM) database, Redis is super fast and performant, which of course makes the application itself faster.
But at this point you may be wondering:
How can an in-memory database persist data? 🤔
If the Redis process or the server on which Redis is running fails, all the data in memory is gone right? So how is the data persisted and basically how can I be confident that my data is safe? 👀
Well, the simplest way to have data backups is by replicating Redis: if the Redis master instance goes down, the replicas will still be running and will still have all the data.
But of course, if all the Redis instances go down, you will lose the data, because no replica remains. 🤯 So we need real persistence.
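As a sketch, making an instance a replica is a single directive in its redis.conf (the host and port below are examples, pointing at the master):

```
# redis.conf on a replica — follow the master at 10.0.0.5:6379
replicaof 10.0.0.5 6379
```

The replica then continuously syncs the master's data and serves reads.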
Redis has multiple mechanisms for persisting the data and keeping the data safe.
The first one is snapshots (RDB), which you can configure based on elapsed time, number of writes, etc. Snapshots of your data are stored on disk, and you can use them to recover your data if the whole Redis database is gone.
But note that you may lose the last minutes of data, because you usually snapshot every five minutes or every hour, depending on your needs. 😐
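As a sketch, the snapshot triggers live in redis.conf; the `save` values below are Redis's historical defaults:

```
# redis.conf — RDB snapshot triggers
save 900 1       # snapshot after 900s if at least 1 key changed
save 300 10      # after 300s if at least 10 keys changed
save 60 10000    # after 60s if at least 10000 keys changed
dbfilename dump.rdb
```

Each rule is "seconds elapsed + keys changed", so you trade snapshot frequency against write overhead.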
So as an alternative, Redis offers something called AOF, which stands for Append Only File.
In this case, every write is continuously appended to a log file on disk. When restarting after an outage, Redis replays the Append Only File to rebuild the state.
So AOF is more durable, but can be slower than snapshotting.
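As a sketch, AOF is also enabled in redis.conf; the `appendfsync` policy is where the durability/speed trade-off lives:

```
# redis.conf — Append Only File persistence
appendonly yes
appendfilename "appendonly.aof"
# fsync policy: always (safest, slowest) | everysec (default) | no
appendfsync everysec
```

With `everysec`, a crash loses at most about one second of writes; `always` loses none but pays an fsync per write.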
Best Option 💡 : Use a combination of both AOF and snapshots, where the AOF is persisting data from memory to disk continuously plus you have regular snapshots in between to save the data state in case you need to recover it:
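In redis.conf, the two mechanisms can simply be enabled together; the `aof-use-rdb-preamble` option (available since Redis 4.0) additionally lets AOF rewrites start from a compact RDB preamble for faster recovery (the `save` interval below is an example):

```
# redis.conf — AOF + snapshots combined
appendonly yes
save 3600 1                 # still take periodic RDB snapshots
aof-use-rdb-preamble yes    # rewritten AOF starts with an RDB preamble
```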
Let's say my single Redis instance runs out of memory, so the data becomes too large to hold in memory, or Redis becomes a bottleneck and can't handle any more requests. In such a case, how do I increase the capacity and memory size of my Redis database? 🤔
We have several options for that:
First of all, Redis supports clustering. This means you can have a primary (master) Redis instance, which is used to read and write data, and multiple replicas of that primary instance for reading the data:
This way, you can scale Redis to handle more requests and, in addition, increase the availability of your database: if the master fails, one of the replicas can take over, and your Redis database can continue functioning without any issues.
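Automatic failover like this is typically handled by Redis Sentinel, a separate process that monitors the master and promotes a replica when it goes down. A minimal sentinel.conf sketch (the name `mymaster`, host, and port are examples):

```
# sentinel.conf — monitor a master; 2 sentinels must agree it is down
sentinel monitor mymaster 10.0.0.5 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```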
Well, that seems good enough, but what if
- your dataset grows too large to fit in memory on a single server, and
- we have only scaled the reads, i.e., the requests that just query data, while the single master instance still has to handle all the writes?
So what is the solution here? 🤔
For that we use the concept of sharding, which is a general concept in databases and which Redis also supports.
So sharding basically means that you take your complete data set and divide it into smaller chunks or subsets of data, where each shard is responsible for its own subset of data.
So that means instead of having one master instance that handles all the writes to the complete data set, you can split it into say 4 shards, each of them responsible for reads and writes to a subset of the data. 💡
And each shard also needs less memory capacity, because it holds just a fourth of the data. This means you can distribute and run shards on smaller nodes and scale your cluster horizontally:
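Redis Cluster decides which shard owns a key with a fixed scheme: CRC16(key) mod 16384 hash slots, where each shard serves a range of slots, and hash tags (`{...}`) force related keys onto the same shard. A small Python sketch of that scheme (the CRC16 variant is the CCITT/XMODEM one the cluster spec describes):

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM: poly 0x1021, init 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    # If the key contains a non-empty "{tag}", only the tag is hashed,
    # so "{user1}.following" and "{user1}.followers" share a slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

With 4 shards, for example, each shard would serve roughly a quarter of the 16384 slots, so writes spread across all four masters.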
So having multiple nodes, each running sharded and replicated Redis instances, gives you a very performant, highly available Redis database that can handle many more requests without creating bottlenecks 👍
Check out my video below for the last 2 topics and scenarios:
- Applications that need even higher availability and performance across multiple geographic locations
- The new standard for running microservices is the Kubernetes platform, so running Redis in Kubernetes is a very interesting and common use case
The full video is available here: 🤓
Hope this was helpful and interesting for some of you! 😊
Like, share and follow me 😍 for more content: