Kostas Kalafatis

Originally published at dfordebugging.wordpress.com

Crash Course Redis - Redis Architectures

Before we start talking about the inner workings of Redis, let's talk about the different ways Redis can be deployed and the advantages and disadvantages of each.
We will focus on four different configurations, namely:
 - Single Redis Instance
 - Redis High Availability Architecture
 - Redis Sentinel
 - Redis Cluster

You have the option of using one setup or another, depending on the scale of your operation and the use case you are addressing.

Single Redis Instance


Redis can be deployed with the least amount of fuss as a single database instance. This lets users set up and run small instances quickly, which can help them expand their services and improve performance. However, this deployment model has drawbacks: if the instance fails or becomes unavailable, all client calls to Redis fail, degrading the overall performance and availability of the system. Given enough memory and server resources, though, a single instance can be very powerful.

A scenario primarily used for caching may see a significant performance boost with very little additional setup. If you have enough system resources, you might even run the Redis service on the same machine as the application.

Understanding a few Redis concepts for managing data within the system is essential. Commands sent to Redis are first processed in memory. Then, if persistence is configured on the instance, the data is persisted to disk, either through periodic RDB snapshots (a very compact point-in-time representation of the Redis data) or through an AOF (append-only file) that logs every write; a forked child process handles snapshotting and AOF rewrites in the background.

Because of the two flows described above, Redis can support multiple replication strategies, long-term storage, and more complex topologies. If Redis is not configured to persist data, any information stored in the database is lost on a restart or failover. If persistence is enabled, then after a restart the instance loads all of the data from the RDB snapshot or AOF back into memory and is then ready to accept new client requests.
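
To make this concrete, here is a minimal sketch of turning both persistence options on at runtime with the redis-py client; the connection details and snapshot thresholds are illustrative, not recommendations:

```python
import redis

# Connection details are illustrative
r = redis.Redis(host="localhost", port=6379)

# RDB: snapshot if at least 1 key changed in 900 s, or 100 keys in 300 s
# (example thresholds, not recommendations)
r.config_set("save", "900 1 300 100")

# AOF: log every write command, fsync once per second (a common trade-off)
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")
```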

Pros of Single Redis Instance Architecture

  1. Ease of Installation: It is not difficult to get started using Redis because the installation of a single Redis instance is not complicated and only requires a small amount of configuration. Because of this, it is a good option for businesses and developers who want to set up a Redis solution in a short amount of time.
  2. Cost Effectiveness: Because setting up a single Redis instance is less expensive than setting up multiple nodes, it is a cost-effective solution for projects that are on the smaller to medium side in scale. This is especially helpful for newly founded companies and smaller companies that are on a tighter financial footing and want to implement Redis.
  3. In-memory Storage: Redis stores all of its data in memory, which enables it to perform read and write operations in an extremely timely and effective manner. When compared to traditional disk-based databases, this results in a significant performance boost. This is especially beneficial for applications that require real-time data processing and analysis.
  4. Flexibility: Redis is a flexible solution that can be applied to a diverse range of use cases because it supports multiple data structures. These data structures include strings, hashes, lists, sets, sorted sets, and geospatial indexes. This gives developers the ability to pick the data structure that works best for their particular requirements, which ultimately results in a solution that is more effective and optimized.
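
As a quick illustration of that flexibility, here is a hedged sketch using the redis-py client; the key names and values are made up for the example:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("page:home:title", "Welcome")                          # string
r.hset("user:1", mapping={"name": "Ada", "role": "admin"})   # hash
r.rpush("queue:jobs", "job-1", "job-2")                      # list
r.sadd("post:42:tags", "redis", "caching")                   # set
r.zadd("leaderboard", {"ada": 1200, "bob": 950})             # sorted set

# Read back the top of the leaderboard, highest score first
print(r.zrevrange("leaderboard", 0, 1, withscores=True))
```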

Cons of Single Redis Instance Architecture

  1. Scalability Limitations: Because it can only process a certain amount of data and traffic at one time, a single Redis instance has a limited capacity for scaling. This means that as your data and traffic continue to grow, you may eventually reach a point where a single instance is no longer sufficient to handle the load. If this happens, you will need to move to a more scalable solution.
  2. Data Persistence: All of the data that is currently stored in memory will be lost if the single instance that is currently running fails. There are options for persistence that can be used with Redis, such as RDB and AOF; however, using these options may result in a decrease in performance and may not be appropriate for all use cases. This indicates that you will need to implement additional safeguards, such as backups and replication, in order to guarantee the persistence of your data.
  3. Single Point of Failure: Because there is neither a backup nor any other form of redundancy in place, a single instance of Redis is a single point of failure. If the instance goes down, your entire Redis setup is inaccessible until the instance is restored.
  4. Memory Constraints: Because Redis stores all of its data in memory, it has the potential to become a memory bottleneck, which is especially problematic when working with large datasets. Because of this, you will likely need to make an investment in additional memory to support your Redis configuration, or you will need to investigate alternative solutions for dealing with large datasets.

High Availability Architecture


Another common Redis configuration is a primary deployment with a secondary deployment that is kept in sync through replication. As data is written to the primary instance, copies of those commands are also placed in the client output buffer of each connected replica, which is how changes propagate to the secondary instances. One or more instances in your deployment can serve as secondary instances; they can help scale Redis reads or take over if the primary fails.

But first, what is High Availability? High availability (HA) is a system characteristic that aims to maintain an agreed level of operational performance, typically uptime, for a longer than average period of time. It is critical in these HA systems that there is no single point of failure so that systems can recover gracefully and quickly. This leads to reliable crossover, which prevents data loss during the switch from primary to secondary, as well as automatic failure detection and recovery.

At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a leader-follower (master-replica) replication that is simple to use and configure. It allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks and will attempt to be an exact copy of it regardless of what happens to the master.
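
Here is a minimal sketch of wiring up such a replica with redis-py; the hosts and ports are placeholders for your own deployment:

```python
import redis

# Connect to the instance that should become the replica
# (hosts and ports are placeholders)
replica = redis.Redis(host="10.0.0.2", port=6379)

# Make it an exact copy of the master at 10.0.0.1:6379
replica.execute_command("REPLICAOF", "10.0.0.1", "6379")

# The replication section of INFO reports the role and the link state
print(replica.info("replication")["role"])  # "slave"
```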

There are three states that the system can be in:

  1. When a master instance and a replica instance are well-connected, the master keeps the replica up-to-date by sending a stream of commands to the replica to replicate any changes to the master dataset caused by client writes, keys that have expired or have been evicted, or any other action.
  2. When the connection between the master and the replica breaks because of network problems or because either the master or the replica detects a timeout, the replica reconnects and tries to do a partial resynchronization. This means that it will try to just get the part of the stream of commands that it missed while it was disconnected.
  3. When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.

Redis can replicate data in two ways: asynchronously and synchronously. Most users stick with the default, asynchronous mode. The WAIT command, which blocks the client until the preceding writes have been acknowledged by a given number of replicas, can be used to request synchronous replication. Even then, WAIT does not guarantee strong consistency across all Redis instances: depending on how Redis persistence is configured, data can still be lost during a failover.
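
Here is a sketch of what using WAIT looks like with redis-py; the key name, replica count, and timeout are illustrative:

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # the master; address is illustrative

r.set("balance:42", 100)

# Block until at least 1 replica acknowledges the write, for at most 100 ms;
# the return value is the number of replicas that actually acknowledged
acked = r.wait(1, 100)
if acked < 1:
    print("write not yet confirmed by any replica")
```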

How Redis replication works

When replicas connect to masters, they use the PSYNC command to send the replication ID of their old master and the offset they have processed so far, so the master can send only the incremental part they are missing. But if there isn't enough backlog in the master's buffers, or if the replica refers to a history (replication ID) that the master no longer knows about, a full resynchronization happens: the replica receives a full copy of the dataset, from scratch.

In more detail, here's how a full synchronization works: The master starts a save process in the background that makes an RDB file. At the same time, it starts putting all new write commands from clients into a buffer. When the background saving is done, the master sends the database file to the replica, which saves it to disk and then loads it into memory. The master will then send all commands that had been queued to the replica. This is done in a stream of commands, just like the Redis protocol.

You can test it out yourself using telnet. While the server is working, connect to the Redis port and issue the SYNC command. A bulk transfer will occur, and then every command received by the master will be re-issued in the telnet session. Actually, SYNC is an old protocol that is no longer used by newer Redis instances but is still present for backward compatibility: it does not support partial resynchronizations, so PSYNC is now used instead.
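
If you prefer a script to a telnet session, here is a rough Python equivalent of that experiment using a raw socket; it assumes a Redis server on localhost:6379 and simply dumps the raw bytes it receives:

```python
import socket

# Connect to the Redis port (localhost is assumed) and issue SYNC by hand
conn = socket.create_connection(("localhost", 6379))
conn.sendall(b"SYNC\r\n")

# The reply starts with $<rdb-size>\r\n, then the raw RDB payload, and then
# a never-ending RESP stream of the write commands the master receives
while True:
    chunk = conn.recv(4096)
    if not chunk:
        break
    print(chunk)
```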

As previously stated, replicas can automatically reconnect when the master-replica link fails for whatever reason. If multiple replica synchronization requests come in at the same time, the master will do a single background save to serve them all.

Replication IDs and how they work

In the last section, we saw that two instances have the same data if they have the same replication ID and replication offset. But it is helpful to know what the replication ID is and why each instance has two of them: the primary ID and the secondary ID.

A replication ID marks a certain point in the history of a data set. A new replication ID is made for an instance every time it starts over as a master or a replica is promoted to master. After the handshake, the replication ID of the master will be passed on to the replicas that are connected to it. So, two instances with the same ID are linked by the fact that they hold the same data, possibly at different times. For a given history (replication ID), the offset works as a logical time to figure out who has the most up-to-date data set.

For example, if two instances A and B have the same replication ID, but A has an offset of 1000 and B an offset of 1023, then A is missing some commands that have already been applied to the data set. It also means that A can reach the same state as B by applying just those few commands.

Because replicas can be promoted to masters, each Redis instance has two replication IDs. After a failover, the promoted replica needs to remember its old replication ID, because that was the replication ID of the old master. When other replicas sync with the new master, they will attempt a partial resynchronization using the old master's replication ID. This works as expected because, when a replica is promoted to master, it moves its main ID to its secondary ID and records the offset at which the switch happened, then generates a new random main ID for the new history that begins. When new replicas connect, the master matches their IDs and offsets against both the current ID and the secondary ID (up to the recorded offset, for safety). In short, replicas that connect to the new master after a failover don't have to perform a full sync.

Normally, a full resynchronization requires creating an RDB file on disk and then reloading that same RDB from disk to feed the data to the replicas. With slow disks, this can be very demanding for the master. Redis 2.8.18 was the first version to support diskless replication: in this setup, the child process sends the RDB directly to the replicas over the wire, without writing it to disk first.
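
A minimal sketch of enabling this behaviour at runtime, assuming a reasonably recent server and the redis-py client; the 5-second delay is just an example value:

```python
import redis

master = redis.Redis(host="localhost", port=6379)  # address is illustrative

# Stream the RDB to replicas over the socket instead of writing it to disk
master.config_set("repl-diskless-sync", "yes")
# Wait a few seconds before starting so several replicas can share one transfer
master.config_set("repl-diskless-sync-delay", "5")
```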

Redis Sentinel


The Redis Sentinel system is distributed. Sentinel, like all distributed systems, has advantages and disadvantages. Sentinel is designed in such a way that a cluster of sentinel processes collaborate to coordinate state in order to provide high availability for Redis. After all, you wouldn't want the system that protects you to have its own single point of failure.

The Sentinel is in charge of a few things. First, it ensures that the existing primary and secondary instances are operational and responsive. This is required because sentinel (along with other sentinel processes) can detect and respond to situations in which the main and/or secondary nodes are lost. Second, it functions similarly to Zookeeper and Consul in other systems in terms of service discovery. When a new client attempts to write to Redis, Sentinel will inform the client of the current main instance.

So sentinels constantly monitor availability and send that information to clients, so that clients can react when a failover occurs. These are Sentinel's capabilities:

  • Monitoring: Sentinel keeps checking to make sure that both your master instance and your replica instances are working as they should.
  • Notification: Using an API, Sentinel can notify the system administrator or other computer programs that there is a problem with one of the Redis instances it is monitoring.
  • Automatic failover: If a master stops working properly, Sentinel can start a failover process in which a replica is promoted to master, the other replicas are reconfigured to use the new master, and the applications that use the Redis server are told the new address to use when connecting.
  • Configuration provider: Sentinel acts as a source of authority for clients seeking service discovery, allowing them to connect to Sentinels to obtain the address of the current Redis master in charge of a given service. Sentinels will report the new address if a failover occurs.
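
Here is a sketch of the configuration-provider role in practice, using redis-py's Sentinel support; the sentinel addresses and the service name mymaster are placeholders:

```python
from redis.sentinel import Sentinel

# Addresses and the service name "mymaster" are placeholders
sentinel = Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)],
    socket_timeout=0.5,
)

master = sentinel.master_for("mymaster", socket_timeout=0.5)  # for writes
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)  # for reads

master.set("greeting", "hello")
print(replica.get("greeting"))
```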

Using Redis Sentinel in this manner enables failure detection: multiple sentinel processes must agree that the current main instance is no longer available. The minimum number of agreeing votes the distributed system requires to perform an operation such as a failover is called the quorum. This number is configurable, but it should reflect the number of nodes in the distributed system. Most distributed systems are three or five nodes in size, with quorums of two or three. An odd number of nodes is preferred in cases where the system must break ties. This provides increased robustness and protects against a single machine misbehaving and being unable to communicate with the main Redis node.

This configuration has some drawbacks, so let's go over some recommendations and best practices for using Redis Sentinel. Redis Sentinel can be deployed in a variety of ways. As a general rule, it is recommended to run a sentinel node alongside each of your application servers (if possible), so you don't have to account for network reachability differences between sentinel nodes and Redis clients. Sentinel can run alongside Redis instances or even on separate nodes, but this complicates things in different ways. I recommend running at least three nodes with a quorum of at least two. Here's a simple chart that breaks down the number of servers in a cluster, as well as the associated quorum and tolerable failures.

Number of Servers   Quorum   Number of Tolerated Failures
        1             1                  0
        2             2                  0
        3             2                  1
        4             3                  1
        5             3                  2
        6             4                  2
        7             4                  3

Let's pause for a second and consider the potential problems that might arise in such an arrangement. You can expect to experience each of them eventually if you let this system run for a long enough period of time.

  1. What happens if the sentinel nodes cannot reach a quorum?
  2. What if the network splits, putting the old main instance in the minority? What happens to the writes it keeps accepting? (Spoiler alert: they are lost when the system fully recovers.)
  3. What happens if sentinel nodes and client nodes (application nodes) have misaligned network topologies?

There are no guarantees of durability, especially since disk persistence (see below) is asynchronous. There is also the nagging problem that, by the time clients learn about a new primary, an unknown number of writes to the old primary have been missed. Redis recommends that clients query for the new primary whenever a new connection is established. Depending on how the system is configured, this could still mean significant data loss.

There are a few ways to reduce the magnitude of the loss: you can force the main instance to replicate writes to at least one secondary instance before it keeps accepting writes. Keep in mind that all Redis replication is asynchronous, so this has trade-offs: the main instance must track acknowledgements independently, and if writes aren't confirmed by at least one secondary instance, the main instance will stop accepting writes.
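
Redis exposes this safeguard through the min-replicas-to-write and min-replicas-max-lag settings; here is a sketch of enabling them at runtime with redis-py (the values are illustrative):

```python
import redis

master = redis.Redis(host="localhost", port=6379)  # address is illustrative

# Refuse writes unless at least 1 replica is connected...
master.config_set("min-replicas-to-write", "1")
# ...and that replica's last acknowledgement is no older than 10 seconds
master.config_set("min-replicas-max-lag", "10")
```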

Redis Cluster

[Diagram: a Redis Cluster with three primary (M) nodes and three replica (S) nodes]

Redis Cluster is a distributed Redis implementation that shards (i.e., partitions) data across multiple Redis nodes automatically. Nobody can predict how many resources their Redis database will use. This means that being able to adequately scale your Redis database during periods of high demand is critical. Scalability is linked to availability, a metric that measures users' ability to access the database.

Let's examine various scaling methods before we dive into the Redis cluster.

Scalability describes the elasticity of a system. While we frequently use it to refer to a system's ability to grow, it is not limited to this definition. We can scale down, scale up, and scale out as needed. The amount of network traffic received by your website, web service, or application determines its success. It is common to underestimate how much traffic your system will experience, especially in the early stages. This could result in a server crash and/or a decrease in service quality. Thus, scalability describes your system's ability to adapt to change and demand. Scalability protects you from future downtime and ensures the quality of your service. But what options do you have for implementing scaling and ensuring your business's scalability? This is where horizontal and vertical scaling come into play.

Horizontal scaling (also known as scaling out) refers to adding new nodes or machines to your infrastructure to meet increased demand. If you host an application on a server and then find that it can't handle the traffic anymore, adding a server may be the answer. It's similar to splitting the workload among several employees rather than just one. However, the added complexity of your operation may be a disadvantage. You must determine which machine does what and how your new machines will interact with your old ones. Consider this the inverse of vertical scaling.

Vertical scaling (also known as scaling up) refers to the addition of new resources to a system in order to meet demand. What distinguishes this from horizontal scaling? While horizontal scaling refers to adding more nodes, vertical scaling refers to increasing the power of your existing machines. If your server requires more processing power, for example, vertical scaling would imply upgrading the CPUs. You can also scale memory, storage, and network speed vertically. Vertical scaling can also refer to completely replacing a server or moving a server's workload to a more powerful one.

The Redis Cluster scales horizontally.

So, to clarify some terminology: when we decide to use Redis Cluster, we have decided to spread the data we are storing across multiple machines, which is known as sharding. Each Redis instance in the cluster is therefore regarded as one shard of the data as a whole. This creates a new problem: if we push a key to the cluster, how do we know which Redis instance (shard) is holding that data? There are several approaches, but Redis Cluster employs algorithmic sharding: we hash the key and take the result modulo the number of shards to find the shard for a given key. Because the hash function is deterministic, a given key will always map to the same shard, so we can predict where a key will be when we read it in the future (see the sketch below). But what happens if we want to add a new shard to the system later? This is known as resharding.
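
A toy sketch of algorithmic sharding, and of why resharding is painful; CRC-32 here is an arbitrary stand-in for whatever deterministic hash you choose:

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    # CRC-32 is an arbitrary stand-in for a deterministic hash:
    # the same key always lands on the same shard
    return zlib.crc32(key.encode()) % num_shards

print(shard_for("foo", 2))  # stable across runs and processes...
print(shard_for("foo", 3))  # ...but adding a shard can change the mapping
```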

If the key 'foo' mapped to shard zero, then after the introduction of a new shard it may map to shard five. Moving data around to reflect the new shard mapping would be slow and unrealistic if we needed to grow the system quickly, and it would also hurt the availability of the Redis Cluster.

Redis solves this problem with something called a hash slot, which is what all of the data is mapped to.

Enter Hashslot

The Redis Cluster employs a form of composite partitioning to determine which Redis instance a given key should be assigned to. In the Redis Cluster, this is known as a hash slot (note that, despite the similarity, this is not the circle-based consistent hashing described below). The cluster's key space is divided among the cluster's masters into 16384 slots, effectively limiting the cluster size to 16384 master nodes (though the suggested maximum is in the order of 1000 nodes). The hash slot of a key is the CRC-16 hash algorithm applied to the key, followed by a modulo-16384 computation.

Each master node is given a subset of hash slots, and the key and value are stored on that Redis instance. Every key is essentially hashed into a number between 0 and 16383. These 16k hash slots are distributed among the masters.
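
Here is a small self-contained sketch of the slot computation; it implements the CRC16 variant named in the Redis Cluster specification, but for brevity it ignores hash tags ({...}), which the real implementation honours:

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM), the variant named in the Redis Cluster spec
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16(key.encode()) % 16384

print(hash_slot("foo"))  # 12182, the slot Redis itself reports for this key
```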

Prior to Redis 3.0, when Redis Cluster was not yet available, it was difficult to manage load across multiple masters when implementing sharding or data partitioning: redis-server itself did not support sharding and only offered a master-slave mode, so key distribution had to be handled on the client side. Several algorithms can distribute cached keys across Redis instances, but we will concentrate on the two commonly used with Redis master-slave mode or Redis Cluster: consistent hashing and hash slots.

Both algorithms, as the name implies, employ hash functions and thus hash tables. Consider a bucket that you are constantly filling or retrieving contents from; consistent hashing and hash slots function similarly to buckets. Assume we have an N-dimensional array with each entry pointing to a bucket of objects. Adding a new object requires computing h(k) % N, that is, hash(key) modulo N, and then checking the bucket at the resulting index; if the object does not already exist there, it is added. To find an object, we do the same thing, simply looking into the bucket to see whether the object is there. Although searches within a bucket are linear, a properly sized hash table has a relatively small number of objects per bucket, resulting in almost constant-time access with an average time complexity of O(N/k), where k is the number of buckets.

Multiple Redis clients have supported consistent hashing. Twemproxy supports a variety of hashing modes, including consistent hashing. Other clients, such as Jedis, take the same approach, typically through the Ketama algorithm. Consistent hashing is a distribution scheme that does not depend directly on the number of servers, which reduces the number of keys that must be relocated when servers are added or removed. It is a simple but effective idea, first described by Karger et al. at MIT in a 1997 academic paper. It works regardless of how many servers or objects are in a distributed hash table by assigning each of them a position on an abstract circle, or hash ring, covering the range 0 to 2^32 - 1. This allows servers and objects to be added without disturbing the overall system. The main operation steps are:

  • Typically, a node is hashed using its IP address (or the IP combined with a unique label), and the resulting value determines the node's position on the closed circle.
  • Each stored key is hashed the same way, as hash(key), and that value likewise maps to a position on the circle.
  • The key is stored on the first node found moving clockwise from the point where hash(key) lands on the circle. If no node is found before the circle wraps past 0, the key is stored on the first node clockwise after 0.

In general, either the clockwise or the counterclockwise direction works, as long as it is applied consistently when walking the circle to decide where hash(key) should be stored, based on the distribution of nodes on the circle.
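
Here is a compact sketch of those steps; MD5 is an arbitrary stable hash picked for the example, and the node addresses are made up:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # A position on the 0 .. 2**32 - 1 circle (MD5 is an arbitrary stable choice)
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((ring_hash(node), node) for node in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def node_for(self, key: str) -> str:
        # First node clockwise from hash(key), wrapping past 0 if needed
        idx = bisect.bisect_left(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.node_for("foo"))
```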

Hash slots still share the concept of hashing and composite partitioning, but they do not rely on the circle-based algorithm upon which consistent hashing is built. Hash slots also take care of what the master-slave model in a sharded environment lacks on its own: adding and removing nodes. With them, scaling horizontally, up or down, can be smooth, or at least have little impact on the performance of your Redis Cluster. When we add new shards, we simply move hash slots across the systems; moving hash slots from shard to shard is all it takes to add new primary instances to the cluster. This is possible with minimal downtime and performance impact. Let's look at an example.

Say we have the master server M1, which holds hash slots 0 to 8191, and the master server M2, which holds hash slots 8192 to 16383. To map the element foo, we take the deterministic hash of the key foo and mod it by the number of hash slots. Let's say this yields slot 5832, so our element resides on M1. Now we create a new instance, M3. The new mappings will be:

  • M1 with hashslots from 0 to 5460.
  • M2 with hashslots from 5461 to 10992.
  • M3 with hashslots from 10993 to 16383.

Now the keys in slots 5461 to 8191, which previously belonged to M1, belong to M2, and the keys in slots 10993 to 16383 move from M2 to M3, so that data has to relocate. But since we hash individual keys to hash slots rather than directly to instances, resharding only means transferring whole hash slots between instances; no key ever has to be rehashed. This solves the resharding problem (see the sketch below).
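
A toy sketch of this bookkeeping, using the slot ranges from the example above; it shows that resharding changes which node owns each slot, without rehashing any key:

```python
def owner(slot: int, ranges: dict) -> str:
    # ranges maps node name -> inclusive (low, high) hash-slot range
    for node, (low, high) in ranges.items():
        if low <= slot <= high:
            return node
    raise ValueError("unassigned slot")

before = {"M1": (0, 8191), "M2": (8192, 16383)}
after = {"M1": (0, 5460), "M2": (5461, 10992), "M3": (10993, 16383)}

moved = sum(1 for slot in range(16384) if owner(slot, before) != owner(slot, after))
print(moved)  # 8122 slots change owner; no individual key is ever rehashed
```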

Gossiping

The health of the Redis Cluster is determined by gossiping. We have three M nodes and three S nodes in the diagram above, and all of these nodes constantly communicate to determine which shards are available and ready to serve requests. If enough nodes agree that M1 isn't responding, they can decide to promote M1's secondary S1 to primary status in order to keep the cluster healthy. The number of nodes required to trigger this is configurable, and getting it right is critical. If you get it wrong, the cluster can end up unable to break a tie when the two sides of a partition are equal, and may split; this is known as "split brain". For the most robust setup, run an odd number of primary nodes with two replicas each.

Conclusion

In this post, we discussed the various architectures that can be used when deploying Redis, ranging from a single instance to high availability with replication, Sentinel, and Redis Cluster. The next post in the series will go over the various persistence models used by Redis.
