If you are building a production-grade application that uses Redis, you should replicate your data, so that if a disaster hits the data on your Master you can still use the replicated copy.
Redis provides replication in two ways:
- Master-Slave replication
- Redis Cluster Replication
The most basic form of replication in Redis is Master-Slave replication. Data from the Master node is replicated to one or more Slave nodes (Replicas). Replicas can serve read operations, but all write operations are performed on the Master.
Master-Slave replication copies the dataset to other nodes in order to improve performance and redundancy. The Master is the entry point for all writes, and every change made on the Master is propagated to the Replicas connected to it. In general, replication can be synchronous (a write is acknowledged only after the Replicas have applied it) or asynchronous (the write is acknowledged immediately and the Replicas apply it shortly afterwards).
The use cases of Master-Slave replication include:
- Improving performance by scaling out the read workload to multiple Replicas.
- Creating backups from a Replica without disrupting the Master.
- Running BI and analytics workloads on a Replica without disrupting the Master.
By default, Master-Slave replication in Redis is asynchronous. It is also non-blocking: the Master keeps serving clients while the Replicas synchronize. In addition, a Replica can keep answering queries from its old, out-of-date copy of the dataset during synchronization (controlled by the `replica-serve-stale-data` option), except for a brief period while the new data is loaded.
Redis Replicas can perform a partial resynchronization with the Master if the replication link is lost for a relatively short time. New Replicas, and reconnecting Replicas that cannot continue by receiving only the missing changes, must perform a "full synchronization", in which an RDB file is transmitted from the Master to the Replicas.
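Whether a reconnecting Replica gets a partial resynchronization depends on the Master's replication backlog (sized by `repl-backlog-size`): the Replica asks to continue from the offset after the last one it processed, and the Master can only serve that request if it still holds those bytes. The sketch below models this decision in a simplified form, using values that `INFO replication` reports (`master_replid`, `master_repl_offset`, `repl_backlog_first_byte_offset`, `repl_backlog_histlen`). It ignores the secondary replication ID kept after a failover, and the function name `can_partial_resync` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: simplified model of the Master's partial-resync check.
# Arguments:
#   $1 replication ID the Replica remembers
#   $2 last offset the Replica processed
#   $3 Master's current master_replid
#   $4 Master's repl_backlog_first_byte_offset
#   $5 Master's repl_backlog_histlen
# Returns 0 if a partial resync is possible, 1 if a full sync is needed.
can_partial_resync() {
    local replica_replid=$1 replica_offset=$2
    local master_replid=$3 first_byte=$4 histlen=$5
    # The Replica asks to continue from the next byte it has not seen yet
    local psync_offset=$(( replica_offset + 1 ))

    # A different replication history always requires a full sync
    [[ $replica_replid == "$master_replid" ]] || return 1
    # The requested offset must still be inside the backlog window
    (( psync_offset >= first_byte && psync_offset <= first_byte + histlen ))
}
```

For example, `can_partial_resync abc 1000 abc 500 600` succeeds because offset 1001 lies inside the backlog window 500..1100, while a Replica that stopped at offset 100 falls outside the window and needs a full synchronization.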
During a full synchronization, the RDB file can be transmitted in two different ways:
- disk-backed: the Master creates a new process, `redis-rdb-bgsave`, that writes the RDB file to disk. Later, the parent process `redis-server` transfers the file to the Replicas incrementally.
- diskless: the Master creates a new process, `redis-rdb-to-slaves`, that writes the RDB file directly to the Replicas' sockets, without touching the disk at all.
With slow disks and fast (large bandwidth) networks, diskless replication works better: this can provide faster synchronization times.
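A back-of-the-envelope model shows why. With a disk-backed transfer the whole RDB file is first written to disk and then sent over the network; with a diskless transfer the Master first waits `repl-diskless-sync-delay` seconds and then streams the data straight to the sockets. The hypothetical function below is a rough sketch under exactly those assumptions; it ignores the page cache, CPU time spent on serialization, and compression, so treat its numbers as orders of magnitude only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough full-sync time estimate, in whole seconds.
#   $1 RDB size (MB)
#   $2 disk write speed (MB/s)
#   $3 network speed (MB/s)
#   $4 repl-diskless-sync-delay (s)
# Prints "<disk-backed seconds> <diskless seconds>".
estimate_full_sync() {
    local size=$1 disk=$2 net=$3 delay=$4
    # Disk-backed: write the whole file to disk, then send it over the network
    local disk_backed=$(( size / disk + size / net ))
    # Diskless: wait for more Replicas to join, then stream to the socket
    local diskless=$(( delay + size / net ))
    echo "$disk_backed $diskless"
}
```

With a 2000 MB dataset, a 100 MB/s disk and a 1000 MB/s network, `estimate_full_sync 2000 100 1000 5` prints `22 7`, so diskless wins. With a fast disk and a slow network, `estimate_full_sync 2000 2000 125 5` prints `17 21`: the network dominates and the sync delay is no longer negligible.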
The problem
To persist the dataset to disk as an RDB file, you define `save` directives in the `redis.conf` configuration file; you can also run the `BGSAVE` command manually.
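Each `save <seconds> <changes>` directive defines a save point: a background save is triggered when at least `<changes>` writes have happened and more than `<seconds>` seconds have passed since the last successful save. The hypothetical function below evaluates that rule for the active save points in a `redis.conf` file. It is a simplified sketch: it accepts both the one-pair form used in this article and multi-pair `save` lines, but ignores other conditions Redis checks, such as the retry delay after a failed `BGSAVE`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: would any active save point fire?
#   $1 path to redis.conf
#   $2 seconds since the last save
#   $3 writes since the last save
# Returns 0 if at least one uncommented "save" point is satisfied.
save_point_reached() {
    local conf=$1 elapsed=$2 changes=$3
    awk -v elapsed="$elapsed" -v changes="$changes" '
        # Only uncommented "save" lines count; pairs are <seconds> <changes>
        $1 == "save" {
            for (i = 2; i < NF; i += 2)
                if (changes >= $(i + 1) && elapsed > $i) hit = 1
        }
        END { exit hit ? 0 : 1 }
    ' "$conf"
}
```

With the classic `save 900 1`, `save 300 10` and `save 60 10000` lines, 10000 writes in 61 seconds trigger a snapshot, while 5 writes in 100 seconds do not. Once all `save` lines are commented out, no save point can ever fire.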
According to the Redis documentation, one of the disadvantages of RDB is:
RDB needs to fork() often in order to persist on disk using a child process. fork() can be time consuming if the dataset is big, and may result in Redis stopping serving clients for some milliseconds or even for one second if the dataset is very big and the CPU performance is not great.
The name of this child process is `redis-rdb-bgsave`.
According to the Linux man page:
fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent. Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.
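Since the page tables grow with the amount of memory in use, and the parent process is blocked until fork() returns, fork() can freeze the Master while it performs BGSAVE on a large dataset. The related problem is described in issue #9503 and in antirez's blog posts #83 and #84.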
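The cost of the latest fork is reported by Redis itself in the `latest_fork_usec` field of `INFO stats`, so the freeze can be measured on your own Master. The hypothetical helper below reads `INFO` output on standard input and prints that value in milliseconds; the function name is an assumption for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read INFO output on stdin and print the duration of
# the latest fork in milliseconds (latest_fork_usec / 1000, rounded down).
latest_fork_ms() {
    # INFO lines end with \r\n, so strip the carriage return before converting
    awk -F: '/^latest_fork_usec:/ { gsub(/\r/, "", $2); print int($2 / 1000) }'
}
```

On a live node it can be used as `redis-cli info stats | latest_fork_ms` (add authentication options as needed). A value in the hundreds of milliseconds means clients were blocked for about that long during the last fork.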
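Disabling RDB persistence and using diskless replication is one way to mitigate the problem: the periodic `BGSAVE` forks go away, and the Master forks a child (`redis-rdb-to-slaves`) only when a Replica needs a full synchronization.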
Prerequisites
You have installed Redis with Sentinel according to the article "Set up a Redis Sentinel".
Configuring diskless replication
- To stop the Master from forking the `redis-rdb-bgsave` child process, disable RDB persistence by commenting out all `save` lines in `/etc/redis/redis.conf` on all nodes:

# save 900 1
# save 300 10
# save 60 10000
- To enable diskless replication, set the following parameters on all nodes:

repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-diskless-load on-empty-db
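`repl-diskless-sync yes`: the Master streams the RDB to the Replicas over sockets instead of writing it to disk first.
`repl-diskless-sync-delay 5`: the delay, in seconds, the Master waits before spawning the child that transfers the RDB via sockets, so that Replicas arriving during that time can share the same transfer.
`repl-diskless-load on-empty-db`: the Replica loads the RDB directly from the socket, without saving it to disk, only when this is completely safe, that is, when its own dataset is empty.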
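The two configuration steps above can be scripted. The sketch below is a hypothetical helper, not part of Redis: it comments out every active numeric `save` line, then makes sure `save ""` and the three diskless replication parameters are present, commenting out any active line that sets one of them to a different value. Depending on the Redis version, commenting out the `save` lines alone may leave built-in default save points active (the `redis.conf` shipped with Redis 7 describes such defaults), which is why the sketch also adds `save ""`, the documented way to disable snapshotting. It edits the file in place with GNU `sed -i`, so back up the file first.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Redis): disable RDB snapshots and enable
# diskless replication in a redis.conf file. Requires GNU sed (-i).
# Usage: apply_diskless_config /etc/redis/redis.conf
apply_diskless_config() {
    local conf=$1 key line
    # Comment out every active save point, e.g. "save 900 1" -> "# save 900 1"
    sed -i -E 's/^[[:space:]]*(save[[:space:]]+[0-9].*)$/# \1/' "$conf"

    # Desired settings; 'save ""' explicitly disables snapshotting
    local -a wanted=(
        'save ""'
        'repl-diskless-sync yes'
        'repl-diskless-sync-delay 5'
        'repl-diskless-load on-empty-db'
    )
    for line in "${wanted[@]}"; do
        key=${line%% *}
        # Already present exactly as wanted: nothing to do (keeps it idempotent)
        grep -qxF "$line" "$conf" && continue
        # Comment out an active line that sets the same key to another value
        sed -i -E "s/^[[:space:]]*(${key}[[:space:]].*)$/# \1/" "$conf"
        echo "$line" >> "$conf"
    done
}
```

Running it a second time leaves the file unchanged. After the nodes have been restarted, `redis-cli config get 'repl-diskless*'` shows the values the running server actually uses.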
When diskless replication is enabled, several scenarios can occur:
- Scenario #1, a Slave node is rebooted:
This scenario is normal. After the Slave node reboots, the Master performs a "full synchronization" with it.
- Scenario #2, the Master node is powered off:
This scenario is also normal, because a node sometimes has to be shut down for a long time for maintenance. In this case, Sentinel promotes a Replica to Master, and the dataset in that Replica's memory remains the same as before the failover. When the old Master node boots again, it becomes a Replica and a "full synchronization" is performed, as in scenario #1.
- Scenario #3, the `redis-server` process is killed on the Master node:
This scenario is dangerous because data can be lost on all nodes, as described in antirez's blog post #80. If the process is restarted automatically (for example, by systemd) with an empty or stale dataset before Sentinel notices the outage, the node is still the Master, so the Replicas resynchronize from it and replace their own data with its dataset.
To rule out scenario #3, the start of the `redis-server` process on the Master node has to be delayed until Sentinel has promoted one of the Replicas to Master:
- Save the following bash script on all nodes as `redis-wait-for-slave-role.sh` in the folder `/usr/local/bin/`:
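#!/bin/bash
# Wait until another node reports role "master", so that this node never
# starts as a Master with an empty dataset.
# Usage: redis-wait-for-slave-role.sh <redis.conf path> <ip1,ip2,...>
if [[ $# -ne 2 ]]; then
    echo "Illegal number of parameters" >&2
    exit 2
fi

TIME_OUT_COMMAND=5s
redis_conf_path="$1"
IFS=',' read -r -a hosts_ip <<< "$2"

# All addresses of this host ("hostname --ip-address" may return 127.0.1.1)
my_ips=" $(hostname --all-ip-addresses) "

# Password from "requirepass" with quotes stripped; pass it only if it is set
password=$(awk '/^requirepass/ {print $2}' "$redis_conf_path" | tr -d '"')
auth_args=()
if [[ -n $password ]]; then
    auth_args=(--pass "$password" --no-auth-warning)
fi

get_role() {
    local host="$1" role
    echo -n "Getting a role from the node '$host' ... " >&2
    role=$(timeout "$TIME_OUT_COMMAND" \
        redis-cli -h "$host" -p 6379 "${auth_args[@]}" info replication | \
        awk -F: '/^role:/ {print $2}' | tr -d '\r')
    echo "$role" >&2
    echo "$role"
}

while :; do
    for host in "${hosts_ip[@]}"; do
        # Skip this node itself
        [[ $my_ips == *" $host "* ]] && continue
        if [[ $(get_role "$host") == master ]]; then
            exit 0
        fi
    done
    sleep 1
done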
- Make the file executable:
$ sudo chmod +x /usr/local/bin/redis-wait-for-slave-role.sh
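- Run the following command on all nodes; it opens an editor for a drop-in override file (`/etc/systemd/system/redis-server.service.d/override.conf`):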
$ sudo systemctl edit redis-server.service
- Add a new `[Service]` section:

[Service]
ExecStartPre=/usr/local/bin/redis-wait-for-slave-role.sh /etc/redis/redis.conf 10.0.0.21,10.0.0.22,10.0.0.23
TimeoutStartSec=infinity
`10.0.0.21,10.0.0.22,10.0.0.23`: the IP addresses of the Master and Slave nodes, separated by commas.
`TimeoutStartSec=infinity`: disables the start timeout, because the script may wait for a long time. If you are using a version of systemd older than 229, use `0` instead of `infinity` to disable the timeout.

- Restart the `redis-server` service on all Slave nodes only:
$ sudo systemctl restart redis-server.service
- Check that the `redis-server` service has started:

$ sudo systemctl status redis-server.service
● redis-server.service - Advanced key-value store
     Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/redis-server.service.d
             └─override.conf
     Active: active (running) since Tue 2024-09-24 14:53:07 MSK; 17s ago
       Docs: http://redis.io/documentation, man:redis-server(1)
    Process: 390540 ExecStartPre=/usr/local/bin/redis-wait-for-slave-role.sh /etc/redis/redis.conf 10.0.0.21,10.0.0.22,10.0.0.23 (code=exited, status=0/SUCCESS)
   Main PID: 390549 (redis-server)
     Status: "MASTER <-> REPLICA sync: Finished with success. Ready to accept connections in read-write mode."
      Tasks: 5 (limit: 7057)
     Memory: 2.2G
        CPU: 6.536s
     CGroup: /system.slice/redis-server.service
             └─390549 /usr/bin/redis-server 0.0.0.0:6379

Sep 24 14:53:06 redis-3 systemd[1]: Starting Advanced key-value store...
Sep 24 14:53:06 redis-3 redis-wait-for-slave-role.sh[390543]: Getting a role from the node '10.0.0.21' ... master
Sep 24 14:53:07 redis-3 systemd[1]: Started Advanced key-value store.
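- Restart the `redis-server` service on the Master node. The `ExecStartPre` script holds the start until Sentinel has promoted one of the Replicas, so this node becomes the old Master and rejoins as a Replica: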
$ sudo systemctl restart redis-server.service
- Check that the `redis-server` service has started:

$ sudo systemctl status redis-server.service
● redis-server.service - Advanced key-value store
     Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/redis-server.service.d
             └─override.conf
     Active: active (running) since Tue 2024-09-24 14:58:33 MSK; 6min ago
       Docs: http://redis.io/documentation, man:redis-server(1)
    Process: 27935 ExecStartPre=/usr/local/bin/redis-wait-for-slave-role.sh /etc/redis/redis.conf 10.0.0.21,10.0.0.22,10.0.0.23 (code=exited, status=0/SUCCESS)
   Main PID: 28014 (redis-server)
     Status: "MASTER <-> REPLICA sync: Finished with success. Ready to accept connections in read-write mode."
      Tasks: 5 (limit: 7057)
     Memory: 2.2G
        CPU: 9.343s
     CGroup: /system.slice/redis-server.service
             └─28014 /usr/bin/redis-server 0.0.0.0:6379

Sep 24 14:58:16 redis-1 redis-wait-for-slave-role.sh[27969]: Getting a role from the node '10.0.0.22' ... slave
Sep 24 14:58:19 redis-1 redis-wait-for-slave-role.sh[27976]: Getting a role from the node '10.0.0.23' ... slave
Sep 24 14:58:25 redis-1 redis-wait-for-slave-role.sh[27984]: Getting a role from the node '10.0.0.22' ... slave
Sep 24 14:58:25 redis-1 redis-wait-for-slave-role.sh[27991]: Getting a role from the node '10.0.0.23' ... slave
Sep 24 14:58:31 redis-1 redis-wait-for-slave-role.sh[27999]: Getting a role from the node '10.0.0.22' ... slave
Sep 24 14:58:31 redis-1 redis-wait-for-slave-role.sh[28007]: Getting a role from the node '10.0.0.23' ... master
Sep 24 14:58:33 redis-1 systemd[1]: Started Advanced key-value store.
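The heart of `redis-wait-for-slave-role.sh` is reading the `role:` field of `INFO replication`. The hypothetical `redis_role` function below isolates that parsing step so it can be tried without a running server: it reads `INFO` output on standard input and prints the role. Keep in mind that the wait loop ends only when another node already reports `master`; if all nodes are down at the same time (for example, on the very first start of the cluster), every node keeps waiting, so one node would have to be started without the `ExecStartPre` drop-in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the "role:" field from INFO output
# read on stdin (e.g. "master" or "slave"), without the trailing \r.
redis_role() {
    awk -F: '/^role:/ { sub(/\r$/, "", $2); print $2; exit }'
}
```

For example, `REDISCLI_AUTH='<password>' redis-cli -h 10.0.0.22 -p 6379 info replication | redis_role` prints the current role of the node 10.0.0.22; `redis-cli` reads the password from the `REDISCLI_AUTH` environment variable.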
Testing the failover
This section shows how to test the failover of the highly available Redis setup using Sentinel with diskless replication enabled.
- Find the current Master node by running the following command on a Sentinel node:

$ redis-cli -p 26379 --askpass sentinel get-master-addr-by-name mymaster
Please input password: ****************
1) "10.0.0.23"
2) "6379"
- Set a new test key on the Master:

$ redis-cli -h 10.0.0.23 -p 6379 --askpass set test_key Hello!
Please input password: ****************
OK
- Check the value of this key on all Replicas:

$ redis-cli -h 10.0.0.21 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
$ redis-cli -h 10.0.0.22 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
- Trigger a failover manually by running the following command on a Sentinel node:

$ redis-cli -p 26379 --askpass sentinel failover mymaster
Please input password: ****************
OK
- Find the new Master node by running the following command on a Sentinel node:

$ redis-cli -p 26379 --askpass sentinel get-master-addr-by-name mymaster
Please input password: ****************
1) "10.0.0.21"
2) "6379"
- Check the value of the test key on the new Master and all Replicas:

$ redis-cli -h 10.0.0.21 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
$ redis-cli -h 10.0.0.22 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
$ redis-cli -h 10.0.0.23 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
- Delete the RDB file on the Master node, so that a restarted process would start with an empty dataset (do not do this in a production environment):

$ sudo rm /var/lib/redis/dump.rdb
- Kill the `redis-server` process on the Master node:

$ sudo pkill 'redis-server'
- Wait for systemd to start the `redis-server` process again:

$ systemctl status redis-server.service
● redis-server.service - Advanced key-value store
     Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/redis-server.service.d
             └─override.conf
     Active: active (running) since Tue 2024-09-24 21:42:13 MSK; 23s ago
       Docs: http://redis.io/documentation, man:redis-server(1)
    Process: 63631 ExecStartPre=/usr/local/bin/redis-wait-for-slave-role.sh /etc/redis/redis.conf 10.0.0.21,10.0.0.22,10.0.0.23 (code=exited, status=0/SUCCESS)
   Main PID: 64578 (redis-server)
     Status: "Ready to accept connections"
      Tasks: 5 (limit: 7057)
     Memory: 69.1M
        CPU: 1.163s
     CGroup: /system.slice/redis-server.service
             └─64578 /usr/bin/redis-server 0.0.0.0:6379
- Check the value of the test key on the new Master and all Replicas:

$ redis-cli -h 10.0.0.21 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
$ redis-cli -h 10.0.0.22 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
$ redis-cli -h 10.0.0.23 -p 6379 --askpass get test_key
Please input password: ****************
"Hello!"
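The repeated checks above can be scripted. The sketch below is a hypothetical helper that reads a key from each node with `redis-cli` and reports every node whose value differs from the expected one. Instead of prompting once per node with `--askpass`, it relies on the `REDISCLI_AUTH` environment variable, from which `redis-cli` takes the password.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a key has the expected value on every node.
# Usage: check_key_everywhere <key> <expected> <host>...
# Export REDISCLI_AUTH with the Redis password before calling it.
check_key_everywhere() {
    local key=$1 expected=$2 host value status=0
    shift 2
    for host in "$@"; do
        # --raw prints the bare value instead of the quoted form
        value=$(redis-cli -h "$host" -p 6379 --raw get "$key")
        if [[ $value == "$expected" ]]; then
            echo "$host: OK"
        else
            echo "$host: expected '$expected', got '$value'" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, run `read -rsp 'Password: ' REDISCLI_AUTH && export REDISCLI_AUTH` once, then `check_key_everywhere test_key 'Hello!' 10.0.0.21 10.0.0.22 10.0.0.23`. The function returns a non-zero status if any node disagrees, so it can also be used in other scripts.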
Summary
Replication without persistence is an appealing concept, but it understandably causes anxiety and even wariness. Diskless replication removes the undesirable side effects of slow disk I/O, and, combined with disabled RDB persistence, it eliminates the periodic forking of a child process for RDB snapshots of large in-memory datasets: the Master forks only when a Replica needs a full synchronization.