Kostas Kalafatis

Posted on Feb 3, 2023 • Originally published at dfordebugging.wordpress.com

Redis Crash Course - Redis Persistence Models


Before we can use Redis to store data that we intend to keep safe, we need a solid understanding of how Redis actually persists that data. Redis has a wide variety of applications, and many of them do not treat the loss of stored data as a catastrophic event. When Redis is used as a cache or to power real-time analytics, losing some data will not have a catastrophic effect on the system. In other use cases, however, we want certain guarantees about the persistence and recovery of data.

No persistence

Redis can be used as an LRU-based caching solution, for example as a Memcached replacement. In such cases, saving data to disk may not be necessary. Persistence can be disabled entirely by setting appendonly to no and replacing all save directives in the config file with a single empty one:

appendonly no
save ""

Redis RDB - Snapshots-based Persistence

RDB stands for "Redis Database." It is Redis's solution for point-in-time snapshots. The frequency of the snapshots can be configured using the save configuration directive.

By default, Redis saves snapshots of the dataset to disk in a binary file called dump.rdb. You can configure Redis to save the dataset every N seconds if there have been at least M changes, or you can trigger a snapshot manually with the SAVE (blocking) or BGSAVE (background) commands.

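Older redis.conf files ship with the following defaults (more recent releases use somewhat different thresholds, so check the configuration file for your version):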

save 900 1     # every 15 minutes if at least one key changed
save 300 10    # every 5 minutes if at least 10 keys changed
save 60 10000  # every 60 seconds if at least 10000 keys changed
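Each save line is an independent rule, and a snapshot is due as soon as any one of them is satisfied. The following minimal C sketch shows that check. It assumes a hypothetical struct save_rule and a counter of writes since the last save. Redis's own check runs periodically inside the server and differs in details, such as retry delays after a failed save:

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical mirror of one "save <seconds> <changes>" directive. */
struct save_rule {
    long seconds;
    long changes;
};

/* Returns true if at least one rule is satisfied: enough time has
 * passed since the last successful save AND enough writes happened. */
bool snapshot_due(const struct save_rule *rules, size_t n,
                  time_t now, time_t last_save, long dirty) {
    for (size_t i = 0; i < n; i++) {
        if (now - last_save >= rules[i].seconds &&
            dirty >= rules[i].changes) {
            return true;
        }
    }
    return false;
}
```

Because the rules are checked independently, a burst of 10,000 writes triggers a save after a minute even though the 15-minute rule has not fired yet.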

This is what happens whenever Redis needs to dump the dataset to disk:

Redis forks. We now have a child and a parent process.
The child starts to write the dataset to a temporary RDB file.
When the child is done writing the new RDB file, it replaces the old one.
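The three steps above can be sketched in C as follows. This is a minimal illustration, not Redis's code. The function names are hypothetical, and the temporary file is named after the final path, whereas Redis itself uses temp-<pid>.rdb in its working directory. The parent here simply waits for the child; Redis instead keeps serving clients and reaps the child later:

```c
#define _XOPEN_SOURCE 700   /* expose fork(), fsync(), fileno() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: write `data` to a temporary file next to `final_path`,
 * flush it to disk, then atomically rename it over `final_path`.
 * (A fully durable rename would also fsync the directory; omitted.) */
static int write_snapshot(const char *final_path, const char *data) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.temp-%d", final_path, (int) getpid());
    if (n < 0 || (size_t) n >= sizeof tmp) return -1;

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL) return -1;
    size_t len = strlen(data);
    if (fwrite(data, 1, len, fp) != len || fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0) {
        fclose(fp);
        unlink(tmp);
        return -1;
    }
    if (fclose(fp) != 0 || rename(tmp, final_path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* Parent side: fork, let the child write the snapshot, and wait for it.
 * Returns 0 if the child succeeded, -1 otherwise. */
int background_save(const char *final_path, const char *data) {
    pid_t pid = fork();                       /* step 1: parent + child */
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Child: steps 2 and 3, on its own copy-on-write view of memory. */
        _exit(write_snapshot(final_path, data) == 0 ? 0 : 1);
    }
    /* Redis would keep serving clients here; the sketch just waits. */
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Writing to a temporary file and renaming it means readers of the final path only ever see either the old snapshot or the complete new one, never a half-written file.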

Forking

The coolest aspect of Redis, in my opinion, is how it uses forking and copy-on-write to enable performant data persistence.

Forking is how operating systems create new processes: a process forks to make a copy of itself. The copy (the "child") gets a new process ID and its own copies of a few other bits of information and handles, and it can still communicate with the original ("parent") process.

This is where things start to get interesting. Redis is a process with a lot of memory, so how does it make a copy without running out of space? When a process forks, the parent and child initially share the same physical memory, and Redis runs the snapshotting (RDB) work in the child process. This is made possible by copy-on-write: at the time of the fork, the child receives references to the parent's memory pages rather than copies of them. If nothing changes while the child is persisting to disk, no new memory is allocated at all.

When there are changes, copy-on-write kicks in. The kernel tracks how many processes reference each page, and when a process writes to a page that is still shared, the kernel first copies that page and applies the write to the copy. The child process never sees the change, so it keeps a consistent snapshot of memory. As a result, only the modified pages consume additional memory, and we can obtain a point-in-time snapshot of potentially gigabytes of data very quickly and efficiently.
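The following sketch makes the effect of fork() observable. The function name and the pipe-based handshake are hypothetical, chosen only for this illustration. The parent overwrites a heap value after forking, and the child, which reads the value only after the parent has written it, still sees the fork-time value. The page copying itself is invisible to the program; what we can observe is that each process ends up with its own view of the data:

```c
#define _XOPEN_SOURCE 700   /* expose fork(), pipe() */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, lets the parent overwrite a heap value, and returns the value
 * the child still sees afterwards (or -1 on error). The parent's own
 * value after its write is stored in *parent_view. */
int cow_demo(int *parent_view) {
    int *value = malloc(sizeof *value);
    if (value == NULL) return -1;
    *value = 1;                              /* state at fork time */

    int go[2], back[2];                      /* parent->child, child->parent */
    if (pipe(go) == -1) { free(value); return -1; }
    if (pipe(back) == -1) {
        close(go[0]); close(go[1]); free(value);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]); close(go[1]); close(back[0]); close(back[1]);
        free(value);
        return -1;
    }
    if (pid == 0) {                          /* child: the "snapshotter" */
        char token;
        close(go[1]);
        close(back[0]);
        if (read(go[0], &token, 1) != 1) _exit(1);   /* wait for parent */
        int seen = *value;                   /* still the fork-time value */
        if (write(back[1], &seen, sizeof seen) != (ssize_t) sizeof seen) _exit(1);
        _exit(0);
    }

    close(go[0]);
    close(back[1]);
    *value = 2;                  /* parent write: the shared page is copied */
    char token = 'x';
    int seen = -1;
    if (write(go[1], &token, 1) != 1 ||
        read(back[0], &seen, sizeof seen) != (ssize_t) sizeof seen) {
        seen = -1;
    }
    close(go[1]);
    close(back[0]);
    waitpid(pid, NULL, 0);
    *parent_view = *value;
    free(value);
    return seen;
}
```

Calling cow_demo(&p) is expected to return 1 (the child's frozen view) while p ends up as 2 (the parent's updated view), which is exactly the property the RDB child relies on.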

It is not necessary to understand RDB's inner workings in order to appreciate its limitations. Still, Redis's source code is easily accessible, and seeing how snapshots are created and saved to disk can be very instructive.

The rdbSave function in rdb.c is the entry point into the RDB persistence model. It creates the temporary file the snapshot is written to and, once the snapshot is complete, renames it to the configured dbfilename (dump.rdb by default):

/* Save the DB on disk. Return C_ERR on error, C_OK on success. */
int rdbSave(char* filename, rdbSaveInfo* rsi) {
    /* [...] */
    snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        char *str_err = strerror(errno);
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Failed opening the temp RDB file %s (in server root dir %s) "
            "for saving: %s",
            tmpfile,
            cwdp ? cwdp : "unknown",
            str_err);
        return C_ERR;
    }
    /* [...] */
    if (rename(tmpfile,filename) == -1) {
        char *str_err = strerror(errno);
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Error moving temp DB file %s on the final "
            "destination %s (in server root dir %s): %s",
            tmpfile,
            filename,
            cwdp ? cwdp : "unknown",
            str_err);
        unlink(tmpfile);
        stopSaving(0);
        return C_ERR;
    }
    /* [...] */
}

After obtaining the file pointer for the temporary file, Redis hands the serialization work to the appropriate rio* functions. rio.c is a neat stream abstraction that lets Redis use the same serialization code for both in-memory buffers and files. It has a number of additional uses, such as making it easy to compute a checksum over the data written so far.

The code for serialization is in rdbSaveRio, which works on the rio stream and not on the raw file descriptor:

int rdbSaveRio(int req, rio *rdb, int *error, int rdbflags, rdbSaveInfo *rsi) {
    /* [...] */

    snprintf(magic,sizeof(magic),"REDIS%04d",RDB_VERSION);
    if (rdbWriteRaw(rdb,magic,9) == -1) goto werr;
    if (rdbSaveInfoAuxFields(rdb,rdbflags,rsi) == -1) goto werr;
    if (!(req & SLAVE_REQ_RDB_EXCLUDE_DATA) && rdbSaveModulesAux(rdb, REDISMODULE_AUX_BEFORE_RDB) == -1) goto werr;

    /* [...] */
    for (j = 0; j < server.dbnum; j++) {
        /* [...] */

        /* Iterate this DB writing every entry */
        while((de = dictNext(di)) != NULL) {
            sds keystr = dictGetKey(de);
            robj key, *o = dictGetVal(de);
            long long expire;

            initStaticStringObject(key, keystr);
            expire = getExpire(db, &key);
            if (rdbSaveKeyValuePair(rdb, &key, o, expire) == -1) goto werr;

            /* [...] */
        }
        /* [...] */
    }
werr:
    /* [...] */
}

Overall, the serialization logic is fairly simple. The code omitted above has many moving parts and details, but at a high level it works like this:

We start the file with a signature: the string "REDIS" in ASCII followed by the four-digit RDB format version (RDB_VERSION).
Auxiliary metadata fields, such as the Redis version and the creation time, are written to the rio stream by rdbSaveInfoAuxFields. Its rdbflags argument is a bitmask that describes the purpose of the snapshot:

/* flags on the purpose of rdb save or load */
#define RDBFLAGS_NONE 0               /* No special RDB loading. */
#define RDBFLAGS_AOF_PREAMBLE (1<<0)  /* Load/save the RDB as AOF preamble. */
#define RDBFLAGS_REPLICATION (1<<1)   /* Load/save for SYNC. */
#define RDBFLAGS_ALLOW_DUP (1<<2)     /* Allow duplicated keys when loading. */

We go through each database one by one (see SELECT for details on what this means).
For each database, we get a dictionary iterator that visits every key-value pair (in no particular order).
We call rdbSaveKeyValuePair for each key-value pair, passing along the key's expiry time if it has one. This turns the key and value into a binary representation that can be saved to disk. The serialization format is different for each type of value.
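To make these steps concrete, here is a simplified sketch of the byte layout. It assumes the RDB length-encoding scheme (a 6-bit, 14-bit, or 32-bit length depending on size) and a type byte of 0 for plain strings. The function names are hypothetical; real Redis additionally handles expires, integer-encoded strings, LZF compression, lengths above 32 bits, and every other value type:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the "REDIS" + four-digit version signature (plus NUL) into out. */
int rdb_magic(char out[10], int version) {
    if (version < 0 || version > 9999) return -1;
    int n = snprintf(out, 10, "REDIS%04d", version);
    return n == 9 ? 0 : -1;
}

/* RDB-style length prefix: 1, 2 or 5 bytes depending on magnitude. */
size_t rdb_encode_len(uint32_t len, unsigned char *out) {
    if (len < (1u << 6)) {                 /* 00xxxxxx */
        out[0] = (unsigned char) len;
        return 1;
    }
    if (len < (1u << 14)) {                /* 01xxxxxx xxxxxxxx */
        out[0] = (unsigned char) (0x40 | (len >> 8));
        out[1] = (unsigned char) (len & 0xFF);
        return 2;
    }
    out[0] = 0x80;                         /* marker + 32-bit big endian */
    out[1] = (unsigned char) (len >> 24);
    out[2] = (unsigned char) (len >> 16);
    out[3] = (unsigned char) (len >> 8);
    out[4] = (unsigned char) len;
    return 5;
}

/* Appends a length-prefixed raw string. */
static size_t rdb_encode_str(const char *s, unsigned char *out) {
    size_t len = strlen(s);
    size_t n = rdb_encode_len((uint32_t) len, out);
    memcpy(out + n, s, len);
    return n + len;
}

/* Simplified string pair: type byte, key, value.
 * No expire opcode, no integer encoding, no LZF compression. */
size_t rdb_encode_string_pair(const char *key, const char *val,
                              unsigned char *out) {
    size_t n = 0;
    out[n++] = 0;                          /* string value type */
    n += rdb_encode_str(key, out + n);
    n += rdb_encode_str(val, out + n);
    return n;
}
```

With this layout, the pair hello/world takes 13 bytes: one type byte, then a one-byte length and five bytes for each of the two strings. That is the compactness the next section refers to.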

Advantages of RDB

Compactness: The resulting snapshot file is essentially just a key-value mapping. If you inspect an RDB file with a hex editor, you'll notice that it's a fairly compact representation of the database's internal dictionary (plus a version prefix and a checksum). Redis also supports LZF compression of string values, which is enabled by default (rdbcompression yes) and can further reduce the size of the snapshot.
Performance: Because snapshots are generated in the background, RDB-based persistence does not slow down individual writes: no additional work is needed per write. Forking and running the background process, on the other hand, can still have an impact on performance.
Faster restarts: When Redis is restarted, it must reload the entire dataset into memory. With RDB, Redis simply scans the existing snapshot and loads each KV pair back into its internal dictionary. This is significantly faster than recreating the internal dictionaries from AOF entries (more about this later).

Disadvantages of RDB

Durability: Pure RDB-based persistence is prone to data loss. Redis is a very dependable piece of system software. However, in the real world, machines crash and hardware fails. Even the most obscure issues and soft errors will occur in a large enough deployment. If you absolutely cannot afford to lose individual transactions that occurred after the last successful snapshot, relying solely on RDB for disaster recovery will not suffice.
Performance: RDB snapshots are created in background processes, which requires forking the server process. Because the child must iterate over the entire dataset and generate a snapshot from scratch whenever a save rule fires (every N seconds if at least M changes occurred), this can be quite costly, and on large datasets the fork itself can take noticeable time.

Redis AOF - Append Only File

The acronym AOF stands for "Append Only File," which is fairly self-explanatory. This is Redis's second persistence model, and it works by appending each operation to a file specified by the appendfilename directive:

#The name of the append only file (default: "appendonly.aof")
appendfilename "appendonly.aof"

Snapshotting is not very durable. If the machine running Redis crashes, the power fails, or you accidentally kill -9 your instance, the most recent data written to Redis will be lost. While this may not be a big deal for some applications, others require full durability, and Redis snapshotting alone is not a viable option for them.

Since Redis 7.0.0, Redis uses a multi-part AOF mechanism: the original single AOF file is split into a base file (at most one) and incremental files (there may be more than one). The base file is a snapshot of the data (in RDB or AOF format) taken when the AOF was last rewritten, and the incremental files contain the changes made since that base file was created. All of these files are stored in their own directory and tracked by a manifest file.

To understand this better, consider the following example. The "Client" column shows commands issued by Redis clients (such as redis-cli) over RESP. The appendonly.aof column shows the contents of the AOF file, and the third column shows the resulting in-memory state of the Redis server.

Client	appendonly.aof	Redis-server (memory)
SET hello world	*3 $3 set $5 hello $5 world	hello: world
SET city Seattle	*3 $3 set $5 hello $5 world *3 $3 set $4 city $7 Seattle	hello: world, city: Seattle
SET city London	*3 $3 set $5 hello $5 world *3 $3 set $4 city $7 Seattle *3 $3 set $4 city $6 London	hello: world, city: London

For each command issued by a client, the RESP-encoded command is appended to the AOF file. For example, "SET hello world" is serialized as "*3\r\n$3\r\nset\r\n$5\r\nhello\r\n$5\r\nworld\r\n". "*3" indicates an array of three elements: the command name followed by its two arguments (3 - 1 = 2). For SET, these arguments are the key ("hello") and the value ("world"). Each element is prefixed with its length in bytes: "$5" indicates that the following argument ("world") is 5 bytes long.
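The encoding just described is easy to reproduce. The sketch below builds the same RESP array that ends up in the AOF for a command. resp_encode is a hypothetical helper, and because it uses strlen it is not binary-safe, unlike real RESP:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Serializes argv as a RESP array of bulk strings. Returns the number of
 * bytes written (NUL-terminated), or 0 if `cap` is too small. */
size_t resp_encode(int argc, const char *argv[], char *out, size_t cap) {
    int n = snprintf(out, cap, "*%d\r\n", argc);      /* array header */
    if (n < 0 || (size_t) n >= cap) return 0;
    size_t used = (size_t) n;
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        n = snprintf(out + used, cap - used, "$%zu\r\n", len);  /* length */
        if (n < 0 || (size_t) n >= cap - used) return 0;
        used += (size_t) n;
        if (len + 2 >= cap - used) return 0;    /* payload + CRLF + NUL */
        memcpy(out + used, argv[i], len);       /* payload */
        memcpy(out + used + len, "\r\n", 2);
        used += len + 2;
        out[used] = '\0';
    }
    return used;
}
```

Encoding {"set", "hello", "world"} yields the 35-byte string shown above, which matches the first row of the table.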

Log Rewriting

As write operations are performed, the AOF grows larger and larger. For example, if you increment a counter 100 times, you'll have a single key in your dataset with the final value but 100 entries in your AOF. 99 of those entries are unnecessary for re-creating the current state.

The rewrite is completely safe. While Redis continues to append to the old file, a completely new one is produced with the minimal set of operations needed to create the current dataset. Once this second file is ready, Redis switches over and starts appending to the new one.

Redis can therefore rebuild the AOF in the background without interrupting service to clients. Whenever you issue a BGREWRITEAOF, Redis writes the shortest sequence of commands required to rebuild the current in-memory dataset. If you're using the AOF with Redis 2.2, you'll need to run BGREWRITEAOF on a regular basis. Since Redis 2.4, log rewriting can be triggered automatically (see the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size directives in the example configuration file).
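The automatic trigger can be sketched as a simple growth check. The function below is a hypothetical model of the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size directives, not Redis's code. A rewrite is due once the AOF is larger than the minimum size and has grown by at least the given percentage since the last rewrite:

```c
#include <stdbool.h>

/* current_size: AOF size now; base_size: AOF size right after the last
 * rewrite; min_size and percentage mirror the two config directives.
 * A percentage of 0 disables automatic rewrites. */
bool aof_rewrite_due(long long current_size, long long base_size,
                     long long min_size, int percentage) {
    if (percentage == 0 || current_size <= min_size) return false;
    long long base = base_size > 0 ? base_size : 1;
    long long growth = current_size * 100 / base - 100;   /* in percent */
    return growth >= percentage;
}
```

With the example configuration file's values (100 and 64mb), an AOF that was 64 MB after its last rewrite would be rewritten again once it reaches about 128 MB.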

Since Redis 7.0.0, when an AOF rewrite is scheduled, the Redis parent process opens a new incremental AOF file to continue writing. The rewrite logic is executed by the child process, which generates a new base AOF. To track the newly generated base and incremental files, Redis will use a temporary manifest file. When they are finished, Redis will execute an atomic replacement operation to make this temporary manifest file effective. To avoid the problem of creating many incremental files as a result of repeated AOF rewrite failures and retries, Redis implements an AOF rewrite limiting mechanism that ensures failed AOF rewrites are retried at a slower and slower rate.

How durable is the AOF?

You can specify how frequently Redis will fsync data to disk. There are three choices:

appendfsync always: fsync whenever new commands are added to the AOF. Very slow, but very safe. It is important to note that the commands are appended to the AOF after a batch of commands from multiple clients or a pipeline are executed, implying a single write and fsync (before sending the replies).
appendfsync everysec: fsync every second. This is fast enough (since version 2.4, likely about as fast as snapshotting), and you may lose 1 second of data if there is a disaster.
appendfsync no: never fsync; simply hand your data over to the operating system. This is the fastest but least safe option. With this configuration, Linux will normally flush data every 30 seconds, but this depends on kernel tuning.

The recommended (and default) policy is to fsync once every second. It is fast and relatively safe. In practice, the always policy is very slow, although it supports group commit: if there are multiple parallel writes, Redis will try to cover them with a single fsync operation.
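The three policies differ only in when fsync is called after a write. Here is a minimal sketch, assuming a hypothetical aof_append helper and a caller-supplied clock. Real Redis buffers writes per event-loop iteration and, with everysec, performs the fsync in a background thread rather than inline as done here:

```c
#define _XOPEN_SOURCE 700   /* expose fsync(), mkstemp() */
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

enum fsync_policy { FSYNC_ALWAYS, FSYNC_EVERYSEC, FSYNC_NO };

/* Appends buf to the AOF descriptor, then applies the policy.
 * Returns 1 if fsync was called, 0 if not, -1 on error. */
int aof_append(int fd, const char *buf, size_t len, enum fsync_policy policy,
               time_t now, time_t *last_fsync) {
    size_t off = 0;
    while (off < len) {                       /* handle short writes */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) return -1;
        off += (size_t) n;
    }
    bool sync_now = policy == FSYNC_ALWAYS ||
                    (policy == FSYNC_EVERYSEC && now - *last_fsync >= 1);
    if (!sync_now) return 0;                  /* data sits in the page cache */
    if (fsync(fd) != 0) return -1;
    *last_fsync = now;
    return 1;
}
```

With FSYNC_NO the data only reaches the kernel's page cache, which is why a crash of the machine (as opposed to just the Redis process) can lose whatever the kernel has not flushed yet.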

Advantages of AOF

Durability: Each write operation is appended to the appendonly.aof file. As a result, Redis can be restored to its previous state by replaying the commands in the AOF file one by one. This is in contrast to RDB-based persistence, where all writes made after the most recent snapshot are lost. However, there are some caveats: calling fsync to commit data to disk after every command is expensive. As a result, Redis defaults to calling fsync at most once per second, and one may still lose data in the event of, say, a power outage. If the specific use case cannot tolerate this, Redis can be configured to call fsync after every operation with appendfsync always.
Transparency: When compared to the RDB snapshot, understanding the generated AOF file is simple. With a text editor of one's choice, the appendonly.aof file can be easily inspected or truncated. This can also help with debugging because the file can be used as a log to track down bugs and other issues.

Disadvantages of AOF

File size: appendonly.aof is almost always significantly larger than the dump.rdb file generated by RDB. This is especially true when performing multiple writes to the same set of keys. In the preceding example, we changed the "city" to "Seattle," then to "London." As a result, we add two operations to the AOF file. If we had instead created an RDB snapshot, there would have been only one key-value pair and no commands or length prefixes. AOF can mitigate this issue in part by compacting the log in the background and rewriting commands (e.g., "INCR counter," "INCR counter," "INCR counter" could be rewritten as "SET counter 3").
Startup time: Redis must replay every command listed in the AOF file in order to return to its previous state. This can be significantly slower than restoring the state from a simple snapshot file, depending on one's write-load. When using RDB, Redis only needs to load each KV-pair into memory, regardless of how frequently a particular key is updated. This is faster than re-executing every command that resulted in the current state.
Performance: AOF may be slower than RDB, but it is difficult to say for sure without running benchmarks for one's own specific use case. However, with RDB, performance tends to be more predictable and less dependent on the specific write load of the use case.

Compaction of logs

As shown in the preceding example, the AOF file keeps growing over time. This becomes a problem especially when dealing with constantly changing data. Suppose we have a counter service that is incremented via INCR for each page view:

Client	appendonly.aof	Redis-server (memory)
INCR counter	*2 $4 INCR $7 counter	counter: 1
INCR counter	*2 $4 INCR $7 counter *2 $4 INCR $7 counter	counter: 2
INCR counter	*2 $4 INCR $7 counter *2 $4 INCR $7 counter *2 $4 INCR $7 counter	counter: 3

As can be seen, each increment appends five more lines to the appendonly.aof file. Assume our Redis instance fails after ten million page views. Upon restart, Redis would have to re-execute ten million INCR commands, spread over fifty million lines of AOF.

Because this is neither efficient nor scalable, Redis can "compact" AOF files. Newer versions of Redis do this automatically. Alternatively, the BGREWRITEAOF command can be used to start the corresponding background job:

127.0.0.1:6379> BGREWRITEAOF
Background append only file rewriting started
127.0.0.1:6379> quit
alexandergugel@192 redis % tail appendonly.aof
*3
$3
SET
$7
counter
$1
3

In this example, the three INCR commands were rewritten to a single SET instruction.
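The idea behind the rewrite can be illustrated with a toy model: replay the log into memory, then emit one command per live key. The sketch below is hypothetical and deliberately limited. It only understands SET and INCR on string values, treats non-integer values as 0 instead of returning an error, and emits plain text instead of RESP. Redis's real rewrite does not read the old log at all; it dumps the live dataset, emitting the appropriate command for each data type (or an RDB preamble):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy keyspace: just enough to replay SET and INCR. */
#define MAX_KEYS 16
struct entry { char key[32]; char val[32]; };
struct keyspace { struct entry e[MAX_KEYS]; int n; };
struct cmd { const char *op, *key, *arg; };   /* arg is unused for INCR */

static struct entry *lookup_or_add(struct keyspace *ks, const char *key) {
    for (int i = 0; i < ks->n; i++)
        if (strcmp(ks->e[i].key, key) == 0) return &ks->e[i];
    if (ks->n == MAX_KEYS) return NULL;
    struct entry *e = &ks->e[ks->n++];
    snprintf(e->key, sizeof e->key, "%s", key);
    e->val[0] = '\0';                          /* INCR treats "" as 0 */
    return e;
}

/* Replays the log into memory, then writes one "SET key value" line per
 * key into out. Returns the bytes written, or -1 on error. */
int rewrite_log(const struct cmd *log, int n, char *out, size_t cap) {
    struct keyspace ks = { .n = 0 };
    for (int i = 0; i < n; i++) {
        struct entry *e = lookup_or_add(&ks, log[i].key);
        if (e == NULL) return -1;
        if (strcmp(log[i].op, "SET") == 0) {
            snprintf(e->val, sizeof e->val, "%s", log[i].arg);
        } else if (strcmp(log[i].op, "INCR") == 0) {
            long long v = strtoll(e->val, NULL, 10);
            snprintf(e->val, sizeof e->val, "%lld", v + 1);
        } else {
            return -1;                         /* other commands not modeled */
        }
    }
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < ks.n; i++) {
        int w = snprintf(out + used, cap - used, "SET %s %s\n",
                         ks.e[i].key, ks.e[i].val);
        if (w < 0 || (size_t) w >= cap - used) return -1;
        used += (size_t) w;
    }
    return (int) used;
}
```

Three INCR counter entries collapse to the single line "SET counter 3", and the SET city Seattle / SET city London history from the earlier table collapses to the final value only.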

AOF with RDB-preamble

Redis offers a further option to make log compaction and recovery faster. As explained above, rewriting the AOF leaves far fewer commands in it. However, Redis still has to execute those commands at startup to get back to its previous state, and the rewritten log is still not nearly as compact as RDB, which was designed first and foremost as a snapshot file format.

Because of this, Redis has a setting called aof-use-rdb-preamble:

aof-use-rdb-preamble yes

When this is turned on, an "RDB preamble" is written at the beginning of the appendonly.aof file whenever the AOF is rewritten. This preamble contains a snapshot of the data Redis held when the background rewrite job was started. As usual, new commands are then appended to the AOF file until the next rewrite.
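Because an RDB file always starts with the "REDIS" signature while a plain AOF starts with a RESP array, a loader can tell the two apart by peeking at the first five bytes. Here is a minimal sketch of that check, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the buffer starts with the RDB signature "REDIS", i.e. the AOF
 * begins with an RDB preamble rather than RESP commands. */
bool has_rdb_preamble(const char *buf, size_t len) {
    return len >= 5 && memcmp(buf, "REDIS", 5) == 0;
}

/* File variant: peeks at the first five bytes of an AOF on disk.
 * Returns 1 for an RDB preamble, 0 for plain RESP, -1 on error. */
int file_has_rdb_preamble(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    char sig[5];
    size_t n = fread(sig, 1, sizeof sig, fp);
    fclose(fp);
    return has_rdb_preamble(sig, n) ? 1 : 0;
}
```

After the preamble has been loaded, the remaining bytes are ordinary RESP commands and can be replayed as before.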

By enabling aof-use-rdb-preamble, we can reap many of the benefits of RDB while keeping AOF's stronger durability guarantees. Depending on when the AOF file was last rewritten, most of the data will be stored in the RDB preamble. At startup, restoring state from the RDB preamble is faster than replaying each command from an AOF file, and the RDB encoding is more compact than a sequence of length-prefixed AOF commands and arguments. Finally, client commands are still appended to the AOF file as they are received, so commands that are not part of the snapshot can be replayed when the server starts up.

The main disadvantage of using the RDB preamble is the added complexity: when problems arise, one can no longer simply inspect the AOF file in a text editor. Redis ships with the redis-check-aof utility for repairing corrupted AOF files, and newer versions even fix some issues on startup. However, the complexity involved in rewriting AOF files should not be underestimated. Certain edge cases must be carefully considered, such as whether there is enough free space on the volume where the AOF file is stored.

Conclusion

So, what should one do? AOF or RDB? It is most likely that a combination of both will best suit one's needs. The following are some pointers to help you make your decision:

Always use AOF-based persistence with appendfsync always if you cannot tolerate data loss. This ensures that the .aof file is synced to disk after each operation. Even if the power goes out, Redis can recover by replaying the commands listed in the file. If this proves too expensive, the durability guarantees can be relaxed by changing appendfsync to everysec.
If you can afford to lose the last X seconds or Y operations, choose RDB-based persistence with save directives.
If some data loss is tolerable but you would rather avoid it while keeping performance acceptable, use an appropriate combination of AOF- and RDB-based persistence. Even when using AOF, periodic RDB snapshots are recommended for backups.
