Memory Management in CockroachDB

#softwaredevelopment

If you've played with CockroachDB before, you may have seen some memory-related flags when starting Cockroach. Specifically, things called "cache" and "max-sql-memory":

cockroach start \
--certs-dir=certs \
--advertise-addr=<node1 address> \
--join=<node1 address>,<node2 address>,<node3 address> \
--cache=.25 \
--max-sql-memory=.25

We talk a little bit about these settings in our docs, but how are they actually used?

I put together a few diagrams to illustrate what we mean by cache and sql-memory, and I'll try to illustrate how they're used.

Above is a diagram of a node running in a CockroachDB cluster. You can see that there are two caches managed by the cockroach process called SQL Memory and Pebble Cache. What we refer to above as "cache" is this Pebble cache. Pebble is the name of the Key/Value storage engine that CockroachDB leverages to actually store data.

There are some other areas of memory available outside of the cockroach process, namely the Operating System's page cache.

Let's consider a read that happens against a CRDB cluster, and let's assume for this example that this is the first read that's happened against this node so there is nothing loaded into any caches.

Here are the steps that happen to satisfy this read query:

SQL SELECT query hits the SQL engine
SQL engine asks KV for data
KV asks the O/S for data
O/S reads from disk & loads files into the page cache (compressed -- all SSTables in CRDB are compressed)
O/S gives KV data
KV puts data in Pebble cache (uncompressed)
KV processes data and gives to SQL
SQL uses memory to process data (ordering, grouping, etc.)
SQL returns results to client

You can imagine that on a subsequent read, that some of these steps could be short-circuited if they got a cache hit -- namely, you might be able to avoid the read from disk and only hit the O/S Page cache; or, even better, if the Pebble cache has the data, you don't need to even talk to the O/S at all.

The write path for a CockroachDB node is simpler.