DEV Community

Jim Hatcher
Jim Hatcher

Posted on

Memory Management in CockroachDB

If you've played with CockroachDB before, you may have seen some memory-related flags when starting Cockroach. Specifically, things called "cache" and "max-sql-memory":

cockroach start \
--certs-dir=certs \
--advertise-addr=<node1 address> \
--join=<node1 address>,<node2 address>,<node3 address> \
--cache=.25 \
Enter fullscreen mode Exit fullscreen mode

We talk a little bit about these settings in our docs, but how are they actually used?

I put together a few diagrams to illustrate what we mean by cache and sql-memory, and I'll try to illustrate how they're used.

Diagram of CRDB Node and its memory caches

Above is a diagram of a node running in a CockroachDB cluster. You can see that there are two caches managed by the cockroach process called SQL Memory and Pebble Cache. What we refer to above as "cache" is this Pebble cache. Pebble is the name of the Key/Value storage engine that CockroachDB leverages to actually store data.

There are some other areas of memory available outside of the cockroach process, namely the Operating System's page cache.

Let's consider a read that happens against a CRDB cluster, and let's assume for this example that this is the first read that's happened against this node so there is nothing loaded into any caches.

Read - no cache hits

Here are the steps that happen to satisfy this read query:

  1. SQL SELECT query hits the SQL engine
  2. SQL engine asks KV for data
  3. KV asks the O/S for data
  4. O/S reads from disk & loads files into the page cache (compressed -- all SSTables in CRDB are compressed)
  5. O/S gives KV data
  6. KV puts data in Pebble cache (uncompressed)
  7. KV processes data and gives to SQL
  8. SQL uses memory to process data (ordering, grouping, etc.)
  9. SQL returns results to client

You can imagine that on a subsequent read, that some of these steps could be short-circuited if they got a cache hit -- namely, you might be able to avoid the read from disk and only hit the O/S Page cache; or, even better, if the Pebble cache has the data, you don't need to even talk to the O/S at all.

The write path for a CockroachDB node is simpler.

Write path

The write involves:

  1. Mutation query (i.e., INSERT/UPDATE/DELETE) hits the SQL engine
  2. SQL engine passes data to KV engine
  3. KV writes data to SSTables in memory
  4. KV calls fsync to flush memory to disk (compressed)
  5. KV updates the Pebble cache (uncompressed)
  6. KV acknowledges write to SQL engine
  7. SQL returns acknowledgement to client

Hopefully, this blog clears up what "SQL Memory" and "Cache" mean in the context of CockroachDB. Happy SQL-ing!

Latest comments (0)