I was using the Rails cache with Redis and I quickly overflowed the memory storage, so I went on a small quest to better understand the Rails cache implementation. I thought it was worth writing a bit about it.
TL;DR: use your cache client directly, or pass the raw option as true to the Rails.cache methods.
Rails provides a comprehensive and easy-to-use interface for caching: the cache store. It offers a common interface to any of the standard cache implementations that Rails ships out of the box, from the in-memory cache to the file, Memcached and Redis stores.
The cache implementation is very convenient because it allows us to store anything from HTML partials to models and complex classes. The best part is that it abstracts the whole serialization, so you always end up with workable entities without needing to worry about a thing.
```ruby
> game = Game.last
=> #<Game id: 1, name: "Pokemon", created_at: "2021-01-14 12:10:59.872271000 +0000", updated_at: "2021-01-14 12:10:59.872271000 +0000">
> Rails.cache.write('pokemon', game)
=> "OK"
> pokemon = Rails.cache.read('pokemon')
=> #<Game id: 1, name: "Pokemon", created_at: "2021-01-14 12:10:59.872271000 +0000", updated_at: "2021-01-14 12:10:59.872271000 +0000">
> pokemon.name
=> "Pokemon"
```
In the example above we load a record from the games table, then cache that entity using the Rails.cache.write method. When retrieving the cache entry by its key we end up with the same model class we were using before, and we can even call its methods and attributes as expected. That's super cool, isn't it!? But how does Rails do it?
```ruby
# https://github.com/rails/rails/blob/291a3d2ef29a3842d1156ada7526f4ee60dd2b59/activesupport/lib/active_support/cache.rb#L598-L600
def serialize_entry(entry)
  @coder.dump(entry)
end
```
The answer is in the snippet above from the cache store implementation, and in what the @coder instance holds: by default, it is Ruby's Marshal module. From the Ruby docs:

> The marshaling library converts collections of Ruby objects into a byte stream, allowing them to be stored outside the currently active script. This data may subsequently be read and the original objects reconstituted.
Before reading or writing any record, the cache store serializes the entry by default, and it uses the Marshal library to do so. That way, the magic is done for us and we can read and write any Ruby object 🥳!
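To see Marshal at work outside of Rails, here is a minimal sketch in plain Ruby (the hash stands in for a model instance; no Rails required):

```ruby
# Marshal turns arbitrary Ruby objects into a byte stream and back.
game = { id: 1, name: "Pokemon" }

bytes = Marshal.dump(game)     # a binary String holding the serialized object
restored = Marshal.load(bytes) # the original object, reconstituted

puts restored[:name]  # prints "Pokemon"
puts restored == game # prints "true"
```

This dump/load round trip is exactly what the cache store does for us on every write and read.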
Let’s now set this learning aside for a moment and analyze another example. Imagine we want to store a boolean.
```ruby
> Rails.cache.write('yes', true)
=> "OK"
> Rails.cache.fetch('yes')
=> true
```
Rails is able to store and retrieve it without any issues. That said, we would expect the value stored in the cache to be a stringified version of the boolean, right? To confirm that, let’s connect directly to the storage and inspect the values there.
— In my case, I’m using Redis as the cache, so I just create a new instance of its client to connect directly to it.
After getting the yes value, it is clear that we have much more than “true”.
```ruby
> redis = Redis.new
=> #<Redis client v4.1.4 for redis://127.0.0.0:6379/0>
> redis.get('yes')
=> "\u0004\bo: ActiveSupport::Cache::Entry\t:\v@valueT:\r@version0:\u0010@created_atf\u00161609929749.567886:\u0010@expires_at0"
```
What ends up being stored is the serialized version of an ActiveSupport::Cache::Entry instance. The Entry class is an abstraction that implements expiration, compression and versioning of any cache record. Through this class, Rails can implement these features independently from the actual storage used behind it.
The cache entry class encapsulates whatever value we store in the cache by default. Leveraging the Marshal lib, the Rails cache is capable of storing any simple or complex object while offering those cache features. That is great!
In our previous example, the serialized version of the cache entry is a String of 100 chars instead of a 4-char String — true. That is an extra 96 chars for storing the same information.
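To get a feel for where that overhead comes from without loading Rails, here is a hedged sketch using a hypothetical FakeEntry class that mimics the instance variables of ActiveSupport::Cache::Entry (the real class marshals similarly, with a longer class name):

```ruby
# Hypothetical stand-in for ActiveSupport::Cache::Entry — NOT the real class.
class FakeEntry
  def initialize(value)
    @value = value
    @version = nil
    @created_at = 1609929749.567886
    @expires_at = nil
  end
end

raw = "true"
wrapped = Marshal.dump(FakeEntry.new(true))

puts raw.bytesize     # 4 bytes for the plain string
puts wrapped.bytesize # dozens of bytes once the wrapper object is marshaled
```

Almost all of the extra bytes are spent on the class name and the bookkeeping instance variables, not on the boolean itself.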
While for most cases that is totally fine, what if you really need to care about the amount of stored data? To understand the impact of these extra chars, let’s elaborate more on our example.
Short detour: Redis is implemented in C and it probably needs a few extra bytes to maintain our String value, which is an array of chars underneath. But let’s not consider that, since those same extra bytes apply to all String values.
Knowing we need 1B to store 1 char in C, we can conclude we would need 100B to store the serialized version of the cache entry. For 1 million records with the value true we would need 100MB (1M * 100B). This example is “simple” and 100MB may not sound like a lot, but if you need to store a little bit more than a boolean, if you are using the in-memory store, or if you have limited space in Redis, that can start hurting.
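The back-of-the-envelope math above can be written out as:

```ruby
entry_bytes = 100       # serialized Entry for a boolean, per our example
raw_bytes   = 4         # the plain "true" string
records     = 1_000_000

puts entry_bytes * records # prints 100000000 — i.e. ~100MB
puts raw_bytes * records   # prints 4000000 — i.e. ~4MB
```

A 25x difference for the exact same information.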
The direct alternative I could think of was to use the Redis client directly instead of the Rails cache interface.
```ruby
> redis.set('no', false)
=> "OK"
> redis.get('no')
=> "false"
```
It works as expected and we are no longer spending the extra space for that value 🙌🏽. We are then left with the job of parsing that value back to a boolean.
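Parsing it back is trivial; here is a hypothetical helper (to_bool is my own name, not a Rails or Redis API):

```ruby
# Cast the raw cached string back to a boolean.
# Anything other than "true" is treated as false in this sketch.
def to_bool(str)
  str == "true"
end

puts to_bool("true")  # prints "true"
puts to_bool("false") # prints "false"
```

A real application may want to distinguish a missing key (nil) from a cached "false" before casting.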
Another alternative that I found after looking at the Redis cache store implementation on GitHub was to pass down the raw option as true.
```ruby
> Rails.cache.write('yes', true, raw: true)
=> "OK"
> redis.get("yes")
=> "true"
> Rails.cache.read('yes', raw: true)
=> "true"
```
This option is only mentioned in the Memcached part of the docs, but it is at least also supported in the Redis cache store implementation, as it overrides the default serialize_entry method. Similar to using the Redis client directly, we will need to parse the resulting string back to a boolean manually. Even though we lose the Entry features, that is not a big deal if you are using Memcached, since it provides most of these features out of the box.
Thanks a lot if you got this far!
The level of caution that this post brings to the usage of the Rails cache is, most of the time, not required. However, if you ever want to cache millions of simple objects, knowing some of these details can make a difference!
See you next time!