Arnaud

Posted on • Originally published at keypup.io

Use Redis Sets to track and expire cache keys in Rails

TL;DR: You use Redis? Don't limit yourself to Rails.cache. Redis offers plenty of functionality to manage your cache efficiently, including Sets and Lists to handle collections. Getting your hands dirty with Rails.cache.redis will eventually pay off.

Caching is all about exhaustively expiring cache entries to avoid stale data.

A very common approach in fragment caching is to rely on record timestamps to ensure that your cache fragments do not serve stale versions of the underlying data. It's low maintenance and works well, though it still makes database calls to check record timestamps.
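
To make this concrete, here is a minimal sketch of the timestamp-based approach (the key name and serialized attributes are purely illustrative):

# The record's updated_at is part of the cache key, so any update to the
# record produces a new key and the stale entry is simply never read again.
# Fetching the timestamp still requires a database call though.
project = Project.find(params[:id])

project_summary = Rails.cache.fetch(['project-summary', project.id, project.updated_at.to_i]) do
  project.as_json(only: %i[id name])
end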

Another approach is to use event-driven expiration. You create cache entries and manually expire cache keys when involved resources get updated.

This approach requires more maintenance - as you must ensure that update events properly lead to cache expiration - but opens the door to more efficient and flexible caching.

Previously we talked about basic caching by implementing a Project.find_cached method that caches the result of the find method. For this we developed a module that automatically clears the find cache entry on save.

We quickly ran into more complexity as soon as Project.find_cached started to eager load related parents, essentially because we had to also expire the cache when parents were updated.

The solution we ended up with looks like this:

class Project < ApplicationRecord
  # We defined the HasFindCached module in a previous blog article.
  # Its main functionality is to provide a find_cached method (see below)
  # Full module available here: https://gist.github.com/alachaum/1421fb5e824f6f3546e3aa5242bf623c#file-02_find_cached-rb
  include HasFindCached

  belongs_to :company

  # The HasFindCached module provides a method like
  # the one below to retrieve records from cache instead of making
  # a database call.
  #    
  # def self.find_cached(id)
  #  Rails.cache.fetch(find_cached_key(id)) do
  #    find_for_cached(id)
  #  end
  # end

  # The result of this method will be cached and returned
  # by find_cached.
  #
  # Because the cache involves the parent company, we must
  # expire the cache when the parent company is updated
  def self.find_for_cached(id)
    eager_load(:company).find_by(id: id)
  end
end

# The Company model is configured to expire the cache
# on company updates
class Company < ApplicationRecord
  has_many :projects

  # Expire project cache keys after change actions
  after_commit :expire_associated_find_cached_keys, on: %i[update destroy] 

  # ...

  private

  # Expire cache keys of associated records
  def expire_associated_find_cached_keys
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Collect all project cache keys for find_cached
    project_cache_keys = projects.pluck(:id).map { |e| Project.find_cached_key(e) }

    # Delete them in one go
    Rails.cache.delete_multi(project_cache_keys)
  end
end

It's not graceful but it works.

It's not graceful because there is a lot of code involved just to expire the Project cache entries on the company side. We can definitely do better.

If you use Redis in your Rails app then it's time to get your hands dirty with Rails.cache.redis.

Ensure Redis is properly set up

The redis-rb gem maintains a single connection by default, which is not ideal in multi-threaded servers. If you use redis-rb without a connection pool, all your threads end up competing for that single connection on every Redis access.

Here is a proper setup for Redis in Rails (you can also read the Rails guide on Redis pooling).

First make sure your Gemfile includes the following:


# Set up Redis with hiredis for faster connections.
gem 'hiredis'
gem 'redis-rails'

# Connection pool for shared connections (e.g. Redis)
gem 'connection_pool'

Create a config file for redis:

# config/redis.yml

development:
  cache_url: 'redis://localhost:6379/1'
test:
  cache_url: 'redis://<%= ENV['REDIS_HOST'].presence || 'localhost' %>:6379/15'
uat:
  cache_url: 'redis://10.0.0.5:6379/1'
production:
  cache_url: 'redis://10.0.0.6:6379/1'

Finally, edit your application.rb and specify your cache store:

# config/application.rb

module MyApp
  class Application < Rails::Application
    # Initialize configuration defaults for originally generated Rails version.
    config.load_defaults 6.0

    # Redis configuration with connection pool
    config.cache_store = :redis_cache_store, {
      url: Rails.application.config_for(:redis).cache_url,
      pool_size: ENV.fetch('RAILS_MAX_THREADS') { 20 },
      pool_timeout: 5
    }

    # If you wish to use redis for session storage, you should add this as well
    # config.session_store :cache_store
  end
end

Good! Now you're ready to use Redis.

Redis is more than Rails.cache

Rails.cache gives you access to read/read_multi, write/write_multi, increment and decrement methods. These cover the basics but leave out some of the most powerful features of Redis: Sets and Lists.
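
As a quick refresher, that standard surface looks roughly like this (keys and values are illustrative):

# Plain key/value operations
Rails.cache.write('greeting', 'hello')
Rails.cache.read('greeting')                  # => "hello"
Rails.cache.read_multi('greeting', 'missing') # => { "greeting" => "hello" }

# Counters (stored as raw values so Redis can increment them natively)
Rails.cache.write('api-calls', 0, raw: true)
Rails.cache.increment('api-calls')            # => 1
Rails.cache.decrement('api-calls')            # => 0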

Sets and Lists allow you to track collections directly in Redis. From a caching perspective, this lets you track relationships between cache keys by storing and retrieving those relationships as collections.

Looping back to Rails.cache, it's much more efficient to handle collections via Redis Sets and Lists than to rely on some form of key pattern matching (e.g. via Rails.cache.delete_matched, in case you envisaged that solution).

Here are some examples of using Redis Sets and Lists in Ruby/Rails:

# Using Redis Sets
Rails.cache.redis.with do |conn|
  # Add elements
  conn.sadd("myset", "foo1")
  conn.sadd("myset", ["foo2", "foo3", "foo4", , "foo5"])

  # Remove elements
  conn.srem("myset", "foo3")
  conn.srem("myset", ["foo4", "foo5"])

  # Retrieve the list of elements
  conn.smembers("myset")
  # => ["foo1", "foo2"]
end

# Using Redis Lists
Rails.cache.redis.with do |conn|
  # Prepend or append elements 
  conn.lpush("mylist", "foo1")
  conn.rpush("mylist", ["foo2", "foo3", "foo4", , "foo5"])

  # Retrieve and remove the last element
  conn.rpop("mylist")
  # => "foo5"

  # Return all the elements in the list
  conn.lrange("mylist", 0, -1)

  # Get the count of elements
  conn.llen("mylist")
  # => 4
end

There are many other commands available for Sets and Lists; you can find them all in the Redis documentation.
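
Here are a few more standard commands that tend to be handy when managing cache collections:

Rails.cache.redis.with do |conn|
  # Sets
  conn.scard("myset")             # number of members
  conn.sismember("myset", "foo1") # membership test => true / false
  conn.spop("myset")              # remove and return a random member

  # Lists
  conn.lrange("mylist", 0, 9)     # first 10 elements
  conn.ltrim("mylist", 0, 99)     # keep only the first 100 elements
  conn.lrem("mylist", 0, "foo2")  # remove all occurrences of "foo2"
end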

Now let's see how we can harness that new Redis power to improve our caching strategies.

Using Sets to register cache dependencies and expire them automatically

Manually expiring cache keys in associated resources is always a bit ugly. You have to implement custom expiration logic in a commit callback to manually expire cache keys which were built by another record class.

Let's try a reusable approach where foreign dependencies are declared by the cached resource and managed as a Set.

The following module provides reusable logic for Active Record models to register associated cache keys and expire them when the record is updated. You could include this module in ApplicationRecord directly.

# app/models/concerns/has_cache_dependencies.rb

module HasCacheDependencies
  extend ActiveSupport::Concern

  included do
    # Callback: expire all cache entries which were registered under this record
    after_commit :expire_cache_dependencies
  end

  class_methods do
    #
    # Return the Redis Set key maintaining the list of
    # cache keys associated with the record.
    #
    # @param [String] id The id of the record.
    #
    # @return [String] The Set key
    #
    def dependency_registry_cache_key(id)
      "#{model_name.cache_key}/#{id}/dependency_registry"
    end

    #
    # Register a cache dependency for the provided record
    # ID and cache key.
    #
    # @param [String] id The ID of the record.
    # @param [String] cache_key The associated cache key
    #
    def register_cache_dependency(id, cache_key)
      Rails.cache.redis.with { |c| c.sadd(dependency_registry_cache_key(id), cache_key) }
    end
  end

  #
  # Return the Redis Set key maintaining the list of
  # cache keys associated with the record.
  #
  # @param [String] id The id of the record.
  #
  # @return [String] The Set key
  #
  def dependency_registry_cache_key
    @dependency_registry_cache_key ||= self.class.dependency_registry_cache_key(id)
  end

  #
  # Register a cache dependency for the provided record
  # cache key.
  #
  # @param [String] cache_key The associated cache key
  #
  def register_cache_dependency(cache_key)
    self.class.register_cache_dependency(id, cache_key)
  end

  #
  # Callback invoked when the record is committed. Clear all
  # cache keys which were registered under this record.
  #
  def expire_cache_dependencies
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    Rails.cache.redis.with do |conn|
      # Fetch the registered cache keys first: commands queued inside a
      # MULTI block only return their values once the transaction runs,
      # so smembers cannot be iterated from within the multi itself.
      keys = conn.smembers(dependency_registry_cache_key)

      conn.multi do |multi|
        # Clear the cache key for each member
        keys.each { |key| multi.del(key) }

        # Remove the relationship Set entirely. The Set will be re-populated
        # as cache entries are repopulated by related records.
        # Removing the Set ensures we do not end up with cache key registrations
        # associated with records which have been de-associated (e.g. destroyed)
        multi.del(dependency_registry_cache_key)
      end
    end
  end
end

With the HasCacheDependencies module any resource can declare a cache entry as being dependent on a record by invoking:

# Register cache_key to be expired when a record gets updated

MyModel.register_cache_dependency(record_id, cache_key)
# OR
my_model.register_cache_dependency(cache_key)

Including this module on parent associations allows us to simplify our cache expiration strategy for the Company <-> Project relationship.

It's so simple that we even added a parent User model on Project to show what it looks like with multiple resource dependencies.

Our new version of cache registration/expiration looks like:

class Company < ApplicationRecord
  include HasCacheDependencies

  has_many :projects
end

class User < ApplicationRecord
  include HasCacheDependencies

  has_many :projects
end

class Project < ApplicationRecord
  # We defined the HasFindCached module in a previous blog article.
  # Its main functionality is to provide a find_cached method (see below)
  # Full module available here: https://gist.github.com/alachaum/1421fb5e824f6f3546e3aa5242bf623c#file-02_find_cached-rb
  include HasFindCached

  # Parent associations
  belongs_to :company
  belongs_to :user

  # The HasFindCached module provides a method like
  # the one below to retrieve records from cache instead of making
  # a database call
  #    
  # def self.find_cached(id)
  #  Rails.cache.fetch(find_cached_key(id)) do
  #    find_for_cached(id)
  #  end
  # end

  #
  # The find_for_cached method is modified to invoke the
  # register_cache_dependency hook on each resource this
  # cache entry relies on.
  #
  # The find_cached_key class/instance method is provided
  # by HasFindCached and returns the cache key used to
  # cache the result of find_cached.
  #
  # By doing so we ensure that the find_cached cache entry
  # will be expired on company or user update.
  #
  def self.find_for_cached(id)
    eager_load(:company, :user).find_by(id: id)&.tap do |e|
      e.company.register_cache_dependency(e.find_cached_key)
      e.user.register_cache_dependency(e.find_cached_key)
    end
  end
end

The implementation above follows the natural logic of saying "upon creating this cache entry, please remember to expire it when associated records get updated".
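
Concretely, the flow looks like this (a console sketch assuming the HasFindCached gist linked above and no cache namespace, so registered keys map one-to-one to raw Redis keys):

# Populate the cache; the company and user registries now contain the key
project = Project.find_cached(42)

Rails.cache.redis.with { |c| c.smembers(Company.dependency_registry_cache_key(project.company_id)) }
# => [Project.find_cached_key(42)]  (exact key format comes from HasFindCached)

# Updating the company triggers after_commit and clears the registered keys
project.company.update!(name: 'New name')
Rails.cache.exist?(Project.find_cached_key(42))
# => false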

This approach is way more efficient and streamlined than the one presented at the beginning of the article because:

  1. Parent models no longer need to make any database call on after_commit to expire cache keys
  2. Parent models do not need to load all their projects on after_commit to expire cache keys - only the keys which were registered get cleared. It's an opportunistic approach.
  3. There is no custom logic in each parent model. The cache key registration/expiration pattern is reusable across models.

Wrapping up

Using native Redis functionalities can open the door to many optimizations in your application, especially related to caching.

The example above is one of many. As we stated last week, optimising the find method will not really make a difference in your app. But the pattern of registering/expiring cache dependencies can help you put in place many complex caching strategies in your app.
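
For instance, the same module can guard list or aggregate caches. Here is a hedged sketch (the method and key names are made up for illustration, and it assumes Project also includes HasCacheDependencies) where a company caches its list of project names and registers that key on every project involved, so the list expires whenever one of those projects changes:

class Company < ApplicationRecord
  include HasCacheDependencies

  has_many :projects

  # Cache the list of project names and register the cache key on each
  # project, so the list is expired whenever any of them is updated
  # or destroyed.
  def cached_project_names
    key = "companies/#{id}/project_names"

    Rails.cache.fetch(key) do
      projects.each { |project| project.register_cache_dependency(key) }
      projects.map(&:name)
    end
  end
end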

Beyond caching it's possible to use Redis as a complete datastore and almost completely bypass your database. We might show how in a future blog article.

Happy caching!
