Arnaud

Originally published at keypup.io

Leveraging and expiring your cache for model, association and query caching in Rails

TL;DR: Leverage model caching on commonly used queries, associations and scopes to give your database a break. Rails offers plenty of cheap options to use and expire your cache efficiently. The more you do it, the lighter the load on your DB becomes.

In most applications, scaling the runtime (i.e. your application servers) is far easier than scaling your database. With a serverless approach like GCP Cloud Run, you can horizontally scale your app to thousands of instances.

The same cannot be said of databases, especially relational ones. Most of the time they are an expensive bottleneck and are much harder to scale, which is why it's good practice to get into the habit of alleviating the load on them whenever cheap alternatives exist.

Let's take a concrete example: background jobs. Whether you use ActiveJob, Sidekiq, Resque or Cloudtasker (for GCP), it's very common to have jobs defined like this:

class MyJob
  def perform(project_id)
    return unless (record = Project.find_by(id: project_id))

    # ... do stuff related to your model ...
    # E.g. longpoll data from a third-party provider
  end
end

It's alright, it's just a find call. But imagine thousands of these jobs running constantly. Your DB will certainly cope with it, but the question is: do you really want your DB to spend expensive CPU milliseconds on this kind of basic query?

The primary goal of a relational database is to be right, not to be fast. If you're looking to be fast, you should look at other options, and Redis caching is one of the most popular ones.
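
If you are not backing Rails.cache with Redis yet, it is a one-line configuration (a minimal sketch, assuming the redis gem is in your Gemfile and a REDIS_URL environment variable is available):

# config/environments/production.rb
# Point Rails.cache at Redis so the cached lookups shown below
# hit Redis instead of the default store.
config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }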

Let's use that pesky find method to see what cheap alternatives we have... then dig into other caching opportunities.

Model.find: a quick win

Assuming you have spotted a few models that are read-intensive, the following module provides a find_cached method which hits the cache (Redis in our case) before the database. The module also expires the cache whenever your model gets updated or destroyed.

# app/models/concerns/has_find_cached.rb

# This module provides a find_cached class method which
# returns a cached version of the record instead of making
# a database call.
#
# The find_cached method relies on find_for_cached which
# specifies how the record should be loaded. If any association
# preloading should be done, then find_for_cached should be
# overridden by the including class.
#
# The cached version gets automatically expired on update,
# destroy, or after FIND_CACHED_DURATION (1 day).
module HasFindCached
  extend ActiveSupport::Concern

  # Find cached duration
  FIND_CACHED_DURATION = 1.day

  included do
    # Expire cache key after change actions
    after_commit :expire_find_cached_key, on: %i[update destroy]
  end

  #---------------------------------------
  # Class methods
  #---------------------------------------
  class_methods do
    #
    # Default lookup method. To be overridden by the
    # implementing class if any preloading is required.
    #
    # @param [String] id The ID of the record.
    #
    # @return [ApplicationRecord] The looked up record.
    #
    def find_for_cached(id)
      find_by(id: id)
    end

    #
    # Return the cache key used for the find_cached method.
    #
    # @param [String] id The ID of the record.
    #
    # @return [String] The cache key
    #
    def find_cached_key(id)
      "#{model_name.cache_key}/#{id}/find_cached"
    end

    #
    # Find a cached version of the record. This method is
    # primarily used in background jobs to prevent making too many
    # database calls.
    #
    # This method should only be used for reading persistent attributes,
    # not real-time ones (e.g. a computed progress or integration status).
    # Any preloaded association is cached along with the record, but the
    # cache does not get expired when that association is updated.
    #
    # @param [String] id The ID of the record.
    #
    # @return [ApplicationRecord] The cached version of the record
    #
    def find_cached(id)
      Rails.cache.fetch(find_cached_key(id), skip_nil: true, expires_in: FIND_CACHED_DURATION) do
        find_for_cached(id)
      end
    end
  end

  #
  # Return the cache key used for the find_cached method.
  #
  # @return [String] The cache key
  #
  def find_cached_key
    @find_cached_key ||= self.class.find_cached_key(id)
  end

  #
  # Expire the cached version of the record.
  #
  def expire_find_cached_key
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Expire cached version
    Rails.cache.delete(self.class.find_cached_key(id))
  end
end

You can use this module in your ActiveRecord models like this:

class Project < ApplicationRecord
  include HasFindCached

  # ...
end

Then update your find calls with:

class MyJob
  def perform(project_id)
    return unless (record = Project.find_cached(project_id))

    # ... do stuff related to your model ...
    # E.g. longpoll data from a third-party provider
  end
end

That's all you need. You've potentially just saved your database thousands of needless calls.

"Wait! I usually need to access parent associations through this model, so I would still be making database calls!" Not if you eager load associations in the cached version of your record.

The module above allows you to customize the cached version of your record via find_for_cached. Example:

class Project < ApplicationRecord
  include HasFindCached

  belongs_to :company

  # Eager load the parent company on the cached version
  # returned by find_cached
  def self.find_for_cached(id)
    eager_load(:company).find_by(id: id)
  end
end

There is a caveat though: the cached company association will not be expired upon company update. That is fine if you only need to access persistent attributes on the company association, but if you need regularly updated attributes, then you must manually expire the project cache upon company update.

Cache expiration of associated models can be achieved through an after_commit callback, such as:

class Company < ApplicationRecord
  has_many :projects

  # Expire project cache keys after change actions
  after_commit :expire_associated_find_cached_keys, on: %i[update destroy] 

  # ...

  private

  # Expire cache keys of associated records
  def expire_associated_find_cached_keys
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Collect all project cache keys for find_cached
    project_cache_keys = projects.pluck(:id).map { |e| Project.find_cached_key(e) }

    # Delete them in one go
    Rails.cache.delete_multi(project_cache_keys)
  end
end

Your project find_cached version will now be properly expired on parent model updates.

Keep in mind that it's a tradeoff: the more you link records together for cache expiration, and the more those related records are updated, the less you'll benefit from your cache.

If all you need on your Project cached versions is to access persistent company references that will never change (e.g. an external customer ID), then you might actually be better off not expiring the Project cache keys upon company update. But if you go down that path, make sure other developers are aware of this caveat, because relying on stale record attributes will lead to bugs that are difficult to troubleshoot.
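
For instance, reading a never-changing company reference off the cached, preloaded association could look like this (some_project_id and the external_customer_id column are hypothetical, for illustration only):

# Both the project and its preloaded company come straight from the cache,
# so no database call is made here.
project = Project.find_cached(some_project_id)
project.company.external_customer_id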

Caching is an opportunistic habit, not a silver bullet

The previous section is simplistic and looks at the most basic form of caching: the find method. On its own it is not going to save your application from DB overload, but it opens the path to more advanced caching approaches.

As an example, let's look at the Company <-> Project relationship. If some_company.projects is a call you make frequently, and assuming the number of projects returned is expected to stay reasonable, you can provide a cached version of this association in the following manner.

class Company < ApplicationRecord
  has_many :projects

  # Return the cache key used to cache the list of projects
  def self.projects_cached_key(id)
    "#{model_name.cache_key}/#{id}/projects"
  end

  # Return a cached version of the list of projects associated
  # with this record.
  def projects_cached
    Rails.cache.fetch(self.class.projects_cached_key(id)) do
      # Load the relation so the actual records get cached,
      # not the lazy ActiveRecord::Relation.
      projects.to_a
    end
  end
end

class Project < ApplicationRecord
  belongs_to :company

  # Expire the parent company's project list after change actions.
  # Note that unlike the previous example, we use a bare `after_commit`
  # instead of `after_commit on: %i[update destroy]`: creating a project
  # should also lead to cache expiration.
  after_commit :expire_associated_cached_keys

  # ...

  private

  # Expire cache keys of associated records
  def expire_associated_cached_keys
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Expire the parent company cache
    Rails.cache.delete(Company.projects_cached_key(company_id))
  end
end

The same approach can be used for scopes, large queries involving joins, etc.
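
Here is a minimal sketch of the same pattern applied to a scope (the active scope and the 15-minute TTL are assumptions for illustration):

class Project < ApplicationRecord
  scope :active, -> { where(archived: false) }

  # Return a cached version of the list of active projects.
  def self.active_cached
    Rails.cache.fetch("#{model_name.cache_key}/active_cached", expires_in: 15.minutes) do
      # Load the relation so the records themselves get cached.
      active.to_a
    end
  end
end

Expiration works the same way as before: either rely on the TTL or add an after_commit callback on Project to delete the key.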

In the end, the hardest part is thinking about which resources are involved in your cache and placing the right expiration calls on your associated models.

Now, as the section title says, it's an opportunistic habit. There is no point in caching every single database call in Redis, as it would clutter your application code more than anything.

Your first habit should be to look at your database monitoring system. NewRelic, DataDog, GCP Query Insights and the like will give you hints on which queries are expensive and frequently run.

Target these first. Once you've addressed the most expensive queries you can evaluate where to further optimize database calls.

Happy caching!
