Discussion on: Switching from Memcache to Redis and Some Tips on Caching


Thanks for the detailed migration story and also the don't list, very useful!

A few comments about something that triggered my curiosity:
As a developer, it has been a long time since I used a cache directly. Back in 2013 we were using Oracle Coherence (closed, proprietary software, but I did not have any choice) with Java (although I do not think that is relevant to this discussion), and the key of a cache entry would not change when its value changed.

I.e., reusing your example, the cache key would be just:
"user-follow-count-#{id}"
And then every time one more follower had to be added for a user, the application code would increment the value of the corresponding cache entry (and also in the database in a transactional fashion), but the key of the cache entry would remain the same.
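
To make the fixed-key approach concrete, here is a minimal sketch in plain Ruby. It is not Coherence or Rails code: `CACHE` and `DB_FOLLOWER_COUNTS` are hypothetical in-memory stand-ins for the cache and the database, the helper names are made up for illustration, and the transactional part is not modelled.

```ruby
# Hypothetical stand-ins: one Hash plays the cache, another plays the database.
CACHE = {}
DB_FOLLOWER_COUNTS = Hash.new(0)

# The key depends only on the user id, so it never changes.
def follow_count_key(user_id)
  "user-follow-count-#{user_id}"
end

# Write path: update the source of truth, then overwrite the cached value
# under the same key (a real system would do both in one transaction).
def add_follower(user_id)
  DB_FOLLOWER_COUNTS[user_id] += 1
  CACHE[follow_count_key(user_id)] = DB_FOLLOWER_COUNTS[user_id]
end

# Read path: the user id alone is enough to build the key.
def follower_count(user_id)
  CACHE.fetch(follow_count_key(user_id)) { DB_FOLLOWER_COUNTS[user_id] }
end

add_follower(42)
add_follower(42)
puts follower_count(42)
puts CACHE.keys.inspect
```

The cache ends up with a single entry per user, but every code path that changes the count has to remember to update it.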

In your example, I see that whenever the value of the cache entry changes, the key also changes since the timestamp is part of the key.
You said that the updated_at timestamp is present in the key "to help ensure that it doesn't get stale if a user is updated", but at first glance it would seem neater if the timestamp were part of the value (object) instead of the key. It could serve the same purpose there.

Also, having the timestamp included in the key means that just the user id is not enough to construct the desired key, which I think adds complexity to the client code.
How does the application code retrieve a specific cache entry value?
Does it use a wildcard for the timestamp part of the key?

To sum up, what are the advantages of having mutable keys when compared with the immutable cache keys which I have described above?
Thanks in advance!

Molly Struve (she/her) • Jan 5 '20

What are the advantages of having mutable keys when compared with the immutable cache keys?

The advantage to having the keys change is that you never need to worry about updating or deleting them, which can add a lot of code complexity. Instead, if I have a user with an id and a last_followed_at timestamp, and I use both to build the key that stores follower_count, then any time that last_followed_at timestamp changes (i.e., a follower is added or removed) my cache request:

Rails.cache.fetch("user-follow-count-#{id}-#{last_followed_at.rfc3339}", expires_in: 1.hour) do 
  followers.count
end

will create a new key to store the new count. The old key will simply expire. Until the count changes again, every request builds the same key from the id and last_followed_at timestamp, and the cache returns the correct value.

If I do not use the last_followed_at timestamp then every time the follower count changes I have to add additional code to delete the old cache key. By using the timestamp this code is not needed.

Rails.cache.delete("user-follow-count-#{id}")
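
As a rough illustration of the timestamped-key behaviour described above, here is a plain-Ruby sketch. `TinyCache`, the `User` struct and `follow!` are hypothetical stand-ins, not Rails.cache or the real model, and expiry is left out. Plain Ruby's `Time#iso8601` (from the `time` library) is used in place of the `rfc3339` call in the snippet above.

```ruby
require "time" # provides Time#iso8601 in plain Ruby

# Hypothetical in-memory stand-in for a cache with a fetch method (no expiry).
class TinyCache
  attr_reader :store

  def initialize
    @store = {}
  end

  # Return the cached value on a hit; otherwise run the block and store its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Hypothetical user model holding the fields named in the thread.
User = Struct.new(:id, :last_followed_at, :followers) do
  def follow!(name, at:)
    followers << name
    self.last_followed_at = at # a new timestamp means a new cache key
  end

  def follower_count(cache)
    cache.fetch("user-follow-count-#{id}-#{last_followed_at.iso8601}") { followers.size }
  end
end

cache = TinyCache.new
user = User.new(1, Time.utc(2020, 1, 5, 12, 0, 0), ["ann"])

user.follower_count(cache)                               # miss: stored under the first key
user.follow!("bob", at: Time.utc(2020, 1, 5, 12, 5, 0))
user.follower_count(cache)                               # miss: stored under a new key

puts cache.store.keys
```

After the second call the store holds two entries: the outdated one under the old timestamp and the current one under the new timestamp. Nothing ever deletes or overwrites the old entry; in a real cache it would go away when its expiry is reached.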

Víctor Gil • Jan 6 '20

Ok, now I fully understand it.

So the entries in your cache are actually immutable (both the key and the value never change) and whenever you need to store a new value (i.e., increase the number of followers for a specific user), you just create a new cache entry with the new key (the timestamp being the part which is different from the previous entry key) and also with the new value.
BTW, I previously missed the fact that the last_followed_at is an instance field of the User object.
And as you explained, this way the application code does not bother to delete the outdated cache entries because they will be purged by Redis automatically at the expiration time.

The only drawback I see with this approach is that you are keeping outdated entries in the cache for longer than strictly needed (until expiration time) but you also explained that data space is not a constraint in your case so far, hence, it is a fair compromise in order to remove complexity from the application code.
Everything makes sense now, thank you!

Molly Struve (she/her) • Jan 6 '20

WOOT! Glad I was able to explain it better!

To be clear, we do this for a lot of really simple keys. In the future, for any keys that are very large, we would likely plan to remove them as soon as they become invalid rather than letting them hang around. Or, as you said, we could start removing them more aggressively if cache size becomes a problem.

Amer Mahmud • Jan 10 '20

I am still confused by the reasoning you have provided, Molly. I may be missing something?

Redis inserts or updates a key using the SET command, which automatically overwrites the value if the same key is provided again. So you do not have to worry about writing new code to update: wherever you are currently saving the follower count to Redis under a new key, you could just save it under the same old key?

With your current approach you first need to look up the "last_followed_at" value, and only then can you query the user-follow-count? (The "last_followed_at" value may be part of the User object which has already been retrieved, but you are still looking it up, yes?) But if you have the same key which is always updated, it is never stale.

And as you mention in your follow up comment, that you'll implement deletion for larger keys as they become invalid, but then you are introducing that same code complexity you were trying to avoid? (Though as per my understanding stated above, I don't believe there is code complexity to be added.)

I guess Victor's initial question remains unclear to me, "what are the advantages of having mutable keys when compared with the immutable cache keys"?

Molly Struve (she/her) • Jan 10 '20

The advantage to having the keys change is that you never need to worry about updating or deleting them, which can add a lot of code complexity. Instead, if I have a user with an id and a last_followed_at timestamp, and I use both to build the key that stores follower_count, then any time that last_followed_at timestamp changes (i.e., a follower is added or removed) my cache request will create a new key to store the new count.

Rails.cache.fetch("user-follow-count-#{id}-#{last_followed_at.rfc3339}", expires_in: 1.hour) do 
  followers.count
end

The real advantage here is that Rails has this handy fetch method which, by default, looks for a key: if the key is there, it returns the cached value; if it is not, it runs the block and stores the result under that key. This means we can do ALL the work we need to with this key in this single fetch block rather than having to set up a set AND del command.
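
For readers who have not used fetch, the hit, miss and expiry behaviour can be sketched in plain Ruby as follows. This is an illustration only, not the Rails implementation: `ExpiringCache`, its `Entry` struct and the injectable `clock` are hypothetical names, and `expires_in` is taken as a number of seconds.

```ruby
# Hypothetical sketch of fetch-with-expiry semantics; not the Rails implementation.
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be shown without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @store = {}
  end

  def fetch(key, expires_in:)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now # hit: block is not run

    value = yield                                         # miss or expired: recompute
    @store[key] = Entry.new(value, now + expires_in)
    value
  end
end

now = Time.utc(2020, 1, 10, 9, 0, 0)
cache = ExpiringCache.new(clock: -> { now })
calls = 0

first  = cache.fetch("k", expires_in: 3600) { calls += 1; :computed } # miss
second = cache.fetch("k", expires_in: 3600) { calls += 1; :computed } # hit
now += 3601                                                           # move past the expiry
third  = cache.fetch("k", expires_in: 3600) { calls += 1; :computed } # expired, recomputed

puts calls
```

The block runs only on the first call and again after the entry has expired, which is why a single fetch call can cover both reading and writing when the key itself changes with the data.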