DEV Community

Amir Keshavarz
Cache Replacement Algorithms: How To Efficiently Manage The Cache Storage

Caching

Let's start at the beginning and talk about what caching even is.

Caching is the process of storing data close to where it will be used, rather than fetching it from an expensive origin every time a request comes in.

Caches are everywhere, from your CPU to your browser, so there's no doubt that caching is extremely useful. But implementing a high-performance cache system comes with its own set of challenges. In this post, we'll focus on cache replacement algorithms.

Image: cache diagram (from wikipedia.com)

Cache Replacement Algorithms

We talked about what caching is and how we can utilize it, but there's an elephant in the room: our cache storage is finite, especially in caching environments that use high-performance, expensive storage. So, in short, we have no choice but to evict some objects and keep others.

Cache replacement algorithms do just that. They decide which objects can stay and which objects should be evicted.

After reviewing some of the most important algorithms, we'll go through some of the challenges we might encounter.

LRU

The least recently used (LRU) algorithm is one of the most famous cache replacement algorithms and for good reason!

As the name suggests, LRU keeps the most recently used objects at the top and evicts the objects that haven't been used in a while once the list reaches its maximum capacity.

So it's simply an ordered list where objects are moved to the top every time they're accessed, pushing the other objects down.

LRU is simple and provides a good cache-hit rate for many use cases.
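The ordered-list idea above maps naturally onto Python's `OrderedDict`. Here is a minimal sketch (the class and method names are illustrative, not a standard API):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch: most recently used entries sit at the end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recently used
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Because `OrderedDict` gives O(1) moves and pops at both ends, both `get` and `put` run in constant time.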

Image: LRU (from progressivecoder.com)

LFU

The least frequently used (LFU) algorithm works similarly to LRU, except that it keeps track of how many times an object was accessed instead of how recently.

Each object has a counter that counts how many times it was accessed. When the list reaches the maximum capacity, objects with the lowest counters are evicted.

LFU has a famous problem. Imagine an object that was repeatedly accessed for only a short period. Its counter grows so large compared to the others that it becomes very hard to evict the object, even if it's never accessed again.
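A counter-based LFU can be sketched with a plain dictionary plus a `Counter`. This version does a linear scan to find the victim, which is fine for illustration but not for a large cache; the names are again illustrative:

```python
from collections import Counter

class LFUCache:
    """Minimal LFU cache sketch: evicts the entry with the lowest access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}
        self.counts = Counter()

    def get(self, key):
        if key not in self.store:
            return None
        self.counts[key] += 1
        return self.store[key]

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            # Linear scan for the least frequently used object.
            # Ties are broken arbitrarily here; real implementations
            # often fall back to recency.
            victim = min(self.store, key=lambda k: self.counts[k])
            del self.store[victim]
            del self.counts[victim]
        self.store[key] = value
        self.counts[key] += 1
```

Note how a burst of accesses inflates a counter permanently, which is exactly the stale-object problem described above; variants like LFU with aging decay the counters over time to compensate.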

FIFO

FIFO (first-in, first-out) is also used as a cache replacement algorithm and behaves exactly as you would expect: objects are added to the queue and evicted in the same order. Although it provides a simple and low-cost way to manage the cache, even the most frequently used objects are eventually evicted once they're old enough.

Image: FIFO (from wikipedia.com)

Random Replacement (RR)

This algorithm randomly selects an object to evict when the cache reaches its maximum capacity. It has the benefit of keeping no reference or access history for objects while being very simple to implement.
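Random replacement is short enough to show in full; note that no per-object metadata is kept at all (a sketch with illustrative names):

```python
import random

class RRCache:
    """Minimal random-replacement cache sketch: no access history kept."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            victim = random.choice(list(self.store))  # pick any resident object
            del self.store[victim]
        self.store[key] = value
```

The lack of bookkeeping is why the policy suits hardware caches: eviction needs no counters, timestamps, or ordered lists.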

This algorithm has been used in ARM processors and the famous Intel i860.

The Problem of One-hit Wonders

Let's talk about a problem that occurs in large-scale caching solutions, one-hit wonders.

One-hit wonders are objects that are rarely or never accessed twice. This happens quite often in CDNs, where the number of unique objects is huge and most objects are rarely used again.

This becomes a problem when every bit of storage performance matters. By caching these objects, we pollute our storage with junk, since they're always evicted before they're ever used again. We end up wasting a large amount of resources persisting objects that we're never going to use.

So what's the solution? Unfortunately, there's no silver bullet here. The most common approach is simply not to cache an object the first time it's accessed!

By keeping a list of object signatures, we can cache only the objects that are seen more than once. This might seem odd at first, but overall it improves disk performance significantly.
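The cache-on-second-access idea can be sketched as a thin admission layer in front of any cache. Here a plain dictionary stands in for a real bounded cache, and the `seen` set holds exact signatures (the class and method names are hypothetical):

```python
class SecondHitAdmission:
    """Sketch: admit an object into the cache only on its second access."""

    def __init__(self):
        self.cache = {}    # stand-in for a real bounded cache (e.g. an LRU)
        self.seen = set()  # signatures of objects seen exactly once

    def access(self, key, load):
        if key in self.cache:
            return self.cache[key]
        value = load(key)  # fetch from the expensive origin
        if key in self.seen:
            self.cache[key] = value  # second access: worth caching
        else:
            self.seen.add(key)       # first access: record only, don't cache
        return value

admit = SecondHitAdmission()
admit.access("/img/logo.png", lambda k: b"...")  # first hit: not cached
admit.access("/img/logo.png", lambda k: b"...")  # second hit: cached
```

One-hit wonders now cost one entry in `seen` instead of real cache storage; the next section deals with `seen` itself growing too large.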

After adopting this solution, we immediately encounter another challenge: in many scenarios, the number of object signatures is extremely large, so storing the list itself becomes a problem. In this case, we can use a probabilistic data structure such as a Bloom filter.

I've covered probabilistic data structures in a previous post: Rate Limiting in IPv6 Era Using Probabilistic Data Structures

Image: Bloom filter (from wikipedia.com)

In short, probabilistic data structures like Bloom filters use far less memory, but in return they only give probabilistic answers to our queries. In our case, a Bloom filter is a good fit, since we don't need definite answers to improve our cache performance.
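A toy Bloom filter makes the trade-off concrete: a fixed bit array, a few hash positions per item, and a membership test that can return false positives but never false negatives. The sizes and hashing scheme below are illustrative, not tuned for a real workload:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch for signature tracking."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("/video/123")
print(bf.might_contain("/video/123"))  # True
```

Replacing the exact `seen` set with a filter like this caps memory at a fixed bit array; the occasional false positive merely means one extra object gets cached early, which is harmless for this use case.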

Conclusion

In this post, we went through several cache replacement algorithms and saw how to manage cache storage efficiently. We also covered the famous problem of one-hit wonders and a solution that helps in most cases.
