Kumar Swapnil

Cache miss: Escaping the Storm

In pursuit of speeding up database-driven applications at scale, we alleviate database load with in-memory caching tools like Memcached and Redis.

During peak hours, when the server is under heavy load, trouble starts if a cache entry whose computation is expensive expires (or if we version the cache so that a fresh computation is required for each new key) while a significant number of clients are requesting it concurrently: all of those clients start slamming the database to recompute the value. This puts unnecessary load on the database, latency spikes every time it happens, and in the worst case it can end in congestion collapse. This is known as a cache stampede, or cache miss storm.

To prevent multiple simultaneous computations and avoid a cache stampede, the well-known approaches include probabilistic early expiration, external recomputation, and locking.
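
To give a taste of the first option, probabilistic early expiration can be sketched as below: each reader independently decides, with a probability that rises as the expiry approaches, to recompute the value ahead of time. This is a minimal sketch only; the ShouldRecomputeEarly name, the beta knob and the parameters are illustrative, not part of this article’s implementation:

private static readonly Random _random = new Random();

// delta  : how long the last recomputation took
// expiry : wall-clock time at which the cached value expires
// beta   : tuning knob; values > 1 favor earlier recomputation
private bool ShouldRecomputeEarly(DateTime expiry, TimeSpan delta, double beta = 1.0)
{
    // -log(u) for u in (0, 1] is a non-negative random factor, so the
    // recomputation fires early with rising probability near expiry.
    var earlyBySeconds = delta.TotalSeconds * beta * -Math.Log(1.0 - _random.NextDouble());
    return DateTime.UtcNow.AddSeconds(earlyBySeconds) >= expiry;
}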

In the locking mechanism, the thread that encounters a cache miss sets a global lock and recomputes the cache value; once the value is computed, the lock is released. The threads that lost the race to acquire the lock are left with the following choices (among others):

  • Sleep briefly, then re-fetch.
  • Show the user a page without the expensive content.
  • Keep a stale item in the cache to serve while the new value is recomputed.
  • Wait until the value is recomputed.

The last option is the most efficient: we neither speculate about when the recomputation will finish (as the sleep-based solution does) nor juggle extra moving parts (a stale item and its handling). The approach sounds easy but is a bit tricky to implement. Let’s see how we can achieve it:

// Tracks in-flight recomputations so that losing threads can await them.
private static readonly ConcurrentDictionary<string, Task> _pendingTasks =
    new ConcurrentDictionary<string, Task>();

private async Task<T> Get<T>(string key, TimeSpan cacheDuration, Func<Task<T>> dbCallback)
{
    var cacheObject = await _client.GetAsync<object>(key);
    if (cacheObject != null)
        return (T)cacheObject;

    if (AcquireLock(key))
    {
        // Won the race: recompute the value and publish the task so that
        // concurrent losers can await it instead of hitting the database.
        var task = CreateMemcacheObjectAsync(key, dbCallback, cacheDuration);
        _pendingTasks.TryAdd(key, task);
        var res = await task;
        _pendingTasks.TryRemove(key, out _);
        ReleaseLock(key); // sketched further below
        return res;
    }

    // Lost the race: await the winner's in-flight task if it is still
    // registered, then re-read the freshly stored value from the cache.
    if (_pendingTasks.TryGetValue(key, out var pending))
        await pending;
    return (T)await _client.GetAsync<object>(key);
}

In the above implementation, when there is a cache miss and the thread loses the race to recompute the cache value, it pulls the already-enqueued task, the one computing the value, out of the dictionary and awaits it instead of performing a redundant computation. Once the task completes, we simply return the freshly cached value.

However, if the thread acquires the lock, it simply proceeds with the computation via CreateMemcacheObjectAsync(). This way, we end up calling the database only once per cache miss; a call site might look like the sketch that follows.
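
For illustration, the cache key, the duration and the LoadUserFromDbAsync callback below are hypothetical names, not part of the code above:

// However many requests miss "user:42" concurrently, the database
// callback runs only once; everyone else awaits that computation.
var user = await Get("user:42", TimeSpan.FromMinutes(10),
    () => LoadUserFromDbAsync(42));
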
To keep track of the pending computation tasks, I have simply used a ConcurrentDictionary. The AcquireLock() function looks something like below:

private bool AcquireLock(string key)
{
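    // Memcached's add succeeds only when the key is absent, so exactly
    // one concurrent caller wins this "<key>_lock" entry.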
    return _client.Store(StoreMode.Add, key + "_lock", "lock", TimeSpan.FromMinutes(1));
}

The above method uses Memcached’s add feature to acquire a “ghetto lock”. The add(key, value) operation succeeds only if the cache wasn’t already holding a value for the key, so only one process will acquire this lock, here with a lease time of one minute. One minute should comfortably exceed the estimated maximum duration of the expensive operation, and the lease keeps the lock from being held forever if things go really bad, such as the server crashing. Once the computation is done, we simply release the lock.
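
The release step itself is not spelled out above; a minimal sketch, assuming the same Memcached client and the same "_lock" key convention used by AcquireLock(), could be:

private void ReleaseLock(string key)
{
    // Deleting the lock key lets the next cache miss start a fresh recomputation.
    _client.Remove(key + "_lock");
}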

That’s it. With this small change we can avoid the cascading failure that may arise when the server is pushed to its limits, and keep latency consistent across varying loads.
