With most web applications you can drastically increase performance by caching data that's frequently read across network boundaries. This lesson explores some common caching techniques, and you'll learn how some common tools and libraries provide caching for us.
While caching helps with performance, it can also cause some surprising bugs in applications, and I'll discuss some of those too.
Database course index
This is part of a full course on persistence in Postgres with TypeORM and SQL!
There is a GitHub repo to go with this course. See part 2 for instructions.
Caching
If you look at data flow in an app you can see how latency is reduced when a cache is used. The example below shows how a read might be cached.
Caching is also used outside of networked applications. Computer processors make heavy use of caching, for example. It's orders of magnitude faster to read from RAM than from a hard drive, and there's a similar improvement reading from a processor cache vs. RAM.
Caching will usually make your application infrastructure cheaper to run. You can keep most data on a cheap storage system and only put hot, frequently accessed data on a faster storage system.
Caching helps to smooth out unpredictable access patterns. The effect of even very short caching can be significant.
If you have something that is read heavy but changes often, and you cache it for just one second, you can guarantee that any downstream infrastructure receives at most 60 requests/minute. Any hardware these days can easily handle 60 rpm, and most people using the system won't notice a delay of less than one second for new data.
In the diagram above there are 500k requests/minute hitting some infrastructure without a cache. That would be pretty tough for a decent RDBMS to handle, but Redis could handle it cheaply. In the second diagram you can see the effect of putting a short cache in front of the RDBMS: RDBMS requests are flat and predictable.
Common caching scenarios
Here are some tools you probably use every day that rely on caching.
DNS
The domain name system that runs the internet is built on local caches. Records propagate around the internet with a Time to Live (TTL) set. Your browser first asks the cache on your computer how to find google.com; if that record is stale, the resolver asks your ISP for the DNS record. If your ISP doesn't have a record either, the query eventually goes up to one of the internet's root nameservers.
Having a local cache for common DNS records like google.com results in a much faster browsing experience.
React Query
The React Query library is an excellent data retrieval library for client applications. React Query has built-in read-through caching: each request your application makes is stored in a local cache, and React Query reuses the data in that cache to quickly display data to the user, then refreshes it in the background to reduce the appearance of spinners.
Apps with data stores
Most applications store data somewhere, often a database. Most applications are also read heavy, so we can increase performance and reduce the cost of the data store by putting a cache between the application and the data store.
Common caching patterns
There are a bunch of very common caching patterns. These three are the main ones I see when building web applications.
Cache aside
Cache aside is very common. The application is aware of the cache and acts as the coordinator.
Assuming a cache miss scenario:
- The application checks the cache for data
- The cache returns an empty response
- The application gets the data from the database
- The application stores the data in the cache for the next read
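The steps above can be sketched in a few lines. This is a minimal in-memory sketch, not a production implementation: the `cache` Map stands in for something like Redis, and `fetchUserFromDb` is a hypothetical database call.

```javascript
// Cache-aside sketch: the application coordinates cache and database.
const cache = new Map()

async function fetchUserFromDb(id) {
  // Placeholder for a real database query
  return { id, name: `user-${id}` }
}

async function getUser(id) {
  // 1. Check the cache first
  if (cache.has(id)) {
    return cache.get(id)
  }
  // 2. Cache miss: read from the database
  const user = await fetchUserFromDb(id)
  // 3. Store the result for the next read
  cache.set(id, user)
  return user
}
```

Note that the application owns all the coordination logic here, which is exactly what distinguishes cache-aside from the read-through pattern below.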
Read through caching and Write behind caching
In read through and write behind caching the application only interacts with the cache. The cache layer then communicates with the database layer directly. The application is never aware of the database layer.
Assuming a cache read miss:
- The application queries the cache
- The cache has no data so it queries the datastore
- The cache stores the result and returns it to the client application
Assuming a cache write:
- The application writes data to the cache
- The cache writes the data to the database
These two patterns are not always used together, e.g. you might always read through the cache but write directly to the database. It's just easier to talk about them in one section because the principle is so similar.
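The read-through and write-behind steps above can be sketched together as a small wrapper where the application only ever talks to the cache object. This is a minimal in-memory sketch under assumed names: `ReadThroughCache`, the `store` interface, and the queue-based `flush` are all illustrative, not a real library API.

```javascript
// Read-through / write-behind sketch: the application calls get/set on
// this object and never touches the datastore directly.
class ReadThroughCache {
  constructor(store) {
    this.store = store // backing datastore (hypothetical read/write interface)
    this.entries = new Map()
    this.pendingWrites = [] // queue of writes not yet persisted
  }

  async get(key) {
    if (this.entries.has(key)) return this.entries.get(key)
    // Cache miss: the cache itself reads through to the datastore
    const value = await this.store.read(key)
    this.entries.set(key, value)
    return value
  }

  set(key, value) {
    // Write-behind: update the cache now, persist later
    this.entries.set(key, value)
    this.pendingWrites.push([key, value])
  }

  async flush() {
    for (const [key, value] of this.pendingWrites) {
      await this.store.write(key, value)
    }
    this.pendingWrites = []
  }
}
```

In a real write-behind cache the flush would happen on a timer or a queue rather than an explicit `flush()` call, but the shape is the same: the datastore lags slightly behind the cache.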
The Slonik Postgres client provides an example of read-through caching. It allows you to configure interceptors, which are middleware that run on your query pipeline. You can then apply hints like @cache-ttl 10
in your SQL query for the interceptors to act on. See the documentation at https://github.com/stockholmux/slonik-redis-cache
import { sql, createPool } from 'slonik'
import redis from 'redis'
import redisCache from 'slonik-redis-cache'

const client = redis.createClient({
  /* connection options */
})

const pool = createPool('postgres://localhost:5432/applicationDb', {
  interceptors: [redisCache(client)],
})

pool.connect(async (connection) => {
  const results = await connection.query(
    sql`SELECT * FROM aTable -- @cache-ttl 10`
  )
  console.log(results)
})
An example of the write-through pattern is Redis Enterprise with RedisGears (https://docs.redis.com/latest/modules/redisgears/).
RedisGears allows you to set up Redis to run Python and Java with hooks into the Redis runtime. The write-through hook propagates Redis writes to your datastore.
Cache Invalidation
If your datastore holds newer data than the cache, then the cache is stale. Whether this is an issue is highly dependent on the data and your application. At some stage you will want to invalidate the stale data so that the cached record gets replaced with the new record.
There are excellent articles online describing how to solve this, so I'll just give a brief overview here. Check out https://redis.com/blog/three-ways-to-maintain-cache-consistency/ for a deep dive into cache consistency and cache invalidation.
The following are the common methods used to invalidate cached data. One important thing to note: if you use write-through, your cache should always be up to date.
Time to live
Time to live invalidation is extremely common. You ask the cache to automatically expire an item after a set time period, meaning your data layer will have to go back to the source to refresh the item once it expires.
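A TTL cache can be sketched in a few lines: each entry records an expiry timestamp, and reads past the expiry are treated as misses, forcing a refresh from the source. This is a minimal in-memory illustration; real caches like Redis implement TTL for you (e.g. via expiry options on writes).

```javascript
// TTL invalidation sketch: entries expire after ttlMs milliseconds.
class TtlCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs
    this.entries = new Map()
  }

  set(key, value) {
    // Record when this entry stops being valid
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs })
  }

  get(key) {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (Date.now() > entry.expiresAt) {
      // Expired: delete it and behave like a cache miss
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }
}
```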
Cache on write
For data with a high read rate and that changes infrequently, you can simply update the cache each time you write to your datastore. This invalidation strategy becomes expensive as write frequency increases.
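The idea can be shown in a tiny sketch: every write goes to the datastore and the cache in the same operation, so reads never see a stale entry. The Maps and the `saveProduct`/`getProduct` names are illustrative stand-ins, not a real API.

```javascript
// Cache-on-write sketch: the cache is refreshed on every datastore write.
const productCache = new Map()
const productDb = new Map() // stand-in for a real datastore

function saveProduct(id, product) {
  productDb.set(id, product) // write to the source of truth
  productCache.set(id, product) // refresh the cached copy in the same operation
}

function getProduct(id) {
  // Reads are served from the cache, falling back to the datastore
  return productCache.get(id) ?? productDb.get(id)
}
```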
Cache Eviction
Least recently used (LRU) is an eviction method. It focuses on freeing resources, rather than keeping data fresh, by deleting data from the cache that isn't getting read very often.
There are other common eviction mechanisms:
- Most Recently Used
- Least Recently Used
- First In, First Out
- Last In, First Out
Their usage will be highly dependent on your application.
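LRU eviction is simple enough to sketch. This minimal illustration leans on the fact that a JavaScript Map iterates keys in insertion order, so re-inserting a key on every read keeps the least recently used entry at the front, where it can be evicted once the cache is full.

```javascript
// LRU eviction sketch: a bounded cache that drops the least recently
// used entry when it exceeds maxSize.
class LruCache {
  constructor(maxSize) {
    this.maxSize = maxSize
    this.entries = new Map()
  }

  get(key) {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key)
    // Re-insert so this key becomes the most recently used
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key, value) {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used key: first in iteration order
      const oldest = this.entries.keys().next().value
      this.entries.delete(oldest)
    }
  }
}
```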
Issues to watch out for with caching
Caching is notorious in software development for causing difficult-to-debug issues. If you're debugging an issue and the data doesn't make sense, check for caching somewhere. Caching is a form of state in these scenarios.
These are some of the things I look for when caching is present in a system.
Inappropriate load for caching
If your database is not under significant load adding a database can have a negative effect because the latency between app -> Cache -> DB
can be more than a more traditional app -> DB
not under load
Caching too eagerly
If you cache many things that are not read very often, you will fill up your cache. This makes the cache more expensive to run, and holding that many records can slow it down.
Incorrect configuration
If you cache data for too long or you don’t clear the cached data at the right time then you might return stale data to a customer.
Conclusion
You'll come across caching in most apps. Tweaking the caching and invalidation strategies in use will help you improve user experience and reduce the cost of your infrastructure.
Caching introduces complexity that has to be managed carefully.