Hi, very nice article with real background knowledge. But I think you missed the biggest advantage of a (self-written) in-memory cache: raw performance. There is no blocking I/O, no network syscalls, etc.
I've also developed such a system (it was never really used in production) where I implemented an in-memory cache. In an example like the one you presented, where data is just fetched from a database, there is no real need to deploy an external caching service. By optimizing the MySQL instance and giving it more memory, you may get performance close to Redis, with less complexity. You just have to make sure the internal MySQL cache (queries and results) is big enough to hold as many pages as possible. You would not have to deal with TTLs, cache hits/misses, and so on. Furthermore, you would keep a single source of truth.
You are of course welcome to read about the results I achieved with my in-memory cache (also written in Go). I've also conducted extensive performance evaluations.
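As an aside, the MySQL-side tuning mentioned above mostly comes down to sizing the InnoDB buffer pool so the hot pages (indexes and rows) stay in memory. A rough `my.cnf` sketch (the values are placeholders for illustration, not recommendations; note that MySQL 8.0 removed the query cache, so on the 8.0.15 server used below only the buffer-pool part applies):

```ini
[mysqld]
# Size the buffer pool so the working set (indexes + rows) fits in memory.
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 4
# Allow enough concurrent connections for the expected request rate.
max_connections = 500
```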
Shortcut is a URL shortener written in Go. It's fully featured, with a JSON API and an in-memory cache.
Benchmark
For a URL shortener it's very important to serve as many requests per second as possible. Therefore, a benchmark was conducted.
I measured the speed of shortcut with wrk 4.1.0 in a VMware Workstation 15 virtual machine.
In order to find out what slows down the application, various approaches were pursued.
|            | Host                     | Virtual Machine               |
| ---------- | ------------------------ | ----------------------------- |
| OS         | Windows 10 (1803)        | Fedora 29 (Linux 4.20)        |
| CPU        | Intel i7-7500 Dual Core  | Intel i7-7500 Dual Core       |
| Memory     | 16 GB                    | 4 GB                          |
| Go version |                          | go 1.12 linux/amd64           |
| Database   |                          | MySQL 8.0.15 Community Server |
Original
The original application writes every request to the database (a logging table). Furthermore, every request is logged to stdout.
This version uses UUIDs as primary keys.
```
$ wrk -c 256 -d 5s -t 48 http://localhost:9999/asdf
Running 5s test @ http://localhost:9999/asdf
  48 threads and 256 connections
```
> In an example like the one you presented, where data is just fetched from a database, there is no real need to deploy an external caching service.
Hi, thanks for the feedback. Yeah, you're right that I picked a very simple example, where implementing the cache-aside pattern looks like overkill for the problem space. But I did this for the sake of simplicity. It all comes back to the real use case.
> By optimizing the MySQL instance and giving it more memory, you may get performance close to Redis, with less complexity. You just have to make sure the internal MySQL cache (queries and results) is big enough to hold as many pages as possible. You would not have to deal with TTLs, cache hits/misses, and so on. Furthermore, you would keep a single source of truth.
By tuning MySQL, it should be able to withstand more traffic. But I think there are also trade-offs: for instance, we need to be aware of the maximum number of database connections, and better machine specs cost more money.
I believe there's no silver bullet for this problem. There are benefits to separating the read model: for example, if there's heavy locking on MySQL during writes, the customer-facing app will still respond quickly in a warm-cache situation, versus no cache at all.
> You are of course welcome to read about the results I achieved with my in-memory cache (also written in Go). I've also conducted extensive performance evaluations.
This is interesting. I might have missed how you handle cache invalidation when data is updated (not inserted) in the database during heavy reads. Will the app serve stale data for a long time, until no one is accessing the endpoint anymore?
> This is interesting. I might have missed how you handle cache invalidation when data is updated (not inserted) in the database during heavy reads.
Thanks for the reminder, I may have missed this (this piece of software is already two years old). But the idea, of course, was to update the cache first and the database afterwards.
> Will the app serve stale data for a long time, until no one is accessing the endpoint anymore?
I don't fully understand your question, but I've created a manager goroutine which cleans up the cache based on the configuration here: github.com/davidkroell/shortcut/bl...
As long as a particular piece of data is still being accessed by many users within the configured time window, it stays in the cache, even if an update for that data comes in through another endpoint. So new users get stale data (the old data from before the update).
Eventually, once no one has accessed the data for a period of time, it will be deleted by the cache manager. Correct me if I'm wrong.