You often hear about how fast languages like Rust and Go are. People port all kinds of things to Rust to make them faster. It's common to hear abou...
Having done most of the optimization work on the Rust redis library, I can say that this use case of sending an enormous pipeline of commands is not something I have optimized for. Most of the time I send a single command or just a few, and it is the round-trip time that matters most, which is of course dominated by IO. Even disregarding that, the main overhead isn't in message encoding but rather in the bookkeeping needed to send concurrent commands on the same connection, so optimizing out the remaining encoding overhead hasn't been a priority.
Still, it would be interesting to see how you achieved these impressive results (in case there is something to steal ;) ). But the post contains neither the Crystal Redis library nor the benchmark setup itself :(
Awesome, great to meet you! 🙂
Honestly, I only optimized the Crystal client for heap allocations — I tried to avoid them whenever feasible.
Because `redis::pipe()` doesn't take the connection as an argument, it looks like it's acting as a buffer and sending all the pipelined commands afterward with the `query` method (something like the sketch at the end of this comment). The convention in Crystal is instead to write to I/O streams directly so we don't have to realloc a buffer every time we need to expand it. Instead, the stream has a static buffer, and after the block is complete, I flush the buffer one more time before reading the results back off the socket.

I just published the code on GitHub so you can have a look. The pipeline implementation is here; you can see that it just wraps a connection and overrides `run` (which I believe is the equivalent of `cmd` in the Rust client).

The benchmark setup is in the code within the article. I tried using Cargo's `bench` command, but it told me it was going to take 6800 seconds to complete all of its iterations, so I was like "uhh, nope" and just measured the time it took to run once instead. 😂 I also (elsewhere in the comments) looped over it to get more than a single sample on a warmed-up connection. That reduced the impact of latency even more for both clients, since only the first run had to deal with the TCP handshake and slow start.

If Redis pipelines in the Rust client aren't optimized, I'd be happy to try something that is. I really only used them because, when benchmarking anything with I/O, latency even to `localhost` takes the vast majority of the time, so a benchmark has to run for several minutes to accumulate a meaningful amount of CPU time to compare, especially since the UNIX `time` command only has 10-millisecond granularity for CPU time.

Does Rust have anything that measures CPU time internally, using something like getrusage, for fine-grained measurements?
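For reference, a minimal sketch of the buffering behavior I mean, assuming redis-rs's synchronous API (the key names and values here are just placeholders):

```rust
fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;

    // pipe() only accumulates commands in an in-memory buffer;
    // nothing hits the socket until query() is called.
    let (counter, value): (i64, String) = redis::pipe()
        .cmd("SET").arg("count").arg(42).ignore()
        .cmd("INCR").arg("count")
        .cmd("GET").arg("count")
        .query(&mut con)?;

    assert_eq!(counter, 43);
    assert_eq!(value, "43");
    Ok(())
}
```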
I figured as much! `redis-rs` could do that as well, at least in the synchronous API. The async API can't, however, since it may receive concurrent requests and it must make sure that each request is written in its entirety without interleaving.

Since I only use the async implementation, I have to accept the buffering in `pipe` or `cmd` (at least I haven't come up with a way to skip the allocations for the buffer), so changing the API for the synchronous implementation isn't on my radar.

Thanks! Another thing that helps Crystal here is that, since commands are written immediately, the Redis server will start processing them immediately, which gives much better end-to-end timing. The async implementation is capable of the same thing by simply issuing individual commands (see the sketch below); however, it naturally has more overhead, as each request and response is passed through a channel (which allows requests to be made concurrently from multiple threads).

For raw throughput, the pipeline as used here is still the best way in redis-rs; it just isn't something I have optimized for since, as you say, IO is such a huge overhead (and even more so for smaller pipelines).
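A rough sketch of issuing individual commands, assuming redis-rs's async API with the tokio feature enabled (connection method names vary a bit between versions):

```rust
use redis::AsyncCommands;

#[tokio::main]
async fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    // The multiplexed connection routes every request and response
    // through a channel so multiple tasks can share one socket.
    let mut con = client.get_multiplexed_async_connection().await?;

    // Each command is written as soon as it's issued, so the server
    // can start working immediately instead of waiting on a pipeline.
    for i in 0..1_000 {
        let _: () = con.set(format!("key{i}"), i).await?;
    }
    Ok(())
}
```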
Not really, though you can of course call any C library (there might already be Rust bindings, I guess). I usually don't look at CPU time; I just use github.com/bheisler/criterion.rs to get good timings for comparison, and `perf` + github.com/KDAB/hotspot/releases to track down where that CPU time goes.
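A minimal Criterion setup looks something like this; `encode_get` here is a made-up stand-in for whatever you want to measure:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical code under test: encode a RESP GET command.
fn encode_get(buf: &mut Vec<u8>, key: &str) {
    buf.extend_from_slice(b"*2\r\n$3\r\nGET\r\n");
    buf.extend_from_slice(format!("${}\r\n{}\r\n", key.len(), key).as_bytes());
}

fn bench(c: &mut Criterion) {
    c.bench_function("encode GET", |b| {
        let mut buf = Vec::with_capacity(64);
        b.iter(|| {
            buf.clear();
            encode_get(&mut buf, "hello");
        })
    });
}

criterion_group!(benches, bench);
criterion_main!(benches);
```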
Ah, okay. I'd been seeing a bunch of stuff on Twitter about how Rust has been favoring async I/O, and I saw what looks like some Python-style `aio` in `redis-rs`. So all this makes a whole lot more sense to me now. Thank you for clarifying some of this stuff!

I'm gonna keep checking some other libraries in Rust and Go so I can get a better picture of the performance landscape among the three languages. This was just one chapter of that story, and I really appreciate you being a part of it. 🙂
Looking forward to seeing the Crystal client open sourced. Please post again when you get that far!
This kind of peer review can be a big help to the community -- thanks for taking the time to write this up!
Just published! 🙂
That's awesome, thank you!
Doesn't comparing a heavily optimized Redis client for Crystal with an average one in Rust defeat the purpose of the benchmark?
I love Crystal, but I’m not sure how accurate this test might be
I agree!
The idea that the Rust client, with commits from 63 people and 1500 GitHub stars, has not had any optimizations applied seems a bit presumptuous.
If it helps ease your mind, I ran the benchmark code against the other Crystal Redis client I linked in the article, which is not optimized for heap allocations the way mine is. The only differences in the benchmark code are `s/::Connection//` and `s/pipeline/pipelined/`. The code is otherwise identical.

Here is the result: only about 8% slower overall and 42% slower at the CPU (200ms vs 140ms) for the "average" Crystal Redis client. If Rust and Crystal were actually closer in performance, this is the sort of difference I would have expected. I actually expected Rust to be within ±30%, but I was off by an entire order of magnitude.
I'm really surprised at this. I honestly thought that Rust was hard to beat on performance, and that code like Crystal's, which definitely looks like scripting, couldn't beat Rust. I hope you continue with these articles. I think Crystal is a hidden gem; it's not (yet) getting the attention it deserves.
BTW, it's great to see you interacting with the people who created the Rust library and exchanging wisdom. Open source rocks!!!
Even if Crystal code looks like scripting (it's heavily inspired by Ruby), it's still a compiled language, so it makes sense that its performance is waaaaaay better than what you can expect from scripting languages :)
Maybe a long-running process will show a decrease in Crystal performance due to the garbage collector.
I wrapped the Redis pipeline code to run it 10x in the same process, so it runs 4 million commands in 10 pipelines, but it didn't significantly change how long either app took. To summarize the results: the CPU-time ratio becomes 3.13 CPU seconds for Rust vs 1 CPU second for Crystal, tipping the scales even more in Crystal's favor.
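A sketch of that outer loop on the Rust side, assuming redis-rs's synchronous pipeline API; the SET commands stand in for the article's actual workload:

```rust
fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;

    // 10 pipelines of 400k commands each = 4M commands, all on one
    // warmed-up connection; only the first run pays for the TCP
    // handshake and slow start.
    for _ in 0..10 {
        let mut pipe = redis::pipe();
        for i in 0..400_000 {
            pipe.cmd("SET").arg(i).arg(i).ignore();
        }
        pipe.query::<()>(&mut con)?;
    }
    Ok(())
}
```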
The GC is already running in the benchmark here. My guess is that Crystal actually allocates less memory than Rust for some reason (maybe the Rust client isn't well optimized).
Great comparison. What happens if you enable LTO in rust?
I'm not sure what that is. Is that the same as the `--release` flag?

Add:
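```toml
# Cargo.toml: turn on link-time optimization for release builds
[profile.release]
lto = true
```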
to your Cargo.toml to enable it. It allows more optimizations between crates at the cost of longer compile time, though it's unlikely to give a 2x improvement.
Sorry, just saw your answer after I had practically typed the same thing. I agree that it's unlikely to give a 2x speedup.
LTO stands for link-time optimization, which is a great feature of LLVM (thus, rustc). You can enable it in your Cargo.toml:
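```toml
# "fat" LTO: link-time-optimize the whole dependency graph together
[profile.release]
lto = "fat"
```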
The above will make `--release` builds use "fat" LTO, meaning all dependencies and the project itself are link-time optimized together (you could set it to "thin" instead, which runs the faster ThinLTO variant and trades a little optimization for much shorter link times).

Another option to go even further is PGO (profile-guided optimization), but that is a bit more involved and I haven't tried it with Rust. Here is some documentation if you are interested: doc.rust-lang.org/rustc/profile-gu...
Combining both can go pretty far in optimizing performance.