Discussion on: Performance Comparison, Rust vs Crystal with Redis

Having done most of the optimization of the rust redis library I can say that this use case of sending an enormous pipeline of commands is not something I have optimized for.

Awesome, great to meet you! 🙂

Honestly, I only optimized the Crystal client for heap allocations — I tried to avoid them whenever feasible.

Still, it would be interesting to see how you achieve these impressive results (in case there is something to steal ;) ). But the post contains neither the Crystal Redis library nor the benchmark setup itself :( .

Because redis::pipe() doesn't take the connection as an argument, it looks like it's acting as a buffer and sending all the pipelined commands afterward with the query method. The convention in Crystal is instead to use I/O streams directly so we don't have to realloc a buffer every time we need to expand it. Instead, the stream has a static buffer. And then after the block is complete, I flush the buffer one more time before reading the results back off the socket.
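To make the streaming approach concrete, here is a minimal sketch of the same idea written in Rust rather than Crystal. It is not the code of either client: `write_command` and `stream_pipeline` are hypothetical names, and the 8 KiB capacity is an arbitrary assumption. Commands are encoded in the Redis wire format straight into a fixed-capacity `BufWriter`, which flushes itself to the connection whenever it fills up, so no growing buffer is ever reallocated.

```rust
use std::io::{self, BufWriter, Write};

/// Encode one command as a RESP array of bulk strings, writing it
/// directly to `out` instead of building an intermediate buffer.
fn write_command<W: Write>(out: &mut W, args: &[&[u8]]) -> io::Result<()> {
    write!(out, "*{}\r\n", args.len())?;
    for arg in args {
        write!(out, "${}\r\n", arg.len())?;
        out.write_all(arg)?;
        out.write_all(b"\r\n")?;
    }
    Ok(())
}

/// Hypothetical streaming pipeline: every command goes through a
/// fixed-capacity buffer that drains to `conn` each time it fills.
/// Returns how many replies the caller should read back afterwards.
fn stream_pipeline<W: Write>(conn: &mut W, commands: &[&[&[u8]]]) -> io::Result<usize> {
    // 8 KiB is an assumed size, not taken from either library.
    let mut out = BufWriter::with_capacity(8 * 1024, conn);
    for cmd in commands {
        write_command(&mut out, cmd)?;
    }
    // One final flush pushes out whatever is still sitting in the buffer.
    out.flush()?;
    Ok(commands.len())
}
```

With a real `TcpStream` as `conn`, the server can start on the first commands while later ones are still being encoded, and the caller reads the returned number of replies off the socket once the function returns.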

I just published the code on GitHub so you can have a look. The pipeline implementation is here — you can see that it just wraps a connection and overrides run (which I believe is the equivalent of cmd in the Rust client).

The benchmark setup is in the code within the article. I tried using Cargo's bench command but it told me it was going to take 6800 seconds to complete all of its iterations, so I was like "uhh, nope" and just measured the time it took to run once instead. 😂 I also (elsewhere in the comments) looped over it to get more than a single sample on a warmed-up connection. It reduced the impact of latency even more on both clients since only the first run had to deal with TCP handshake and slow start.

If Redis pipelines in the Rust client aren't optimized, I'd be happy to try something that is. I really only used it because the trouble with benchmarking anything with I/O is that latency, even to localhost, takes the vast majority of the time, so a benchmark has to run for several minutes to get a meaningful amount of CPU time to compare, especially since the UNIX time command only has 10-millisecond granularity for CPU time.

Does Rust have anything that measures CPU time internally using something like getrusage for fine-grained measurements?

Markus Westerlind • Jun 27 '20

Because redis::pipe() doesn't take the connection as an argument, it looks like it's acting as a buffer and sending all the pipelined commands afterward with the query method. The convention in Crystal is instead to use I/O streams directly so we don't have to realloc a buffer every time we need to expand it. Instead, the stream has a static buffer. And then after the block is complete, I flush the buffer one more time before reading the results back off the socket.

I figured as much! redis-rs could do that as well, at least in the synchronous API. The async API can't however since it may receive concurrent requests and it must make sure that each request is written in its entirety without interleaving.

Since I only use the async implementation I have to accept the buffering in pipe or cmd (at least I haven't come up with a way to skip the allocations for the buffer) so changing the API for the synchronous implementation isn't on my radar.
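The interleaving constraint can be illustrated with a simplified sketch. It uses plain threads and a mutex instead of the channel that the async redis-rs implementation passes requests through, so it shows the constraint only, not the library's design; `encode`, `send_whole`, and `run_senders` are hypothetical names. Each sender encodes its request into its own buffer first and then hands the whole frame to the shared connection in one locked write, which is why the per-request buffer is hard to avoid once senders are concurrent.

```rust
use std::io::{self, Write};
use std::sync::Mutex;
use std::thread;

/// Encode one command as a RESP array into a private buffer.
fn encode(args: &[&[u8]]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(format!("*{}\r\n", args.len()).as_bytes());
    for arg in args {
        buf.extend_from_slice(format!("${}\r\n", arg.len()).as_bytes());
        buf.extend_from_slice(arg);
        buf.extend_from_slice(b"\r\n");
    }
    buf
}

/// Write a fully encoded request while holding the lock, so bytes
/// from two requests can never be mixed on the wire.
fn send_whole<W: Write>(conn: &Mutex<W>, request: &[u8]) -> io::Result<()> {
    let mut guard = conn.lock().unwrap();
    guard.write_all(request)
}

/// Spawn `threads` senders (at most 10, so each id fits in one digit)
/// that each send `per_thread` ECHO commands to a shared in-memory
/// "connection", and return everything that was written.
fn run_senders(threads: u8, per_thread: usize) -> Vec<u8> {
    let conn = Mutex::new(Vec::new());
    thread::scope(|s| {
        for id in 0..threads {
            let conn = &conn;
            s.spawn(move || {
                let digit = [b'0' + id];
                let request = encode(&[&b"ECHO"[..], &digit[..]]);
                for _ in 0..per_thread {
                    send_whole(conn, &request).unwrap();
                }
            });
        }
    });
    conn.into_inner().unwrap()
}
```

If the senders wrote their arguments piece by piece straight to the connection, as the streaming pipeline does, the frames of different threads could end up interleaved and the server would see garbage.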

I just published the code on GitHub so you can have a look. The pipeline implementation is here — you can see that it just wraps a connection and overrides run (which I believe is the equivalent of cmd in the Rust client).

Thanks! Another thing that helps Crystal here is that, since commands are written immediately, the Redis server starts processing them immediately, which gives a much better end-to-end timing. The async implementation is capable of the same thing by simply issuing individual commands; however, it naturally has more overhead, as each request and response is passed through a channel (which allows requests to be made concurrently from multiple threads).

If Redis pipelines in the Rust client aren't optimized, I'd be happy to try something that is. I really only used it because the trouble with benchmarking anything with I/O is that latency, even to localhost, takes the vast majority of the time, so a benchmark has to run for several minutes to get a meaningful amount of CPU time to compare, especially since the UNIX time command only has 10-millisecond granularity for CPU time.

For raw throughput, the pipeline as you use it is still the best way in redis-rs; it just isn't something I have optimized for since, as you say, I/O is such a huge overhead (and more so for smaller pipelines).

Does Rust have anything that measures CPU time internally using something like getrusage for fine-grained measurements?

Not really, though you can of course call any C library (there might be Rust bindings already, I guess). I usually don't look at CPU time; I just use github.com/bheisler/criterion.rs to get good timings for comparison, and perf + github.com/KDAB/hotspot/releases to track down where that CPU time goes.
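For anyone who wants the getrusage route, calling the C function directly could look roughly like the sketch below. It assumes 64-bit Linux, where both fields of `struct timeval` are C `long`s and the two timevals in `struct rusage` are followed by 14 `long` counters; on other platforms the layout differs, and a maintained binding crate is the safer choice. `TimeVal`, `RUsage`, and `cpu_time` are names made up for this example.

```rust
use std::io;
use std::os::raw::{c_int, c_long};
use std::time::Duration;

// Layout of `struct timeval` on 64-bit Linux (an assumption, see above).
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct TimeVal {
    tv_sec: c_long,
    tv_usec: c_long,
}

// `struct rusage` on 64-bit Linux: user time, system time, then 14
// `long` counters that this sketch does not look at.
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct RUsage {
    ru_utime: TimeVal,
    ru_stime: TimeVal,
    _other: [c_long; 14],
}

const RUSAGE_SELF: c_int = 0;

extern "C" {
    // Provided by the C library that std already links against.
    fn getrusage(who: c_int, usage: *mut RUsage) -> c_int;
}

fn to_duration(tv: TimeVal) -> Duration {
    Duration::new(tv.tv_sec as u64, tv.tv_usec as u32 * 1000)
}

/// Returns (user, system) CPU time consumed by this process so far.
fn cpu_time() -> io::Result<(Duration, Duration)> {
    let mut usage = RUsage::default();
    // SAFETY: `usage` is a valid, writable struct with the layout
    // assumed above, and it outlives the call.
    let rc = unsafe { getrusage(RUSAGE_SELF, &mut usage) };
    if rc != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok((to_duration(usage.ru_utime), to_duration(usage.ru_stime)))
}
```

Calling `cpu_time()` before and after the benchmarked section and subtracting the two readings gives the CPU time spent in between, reported in microseconds rather than the coarser figures printed by time.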

Jamie Gaskins • Jun 28 '20

The async API can't however since it may receive concurrent requests and it must make sure that each request is written in its entirety without interleaving.

Ah, okay. I'd been seeing a bunch of stuff on Twitter about how Rust has been favoring async I/O and I saw what looks like some Python-style aio in redis-rs. So all this makes a whole lot more sense to me now. Thank you for clarifying some of this stuff!

I'm gonna keep checking some other libraries in Rust and Go so I can get a better picture of the performance landscape among the 3 languages. This was just one chapter of that story and I really appreciate you being a part of it. 🙂