DEV Community

loading...

Discussion on: Incident Retro: Failing Comment Creation + Erroneous Push Notifications

Collapse
jgaskins profile image
Jamie Gaskins

I like that you enjoy optimizing things and golfing code down to fit in tweets. 🙂 However, like many optimizations, this one is more complex than simple latency. KEYS would indeed have had lower latency, but it locks the Redis event loop for the duration, so we avoid that command wherever possible. To put it into perspective, your KEYS query took 7 seconds in your benchmark (we had almost 5M keys in Redis, so it would've actually taken longer, but I'll work with your numbers for the moment). In that amount of time we normally execute over 10,000 Redis queries, so running KEYS would introduce a pretty sizable head-of-line-blocking issue. On top of that, every HTTP request that makes it past Fastly to our backend hits this Redis instance at least once, so blocking Redis means blocking all HTTP requests. Since we don't want to do that, Redis needs to be available af. We also don't allow other O(n) calls where n can ever be nontrivial for that reason, so nothing like LRANGE my-list 0 -1, either.

In fairness, it almost certainly doesn't lock the main Redis thread for the full amount of time it takes the Ruby app to receive the entire list of keys (we're using Redis 6, which does I/O in a separate thread), but without any insight into how much of that time is spent in the main Redis event loop, we don't take chances. Since this issue impacted only a tiny fraction of our total traffic, we needed to continue optimizing for availability of Redis over shaving a few seconds off of the latency of the solution.

The code Molly posted was adapted from this worker. I didn't know about Redis#scan_each, though. That completely obviates our manual tracking of the cursor, which I really like.

Collapse
joshcheek profile image
Josh Cheek

I like that you enjoy optimizing things and golfing code down to fit in tweets.

Thanks ^_^

[comment about keys blocking]

I agree with the analysis and conclusions here.

The code Molly posted was adapted from this worker. I didn't know about Redis#scan_each, though. That completely obviates our manual tracking of the cursor, which I really like.

Aye, +1 for scan_each. Also, note that on the last iteration, when the cursor is "0", there can be keys that were returned with the cursor, which the worker won't expire, b/c it won't execute the loop body on that last response.

eg:

$ ruby -r redis -e '
  redis = Redis.new

  p count_from_keys: redis.eval(%(return #redis.call("keys", "rpush:notifications:*")), [0])

  cursor = count = 0
  until (cursor, keys = redis.scan(cursor, match: "rpush:notifications:*")).first.to_i.zero?
    count += keys.size
  end
  p count_from_loop: count, final_key_size: keys.size
'
{:count_from_keys=>1200000}
{:count_from_loop=>1199998, :final_key_size=>2}
Enter fullscreen mode Exit fullscreen mode
Thread Thread
jgaskins profile image
Jamie Gaskins

Ooh, nice catch! I'll go ahead and port that worker over to use scan_each. Thanks Josh!