GC compaction: Behold your hash keys!

or: The release of trailblazer-activity-0.16.4

Apparently, recent versions of Ruby (>= 3.2) offer a feature to save memory in production: garbage collection compaction. According to bug reports from a bunch of Trailblazer power users, triggering the compaction manually, once the application has loaded, is now a thing in the Rails world.

# app has eager-loaded all dependencies.
GC.verify_compaction_references(expand_heap: true, toward: :empty)

Basically, the compaction minimizes fragmentation of the heap: objects are moved around, the pointers to them are updated, and memory is freed.

While this sounds like a great thing to do, an actual bug in Ruby led to runtime errors coming in from applications using this pattern together with Trailblazer.

NoMethodError: undefined method `[]' for nil
      # trailblazer-activity/lib/trailblazer/activity/circuit.rb:80:in `next_for'

Now, at first glance, this looks like a problem in Trailblazer's heart, the activity gem, which implements the runtime object for an operation. Every time you're running a Trailblazer::Operation, the internal activity will execute a particular step and check for the next step to be invoked.

Hash keys do matter

A "step" in an activity can literally be any callable object. In Trailblazer core code, we often use the pattern of method objects to implement step logic.

class MyOperation < Trailblazer::Operation
  step task: Validation.method(:extract_params)
  step task: Validation.method(:validate)
end

Deep inside the operation's activity (well, it's actually not really deep), a hash is created that looks roughly as follows.

circuit = {
  #<Method: Validation.extract_params> => {
    Right: #<Method: Validation.validate>,
    Left: ...
  },
  #<Method: Validation.validate> => {...}
}

As you can see, the gist of Trailblazer is actually a hash of steps, each pointing to its possible outcomes and the "next" step for each outcome. Super simple stuff!
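
To make that a bit more tangible, here's a minimal sketch of how such a hash could be walked at runtime. This is not the actual trailblazer-activity code: the bodies of the `Validation` methods, the `run_circuit` helper and the `:success`/`:failure` terminal symbols are all made up for illustration.

```ruby
# Hypothetical stand-ins for the steps used above.
module Validation
  def self.extract_params(ctx)
    ctx[:params] = ctx.fetch(:input, {})
    :Right
  end

  def self.validate(ctx)
    ctx[:params].key?(:name) ? :Right : :Left
  end
end

extract  = Validation.method(:extract_params)
validate = Validation.method(:validate)

# step => {outcome => next step}. Terminal outcomes are plain symbols
# in this sketch (an assumption, the real gem models them differently).
circuit = {
  extract  => {Right: validate, Left: :failure},
  validate => {Right: :success, Left: :failure},
}

# Run a step, look up the next one by the returned outcome, repeat
# until a terminal symbol is reached.
def run_circuit(circuit, start, ctx)
  step = start
  while step.is_a?(Method)
    outcome = step.call(ctx)
    step    = circuit[step][outcome] # hash lookup with a Method as key
  end
  step
end

run_circuit(circuit, extract, {input: {name: "Trailblazer"}})
```

Note the `circuit[step][outcome]` lookup: should `circuit[step]` ever return nil, the subsequent `[]` call raises exactly the kind of NoMethodError shown earlier.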

The problem we now faced was that we use Method instances as hash keys. This worked fine until people started using GC compaction: the compaction erroneously changed those method hash keys so that they were now pointing to ...nothing, crashing the running operation with a NoMethodError.

def a; end
hsh = {method(:a) => 1}

hsh[method(:a)]
 => 1 

GC.verify_compaction_references(expand_heap: true, toward: :empty)

hsh[method(:a)]
 => nil

Keep in mind that this problem only arose when people were deploying the compaction "trick". I personally didn't even know about this new GC feature.
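
In case you wonder why a fresh `method(:a)` finds the entry at all: a Hash locates its keys via `#hash` and `#eql?`, not via object identity, and two Method objects for the same receiver and method count as equal. Here's a small sketch of that, without running any compaction. The `Steps` module and the `same_key?` helper are hypothetical names, only used for this example.

```ruby
# Hypothetical module, just to have a method to grab.
module Steps
  def self.a; end
end

# True if a Hash would treat both objects as the same key.
def same_key?(left, right)
  left.hash == right.hash && left.eql?(right)
end

first  = Steps.method(:a)
second = Steps.method(:a)

first.equal?(second)     # false, these are two distinct Method objects
same_key?(first, second) # true, so either one finds the entry

hsh = {first => 1}
hsh[second] # the lookup from above, with a freshly created Method
```

Once the two values stop agreeing for one and the same method, the lookup silently returns nil, which is what the snippet above shows after compaction.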

It's all your fault!

As if that runtime error wasn't enough, bringing us several dramatic bug reports, people now started challenging our code design, asking "why are you using a method instance as a hash key, that's not good style!" or something along those lines.

However, it turned out that this is really a bug in Ruby, and it will be fixed in Ruby 3.2.7, 3.3.7 and 3.4.0. While we're at it, I'd love to thank the Ruby core team for their swift responses and their efficiency at fixing this. It took Peter Zhu only a few hours. I wish I was as motivated as this gentleman.

Quick fix

If you happen to be stuck with a Ruby version that doesn't include the compaction fix, you can update to trailblazer-activity-0.16.4 and include our temporary fix.

# For Ruby <3.2.7, <3.3.7, <3.4.0 
# initializers/trailblazer.rb
require "trailblazer/activity/circuit/ruby_with_unfixed_compaction"
Trailblazer::Activity::Circuit.prepend(Trailblazer::Activity::Circuit::RubyWithUnfixedCompaction)
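
If the same initializer runs on several Ruby versions, you might want to apply the fix only where it's needed. The following is a sketch under assumptions, not something shipped with the gem: `unfixed_compaction?` is a hypothetical helper, and it simply encodes the version list from the comment above (anything below 3.2.7, 3.3.x below 3.3.7, nothing from 3.4.0 on).

```ruby
require "rubygems" # provides Gem::Version, usually loaded already

# Hypothetical helper, not part of trailblazer-activity.
# Returns true for Ruby versions that don't contain the compaction fix,
# assuming the fixed releases are 3.2.7, 3.3.7 and 3.4.0.
def unfixed_compaction?(ruby_version = RUBY_VERSION)
  version = Gem::Version.new(ruby_version)

  return false if version >= Gem::Version.new("3.4.0")
  return version < Gem::Version.new("3.3.7") if version >= Gem::Version.new("3.3.0")

  version < Gem::Version.new("3.2.7")
end

# Possible usage in initializers/trailblazer.rb:
#
#   if unfixed_compaction?
#     require "trailblazer/activity/circuit/ruby_with_unfixed_compaction"
#     Trailblazer::Activity::Circuit.prepend(Trailblazer::Activity::Circuit::RubyWithUnfixedCompaction)
#   end
```

Once you've upgraded Ruby, the check returns false and the initializer becomes a no-op that you can safely delete.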