Faster module tests with Facter 4 and rspec-puppet

#puppet #ruby #rspec #testing

The latest major version of Facter, Puppet's tool for collecting system information, has been out for some time now, and we've been hard at work fixing bugs ever since. Facter has to run on a wide variety of operating systems and architectures, which makes maintenance quite a challenge.

Returning to Ruby as the language of choice (Facter 3 was written in C++) gave us a lot more freedom in structuring and writing the code, mostly at the expense of added run time. For how most people use Facter, this isn't an issue. Puppet, for example, loads all available facts at once when it runs, so a few extra seconds make no difference to Puppet runs that already take minutes. We took this into account when deciding to return to Ruby with Facter 4 (Facter was initially written in Ruby, then rewritten in C++ for improved performance). What we didn't take into account were the intricacies of how Facter interacts with downstream projects we weren't aware of, such as the rspec-puppet test framework.

What is `rspec-puppet`?

Well, RSpec is a well-known test framework for Ruby, and Puppet is... well, you probably already know or you wouldn't be here reading these words.

rspec-puppet is the tool of choice when writing unit tests for Puppet modules. It provides a helpful syntax for interacting with Puppet catalogs in an RSpec way. Since it's infeasible to acceptance-test Puppet functionality on dozens of operating systems and versions, rspec-puppet works around this by using Facter to trick Puppet into thinking it is running on different OS configurations.

Puppet gets almost all of its system-related information from facts. For example, if you're running Linux and are curious how your Puppet manifest would behave on macOS, in most cases it's enough to feed Facter a bunch of macOS facts. Of course, the underlying implementation is more complicated than that, especially for Windows, but in a nutshell this is how rspec-puppet works.

For each test, rspec-puppet stubs the fake facts using the custom facts API. It gets the fake facts from facterdb, a gem that contains "dummy" facts for a variety of operating systems and Facter versions (basically lots of files containing `facter --json` output). The resulting information is then fed to Puppet for catalog compilation.
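As a rough illustration of that flow, here is a minimal sketch that parses a facterdb-style fact set and hands every fact to a registration callback. The JSON values, the `stub_facts` helper, and the `registrar` callback are assumptions made up for this example; they are not rspec-puppet's or facterdb's actual code.

```ruby
require 'json'

# Stand-in for one facterdb fact set: the kind of JSON that `facter --json`
# prints. The values are invented for illustration.
FACTER_JSON = <<~JSON
  {
    "kernel": "Darwin",
    "os": { "family": "Darwin", "name": "Darwin" },
    "ipaddress": "192.0.2.10"
  }
JSON

# Hypothetical helper: parse a fact set and pass each top-level fact to a
# registration callback. With the real Facter, the callback would be roughly
# Facter.add(name, weight: 999) { setcode { value } }.
def stub_facts(json, registrar)
  JSON.parse(json).each { |name, value| registrar.call(name, value) }
end

# Collect the facts in a plain hash so the sketch runs without Facter.
registered = {}
stub_facts(FACTER_JSON, ->(name, value) { registered[name] = value })

puts registered.keys.join(', ')
```

The callback keeps the sketch independent of any gem; only the parsing and the one-registration-per-fact loop are meant to mirror the description above.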

Getting slower and slower...

Over time, people noticed that module tests ran much slower with Facter 4 than with Facter 2. Module tests skipped Facter 3 altogether: even though Facter 3 ships a compatible Ruby module, it is ultimately C++ code that doesn't fit into the Ruby ecosystem (you can't gem install it like you would any other gem).

After some investigation, it turned out that Facter 4 was evaluating the underlying core facts even when they had been overridden by custom facts. The first instinct would be to classify this as a bug, but a closer look showed us that this behavior was intentional, to stay fully compatible with Facter 3. And because modules never ran tests with Facter 3, this wasn't a problem until now.

Overriding a core fact, similar to how rspec-puppet does:

Facter.add(:ipaddress, weight: 999) do
  setcode { '1.1.1.1' }
end

will return the correct value (1.1.1.1) but it will cause Facter to also load the core ipaddress fact, ultimately resolving all networking facts, which means executing system commands.

Assuming our custom ipaddress fact is defined inside the custom_facts directory, here's what gets called when we attempt to resolve the fact:

$ FACTERLIB=$PWD/custom_facts strace -f -eexecve facter ipaddress
[pid 1777063] execve("/usr/bin/ip", ["ip", "-o", "link", "show"], 0x563bf6b7acf0 /* 110 vars */) = 0
[pid 1777067] execve("/usr/bin/ip", ["ip", "link", "show"], 0x563bf64eafa0 /* 110 vars */) = 0
[pid 1777068] execve("/usr/bin/dhcpcd", ["/usr/bin/dhcpcd", "-U", "lo"], 0x563bf6aa57b0 /* 110 vars */) = 0
# ... dhcpcd is called for every network interface on the system
# I removed the other calls for brevity

[pid 1777078] execve("/usr/bin/ip", ["ip", "route", "show"], 0x563bf6aba230 /* 110 vars */) = 0
[pid 1777079] execve("/usr/bin/ip", ["ip", "-6", "route", "show"], 0x563bf6ae37d0 /* 110 vars */) = 0

Doing this hundreds or even thousands of times per test suite adds up to a significant increase in run time, and it's wasted work since the resolved core facts are never used.

Fixing things without breaking more things

Modifying Facter's behavior, undocumented as it was, was a no-go from the start, as we had found out before (Facter is a good example of Hyrum's law in action). So we had to think of other ways to improve performance.

We started by decoupling Puppet from Facter as much as we could, introducing the possibility of having multiple Facter backends. While Puppet would use the default Facter implementation when running on its own, external users would be able to define and pass their own Facter implementation when initializing Puppet, similar to how puppetserver configures Puppet to use its JRuby-compliant HTTP client.
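The idea of a swappable backend can be sketched as a small registry with a default. Everything here (`Runtime`, `SystemFacts`, `CannedFacts`) is a hypothetical name chosen for illustration and is not Puppet's actual API.

```ruby
# Hypothetical registry of pluggable implementations, keyed by name.
class Runtime
  def initialize
    @impls = {}
  end

  # Register a replacement implementation under a name such as :facter.
  def []=(name, impl)
    @impls[name] = impl
  end

  # Return the registered implementation, or the given default if none is set.
  def fetch(name, default)
    @impls.fetch(name, default)
  end
end

# Stand-in for the default backend, which would query the real system.
class SystemFacts
  def value(name)
    "resolved #{name} from the system"
  end
end

# Stand-in for a test backend that only returns canned values.
class CannedFacts
  def initialize(facts)
    @facts = facts
  end

  def value(name)
    @facts[name.to_s]
  end
end

runtime = Runtime.new
default = SystemFacts.new

# Nothing registered yet, so the default backend answers.
puts runtime.fetch(:facter, default).value(:ipaddress)

# A test harness swaps in its own backend before running.
runtime[:facter] = CannedFacts.new('ipaddress' => '1.1.1.1')
puts runtime.fetch(:facter, default).value(:ipaddress)
```

Callers only ever ask the registry for "the facter backend", so they don't need to know which implementation answers.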

To avoid breaking the Facter API, we ended up implementing an overcomplicated way of interacting with a hash. Using our dumb Facter backend, custom facts were now simply added to a hash, and querying them would just produce them from the hash if available:

class FacterTestImpl
  def initialize
    @facts = {}
  end

  def value(fact_name)
    @facts[fact_name.to_s]
  end

  def add(name, options = {}, &block)
    raise 'Facter.add expects a block' unless block_given?
    @facts[name.to_s] = instance_eval(&block)
  end
  ...
end

With our custom implementation we bypassed Facter altogether. This managed to bring us back to Facter 2 speeds, which behaved similarly by just returning the custom fact's value without resolving any additional facts.
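The class above elides part of its body, so it can't be run as shown. Below is a self-contained sketch of the same idea: a fact store that is nothing more than a hash. The `HashFacts` and `Resolution` names, and the way `setcode` is handled, are assumptions made for this example rather than the code that shipped in rspec-puppet.

```ruby
# Minimal hash-backed stand-in for Facter (illustrative sketch only).
class HashFacts
  # Evaluation context for the block passed to #add; it only understands
  # setcode and runs the setcode block immediately.
  class Resolution
    attr_reader :result

    def setcode(&block)
      @result = block.call
    end
  end

  def initialize
    @facts = {}
  end

  # Accepts the same call shape as Facter.add(name, options) { setcode { ... } }.
  # Options such as weight are accepted and ignored: the last definition wins.
  def add(name, _options = {}, &block)
    raise ArgumentError, 'add expects a block' unless block

    resolution = Resolution.new
    resolution.instance_eval(&block)
    @facts[name.to_s] = resolution.result
  end

  # Plain hash lookup: no core facts are loaded, no system commands run.
  def value(name)
    @facts[name.to_s]
  end

  # Forget all facts, for example between tests.
  def clear
    @facts.clear
  end
end

facts = HashFacts.new
facts.add(:ipaddress, weight: 999) { setcode { '1.1.1.1' } }
puts facts.value(:ipaddress) # prints 1.1.1.1
```

Unlike Facter, this sketch resolves the value eagerly when the fact is added, which is fine for canned test data but would not suit facts that are expensive to compute.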

Of course, there may be downsides to this approach, as Facter code paths will no longer be executed by rspec-puppet. In the past there have been occasions where we merged Facter work that passed our CI, but ended up failing in module tests, so switching to this implementation will get rid of this level of testing. I'd argue that it wasn't a module's business to validate Facter itself, but it was a good safety net for us as maintainers.

And because performance improvements mean nothing without showing the numbers, here's how test times have changed for the puppet-nginx module:

Running rake parallel_spec on the module using Puppet 7 / Facter 4 took around 47 minutes with the original rspec-puppet implementation.

We managed to shave around 11 minutes off the test run by using the custom Facter implementation in rspec-puppet.

One thing I haven't mentioned is that running the same tests with Puppet 6 takes a total of 25 minutes, so there's more to improve in Puppet itself as well. However, from a Facter standpoint it's impossible to make the tests any faster, unless Ruby itself improves hash access speed 😜.
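To put those rounded figures side by side, here is the arithmetic as a small script. The `timing_summary` helper is a made-up name for this example, and the inputs are the approximate numbers quoted above.

```ruby
# Hypothetical helper that derives the new run time, the relative saving,
# and the remaining gap to a baseline, all in minutes.
def timing_summary(before_min:, saved_min:, baseline_min:)
  after_min = before_min - saved_min
  {
    after_min: after_min,
    percent_saved: (saved_min * 100.0 / before_min).round(1),
    gap_to_baseline_min: after_min - baseline_min
  }
end

# 47 min with Puppet 7 / Facter 4, 11 min saved, 25 min with Puppet 6.
summary = timing_summary(before_min: 47, saved_min: 11, baseline_min: 25)

puts "new run time: ~#{summary[:after_min]} min"
puts "saved: ~#{summary[:percent_saved]}% of the original run"
puts "gap to Puppet 6: ~#{summary[:gap_to_baseline_min]} min"
```

With these inputs the run drops to roughly 36 minutes, about 23% less than before, which still leaves around 11 minutes between it and the Puppet 6 run.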

This new functionality is opt-in. To enable it, set the facter_implementation RSpec option in your spec_helper.rb:

RSpec.configure do |config|
  config.facter_implementation = :rspec
end

It was first made available in rspec-puppet 2.11.0, with an additional bugfix that was released in rspec-puppet 2.11.1.

The investigation and work behind this improvement spanned many months and involved multiple Puppet employees and community members, namely Gimmy, Josh Cooper, Ewoud Kohl van Wijngaarden and Tim Meusel. Thanks to everyone who contributed!