Debugging Flaky Tests in CI

#rails #testing #ci

Flaky Tests

Debugging CI test failures can be frustrating. Sometimes it's impossible to recreate the failure conditions locally. We recently had an issue with some our tests flaking in CI at my job. The experience of debugging it made me want to write a small post on the process. I hope it can help others encountering these types of problems 💪

Background

At my job, we use Minitest as our testing framework for our Ruby on Rails application. We have simple GitHub actions for running continuous integration (CI). We noticed a few of the tests were occasionally failing in the CI run for Minitest. The frequency of the failures seemed to be roughly 2/3 pass to fail. Every failing test had a few things in common. They were all tests for our API controllers, they were all for a #create action, and they were all following this general pattern:

test "create new resource" do
  assert_difference('Resource.count') do
    post(
      'route',
      params: {
        resource: { **attrs },
        access_token: access_token
      },
      as: :json
    )
  end
end

The goal here is that when a user POSTs to that route with valid attributes and access token, a new resource is persisted to the test database. The assert_difference('Resource.count') checks this by looking for if the count increases by 1.

The failure message was a simple "Resource.count" didn't change by 1.

Process

The main thing I try to do when debugging is recreate the situation that produces the bug. If I can guarantee a situation that produces the bug consistently, I can cut out the noise and focus in on what's going wrong. Sometimes this is straightforward, but with problems in CI, that's often not the case.

Idea 1: Pull the seed number of the failing CI test

Testing frameworks usually offer a mechanism for running the tests in a random order, and Minitest is no exception. This is useful because it's important that tests are free of side effects. If TestA only passes when run after TestB -- this is a problem, and TestA isn't testing what it claims to be testing. The randomized runs help check for these types of situations. Alongside a randomized run, Minitest will output a "seed" identifier for that test run. This allows developers to run the tests in that specific order again if they want to.

Our first idea was to investigate the log output for the failing GitHub Action. These logs will show the seed value used in the test run. If we pass that seed identifier to Minitest locally, will the tests fail in our dev environment?

They continued to pass in development ... On to the next idea!

Idea 2: Force the GitHub action to run with a failing seed every time

So we've confirmed the bug only exists in the CI environment. My coworker had the idea to take a failing seed number and add it to the Minitest command in the workflow for our GitHub testing action:
run: bundle exec rails test --seed=${FAILING_SEED_NUMBER}
Then we could examine if the test fails in CI every time rather than only occasionally. After a few runs, we felt confident that the test was consistently failing in CI. It's inconvenient weren't able to recreate the problem locally. But at least we had a consistent environment for the failures. Now we can begin examining the failing cases.

Print Debugging

Unfortunately we can't hook up a debug session for the tests running in CI. So we had to return to our old friend: print debugging. We spammed a few p and puts statements in our failing test cases. What were the values of different local variables? What was 'Resource.count' returning? What were the instance variables configured in our setup method returning? And finally, what did response look like?

We quickly found out that this wasn't failing because of a mismatch in the output and assertion. Instead it was failing because of an error raised during the test:

ActiveRecord::RecordNotUnique (duplicate key value violates unique constraint). Key already exists.

After examining the code, we became suspicious of a FactoryBot method in our test's #setup:

def setup
  # access_token and user configuration
  @resource = create(:resource, attr1: "attr1", attr2: "attr2", id: '1')
end

An ID was explicitly assigned in the factory. This id was being used for assertions in a different test cases for the controller. After a little research, it wasn't too difficult to find examples of this kind of thing being a problem (see: here). By directly assigning an ID, the natural incrementing of primary keys can be thrown off. So we decided to remove that id argument and change the assertions that relied on the id to look for @resource.id instead. We made these changes for each test case that was using this pattern.

Then we pushed the code up with the CI test command still set to run the failing seed, and it passed! We then went back to running the GitHub action without a specified seed so it would return to running in a random order. After a several successful runs in a row, we felt confident that we'd fixed the problem!

Conclusion

I've run into these types of CI issues a few times now through my job and in open source projects. It can be a difficult when you can't recreate the failures in development. But a little patience and an organized debugging process can make quick work of the problem ✅