DEV Community

Phil


How to deal with flaky tests

The Headache

For anyone writing automated tests, flaky tests are the bane of their existence. You write and run code hoping for a deterministic result. Then you run a test 10 times and it fails once. Now you're spending time investigating the failure, perhaps re-running the test manually only to watch it succeed, and scratching your head as to why a test that seemed so reliable when you first developed it is suddenly making you sink time into false alarms.

Why is this happening to me?!

No matter the tooling, whether it be WebDriver, Playwright, Cypress, or whatever the next fancy savior of web automation might be, flaky tests happen for a number of reasons.

  • No app is perfect. Sometimes, a page request happens and the page simply doesn't render. All of a sudden, you have a case that cannot be reproduced for proper bug fixing.
  • Legitimate bugs. Sometimes a test that fails 20% of the time really means there is a critical issue in the app that needs to be tracked down and fixed.
  • Tests can be, and usually are, written quickly. That speed comes at the cost of properly building out the steps in your tests. In my experience, this is the largest contributor to flakiness.

What can I do in the short term?

Flaky tests are sometimes unavoidable, but with enough time and care they can be mitigated and eventually eliminated.

  • Retrying tests is sometimes just a necessary evil. I have been using an old gem, rspec-repeat, for a long while; it is a simpler fork of rspec-retry. The idea is that you give a test the good ol' "3 strikes and you're out". Certainly, any test that still fails after 3 attempts is worthy of investigation. Even tools like Playwright and Cypress have come around to the idea of test retries. The reality is that retrying is sometimes not only unavoidable, it also helps accelerate test feedback.

  • Investigating what I call "1st try failures" is a worthy time investment. A test that fails the first time (and maybe even the second time) and then passes on the third try should still be categorized as a flaky test. Passing on that third try simply means you're removing some noise from your test results. Investigating these 1st try failures nearly always results in (1) legitimate bugs being raised and (2) improvements to the test automation itself.

  • Proper logging and test recording are crucial to investigating "1st try failures". More often than not, reviewing these playbacks reveals some surprising app behavior that would not be noticed otherwise. I've been using AWS Device Farm, which supports recording.
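
The retry and "1st try failure" ideas above can be sketched in a few lines. This is a minimal, framework-agnostic Python sketch, not how rspec-repeat, Playwright, or Cypress actually implement retries. The names RetryReport and run_with_retries are made up for illustration, and treating any exception as a failed attempt is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetryReport:
    """Outcome of running one test with retries (hypothetical helper type)."""
    name: str
    passed: bool
    failures: List[str] = field(default_factory=list)  # one entry per failed attempt

    @property
    def flaky(self) -> bool:
        # Passed eventually, but not on the first try: still worth investigating.
        return self.passed and len(self.failures) > 0


def run_with_retries(name: str, test: Callable[[], None], max_attempts: int = 3) -> RetryReport:
    """Run `test` up to `max_attempts` times, keeping a record of every failed attempt."""
    report = RetryReport(name=name, passed=False)
    for attempt in range(1, max_attempts + 1):
        try:
            test()
        except Exception as exc:  # assumption: any exception counts as a failed attempt
            report.failures.append(f"attempt {attempt}: {exc!r}")
        else:
            report.passed = True
            break
    return report


# Demo: a fake test that fails twice, then passes on the third try.
calls = {"n": 0}

def sometimes_fails() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise AssertionError("element not found")

report = run_with_retries("checkout", sometimes_fails)
print(report.passed, report.flaky, len(report.failures))  # prints: True True 2
```

The point of keeping the failures list rather than just the final result is that a retried pass does not hide the earlier attempts. Anything with flaky set to True can be collected into a separate list for follow-up.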

Are your tests just... bad?

The above tips are just a stopgap for dealing with flaky tests in the short term. The hard truth is that many flaky tests are flaky because they are poorly written. Here are some thoughts and common issues that I've experienced and hope to expand upon in the future.

  • No tooling in existence is going to know the app better than you. There is no magic test framework that is going to automatically understand how your app is meant to behave.
  • Test execution is designed to go as fast as possible. It is up to you to implement the proper speed bumps and traffic signals to increase the probability of your test running deterministically.
  • While designing your tests, ask yourself whether you are "asserting" enough. Apps are commonly designed to provide the user with feedback, and your tests should check for it. Assert on those messages that come up before proceeding. Assert on that spinny/throbber thing. Assert on that element/button/text that loads last on the page. Assert on any important changes that happen to the DOM before moving forward. These are the kinds of "speed bumps" needed to improve reliability.
  • Keep your tests small in scope and responsibility. Keep your tests atomic.
  • Review some of the tests that aren't flaky! What are those doing that maybe the flaky ones aren't? Perhaps the more robust tests are using useful helpers or better patterns.
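
One way to picture a "speed bump" is a small polling helper that refuses to move on until the app has given its feedback. The following is a rough Python sketch under stated assumptions: wait_until and FakePage are hypothetical names invented here, and FakePage only stands in for a real page object. Most frameworks ship their own waits (Selenium's WebDriverWait, for example), which are usually the better choice in real tests.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> None:
    """Poll `condition` until it returns True; raise TimeoutError if it never does."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Stand-in for a real page object; the spinner disappears after a few polls."""

    def __init__(self, polls_until_ready: int) -> None:
        self._remaining = polls_until_ready

    def spinner_visible(self) -> bool:
        self._remaining -= 1
        return self._remaining > 0


page = FakePage(polls_until_ready=3)

# Speed bump: do not click anything until the spinner is gone.
wait_until(lambda: not page.spinner_visible(), timeout=2.0, interval=0.01)
print("spinner gone, safe to continue")
```

A timeout here fails loudly at the step that was actually waiting, which tends to be easier to diagnose than a later click on an element that has not rendered yet.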
