Code coverage is another one of those topics that tend to divide developers. Some developers and managers insist that 100% coverage should be the standard. Others, especially thought leaders, insist on a number somewhere between 80% and 90%.
So what is code coverage? And what, if anything, does it bring to your team?
Code coverage is just like every other quality control metric you might use to monitor the quality of your codebase. It's also probably the simplest.
While I won’t review every aspect of code coverage (line vs. branch vs. condition, etc.), a coverage report tells you the percentage of your code executed during your tests. Coverage is collected by invoking your tests with a special agent or compilation flag that tracks the blocks of code executed while the tests run.
Each coverage tool works a little differently, but at the core, most of them either create an “agent” that instruments your source (Jacoco and Coverage.py work this way) or hook into internally generated debugging information to track which lines get hit.
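To make the line-tracking idea concrete, here is a toy sketch using Python's built-in `sys.settrace` hook, the same low-level mechanism Coverage.py's pure-Python tracer builds on. This is only an illustration of the concept, not how any real coverage tool is implemented in full, and the `absolute` function is just a made-up example:

```python
import sys

executed = set()  # line numbers seen while tracing

def tracer(frame, event, arg):
    """Record every line executed inside traced functions."""
    if event == "line":
        executed.add(frame.f_lineno)
    return tracer  # keep tracing inside this frame

def absolute(x):  # the "code under test"
    if x < 0:
        return -x
    return x

sys.settrace(tracer)
absolute(5)  # only exercises the non-negative branch
sys.settrace(None)

# The `if x < 0` check and `return x` were recorded, but `return -x`
# never ran, so a real tool would flag that line as uncovered.
print(len(executed))  # 2
```

A real tool does the same bookkeeping at scale, then maps the recorded line numbers back onto your source files to produce the report.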
Regardless of how they work internally, most coverage tools will generate a report in the form of an .html file and list the coverage for each package, class, etc. This report is where you find the overall percentage of coverage for your codebase. That number is then what you would use if asked by your manager, co-worker, or significant other (OK — probably not your significant other) what the code coverage is on a particular project.
What code coverage tools do is validate that your tests are testing what you think they are. When we write tests (especially unit tests), we are often targeting a specific part of our code to verify. Code coverage can help us validate we are indeed testing the code we intend to.
It is this validation that makes code coverage a quality control mechanism for your codebase. If you have 100 tests in a test suite, but they all are executing the same code over and over, most of them are waste. You can brag all day about how many tests you have, but in reality, you only have one that does anything useful.
Theoretically, then, it would make sense that teams would strive for high coverage. The more lines covered, the more of your code you have tested, right?
Unfortunately, not quite. Here are a few reasons why.
Executing code is not the same as testing it. This is a subtle yet crucial distinction in how to understand code coverage. Just because you have executed a line of code does not mean you have correctly asserted the behavior related to it.
In an extreme case, this might be a test that doesn’t contain any assertions at all. It just calls methods and executes lines. That sounds absurd, but you would be surprised how easy it is to make that mistake yourself. All it takes is a nuanced change to a requirement.
For example, say you have a simple function that returns a list of elements. Easy enough. Now the requirement changes: the list must be sorted in descending order. If your earlier test only asserted that the list contains the expected items, it never asserted anything about their order. Add a sort, run your tests, and everything passes with 100% code coverage, but you never catch that you sorted the list in ascending order :)
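A quick sketch of that scenario (the function name and data here are hypothetical): both tests below execute every line of the buggy function, so coverage is identical, but only the behavioral assertion catches the bug.

```python
def top_scores(scores):
    """Should return the scores sorted in descending order."""
    return sorted(scores)  # bug: sorts ascending

# Weak test: executes every line (100% coverage) but only checks contents.
weak_ok = set(top_scores([3, 1, 2])) == {1, 2, 3}
print(weak_ok)  # True -- the bug slips through

# Behavioral test: asserts the order the requirement actually demands.
strict_ok = top_scores([3, 1, 2]) == [3, 2, 1]
print(strict_ok)  # False -- bug caught, despite identical coverage
```

The coverage report can't distinguish these two tests; only the quality of the assertions does.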
I know that is a trivial example, but I believe it shows that code coverage should not be the focus of your testing efforts — you should focus on testing behaviors! If you concentrate on writing good tests that exercise all aspects of your intended behavior, high code coverage will result.
With this aspect in mind, code coverage tools become guideposts that show you where you might need to focus your testing, rather than a number to attain. You can run your tests, review the coverage report, and decide whether the missing coverage is something you need to investigate.
I really like Martin Fowler’s post on this train of thought. In particular, this line:
> If you make a certain level of coverage a target, people will try to attain it. The trouble is that high coverage numbers are too easy to reach with low quality testing.
Basically, making a specific code coverage number the goal can promote poor tests written just to hit that number, rather than high-quality code with high-quality tests. I’ve seen it happen.
With all this in mind, should 100% code coverage be your goal? Is it even achievable?
There is a pretty brief academic paper on that question titled “Is 100% Test Coverage a Reasonable Requirement?” that you can find here. The authors ran a study over a two-year project, measuring code coverage metrics and interviewing the development team. Their conclusions:
- Yes, 100% coverage is achievable.
- It might be worth it.
- Code coverage is not the best metric for quality and shouldn’t be the goal.
With that in mind, I would suggest that 100% likely isn’t worth it. For most projects, it is simply too expensive and carries the ironic risk of promoting lazy testing.
If you still want to enforce some coverage requirements, the teams I have worked on have enforced 80% coverage on all new code*. This requirement helps validate that sufficient testing has been done, doesn’t require the entire project to be updated overnight, and is likely not too expensive. Not to mention, it helps promote small and frequent changes.
In summary: focus on writing good tests. Focus on writing code that is testable! Remember that code coverage also increases when you decrease the total number of lines covered by the same tests; can you simplify that if statement with two booleans into a single “state” enum? That mentality is worth more than achieving a number.
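As a hypothetical sketch of that last point (the flag names and labels are made up), collapsing two booleans into one enum removes an invalid state and eliminates branches, so the same tests cover a larger fraction of simpler code:

```python
from enum import Enum

# Before: two booleans allow four combinations, one of them meaningless
# (not active yet suspended), and every branch is a line to cover.
def label_flags(is_active, is_suspended):
    if is_active and is_suspended:
        return "suspended"
    elif is_active:
        return "active"
    else:
        return "inactive"

# After: one enum, three valid states, no branching left to cover.
class Status(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    INACTIVE = "inactive"

def label_status(status):
    return status.value

print(label_flags(True, False), label_status(Status.ACTIVE))
```

The enum version also makes the impossible combination unrepresentable, which is a correctness win independent of the coverage math.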
Full disclosure here: my team does enforce 100% branch coverage on our applications. However, we also recognize that some classes would require writing useless tests that don’t verify your code will work in production. A typical example is verifying that a SQL query is correct. Those tests should be done as integration or end-to-end tests against an actual database with the correct schema and database engine versions. With that in mind, we are OK with adding such classes to our exclusion list, but each class must be agreed to by the team before being added.
Originally posted on Medium