In this article, I’ll discuss what code coverage is, along with its usefulness and limitations. I’ll advocate for a risk-aware approach to software quality and give a few practical examples in C# and F#.
An Introduction to Code Coverage
In the development world, the term code coverage gets thrown around a lot. There are many tools built around code coverage (some absolutely fantastic), and it is a very valuable metric to have. However, code coverage is seriously flawed if it is put on too high a pedestal.
Consider the following C# code, noting the coverage indicators in the left margin:
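The original post showed this as a coverage-annotated screenshot. Here’s a minimal sketch of the kind of method being discussed; the enum members other than Corruption are assumptions, and the comments stand in for the margin indicators:

```csharp
public enum DamageType { Physical, Fire, Corruption }

public static string GetDamageTypeName(DamageType damageType)
{
    switch (damageType)          // covered
    {
        case DamageType.Physical:
            return "Physical";   // covered
        case DamageType.Fire:
            return "Fire";       // covered
        case DamageType.Corruption:
            return "Corruption"; // NOT covered
        default:
            return "Unknown";    // covered
    }
}
```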
This method has only 1 of its 6 lines not covered by tests. Since 5/6 is roughly 83%, we can say that this method has 83% code coverage, referring just to lines of code.
Is that good? Is that bad? It depends.
On the one hand, 83% is pretty high up there. It’s not 100% coverage, but we know with confidence that the majority of statements execute without experiencing issues severe enough to fail their tests.
However, without seeing the specific tests, we can’t know for sure that we’re testing anything regarding the actual result of this method, only that it was called during at least one test execution.
There are absolutists out there who say code needs to have some arbitrary degree of coverage. I most often hear of people demanding 80% or 100% code coverage.
I think, however, that we need to consider risk in our testing efforts.
What’s the actual risk that we encounter here? What’s likely to break over time?
The code isn’t likely to suddenly return a slightly different spelling for Corruption damage. Additionally, adding a new case to the switch isn’t likely to cause an error the compiler won’t pick up. So what are we really testing?
With the code above, the most likely failure, in my estimation, is that we add a new member to the DamageType enum and fail to add a case for it here. This would result in an incorrect string coming back to the caller because the matching case isn't present.
Code Coverage does not mean Code Correctness
And here’s the problem with code coverage: even if we have 100% code coverage of this method, adding a new member to that enum will cause a potential bug unless we remember to add a matching line here.
Now, if we’re trying to minimize the risk of future defects in C#, we can do something like the following:
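Again, a sketch under the same assumptions as the earlier block: the silent default is replaced with an exception so an unhandled value fails loudly:

```csharp
using System;

public static string GetDamageTypeName(DamageType damageType)
{
    switch (damageType)
    {
        case DamageType.Physical:
            return "Physical";
        case DamageType.Fire:
            return "Fire";
        case DamageType.Corruption:
            return "Corruption";
        default:
            // Fail loudly on any enum member we forgot to handle
            throw new ArgumentOutOfRangeException(
                nameof(damageType), damageType, "Unsupported damage type");
    }
}
```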
By throwing an ArgumentOutOfRangeException, we maximize the chance that we’ll find the issue. However, there’s still a risk that we wouldn’t hit this case until runtime.
The best way of reducing the risk around newly introduced enum values would be to take a functional programming approach and put the functions related to the enum right next to the enum. Here’s the equivalent code in F#:
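A sketch of what that might look like, using the same assumed members as the C# version:

```fsharp
type DamageType =
    | Physical
    | Fire
    | Corruption

// Lives right next to the type it operates on
let getDamageTypeName damageType =
    match damageType with
    | Physical -> "Physical"
    | Fire -> "Fire"
    | Corruption -> "Corruption"
```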
The F# match expression is like the C# switch statement, but more powerful. Of note here, the compiler checks that the match is exhaustive over all possible inputs. If we ever add a new DamageType member, we will be warned if we don’t have a case for it.
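For example, adding a hypothetical Lightning member without touching getDamageTypeName would produce a compiler warning on the existing match:

```fsharp
type DamageType =
    | Physical
    | Fire
    | Corruption
    | Lightning // newly added, not yet handled below

let getDamageTypeName damageType =
    match damageType with
    | Physical -> "Physical"
    | Fire -> "Fire"
    | Corruption -> "Corruption"
// warning FS0025: Incomplete pattern matches on this expression.
// For example, the value 'Lightning' may indicate a case not covered by the pattern(s).
```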
The Usefulness of Code Coverage
Yes, that’s a bit of a straw man argument. Most code is more complex than a simple switch statement. Let’s look at another example:
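The original example came from the author’s hobby game code; here’s a minimal reconstruction consistent with the description below. GameGrid, GameCell, Position, and the _cells dictionary are all assumptions, with comments marking the uncovered branches:

```csharp
using System;
using System.Collections.Generic;

public class GameGrid
{
    private readonly Dictionary<Position, GameCell> _cells =
        new Dictionary<Position, GameCell>();

    public GameCell GetCellForObject(GameObject gameObject)
    {
        // Argument validation at the boundary of the component
        if (gameObject == null)
        {
            throw new ArgumentNullException(nameof(gameObject)); // 0% coverage
        }

        GameCell cell;
        if (!_cells.TryGetValue(gameObject.Position, out cell))
        {
            // The requested cell doesn't exist yet; create it on the fly
            cell = new GameCell(gameObject.Position);            // 0% coverage
            _cells[gameObject.Position] = cell;                  // 0% coverage
        }

        return cell;
    }
}
```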
In this case our coverage is pretty good at 60%. There are two branches we have 0% coverage on, however.
The first is in validating gameObject if it came in as null. This is typically the sort of line that I tend to leave uncovered, as the null check is present only to catch potential issues at the boundary of a major piece of code. For example, I’d rather get an argument-related error at the beginning of a method than a vague null reference exception later on.
Note: An exception to this is if you’re building a public API to be consumed by numerous other individuals and you want to make sure that your validation behavior remains consistent from release to release.
The second branch we don’t have coverage on is the portion where the requested cell did not exist. In this case, that cell is created on the fly and added back into the collection.
So, do these test cases matter or is 60% acceptable?
My personal inclination here would be to skip testing the argument validation (unless I’m building a public-facing API). I would, however, like to see the dynamic creation of the cell incorporated into a test case.
It’s not that I believe the line in this method will someday fail, but rather that the lack of code coverage for this statement tells me I may be missing significant unit test scenarios in other methods as well.
This line tells me that either I have dead code that will never be hit, or I need to figure out the larger scenario in which that line is hit and wrap a unit test around that process.
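Such a test might look something like this, assuming the GetCellForObject sketch above and NUnit:

```csharp
[Test]
public void GetCellForObjectShouldCreateMissingCells()
{
    var grid = new GameGrid();
    var gameObject = new GameObject(new Position(3, 5)); // hypothetical constructor

    // No cell exists at (3, 5) yet, so this call should create one
    GameCell cell = grid.GetCellForObject(gameObject);

    Assert.That(cell, Is.Not.Null);
    Assert.That(cell.Position, Is.EqualTo(gameObject.Position));
}
```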
The Myth of 100% Coverage
Finally, let’s close with a short example of 100% code coverage:
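A sketch of the sort of code and test in question; the NUnit test shown is an assumption about how the original was written:

```csharp
public static string GetAnswerString(int x, int y)
{
    // Bug: the values are never actually added
    return "The answer is 42";
}

[Test]
public void GetAnswerStringShouldDescribeTheSum()
{
    string answer = GetAnswerString(2, 2);

    // Only the prefix is verified, so the wrong value still passes
    Assert.That(answer, Does.StartWith("The answer is"));
}
```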
Here, we call GetAnswerString with 2 and 2. This method should give us back “The answer is 4”. Unfortunately, the developer didn’t really do addition, and the method always returns “The answer is 42”.
Unfortunately, the unit tests are just built to ensure that the string starts with the expected prefix, so the actual value isn’t tested.
As a result, we have 100% passing tests and a blatantly incorrect method.
This is why you should take code coverage metrics with a grain of salt: just because a line is executed by a test doesn’t mean that the line is correct or accurately tested.
So, what? Do we not cover code anymore?
I’m not saying we stop tracking code coverage. I’ve actually integrated it into my builds with OpenCover recently, and have been very happy to watch my coverage climb.
What I am saying is that we should view code coverage only as a small part of the picture in ensuring quality over time.
When I add or change code, my expectation is that I have two strategies for catching issues with the changes before the code even makes it to quality assurance for review.
Sometimes the safety net is a unit test. In fact, this is my preferred way of adding a safety check to modified code.
Other times the safety net involves leaning on the compiler and/or source analysis tools to find blatant issues.
Manual testing by the developer is a good gate to go through as well, with a quick visual inspection or API call to verify that all is working well.
Keep testing your code, and keep tracking coverage, but keep in mind that what we’re after is not a magic number that climbs ever higher, but an efficient, risk-aware software development process that reduces the risk of adding new code as well as the risk of maintaining it in the future.
Top comments (6)
One way to avoid some of the pitfalls you’ve presented here is to use the mutation testing technique. It can be a costly way of guaranteeing more code quality for our projects, but done correctly, it can be very effective.
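For example, a mutation tool might flip an operator and re-run the tests; if nothing fails, the mutant “survives” and exposes a weak test. A hypothetical sketch:

```csharp
// Original code
public static int Add(int a, int b) => a + b;

// Mutant generated by the tool: '+' flipped to '-'
public static int AddMutant(int a, int b) => a - b;

[Test]
public void AddReturnsANumber()
{
    // A weak test: it passes for both the original (4) and the mutant (0),
    // so the mutant survives and the gap in our assertions is exposed
    Assert.That(Add(2, 2), Is.GreaterThanOrEqualTo(0));
}
```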
This is a huge point. I love it. Yes, you can automate some of these tests, particularly with enums. I picked a few readily available examples from my own hobby code to discuss.
Fuzz testing, test data generators, and looping over all possible values won't find everything, but it's something to strongly consider.
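For the enum case from the article, automating “loop over all possible values” might look something like this (a hypothetical test reusing the earlier GetDamageTypeName sketch):

```csharp
[Test]
public void GetDamageTypeNameShouldHandleEveryDamageType()
{
    foreach (DamageType damageType in Enum.GetValues(typeof(DamageType)))
    {
        // Falls back to "Unknown" only if a member was never given a case
        string name = GetDamageTypeName(damageType);

        Assert.That(name, Is.Not.EqualTo("Unknown"), $"No name mapped for {damageType}");
    }
}
```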
I think of code coverage like I think of tests:
The fact that I have a ton of tests that pass doesn't mean that I don't have bugs. But if a test doesn't pass, then I know I do have a bug.
Likewise, the fact that I have high coverage doesn't mean I have good tests. But if I have very low coverage, then I know I don't have enough tests.
Even if some of the tests don’t verify anything valuable, it’s good to have some test exercise the code before it goes to production.
So if I’m seeing a number like 60% code coverage, I know that the other 40% is either never executed or we just hope it doesn’t break our app.
I'm glad we agree!
Definitely agree, with the caveat that you have some other plan for verifying your code if you're not using a test.