“Test coverage is not correctness” and “test coverage is not a quality metric” are two common phrases you’ll hear people parrot, and two of the less well understood. I agree with the first, but strongly disagree with the second.
I love test coverage because it’s a strong negative leading indicator of quality: where coverage is missing, problems follow.
The presence of test coverage (“we ran this code”) is not the same as a test being correct (“we verified the right behaviours”), but a lack of it tells me a lot.
It tells me to expect low quality and poor automated verification. It tells me where the worst parts of your codebase probably are.
This is because coverage comes for free in a well-tested system - it’s a measure, not an attribute, of a codebase. It’s a torch in the dark of working effectively with legacy code.
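As a quick illustration of coverage being a measure you get for free (assuming a Jest setup here; the threshold figure is arbitrary), switching it on is a config flag on a suite you already run:

```typescript
// jest.config.ts: coverage is a by-product of an ordinary test run,
// not extra work on top of it.
import type { Config } from "jest";

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    // Fail the run if line coverage dips; the 80% figure is illustrative.
    global: { lines: 80 },
  },
};

export default config;
```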
What else does a lack of coverage say?
It tells me that a particular codepath is either so well trusted that its owners perceive it cannot fail (or that a failure would trip another signal immediately), or so complicated to exercise that it’s rarely executed at all. And code that is rarely executed is bound to fail eventually, as the world changes around it.
Coverage conveys a lot of information; don’t ignore it. It might not be a proxy for correctness, but that doesn’t make it useless.
Footnote: Can we fix the “coverage is not correctness” problem? Actually, yes. Mutation testing is a technique where you “automatically test your tests”. Notably implemented across languages by the Stryker family of tools, it creates tens to hundreds of subtly modified versions of your codebase (“mutants”) and executes your tests against each one.
If your tests don’t fail when run against a mutant? That mutant “survives”, proving you’re not actually verifying some behaviour. Mutation testing is awesome and infrequently seen in the wild.
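To make that concrete, here’s a hand-written sketch of the idea (the function and tests are hypothetical, and I’m assuming a Jest-style runner; in practice Stryker generates and runs the mutants for you):

```typescript
import { test, expect } from "@jest/globals";

// Original behaviour: only the author may edit their post.
function canEdit(userId: string, authorId: string): boolean {
  return userId === authorId;
}

// A typical mutant: the same function with the operator flipped.
// Tools like Stryker generate and run hundreds of variants like this.
function canEditMutant(userId: string, authorId: string): boolean {
  return userId !== authorId; // mutation: === became !==
}

// Weak test: gives 100% line coverage but asserts nothing about the
// result, so it would pass unchanged against the mutant. The mutant
// "survives", revealing that this behaviour was never really verified.
test("weak: canEdit runs without throwing", () => {
  canEdit("alice", "alice");
});

// Strong test: pins the behaviour. Swap the mutant in for the original
// and these assertions fail, "killing" the mutant.
test("strong: only the author can edit", () => {
  expect(canEdit("alice", "alice")).toBe(true);
  expect(canEdit("alice", "bob")).toBe(false);
});

// Demonstrating the kill directly: the mutant disagrees with the
// original on the inputs the strong test checks.
test("the strong assertions distinguish original from mutant", () => {
  expect(canEditMutant("alice", "alice")).not.toBe(canEdit("alice", "alice"));
});
```

The ratio of mutants killed to mutants generated is your mutation score: a much closer proxy for “did we actually verify behaviour” than raw coverage.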