Warren Parad

You are probably testing wrong

I love answering the questions that come up regarding testing. It's amazing that something that is, according to lean, pure waste for our customers and users can still offer us so much value. In less mature organizations, an often unfortunate and incorrect assumption is that we need to ensure test coverage. And so the question gets asked:

How many tests should we have?

This is a question whose only wrong answers are those that contain a number. 5%? No, wait, 80% test coverage is correct!

But the only right answer is:

The right number is based on the problem we are trying to solve

My colleagues often joke that this is just another example of It Depends. And it really does depend: the only wrong answers to this question are fixed numbers or percentages. I'll share exactly why.

Why write tests at all?

First, we need to talk about why we should write tests at all.

Normally, I encourage my teams to write tests like they write automation. Is the code changing frequently? Do you want to prevent a regression? Is the code critical to the success of the application, where a failure would spell huge problems? Or is there a business reason the code is the way it is (in other words, a unit test is better than a code comment)?

How many tests this comes out to be is a result of the domain and those criteria, but my perspective is that it doesn't matter whether that is 1% or 100%. As long as the things above are tested and nothing else is, then I'm good. The caveat is libraries, which can justify component testing at 100% because of their widespread usage: per Hyrum's Law, every observable aspect of the library will end up being depended upon.

A deeper look

Let's review those different areas where we should write tests.

1. Frequently changing code
Writing tests in the wrong spot is pure waste, but an area where code changes frequently is one of the best places to put tests. It might sound counterintuitive, since adding tests will slow down development. But because this area changes so much, bugs here will happen with higher probability and frequency. The reason is that there are usually N bugs per line of code, and every additional line is a risk of a new bug. Add a test.

2. Preventing regressions
At some point you'll change something that has been working for a while. It rightfully doesn't have any tests, because it wasn't that important when it was written and it's mostly straightforward. However, you are adding something new and really don't want to break what was there. This is the agile test: since we didn't need a test before, we didn't write one. But now that we are changing the code, this is the perfect time. We'll add the test based on the existing code, to prevent additional changes from having an adverse impact. Add a test.
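
To make this concrete, a pinning test can be as small as capturing what the existing code already does today. Here's a minimal sketch, assuming a Jest-style test runner and a hypothetical formatInvoiceNumber function that has happily run untested until now:

// Hypothetical existing code that we are about to change around.
const formatInvoiceNumber = (year, sequence) =>
  `INV-${year}-${String(sequence).padStart(5, '0')}`;

// Pin the current behavior before touching anything, so the new work can't
// accidentally change what downstream consumers already rely on.
it('keeps the invoice number format that downstream systems depend on', () => {
  expect(formatInvoiceNumber(2024, 42)).toBe('INV-2024-00042');
});

The test doesn't claim the format is ideal, only that it won't change by accident.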

3. High impact code
Does this code being wrong spell disaster for the company? Is it more than $1k if we get this wrong? If the answer is yes, then there should be a test. "User can't log in" isn't $1k, and often even "User can't click buy" isn't $1k, since those users come back. But... a bug that automatically deletes your whole database because it doesn't have a where clause, now that's bad. Add a test.
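
As a sketch of what such a high-impact test could look like, assuming a Jest-style runner and a hypothetical buildSessionCleanupQuery helper that produces the SQL we eventually execute:

// Hypothetical query builder for a cleanup job.
const buildSessionCleanupQuery = (userId) => ({
  text: 'DELETE FROM sessions WHERE user_id = $1',
  values: [userId]
});

// The disaster case: an unscoped DELETE wipes the whole table.
it('never produces an unscoped DELETE that could wipe the whole table', () => {
  const query = buildSessionCleanupQuery('user-123');
  expect(query.text).toMatch(/\bWHERE\b/i);
  expect(query.values).toEqual(['user-123']);
});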

4. Business Logic justifications
Comments are the worst. And the worst kinds of comments are:

// The sum of a and b
sum = a + b

Completely worthless. Good comments explain the why. Here's a better example:

/* We use quick sort because we expect this to be the fastest
based on our expectations of what the data looks like, the
data is expected to look like this...
*/
result = quicksort(array)

But even that isn't great, it still leaves a lot of room for confusion. And it's worse if we have some other requirement like:

/* IMPORTANT: DO NOT CHANGE, we only allow alphanumeric
because we use this in XYZ other process.
*/
result = string.replace(/[^a-z0-9]/g, '')

This is the perfect time to write a unit test and express that dependency in the test description. That way, unlike with the comment, no one can accidentally change the behavior or delete the justification without a test failing. Add a test.
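
For example, here's a minimal sketch, assuming a Jest-style runner and a hypothetical sanitizeForXyz helper wrapping the replace call above:

const sanitizeForXyz = (value) => value.replace(/[^a-z0-9]/g, '');

// The test name carries the business justification that used to live only in the comment.
it('only allows alphanumeric characters because the XYZ process depends on that format', () => {
  expect(sanitizeForXyz('user-id_42!')).toBe('userid42');
});

Now anyone who "improves" the regex gets a failing test that names the downstream process, instead of a silently deleted comment.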

Types of tests

Now that we know which areas of our code we should test, we can start to think about how to actually test these things. You'll notice all the examples above lend themselves to being unit tests. And that's because almost everything should be a unit test.

How do I know that? I look to the test pyramid, which points to the majority of the tests being unit tests. Above unit tests we have, in ascending order:

  • component/service level
  • integration--also known as Production Tests
  • exploratory manual tests

The only correct test pyramid

If we think we need a service level test on our endpoints, then I would expect exponentially more tests at the unit level, and the same goes for integration. Have 2000 unit tests? That means you want ~20 service level tests and ~1 integration test.

And the only time you want exploratory manual tests is during pull requests where new functionality is being added. During the PR, your engineering team should be diving into the automatically built ephemeral PR environment and trying it out to see if it breaks.

So shouldn't everything be tested?

You'll notice that nowhere in the pyramid are E2E (End to End) tests. That's because in any real technology deployment it's impossible to test anything truly end to end, and we actually don't need to. Our end to end isn't run synchronously, so we don't need synchronous tests to validate it. Further, most of this will happen anyway when our users use our technology; we'll see what is and isn't working and we can fix it on the spot. Does Data IN work and does Data OUT work? And do all the unit tests for how that data is loaded and saved pass? Then the process is isomorphic and will work; we don't need a test covering IN and OUT together.

Also, the easiest answer is just: definitely not. At most companies it makes sense to have a couple (that means 2) of validations that test the most important value, the thing that must always work or else the company will go under. For everything else, the test costs more to write and maintain than it does to just fix the problem.

With great disdain I'll reference one company that found being on one extreme to work for them:

Move fast and break things

So how many tests do we really need? The answer is: the tests that cover the things we need to test. For one service the coverage may need to be 10%, and for another component it might be 80%. But arbitrarily setting a value is irresponsible, and we start to develop antipatterns.

Testing antipatterns

There are three fundamental antipatterns that exist with test coverage. The first one is "we must have X% test coverage". Let's dive into that. It's so bad because it falls prey to "we get what we measure". If we are forced to write tests, then we will write the simplest and easiest tests. That means we are neither ensuring that the tests are correct, nor that the tests are the ones we should have. And if it were easy to write the tests in the right spot, we would have done it from the start. So clearly the tests we need are the ones we don't want to write. This causes all the wrong tests to be written, and thus not only is this waste according to lean, it also doesn't give us any value.

The biggest example of a test that makes no sense is "testing user login". We never need to test user login, because if it doesn't work we will know immediately, since every change an engineer makes needs login to work. Further, we have monitoring up and running, so if login doesn't work, we'll know. Also, let's take a closer look at login. Your team didn't write login, your company didn't even write it. You used a third party product to handle 99% of your login needs, be it Auth0 in the B2C space or Authress, for example, in the B2B space.

Do not test software from another company. They are already testing their software all the time, and if you don't trust them to not break it, you need to find a different provider.

The second antipattern is "the deletion of code causes the test coverage to go down". I love this one. You might think a rule like "if the test coverage goes down, block the pull request" makes sense. But let's say you have a file with ten lines of code and 50% of that code is tested. If you remove a line of code that is not tested, your coverage goes up to 5 / 9 => 55%. BUT if you remove a line of code that was tested, then your test coverage goes down to 4 / 9 => 44%.

That means as an engineer you aren't allowed to remove dead code, code that does the wrong thing, or just fix something and do it more effectively, because you wrote a rule that doesn't make sense.

The last antipattern is "production is never broken."
If production never breaks, then we have too many tests. Full stop. The goal is to have tests that prevent production failures, and we don't want tests that will never prevent a prod failure. But how do we know which tests those are?

But still I can’t see why prod not breaking is not something we should aim for.

It's simple actually: if production never breaks, we have too many tests. That means we could spend less time writing tests and more time delivering value. If you never see a problem, then it is too far away. As mindful testers we should be focusing on preventing real problems, not imaginary ones. Preventing production problems that will never happen is a waste of everyone's time.

This brings us back to the original point of adding tests where we know we need them. But it isn't always so easy to know where those areas above are. Thankfully we can use the DORA metrics to help us, specifically:

  • How many prod failures do we have today? This is the Change Failure Rate (CFR).
  • What's the Mean Time To Resolution (MTTR)?

Those are two of the four DORA metrics, and if we don't know the answers to them, then we also don't know how many tests we should have. If you aren't tracking how many production problems you are getting and how long it takes to fix them, then adding metrics for testing and creating arbitrary tests is the wrong solution. You are simply spraying tests everywhere hoping to hit something. Don't add tests randomly.
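
Neither metric requires fancy tooling to start with. Here's a minimal sketch of both, assuming hypothetical records of deployments and production incidents:

// Hypothetical deployment and incident records.
const deployments = [
  { id: 1, causedIncident: false },
  { id: 2, causedIncident: true },
  { id: 3, causedIncident: false },
  { id: 4, causedIncident: false }
];

const incidents = [
  { startedAt: '2024-05-01T10:00Z', resolvedAt: '2024-05-01T10:45Z' }
];

// Change Failure Rate: fraction of deployments that caused a production failure.
const changeFailureRate =
  deployments.filter(d => d.causedIncident).length / deployments.length;

// Mean Time To Resolution: average time from incident start to resolution.
const meanTimeToResolutionMinutes =
  incidents
    .map(i => (new Date(i.resolvedAt) - new Date(i.startedAt)) / 60000)
    .reduce((sum, minutes) => sum + minutes, 0) / incidents.length;

console.log({ changeFailureRate, meanTimeToResolutionMinutes });
// => { changeFailureRate: 0.25, meanTimeToResolutionMinutes: 45 }

If you can't fill in those two arrays from real data, that's the gap to close before arguing about coverage percentages.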

Another way of looking at this is: I want to see prod breaking in ways that don't matter, but I don't want it to break in ways that do. If production isn't breaking at all, you are violating this; and of course if it is breaking in ways that do matter, then you need to add more tests.

What is defensive programming?

In the guise of testing, we often forget about logging/monitoring and, more importantly, about how to write better code. The former I've talked about at length, so I'm not going to go into it here other than to say: if we don't know when we have a problem, and the details about what that problem is, then we have no idea what the fix should be.

The latter is defensive programming. If your code could break something, no amount of unit testing, service level testing, production testing, or exploratory testing is going to find it (unless you are really, really good). So in many places you can throw an extra try/catch around your code, or execute the current code and the new code in parallel and compare the results in memory before returning the result. It doesn't matter in these situations whether your code is tested, because this is a much easier, faster, more reliable, and safer way to be correct. If you can write the simple test, write the test, but that doesn't mean you shouldn't also prevent bugs in production using non-testing based strategies. That's why we have PRs after all ;).
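
As a rough illustration of the parallel-run idea, assuming hypothetical currentImplementation and newImplementation functions and a logger:

async function getResult(input) {
  // The proven code path always produces the answer we actually return.
  const currentResult = await currentImplementation(input);
  try {
    // Run the new code path alongside it and compare the results in memory.
    const newResult = await newImplementation(input);
    if (JSON.stringify(newResult) !== JSON.stringify(currentResult)) {
      logger.warn('newImplementation diverged from currentImplementation', { input });
    }
  } catch (error) {
    // The new code path is never allowed to break the caller.
    logger.warn('newImplementation threw', { error });
  }
  return currentResult;
}

Once the divergence logs stay quiet for long enough, the new path can take over, and no test suite had to certify it up front.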

Doing the followup analysis

Some teams do an RCA, or Root Cause Analysis, on problems that happen in production. RCAs are great, but the only wrong answer is "more tests". It's the wrong answer because the only places where you will have problems are the places you didn't test correctly, so coming up with "more tests" as the answer is almost always wrong. Instead we need to look at the long term solution for each problem.

For instance, how did we break "login"? Is it a component that has an issue, something bespoke or custom that we did, or a change we don't understand? The fix isn't a test; it's potentially "stop doing a custom thing" and do it in a standard way. Or maybe we need education. Adding a test should only be done if, on top of the right long term fix and the education, the code falls into one of the four risk categories above.

Conclusion

Adding unit tests to all our code creates a burden for future development. So we need to trade off extra burden for extra value. Arbitrarily adding tests to meet a "test coverage" target always results in the wrong tests.

Even a simple thing like "test that a user can log in" is a way more complex discussion than it seems. Is it one user, ten users, or every user? How about users that are currently logged in, is it a problem for them? Do we have any ongoing expected user activities, or a possibly unexpected spike in user activity about to happen?
The simple thing "let's test user login" isn't straightforward. If no one is logging in and it only affects a couple of users, then it's not important; instead we can be reactive.

Further, we need to understand when to be reactive versus proactive. Testing is proactive: it finds problems before they happen. We know via the Pareto Principle that it would take an infinite amount of time to prevent all bugs, which means we have to let some through. We don't even have a large finite amount of time, let alone infinite. So don't test everywhere, test only in some places: the highest value places. There we can be proactive, but everywhere else we should optimize for being reactive.

The truth is that we likely don't need anywhere close to the number of tests we are collectively running today. Anecdotally, I'm going to say something like 20% unit test coverage, 5 service tests per service, and 1 production test per team on average is the right amount.

Clever tests in the right spots are worth so much more than an arbitrary percentage. "Should this thing have automated testing" is a conversation; it definitely can't be "we have some arbitrary metric to hit". We should never say "we must have 30% of our code unit tested": besides that being a ridiculously high amount, it's actually detrimental to have more than a few tests, since every test we add is a burden on building new things.

You want tests where your risks are: risks that will end the company, or cause 5%, 10%+ revenue loss. An easy way for me to look at this is: you aren't at a scale that's appropriate for having more than one or two E2E tests at the whole company. I'd rather see us move faster and break some things. If production never breaks, then we have too many tests.
