Or: How to maximize the value tests provide
Preamble
Ah, testing. If you’re like me and have a love-hate relationship with tests, this article is for you.
Having the right testing at the right point in the development process can be a lifeline and prevent costly bugs getting into production, but test suites also have the tendency to grow into unwieldy monsters.
Have you ever worked on a project where understanding and updating the tests was more complex than implementing the actual (money-making) business logic? If not, I guarantee at some point you will.
But it doesn’t have to be this way.
Tests can be simple and manageable, and my hope in writing this article is to help steer testing in a less painful direction for everyone involved - myself especially.
Goals of this article
To suggest a battle-tested way of making testing as painless as possible.
To provide a way to implement this form of testing without impeding feature development.
To cover the importance of buy-in and how to get it.
But first
Let’s not forget the goal of testing: to give us confidence that our system will behave as expected in production.
Any test we write should support this goal.
So…
How do we make testing as painless as possible?
With these simple principles:
Test for each thing as early as possible
Only test for each thing once
But to dig into these principles we need to cover (and establish common language for) the levels of testing.
Levels of testing
I have yet to work at a company that doesn’t have at least one form of testing named differently from everywhere else I’ve worked.
We generally all agree on what a unit test is, but if some of the other names differ from what you’re used to, please bear with me. I’ve even half-jokingly suggested coming up with new, less-opinion-loaded names for the various levels of testing but I usually get shot down because “there are existing names already” - and everyone then spends the rest of the session debating what those existing names’ responsibilities actually are…🤦‍♀️ So for now let’s just go with what appears to be most common.
The test pyramid trophy tetrahedron hierarchy
The test pyramid/trophy/(insert shape here) is great, but it tends to direct focus towards having more tests at one specific level than at another. This is often translated as “level x should have more tests than level y”. In the Pyramid paradigm, it means writing more Unit Tests than Component Tests, more Component Tests than Contract Tests, and so forth all the way up.
#1
Which detracts from the point: we shouldn’t be focusing on the number of tests. We should be focusing on practicality. And this is where “Test each thing as early as possible” comes in:
If some logic can be verified in a Unit Test, we should do so in a Unit Test.
Because these test single things, they have single points of failure. A failing Unit Test tells us immediately what unit of logic failed which makes it extremely fast to find and fix.
As we go up the testing levels, the number of moving parts increases, so if we skip Unit Tests completely and something breaks, did it break because of an internal logic issue? An integration issue? A network issue?
Sure, you could start debugging or trawl through logs, but why not have a Unit Test that tells you exactly which bit of logic the failure came from?
And the same applies as we go up the levels: we shouldn’t be verifying integrations or contracts in Smoke Tests; putting them in the earlier levels means we can catch issues in a more isolated manner, making the root cause easier to identify.
A failing Contract Test takes you directly to the contract that is causing a problem. If we skip Contract Testing and a Smoke Test fails, we need to do a lot more digging to get to the information which tells us it is a contract causing the issue.
So, when adding some new functionality, we should (repeat with me) “test each thing as early as possible”.
#2
And if we have a Contract Test verifying that a contract is as expected, we don’t need to do so again in an Integration Test. Some third-party dependencies may not even need any Integration Tests at all, especially if all we are concerned with testing is that given a specific request format, we get a specific response format.
We may find that maintaining the additional Integration Test gives us no value.
The same goes for Component Tests: we shouldn’t be testing any logic. That should already be done in the Unit Tests. So if a Component Test fails, we know we’ve already proven the internal logic so it must be something else - like a configuration or internal dependency registration issue.
If we start asserting logic in Component Tests, or contracts and integrations in Smoke Tests, we are duplicating our assertions, crossing the responsibilities of our levels of testing, and creating more test code to understand, maintain and update - leading to an eventual monster of a test suite that developers will treat with an appropriate level of dread.
So mantra #2: Only test for each thing once
Defining the test types
…or the responsibilities of each level of testing
To help illustrate this we have the following scenario:
A web API for a Zoo which allows zookeepers to:
- add an animal, which will result in the animal being added to the database and an email notification being sent.
- change the feeding time of an animal, which will result in the time being updated in the database and an email notification being sent.
For the tests, following the mantra of “Only test for each thing once”, we get the following breakdown:
Unit Tests
These are done on the smallest possible unit of logic. We are only concerned with validating that the unit of logic behaves as expected given different inputs. To ensure we are only testing our logic and not any dependencies, dependencies are mocked to return an expected response to help us assert the unit of logic’s behavior for a specific scenario.
These tests are usually written against the public functions defined in business logic.
Example Test Cases
- AddAnimal returns success when the mocked AnimalRepository reports that the record has been created.
- AddAnimal returns an AnimalAlreadyExists error when the mocked AnimalRepository reports that the animal already exists.
- AddAnimal returns success when the mocked EmailClient reports it successfully sent the email.
- AddAnimal returns an InvalidEmail error when the mocked EmailClient reports the email is invalid.
- AddAnimal only forwards the email request to the mocked EmailClient if the call to the mocked AnimalRepository returned successfully.
- etc
and similar for the ChangeFeedingTime function of the FeedingTimeService.
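To make one of these cases concrete, here is a minimal sketch in TypeScript with Jest. The AnimalService class, its constructor, and the shapes of the mocked repository and email client are assumptions for illustration only - the scenario itself is the “animal already exists” case from the list above.

```typescript
import { AnimalService } from "../src/animal-service"; // hypothetical module path

describe("AddAnimal", () => {
  it("returns an AnimalAlreadyExists error when the repository reports a duplicate", async () => {
    // Only the dependencies are mocked; the unit of logic under test is real.
    const animalRepository = {
      add: jest.fn().mockResolvedValue({ status: "already-exists" }),
    };
    const emailClient = { send: jest.fn() };

    const service = new AnimalService(animalRepository, emailClient);
    const result = await service.addAnimal({ name: "Zebra", feedingTime: "09:00" });

    expect(result).toEqual({ error: "AnimalAlreadyExists" });
    // The email should never be requested if the record wasn't created.
    expect(emailClient.send).not.toHaveBeenCalled();
  });
});
```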
đź’° Tip: Keeping functions/units-of-logic small results in smaller, more manageable and easier to understand Unit Tests.
âś… Now we know that each individual piece of business logic works as expected.
Component Tests
The next step is to test that all these functions can be chained together and execution flows through the running application as expected.
Since we are only concerned with testing the execution flow, we can mock external dependencies (the Database and Email Web API).
We have also already tested all the logic within the various functions so we don’t need to check any of those again either.
All we assert is that after entering the system, execution flows through to the correct external dependencies and back resulting in the expected response.
Example Test Cases
- When an add animal request is sent to the API, the animal is added to the database, an email is sent and we get a success response.
- When an add animal request is sent to the API but the animal already exists, we get an “animal already exists” error response.
etc
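A minimal sketch of the first case, assuming a Node-style API tested with Jest and supertest, and a hypothetical createApp factory that takes its external dependencies as arguments:

```typescript
import request from "supertest";
import { createApp } from "../src/app"; // hypothetical factory that accepts its external dependencies

it("adds an animal, sends an email and returns success", async () => {
  // External dependencies are mocked; everything inside the application is real.
  const animalRepository = { add: jest.fn().mockResolvedValue({ status: "created" }) };
  const emailClient = { send: jest.fn().mockResolvedValue({ status: "sent" }) };

  const app = createApp({ animalRepository, emailClient });

  await request(app)
    .post("/animals")
    .send({ name: "Zebra", feedingTime: "09:00" })
    .expect(201);

  // Assert that execution flowed out to both external dependencies - not the logic itself.
  expect(animalRepository.add).toHaveBeenCalled();
  expect(emailClient.send).toHaveBeenCalled();
});
```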
âś… Now we know that when pieced together, execution flows through the individual pieces of business logic as expected.
💰 Tip: If you need to provide a mock of your entire application, these mocked external dependencies can be used in place of the actual implementations (e.g. by clever in-memory injection) and voilà: you have a version of your application which has all the logic, requires almost no additional maintenance, and isn’t beholden to external dependencies causing trouble.
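As a sketch of that idea (using the same hypothetical createApp factory as above), simple in-memory fakes can double as a runnable local version of the application:

```typescript
import { createApp } from "../src/app"; // the same hypothetical factory as above

// In-memory stand-ins for the real database and email service.
const animals: Array<{ name: string }> = [];
const inMemoryAnimalRepository = {
  add: async (animal: { name: string }) => {
    if (animals.some((a) => a.name === animal.name)) return { status: "already-exists" };
    animals.push(animal);
    return { status: "created" };
  },
};
const loggingEmailClient = {
  send: async (email: { to: string }) => {
    console.log(`[fake email] to=${email.to}`);
    return { status: "sent" };
  },
};

// A fully working copy of the application, no external dependencies required.
createApp({ animalRepository: inMemoryAnimalRepository, emailClient: loggingEmailClient }).listen(3000);
```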
Contract and Integration Tests
We now know our application works when the external dependencies behave as expected - but how do we confirm that the external dependencies behave as expected? This is what Contract and Integration Tests do.
Contract Tests
These ensure that the external dependencies’ contracts conform to the schema we expect.
For example: We would have tests which send the Email Web API the various requests we use, to ensure they are accepted with the fields and their values in the format in which we are sending them.
This means that if the Email Web API were to suddenly make a field mandatory that we are not supplying, the Contract Tests would start to fail and we would know there is something we need to change.
Example Test Cases
Email Contract Tests
- When an email request is sent with all the required properties, we get a success response.
- When an email request is sent with a body greater than 5000 characters, we get an error response stating the body is too large.
Database Contract Tests
- When an add animal request is sent with all the required properties, we get a success response.
- When an add animal request is sent without a name, we get an error response stating the animal name is required.
etc
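A sketch of the first email case, assuming the Email Web API exposes a sandbox or test endpoint we can call; the URL, environment variable and field names here are invented for illustration:

```typescript
// The sandbox URL and field names below are assumptions, not a real email provider's API.
const EMAIL_API_URL = process.env.EMAIL_API_SANDBOX_URL ?? "https://email-sandbox.example.com/v1/send";

it("accepts an email request with all the required properties", async () => {
  const response = await fetch(EMAIL_API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      to: "zookeeper@example.com",
      subject: "New animal added",
      body: "A zebra has been added to the zoo.",
    }),
  });

  // Only the shape of the exchange is under test: our request format is accepted
  // and the response contains the fields we rely on.
  expect(response.status).toBe(200);
  const json = await response.json();
  expect(json).toEqual(expect.objectContaining({ messageId: expect.any(String) }));
});
```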
Integration Tests
Integration Tests, in turn, ensure the external dependencies behave as we expect - beyond just the shape of their contracts.
E.g. The Database may have a rule which prevents Junior Zookeepers from setting feeding times. In this case we can have an Integration Test which asserts that if a Junior Zookeeper tries to change a feeding time, the database returns an error.
Example Test Cases
Email Integration Tests
- When an email request is sent, a test email account receives the email.
- When an email request is sent which is identical to a previous request, we get an error response stating the email is a duplicate.
Database Integration Tests
- When a Senior Zookeeper updates the feeding time, we get a success response.
- When a Junior Zookeeper updates the feeding time, we get an error response stating permission denied.
etc
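A sketch of the Junior Zookeeper case, assuming a test database reachable from the test run and a hypothetical createDatabaseClient wrapper:

```typescript
import { createDatabaseClient } from "../src/database"; // hypothetical wrapper around the real database

it("rejects a feeding time update from a Junior Zookeeper", async () => {
  // Runs against a real (test) database instance, so the dependency's own rules apply.
  const db = createDatabaseClient(process.env.TEST_DATABASE_URL!);

  const result = await db.updateFeedingTime({
    animal: "Zebra",
    feedingTime: "10:00",
    requestedBy: { name: "Sam", role: "JuniorZookeeper" },
  });

  expect(result).toEqual({ error: "PermissionDenied" });

  await db.close();
});
```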
âś… Now we know that the application behaves as expected and the external dependencies behave as expected.
Smoke Tests
So what’s next? Well, when we deploy our application it may be running on different hardware, with different configuration, network rules and restrictions, and who knows what else.
Smoke Tests help us identify any major issues which haven’t been - and can’t be - caught by the previous levels of testing.
The usual convention is to pick a few mission-critical features that touch different dependencies. Because we know the application behaves as expected in all other regards, we can keep the number of tests at this level very small. This has the bonus of saving us test maintenance headaches, as smoke testing is often the most complex and time-consuming level, both to run and to set up.
Example Test Cases
- When an add animal request is sent to the API, we receive a success response.
Note: We could create another test case:
- When a feeding time update request is sent to the API, we receive a success response.
But the first test case already touches both the database and the email dependency, so for this example the one is sufficient.
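As a sketch, a Smoke Test is often little more than a request against the freshly deployed environment; the base URL and environment variable here are assumptions:

```typescript
// Runs after deployment against the real environment, with real dependencies.
const BASE_URL = process.env.SMOKE_TEST_BASE_URL ?? "https://zoo-api.staging.example.com";

it("can add an animal through the deployed API", async () => {
  const response = await fetch(`${BASE_URL}/animals`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: `smoke-test-${Date.now()}`, feedingTime: "08:00" }),
  });

  // A success response proves start-up, configuration, networking and both dependencies in one go.
  expect(response.status).toBe(201);
});
```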
âś… Now we know that the application behaves as expected, the dependencies behave as expected, and when deployed, the application can start up, dependencies can be reached and some of the mission-critical features run as expected.
To summarize
We know our logic works thanks to Unit Tests.
We know execution flows through the application correctly thanks to Component Tests.
We know external dependencies conform to the contracts we expect thanks to the Contract Tests.
We know external dependencies behave as expected thanks to the Integration Tests.
And now we know that the application runs in the various environments thanks to the Smoke Tests.
Why does this work?
Because each individual piece has been tested, we’ve tested that the individually tested pieces work together, we’ve tested that the external pieces behave as our individually tested pieces expect them to, and we’ve tested that our entire system still runs as expected in deployment environments.
Levels of testing conclusion
We’ve got a clear split in the responsibilities of the various levels of testing.
We have a common set of names to use for the various test levels.
We know exactly what to test in each level.
We know to test anything in the earliest possible level.
We know not to test for anything that has already been proven in an earlier level.
And the result is
Happy Developers 🥳 Need I say more?
…but in case I do:
A much more manageable test suite with simple tests that are easy to understand, update and debug.
A clear idea of how to implement testing when picking up a new task.
A common testing paradigm to use across backend teams.
And the real money saver: quicker feedback, with the lower levels of testing surfacing issues sooner.
What about other forms of testing?
To reduce the scope of this article, non-functional forms of testing have been intentionally omitted.
Performance, security, scalability etc. are all important, however they differ greatly and their relevance varies from project to project. In my experience, the functional forms of testing tend to be the bigger time sinks, as they are run and updated far more frequently. So the focus here is on the area with the biggest potential for improvement: the functional testing that runs for every code change, from the developer’s machine to production, as part of the typical software development lifecycle.
Let’s not overload ourselves by trying to do absolutely everything in one go.
Putting into practice
Before this article suffers the same fate as so many before it and gets lost in an abundance of browser tabs, if you see value in the concepts - why not do something about it?
Implementation Guide
The problem with a lot of tech improvements is that they’re seen as blockers: drop everything and work on this improvement, which will prevent us from working on any new features or business as usual for some amount of time.
This may work for tasks smaller than overhauling an entire, mature test suite - but for large tasks, it may (and should) have the business quaking in its boots at the prospect of falling behind on delivery schedules.
But there is a way to develop your cake and eat it:
The strangler pattern
Following the concept of the programming pattern of the same name, instead of updating the entire test suite in one go, we can methodically update tests as we touch them as part of the day-to-day software development process.
- Decide on and agree a direction for the test suite across the team.
- When a particular area of code is worked on, as part of that work update any of the tests that code uses to conform to the new direction.
This allows the migration of tests, bit by bit, without having a massive impact on the day-to-day commitments to features and business-as-usual development tasks.
AND it means any new tests are created following the new direction - so they don’t add to the mess that needs to be migrated later.
- After a few months of this, Engineering Improvements tasks can be raised to migrate the still outstanding tests. These can be small enough to slowly pick away at without affecting the team’s cadence much.
And before you know it, you’ll be reaping the benefits of a smoother, quicker and easier test suite.
Getting buy-in
But before doing anything, you’ll need to first convince your team and then convince the people who decide how your team spends its time that this is a worthy endeavor:
- For the team:
- if the current test suite is painful enough that you’re reading this article - that’s the job done. Otherwise one might suggest that this is a way to ensure tests don’t get to that point of pain.
💰 Tip: Make this one of your or your team’s official objectives to get additional visibility and support within your company.
- For the business:
- increase feature delivery by reducing the overhead/cost of developing new features due to the time spent updating unwieldy tests
- maintain and improve client confidence with the software reliability increase due to improved testing
- this will also make for happier developers, which means less developer churn and less costly onboarding
- if multiple teams adopt this, the company will benefit from the knowledge-share across teams and the cohesion that comes from having a shared testing paradigm.
- and this can be done without additional cost if done with a strangler approach - that is, only moving tests to the new way of doing things when the test code for that specific feature needs to be modified anyway
Fin
And there you have it. The lessons I’ve learned (and what many others have shared with me) over the years to help take testing from a brittle behemoth to a simple, structured, manageable set of utilities that gives us confidence in our system.
We’ve covered:
- The levels of testing
- Testing each thing as early as possible
- Only testing for each thing once
- How to migrate old tests without costing the company
- How to get understanding and buy-in from the company
Thanks for reading and happy testing! 🤖