Understanding the test pyramid

#testing #javascript #fullstack #webdev

Originally posted on my blog harrisgeo.me

Testing is a really important part of coding that is very often ignored by developers. How many times have we all experienced that pushing code to production (or not even reaching that far) had broken part our service / website?

It especially sucks when a part of the system that is totally unrelated to our code ends up having a problem due to our changes. That phenomenon is usually referred to as a side effect. Putting together several chunks of code that were written by multiple developers is a recipe for disaster as a result of side effects.

Conflicts may arise in files that were modified by more than one person. These conflicts often end up causing bugs and other unpleasant system behaviour. So what do we do to put ourselves in a better place?

The testing pyramid

You might have heard this term by quality engineers (brrr who even talks to them? 😆) when they want to describe how testing can be abstracted in multiple levels. In a world where releasing doesn’t result in us being in a cold sweat, we need to make use of the testing pyramid (along with a “few” other concepts). That way, we will feel more confident that our new code is not going to break the system that easily.

Image by oreilly.com

As shown in the image above, the testing pyramid includes 3 stages.

unit tests (small tests)
integration tests (medium tests)
end to end tests or e2e tests (large tests)

While at Google they like to refer to each one of them based on the impact it has on the system (thus the size), I think the rest of the world prefers the technical term when referring to each one of them.

If you notice in the image above, the unit section is quite bigger than the integration section and the second one itself is bigger than the one for e2e. That is a good way to quickly visualise the amount of tests that are supposed to be written to ensure good testing balance. We will analyse each stage further down in this post.

To help us understand each test's purpose, let’s use as an example the construction for a multi floor building.

Unit tests

Let’s think of unit tests as the idea of making sure that each tile, brick or cable behind the wall works fine.

Unit tests should be testing small pieces of code that run on a single process. Examples of such pieces can be helper functions, independent React components and other I/O operations. We want to test code that has a single purpose and mainly makes our development work smoother. For that reason the majority of the tests our system is going to have will be unit tests.

Another important requirement for unit tests is that they should not be accessing the disk or network. For cases where they rely on libraries or external sources, the use of test doubles is what will help us solve that problem without breaking the rule of no network / disk access. There are cases where unit tests can access public APIs and other external sources but in this article let's keep it simple.

Test doubles are common in all kinds of tests and they include a few different types that can be quite useful for us. These are stubs, test fakes and interaction testing.

Stubs

Stubs (often referred to as mocks) are the most commonly used test doubles in unit tests. Stubs work in a way where we return a hardcoded result we have already predefined before executing the test. That is really useful when our code uses external libraries and dependencies that are supposed to make asynchronous requests to our network or other distributed sources. That technique keeps us on track and we can continue testing without relying on communication with code we have no control over.

Unit tests are usually really fast both to execute and to write. For that reason they should always be included when pushing changes to our code. Most teams I have worked with would reject your PR if it didn’t include any unit tests. Again, with unit tests, the more the merrier.

However, it is really important to mention that only focusing on adding unit tests DOES NOT mean that our system is going to be bug free. That is why I think that concepts like 100% test coverage are b*^%#$€t. But again, that’s my personal opinion. If we want to ensure quality then maybe we should start worrying about the rest of the stages in the test pyramid as well.

Integration tests

Now let’s think of the whole room for the office. Maybe the walls, the floor or even the whole kitchen. They all contain lots of smaller units that when put together as a group do something bigger. Stuff like turning on the lights or making sure the coffee maker will have enough and continuous electricity to work when we want to make some coffee.

Integration tests are used for testing groups of smaller units where we want to see how they behave as a whole. In the frontend world integration tests are often referred to as UI tests. A good example of a library that helps us with that in the React world, is react-testing-library. In the backend world they are often referred to as contract tests or api tests.

When it comes to speed, integration tests sit right between unit and e2e tests. The idea is that we want our code to only reach localhost in order to read or write any data. In other words, even though they are allowed to talk to other services, these services are only allowed to be on the same machine. To achieve that we need to once again make use of test doubles. This is where we can make good use of test fakes.

Test fakes

Test fakes as the name suggests are fake representations of the original service our code is supposed to be talking to. Setting up test fakes can be a bit painful as we need to mock the service and or database we are supposed to be talking to but once this part is done, the value it returns is spectacular. Libraries like nock or mock-service-worker are some really good tools that can help us achieve test fakes.

For Node.js services we can spin up a temporary Database and seed it with some controlled data. Doing that, our API will work as intended but it will instead use our fake Database and test data.

For the temporary Database we can spin up a Docker container that will contain an instance of that Database we are using (like MySQL, PostgresQL, etc). We can then execute all the available migration scripts and then have an exact copy of our required tables.

We can then use fixtures to send controlled data into that instance. That way, calling an API for e.g. all the available food recipes a user has, will return us the actual controlled data that we instructed our code to insert into the Database.

If you think about it, test fakes are basically setting up a quick Database to temporarily write to and once the test is finished, that Database can be destroyed. I have to admit it took me a while to get comfortable with that concept but now it sounds that simple. Maybe the fact that it touches multiple areas all together is something that makes it look a bit more terrifying. However, like everything in programming, at the beginning it may look scary but after doing that a few times, we get used to it and see the real value it provides.

Making integration tests easy to deal with, really depends on the setup of our system. If it is that painful to set them up, that usually is a sign that we need to refactor and / or simplify certain parts of our system.

Interaction tests

Interaction tests can be part of either unit or integration tests. They are mainly designed to test how a function is called without calling its implementation or relying on its result. Some common examples you might have already seen with jest is properties like toHaveBeenCalled or toHavebeenCalledWith(x, y). Such tests can be quite useful if we want to test concepts like if a user resets their password, the function for sending an email with the new password setup link is called with user's email.

Unlike unit tests, for integration tests it makes sense to set them up to be testing multiple things in each one of them. I know that some developers may disagree with multi scoped tests but I think that it saves a lot of time and lines of code as the majority of the times the testing scenarios are the same but we just change the target of the test.

What I really like about integration tests is that when releasing, amongst other tests, they give us the highest amount of confidence that if something is about to break, that should appear here. That is because they touch a decent amount of code, are not that slow and with that combination, they can help us spot most of the bugs that can occur.

End to end tests

We thought of the bricks and tiles of the room, we thought of the different rooms and other groups of units but we haven't thought of how we are going to make sure that they all glue together properly. Now it's time to worry about the entirety of the building. What rooms does each floor have? How do we move between floors? Is it a good idea to add a gym on floor 4 where directly underneath it there is a library?

Systems usually have several microservices, 3rd party apps and more that talk to each other in order to achieve a certain goal. Imagine how many different layers the code visits every time we want to register to a service, login or complete any other full journey. This is what the e2e tests are designed for.

At the end of the day every line of code that we write has one and only one purpose, which is to solve end users problems. Testing these entire journeys users take in order to perform an action is what e2e tests are all about. However unlike unit and integration tests, e2e tests can be really slow in comparison as they are supposed to be talking to the real services and not mock that much. They need to talk to the real database, go through the entirety of our microservices so that we feel confident that everything works well. However e2e tests are prone to network connection issues that may occur that will cause the test to fail.

For these reasons e2e are usually not running that often comparing to unit test and integration tests. We want the development flow to be as fast as possible and sadly e2e tests sometimes may be a blocker. That is why e2e tests are common to run pre deploying to an environment to even furtherly ensure we are not going to break.

Moral of the story

We can all agree that releasing "bug free" code is a bit of mission impossible but that does not mean that we are just going to give up like that. Sooner or later, all developers will realise that the most important part of our jobs is to focus on solving problems. Adding quality to our code is a really decent way of proving that we know what we're doing. Then, having the confidence to say that code that we release is not going to introduce new problems to our system is the way going forward.

Understanding the test pyramid and making use of it is not just another trendy topic that we need to learn because it is asked in interviews. It is a fundamental concept that will help us better debug problems when dealing with the entire stack of our system. Most importantly it is one of the things that knowing how to deal with is great to level up our skills as developers.

Feel free to contact me with any questions! Share this post with your friends and colleagues

Follow me on Twitter
Add me on LinkedIn