DEV Community

Michael Streeter for LogDNA


Street Smarts: The Testing Pyramid

At my first developer position, I lost my company a $6 million/yr contract within my first six months. Not for lack of trying: I was burning at least 70-80 hours a week, leaving my office at midnight, working weekends, working through holidays, and pretty much living on coffee.

Through all of that, I never once wrote an automated test. I didn't even know that was a thing you could do. So much of my time was spent just running the software, clicking on buttons, and waiting... waiting... waiting for results. It wasn't until years after I left that company that I discovered the importance of fast, reliable feedback in the form of well-written tests.

All the time I spent manually testing could have been spent writing tests. To that end, the journey of writing useful tests needs a map. And thus, the Testing Pyramid shall be our guide.

The Testing Pyramid

The Testing Pyramid is not an original concept; a quick Google search will present the reader with a dozen varieties. Some pyramids have more layers, some fewer, and some combine layers together. This is what I see in my head when I consider the matter:

Streeter's view of the Testing Pyramid: a hierarchy that is wide at the base and narrow at the pinnacle. From pinnacle to base, the layers are manual tests, end-to-end tests, functional tests, integration tests, and unit tests. Four metrics run alongside the layers. Value (if the test passes, how confident are we in the feature?) is high at the pinnacle and low at the base. Depth (if the test breaks, how close to the surface is the error?) is shallow at the base and gets deeper toward the pinnacle. Speed (how fast is the feedback loop?) is fast at the base and slows toward the pinnacle. Expense (how difficult is it to create, maintain, and execute a test suite?) is cheap at the base and grows more expensive toward the pinnacle.

The width of the layers is an indication of how many tests should be written (not to scale). Unit tests can range from hundreds to tens of thousands, while end-to-end (e2e) tests should probably top out at 1-5% of your unit test count.

The height has four different metrics I consider:

  • Value: If a test passes, how confident should we be in deploying to customers? High is better than low... obviously.
  • Depth: If a test fails, how close to the "surface" of the tests is the failure possibly located? Shallow is easier to debug than deep.
  • Speed: How fast do these tests run? More importantly, how quickly do you receive feedback? Faster feedback is better.
  • Expense: How expensive are these tests to create and maintain? Cheaper is better.

Let's break down the layers and discuss them with these four metrics in mind.

Unit Tests

Unit tests are a fantastic tool and the very foundation for any application. If the code is a garden, then a unit test is a single flower examined under a glass case. While they have the lowest value individually, they are fast, cheap, and by far the easiest to debug when they fail.

  • Value: Low. A robust test suite could easily need hundreds or thousands of unit tests before a high level of confidence is reached.
  • Depth: Shallow. A failure is usually isolated to a single method or file.
  • Speed: Fastest. In some languages, some test runners can burn through over 1,000 unit tests a second. One Elixir project I contributed to had 20,000+ unit tests running in just under 8 seconds.
  • Expense: Cheap. Minus some boilerplate, a unit test is often a single line.
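To make the "often a single line" point concrete, here is a minimal sketch in Python (the article isn't tied to any language, and the `add_tax` function is hypothetical):

```python
def add_tax(price, rate=0.08):
    """Return price with sales tax applied, rounded to cents."""
    return round(price * (1 + rate), 2)

# A unit test examines one function in isolation; minus the
# boilerplate, each check is a single assertion.
def test_add_tax_default_rate():
    assert add_tax(100.00) == 108.00

def test_add_tax_zero_rate():
    assert add_tax(50.00, rate=0.0) == 50.00

test_add_tax_default_rate()
test_add_tax_zero_rate()
```

Because the test touches exactly one function, a failure points straight at that function: shallow depth, fast feedback.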

Integration Tests

Integration tests are the next step up the pyramid. Their value is increased by testing the connections between individual elements.

  • Value: Medium. The value of knowing internal elements are correctly wired together is just as important as the individual elements working in isolation.
  • Depth: Medium. A failure should be isolated internally to a project or repo.
  • Speed: Fast. The difference in speed usually comes from setup. If you've ever used Java and Spring together, the setup and teardown calls can add an extra second to each test file.
  • Expense: Kinda Cheap. Again, the setup for an integration test is usually a little more extensive. Changing a requirement can cause several test files to be refactored.
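A sketch of the "wired together" idea, again in Python. Both classes here are hypothetical stand-ins; the point is that the assertion crosses the boundary between two internal components rather than testing either one alone:

```python
class InMemoryUserStore:
    """A stand-in data layer (the setup is usually the expensive part)."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)

class UserService:
    """Business layer that depends on the store."""
    def __init__(self, store):
        self.store = store

    def register(self, user_id, name):
        self.store.save(user_id, name.strip().title())

def test_register_persists_normalized_name():
    store = InMemoryUserStore()
    service = UserService(store)
    service.register(1, "  ada lovelace ")
    # The assertion verifies the service/store connection end to end.
    assert store.get(1) == "Ada Lovelace"

test_register_persists_normalized_name()
```

A failure here is still isolated to this repo, but it could live in either component or in the wiring between them: medium depth.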

Functional Tests

A functional test interacts with your code from the outside. For a front end, that means driving a browser and triggering clicks on specific elements. A good functional suite still controls the environment and data while mocking out third-party services.

  • Value: High. It's hard to argue with a well-written functional suite. Imitating how an outside actor would interact under realistic conditions inspires a high degree of confidence.
  • Depth: Deep. Unfortunately, a failure could come from almost anywhere in your app. This makes investigating failures take much longer than the lower-level tests.
  • Speed: Slow. These tests often rely on browser interactions, CDNs, and disabled caching to be certain. Add in cross-browser testing, and you can significantly extend the time to completion. Depending on how many functional tests you have, it could take 30 minutes to several hours, easily stretching the feedback loop to once a day.
  • Expense: Medium-High. A functional suite can quickly get out of hand. A small change to a UI element can force a lot of changes. With a poorly formed functional suite (a suite with many false positives or false negatives), teams can lose faith and build a resistance to maintaining or expanding it.
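Here is one way to sketch the "mock out third-party services" part in Python, using the standard library's `unittest.mock`. The weather endpoint and both functions are hypothetical; the technique is patching the network boundary so the test exercises your code from the outside without a live call:

```python
import urllib.request
from unittest.mock import patch

def fetch_temperature(city):
    """Calls an external weather API (stubbed out in the test below)."""
    url = f"https://api.example.com/weather?q={city}"
    with urllib.request.urlopen(url) as resp:
        return float(resp.read())

def describe_weather(city):
    return "hot" if fetch_temperature(city) > 30 else "mild"

def test_describe_weather_with_mocked_service():
    # Patch the boundary so no real network request is made.
    with patch("urllib.request.urlopen") as fake_urlopen:
        fake_urlopen.return_value.__enter__.return_value.read.return_value = b"35.0"
        assert describe_weather("Lisbon") == "hot"

test_describe_weather_with_mocked_service()
```

The suite still controls the data (the canned `35.0` response) while driving the application through its public surface.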

End-to-End (e2e)

An e2e test differs slightly from a functional one by adding third-party services into the mix. At this point, nothing is mocked out. Everything is live: no holds barred, bare-knuckle boxing at its finest.

  • Value: Very High. This is as real as it gets in an automated sense. Not only can you trust your application's behavior, but also its connections to outside actors.
  • Depth: Very Deep. Failures can be completely outside of your control. This can add a lot of time to investigation, especially if the third-party services are less reliable.
  • Speed: Slow - Slowest. When adding third parties into the mix, there are often constraints on how often or when an e2e test suite can run. This could force a once-a-week or once-a-release run cycle, drastically slowing the feedback loop.
  • Expense: High. Same as functional.

Manual

Manual testing is a necessary evil. Value is high because, well, you literally tried it out! However, it's very limited. The cost is measured by engineering wages, and the procedure can differ wildly between individuals. Plus, it's completely manual!

  • Value: Very High. Seeing is believing.
  • Depth: Very Deep. Same as e2e. Failures could be anywhere.
  • Speed: Manual. The feedback loop is relatively fast for a single test, but the tester is locked into the process for the duration.
  • Expense: Highest. Monetarily, an engineer costs several times more per hour than an EC2 cluster. Predictability relies heavily on both good procedure documentation and good discipline.

Cool! ... So what?

The Testing Pyramid is more of a guideline than an actual map. It doesn't dictate the number or ratio of tests, describe how to write them, or do much else. It's a mental model of what a robust suite of tests could look like, with a balance of value, depth, speed, and expense.

I like to think it can help you answer a few questions:

  • I have no tests, what should I do?
    • Like any good structure, start with a foundation.
  • All I have are unit tests, what now?
    • Start building out a few tests with higher value.
  • I spend all day manually testing, HALP!?
    • You're too top-heavy! Automate those manual tasks.

You get the picture.

Conclusion

Testing is a part of the developer craft I wish I had learned much earlier in my career. Would it have saved me from my first big blunder? Probably not; there were plenty of other factors that contributed to that. But it could have drastically changed how I received feedback during development, which would have been nice. I hope this explanation helps you start to understand the Testing Pyramid as I see it.
