Effective Software Testing – A Developer’s Guide

I recently finished Effective Software Testing – A Developer’s Guide by Maurício Aniche, and I really liked it. I have been coding for a long time and I think I have been writing pretty good tests for the features I have implemented. Even so, I found this book quite valuable, particularly the chapters on how to systematically come up with test cases based on the specification, inputs, outputs and the structure of the implementation.

The book also covers many other common topics relevant for developers writing automatic tests, such as: test-driven development, mocking, designing for testability, and property-based testing. The author does a good job describing these. I especially like the code examples – they are larger than the most basic cases, but still small enough to easily keep in your head.

The author is an Assistant Professor in Software Engineering at the Delft University of Technology. He has also worked for several years as a developer. The book apparently grew out of lecture notes from a course on software testing. The academic background shows in that there are plenty of references to relevant research (something I also liked about Code Complete).

Most Interesting Chapters

The two chapters on how to systematically come up with test cases were the most interesting to me. This is something I have been doing in a more ad-hoc manner before, so it was nice to get a clear and thorough description.

Specification-Based Testing

The author breaks down how to come up with test cases into a seven-step process. It starts with understanding what the program is supposed to do, and identifying the types and domains of the inputs and outputs. Some inputs are equivalent – they result in the same path through the program, even if the values are different. For example, if there is an input string for a name, it may be that both the name Alice and the name Bob are handled pretty much the same. Thus these are equivalent, and both inputs belong to the same partition (or class) of input. On the other hand, an empty string may lead to a different execution path (because maybe empty names are not allowed), so this would be another partition.

The idea is to find all the different partitions, in order to test the complete behavior of the program. A systematic way of doing this is to go through each input individually to find all the possible partitions of it. For example, a string might be null, empty, of length 1, or of length > 1. Next you go through each input in combination with the other inputs. Frequently there are dependencies between different inputs. For example, if there are two string inputs, what if both are null at the same time? What if both have values at the same time? And so on. Finally, look at all the possible outputs. For example, if the output is an array of strings, consider: null, empty array, single item, multiple items. Then for each individual string in the array: empty, single character, multiple characters. Ideally, finding all input partitions should generate all output partitions. However, considering the different output partitions could help you find input cases that you missed.
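To make the partitioning of a single input concrete, here is a minimal sketch in Python (the book's examples are in Java). The function greet() and its rule that missing names are rejected are assumptions made up for this illustration, not something taken from the book.

```python
from typing import Optional


def greet(name: Optional[str]) -> str:
    """Hypothetical function under test: rejects missing names, greets the rest."""
    if name is None or name == "":
        raise ValueError("name is required")
    return f"Hello, {name}!"


# One representative value per partition of the single string input.
PARTITIONS = {
    "null": None,
    "empty": "",
    "length 1": "A",
    "length > 1": "Alice",
}


def run_partition_tests() -> dict:
    """Run greet() once per partition and record what happened."""
    results = {}
    for label, value in PARTITIONS.items():
        try:
            results[label] = greet(value)
        except ValueError:
            results[label] = "rejected"
    return results
```

Alice and Bob would land in the same "length > 1" partition, so one of them is enough as a representative.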

The next step is finding the boundaries between partitions, since “bugs love boundaries”. If there is a condition such as x < 10, there is a boundary where values up to 9 belong to one partition, and values 10 and higher belong to another partition. Whenever there is a boundary, there should be two tests. One that tests the value on the boundary (the on point, here 10), and one that tests the closest value belonging to the other partition (the off point, here 9). If there is an equality rather than a less than, then there are boundaries on both sides, which means there should be three tests. Also, note that boundaries don’t have to be between variable values. If an empty list is treated differently from one with elements in it, that is also a boundary.
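A small sketch of the boundary tests for the x < 10 example, again in Python. The two functions are hypothetical stand-ins for code containing those conditions. I take the on point to be the value that appears in the condition, and the off point to be its nearest neighbor that flips the result.

```python
def is_small(x: int) -> bool:
    """Hypothetical code under test, built around the condition x < 10."""
    return x < 10


def is_exactly_ten(x: int) -> bool:
    """Hypothetical code under test, built around the condition x == 10."""
    return x == 10


ON_POINT = 10   # the value that appears in the condition
OFF_POINT = 9   # closest value that makes x < 10 evaluate the other way


def boundary_cases_less_than():
    """Two tests for an inequality: the on point and the off point."""
    return [(x, is_small(x)) for x in (ON_POINT, OFF_POINT)]


def boundary_cases_equality():
    """Three tests for an equality: the on point and an off point on each side."""
    return [(x, is_exactly_ten(x)) for x in (9, 10, 11)]
```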

Now comes the test case generation. Generating all possible combinations of partitions quickly leads to a combinatorial explosion. Therefore, it is necessary to combine some tests. It can be helpful to look at the implementation to see which cases can be combined. For example, checks for exceptional cases often come early. If a null value for an input is checked early, there is no need to combine that null input with every other combination of the other inputs. One test case where that input is null is enough. Once the different cases that need to be tested have been identified, writing the actual tests is mostly a mechanical task.
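The pruning can be sketched with a bit of counting. This assumes a hypothetical function with two string inputs, four partitions per input, and an early null check on each input. The names and representative values are made up for the illustration.

```python
from itertools import product

# One representative per partition: null, empty, length 1, length > 1.
STRING_PARTITIONS = [None, "", "a", "abc"]


def all_combinations():
    """Every partition of the first input paired with every partition of the second."""
    return list(product(STRING_PARTITIONS, repeat=2))


def pruned_combinations():
    """One test per early null check, then only the non-null combinations."""
    non_null = [p for p in STRING_PARTITIONS if p is not None]
    cases = [(None, "abc"), ("abc", None)]
    cases += list(product(non_null, repeat=2))
    return cases
```

With two inputs this only goes from 16 to 11 cases, but the saving grows quickly as more inputs and partitions are added.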

The final step in this methodology is to “augment the test suite with creativity and experience”. I like this! Basically, see if you can come up with interesting variations of the existing test cases. For example, if a string is used, maybe add a case where there is whitespace in it.

In the book, there is a great example using substringsBetween() that has a lot more details.

Structural Testing and Code Coverage

We are not done just because we have come up with all the test cases we can think of from the specification-based testing. The next step is to look at the code coverage of the source code to see if there are parts that have not been executed. This can help us uncover cases we have missed. There can also be implementation details that will not show up in the functional specification, but still need to be tested.

The goal here is not to reach 100% test coverage. Instead, it is a tool to highlight which parts of the code have not been exercised, in order to analyze why that is. Maybe it is something that can’t or shouldn’t be tested, or is not worth the cost to test. If so, no extra test is needed. But sometimes there are missed cases, and the code coverage helps us to find those cases.

This leads to the different coverage criteria. Line coverage means that a given line has been executed at least once by a test case. For branch coverage, if a line has a decision point (if, for, while), then each outcome must have been executed at least once. When there are conditions with multiple criteria, for example if (a && b), condition + branch coverage means that each individual condition (a, b) has been true and false at least once, and the entire branch has been true and false at least once. Finally, path coverage means that all possible paths of the program have been executed. This quickly gets intractable, as the number of paths grows exponentially with the number of conditions. Furthermore, if there are loops, they may iterate hundreds of times each. So, aiming for path coverage is not practical.
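The difference between the first three criteria can be sketched for the if (a && b) example. This is a simplified model of my own, not how a coverage tool works: it treats a and b as plain input values and ignores short-circuit evaluation.

```python
def decision(a: bool, b: bool) -> bool:
    """Stands in for a line such as: if (a && b) { ... }"""
    return a and b


def observed(tests):
    """Which decision outcomes and which condition values a test set hits."""
    return {
        "branch": {decision(a, b) for a, b in tests},
        "a": {a for a, _ in tests},
        "b": {b for _, b in tests},
    }


def has_branch_coverage(tests) -> bool:
    return observed(tests)["branch"] == {True, False}


def has_condition_and_branch_coverage(tests) -> bool:
    seen = observed(tests)
    return all(seen[key] == {True, False} for key in ("branch", "a", "b"))


# Entering the if-body once executes every line, but only one branch outcome.
LINE_ONLY = [(True, True)]
# Both outcomes of the decision, but b is never false.
BRANCH = [(True, True), (False, True)]
# Each condition and the whole decision are both true and false at least once.
CONDITION_AND_BRANCH = [(True, True), (False, False)]
```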

When there are decisions depending on multiple conditions (i.e. complex if-statements), it is possible to get decent bug detection without having to test all possible combinations of conditions. Modified condition/decision coverage (MC/DC) exercises each condition so that it, independently of all the other conditions, affects the outcome of the entire decision. In other words, for each condition there must be a pair of tests in which only that condition changes and the outcome of the decision changes with it. The author does a good job of showing how this is done with an example.
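Here is a rough sketch of how such pairs can be found by brute force. The decision a and (b or c) is an example I picked myself, not the one from the book. For each condition, the code lists the pairs of inputs that differ only in that condition and give different outcomes.

```python
from itertools import product


def decision(a: bool, b: bool, c: bool) -> bool:
    """Example decision with three conditions (chosen for illustration)."""
    return a and (b or c)


def independence_pairs(index: int, n: int = 3):
    """Pairs of rows that differ only at `index` and flip the decision outcome."""
    pairs = []
    for row in product([True, False], repeat=n):
        if not row[index]:
            continue  # visit each pair once, starting from its True side
        flipped = row[:index] + (False,) + row[index + 1:]
        if decision(*row) != decision(*flipped):
            pairs.append((row, flipped))
    return pairs


# Picking one pair per condition, and reusing rows where possible,
# gives 4 tests instead of all 8 combinations.
MCDC_SUITE = {
    (True, True, False), (False, True, False),   # a decides the outcome
    (True, False, False),                        # paired with the first row for b
    (True, False, True),                         # paired with the row above for c
}
```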

So given that you can check the code coverage, you must decide how rigorous you want to be when covering decision points, and create test cases for that. The concept of boundary points is useful here. For a loop, it is reasonable to at least test when it executes zero, one and many times.

It can seem like it should be enough to just do structural testing, and not bother with specification-based testing, since structural testing makes sure all the code is covered. However, this is not true. Analyzing the requirements can lead to more test cases than simply checking coverage. For example, if results are added to a list, a test case adding one element will cover all the code. However, based on the specification it may be important to test that it works with multiple elements in the list as well. Therefore, structural testing should be used as a complement to specification-based testing.

This chapter ends with a section on mutation testing. The idea here is to systematically change a lot of small details in the program, then run all the test cases, and make sure the changes are detected by a failing test. The changes are performed automatically by the mutation testing tools, and can for example be: changing a <= to a <, decrementing instead of incrementing a variable, changing a plus to a minus, or replacing a boolean variable with true. While mutation testing can make sure the tests cover every part of the code, it typically takes a long time to run. An example of a mutation testing framework is Mutmut for Python, written by a former colleague of mine (hello Anders!).
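The idea can be shown with a single mutant written by hand. A real tool generates and runs the mutants automatically. The function is_adult() and the two test suites are hypothetical, made up for this sketch.

```python
def is_adult(age: int) -> bool:
    """Hypothetical production code."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """The same code with one small change: >= replaced by >."""
    return age > 18


def suite_passes(implementation, cases) -> bool:
    """True if every (input, expected) pair holds for the implementation."""
    return all(implementation(age) == expected for age, expected in cases)


# No test near the boundary, so the mutant goes unnoticed (it "survives").
WEAK_SUITE = [(30, True), (5, False)]
# Adding the boundary values makes a test fail for the mutant (it is "killed").
STRONG_SUITE = WEAK_SUITE + [(18, True), (17, False)]
```

A surviving mutant points at a test that is missing, here the boundary test for age 18.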

Other Chapters

Property-based testing. For property-based testing, you express a property that should hold for your code, and then a framework generates random test data and checks that the property always holds. If a failing case is discovered, it is automatically reduced to the smallest possible example. The examples I have read of this have always been quite small. In this chapter there are several examples, where some are a bit more substantial. They give a good understanding of how to do it, as well as showing the challenges. My problem with property-based testing has always been finding cases where I can describe a property that is not trivial, and where the benefits of those tests outweigh the effort of specifying them. I have used randomly generated tests to very good effect before, but always on complete systems (like generating random calls between phones), never as property-based tests.
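To show the basic mechanism, here is a hand-rolled sketch using only Python's standard library. It is not a real property-based testing framework: it generates random inputs and checks a property, but it does not reduce a failing input to a smaller one. The dedupe() function and its properties are my own example.

```python
import random


def dedupe(items):
    """Hypothetical code under test: remove duplicates, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_properties(data) -> bool:
    """Properties that should hold for any input list."""
    out = dedupe(data)
    no_duplicates = len(out) == len(set(out))
    same_elements = set(out) == set(data)
    idempotent = dedupe(out) == out
    return no_duplicates and same_elements and idempotent


def check_property(prop, runs: int = 200, seed: int = 0):
    """Return the first random list that breaks the property, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        if not prop(data):
            return data  # counterexample, not shrunk in this sketch
    return None
```

Calling check_property(dedupe_properties) returns None when no counterexample is found, while a property that is false, such as "every list is shorter than 3", comes back with a failing list.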

Test doubles and mocks. This chapter starts by defining dummies, fakes, stubs, spies and mocks. It then gives examples with typical business code with dependencies and side effects, and how using a mocking framework can help with the testing. There is also a good discussion on the pros and cons of mocking.
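As a small illustration of a stub and a mock, here is a sketch using Python's standard unittest.mock module (the book uses Java). The InvoiceReminder class and its two dependencies are hypothetical names invented for the example.

```python
from unittest.mock import Mock


class InvoiceReminder:
    """Hypothetical business code with two dependencies."""

    def __init__(self, repository, mailer):
        self.repository = repository
        self.mailer = mailer

    def remind_overdue(self) -> int:
        overdue = self.repository.find_overdue()
        for customer in overdue:
            self.mailer.send(customer, "Your invoice is overdue")
        return len(overdue)


def run_example():
    # Stub: returns canned data instead of querying a database.
    repository = Mock()
    repository.find_overdue.return_value = ["alice@example.com", "bob@example.com"]
    # Mock: records calls so the side effect can be verified.
    mailer = Mock()

    count = InvoiceReminder(repository, mailer).remind_overdue()

    mailer.send.assert_any_call("alice@example.com", "Your invoice is overdue")
    return count, mailer.send.call_count
```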

Designing for testability. The key idea in this chapter is to separate infrastructure code from domain code. This is easier said than done, but it is an idea that has been expressed by many others as well, for example in Clean Architecture. One way of describing how to do this is with Hexagonal Architecture (or Ports and Adapters), which is described well in this chapter. I had heard about that before, but never read up on it. A key technique to achieve this is using dependency injection. You also need your code to be observable. The author shows examples of how to improve the observability by changing the production code to make testing easier. I am all in favor of this!
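A minimal sketch of the idea, with the current date as the infrastructure dependency. The names Clock, SystemClock, FixedClock and TrialPolicy are assumptions made up for this example. The domain code only knows about the port (Clock), and the adapter is passed in through the constructor.

```python
from datetime import date
from typing import Protocol


class Clock(Protocol):
    """Port: what the domain code needs from the outside world."""

    def today(self) -> date: ...


class SystemClock:
    """Adapter used in production."""

    def today(self) -> date:
        return date.today()


class FixedClock:
    """Adapter used in tests, always returns the same date."""

    def __init__(self, fixed: date):
        self._fixed = fixed

    def today(self) -> date:
        return self._fixed


class TrialPolicy:
    """Domain code: the clock is injected, so tests control time."""

    def __init__(self, clock: Clock):
        self._clock = clock

    def days_left(self, trial_end: date) -> int:
        return max((trial_end - self._clock.today()).days, 0)
```

In a test, TrialPolicy(FixedClock(date(2024, 1, 1))) gives the same answer every day, which would not be the case if the class called date.today() directly.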

Test-driven development. This chapter uses the example of converting Roman numerals to integers to show how TDD works. After presenting what it is, the author notes that even though he uses TDD a lot, he does not use it all the time. Not all situations benefit from TDD. This agrees with my own experience, something I wrote about in When TDD Is Not a Good Fit. There is also a discussion on what research studies have to say about the effectiveness of TDD. Empirical research does not find clear benefits from TDD. Some research suggests that observed benefits of TDD may not be so much due to writing the tests first, but instead due to the process of fine-grained, steady steps that improve focus and flow (tests first or not).
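For a feel of where such a TDD session can end up, here is my own Python sketch of a Roman numeral converter (not the book's Java code). It is the kind of code that grows out of adding one small failing test at a time: first I, then II, then V, then IV, and so on.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral to an integer (no input validation)."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = VALUES[symbol]
        next_value = VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        # A smaller symbol in front of a larger one is subtracted (IV = 4).
        total += -value if value < next_value else value
    return total


# The test cases in the order they might have been added.
TDD_STEPS = [("I", 1), ("II", 2), ("V", 5), ("IV", 4), ("XIV", 14), ("MCMXCIV", 1994)]
```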

There are also chapters on larger tests (using a database), design by contract, and test code quality.

Missing

Exploratory testing. The focus of the book is on automatic tests when developing code. However, I would have liked to have a chapter on exploratory testing as well. It gets mentioned a couple of times, but that’s it. I always do a bit of exploratory testing after I am done developing the code and the automatic tests. Focusing on the complete application, instead of parts of it, gives a different perspective that has helped me find bugs that my automatic tests did not find. I have written about it in Is Manual Testing Needed?, and I recently spoke about it at the Jfokus conference in Stockholm. Fortunately, there is a really good book on the subject called Explore It! and I recommend it as a complement to this book.

Naming tests. Coming up with a consistent and clear naming scheme for tests is surprisingly hard. A few thoughts on that would also have been interesting to read.

Odds and Ends

Some other small observations:

The author works in “developer mode”, then switches to “tester mode” when testing. I do this as well.
The code examples are in Java, and have plenty of “blurbs” that further explain the code. This makes the code even easier to understand.
The author, like me, is against using only one assert per test. Frequently it makes sense to use many.
Each chapter ends with exercises (all with answers at the back), which make you engage more with the material.
“I am not afraid of purposefully introducing a bug in the code, running the tests, and seeing them red (and then reverting the bug)” – this is something I find useful too.

Conclusion

My philosophy is that as a developer, it is not only my responsibility to develop a feature. I also have to convince myself that it works as intended before I can say that I am done. This book is a great resource on how to write good tests that convince me that my code works.

Effective Software Testing is very well put together. I particularly like that there are many well-chosen examples that highlight the concepts without being simplistic. The whole book is very pragmatic, and we benefit from the author’s experience as a teacher, a researcher and a practitioner.