DEV Community

Unit Testing is Overrated

Oleksii Holub on July 07, 2020

The importance of testing in modern software development is really hard to overstate. Delivering a successful product is not something you do once ...

Boyen86 • Jul 8 '20 • Edited

Just a few points

Many of the issues you describe with tests are actually issues with mocking. Providing a proper test implementation is a better way of resolving that than skipping unit tests altogether.
Programming to interfaces is good design. It's not there because you want to write mocks; you should always depend on an abstraction instead of an implementation. Adding an interface decreases the complexity for everyone who is not interested in the actual implementation. When you are adding an interface because you want to mock, I'd say you messed up somewhere in your design. Why do you have a class specifically tailored for interaction with an outside source without an interface? You do not want software to depend on this implementation; you are writing tightly coupled software this way.
The goal of unit tests is the design of your software: you're writing a contract. As soon as you assume they are for testing your software, you have misunderstood their purpose. As a contract, you want to ensure that a class behaves as the developer intended it to behave. While developing, this ensures fewer bugs, as you double-check that what you write actually does what you expect it to do. It also ensures that other developers who might be working on your software when you are long gone understand what you wanted to achieve and why, and can make changes to your software without breaking its contract.
Skipping the foundation of the test pyramid will set you up for many low level bugs.
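
To make the first two points concrete, here is a minimal Python sketch of a hand-written test implementation behind an abstraction, used in place of a mocking framework. All names (RateSource, FixedRateSource, PriceConverter) are hypothetical and only illustrate the idea; nothing here comes from the article.

```python
from typing import Protocol


class RateSource(Protocol):
    """Hypothetical abstraction over an external exchange-rate service."""

    def rate(self, currency: str) -> float: ...


class FixedRateSource:
    """Test implementation: returns canned rates, no network involved."""

    def __init__(self, rates: dict[str, float]) -> None:
        self._rates = rates

    def rate(self, currency: str) -> float:
        return self._rates[currency]


class PriceConverter:
    """Depends only on the RateSource abstraction, not on a concrete client."""

    def __init__(self, source: RateSource) -> None:
        self._source = source

    def convert(self, amount_usd: float, currency: str) -> float:
        return round(amount_usd * self._source.rate(currency), 2)


# The test states the contract of PriceConverter using the test implementation.
converter = PriceConverter(FixedRateSource({"EUR": 0.5}))
assert converter.convert(10.0, "EUR") == 5.0
```

The test never asserts on which calls were made, only on the result, so it keeps passing if PriceConverter is refactored internally.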

Oleksii Holub • Jul 8 '20

Thanks for the comment.

Many of the issues you describe with tests are actually issues with mocking. Providing a proper test implementation is a better way of resolving that than skipping unit tests altogether.

That's correct. It's related however because mocks are too often necessary to achieve isolation required for proper unit testing. Unfortunately it's not always possible to flatten the hierarchy or use pure-impure segregation principles to avoid it.

Programming on interfaces is good design, it's not there because you want to write mocks, you should always depend on an abstraction instead of an implementation. Adding an interface decreases the complexity for everyone that is not interested in the actual implementation. When you are adding an interface because you want to mock I'd say you messed up somewhere in your design. Why do you have a class specifically tailored for interaction with an outside source without an interface? You do not want software to depend on this implementation - you are writing tightly coupled software this way.

I personally don't agree with this, especially that you should "always depend on an abstraction instead of an implementation". If your use case doesn't envision polymorphism and your abstraction is there "just in case", you've essentially wasted effort. There's nothing wrong with coupling if that's intentional, not all coupling should be avoided just because you can. In fact, most of your interfaces are still coupled to implementations in ways you may not realize until you try to introduce a second implementation.

The goal of unit tests is design of your software, you're writing a contract, as soon as you assume that is for testing your software you misunderstood its purpose. As a contract, you want to ensure that a class behaves as the developer intended it to behave. While developing this ensures less bugs, as you double check that what you write actually does what you expect it to do. This also ensures that other developers that might be working on your software while you are long gone understand what you wanted to achieve, why, and can make changes to your software without breaking its contract.

If the goal of unit tests is to aid in design, then I would argue the name is misleading and rightfully gets people confused. I personally don't believe it helps with design, but if it helps you then by all means. However, if your goal is to ensure that your software works, then maybe you want to re-evaluate your approaches. From a high-level perspective, if your software works correctly according to the functional requirements, there might be a million bugs in your code that ultimately don't matter because they never surface in any way that would impact user experience. Instead, by not relying on internal specifics, you get the freedom to change and refactor your code however you want, as long as it doesn't invalidate the top-level public contract.

Skipping the foundation of the test pyramid will set you up for many low level bugs.

What do you mean by low level bugs? Again, I would argue that if the bug never surfaces to the top level, it was never a bug to begin with.

Dave • Jul 8 '20 • Edited

To add to the thread here - I think I'm somewhere between the two of you.

Yes, SOLID is a good principle to follow, but only developers working alone in their bedroom can stick to it rigidly. There are times when the business needs outweigh the beauty of code.

On the point of interfaces... if I'm sharing some class with another project to interact with, yes, an interface is what gets shared. If I'm only using that class internally, and there's only one example of its behaviour, then an interface is (to me) a waste of time (YAGNI).

Over time, as the project evolves, if I need something similar but not quite the same... asking the IDE to create an interface is a single key combination. Refactoring away from the concrete implementation to reference the interface instead is another key combination. (That is to say, it's not difficult to add an interface if you need one later.)

On the point of unit tests - there's a reason that they're at the bottom of the pyramid - if the foundations of the pyramid are incomplete, you risk the peak toppling over! The art there, is in figuring out what's appropriate test coverage at each level of the pyramid, and how you measure that.

In our corporate case, submitting for a peer review with no unit tests means that you have more work to do. Equally, submitting something that has 100% unit test coverage means you have some things to delete (nothing is more beautiful than deleting things!). The same point works for integration tests.

Our definitions:

A unit test is to test a unit of code. Class level. SRP applies: test the user journeys through the class so that outputs are the expected values. One test case per user journey. Mock out 3rd party dependencies & trust (but verify) that they work as intended.
An integration test is to test N units, operating in conjunction. Same rules for SRP & mocking. An integration test may very well test that two (or more) whole applications work in conjunction with each other. Or it may be two (or more) k8s pods, etc.

I would argue that if the bug never surfaces to the top level, it was never a bug to begin with.

Maybe it was simply a bug in some edge case that you didn't consider, and it exposes some sensitive information to an attacker? While it doesn't crash your application directly, it's definitely something that needs fixing... and since it's in an edge case that wasn't considered before, you wouldn't have seen it until a user found it.

Oleksii Holub • Jul 8 '20

Just out of curiosity:

Mock out 3rd party dependencies & trust (but verify) that they work as intended.

How do you both trust and verify?

Maybe it was simply a bug in some edge case that you didn't consider, and it exposes some sensitive information to an attacker? While it doesn't crash your application directly, it's definitely something that needs fixing... and since it's in an edge case that wasn't considered before, you wouldn't have seen it until a user found it.

But if you considered it when writing a unit test, couldn't you consider it when writing a high-level test? Or conversely, if you didn't consider it when writing high-level test, you could very well also not consider it when writing unit tests.

Dave • Jul 8 '20

How do you both trust and verify?

I saw your location & presumed you'd be familiar with the concept (a friend of mine moved from Kyiv to UK, it was him that I first heard the saying from). :)

In our case, at the unit test level, we simply trust that 3rd party dependencies function as their authors intend. There is a small review process that we go through before deciding to include a 3rd party dependency. Basically - if Apache Commons, go ahead, if it's some obscure Docker image on Docker Hub (non-certified) and only 4 other people have downloaded it... err... let's not touch that until it's more popular.

Then, at the integration test level (still in development), tests can be written to use the transport mechanism, or the file system etc (verifying that, for example, Commons IO or Gson dependencies actually do what we expect).

Later still, QA have tests (automated) that will inspect the model being transmitted across the "wire" etc - and they will flag up if we're exposing internal identifiers etc that another service (or the general public) don't explicitly need (all specified as part of the service design).

But if you considered it when writing a unit test, couldn't you consider it when writing a high-level test? Or conversely, if you didn't consider it when writing high-level test, you could very well also not consider it when writing unit tests.

Exactly.

I think my take there, is that it depends where you will fix the bug (with the benefit of hindsight), as to where you should be testing for it. If the bug is a simple "change this unit of code to fix it" then a unit test should be catching it (and this is one of the few use cases where TDD makes sense to me - I know the bug is in this code, so I'll write a test for it first, make the build fail, and then fix it).
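
A minimal Python sketch of that bug-first workflow, assuming a hypothetical average function that used to crash on an empty list. The function, the bug, and the chosen fix are invented for illustration.

```python
def average(values: list[float]) -> float:
    """Hypothetical unit that used to raise ZeroDivisionError on an empty list."""
    if not values:  # the fix; without this guard the test below fails
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_is_zero() -> None:
    # Written first to reproduce the reported bug, then kept as a regression test.
    assert average([]) == 0.0


test_average_of_empty_list_is_zero()
```

The test is written and seen to fail before the guard is added, which confirms that it really exercises the faulty path.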

However, if the bug is more subtle, and means that two (or more) units are working in unison to produce the problem further up (eg, A and B must be true, and both are in different units of code), then I'd write an integration test, and probably fix the issue(s) in the discrete unit(s). Then spend some time worrying about side effects & how we can potentially mitigate them.

My point being, unit tests are a necessary evil... but so are integration tests, QA tests (including manual testing) and in the majority of our cases, UAT too!

I agree that some companies/books/public speakers overly promote unit tests, but we certainly shouldn't be ignoring them entirely.

Oleksii Holub • Jul 8 '20

Makes sense, thanks.

I was actually familiar with the concept but was curious what exactly you meant by it ;)

Boyen86 • Jul 9 '20 • Edited

"I personally don't agree with this, especially that you should "always depend on an abstraction instead of an implementation". If your use case doesn't envision polymorphism and your abstraction is there "just in case", you've essentially wasted effort. There's nothing wrong with coupling if that's intentional, not all coupling should be avoided just because you can. In fact, most of your interfaces are still coupled to implementations in ways you may not realize until you try to introduce a second implementation."

I'd say your design process is just completely different than mine. When you are designing a class, you don't care about the implementation that you are communicating with. You don't create an interface because you want to introduce polymorphism, you create the interface because all you care about is "what" needs to happen, not "how" it needs to happen. Splitting what from how is absolutely essential when you want to create SOLID software. When you have proper separation of concerns, and your classes are single responsibility all you should care about is this interface, and thus, at that point in time, all you are writing is an interface.

What you propose is backwards: you already have an implementation and then create an interface to start mocking. Honestly, it's not surprising that you dislike these tools for software design when you are following this path.

"What do you mean by low level bugs? Again, I would argue that if the bug never surfaces to the top level, it was never a bug to begin with."

High level tests do not test low level intricacies of a class. It can be something as simple as multiple enumerations because you forgot to do a .ToList() (or whatever) on a database query, causing you to perform the same query over and over again. Good luck finding that out on a high level integration test. You need to ensure in your design that what you have designed is actually doing what you expect it to do.
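
A rough Python analogue of the multiple-enumeration problem, as a sketch only. LazyQuery is a hypothetical stand-in for a deferred database query, and list(...) plays the role of .ToList().

```python
class LazyQuery:
    """Hypothetical stand-in for a deferred database query: each iteration re-runs it."""

    def __init__(self, rows: list[int]) -> None:
        self._rows = rows
        self.executions = 0  # how many times the "database" was hit

    def __iter__(self):
        self.executions += 1
        return iter(self._rows)


def summarize(query) -> tuple[int, int]:
    # Bug: iterating the lazy query twice executes it twice.
    return sum(query), max(query)


def summarize_once(query) -> tuple[int, int]:
    rows = list(query)  # materialize once, the Python analogue of .ToList()
    return sum(rows), max(rows)


q1 = LazyQuery([3, 1, 2])
assert summarize(q1) == (6, 3) and q1.executions == 2

q2 = LazyQuery([3, 1, 2])
assert summarize_once(q2) == (6, 3) and q2.executions == 1
```

Both versions return the same values, so a test that only compares results cannot tell them apart; the difference shows up only when the test can observe how often the query ran.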

While you are writing these high-level tests, if you do want to go over all these low-level intricacies, you are holding a model of many classes (units) in your head. We write small units because complexity increases when the unit size increases. As such, the complexity of a test increases when you increase the scope of the test. So you are either

Only testing the user functionality
Writing really complicated tests to check all edge cases and functionalities of all lower level units

Or you just write your unit test while you are designing your software.

"From a high-level perspective, if your software works correctly according to the functional requirements, there might be a million bugs in your code that ultimately don't matter because they never surface in any way that would impact user experience. Instead, by not relying on internal specifics, you get the freedom to change and refactor your code however you want, as long as it doesn't invalidate the top-level public contract."

This is already an advantage of following SOLID standards, writing small units with a single purpose, that is easily exchangeable and reusable. Also, since you were programming against an interface to start with, the implementations don't matter.

Can I also mention that a million bugs in your software that "supposedly" don't surface because your integration tests don't cover them can cost your organization a serious amount of money? I worked in banking and offshore before my current job; downtime of half a day can easily cost you 100k, just because a developer didn't want to design the software properly.

Dave • Jul 9 '20

Splitting what from how is absolutely essential when you want to create SOLID software. When you have proper separation of concerns, and your classes are single responsibility all you should care about is this interface, and thus, at that point in time, all you are writing is an interface.

In a perfect world, you're right. My employer certainly doesn't exist in a perfect world though.

What you propose is backwards: you already have an implementation and then create an interface to start mocking.

I presume you're a TDD advocate. I mostly write the implementation first then test it, but I wouldn't be creating an interface just to add mocks in tests. I also wouldn't be creating an interface if I only have one concrete implementation - since that implementation effectively works as the interface, until I need to abstract it in some way.

High level tests do not test low level intricacies of a class.

This is a rather large overstatement. Don't they? Why not? Is it impossible to write a high level test that invokes the low level intricacies? Do all of those low level intricacies need to be tested? I'm currently conducting interviews, and rejected one candidate in part because they were writing tests for getters/setters.

downtime of half a day can easily cost you 100k

That's pretty cheap based on the industries I've worked in. In some regulated markets, the fine issued by the government authority for simply having to failover to the DR datacentre exceeds 100k, let alone other ancillary costs like loss of income.

At the end of the day, there has to be a balance. I personally think the title of this article was a little clickbaity, and the author was trying to stimulate a discussion by presenting a pretty biased argument. Nothing wrong with that, but the way I read it, the author doesn't entirely believe everything that they've written (as evidenced by my comment discussion with them).

Boyen86 • Jul 9 '20 • Edited

"In a perfect world, you're right. My employer certainly doesn't exist in a perfect world though."

I'm not sure how this is relevant? We are just discussing how we are creating software. It's not as if it takes longer to create/maintain.

"I presume you're a TDD advocate. I mostly write the implementation first then test it, but I wouldn't be creating an interface just to add mocks in tests. I also wouldn't be creating an interface if I only have one concrete implementation - since that implementation effectively works as the interface, until I need to abstract it in some way."

I'm just wondering how the design process works. When you are writing class A and B, and A relies on B, but B is not yet written and you start with writing A, surely you'll program against the interface of B instead of its actual implementation? Anyway, that's how I do it. I will have an interface before an implementation 99.9999% of the time. I do not feel like a well defined interface is clogging up the code, for everyone that's not interested in the implementation it is an easy overview of the API.

And... even though I'd say it is irrelevant I'm neither an opponent nor advocate of TDD. In what order you write your tests or classes is for me an implementation detail. The interface here, however, is that your tests have a purpose in the design and maintenance of your code and that part is important.

"This is a rather large overstatement. Don't they? Why not? Is it impossible to write a high level test that invokes the low level intricacies? Do all of those low level intricacies need to be tested? I'm currently conducting interviews, and rejected one candidate in part because they were writing tests for getters/setters."

I do believe I gave some options in my post, and why you shouldn't be testing logic of low level classes on a high level (something with complexity)

Your tests should be SOLID just like your code base. As soon as you need to go over multiple aspects you are increasing the complexity of your code (test) and with that the readability. Just keep it simple is all that I'm advocating here.

You write tests for logic, if your getters and setters have logic... for whatever reason, I would surely want to test my logic while designing my class. If you are testing the framework of getters and setters I agree with you, but that honestly has nothing to do with the intricacies (=logic) of the class that I am referring to.

Dave • Jul 9 '20

We are just discussing how we are creating software. It's not as if it takes longer to create/maintain.

We're deliberately staying away from languages, and maybe it's just my approach, but rigidly sticking to SOLID principles (or any principles for that matter) certainly does take longer than me writing code and then tidying it up to obey principles whenever that's needed.

Don't get me wrong, I follow SOLID closely, right out of the gate, but just not strictly.

When you are writing class A and B, and A relies on B, but B is not yet written and you start with writing A, surely you'll program against the interface of B instead of its actual implementation?

That approach is counter-intuitive, at least to me. If A depends on B, but B is not written yet... I'd be starting with writing B. Only in the case that B is being written by someone else on the team would we agree an interface up front so both can work independently.

Just keep it simple is all that I'm advocating here.

I'm much the same, hence why I originally posted here that I think I'm somewhere between you & the original author.

You write tests for logic, if your getters and setters have logic...

In that case, I'd submit that they aren't getters and setters, and have side-effects that violate SOLID principles.

Boyen86 • Jul 9 '20

"In that case, I'd submit that they aren't getters and setters, and have side-effects that violate SOLID principles."

Hence my ".... for whatever reason", there's more to logic than side effects. For example, a myriad of if statements or whatever. All things that don't belong in a getter or setter, but if you insist that it should be in the getter or setter, at least write a contract (=test) on how you intend the getter or setter should behave.

And that's the core of this whole discussion right? Is it necessary to test logic that isn't directly visible to the outside world?

You could say that if it isn't directly visible then it doesn't need to be tested. I'd say that if the code is there, it is there for a reason, and if it is there for a reason it should be tested. If the code is not there for a reason, get rid of it. And all these questions would've been circumvented if your tests were written during the design of a class.

You can potentially test this in a big integration test, but, why would you? I'd say that's a violation of KISS principles because the coupling between a class and its contract is lost and I, as a developer working on your code need to jump through hoops just to find out how you intended your piece of software to work.

"We're deliberately staying away from languages, and maybe it's just my approach, but rigidly sticking to SOLID principles (or any principles for that matter) certainly does take longer than me writing code and then tidying it up to obey principles whenever that's needed."

Sure, perhaps, I don't think I'm necessarily faster than a two-step approach. And everything perfect in one go is utopic, sometimes it takes refactoring to get things right.

I do feel like it is our responsibility as software engineers to convince our employer that a standardized approach is beneficial, and also that, as an expert in the field, it is good to say no. I hope that writing standardized software doesn't only occur in a perfect world.

Dave • Jul 9 '20

I do feel like it is our responsibility as software engineers to convince our employer that a standardized approach is beneficial, and also that, as an expert in the field, it is good to say no.

On this, we both agree. I also know that I've been in situations in the past where arguing for standardisation has fallen on deaf ears.

There's a multitude of reasons why others in the business will try to get us to cut corners, to deliver slightly faster etc. Sometimes we can say no, sometimes we're overruled.

Hence my belief that 100% standardised code, does indeed only exist in a perfect world. Maybe 80% or so is a more realistic aim.

leob • Jul 8 '20 • Edited

I agree to an extent ... and as always, "it depends".

If your system/app is heavy on business logic, then unit tests (where you test pure functions, no side effects) may be useful.

It may then also pay off to rigorously separate the business logic ("pure functions") from the code that's exerting side effects, for instance using a 'functional core, imperative shell' architecture - unit tests then become easy to implement, without mocking and whatever (especially if you favor a more FP heavy style for that part).
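
A minimal Python sketch of the 'functional core, imperative shell' split described above. The discount rule and all names are hypothetical and chosen only to keep the example small.

```python
import io
from typing import TextIO


def apply_discount(total: float, loyalty_years: int) -> float:
    """Functional core: pure business rule (hypothetical), no I/O."""
    rate = min(loyalty_years * 0.05, 0.25)  # 5% per year, capped at 25%
    return round(total * (1 - rate), 2)


def run(inp: TextIO, out: TextIO) -> None:
    """Imperative shell: reads input, calls the core, writes output."""
    total, years = inp.readline().split()
    out.write(f"{apply_discount(float(total), int(years)):.2f}\n")


# The core is unit-tested with plain values, no mocks.
assert apply_discount(100.0, 2) == 90.0
assert apply_discount(100.0, 10) == 75.0

# The thin shell can be covered by a small integration-style test.
out = io.StringIO()
run(io.StringIO("200 1\n"), out)
assert out.getvalue() == "190.00\n"
```

Because the rule lives in a pure function, the edge cases (such as the cap) are tested directly, while the shell needs only a handful of tests.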

However, in many apps/systems the business logic is pretty trivial, and the side effects are where it's at - in that case I think you should go for integration tests.

My experience with the "highest level" (e2e) tests is that they're often slow and brittle, repetitive, and tend to be not very precise and specific about WHICH piece of functionality they're testing. Could be me but I've had a lot of frustration with e2e tests.

So I'm not a huge fan of e2e, but I am of integration tests. Bottom line (for me at least): I think that in MANY cases integration tests are the sweet spot.

Oleksii Holub • Jul 8 '20

I think we're more in agreement than it seems actually.

I definitely agree with testing business logic directly and you can mix those tests in with the rest of your functional suite. In the example (2nd last part of the article) I actually show what you described, i.e. flattening the hierarchy by separating pure from impure concerns and avoiding mocks. Unfortunately, that's also not always possible, but that's another story.

I'm also not advocating to "always write e2e tests" (the article highlights this) but instead to aim as high as you can while keeping the drawbacks at an acceptable level. As you pointed out, that sweet spot can often be somewhere in the middle of the spectrum.

leob • Jul 8 '20

You are right, we agree more than we disagree :-)

Mathew Onipe • Jul 8 '20

Thank you for this. I recently wrote a web server in golang that handles auth requests from the dovecot checkpasswd plugin. I started out writing only unit tests, but I began to question myself as the complexity of the tests and code grew due to unnecessary mocks and dependency injection. I ended up writing integration tests, using docker to spin up a container with the dependencies I needed for my tests. I then automated this via bash scripts to make it runnable on most CI tools.

This paid off and the value gain was immediately obvious.
I'm normally an SRE, so I can force docker and bash to do anything. This was a big advantage for me. I suspect some Devs might have trouble navigating docker etc to create integration environment.

But yes, I 100% agree with you.

Oleksii Holub • Jul 8 '20

For sure, I know too many devs who avoid anything that sounds remotely devops-y because it's "not their job".

Nicholas Finch • Jul 18 '20

I'm like 10 days late. But uh "further perpetuated" is what you're looking for. Not "further perpetrated"

And for the fun of it and my inane pickiness
Perpetrated: carry out or commit (a harmful, illegal, or immoral action).
Perpetuated: make (something, typically an undesirable situation or an unfounded belief) continue indefinitely.

Granted the two definitely sound similar. And in hindsight IDEs are perpetrators of perpetuating bad practices.

That's my ted talk thanks for listening.

Oleksii Holub • Jul 18 '20

Thanks. As a non-native speaker I'm always confused between the two.

Rob Waller • Jul 9 '20

An interesting article, you clearly know the topic of testing very well, more so than the great majority of developers.

I wouldn't be so sure, though, that the principles of unit testing are well understood in our industry. In my experience most organisations have a very haphazard approach to testing, and this makes testing more difficult than it should be and seemingly of less value.

I think the main point is producing well designed code and proving it works is hard, and requires a lot of work. This is regardless of your testing strategy, which in my opinion needs to contain a good mix of low level and high level tests.

Žarko Đurić • Jul 7 '20

Great post! Most of what you have said sounds reasonable, but I would say that business logic still benefits from unit tests and should be 100% covered by them.

Michael Mroz • Jul 8 '20

You mention in a comment that these problems are largely with mocks and that mocks are often necessary.

That's where this all falls down. Mocks are only as necessary as your application design makes them. I can't remember the last time I had to mock something in a codebase I've had reasonable influence over.

If you embrace referential transparency and push side effects to modules at the edge of your application and use interfaces to test them, you just don't need mocks. The only thing you need then is a few integration tests to ensure that the effectful code at the edges interacts with the real world correctly.

Oleksii Holub • Jul 8 '20

This was shown in the article. Unfortunately, separating pure from impure code will only get you so far, and you will still have to mock quite a few things. For example, think of a case where you need to query a small portion of a very big dataset, then perform some transformation on it, then use the result to query another portion, transform that one, and finally post the resulting data to some web service. Your pure code is interleaved with impure code and your best option is to test those parts separately, but that breaks the flow in which the code is executed, and you won't have confidence that you tested the pure parts exhaustively or that the side effects are executed correctly. This approach works out for simple transformations that follow the [data in] -> [data out] principle, but unfortunately that's not always the case.

Michael Mroz • Jul 9 '20

Wow. Look, I don't doubt that this has been true for you, but you can't speak in absolutes like that and not expect opposition.

You're speaking from an experience informed by the codebases you've worked in. I assure you that there are other codebases out there for which your statements are categorically untrue.

Oisín • Jul 11 '20 • Edited

Great post. Although I do find test-driven development helpful as a guiding technique (when it seems appropriate to use TDD -- sometimes the cost in mocking/etc is simply too much), I tend not to do it in a way that requires excessive modularity and (mentally) costly abstractions that don't provide any benefit beyond testing. I try not to introduce abstractions and configurability until there is a concrete need for them.
As an indicator of code correctness, time and experience have shown me that unit tests are not great. They tend to only prove that a certain part of the codebase behaves correctly for a very restricted set of inputs. Often, we add unit tests only after observing an error, and the new test helps us to verify the error exists and, later, that we've solved it. The payoff for adding unit tests in this way is not very high, and given how long it can take to specify unit tests for complex behaviour, it's not always worth that time. The large volume of tests also increases the cost of changing things (although it's fair to argue that it helps us make those changes with more confidence that we didn't break everything).

This is why I'm very skeptical of having fixed code coverage standards within teams. I find that they're often enthusiastically championed by junior developers who overestimate their value.

At the moment, I'm more interested in two tools that don't seem to have gained very widespread traction: property-based tests, and proof systems that allow you to specify invariants that are validated for all possible inputs by the compiler, rather than informally tested for a restricted set of inputs with unit tests.
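
For readers unfamiliar with property-based tests, here is a hand-rolled Python sketch using only the standard library. The function under test (dedupe_sorted) is hypothetical, and a dedicated library such as Hypothesis would add smarter input generation and shrinking of failing cases, which this sketch does not attempt.

```python
import random


def dedupe_sorted(values: list[int]) -> list[int]:
    """Hypothetical unit under test: sorted list of distinct values."""
    return sorted(set(values))


def check_properties(trials: int = 200, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        values = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        result = dedupe_sorted(values)
        # Properties that must hold for every input, not just chosen examples:
        assert all(a < b for a, b in zip(result, result[1:]))  # strictly increasing
        assert set(result) == set(values)                      # same elements
        assert dedupe_sorted(result) == result                 # idempotent


check_properties()
```

This still samples inputs rather than proving the invariants for all of them, so it sits between example-based unit tests and the proof systems mentioned above.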

Luke Harris • Jul 7 '20

Very detailed and well-written article!

amehmeto • Jul 8 '20

TL;DR?

Oleksii Holub • Jul 8 '20

At the end there's a summary

Charles Roth • Jul 9 '20

The takeaway points are good (well, except maybe for #4), and worth the read.

But I would argue that your initial thesis is largely a "straw-man" logical fallacy. You point at ways that people do unit-testing poorly (e.g. excessive mocking), and then conclude that unit-testing is "not worth it". Slavish adherence to a rule, instead of understanding what's behind the rule, will always produce crap. Doesn't mean the rule is wrong.

"Don't sacrifice design for testability"? Snort. I've been writing software since 1970, and IMHO unit-tests are the best damned thing that ever happened to design.

MxL Devs • Jul 8 '20

There's a summary at the bottom.

seankearon • Jul 7 '20

Well said!

Dave Flr • Jul 8 '20

I'll not read it, but I agree, whatever you say