Pato Z

Posted on Nov 8, 2021

The testing genie

#testing #codequality #story #wish

After years of hard work you decide to take a well deserved vacation. You book tickets to a tropical paradise with long quiet beaches, white sand and clear blue water.

On the second day of your vacation, as you're strolling down the beach, you see a glint of metal in the distance. There's something buried in the sand, you think. Maybe it's treasure or an artifact of pure evil... only one way to find out!

So you rush to the bright metal thing and sure enough you find an ancient metal oil lamp. Countless childhood stories prepared us well for this type of situation, you think, and you proceed to polish it with your sandy towel.

In a matter of seconds a creature of pure elemental essence appears in front of you, materializing out of nowhere.

"Howdy!", it says.

"Ehm, hi...", you reply, uncertain, "who...what are you?"

"Oh, I'm...

The testing genie

"The testing genie", you repeat slowly, "what kind of thing is that?!?"

"Ha, I'm the one that will fulfill all your testing wishes, as long as those are three", it replies, matter-of-factly.

"All my testing wishes...", you say out loud absentmindedly, thinking about the infinite choices ahead of you. This is too important, you need some sort of guide in order to ask for the right thing.

The lidless eye

"I got it!" you shout, basking in this epiphany you just had, "You know the testing pyramid? We should use that!"

"Is that the ominous looking pyramid with a lidless eye in the back of the dollar bill?" it asks with the slightest hint of sarcasm.

"What? no! I'm talking about the testing pyramid, you know, unit tests at the bottom, integration on top of that..." you reply patiently.

"Oh, the pyramid, the one that mandates without a shadow of doubt the only way of doing proper testing, regardless of the context. The one that prescribes the right way of doing things, so that we avoid pointless discussions. That pyramid...", it says cracking a sly smile and rubbing its hands, "this is going to be fun!"

For a split second you think you saw a glint of evil in its eyes, but you can't be sure, so you just power through. This is too good to second-guess yourself.

First wish: full coverage

"I want unit tests, but not just that, I want full coverage!" you shout excitedly.

"So be it", it replies as it vanishes in a puff of smoke.

You look down at your hands and what you thought was a lamp is just a rock.

You dismiss this whole episode as a figment of your imagination. Maybe yesterday's cocktail party did pack a punch after all.

A strange gift

The rest of your vacation is pretty much uneventful and a few days later you're back at home, energized and ready to work.

You blow the dust off your laptop, open it and pull the latest changes from your repo.

Much to your surprise there's a suspicious commit that was pushed directly to your main branch.

The commit was authored by a user id you've never seen before, someone called dj-1nn. A quick peek at the commit reveals 60 thousand files added to your testing directory.

Still shocked, you spin up your test driver, set up code coverage and let it run.

Shortly after you get the results. You stare at the screen in disbelief for a couple of minutes, the results show:

60000 scenarios (60000 passed)

========= Coverage summary =========
Statements   : 100%
Branches     : 100%
Functions    : 100%
Lines        : 100%
====================================

Wow, this is amazing! Full coverage and all you had to do was wish for it. If only you had known this years ago.

From now on you resolve to always rub any metal object you find buried in the sand. Who knows, maybe next time you'll find the bat-shaped genie of abstraction.

The price of hubris

Back to the code, you decide that full coverage means bugs are a thing of the past.

How could you possibly have bugs when every single piece of your code is covered by tests?

Emboldened by this protective shield of awesome you decide any other form of testing is obsolete. Why even bother testing locally?

So you code some changes, run the tests, of course they pass, you rockstar you. And then you blindly push changes to production.

A couple of hours later the customer support people are knocking at your door. They don't seem happy and they carry torches and pitchforks.

You open the door in shock, still wondering what could possibly go wrong in this testing utopia.

It turns out they are flooded with bug reports and production is on fire. You promptly don your firefighting attire.

With a bunch of skillfully crafted git reverts you manage to put out the fires.

The aftermath

After you manage to kick all those support folk out of your home, you decide to dig deeper into what's going on.

You pull up that mysterious commit and inspect some tests at random.

You quickly realize what's going on: even though the tests do cover the code, they have absolutely no assertions!

You could change pretty much anything and the tests would still pass. You suspect you could even remove a random bunch of code and the tests would also pass.
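A minimal sketch of what one of those tests might look like, next to the version you were hoping for. Everything here (`apply_discount` and both test names) is hypothetical and only illustrates the idea; it is not taken from the genie's commit.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production code: take `percent` off `price`."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


def test_apply_discount_genie_style() -> None:
    # Runs every line and both branches, so coverage reports 100%,
    # yet nothing is checked: any return value would pass.
    apply_discount(200.0, 25.0)
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        pass


def test_apply_discount_with_assertions() -> None:
    # Same coverage, but a wrong result or a missing error now fails the test.
    assert apply_discount(200.0, 25.0) == 150.0
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Both tests light up the same lines in the coverage report. Only the second one would notice if you flipped the `if` or changed the arithmetic.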

You decide to actually check this. You open a file at random, flip a bunch of ifs, delete some loops and run the tests.

The results speak for themselves:

60000 scenarios (60000 passed)

You can feel your temperature rising. 60k tests, no assertions. How long will it take you to write proper assertions?

You run the numbers in your head, 60k tests, about 3 or 4 assertions per test... there's

Only one possible solution

After a long wait at the airport, a busy flight and a hot and humid taxi you're back at the beach looking left and right for that cursed lamp.

Eventually you manage to find it and rub it angrily.

"Howdy!", the genie says cheerfully, "back so soon?"

"Sixty thousand unit tests, and no assertions?!?!" you shout at the top of your lungs.

Other people look at you shouting at the emptiness in front of you and dismiss you as yet another crazy sand dweller.

"Well...", it replies, "you didn't mention anything about assertions".

"I-want-assertions!", you fall to your knees and repeat sobbing, "I just wanted assertions *sob*".

"So be it", it replies as it vanishes again in a puff of smoke.

By now you know the drill, lamp becomes rock, etc. So you go back to your hotel.

Second wish: all those snapshots

This time you brought your laptop with you, so you hastily sign in and git pull the latest changes.

As expected there's a new dj-1nn commit, this one changes the 60k files in your testing directory and adds 60k files with snapshots for every single scenario.

Apparently the genie went for snapshot testing. Surely those snapshots are the ultimate form of assertion, they check absolutely everything.

Less than 100 ms after you had that thought, you get pinged. Looks like there's a bug in production.

A picture of insanity

"What now??!?!" you scream at no one in particular. You pull the bug report and it looks legit. After a few minutes you can even repro locally.

Full coverage and snapshot assertions for everything, what is going on?!?!?

You have to postpone digging into that conundrum so that you can focus on the bug fix at hand.

You quickly spot the problem and correct it. You're about to commit and decide to run the tests:

60000 scenarios (4 failed, 59996 passed)

You were just fixing a bug but it looks like you broke 4 tests in the process, this is going to be a long day.

Watching the watchmen

You decide to open the four failing tests.

It only takes a cursory glance at them to see that there's no rhyme or reason to those tests, hundreds of lines of mocks, stubs and test doubles, functions being called to produce intermediate values that are used further along the test. A beautiful example of copy/paste in its prime.

At least you can check snapshots and see if you can work your way back to what failed.

You open one such snapshot and your editor for a millisecond considers opening up a hex editor, until it realizes the file is actually a humongous and incomprehensible JSON file.

You are running out of options and out of time so you decide your code is sound. You delete the four failing snapshots and re-create them with the current values. Push to prod and get it over with.

Not everything is broken

A couple of minutes later you get a new bug report. You take a look and all signs point to your last fix. It seems one of those 4 broken tests was actually testing something you shouldn't have broken.

You feel you don't have enough hands to face-palm yourself.

You quickly fix the code, delete and recreate the broken snapshots and push.

Everything is quiet for a while so you decide it's time to look into this mess.

Thunder and lightning

You know there's lots of things going wrong here and frankly you don't know where to start.

On one hand, these snapshots turned out to be a big disappointment. They do assert everything, but in a way that is very hard to process for humans. You are tempted to just re-create them when they fail, and that leads to bugs.

But why does it lead to bugs?

You ponder this for a while and come up with two reasons.

The first one is that whenever you change something a test fails, but it's hard to tell if a failing test is the intended consequence of your change or you just created a bug. Therefore it's hard for you to know if you should fix the code or the test.

In the end you're left with having to wade through hundreds of lines of incomprehensible and copy/pasted code, trying to piece together the functional meaning of each test.

The second reason is equally worrying. The genie produced 60k tests. But the source of truth for those tests was the code itself and not the functional specification.

This means that if the code had bugs before the tests, those bugs are still present in the code, but now we have tests asserting that the bugs remain in the code... forever.
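Here is a rough sketch of that second problem, assuming a hypothetical `order_total` function that ships with a bug and a hand-rolled snapshot helper (no real snapshot library is used; the names are made up for illustration).

```python
import json


def order_total(prices: list[float], tax_rate: float) -> float:
    """Hypothetical code with a bug: tax is applied to the first item only."""
    if not prices:
        return 0.0
    return prices[0] * (1 + tax_rate) + sum(prices[1:])


def record_snapshot(prices: list[float], tax_rate: float) -> str:
    # The "expected" value comes from running the code, not from the spec.
    return json.dumps({"total": order_total(prices, tax_rate)}, sort_keys=True)


def matches_snapshot(snapshot: str, prices: list[float], tax_rate: float) -> bool:
    # A snapshot test passes when the current output equals the recorded one.
    return record_snapshot(prices, tax_rate) == snapshot


# Recorded once by the genie, bug included.
SNAPSHOT = record_snapshot([100.0, 100.0], 0.5)

# The specification says tax applies to the whole order: 200 * 1.5 = 300.
SPEC_TOTAL = 300.0
```

The buggy code matches its own snapshot, so the test stays green, and the day you fix `order_total` to return the specified 300 the snapshot test is the one that fails.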

Thunder cracks in the distance just as you think of that last word.

Withering code

As if this whole mess wasn't enough you'll soon learn that a greater evil lurks beneath the surface.

This situation has been stressful so you decide to indulge in one of the greatest things software development has to offer: refactor.

Just take some so-so code and make it great, patch leaking abstractions, inline coupled functions, extract and compose.

You pick an innocent-looking piece of code and dive right into a cathartic refactor.

It doesn't take long for you to realize this won't end well.

Every time you inline a function, dozens of tests start failing because of a missing function. Moreover you're not super sure if the calling function is actually being tested for those scenarios you just inlined. In other words it's unclear if you should just delete the failing tests or adapt them to test them through the calling function (potentially adding a ton of boilerplate).

Every time you extract a function you wonder if you should be writing specific tests for that function, considering those tests already exist in the calling function. And by the way, what should you do with those tests in the calling function, should you keep them there or just delete them?

Not to mention that changing a function signature requires you to change every single test for that function, potentially adding even more mocking, stubbing and nonsense.

You just realized that your code is impossible to refactor. And you know very well that code that is never refactored will inevitably wither into readonlyness.

An existential crisis

"But, wait, hold on for a moment", you think.

"Why am I even bothering testing functions?"

"Who cares if this function over here returns 1 or true?"

"I only care if that change produces unintended behavior in my product"

"That function..., that function is not my product"

The clouds part and a ray of light shines through your hotel window, bounces on the laptop screen and nearly blinds you.

You realize you've been approaching this from the wrong angle. You should've known that that pyramid was trouble, especially since it kept calling you to Mount Doom.

You run a quick search for "unit test" and realize no one said "unit" meant "function". Yet another facepalm epiphany.

What if "unit" meant "use-case"? Could you just test that?

It would certainly feel more honest with your true intentions than testing every single function just because.
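As a sketch of what "unit means use-case" could look like, assume a hypothetical `checkout` use-case built from two private helpers. The names and the free-shipping rule are invented for the example.

```python
def _subtotal(items: dict[str, tuple[float, int]]) -> float:
    # Internal helper: free to be renamed, inlined or split during a refactor.
    return sum(price * qty for price, qty in items.values())


def _shipping(subtotal: float) -> float:
    # Internal helper: hypothetical rule, orders of 50 or more ship free.
    return 0.0 if subtotal >= 50 else 5.0


def checkout(items: dict[str, tuple[float, int]]) -> float:
    """The use-case: what a customer pays for a cart of (price, quantity) items."""
    subtotal = _subtotal(items)
    return subtotal + _shipping(subtotal)


# The tests only know about the use-case, never about the helpers.
def test_small_order_pays_shipping() -> None:
    assert checkout({"pickles": (4.0, 2)}) == 13.0


def test_large_order_ships_free() -> None:
    assert checkout({"lamp": (60.0, 1)}) == 60.0
```

Inline `_shipping` into `checkout`, or split `_subtotal` in two, and neither test needs to change. The helpers are still covered, just not by name.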

Returning sanity

Well, you've certainly made some progress. By choosing the right "unit" in your unit tests you now can derive some useful rules:

If you add a feature, you add tests, because that's a new use-case and you test use-cases.
If you change a feature, you change tests, because some behavior is changing and it's OK that tests asserting the old behavior should change as well. (Beware that other tests unrelated to your change should not fail nor be "fixed"!)
If you remove a feature, you remove tests, because that use-case no longer exists.
If you refactor, you don't touch the tests, you cannot add, change or remove tests, because you are not changing use-cases or otherwise affecting product behavior.

But what about our problem of how difficult it was to tell test meaning?

The Ultimate Question of Test, the Universe, and Everything

Just by not testing functions but rather use-cases this should be much easier, you decide.

But there's the issue with quality in your tests, all that copy/paste and mocking.

If you could somehow separate the boilerplate from the tests themselves, have a suite of high-level functions that describe actions with business intent, then you could just write functionally-meaningful tests with almost no syntactic noise.
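One way this might look, as a sketch only: a tiny hypothetical `Shop` stands in for the product, and two helpers (`a_shop_with`, `customer_buys`) carry the setup so the tests read like the use-cases they protect. None of these names come from a real library.

```python
from dataclasses import dataclass, field


@dataclass
class Shop:
    """Hypothetical system under test: a tiny in-memory shop."""
    stock: dict[str, int] = field(default_factory=dict)
    orders: list[str] = field(default_factory=list)

    def restock(self, item: str, quantity: int) -> None:
        self.stock[item] = self.stock.get(item, 0) + quantity

    def order(self, item: str) -> bool:
        if self.stock.get(item, 0) == 0:
            return False
        self.stock[item] -= 1
        self.orders.append(item)
        return True


# Boilerplate lives in helpers named after business actions...
def a_shop_with(item: str, quantity: int) -> Shop:
    shop = Shop()
    shop.restock(item, quantity)
    return shop


def customer_buys(shop: Shop, item: str) -> bool:
    return shop.order(item)


# ...so each test states a use-case with almost no syntactic noise.
def test_customer_can_buy_an_item_in_stock() -> None:
    shop = a_shop_with("pickles", 1)
    assert customer_buys(shop, "pickles")
    assert shop.orders == ["pickles"]


def test_customer_cannot_buy_a_sold_out_item() -> None:
    shop = a_shop_with("pickles", 1)
    customer_buys(shop, "pickles")
    assert not customer_buys(shop, "pickles")
```

If the way a shop gets built changes, only `a_shop_with` has to follow; the tests keep saying the same thing.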

You're getting hungry at this point so you decide to ransack the minibar. Luckily the only thing you manage to find is a curious-looking jar of pickles.

A working proxy

In the midst of a brine ecstasy, you start doubting everything. What does code coverage even mean?

If you are not testing functions but use-cases, does that mean code coverage no longer makes sense?

You decide code coverage was never the end goal, what you really wanted was functional coverage. You wanted to make sure every single use-case is fully covered.

The problem with functional coverage is that it's very hard to measure, so code coverage will have to do for now, but only as a proxy for functional coverage.

Last wish: functional, not functions

You have a much clearer picture now, you know exactly what to wish for. This time you'll get it right.

You go to bed tired but resolute.

The next day you wake up early, take a cold shower, carelessly enjoy a continental breakfast and go for a walk.

As you're strolling down the beach, you see a glint of metal in the distance.

So you rush to the bright metal thing and sure enough you find an ancient metal oil lamp.

You are about to polish it smugly when it dawns on you. Some things are too good to be true. Some horses must be looked in the mouth.

You'll have to deal with this problem head on, and it all ends where it started.

You polish the lamp one last time...

"Howdy!", the genie says, "ready for your last wish?"

"I wish..., I wish for you to get back in the lamp", you say, resisting the temptation of the last wish.

"What?!? Nooooooo....", its voice trails off as it vanishes for the last time.

You promptly throw the lamp in the ocean.

A vision of the Apocalypse

You realized that after the third wish the testing genie would've gone free.

You don't dare imagine a world where the genie roams free, wreaking havoc, leaving unmaintainable tests and useless snapshots in its wake.

A world where code ultimately withers into readonlyness, where refactor is a thing of the past and bugs roam free like cheerful kaiju meeting for tea in the ruins of civilization.

A note to a future self

With an apocalyptic crisis averted, you decide human memory is a fragile thing and write a note to a future self:

Dear Me,
It's me, I mean you, but from the past, you know what I mean, right?

Anyway, I just want you to remember the time when you almost destroyed the whole universe. So that you don't forget, I made you the following list:

Test stuff, testing is important.

Code coverage is only as good as the assertions on your tests, you can have great coverage (even full coverage) and still have bugs.

You cannot cover unwritten code, beware of bugs by omission.

Tests are not crash test dummies, they are not meant to break constantly, fixing tests is dangerous, don't make a habit of fixing tests. If you develop muscle memory and start fixing tests without even thinking about it you, I mean we, are doomed.

Binary assertions such as snapshots are deceptively attractive because you, lazy you, don't have to write them. If you decide to use them make sure you (and everyone else on your team) can understand them! Otherwise you'll end up just re-creating them every time, defeating the purpose of testing.

No one cares about that function you just wrote. I'm sorry but it's true, and it's better that you hear it from you, ehm, I mean me. Your teammates don't care, your organization doesn't care, your users certainly don't care! And neither should you. This function is a means to an end, a tiny cog in a bigger machine. We want to make sure that machine keeps working even if we replace the cog.

Refactor is the most wonderful thing to do in life. Testing functions in excruciating detail prevents refactors and snuffs out all the fun in coding. I bet you can cover just as much code by testing specifications instead.

I guess what I'm trying to say is test specifications, not implementations!

If everything else fails, remember these rules, get them tattooed somewhere in the most Memento of ways:

If you add a feature, you add tests, because that's a new use-case and you test use-cases.

If you change a feature, you change tests, because some behavior is changing and it's OK that tests asserting the old behavior should change as well. (Beware that other tests unrelated to your change should not fail nor be "fixed"!)

If you remove a feature, you remove tests, because that use-case no longer exists.

If you refactor, you don't touch the tests, you cannot add, change or remove tests, because you are not changing use-cases or otherwise affecting product behavior.

The number of tests you need to change every time you commit is a good measure of the health of your testing strategy. This number should be very low (unless you are constantly changing the rules of your business in true startup style).

Oh, and go easy on those pistachios, those things will kill you, you know.

With love,
Me, that is, you.
