Chaos isn't usually confidence-inspiring. But controlled chaos, aka randomness, is a powerful tool for navigating the unknown.
Here's the problem: we're replacing component A with a functionally equivalent component B. Usually because B is better implemented: faster, easier to maintain, etc. Assuming a shared interface, B should be a drop-in replacement for A that behaves exactly the same.
For simple functionality, it's not hard to validate this, whether through unit tests or even by analyzing the code. A trivial example: `a && b || c` is equivalent to `c || a && b`; we don't need to get too fancy validating that change. Well, unless the evaluation short-circuits and the expressions have side effects… 😰
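To make that caveat concrete, here's a tiny, contrived Python sketch (my own illustration, not from the project) where both orderings return the same value but trigger different side effects:

```python
def a():
    print("a evaluated")
    return False

def b():
    print("b evaluated")
    return True

def c():
    print("c evaluated")
    return True

# Same boolean result either way...
print((a() and b()) or c())  # runs a(), short-circuits past b(), then runs c()
print(c() or (a() and b()))  # runs c() only; a() and b() never execute
```

Both lines print `True`, but the side effects differ.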
My use case isn't so simple, though: I'm reimplementing scikit's reconstruction algorithm using the fast-hybrid algorithm.
I'm seeing >15x speed-ups. But speed is unhelpful if it comes with errors (well, usually). And the algorithm is pretty tricky. I tested some examples by hand and it passed… but I also discovered & fixed issues along the way.
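For context, the reference I'm comparing against is scikit-image's `reconstruction`; a minimal sketch of a call on synthetic data (shapes and values here are just illustrative):

```python
import numpy as np
from skimage.morphology import reconstruction

rng = np.random.default_rng(42)
mask = rng.random((256, 256))          # the image to reconstruct under
seed = np.clip(mask - 0.2, 0.0, None)  # for dilation, seed must be <= mask everywhere

# Grayscale morphological reconstruction by dilation: the (slow) reference result.
expected = reconstruction(seed, mask, method="dilation")
```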
I want to make sure my implementation is 100% correct: partly because that's how I roll, but also because I want to give my future reviewers evidence of correctness. In this case, correctness is defined as behaving the same as scikit, on the assumption that scikit is correct. (It's just slow.)
Formally speaking, B is functionally equivalent to A if it produces the same outputs given the same inputs.
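In code, that definition boils down to a simple predicate; a generic sketch (the names `impl_a` and `impl_b` are mine, not anything from the repo):

```python
import numpy as np

def functionally_equivalent(impl_a, impl_b, inputs):
    """True if impl_b produces the same output as impl_a on every given input."""
    return all(np.array_equal(impl_a(x), impl_b(x)) for x in inputs)
```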
I'm a bit lucky in that the implementation is purely functional: there are no side effects. Given some image, do some processing, and return the processed result.
But I'm also unlucky in that I don't have much intuition for how to test this domain (grayscale morphology & image reconstruction).
Here's where the chaos part comes in. Monkey testing applies "chaos" by simulating random input typed in by proverbial monkeys. After infinite iterations, we either get the works of Shakespeare or 100% test coverage of the implementation.
Of course, in practice we can't run the test infinitely many times. Then again, we're building confidence, not issuing a proof. The overall confidence is predicated on four factors:
1. analysis/belief that the code is a faithful implementation of the algorithm
2. passing existing unit tests
3. manual verification of interesting, real-world examples
4. automated monkey testing to discover edge cases
Here's how I applied monkey testing to validate that my implementation matches expectations: deepcell-imaging#127 Add chaos-monkey comparison vs scikit
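In spirit, the check is a loop like the sketch below: generate random seed/mask pairs and require an exact match against scikit. (`fast_hybrid_reconstruct` is a placeholder name here, not the repo's actual API.)

```python
import numpy as np
from skimage.morphology import reconstruction

# Placeholder standing in for the fast-hybrid reimplementation under test.
def fast_hybrid_reconstruct(seed, mask):
    return reconstruction(seed, mask, method="dilation")

def monkey_test(iterations=100, shape=(128, 128)):
    rng = np.random.default_rng(0)
    for i in range(iterations):
        # Random "monkey" input: an arbitrary mask, with the seed forced <= mask.
        mask = rng.random(shape)
        seed = np.minimum(rng.random(shape), mask)

        expected = reconstruction(seed, mask, method="dilation")
        actual = fast_hybrid_reconstruct(seed, mask)
        np.testing.assert_array_equal(actual, expected,
                                      err_msg=f"mismatch on iteration {i}")

monkey_test()
```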
Now, I already have 2 follow-up issues to make sure my monkeys are monkeying around on a fuller test space:
#128 Parameterize footprint in scikit comparison
#129 Parameterize random seed in scikit comparison
After all, if those monkeys only type on the left side of the keyboard, what kind of Shakespeare is that?
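For what it's worth, here's a rough sketch of what that parameterization could look like with pytest (the footprints and seeds are made-up values, not the actual plan in those issues, and `fast_hybrid_reconstruct` is again a placeholder):

```python
import numpy as np
import pytest
from skimage.morphology import diamond, disk, reconstruction, square

# Placeholder standing in for the fast-hybrid reimplementation under test.
def fast_hybrid_reconstruct(seed, mask, footprint=None):
    return reconstruction(seed, mask, method="dilation", footprint=footprint)

@pytest.mark.parametrize("footprint", [None, square(3), disk(1), diamond(2)])
@pytest.mark.parametrize("random_seed", [0, 1, 42, 2023])
def test_matches_scikit(footprint, random_seed):
    rng = np.random.default_rng(random_seed)
    mask = rng.random((64, 64))
    seed = np.minimum(rng.random((64, 64)), mask)

    expected = reconstruction(seed, mask, method="dilation", footprint=footprint)
    actual = fast_hybrid_reconstruct(seed, mask, footprint=footprint)
    np.testing.assert_array_equal(actual, expected)
```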