A few days ago I tried to solve the next problem with TDD and found it really hard.
I could not reach the solution through a series of small incremental test cases.
This is the problem:
flatten an array of arbitrarily nested arrays of integers into a flat array of integers.
e.g. [[1,2,[3]],4] > [1,2,3,4].
of course without using language implementation like flatten
in ruby
Can someone try to TDD this problem and comment here her/his experience?
Clarification
The process that I follow.
1 Write a set of tests (input>expectedoutput)
2 Pick the simplest test case to solve. (RED)(Write a failing spec)
3 Write the most naive code change that solves the current set of tests. (GREEN)
4 Check if DRY is needed (Refactor)
5 Go back to 2
My expectation is that at some point I reach the solution for the general case but following these rules I'm not able to reach the solution.
At some point there is no naive change to solve the tests and I need to rewrite the whole solution.
I found two solutions to this problem but none of them was following this process.
https://gist.github.com/delbetu/ff67be230f83542ac1924e766b591803
Update
I found a solution for this problem check this out https://dev.to/delbetu/atddexamplee4o
Discussion
I think the tricky thing about this particular thought experiment is the 'arbitrary' part  setting up a proper test generally requires being specific about the test case and having a tangible case to test. This problem subverts that by stating that the arrays (in this context) can be infinitely deep.
One approach that might actually help is to split this problem into two layers. The first layer is identifying two arrays that need to be merged (thus one is contained within the other). The second layer is actually combining the two arrays. That may seem a bit arbitrary, but splitting these two concepts will actually help us test.
Let's get more specific. Let's say that
ResponsibilityA
is simply determining that (if our scope is an array): we've found an array within it that needs to be decomposed. For the sake of tangible examples, this could befoo = [1, 2, [bar]]
andResponsibilityA(foo)
would determine thatfoo
needs[bar]
decomposed at position 2. That's it. It doesn't care whatbar
itself is or what it contains.Now, with that basis we can define
ResponsibilityB
as the decomposer. Its job is also fairly simple  do a single denesting. Again for tangibility, if we havebaz = [1, 2, 3]
and we callResponsibility(baz)
, it ought to return1, 2, 3
. This may seem like a small distinction since all we did was remove the square brackets, but the difference here is that it turnedbaz
from a single object into three objects. If this seems hard to understand from a practical standpoint, that's because it is 🙂 most languages don't support a function that can return an arbitrary number of return objects, so you can either consider this as returning an enumerable / enumeration, or you can abstract this entire concept to Javascript's style spread operator:...baz
The last step to making these two machines work together is setting up the structure of
ResponsibilityA
a little more. Let's say that asResponsibilityA
crawls through an array, once it finds a subarray that needs to be decomposed, it actually calls out toResponsibilityB
to decompose that subarray in realtime and in place, then reassesses that index before moving forward. To put that into the visual, let's sayfoo = [9, 9, [foo, bar, [x, y, z]], 9]
.ResponsibilityA(foo)
would then begin crawling:9
which is not an array, move forward9
which is not an array, move forward[foo, bar, [x, y, z]]
which is an array:ResponsibilityB
and pass[foo, bar, [x, y, z]]
ResponsibilityB
returnsfoo, bar, [x, y, z]
ResponsibilityA
replaces element at position 2 with the returned valueResponsibilityA
augments loop or iterations to reassess element at position 2 nextfoo
is now[9, 9, foo, bar, [x, y, z], 9]
foo
which is not an array, move forwardbar
which is not an array, move forward[x, y, z]
which is an array:ResponsibilityB
and pass[x, y, z]
ResponsibilityB
returnsx, y, z
ResponsibilityA
replaces element at position 4 with the returned valueResponsibilityA
augments loop or iterations to reassess element at position 4 nextfoo
is now[9, 9, foo, bar, x, y, z, 9]
x
which is not an array, move forwardy
which is not an array, move forwardz
which is not an array, move forward9
which is not an array move forward[9, 9, foo, bar, x, y, z, 9]
)So what's the point of splitting these two concepts? Each one is individually testable! As I mentioned above, the "arbitrary" bit of the problem statement prevents us from writing a full test for the problem since you can't write a test with arbitrary data. That would be infinite. What we can do is write a test for each of the individual responsibilities above to make sure they work independently, then just a simple test case to prove that they work together (the above walkthrough is a perfect 'simple test case to prove that they work together', so we'll use it below).
(For the math geeks out there, this process is effectively similar to constructing a math proof on the basis of induction 🤓)
So let's do it.
ResponsibilityA
is a singlelayer deep concern, meaning that if you callResponsibilityA([1, 2, [foo, [bar]])
, it will recognize that[foo, [bar]]
needs to be decomposed at position 2 but it will not dig further into that array to also determine that [bar] needs to be decomposed too. Cool? Let's write a test then. In order to cover the case of a nested array being found at the first position, middle, and last position, let's just wrap this into a single test of inputs and outputs:If I give
ResponsibilityA
an argument of[[1, 2], [[a], b], [#, $]]
, it should identify that array decompositions need to occur at positions 0, 1, and 2. How do we test that? We mockResponsibilityB
and expect it to receive a call with argument[1, 2]
. Let's also mock it to returnfoo
instead of1, 2
so we can prove that the "replace inplace" bit is working too. So overall, we will expectResponsibilityB
to:[1, 2]
and mock returnfoo
[[a], b]
and mocked return[bar], baz
[bar]
and mocked returnbar
[#, $]
, and we'll mock it to returnqux
.If we run that test, we can rely on the expectations of
ResponsibilityB
receiving those four calls with those three specific arguments, and the mock returns should guarantee thatResponsibilityA
ultimately returns a final product of[foo, bar, baz, qux]
. That's a perfectly valid test ofResponsibilityA
that proves it isResponsibilityB
and passing the subarrays as it encounters them (indicated by receiving call #3 after getting response #2 [instead of jumping straight from #2 to #4])ResponsibilityB
gives backResponsibilityB
(since #2 above sends back an array in the first position) after it does the replaceinplaceThat's awesome. That test proves that
ResponsibilityA
does exactly what it's intended to do. Now we just need to test thatResponsibilityB
actually does what it's supposed to.For the sake of brevity, let's just say that if I pass
ResponsibilityB
[1, 2, 3]
it should return1, 2, 3
and if I pass it[1, [2], 3]
it should return1, [2], 3
(just takes off the outer brackets).Since I've proven that
ResponsibilityA
correctly identifies and replaces subarrays in place (then reassesses the same index) but doesn't actually determine how to denest the array and I've proven thatResponsibilityB
can denest a single array, putting those together does indeed prove that denesting at an arbitrary length and depth is achieved. If that's hard to understand, that's totally okay! Induction is a really tough concept to wrap your head around. We're effectively proving that each responsibility works on its own arbitrarily and therefore putting them together will work arbitrarily too.Technically we ought to also have at least one test that tests the two things actually working together, so without mocking what
ResponsibilityB
should expect and mocking what it will return, we can just write a simple test that says:If I pass
[[1, 2], [[a], b], [#, $]]
toResponsibilityA
, it should return[9, 9, foo, bar, x, y, z, 9]
. Same example from above but the idea is the same prove both responsibilities individually then use a simple test to prove that they work together and you've proven their potential.I hope that makes sense.. sorry it turned into an essay!!

JonSullivanDev
Also thanks for the inspiration; might turn this into a full blog post
Well Jon, this is more a proof than an implementation. While technically correct, this approach will however fail if you have an infinite (i.e. unknown in advance) array, or a stream.
What you call "responsibilities" I call "the different cases when inspecting next element". My first solution didn't yield a lazy solution like I wanted, however; I should try again and see if the same subdivision that you envision emerges from that approach or not.
The length and depth of your answer, however, begs a question: what really is a test? is the goal of TDD to prove a piece of code "correct", or just that "it works"?
You do prove that your approach is "correct" (for finite length arrays); but is this a "unit test"? Would you call this TDD?
Thanks Jon for taking the time for thinking of such a detailed solution, but the core of my question is TDD.
I wonder if this problem can be solved in a series of micro redgreenrefactor cycles (30 seconds)
I don't want a solution I want you to try it and heard about your thoughts.
I'm questioning the usage of TDD not the problem.
Well forgive me for being a rather wordy fellow but the tail end of my solution does outline the very specific red/green tests you could write to TDD the problem... I just give a lot of foundational theory basis for why I chose those tests ;)
You can TDD anything given the right mindset :D
@michelemauro I didn't read streams or infinite lists as being part of the parameters of the problem but the same solution could be slightly adjusted (really just in
ResponsibilityB
to handle infinite sequences and/or streams pretty readily.Anyway, all around good conversation guys  cheers 👍🏻
Ok, as soon as I have some free time I'll try to follow your tests and tell you back what was my experience.
thank you!
Reading again all of your answer so you are describing an algorithm which possibly solves the problem and you identify two sub problem that can be tested individually,
This is a valid technique for solving a problem but is totally the opposite of tdd
Because in tdd you don’t know the solution in advance, as you resolve every micro test with the minimum amount of code to satisfy the test you discover the algorithm.
So you don’t know the algorithm in advance.
Well, a few tests that come to mind are:
[3] > [3]
[[3]] > [3]
[1, [2, 3]] > [1, 2, 3]
I'm writing them in Java, give me time to reach an interesting level and I'll share it for discussion.
The problem smells a lot like recursion: it's a great exercise in writing only the necessary code, as J.B. Rainsberger preaches in this video at 2:48:
Do watch it all, because it's great.
Yeah the tough part is that while individual tests like
[3] > [3]
[[3]] > [3]
[1, [2, 3]] > [1, 2, 3]
Prove that the algorithm works for those specific test cases, they don't provide a fundamental proof that the algorithm works indefinitely. Under this premise you'd have to write every possible test scenario.. but that's impossible since there are infinite!
Totally agree on the principles of recursion working at play here though! As I mentioned induction in my (length) comment above, induction and recursion are tightly coupled concepts and spot on for this problem!
Firstly, TDD is not writing a set of tests. It's writing a test that fails then writing functionality.
So (like I said earlier), you write a RED test:
You then write the simplest code to solve this problem and GREEN the test above
Then your next test case
and it passes, so next
Next
This one does RED. So we write the code to GREEN it
Both start off green
RED, so expand implementation
GREEN.
Starts off green.
There's not always a naive change but in this case I'm not sure where the non naive code change happens. You need to add a loop pretty early but that's the easy way forward.
You raise a good point when saying that starting with a list of tests is not TDD.
To clarify those tests are not run all together from the beginning.
When I start coding all of them are commented out and during the TDD cycle I just decide which of them uncomment.
Starting with a list of tests is done by J.B.Rainsberger
on his course introduction to TDD
In the other hand maybe restricting to an initial list of test is not a good way to lead to a solution.
I tried writing the test that makes more sense for the current tddcycle and then I reached a solution (updated this in the description).
your solution seems to be a valid one I didn't try it.
Thanks for taking the time to read and solve this problem.
While reading and writing the other comments, something felt missing... then it dawned on me!
@delbetu , are we answering to your question?
I assumed that your question was "I had a hard time coming up with the tests for this problem". And approached the thing writing some tests. That led me to a suboptimal solution (but I can improve it, I have tests now!)
I think @jonsullivandev thought your question meant "I had a hard time solving this problem" and gave you a correct solution, proving it beyond every (well, almost) conceivable test.
Which one was it, @delbetu ?
(not that I mind the interesting discussion that came out of this...)
"I had a hard time coming up with the tests for this problem" > this is partially correct.
I was able to get a set of test cases.
I wasn't able to reach a solution through the application of micro cycles (redgreenrefactor) based on those tests.
The point is "Does TDD work?"
There is only one answer, of course: "It depends".
It depends on what you mean for "work".
TDD has a proven track of "working" in helping produce software that exibits less bugs, that it is closer to the specification encoded by the tests (that may or may not be the desidered one, but that's another story), and that is easier to debug. And that, at least locally, is better designed.
Things get much more blurred when you have:
So, TDD "works"... in every context where it works (sorry). That however covers quite a lot of possible programming problems. There undoubtedly are contexts in which TDD is not a perfect fit or does not work at all, and other tecniques must supplement or substitute it.
I don't see a problem, you want small incremental test cases. What's the simplest case and expand it as nauseam:
Simplest case:
[] > []
An input:
[1] > [1]
Multiple Inputs:
[1,2] > [1,2]
Simple nested array:
[[1,2]] > [1,2]
Nested array and single entry:
[1,[2,3]] > [1,2,3]
Two nested arrays:
[[1,2],[3,4]] > [1,2,3,4]
Two level nested array and single elements:
[[1,2,[3,4,5]],6,7] > [1,2,3,4,5,6,7]
Two, two level nested arrays:
[[1,2,3,[4,5,6]], [7,8,[8,10,11]]] > [1,2,3,4,5,6,7,8,9,10,11]
At which point you can continue to create test cases for different combinations or even better, I would advise Theory or Property testing to generate arbitrary numbers of entries, depths and values. Any non conformity found via these methods become hard coded test cases of their own.
I think that I'm not expressing my problem correctly.
Just wrote a clarification in the description.
Well, Jon Sullivan's answer is quite... interesting. And it's quite, but not completely, different from the path that I took.
In this repository: bitbucket.org/michelemauro/flatten... (sorry, it's Java; I don't do Ruby) you'll find:
I tried a "the simplest thing that can possibily work" approach, but came out with a really unsatisfying solution because:
All in all, my implementation seems very similar to what Jon describes, with the caveat that I don't separate the two responsibilities. I also don't think that the best solution (a recursive, lazy one that can work with infinite collections) can work with this approach, because it requires unpacking a whole subarray before returning the next element, and that loses the lazyness.
So, this simple question is becoming a really interesting little problem, almost worth a kata. There is a lot to be pondered on.
Thanks for sharing this!
I faced the same issue, at some early point in the TDD cycles I got stuck and couldn't pass the test without writing the solution that was in my mind.
Anyway, testing first helped me to think about the problem and came out with a solution.
Share my my solution
So it seems that the hard part of TDD is finding
The simplest thing that can possibly work