DEV Community

loading...

Nested list flattening

Pete Tsamouris
Functional programmer
Updated on ・6 min read

Disclaimer

As per the previous post in this series, in the process of helping out the community of people alphatesting the Unison programming language, I am solving the 99 Haskell problems in unison, and making notes, tailored for a beginner-friendly audience.

One of the points of the 99 exercises is to try alternative implementations: I will go out of my way (within reason) and explore such alternatives, even if a faster, more obvious, or easier approach is readily available.

Understanding of some concepts of functional programming might be required, and the intention of this series is to be tailored to a beginner-level audience. For all concepts mentioned in the post, I have made the effort to embed relevant hyperlinks, so in the event you encounter something you don't fully understand, feel free to either read to the end then search, or why not get a quick glimpse before you proceed.

Exercise 7: Flatten a nested list structure

Can I play too?

To follow along you will need:

  • the Unison codebase manager ucm. Optionally in the $PATH.
  • A new project created with ucm -codebase path init.
  • ucm running on your target codebase: ucm -codebase path.
  • The standard library (called base), cloned (from inside ucm): pull https://github.com/unisonweb/base .base.
  • An editor of your choice to open scratch.u, under the codebase location: ucm will pick up any changes to the scratch file, and provide output as necessary.
Wait, nested lists are a thing?

Unison does support user defined types, and these will be the focus of this post.

Consider the following definition, where the nested structure is abstracted behind a single type signature:

type Nested a = Element a | List [Nested a]

Instances of the Nested type, can contain:

  • a single element of type a, or
  • a list of elements of this type, as noted by [Nested a]

Element and List are the two constructors of the algebraic data type Nested.

And it's as easy as that?

Let's ask ucm:

⍟ These new definitions are ok to `add`:

  type Nested a

  Now evaluating any watch expressions (lines starting with `>`)... Ctrl+C cancels.

To give the semblance of uniformity with the rest of the Unison data types, I will cd base before adding the type to the codebase. You can be reminded of the constructors' signatures by using ls:

.> ls .base.Nested

  1. Element (a -> Nested a)
  2. List    ([Nested a] -> Nested a)
So what about tests?

We begin by adding the signature of the flatten method, with a default implementation, to be able to write unit tests.

The nested list does encode structure in it, so I will try and cover as many types of structure in my tests:

flatten: Nested a -> [a]
flatten any = []
test> tests.flatten.empty = 
  run( expect( flatten ( List []) == []))
test> tests.flatten.elem  = 
  run( expect( flatten ( Element 1) == [1]))
test> tests.flatten.elems =
  run( expect( flatten ( List [Element 1, Element 2]) == [1, 2]))
test> tests.flatten.nestOne =
  run( expect( flatten ( List [ List[Element 1]]) == [1]))
test> tests.flatten.nestedMany =
  run
    (expect
      (   flatten
           (List
             [ Element 1,
               List [List [Element 2, List [Element 3]]],
               Element 4 ])
      == [1, 2, 3, 4]))

Here are the scenarios tested:

  • Empty List
  • Single Element
  • List containing multiple Element values
  • List inside a List, containing an Element
  • Element followed by List, followed by List followed by Element

The above could have been broken down to simpler, more granular test cases, but I find that this is quickly becoming tedious, and does not really offer much additional value to you, the reader. By all means, feel free to experiment with more granular test cases. The last one overlaps in coverage with the third and fourth.

ucm agrees with the types of the above, and runs the tests, all of which, except the first one, are of course failing (omitted for brevity).

⍟ These new definitions are ok to `add`:

  flatten                  : Nested a -> [a]
  tests.flatten.elem       : [Result]
  tests.flatten.elems      : [Result]
  tests.flatten.empty      : [Result]
  tests.flatten.nestOne    : [Result]
  tests.flatten.nestedMany : [Result]
Can we now fill in the implementation of flatten?

Let's by all means. Here's one way to go about it:

flatten: Nested a -> [a]
flatten e =
  case e of
    Element el -> [el]
    List (l +: ls) -> flatten l ++ flatten (List ls)
    List [] -> []

If flatten encounters an Element, which only wraps a single el, all it needs to do is wrap it in a list. If it encounters a List of any structure of nested values, it breaks it in head +: tail, and separately recurses into head and tail, concatenating the two calculated results with (++). This approach uses simple recursion.

⍟ These new definitions will replace existing ones of the same name and are ok to `update`:

  flatten : Nested a -> [a]

⍟ I've updated to these definitions:

  flatten : .Nested a -> [a]

After the update, tests are passing!

  ◉ tests.flatten.empty            : Passed 1 tests.
  ◉ tests.flatten.elem             : Passed 1 tests.
  ◉ tests.flatten.nestOne          : Passed 1 tests.
  ◉ tests.flatten.elems            : Passed 1 tests.
  ◉ tests.flatten.nestedMany       : Passed 1 tests.
Can we generalise the recursion?

As mentioned in the first post of this series, since the original flatten is using simple recursion, it can be expressed in terms of foldb.

flatten: Nested a -> [a]
flatten e =
  case e of
    Element el -> [el]
    List l -> foldb (flatten) (++) [] l

As a reminder, here is the foldb signature (find/view), along with the type signatures, as well as orchestration to implement flatten, layed out one by one.

.> view base.List.foldb
  base.List.foldb : (a -> b) -> (b -> b -> b) -> b -> [a] -> b
  base.List.foldb f op z as

(Nested a -> [a]) ->                             flatten
([a] -> [a] -> [a]) ->                           (++) 
[a] ->                                           [] 
[Nested a] ->                                    l

foldb starts (and ends prematurely if the input is empty), with []. Whenever it encounters a component of [Nested a] (beginning with the head of l), it runs the f function, in this case flatten. It then concatenates (++) the output of op a with the previously accumulated result.

After an update, the tests are still passing.

Can the recursion/foldb be turned into an iteration?

By relying on a small helper function, the last implementatoin can be expressed in differently formulated manner that avoids foldb.

First, consider the following signature, where f can turn a value of a to a number of values of [b]:

f: a -> [b]

f can be passed to a function that, for the sake of uniformity with Haskell, will be called concatMap. concatMap then iterates over the [a] and converts each instance to a [b].

I will at this point make a quick detour, and add tests for concatMap: I rely on it to implement flatten, so for the sake of thoroughness, it should be functioning as expected:

test> tests.concatMap.empty =
  run (expect (concatMap (a -> [a]) [] == []))
tests.concatMap.idOne =
  run (expect (concatMap (a -> [a]) [1] == [1]))
tests.concatMap.doubleOne =
  run (expect (concatMap (a -> [a, a]) [1] == [1, 1]))

These tests pass, and prove that concatMap indeed only applies f to each a in the list, without performing any additional transformations. Feel free to be more thorough, but in the confines of Unison, the type signatures can be trusted.

Going back to the task at hand, for our purposes f is flatten, and a -> [b] is Nested a -> [a]

concatMap: (a -> [b]) -> [a] -> [b]
concatMap f as = foldl (acc a -> acc ++ f a) [] as

flatten: Nested a -> [a]
flatten e =
  case e of
    Element el -> [el]
    List ls -> concatMap flatten ls

As before, if flatten encounters an Element, the a -> [a] conversion is a "wrap in a list" operation. When it encounters a [Nested a], flatten does not recurse into itself by calling itself. Instead it calls concatMap that uses flatten as the function to work with. Internally, concatMap runs foldl.

⍟ These new definitions will replace existing ones of the same name and are ok to `update`:

  flatten : Nested a -> [a]
.> update

⍟ I've updated to these definitions:

  flatten : Nested a -> [a]

With tests passing, that's a wrap!

Discussion (0)