DEV Community

Pete Tsamouris
Pete Tsamouris

Posted on • Updated on

Eliminate consecutive duplicates

Disclaimer

As per the previous posts in this series, in the process of helping out the community alphatesting the Unison programming language, I am solving the 99 Haskell problems in Unison and making notes, tailored for a beginner-friendly audience.

One of the points of the 99 exercises is to try alternative implementations. I will go out of my way (within reason) and explore such alternatives even if a faste, more obvious or easier approach is readily available.

Understanding of some concepts of functional programming might be required and the intention of this series is to be tailored to a beginner-level audience. For all concepts mentioned in the post I have made the effort to embed relevant hyperlinks. In the event you encounter something you don't fully understand, but does come with a hyperlink-hint feel free to either read to the end then search, or why not get a quick glimpse of the term before you proceed.

Exercise 8: Eliminate consecutive duplicates of list elements.

Can I play too?

To follow along you will need:

  • the Unison codebase manager ucm. Optionally in the $PATH.
  • A new project created with ucm -codebase path init.
  • ucm running your project under codebase by means of: ucm -codebase path.
  • The standard library (called base), cloned from inside ucm: pull https://github.com/unisonweb/base .base.
  • An editor of your choice to open scratch.u under codebase: ucm will tell you the folder in which it will pick up any changes to the scratch file, and provide output as necessary.
So how do we know this thing will work?

Why don't we write some tests first. The function we are trying to write will be called compress by convention and will reduce a list of elements to a list of maybe less elements. It can directly be placed in the base.List namespace, by adding the path to the namespace before the definition, but that is not necessary.

base.List.compress: [a] -> [a]
base.List.compress as = []

test> tests.compress.empty =
  run ( expect ( base.List.compress [] == [] ))

test> tests.compress.nonCompressable1 =
  run ( expect ( base.List.compress [1] == [1] ))

test> tests.compress.nonCompressable2 =
  run ( expect ( base.List.compress [1, 2] == [1, 2] ))

test> tests.compress.nonCompressable3 =
  run ( expect ( base.List.compress [1, 2, 3] == [1, 2, 3] ))

test> tests.compress.compressable1 =
  run ( expect ( base.List.compress [1, 1, 1, 2, 2, 3] == [1, 2, 3] ))

test> tests.compress.compressable2 =
  run ( expect ( base.List.compress [1, 2, 2, 3] == [1, 2, 3] ))

test> tests.compress.compressable3 =
  run ( expect ( base.List.compress [1, 1] == [1] ))

test> tests.compress.example =
  run(
    expect(
      (base.List.compress [?a, ?a, ?a, ?a, ?b, ?c, ?c, ?a, ?a, ?d, ?e, ?e, ?e, ?e] ==
       [?a, ?b, ?c, ?a, ?d, ?e])
    )
  )

Note: compress is directly coded into the base.List namespace above, and all tests point to it by full path. This is equivalent to first using the cd .base.List command, to move namespace, then coding compress without a namespace in the declaration and implementation.

  I found and typechecked these definitions in :scratch.u. If
  you do an `add` or `update`, here's how your codebase would
  change:

    ⍟ These new definitions are ok to `add`:

      base.List.compress              : [a] -> [a]
      tests.compress.compressable1    : [Result]
      tests.compress.compressable2    : [Result]
      tests.compress.compressable3    : [Result]
      tests.compress.empty            : [Result]
      tests.compress.example          : [Result]
      tests.compress.nonCompressable1 : [Result]
      tests.compress.nonCompressable2 : [Result]
      tests.compress.nonCompressable3 : [Result]

  Now evaluating any watch expressions (lines starting with
  `>`)... Ctrl+C cancels.

    5 |   run ( expect ( base.List.compress [] == [] ))

    ✅ Passed : Passed 1 tests.

    8 |   run ( expect ( base.List.compress [1] == [1] ))

    🚫 FAILED 

    11 |   run ( expect ( base.List.compress [1, 2] == [1, 2] ))

    🚫 FAILED 

    14 |   run ( expect ( base.List.compress [1, 2, 3] == [1, 2, 3] ))

    🚫 FAILED 

    17 |   run ( expect ( base.List.compress [1, 1, 1, 2, 2, 3] == [1, 2, 3] ))

    🚫 FAILED 

    20 |   run ( expect ( base.List.compress [1, 2, 2, 3] == [1, 2, 3] ))

    🚫 FAILED 

    23 |   run ( expect ( base.List.compress [1, 1] == [1] ))

    🚫 FAILED 

    26 |   run(

    🚫 FAILED 

The test cases above contain the following inputs:

  • the empty list []
  • a non compressible list containing one, two and three elements
  • a compressible list beginning with a compressible sequence, and containing a compressible sequence afterwards
  • a compressible list beginning with a non compressible sequence, followed by a compressible one
  • the example from the problem definition, which on closer scrutiny and in good humour almost provides more coverage than the previous cases put together.
So what next?

ucm agrees with the types and syntax so far so we can add these definitions, and clear the scratch file.

.> add

  ⍟ I've added these definitions:

    base.List.compress              : [a] -> [a]
    tests.compress.compressable1    : [Result]
    tests.compress.compressable2    : [Result]
    tests.compress.compressable3    : [Result]
    tests.compress.empty            : [Result]
    tests.compress.example          : [Result]
    tests.compress.nonCompressable1 : [Result]
    tests.compress.nonCompressable2 : [Result]
    tests.compress.nonCompressable3 : [Result]

So can we now implement compress?

Let's have a first try, while relying on list look-ahead. Previously I would view a definition on the terminal and then copy-paste that into the scratch file. I will from now on use the edit command, so that ucm will do this for me.

base.List.compress: [a] -> [a]
base.List.compress as = case as of
  [] -> []
  [x] -> [x]
  [x, y] ++ rest ->
    if (x == y) then base.List.compress (y +: rest)
    else x +: base.List.compress (y +: rest)
  I found and typechecked these definitions in :scratch.u. If
  you do an `add` or `update`, here's how your codebase would
  change:

    ⍟ These new definitions will replace existing ones of the
      same name and are ok to `update`:

      base.List.compress : [a] -> [a]

  Now evaluating any watch expressions (lines starting with
  `>`)... Ctrl+C cancels.

If the input is the empty list there is nothing to be done and compress returns the empty list. Alternatively the (x +: (y +: rest)) clause matches on a list of at least two elements, so it can proceed to check these two elements. If they are equal y is a duplicate of x, and the next function call will have input of y +: rest so as not to completely discard the x|y element. If they are not equal, x is not discarded and equally y is not discarded either. Notice that I jumped ahead to a list of two elements: there is a remaining clause to be matched which I am also adding at this time: [x] -> [x]. A list containing a single element, just like the empty list, is left intact.

My first take of the function above, give or take some minor attempts to get the syntax right, was using the pattern (x +: (y +: rest)) to refer to a list of at least two elements. After a comment by Paul Chiusano, turns out the same can be coded as [x, y] ++ rest, which saves some bracket noise.

.> update

  ⍟ I've updated to these definitions:

    base.List.compress : [a] -> [a]

  ✅

  No conflicts or edits in progress.

.> test

  ✅  

    New test results:

  ◉ tests.compress.compressable1       : Passed 1 tests.
  ◉ tests.compress.compressable3       : Passed 1 tests.
  ◉ tests.compress.compressable2       : Passed 1 tests.
  ◉ tests.compress.example             : Passed 1 tests.
  ◉ tests.compress.nonCompressable2    : Passed 1 tests.
  ◉ tests.compress.nonCompressable3    : Passed 1 tests.
  ◉ tests.compress.nonCompressable1    : Passed 1 tests.
  ◉ tests.compress.empty               : Passed 1 tests.

  ✅ 8 test(s) passing

  Tip: Use view tests.compress.compressable1 to view the source
       of a test.

With the function under test updated the tests have "new results". Note however that running the test command again does not rerun any tests, but shows cached tests results, as long as the function under test has not been updated since the last run.

So can we do another one? With less pattern matching?

There might be quite a few alternatives worth exploring and we can begin by looking into an iterative approach using foldl.

First a reminder of the signature and implementation of foldl:

.> view base.List.foldl
  base.List.foldl : (b ->{𝕖} a ->{𝕖} b) -> b -> [a] ->{𝕖} b
  base.List.foldl f b as =
    go b i =
      case List.at i as of
        None -> b
        Some a ->
          use Nat +
          go (f b a) (i + 1)
    go b 0

Going over each element of the list while using an accumulator, if the last element encountered matches the current element then it can be discarded, otherwise this element is appended to the accumulator and will be the basis of comparison in the next iteration.

base.List.compress: [a] -> [a]
base.List.compress as = 
  f acc elem = case acc of
    [] -> [elem]
    (init :+ last) -> if (last == elem) then acc else (acc :+ elem)
  foldl f [] as
  I found and typechecked these definitions in :scratch.u. If
  you do an `add` or `update`, here's how your codebase would
  change:

    ⍟ These new definitions will replace existing ones of the
      same name and are ok to `update`:

      base.List.compress : [a] -> [a]

  Now evaluating any watch expressions (lines starting with
  `>`)... Ctrl+C cancels.

After updating compress, tests are still passing.

.> test

  ✅  

    New test results:

  ◉ tests.compress.example             : Passed 1 tests.
  ◉ tests.compress.empty               : Passed 1 tests.
  ◉ tests.compress.nonCompressable3    : Passed 1 tests.
  ◉ tests.compress.compressable3       : Passed 1 tests.
  ◉ tests.compress.nonCompressable2    : Passed 1 tests.
  ◉ tests.compress.nonCompressable1    : Passed 1 tests.
  ◉ tests.compress.compressable2       : Passed 1 tests.
  ◉ tests.compress.compressable1       : Passed 1 tests.

  ✅ 8 test(s) passing

  Tip: Use view tests.compress.example to view the source of a
       test.

Are there more alternative approaches?

Yes there are: one using dropWhile (not part of the Unison base library yet) and one using group (also not part of base yet).

Care to go ahead regardless?
Using "dropWhile"

There are probably multiple ways to implement dropWhile or in other words a function that given a predicate, will keep dropping elements from the head of the list for as long as the predicate holds. For example:

dropWhile (x -> x < 3) [1, 2, 3, 4, 1, 2] -> [3, 4, 1, 2]

The straightforward way is to just implement it as per its definition, without relying on an abstraction.

base.List.dropWhile: (a -> Boolean) -> [a] -> [a]
base.List.dropWhile pred as =
  case as of
    [] -> []
    h +: t ->
      if pred h then dropWhile pred t
      else as

  I found and typechecked these definitions in :scratch.u. If
  you do an `add` or `update`, here's how your codebase would
  change:

    ⍟ These new definitions are ok to `add`:

      dropWhile : (a ->{𝕖} Boolean) ->{𝕖} [a] ->{𝕖} [a]

  Now evaluating any watch expressions (lines starting with
  `>`)... Ctrl+C cancels.

For dropWhile the predicate is tested and if it is false there are no items to drop. On the other hand if it is true then the head is dropped and dropWhile recurses into the remaining elements of the tail, using the same mechanism.

So finally compress can be implemented in terms of dropWhile as follows:

base.List.compress: [a] -> [a]
base.List.compress as = 
  case as of
    [] -> []
    h +: t -> 
      dropped = dropWhile (x -> x == h) t
      h +: compress dropped
  I found and typechecked these definitions in :scratch.u. If
  you do an `add` or `update`, here's how your codebase would
  change:

    ⍟ These new definitions will replace existing ones of the
      same name and are ok to `update`:

      base.List.compress : [a] -> [a]

  Now evaluating any watch expressions (lines starting with
  `>`)... Ctrl+C cancels.

In this version compress firstly acts as a means of breaking the list down to the head +: tail clause, which then allows the remaining elements of tail to be checked for equality with head and dropped if matching. Once dropWhile finishes filtering after the current head compress recurses into the remaining elements of the dropped intermediate result list.

.> update

  ⍟ I've updated to these definitions:

    base.List.compress : [a] -> [a]

  ✅

  No conflicts or edits in progress.

.> test

  ✅  

    New test results:

  ◉ tests.compress.nonCompressable1    : Passed 1 tests.
  ◉ tests.compress.compressable3       : Passed 1 tests.
  ◉ tests.compress.nonCompressable2    : Passed 1 tests.
  ◉ tests.compress.compressable1       : Passed 1 tests.
  ◉ tests.compress.empty               : Passed 1 tests.
  ◉ tests.compress.nonCompressable3    : Passed 1 tests.
  ◉ tests.compress.example             : Passed 1 tests.
  ◉ tests.compress.compressable2       : Passed 1 tests.

  ✅ 8 test(s) passing

  Tip: Use view tests.compress.nonCompressable1 to view the
       source of a test.

Using "group"

Feel free to experiment with this one which I will describe but not code. Given a function that "groups" same elements into lists, you can run group on the input and end up with a list of lists of same elements. Collecting the head of each of those lists will give you a neat solution.

(For those who are a bit more advanced, or curious for more detail, group could be implemented in terms of groupBy (spoiler here), which in turn could be implemented in terms of span (spoiler here))

Thanks for reading!

Note:

A great many thanks to Daniel Skinner for reviewing and commenting the original draft of this post.

Discussion (0)