DEV Community

Pete Tsamouris
Pete Tsamouris

Posted on • Updated on

Packing consecutive list elements

Before we begin!

Quick reminder that Unison is still in public alpha!

Parts of the language, testing functionality and libraries that appear in this post might be subject to change.

Exercise 9: Pack consecutive duplicates into sublists

The aim of the pack function is to take a list of type [a] and group all consecutive duplicate elements into sublists, the output type being [[a]]. At the same time, the unpack function reverses the packing the aim being to allow for better ease of testing (more on this below) .

Please note the use of the term sublist for every list embedded in the outer list, for example in [[1], [2, 2], [3]] each of [1], [2, 2] and [3] will be referred to as a sublist.

How will we know we have built the right thing?

Let's consider pack first:

  • An empty list still needs to return a [[a]] (yet technically [] and [[]] both type-check against [[a]]).
  • Any single element list [x] will be wrapped in a list to form a sublist: [[x]]
  • Generalising the previous property: a list of non duplicate elements will end up with each element wrapped in a single element list.
  • The minimum list containing a duplicate element [x, x] will see the elements wrapped in a single sublist: [[x, x]]
  • So in the most general case any list consisting of consecutive duplicate and non consecutive duplicate elements will end up with sublists where length > 1, as well as sublists where length == 1.

Let's now have a look at unpack:

  • Any list containing an empty list ([[]]) will unpack to [].
  • Consecutive sublists, regardless of the number of elements in each, will be concatenated to a flattened list.

From a property based perspective:

  • For any not packed list U the length of U will be equal to the sum of the lengths of the sublists that are contained in the packed list P, where pack U produces P.
  • Any list L that can be packed to a list P can subsequently be unpacked to a resulting list L' with the two lists L == L' being equal.
On testing:

Having recently been made aware of the check and check' test helpers I will be making use of them to write my unit and property tests from now on.

For reference the tour (for the time being!) only mentions expect and run (expect (...)) but the check and check' functions can also be used to yield slightly more concise test cases.

Let's write some tests then!

Lets add the signatures of pack and unpack and implement a minimum hard coded and clearly wrong version, only to allow the compiler to run the tests.

use .base

pack : [a] -> [[a]]
pack as = []

unpack : [[a]] -> [a]
unpack aa = []

Here are the unit tests for pack:

test> test.pack.empty = 
  check (pack [] == [])

test> test.pack.single = 
  check (pack [1] == [[1]])

test> test.pack.duplicate = 
  check (pack [1, 1] == [[1, 1]])

test> test.pack.list = 
  check (pack [1, 2, 2, 3, 4, 4] == [[1], [2, 2], [3], [4, 4]])

And here are the unit tests for unpack:

test> test.unpack.empty = 
  check (unpack [[]] == [])

test> test.unpack.empty' = 
  check (unpack [] == [])

test> test.unpack.single = 
  check (unpack [[1]] == [1])

test> test.unpack.duplicate = 
  check (unpack [[1, 1]] == [1, 1])

test> test.unpack.list = 
  check (unpack [[1], [2, 2], [3], [4, 4]] == [1, 2, 2, 3, 4, 4])
Repetitive much?

You might have noticed a ...slight repetition in the testing department. I am allowing a slight peek in the test-case-writing-process on purpose to better highlight why property tests might be helpful in this instance. Since the pack and unpack methods take us from [a] to [[a]] and back again, instead of testing pack with a set of inputs and outputs then unpack with the set of outputs as inputs it is (usually) quicker to test both at the same time using the approach sometimes referred to as there and back again.

A quick find for generators under base.v1.test reveals that a small number of additional generators would come handy in our case to allow freedom to better shape the test inputs to our needs.

Originally I wrote a non empty list generator (inspired from the stock list generator, with the range being always a minimum two elements) as well as a consecutive elements list generator, that when forced produces a list containing exclusively consecutive elements.

base.test.v1.non_empty_list : '{Gen} a -> '{Gen} [a]
base.test.v1.non_empty_list g =
  'let
    size = !nat
    List.map (_ -> !g) (range 1 (size + 2))

base.test.v1.consecutive_list: '{Gen} a -> '{Gen} [a]
base.test.v1.consecutive_list g = 
  ' let
      size = increment (!nat)
      elem = !g
      List.map (_ -> elem) (range 1 (size + 2))

Then I noticed the lack of reuse between the two functions as well as general lack of information being communicated at the type level and most importantly, at this point the codebase contains 3 more or less identical functions for list, with slightly different arguments.

Slight detour to use a non empty list type for the above, by making use of Unison record types:

type NonEmptyList a = { head' : a, tail' : [a]}

toList : NonEmptyList a -> [a]
toList nel = case nel of
  NonEmptyList.NonEmptyList hd tl -> [hd] ++ tl

base.test.v1.nel : '{Gen} a -> '{Gen} (NonEmptyList a)
base.test.v1.nel g = 'let NonEmptyList !g !(list g) 

base.test.v1.consecutive_nel : '{Gen} a -> '{Gen} (NonEmptyList a)
base.test.v1.consecutive_nel g = 
  'let 
    non_empty = !(nel g)
    hd = head' non_empty
    NonEmptyList (hd) ([hd] ++ tail' non_empty)

So a consecutive_nel is produced in terms of a nel, while head' is added to the scope by Unison by the definition of the record type members head' and tail'.

base.test.v1.unpacked_list: '{Gen} [Nat]
base.test.v1.unpacked_list =
  'let
    large = pick ['((!nat) + 10), '((!nat) + 100)]
    small = pick ['((!nat) + 1), '((!nat) + 10)]
    s1 = !(nel large)
    m1 = !(consecutive_nel small)
    join (map toList [m1, s1, m1, s1])

Is this overly complex? Maybe, but I would like to be able to produce input-lists for my tests containing clearly discernible elements so I can be sure to notice if I break the generator.

> sample 100 '(!unpacked_list)
   ⧩
   [ [1, 1, 10, 1, 1, 10],
     [1, 1, 10, 10, 10, 10],
     [10, 10, 10, 1, 1, 10],
     [10, 10, 10, 10, 10, 10],
     [1, 1, 10, 1, 1, 100],
     [1, 1, 10, 10, 10, 100],
     [10, 10, 10, 1, 1, 100],
     ...
     truncated

Note: '(!(list nat)) produces some lists with consecutive values due to the nature and behaviour of the nat generator, but this behaviour cannot be relied upon. The {Gen} ability handlers are not random number generators but rather domain enumerators in specific order and the same holds for how pick selects a generator from the list too.

So what about the properties then?

In order to minimise the code and since all property tests are meant to test the same properties mentioned already (length of sub-lists equal to the unpacked list, unpack . pack produces the input), it is handy to write a test generator that given a generator for a list of some structure, checks if the properties hold over a number of iterations tested.

use base.Nat +
use base.Nat map

subListLengths: [[a]] -> Nat
subListLengths aas = foldl (+) 0 (map size aas)

test : '{Gen} [a] -> '{Gen} Test
test generator = 
  'let
    l = !(generator)
    packed = pack l
    unpacked = unpack packed
    check' ((unpacked == l) && (subListLengths packed == size unpacked))

test> test.pack_unpack.empty = 
  check (((unpack . pack) []) == [])

test> test.pack_unpack.list =
  runs 100 (test (list nat))

test> test.pack_unpack.consecutive = 
  runs 100 (test unpacked_list)

Caveat: runs 100 (test (list nat)) probably has more brackets than I care to write. It also took a number of attempts to get it to type-check, but I suppose this will get more intuitive given time and practice.

After a day spent writing tests, could we "pack" a list maybe?

In the past I have attempted my best to implement the function that is the main focus of the exercise in as many ways as possible. In this instance I feel that there are only so many ways that are worthwhile, and some of them rely on (currently) missing standard library functions. I will only implement pack in terms of takeWhile and dropWhile (which I have unit tested previously, and are beyond our scope at this point):

base.List.dropWhile : (a -> Boolean) -> [a] -> [a]
base.List.dropWhile pred as =
  case as of
    []              -> []
    h +: t | pred h -> dropWhile pred t
    _               -> as

base.List.takeWhile : (a -> Boolean) -> [a] -> [a]
base.List.takeWhile pred as =
  case as of
    []              -> []
    h +: t | pred h -> [h] ++ takeWhile pred t
    h +: _          -> []
pack : [a] -> [[a]]
pack as = case as of
  []  -> []
  h +: t -> (h +: (takeWhile (e -> e == h) t)) +: pack (dropWhile (e -> e == h) as)

unpack : [[a]] ->[a]
unpack = join

It helps that unpack is actually provided for free with the Unison base library. In case you don't have it immediately handy to check the implementation, here's how you can concatenate a list containing sublists into a flat list:

.> view base.List.join

  base.List.join : [[a]] -> [a]
  base.List.join =
    use List ++
    foldl (++) []


But are the tests and properties passing?

Why don't we find out:

.> test

  ✅

    New test results:

  ◉ test.pack.empty                 : Proved.
  ◉ test.pack.single                : Proved.
  ◉ test.pack.duplicate             : Proved.
  ◉ test.pack.list                  : Proved.

  ◉ test.unpack.empty               : Proved.
  ◉ test.unpack.empty'              : Proved.
  ◉ test.unpack.single              : Proved.
  ◉ test.unpack.list                : Proved.

  ◉ test.pack_unpack.empty          : Proved.
  ◉ test.pack_unpack.list           : Proved.
  ◉ test.unpack.duplicate           : Proved.
  ◉ test.pack_unpack.consecutive    : Proved.


  ✅ 12 test(s) passing

  Tip: Use view test.pack_unpack.empty to view the source of a
       test.

Turns out yes they are, so that's a wrap!

Discussion (0)