Myoungjin Jeon

Posted on Jun 21, 2022 • Edited on Jun 24, 2022 • Originally published at jeongoon.github.io

Let's Fold a Tree

#haskell #tree #algorithms #functional

folding-tree?

The term of folding-tree is not an official or common name. But I'll call the function as folding-tree here. I guess Tree Merging is more accurate term. see more.

~~I lost the original source code because link I have is broken now. So this article is licensed under MIT as I borrow from other's~~

Credit Update

I found the original link! and What I saw for the first time is this.

I found this function in another function generating prime numbers.

which I hardly understand at the time, even though it looks very interesting. Fortunately I have better understanding about it, so I'd like to share.

How to use (test)

This source code is written in literate haskell and you try to use in ghci like below:

shell> curl -sL https://raw.githubusercontent.com/jeongoon/jeongoon.github.io/main/posts/2022-06-05-foldt.lhs -o FoldTree.lhs
shell> ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/myoungjin/.config/ghc/ghci.conf
λ= :l FoldTree.lhs 
[1 of 1] Compiling FoldTree         ( FoldTree.lhs, interpreted )
Ok, one module loaded.
λ= take 7 . foldt $ [[3,6..], [5,10..], [7,14..]]
[3,5,6,7,9,10,12]

Module FoldTree

module FoldTree (foldt) where

foldt :: Ord a => [[a]] -> [a]

foldt

foldt takes a list of list of series of numbers and return a merged (union) and sorted list. It is more like making a rope with the strings.

However, there are some assumption about usage:

Each inner list contains sorted already in ascending order
All the (outter) groups are sorted by order of the first element of each group.

So If we have multiples of number three and five and seven,

λ> make_some_multiples n = [ n, n+n .. 20 ]
λ> three_multiples = make_some_multiples 3
λ> take 5 $ three_multiples
[3,6,9,12,15]
λ> -- ^ note: already sorted
λ> five_multiples = make_some_multiples 5
λ> seven_multiples = make_some_multiples 7

We need to apply the list of each multiples in this way:

λ> foldt [ three_multiples, five_multiples, seven_multiples ]

Note: In general, if your implementation has specific limitation on input value, you might need to write the function name as something imposing the limitation or your own data type.

Another option would be phantom type¹.

Those assumptions help for our thunk²s to care about only the members of list in the future, which goes pretty well with the nature of lazy evaluation!

The original foldt implementation is for the infinite list has less condition, however, I'd like to apply foldt to fixed size of list so has more edge cases:

foldt [] = []
foldt ([]:_) = []

So, on second condition, we can return empty list if the list of left hand side has empty body. Because each multiples has longer list if the initial number is less. (so if longer list is empty, we don't need to check shorter list)

foldt ((x:xs):t) = x : unionSort xs (foldt (pairs t))

The last pattern is for general condition. And foldt appears in the end again to make recursive call.

Basically the first element of leftmost group has lowest value, so it will be the first element in the result. This is the basic concept of foldt. And rest of list (xs) will be union-ed with rest of foldt-ed list.

unionSort will remove a duplicated member and take the element at lowest value out of the both list.

unionSort

unionSort :: Ord a => [a] -> [a] -> [a]
unionSort [] ys = ys
unionSort xs [] = xs
unionSort xs@(x:xt) ys@(y:yt) =
  case x `compare` y of
    LT -> x : unionSort xt ys -- x used; do unionSort on rest of them
    EQ -> x : unionSort xt yt -- x used; y is duplicated
    GT -> y : unionSort xs yt

unionSort also has recursive call to finish the union and sort on rest of members which depend on the value chosen for the head of result.

pairs

pairs do the same sort method used in foldt. foldt takes only one group each time, on the other hand, pairs tries to take every two groups each time. If the pair is not available, it returns empty or the leftmost group so that foldt will end its job earlier. Those edge cases are also only required when applied to finite list.

pairs :: Ord a => [[a]] -> [[a]]
-- edge cases ...
pairs [] = []
pairs ([]:_) = [] -- left always has longer list; no need to go further
pairs (ms:[]) = [ms] -- just return leftmost group

The second pattern matching will reduce the searching time as well -- as we saw in the pattern matching of foldt.

Those edge cases could be different when the different types of series of numbers are used.

pairs ((m:ms):ns:t') = (m : unionSort ms ns) : pairs t'

If two groups are taken to perform union, m is supposed to be the lowest value and will be the first value of pairs function -- which is actually doing amazing job to wire all the foldt and pairs smoothly because it doesn't need to go further to get first value.

Note: there is no doubt that less evaluation tends to be more efficient in lazy computation.

To organize the rest of them unionSort will be applied on rest of between two groups, and pairs will pursue the tail of the code again to finish the job.

This is one of most beautiful piece of recursive programming.

foldt examples

sumOfMultiples

This task is introduced at exercism.org. Even though, we could solve this problem by checking divisibility of all the member numbers which is given as factor list, it was worth to try because foldt is fast enough to solve by union-ing the numbers and get only one of common multiples if some are duplicated.

sumOfMultiples factors limit =
  sum $ foldt [ [n,n+n..(limit -1)] | n <- factors', n > 0 ]
  where factors' = sort factors
  -- note: sort is required due to the foldt has assumption!

λ= sumOfMultiples [2,3,5,7,11] 100000
3961010983

the origin of foldt - prime numbers

the original file contains a lot more implementation of prime numbers³.

Too many recursive call probably makes us confused at first. But if you know how foldt acts in there, it will be easier to figure out how it works!

To be honest, This code is still hard for me to understand. Or even if I could understand it, I don't think I could invent something like this. 😅

The basic idea of the generating is called Sieve of Eratosthenes. But foldt accomplish the task in unique way to perform sieving when compared to imperative pseudo code.

We could think minus function is to used sieve and foldt function used to eliminate the duplication and to sort.

module PrimeNumber
  ( primesTME
  ) where

primesTME :: [Int]
primesTME =   2 : ([3,5..] `minus` foldt [ [p*p,p*p+2*p..] | p <- primes_ ])
  where
    primes_ = 3 : ([5,7..] `minus` foldt [ [p*p,p*p+2*p..] | p <- primes_ ])
    foldt ~((x:xs):t) = x : union xs (foldt (pairs t))
    pairs ~((x:xs):ys:t) = (x : union xs ys) : pairs t

minus :: [Int] -> [Int] -> [Int]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
                              LT -> x : minus xt ys
                              EQ ->     minus xt yt
                              GT ->     minus xs yt

union :: [Int] -> [Int] -> [Int]
union xs@(x:xt) ys@(y:yt) = case compare x y of
                              LT -> x : union xt ys
                              EQ -> x : union xt yt
                              GT -> y : union xs yt

As you can see foldt and pairs are now much simpler because it works on the infinite list. and ~ prefix will calm down the ghc because it tells ghc to trust that our pattern matching will serve all the cases even though it doesn't look like exhaustive.

I was able to compile this module without ~. I guess that there are some still possibility for older ghc complier to complain about non-exhaustive pattern matching.

What we can do more?

I was thinking about making business hours by using foldt. A set of trading hours could rely the public holiday and day light saving time if applicable. (because most of times, UTC will be preferred to save time information) or each weekday could have the different schedule like "shopping day" (which is Thursday in Australia).

So we could have many groups of time tables and get the final trading hours list by applying union and sort them and also filter them by removing some special occasion or public holidays!

It sounds the usage of foldt is suitable for this tasks, doesn't it?

I hope I could write an article about it! Thank you for reading!

DEV Community