DEV Community

loading...

Beautiful folds in F# - Part 1

shimmer profile image Brian Berns Updated on ・5 min read

Sequences

F# sequences have some interesting behavior. Because they are "lazy", the elements of a sequence are generated on-demand when you enumerate the sequence. This can be useful for large sequences, because its elements don't all exist in memory at the same time (unlike a true collection type, like a list or an array).

Be aware, however, that if you enumerate a sequence a second time, its elements are generated again from scratch. Thus, if a sequence generator has side-effects, those effects occur each time the sequence is enumerated. This can be a very bad outcome, and so side-effects in sequences should be avoided when possible.

In some cases, multiple enumerations of a sequence don't even produce the same results. For example, consider a sequence that represents a database cursor:

use rdr = cmd.ExecuteReader()
seq {
   while rdr.Read() do
       yield rdr.GetInt32(0)
}

This might yield some values the first time through, but subsequent iterations will all be empty because the reader can't be rewound (if it's even still open at all).

For the purposes of this article, we'll conjure artificially unrepeatable sequences that work fine once, but fail if enumerated again:

/// Creates a sequence that can only be enumerated once.
let unrepeatable items =
    let mutable active = true
    seq {
        if active then yield! items
        else failwith "Closed"
        active <- false
    }

Let's say we want to compute the mean of a sequence of integers (without using F#'s Seq.average library function). Naively, we could simply compute the sum of the sequence and its length and then divide to get the mean:

/// Computes mean from sum and length.
let toMean (sum : int) (length : int) =
    float sum / float length

/// Computes mean of a sequence by iterating it twice.
let meanNaive items =
    let sum = Seq.sum items
    let length = Seq.length items   // exception thrown here
    toMean sum length

let items = seq { 1 .. 10 } |> unrepeatable
printfn "%A" <| meanNaive items   // BOOM!

This blows up because meanNaive attempts to enumerate the sequence twice. Of course, we could try to avoid this problem by reading the entire sequence into an array and then computing the array's sum and length, but this requires enough memory to hold the entire sequence, which might be prohibitive for large sequences. Is there some way to compute the mean so that we only enumerate the sequence once, without keeping all its elements in memory?

Fold

One way is to manually track the sum and the length ourselves via a "fold":

/// Computes mean of a sequence by iterating it only once.
let meanManual items =
    let sum, length =
        Seq.fold (fun (sumAcc, lenAcc) item ->
            sumAcc + item, lenAcc + 1) (0, 0) items
    toMean sum length

In this version, we track the sequence's running sum in sumAcc and its running length in lenAcc, so at each step we compute sumAcc + item and lenAcc + 1. The two accumulators are combined in a tuple with initial value (0, 0), which we deconstruct at the end so we can divide the final sum by the final length in toMean. This version works fine and we can consider the problem solved... if this is the only computation we need to perform.

But what if there are many different computations we might want to perform on an unrepeatable sequence? For example, if we separately want to determine the sequence's minimum and maximum values, we have to hand code a new function that follows the same basic pattern. This seems like a potentially large amount of duplicate (and potentially buggy) code.

Refactor!

Surely we can refactor this code to avoid duplication. To do this, we need versions of Seq.sum and Seq.length and Seq.min and Seq.max that can be composed arbitrarily and then plugged into a fold.

First, we need something that holds just the information necessary to compute the sum of a sequence. In particular, we need a "step" function that accumulates the running sum, and we need the correct initial value for the running sum, which is 0. Let's write some quick-and-dirty code to try this idea out using a plain tuple:

let composableSum =
    (fun acc item -> acc + item),   // accumulate running sum
    0                               // initial value of running sum

Of course, the step function is just (+), but I've written it out with points to be clearer.

Now let's do the same thing for length, min, and max:

let composableLength =
    (fun acc item -> acc + 1),
    0

let composableMin =
    (fun acc item -> min acc item),
    Int32.MaxValue

let composableMax =
    (fun acc item -> max acc item),
    Int32.MinValue

For simplicity, I've cheated by using Int32.MaxValue and Int32.MinValue as initial values for min and max. This doesn't work well for empty sequences, but it's good enough to get started. (We should probably be using Option instead. Exercise for the reader.)

As the names suggest, we can compose two of these together to produce something that performs both in a single pass of the sequence:

/// Creates a new computation that's composed of the two given computations.
let compose (step1, init1) (step2, init2) =
    let step (acc1, acc2) item =
        step1 acc1 item, step2 acc2 item
    let init = init1, init2
    step, init

/// Infix version.
let (<*>) = compose

The composed step function assumes that the tracked value are in a tuple, which it updates. Similarly, the composed init function simply gloms the two individual initial values together in a tuple. We can easily compose multiple instances at once using the infix <*> version of the function . So, for example, we can define a computation that calculates a sequence's sum, length, min, and max all in a single pass like this:

composableSum <*> composableLength <*> composableMin <*> composableMax

Now that we've defined the computation, we need a way to run it using Seq.fold:

/// Runs the given computation on the given items.
let run (step, init) items =
    Seq.fold step init items

We have all the pieces we need, so let's take it out for a test drive:

let comp = composableSum <*> composableLength <*> composableMin <*> composableMax
let result = run comp (unrepeatable [3; 1; 4; 1; 9; 2; 6])

Success! We've correctly computed the sequence's sum, length, min, and max in a single pass, producing (((26, 7), 1), 9) as the final result. Note, however, that each layer of composition creates a nested tuple in the result, which seems unfortunate. Is there any way to flatten this out so the internal mechanism used in composing a computation isn't exposed to the caller?

We've covered a lot of ground already, so I'm going to pause here, and will address that last question in a follow-up, which is now available: Beautiful folds in F# - Part 2

Discussion

pic
Editor guide