loading...
Cover image for Overly Functional C++: The BenFolds Five

Overly Functional C++: The BenFolds Five

deciduously profile image Ben Lovy ・4 min read

Two-Part Series, Chapter 3

What a plot twist! I made almost no effort to avoid the pun and I'm only a little sorry.

Despite being a trilogy now, this post stands alone if you're familiar with the concept of the fold or reduce higher-order function.

In part 2 of yesterday's minddump, I documented my first stab at a specific fold in C++. I had gotten stuck, though, in trying to make it generic.

I've said it before and I'll say it again: ask DEV stuff, they know things. Thanks to @curtisfenner and @markboer I was able to write the intended generic fold function I had set out to write initially with almost no modification. This newly-generic fold function is a building block, and if you give a nerd a fold...

Five Library Functions

BenFolds is a class with five static member functions. It defines BenFolds::fold() as a generic version of the code from part 2, and then defines or, any, elem, and map in terms of this fold. I'll walk through each, or you can grab the gist. This gist compiled and executed successfully for me using g++ -std=c++11 -o benfolds benfolds.cpp on g++ 9.2.0.

Fold

This is the only definition with any substance, the rest will all specify parameters to run through this fold:

template <typename T, typename R, typename BinOp>
static R fold(std::vector<T> elems, R acc, BinOp func)
{
    int length = elems.size();
    if (length == 0)
        {
            // base case - empty list
            // Sub-problem is trivial - we're already done.  The passed accumulator holds the result
            return acc;
        }
        else
        {
            // Calculation is not complete - sub-problem is also trivial
            // Call function on accumulator using first element in vector as operand
            R newAcc = func(acc, elems[0]);
            // Create a new input from the second element onward of the current input
            std::vector<T> newInput(elems.begin() + 1, elems.end());
            // Recur with rest of list and new accumulator
            return fold(newInput, newAcc, func);
        }
}

This implementation is identical to the solution from part 2 apart from the parameterized types. The other four "backup" functions are just specific cases of this fold.

As @curtisfenner pointed out, this naive implementation is needlessly expensive. The recursion is in tail position, which the compiler does optimize for, but each call is allocating a brand new vector to swap into the stack frame. If you needed to optimize this further, you could consider instead passing ever-smaller subslice references to the same vector, or even mutating the original vector in place instead.

foldOr

The simplest application just takes a list of booleans and tells you if any of them are true. The T in BenFolds::fold() is bool:

static bool foldOr(std::vector<bool> bools)
{
    return fold<bool>(bools, false, [](bool acc, bool curr) { return acc || curr; });
}

The initial accumulator is false. We call || on this accumulator against each element of the passed collection. At the end, if any element was true, the accumulator flipped to true and stuck. Otherwise, nothing was found it's still false.

To test, I added a quick throwaway to main():

vector<bool> bools = {false, false, false, true, false};
cout << "Testing foldOr...\nThis should be true: " << bf.foldOr(bools) << "\n";

foldAny

The any function is just a generalization of or:

template <typename T, typename Predicate>
static bool foldAny(std::vector<T> elems, Predicate p)
{
    return fold(elems, false, [p](bool acc, T curr) { return acc || p(curr); });
}

It's really almost the same thing. We're still calling || on the accumulator and this element, but we're passing the element through some other predicate p before checking for truth.

For example, see if the inputs has any even numbers (it should) or anything greater than 10 (it shouldn't):

cout << "Testing foldAny...\nAre there even elements in the set: " << bf.foldAny(nums, [](int n) { return n % 2 == 0; }) << "\n";
cout << "Are there elements greater than 10: " << bf.foldAny(nums, [](int n) { return n > 10; }) << "\n";

foldElem

The elem function checks if an element exists in a collection, which is a specialization of any so we can reuse that definition:

template <typename T>
static bool foldElem(std::vector<T> elems, T elem)
{
    return foldAny(elems, [elem](T curr) { return elem == curr; });
}

It calls foldAny defining the predicate as equality against a specific element. We can test by checking for a specific number, no need to pass a lambda:

cout << "Testing foldElem...\nIs the number 2 present in the set: " << bf.foldElem(nums, 2) << "\n";
cout << "Is the number 8 present in the set: " << bf.foldElem(nums, 8) << "\n";

foldMap

We can also define a map function from this fold that builds the target vector in the accumulator:

template <typename T, typename Op>
static std::vector<T> foldMap(std::vector<T> elems, Op func)
{
    return fold(elems, std::vector<T>(), [func](std::vector<T> acc, T curr) {
        acc.push_back(func(curr));
        return acc;
    });
}

I tested this one by doubling each element in the input:

cout << "Testing foldMap...\nHere's each element doubled: " << bf.foldMap(nums, [](int elem) { return elem * 2; }) << "\n";

Running through all the tests as defined yields this output:

Set: [0, 1, 2, 3, 4]
Testing fold...
Sum: 10
Testing foldOr...
This should be true: 1
Testing foldAny...
Are there even elements in the set: 1
Are there elements greater than 10: 0
Testing foldElem...
Is the number 2 present in the set: 1
Is the number 8 present in the set: 0
Testing foldMap...
Here's each element doubled: [0, 2, 4, 6, 8]

Good enough

Photo by Oscar Keys on Unsplash

Posted on by:

Discussion

markdown guide
 

The Brick song. Man that takes me back.

Anyway, just a note that fold does not have to be declared functional-like, and it can still be used for FP. As long as mutations stay local to the function (only mutating your own declared variables), the function is still deterministic overall. For example F# implements List.fold with an imperative loop for perf reasons:

    module List =
        ...

        let fold<'T,'State> folder (state:'State) (list: 'T list) = 
            match list with 
            | [] -> state
            | _ -> 
                let f = OptimizedClosures.FSharpFunc<_,_,_>.Adapt(folder)
                let mutable acc = state
                for x in list do
                    acc <- f.Invoke(acc, x)
                acc

And even if using a recursive loop, the compiler does something similar for you with TCO. But not optimized for a list since recursive loops can have early exit or indefinite iterations.

For my own code I will probably use a recursive loop, depending on what it is, and avoid mutation. But for core data structures I might opt for performance.

I realize this is for your exploration. I suppose this is more a note for other readers. :)

 

Thanks for pointing that out! I think learning FP for the first time via Haskell has had some lingering biases, this is definitely important to note. Implementing the fold here by instead mutating in place outside the body of the function is indeed a ton more efficient, and actually about the same level of effort to implement, but you lose your guarantees.

That's interesting to see in F#...can always count on you for a reminder I really want to keep that in my toolbox!

 

Not interested in c++ whatsoever, but just had to give you a shout for mentioning Ben Folds Five. Takes me back to high school!