Dave Cridland

Posted on Feb 26, 2019

Fibonacci++

#cpp

Ah, Fibonacci!

It's a classic interview question! Generally done with recursion, the Fibonacci sequence is a simple sequence where each item is the sum of the two previous items.

1, 1, 2, 3, 5, 8, 13 ... and so on.

I've lost count of the number of times people have used this as an example. In general, progammers are asked to write a program to calculate the Nth element in the sequence.

Ah, C++!

One of the fun things with C++ is that it gives you a lot of tools to choose from. It's held up as the ultimate multi-paradigm language. These days it has everything from coroutines and generators to templates. And they're all fast, so we're told.

So if that's the case, which technique is fastest?

Some boilerplate.

This isn't going to be terribly scientific, so we'll sling together a simple enough timer. I want this to be trivial to use, so it's just going to print out the number of ticks elapsed:

// A code-block timer. Just prints how long the scope block takes.
class Timer {
private:
    using clock = std::chrono::high_resolution_clock;
    std::chrono::time_point<clock> m_start;
public:
    Timer() : m_start(clock::now()) {}
    ~Timer() {
        auto end = clock::now();
        auto duration = end - m_start;
        std::cout << "Elapsed: " << duration.count() << std::endl;
    }
};

You're allowed to complain how horrible that is. To use it, just put it at the top of a scope block - it'll automatically print out the duration in some arbitrary unit at the end.

Try one: Classic Recursion.

Recursion is the traditional way of solving the Fibonacci problem.

A simple example looks like this:

long long fibonacci_recurse(long long i) {
    if (i <= 2) {
        return 1LL;
    }
    return fibonacci_recurse(i - 1) + fibonacci_recurse(i - 2);
}

It's straightforward, and easy to understand. We've special-cased the first two items in the list (as is common), and then simply return the previous two items, summed.

Simple enough. I'll call it with this code:

const long long target = 47LL;

// ...

    {
        Timer t;
        std::cout << "(Recurse) Fibonacci of " << target << " is " << fibonacci_recurse(target) << std::endl;
    }

It outputs:

(Recurse) Fibonacci of 47 is 2971215073
Elapsed: 6912089915

Obviously I don't know what units those are, off the top of my head, but I can tell you it's a lengthy pause.

Can we make this better?

Try Two: Constant Expression.

A simple optimisation available to us since C++11 is the constexpr keyword. This allows the compiler to make a number of assumptions about a function or expression, and - hopefully - makes things a lot faster. Let's see:

constexpr long long fibonacci_const(long long i) {
    if (i <= 2) {
        return 1LL;
    }
    return fibonacci_const(i - 1) + fibonacci_const(i - 2);
}

It's only one word different to the previous code, but the compiler should be able to elide many of the calls.

(Const Expr) Fibonacci of 47 is 2971215073
Elapsed: 6918560577

Oh, good. Our optimisation saved us over -6million ticks. Harrumph.

The fundamental problem is that while the compiler knows now that (for example) fibonacci_const(45) is always the same, it does not know that we'll be reusing it again in the next call down.

Indeed, we'll calculate fibonacci_const(10) around 2^37 times, if I've got this right, but every time in a different scope, making the compiler unable to see the optimisation we could make.

The additional code the compiler is inserting - probably to try and make these results reusable, ironically - is actually slowing us down slightly.

Maybe we could try something else?

Try Three: Vector

Building up a sequence in a container, rather than recursing, might be a more useful plan. It's the non-functional way of doing things, certainly, but perhaps it's a little faster? We'd definitely only be calculating each element once.

long long fibonacci_vector(long long i) {
    std::vector<long long> fib = {1LL, 1LL};
    while (fib.size() < i) {
        fib.push_back(fib[fib.size() - 1] + fib[fib.size() - 2]);
    }
    return fib[fib.size() - 1];
}

The code simply builds up the sequence in a vector, and when it's big enough, returns the value.

(Vector) Fibonacci of 47 is 2971215073
Elapsed: 13298

Now, as I say, I don't know how long a tick is here. But I do know that 13,298 is considerably fewer of them than 6,912,089,915. It seems we're onto a winning strategy.

Can we make it faster? Probably, yes - while std::vector is pretty efficient, we're building up a vector of values we'll never use again, which seems wasteful.

Try Four: Variables

If we get rid of the vector entirely, we can just copy variables about instead. It's not as pretty, but it'll work:

long long fibonacci_vars(long long i) {
    if (i <= 2) {
        return 1;
    }
    long long a, b = 1LL, c = 1LL;
    i -= 2;
    do {
        a = b;
        b = c;
        c = a + b;
    } while (--i);
    return c;
}

The code is getting a bit arcane, but it's just about readable. Let's look at the performance:

(Vars) Fibonacci of 47 is 2971215073
Elapsed: 2030

Yeah, so... That worked.

The only way we can make it faster would be to avoid the calculation entirely.

There's a thought.

Try Five: Template functions

C++'s templates give us the interesting possibility of calculating the sequence at build time rather than at runtime. That's not normally available to use for anything non-trivial, but for interview questions and programming examples anything's fair game, right?

We'll start off with a template version of our recursive solution:

template<long long i> constexpr long long fibonacci_template() {
    return fibonacci_template<i-1>() + fibonacci_template<i-2>();
}

There's no special-casing here for the first two values, though. We'll add that with a template specialization or two:

template<> constexpr long long fibonacci_template<1>() {
    return 1LL;
}

template<> constexpr long long fibonacci_template<2>() {
    return 1LL;
}

There. Now to try. The call is a little different this time:

    {
        Timer t;
        std::cout << "(Template) Fibonacci of " << target << " is " << fibonacci_template<target>() << std::endl;
    }

And we get:

(Template) Fibonacci of 47 is 2971215073
Elapsed: 1945

Well, that's disappointing. Faster, yes, but not by nearly as much as We'd like.

What went wrong?

Interlude

Just out of curiosity, I wonder:

    {
        Timer t;
        std::cout << "Absolutely nothing " << target <<  std::endl;
    }

Run that, and we get:

Absolutely nothing 47
Elapsed: 1925

Yes, it seems that most of this is overhead. Our variable test didn't even register, really. Raising the target value to 92 - the last element in the Fibonacci sequence to fit into a 64-bit long long - doesn't make much difference either (though the earlier tests take a long time).

We can shift things about to expose that a bit more:

    {
        long long r;
        {
            Timer t;
            r = fibonacci_template<target>();
        }
        std::cout << "(Template) Fibonacci of " << target << " is " << r << std::endl;
    }

That, for example, prints 37 as the number of ticks, quite consistently.

One last attempt: Template values

Still, we'll demonstrate one more solution. The template function trick might - in principle - have to perform a function call at runtime. There's a trick to avoiding that, but it means defining a static field in a class or struct.

template<long long i>
struct fibonacci_template_struct {
    static const long long value = fibonacci_template_struct<i - 1>::value + fibonacci_template_struct<i - 2>::value;
};

The above is a template meta-function - it's really a function definition masquerading as a templated struct. The value field is our return value.

Again, we'll need specialisations for the first two values:

template<>
struct fibonacci_template_struct<1> {
    static const long long value = 1LL;
};

template<>
struct fibonacci_template_struct<2> {
    static const long long value = 1LL;
};

And again, this will complete entirely at build time - but this time, there really is not chance of it doing anything at runtime. In fact, if you raise the target still further, then while the other examples here will overflow at runtime, this one will fail to compile, and the compiler will give you a template stacktrace.

Of course, this compiles down to just printing the answer at the end, and so is entirely uninteresting to run...

How fast is fast?

We can't, it seems, get this any faster - even the templating work wasn't really needed. But if we used a big number library, it might be useful.

But in practical terms, the ability to drop down to near-assembly, as in the Variables variant, is good enough for our purposes here. There's nothing there that's not available in ANSI C.

More complex problems (perhaps with a more complex operation than a simple binary addition) might deserve other treatments, though. And it's certainly true that all of these cases are useful in some circumstances.

Happy coding!

Top comments (4)

Bosley • Mar 20 '19

Good points here. I'm pretty sure your timer should be reporting in nanoseconds but chrono seems to have some variation on reporting depending on processor / os specifics. I figured I'd throw this up too because I thought it would be fun.

long long fib(long long n)
{
    static long long i = 2;
    static std::vector<long long> seq = {0LL, 1LL};
    if(n >= seq.size())
    {
        for (; i <= n; ++i)
        {
            seq.push_back( seq[i-1] + seq[i-2]);
        }
    }
    return seq[n];
}

The idea here is that initialization takes longer but any call made after can be retrieved in about half the time iff the following calls are < the largest seen. I realize that fib sequence tasks are not ones assumed to be called multiple times but its an optimization technique that came to mind when I read your post.

When I ran and timed this the first call takes the amount of time the standard linear solution would take, but following monotonically increasing calls only took the amount of time to calculate the delta of size() and N.

All calls under size() took around 2ms to complete.

Dave Cridland • Mar 20 '19

The timer reports in whatever it feels like. More boringly, it'll be measuring std::chrono::high_resolution::period::type::Denom ticks per std::chrono::high_resolution::period::type::Num seconds - and helpfully it might go backwards, because it's not explicitly stable.

Also, yes - incremental caching is always a good plan with any expensive calculation (or other resource that takes time to obtain). That said, I think my "variables" version is sufficiently fast that you might not need to bother unless you needed the cached values an awful lot.

I thought we might further improve performance (in my vector version, too) by calling seq.reserve(n) just before the for loop - that ought to reduce the memory allocation workload significantly - but in my tests it didn't do much.

Bosley • Mar 21 '19

I also think the "variables" version is more than sufficient, just offering another look.

I've noticed that whenever I've used .reserve(n) with vectors nothing really seems to be gained time-wise. Perhaps I haven't used them in a scenario where it would be of real benefit.

Dave Cridland • Mar 21 '19

No, I think you're right - in most cases, the allocators these days are simply good enough that the marginal benefit of reserving space isn't useful.