Increase the speed execution of your functions with memoization

#javascript #webdev #tutorial #fibonacci

Today I'll try to explain to what is memoization and what could be an example of a use-case scenario. Keep in mind that I'm no expert in any way and that I'm just an enthousiast, just like some of you guys. I just happend to bump into this and I wanted to share what was my view on this subject. If I am mistaken in any way in this article, please let me know in the comment section below. We can all benefit from the correctness of the others!

Like a hash table

Memoization is a way to reduce the execution time of your functions. It does so by storing the result of every call to your functions. Like a hash table. Once you try to call the function with the same parameters than before, instead of going into the whole computation, it will just return the cached result. This, of course, help reduce the time needed for the function to return the expected result.

For instance, let's say we have a function called add. It takes two parameters being the numbers to add so that the definition of that function is

"use strict";

function add(number1, number2) {
    return number1 + number2;
}

We can now use our function and add some numbers.

add(1, 2); // 3
add(3, 4); // 7
add(1, 2); // 3

Referential transparency

There are two things to notice here. The first one is that our function is a pure function. This is an important concept to get in order to understand how memoization works. A pure function is a function that is free of side-effects and that always return the same result for the same parameters, also called referential transparency.

A side effect would make our function inpure, making its result unpredictable, thus canceling its property of being referentially transparent. Referential transparency is the fact that a function, that always return the same result for the same parameters, can always be replaced by its result anywhere in the code. This means that these two pieces of codes are equal.

console.log(add(1, 2) === 3);
console.log(add(3, 4) === 7);
console.log(add(1, 2) === 3);

console.log(3 === 3);
console.log(7 === 7);
console.log(3 === 3);

Now that we are sure that for a given set of parameters, we always have the same result, we could totally rewrite our function to get rid of that costy addition process, and use a dictionnary (or an object in JavaScript) to return the result and speed up our function.

"use strict";

function add(...parameters) {
    const cache = {
        [[1, 2]]: 3,
        [[3, 4]]: 7,
        [[5, 6]]: 11
    }; 

    return cache[parameters];
}

console.log(add(1, 2) === 3);  // true
console.log(add(3, 4) === 7);  // true
console.log(add(5, 6) === 11); // true

Gain a lot by caching

But when we try to add two numbers that are not cached, we would have to compute it ourselves. Think about those cached numbers as some numbers that got out of a statistic study showing the most added numbers. We could gain a lot by caching the most used numbers in addition, and compute the rest ourselves.

"use strict";

function add(...parameters) {
    const cache = {
        [[1, 2]]: 3,
        [[3, 4]]: 7,
        [[5, 6]]: 11
    }; 

    if (parameters in cache) {
        return cache[parameters];
    }

    return parameters[0] + parameters[1];
}

console.log(add(1, 2) === 3);   // true (cached)
console.log(add(3, 4) === 7);   // true (cached)
console.log(add(5, 6) === 11);  // true (cached)
console.log(add(7, 8) === 15);  // true (computed)

As you can see, cached numbers are the ones for the parameters that we anticipated. The rest gets computed like usual. But this is not really handy. In fact, the most used numbers in addition are changing from time to time and it is really not efficient to have a large cache to start with. What could be great is to feed our cache following the usage of our function. Like some sort of global variable that would hold the cache. This is what memoization is all about.

Let's use some more advanced use-case scenario. Like the Fibonacci sequence. If you are not at ease with maths, don't worry because we are two! But this is a great example to show you how you could benefit from memoizing a function. I think about the fibonacci sequence as a family tree that is growing exponentially.

Here is the recursive definition of this function.

"use strict";

function fibonacci(number) {
    if (number === 1) {
        return 1;
    }

    if (number < 1) {
        return 0;
    }

    return fibonacci(number - 1) + fibonacci(number - 2);
}

This means that each time we compute the fibonacci sequence of the N-1 and N-2 and add them together. The stopping condition is when we reach the 0th and 1st numbers of the sequence that we know are 0 & 1. But since it is a recursive function, and given the way the Fibonacci's sequence is computed, it will possibily be called multiple times with the same parameters. Let's try to compute the time necessary for the 40th number of the sequence.

const start = new Date().getTime();

fibonacci(40);

const stop = new Date().getTime();

console.log(`Fibonacci(40) executed in ${stop - start}ms.`);
// Fibonacci(40) executed in 1966ms.

It is hard to believe

Now let's try to compute it using memoization (I'll explain the details of the implementation in a minute).

let start = new Date().getTime();

console.log(fibonacci(40));
// 102334155

let stop = new Date().getTime();

console.log(`fibonacci(40) executed in ${stop - start}ms.`);
// Fibonacci(40) executed in 1966ms.

start = new Date().getTime();

console.log(memoizedFibonacci(1250));
// 7.674768958056894e+260

stop = new Date().getTime();

console.log(`memoizedFibonacci(1250) executed in ${stop - start}ms.`);
// memoizedFibonacci(1250) executed in 1ms.

And here is the implementation of the memoizedFibonacci function.

const memoizedFibonacci = (function() {
    const cache = {};

    return function $fibonacci(number) {
        if (number === 1) {
            return 1;
        }

        if (number < 1) {
            return 0;
        }

        if (number in cache) {
            return cache[number];
        }

        const result = $fibonacci(number - 1) + $fibonacci(number - 2);

        cache[number] = result;

        return result;
    };
})();

I admit it: it is hard to believe. So I suggest you test it on your own since practicing is the best way to learn. Note that if you are testing on an online playground like Repl It, try to use a smaller value for the unoptimized fibonacci version since it can possibly take much longer to compute on their servers.

I myself doubted it for a moment (I was not using any logging so I added it afterward while writing this article). But nothing is wrong here since I got this huge number. In fact, I cannot go beyond this value on my computer before having an Infinity value. Since I was not sure whether Node.js gave me Infinity because it could not compute this number or because there was a problem with my function, I search the most meaningful and higher value to demonstrate.

But not only it is way, way faster than the original definition of the function we wrote, we also have used a much higher value. And this all thanks to a closure and an object. That simple!

If you are not familiar with closure, think about it as a way to hold a value in a global way, while keeping this value only availble for our function (meaning the outside world wont modify this value, that way we are sure our cache is not poisoned by other modules in our code).

Also, I used an IIFE (Immediately Invoked Function Expression) to keep my cache in the scope of my closure. For the same purpose as explained above. Don't keep hitting your head on these terms if you are not familiar with them and just make a quick search to know more about them.

But what's even more powerful with memoization in this case is that now that we successfully computed the value for the 1250nth number of the sequence, we wont have to compute it at all since it is cached. Next calls to our memoizedFibonacci function with the parameter 1250 will only cost a comparison and an object access. Nothing more.

Conclusion

To sum up, I would say that memoization is part of a greater scheme that is functional programing. Referential transparency is what enables us to have a reliable cache, and thus using memoization to speed up consequent calls for the same parameters. This is not a magic method since it requires us to compute the value for a given set of parameters at least once. But it is really useful in a world of reusability and factorization, where we don't have to compute more than once a value.