Background
Have you ever written a factorial function? If you have, then you might have done something like:
function factorial(n) {
  let result = n;
  // multiply by every integer from n - 1 down to 2
  for (let i = n - 1; i > 1; i--) {
    result = result * i;
  }
  return result;
}
Or even something like:
function factorial(n) {
  return n > 1
    ? n * factorial(n - 1)
    : 1;
}
Both are valid approaches, but there is something about the second one that makes it easier to understand what it is doing: we can read directly that factorial computes n * (n - 1)! and that it calls itself until n equals 1, at which point we are done. That is what we call a recursive function:
A function that either calls itself or is in a potential cycle of function calls.
The problem
Recursion is great; it helps us write more concise, readable, and simple code. However, it has a big drawback. Take our factorial function: when we call factorial(5) we get 120, but if we call the same function with a much bigger value, say 12,000, we get a very different result:
RangeError: Maximum call stack size exceeded
You see, every runtime has a maximum call stack size (Node.js allows roughly 11,000 frames by default), so when we build long recursive chains our program crashes because there is no more stack space.
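If you are curious about the limit on your own machine, a crude probe along these lines gives a rough idea (the exact number varies by engine, flags, and frame size):
// Recurse until the engine throws, then report how deep we got.
function maxDepth(depth = 0) {
  try {
    return maxDepth(depth + 1);
  } catch (e) {
    return depth;
  }
}
console.log(maxDepth()); // roughly how many frames your stack can hold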
The solution
Fortunately, there is an alternative that allows us to safely write recursive functions: Tail Call Optimization.
TCO is a technique that many languages have implemented to deal with long recursive chains. It is based on the premise that when a procedure/function calls a subroutine as its final action, the current stack frame can be replaced with the frame of the new invocation, making the recursive function as performant as its looped equivalent.
So, how would we modify our factorial function to fulfill this constraint? We could do the following:
function factorial(n, acc = 1) {
  return n > 1
    ? factorial(n - 1, n * acc)
    : acc;
}
As you can see, we added a parameter, acc, which lets us pass down any relevant information (our current accumulated product) to the next factorial invocation. That renders all the information from the previous call useless, so its stack frame can be discarded: instead of piling up 11k+ stack frames, we would be replacing the same frame 11k+ times.
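To make that concrete, here is roughly how a small call unfolds; every call carries everything it needs in its arguments, so nothing from the caller's frame has to survive:
// factorial(4)       -> factorial(3, 4)
// factorial(3, 4)    -> factorial(2, 12)
// factorial(2, 12)   -> factorial(1, 24)
// factorial(1, 24)   -> 24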
Pretty neat right?
Sadly, even though proper tail calls made it into the ECMAScript 2015 specification, most engines have decided not to implement them.
An interesting alternative
Despite this, there is still a safe way to use recursion. We can implement our own version of TCO.
Judging from what we have seen of TCO, our aim should be to make recursive functions keep the stack at a constant size instead of letting it grow linearly. So let's ask ourselves: what control-flow structure do we know that behaves that way? Loops! So what if we had a loop that executed functions repeatedly? Well, that is what we call a trampoline.
A trampoline is a special kind of loop that executes thunked-functions, that is, functions that return the next function to call. So, what if we converted each of our recursive calls into a thunk and passed it to a trampoline? Would our stack keep a constant size? Let's see.
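Before we touch factorial, here is the smallest possible illustration of a thunk (my own throwaway example): nothing runs until we explicitly call it.
const thunk = () => 2 * 21; // wraps the computation, does not evaluate it
console.log(typeof thunk);  // "function"
console.log(thunk());       // 42, evaluated only now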
First, we've got to rewrite our factorial function to be a thunked-function, which would be something like:
function factorial(n, ret = res => res) {
  return n > 1
    ? () => factorial(n - 1, res => ret(n * res))
    : ret(1);
}
Let's analyze what we did there, shall we?
- We added an argument to the function signature, ret, which is itself a function. It plays a special role: it allows us to compose our thunks (see the trace after this list).
- We now return a function instead of the value of the factorial computation; by doing that we defer the execution until our trampoline decides to call it.
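As a rough trace of how those ret continuations compose (my own simplification):
// factorial(3)
//   -> () => factorial(2, res => 3 * res)
//   -> () => factorial(1, res => 3 * (2 * res))
//   -> ret(1) = 3 * (2 * 1) = 6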
So let's get into our trampoline implementation.
As we said, a trampoline is a loop that executes thunked-functions one at a time, so, taking advantage of the decorator pattern, we could write the following:
function trampoline(fn) {
  return function(...args) {
    let result = fn(...args);
    while (result && typeof result === 'function') {
      result = result();
    }
    return result;
  };
}
As you can see, the implementation is rather simple: we decorate our recursive function with our trampoline in order to do TCO. There are some things worth noticing here:
- The while loop runs until there are no more functions to call.
- Our fn parameter is only used at the beginning, since each result represents the next function to call.
So our final result would be something like:
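For instance, wiring the two together could look roughly like this (safeFactorial is just an illustrative name):
const safeFactorial = trampoline(factorial);
console.log(safeFactorial(5)); // 120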
As you can see, our call stack never grows past 13 frames, which allows us to work with longer recursive chains without worrying about a stack overflow.
A little extra
Even though the trampoline function works nicely, I would still add something else to our API: a Symbol! Yeah, one of those new ES6 features that let us do metaprogramming. So my final implementation would be:
function factorial(n, ret = res => res) {
  return n > 1
    ? {
        fn: () => factorial(n - 1, res => ret(n * res)),
        [Symbol.for('recurse')]: true
      }
    : ret(1);
}

function trampoline(fn) {
  return function(...args) {
    let result = fn(...args);
    while (result && result[Symbol.for('recurse')]) {
      result = result.fn();
    }
    return result;
  };
}

// Or with Decorator syntax
@trampoline
function factorial(n, ret = res => res) {
  // ...
}
That way, if our recursive function ever legitimately returns a function as its final value, the trampoline won't mistake it for another thunk: we stop exactly when we are supposed to, not after.
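As a contrived sketch of the difference (countdownToAdder is purely a made-up example): its final result is itself a function, so the earlier typeof check would have invoked that result one extra time, while the Symbol-tagged trampoline hands it back untouched.
function countdownToAdder(n) {
  return n > 0
    ? { fn: () => countdownToAdder(n - 1), [Symbol.for('recurse')]: true }
    : x => x + 1; // the value we actually want to return is a function
}

const makeAdder = trampoline(countdownToAdder);
const addOne = makeAdder(3); // stops at the function instead of calling it
console.log(addOne(41));     // 42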
Finale
Recursion is great, one of the pillars of declarative functional programming. However, it has an interesting drawback that can cause some unintended issues. Here we saw how to optimize a recursive call using tail calls. It is also important to note that making the execution path more complex hurts performance (time-wise), so use this method with consideration and avoid adding extra layers of complexity where they are not needed.
I hope you find this article useful, please let me know what you think about this implementation of TCO.
P.S. While doing research on this topic I stumbled upon this awesome article, which mentions the possibility of accomplishing a similar effect using Python generators, so I'll be researching a way to use ES6 generators to improve the way we optimize our recursive calls.
Top comments (11)
I think this solution will keep a growing linked list of all the ret callbacks in heap memory for each loop of the trampoline. Once ret(1) is executed in the base case for factorial, the whole chain of n ret functions will get called one after the other. This is rather wasteful: yes, we get around the problem of overflowing the stack, but we're just moving that memory overhead from the stack over to the heap.
However, on reflection, it occurs to me that we can implement tail call optimization for the trampolined version of the code too. That way we don't have to keep all those callbacks around! Here is the code with the tail call optimization added:
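Roughly, the idea is to thread an accumulator through each thunk instead of nesting continuations; a sketch of that shape (not necessarily the exact original code, and written against the typeof-based trampoline) would be:
function factorial(n, acc = 1) {
  return n > 1
    ? () => factorial(n - 1, n * acc) // the thunk closes over plain values, no ret chain
    : acc;
}

console.log(trampoline(factorial)(5)); // 120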
I think this is the way to go. We realize the full benefit of tail call optimization: We're neither wasting space on the stack nor on the heap.
@mortoray seems to suggest that the overhead of using the trampoline is itself a performance penalty, but I don't really agree with that idea. To put it another way: If we're concerned about that level of performance, we should be coding in something other than JavaScript in the first place, like C, C++, Rust, Go, etc...
My concern is about the creation of the closure () => factorial(n - 1, n*acc). This involves the creation of an object, presumably on the heap. Your tail-call optimization won't change this: you still allocate the same number of objects.
For a low call-count function like factorial the performance loss is likely negligible. I worry about such structures though, because it's easy to start creating too many such wrappers and start suffering layers of overhead. As you do this you likely defeat the ability of the optimizers to make up for JavaScript's innate inefficiency. If you were to write high call-count functions this way you might start seeing performance issues.
I know premature optimization is bad, but there's also a problem if we blindly ignore potential performance issues.
That is right, the thunks would be allocated on the heap, so for a simple function like factorial a solution like this might just be overkill; however, for more complex algorithms it could work. Still, I'm doing more research on the topic of continuations in JavaScript in order to find a way to improve efficiency. @nestedsoftware @mortoray
That's an interesting point that having higher-order functions can potentially obscure what's going on from the VM and thereby prevent the JIT from performing certain optimizations. I hadn't thought about that at all. That makes sense and it wouldn't surprise me. It's hard to know what effect it would have, but I guess you're right that it's an area where one should tread with some caution if performance is a consideration.
Yes, adding the tail call optimization doesn't reduce the number of factorial 'thunks' that are created. I believe it (hopefully) will make a thunk available for garbage collection each time that the trampoline loop runs rather than having to wait for the very end of the calculation.
At one point I did some reading on how objects are represented in memory in JavaScript, and it seems to allocate a lot of stuff on the heap, so I'm pretty certain that you're right: The thunks would likely be allocated on the heap as well.
Actually, TCO is not implemented in the majority of mainstream languages, so I avoid recursion whenever I can. Hopefully things will change now that Java and C# are adding FP goodies.
You can do hacks like these in dynamic languages, but you won't convince a lot of people to adopt them.
I continue to use recursion where it's difficult to implement an algorithm without it (lots of tree structures come to mind), or where unrolling the recursion makes it harder to understand. So long as it isn't too deep, it's fine.
Ah yes, if the amount of data is fairly deterministic and small and the iterative version is too cumbersome, I do the same.
This is definitely interesting, and cool, but I'm not sure it's practical.
You might be trading stack space for heap space by using the trampoline. Closures, those returned functions, will require some kind of allocation and tracking. That is likely the dominant cost in the trampoline setup. I would thus argue that efficiency can't be the motivation.
If the motivation is clarity, I also tend to see the trampoline version as less clear than the original recursive form, and the loop form.
You are absolutely correct, closures can lead to memory leaks, especially in this type of situation. That's why this should be used carefully and only when necessary. Be that as it may, I would have seriously loved for TCO to have been implemented.
Just wondering: has anyone ever run into a stack overflow that was not caused by a faulty implementation but by actually running out of stack space? If so, I'm very interested in that actual case.
Hi Simon, actually I just ran into this stackoverflow question