Iteration, recursion, and tail-call optimization in Elixir

#elixir #programming #beginners

Elixir uses recursion for looping over data structures like lists. Although most of it's hidden away through higher-level abstractions on the Enum class, in some cases writing a recursive function yourself can be more performant. In this episode of Elixir Alchemy, we’ll try to find some of these cases.

Along the way, we’ll learn about body-recursive and tail-recursive functions. Although the latter is often touted as being faster, we’ll also look at cases where that’s not necessarily true. Let’s dive right in!

Enumeration

Elixir provides functions on the Enum module to enumerate over collections.

def sum_numbers1(list) do
  list
  |> Enum.filter(&is_number/1)
  |> Enum.reduce(0, fn n, sum -> n + sum end)
end

This example takes a list and returns the sum of all numbers in that list. Because there might be non-numeric items in the input list, it uses Enum.filter/2 to select only the items that is_number/1 returns true on. After that, the remaining values are added together through Enum.reduce/3.

sum_numbers1([1, 2, "three", 4, %{}]) # => 7

While this solution works, it iterates over the whole list twice to get to the result. To make this more performant, we might instead choose to write a recursive function that can do this in one go.

Recursion

A recursive function can do both tasks (removing the non-numeric items and adding the rest together) in one run. It will call itself for every item in the list, and use pattern matching to decide what to do with each item.

# 1
def sum_numbers2([head | tail]) when is_number(head) do
  sum_numbers2(tail) + head
end

# 2
def sum_numbers2([_head | tail]) do
  sum_numbers2(tail)
end

# 3
def sum_numbers2([]) do
  0
end

Instead of using the functions provided by the Enum module, this solution calls itself until the list is empty. It does so using three function heads:

When the first item in the list (the head) is a number, we'll call sum_numbers2/1 with the rest of list (anything but the head, called the tail). This call returns a number we'll add our current head to.
When the head is not a number, we'll drop it by calling the sum_numbers2/1 function with just the tail.
When the list is empty, we'll return 0.

sum_numbers2([1, 2, "three", 4, %{}]) # => 7

When calling this function with [1, 2, "three", 4, %{}], the sum_numbers2/1 function is called repeatedly in the following order to get to the result:

sum_numbers2([1, 2, "three", 4, %{}])
sum_numbers2([2, "three", 4, %{}]) + 1
sum_numbers2(["three", 4, %{}]) + 1 + 2
sum_numbers2([4, %{}]) + 1 + 2
sum_numbers2([%{}]) + 1 + 2 + 4
sum_numbers2([]) + 1 + 2 + 4
0 + 1 + 2 + 4

This list shows the how the function is simplified until we get to a result, which shows the function being called six times. Once for each item, plus once more for the empty list at the end.

Because the function calls itself for each iteration, each item in the list causes a stack frame to be added to the call stack. That’s because each item warrants a call to a function, which needs to be executed to claim its result. Adding a stack frame to the call stack requires some memory to be allocated, and can cause some overhead.

Tail recursion and tail-call optimization

To keep the memory footprint to a minimum, some languages—like Erlang and thus Elixir—implement tail-call optimization. With a small rewrite of our code, we can prevent the stack frame being added and that memory allocated.

# 1
def sum_numbers3(list) do
  do_sum_numbers3(list, 0)
end

# 2
defp do_sum_numbers3([head | tail], sum) when is_number(head) do
  do_sum_numbers3(tail, sum + head)
end

# 3
defp do_sum_numbers3([_head | tail], sum) do
  do_sum_numbers3(tail, sum)
end

# 4
defp do_sum_numbers3([], sum) do
  sum
end

This example is yet another implementation of the function from before. However, this example is tail-recursive, meaning it doesn’t need to await a call to itself before continuing. It does so by keeping an accumulator (sum) that keeps the current value instead of stacking function calls.

The sum_numbers3/1 function is the public function, which calls the private do_sum_numbers3/2 function with the passed list and an accumulator of 0 to start with.
Much like the previous example, do_sum_numbers3/2's first function head matches lists with a numeric head. It calls itself with the list's tail, and the head added to the accumulator.
When the head is not a number, the third function head drops it by calling itself with the tail and the current accumulator, without changing anything.
Finally, the exit clause returns the accumulator.

By calling itself at the end of the function and passing the accumulator with all calculated values, the Erlang VM can execute a trick called tail-call optimization, which allows it to circumvent adding new stack frames. Instead, it uses the current frame by jumping back up to the beginning of the function.

Tail call optimization allows functions calling themselves without running into stack problems.

def sleep, do: sleep

This example implements a function that continually calls itself. Without tail-call optimization, this function would produce a stack error.

Which is faster?

We've written three implementations of the same function. The first used Enum.filter/2 and Enum.reduce/3 to filter and sum the list, and iterated over the list twice. To fix that, we used a recursive function that did it in one go, which we further improved using tail-call optimization.

letters = Enum.map(?a..?z, fn x -> <<x::utf8>> end)
numbers = Enum.to_list(0..10_000)
list = Enum.shuffle(letters ++ numbers)

Benchee.run(
  %{
    "Enum.filter/2 and Enum.reduce/3" => fn -> Recursion.sum_numbers1(list) end,
    "Body-recursive" => fn -> Recursion.sum_numbers2(list) end,
    "Tail-recursive" => fn -> Recursion.sum_numbers3(list) end
  },
  time: 10,
  memory_time: 2
)

To benchmark our solutions, we'll use Benchee. We'll prepare a shuffled list of strings and numbers for each function to sum for ten seconds.

Name                                      ips        average  deviation         median         99th %
Tail-recursive                       107.35 K        9.32 μs    ±21.39%           9 μs          14 μs
Body-recursive                        55.65 K       17.97 μs   ±662.77%          14 μs          34 μs
Enum.filter/2 and Enum.reduce/3       20.86 K       47.94 μs    ±51.46%          46 μs          85 μs

Comparison:
Tail-recursive                       107.35 K
Body-recursive                        55.65 K - 1.93x slower
Enum.filter/2 and Enum.reduce/3       20.86 K - 5.15x slower

Memory usage statistics:

Name                               Memory usage
Tail-recursive                              0 B
Body-recursive                              0 B
Enum.filter/2 and Enum.reduce/3         16064 B

In this test on this machine, we see that the body-recursive version is almost three times as fast as the original implementation that used Enum.filter/2 and Enum.reduce/3. The tail-recursive version was almost twice as fast ad the body-recursive one.

Is tail-call recursion always faster?

While the results of that benchmark look quite convincing, tail-recursion isn't always faster than body recursion. In fact, that's one of the 7 myths of Erlang performance.

While tail-recursive calls are usually faster for list reductions—like the example we’ve seen before—body-recursive functions can be faster in some situations. That's often true when mapping over a list, where the function takes a list and returns a list without reducing it to a single value.

def double_numbers1(list) do
  list
  |> Enum.filter(&is_number/1)
  |> Enum.map(fn n -> n * 2 end)
end

def double_numbers2([]) do
  []
end

def double_numbers2([head | tail]) when is_number(head) do
  [head * 2 | double_numbers2(tail)]
end

def double_numbers2([_head | tail]) do
  double_numbers2(tail)
end

def double_numbers3(list) do
  list
  |> do_double_numbers3([])
  |> Enum.reverse()
end

defp do_double_numbers3([], acc) do
  acc
end

defp do_double_numbers3([head | tail], acc) when is_number(head) do
  do_double_numbers3(tail, [head * 2 | acc])
end

defp do_double_numbers3([_head | tail], acc) do
  do_double_numbers3(tail, acc)
end

This example is a lot like the example we've seen before, but instead of adding all numbers together and reducing them to a single number, the double_numbers/1 function doubles all numbers and returns them in a list.

Like before, we have three implementations. The first uses Enum.filter/2 and Enum.map/2 to iterate over the list twice, the second is body-recursive and the last is tail-recursive. Before returning, the tail-recursive function needs to reverse the list, because the values get prepended to the accumulator.

Note: Check out the example project for a comparison of more possible implementations, like list comprehensions and a tail-recursive version that doesn't reverse the list.

Name                                   ips        average  deviation         median         99th %
body-recursive                     36.86 K       27.13 μs    ±39.82%          26 μs          47 μs
tail-recursive                     27.46 K       36.42 μs  ±1176.74%          27 μs          80 μs
Enum.filter/2 and Enum.map/2       12.62 K       79.25 μs   ±194.81%          62 μs         186 μs

Comparison:
body-recursive                     36.86 K
tail-recursive                     27.46 K - 1.34x slower
Enum.filter/2 and Enum.map/2       12.62 K - 2.92x slower

Memory usage statistics:

Name                            Memory usage
body-recursive                      15.64 KB
tail-recursive                      20.09 KB - 1.28x memory usage
Enum.filter/2 and Enum.map/2        31.33 KB - 2.00x memory usage

When running a benchmark for each of these implementations, we can see that the body-recursive version is fastest in this case. Having to reverse the list in the tail-recursive version negates the improved performance tail-call optimization adds in this specific case.

As always, it depends

This concludes our look into recursion in Elixir. As a rule of thumb; tail-recursive functions are faster if they don't need to reverse the result before returning it. That's because that requires another iteration over the whole list. Tail-recursive functions are usually faster at reducing lists, like our first example.

Which is the best approach depends on the situation. For mission-critical loops, writing a benchmark to measure which solution is faster is usually your best bet, as it's not immediately obvious which will perform better.

To learn more about recursion, also be sure to read this overview of the available methods, and when to use which. If you liked this article, don't forget to subscribe to the mailing list if you'd like to read more Elixir Alchemy!