DEV Community

loading...

Improving Elm's compiler output

robinheghan profile image Robin Heggelund Hansen ・7 min read

Elm is fast.

This is not because of any innovation in the compiler. In fact, the Elm compiler hardly does any optimization at all.

Elm is fast because it compiles to Javascript, which dedicated and talented engineers from all over the globe have been optimizing for more than a decade.

But here's a question: does the Elm compiler output Javascript which makes it easy for browsers to optimize it and, if not, is there any performance to gain by changing how the compiler outputs Javascript?

Let's have a look.

Hidden classes

Javascript is a dynamic language. Javascript allows objects to change in shape at any time. Variables and data structures can contain different values of different types at any time. In practice, however, most programs are fairly static and browsers try to take advantage of this.

In Chrome, every object literal and class is seen as a shape. If the shape of an object or class changes, like a property being dynamicly added or removed, Chrome will see the resultant new shape and try to convert between them. In some cases, Chrome will just treat objects as hashmaps (kinda like Elm's Dict) if it has trouble figuring out the precise shape of something. We call these shapes for hidden classes.

We get the best performance when our Javascript code doesn't seem to deal with many different shapes at a time. Fortunately, Elm is a static language so this should be pretty common. There is, however, a case where Elm does produce different shapes that can pass as the same type. Let's look at an example:

type Maybe a
  = Just a
  | Nothing
Enter fullscreen mode Exit fullscreen mode

This is how Elm's Maybe is defined. It is compiled into the following Javascript (using --optimized):

var elm$core$Maybe$Just = function (a) {
    return {$: 0, a: a};
};

var elm$core$Maybe$Nothing = {$: 1};
Enter fullscreen mode Exit fullscreen mode

As you can see, the Javascript object literal for Just and Nothing has different shapes. As a result, every Javascript code which deals with Maybe has to be able to deal with two different shapes. But is that costly?

To measure the effect I've done two things:

1) Make a benchmark which can hopefully pick up the performance difference, then make two versions where one is handcoded to avoid the overhead.
2) Look at the assembly code which Node outputs (Node and Chrome uses the same JS engine, V8).

You can find the code for this experiment at github.

I've focused on Elm' built-in List type, as it's used in pretty much every program. A performance improvement here would be a big benefit to everyone who uses Elm.

We'll be focusing on the following benchmark:

benchmark "int +" <|
  \_ -> foldl (+) 0 intList
Enter fullscreen mode Exit fullscreen mode

Simply a left fold over all the elements, adding them together. A change in performance here will tell us how quickly we can iterate through a List, the theory being that removing any overhead dealing with multiple hidden classes should increase performance.

We compile the benchmark. Looking through the compiled JS output, we can find this piece of code:

var _List_Nil = { $: 0 };
function _List_Cons(hd, tl) { return { $: 1, a: hd, b: tl }; }
Enter fullscreen mode Exit fullscreen mode

The empty List looks different from a List element (if you're wondering how Lists work, I explained it decently well at Elm Europe 2017).

We copy the JS file, and make the following modificaiton:

var _List_Nil = { $: 0, a: null, b: null };
function _List_Cons(hd, tl) { return { $: 1, a: hd, b: tl }; }
Enter fullscreen mode Exit fullscreen mode

A List should no longer be polymorphic (of many shapes).

The result:

  • Firefox, before modification: 75,843 ops/sec
  • Firefox, after modification: 84,549 ops/sec
  • Safari, before modification: 248,531 ops/sec
  • Safari, after modification: 248,530 ops/sec
  • Chrome, before modification: 294,434 ops/sec
  • Chrome, after modification: 302,569 ops/sec

So, no difference in Safari, however both Chrome and Firefox see a pretty decent improvement: ~11% in Firefox, ~4% in Chrome. Keep in mind that this is something that can be implemented in the compiler, no Elm code would have to change, it's a performance improvement for no effort on the part of application developers.

We can also look at the code that V8 generates by running the following script:

node --print-opt-code --code-comments index.js > jit_log
Enter fullscreen mode Exit fullscreen mode

By reading the jit_log for the benchmark run without modifications, we can see:

--- Optimized code ---
optimization_id = 1
source_position = 48049
kind = OPTIMIZED_FUNCTION
name = $List$foldl
stack_slots = 10
compiler = turbofan
address = 0x28bd2bd6e9a1
Body (size = 2292)
Instructions (size = 2012)
<Assembly code>
Enter fullscreen mode Exit fullscreen mode

While for the modified code we see

--- Optimized code ---
optimization_id = 0
source_position = 48067
kind = OPTIMIZED_FUNCTION
name = $List$foldl
stack_slots = 10
compiler = turbofan
address = 0x2081135eec01
Body (size = 1848)
Instructions (size = 1600)
<Assembly code>
Enter fullscreen mode Exit fullscreen mode

As expected, it generated less code when the code doesn't have to deal with polymorphism.

There is, however, something in both of this logs which give me pause:

Inlined functions (count = 1)
 0x3f2705632551 <SharedFunctionInfo A2>
Enter fullscreen mode Exit fullscreen mode

This section of the log lists how many functions have been inlined. In this case, only a function which evaluates the passed in function has been inlined (I'll explain what this mean in a second). This isn't actually that surprising. foldl only contains a single function call, which is done once per loop. This function is usually never the same either, so it makes sense that it wasn't inlined. However, if you look at all the other functions that have been optimized, the only functions which are inlined are either functions which take a single argument (also called arity-1 functions) or functions called A2, A3, A4 etc.

What gives?

Inlining

Inlining function calls (replacing the function call with the implementation of the function) is one of the most important optimizations a compiler can make. This is not necessarily because function calls are all that expensive, but because it allows the compiler to better understand what the code does and perform optimizations based on that.

Let's look at our benchmark again:

benchmark "int +" <|
  \_ -> foldl (+) 0 intList
Enter fullscreen mode Exit fullscreen mode

Without inlining, this would call the foldl function, which is a loop calling a function for every element in the list. Since foldl can accept any function, it would store the intermediary number value as a reference (even though it could be stored as a number on the stack), performing a lookup in memory each time the function is called. If we weren't storing ints as intermediary values, as it would be the case if we were folding over other things, the Javscript optimizer would likely treat all values as a generic hashmap inside of foldl.

With inlining, however, this function is likely to be compiled down to a single javascript loop, without any function calls at all, and with specialized code for the types actually used in the loop. This is like getting the benefits of a monomorphising compiler, without actually having a monomorphishing compiler.

But why aren't many functions being inlined, and what are all these A2 things?

Currying

Elm has this concept of currying. Given the following function:

add : Int -> Int -> Int
add a b =
  a + b
Enter fullscreen mode Exit fullscreen mode

You can create a new function like this:

add2 : Int -> Int
add2 =
  add 2
Enter fullscreen mode Exit fullscreen mode

So if you call a function with enough arguments, it will execute. If you call a function without all the arguments it requires, it returns a new function which accepts the missing arguments.

This is how the above add function is compiled to JS:

function F(arity, fun, wrapper) {
  wrapper.a = arity;
  wrapper.f = fun;
  return wrapper;
}

function F2(fun) {
  return F(2, fun, function(a) { return function(b) { return fun(a,b); }; })

var author$project$Main$add = F2(function(a, b) {
  return a + b;
});
Enter fullscreen mode Exit fullscreen mode

Our add function, as well as every other Elm function, is wrapped in an object which contains the original function, the arity that function expects, and a curried function. Calling the function has to be done with A2, which is implemented like this:

function A2(fun, a, b) {
  return fun.a === 2 ? fun.f(a, b) : fun(a)(b);
}
Enter fullscreen mode Exit fullscreen mode

So A2 takes a F2 object, and calls the function directly if the provided function actually takes two arguments, or does a curried call if not.

From the perspective of a javascript engine, this has a big problem: unless we do whole program analysis (which is too expensive) there's no way to know if the function itself should be called, or if it is to be called using currying. The A2 call itself can be inlined, but nothing more.

But what if we made the Elm compiler smarter? If the Elm compiler knew how many arguments a function required, it could re-write this:

A2(author$project$Main$add, 1, 2)
Enter fullscreen mode Exit fullscreen mode

into

author$project$Main$add.f(1, 2)
Enter fullscreen mode Exit fullscreen mode

We make a copy of our benchmark, and make these changes by hand for every function call in our benchmark, and in the functions called by the benchmark.

This time we're going to focus on the results for the following function:

benchmark "* 2" <|
  \_ -> map (\a -> a * 2) intList
Enter fullscreen mode Exit fullscreen mode

The result:

  • Firefox, before modification: 24,291 ops/sec
  • Firefox, after modification: 50,927 ops/sec
  • Safari, before modification: 35,723 ops/sec
  • Safari, after modification: 49,029 ops/sec
  • Chrome, before modification: 39,253 ops/sec
  • Chrome, after modification: 58,491 ops/sec

Pretty good. The performance in Firefox is doubled, while we're seeing ~30% improvements in Chrome and Safari.

When looking at the inlining results of the unmodified code:

Inlined functions (count = 1)
 0x13f84c332341 <SharedFunctionInfo A2>
Enter fullscreen mode Exit fullscreen mode

We can see there's been some changes after the modifications:

Inlined functions (count = 5)
 0x1f31bec396e1 <SharedFunctionInfo $map>
 0x1f31bec395a9 <SharedFunctionInfo $foldr>
 0x1f31bec39541 <SharedFunctionInfo $foldrHelper>
 0x1f31bec32049 <SharedFunctionInfo F2>
 0x1f31bec31fe1 <SharedFunctionInfo F>
Enter fullscreen mode Exit fullscreen mode

However, looking over the generated assembly code I'm seeing a lot of lines containing the following:

call 0x1e5fad48abe0  (Call_ReceiverIsNotNullOrUndefined)
Enter fullscreen mode Exit fullscreen mode

When we're calling functions using someObject.f(args), Chrome has to make sure that someObject isn't null or undefined.

I've run one more benchmark. This time I've placed functions outside of F wrappers, and call them directly.

The result:

  • Firefox, before modification: 50,927 ops/sec
  • Firefox, after modification: 59,632 ops/sec
  • Safari, before modification: 49,029 ops/sec
  • Safari, after modification: 43,695 ops/sec
  • Chrome, before modification: 58,491 ops/sec
  • Chrome, after modification: 63,619 ops/sec

Chrome and Firefox see some nice speedups, ~16% for Firefox and ~8% for Chrome. Safari actually sees a slowdown but I don't know why. Re-running the benchmarks several times gives wildly different results, and I don't know what to make of that.

Conclusion

Elm is fast, but there's still room to become faster. By making changes to how the Elm compiler outputs Javascript, we can increase the performance of Elm programs by a significant margin.

Further reading

If you want to know more on how Chrome makes Javascript fast, these are the two best resources I came across:

Whats up with monomorphism
V8 Kinds

Discussion

pic
Editor guide
Collapse
justgage profile image
Gage

Awesome changes! Are these changes going to be implemented soon?

Collapse
robinheghan profile image
Robin Heggelund Hansen Author

The honest answer to that is: don’t know 😅

Collapse
justgage profile image
Gage

Corollary, are you helping Evan directly now?

Thread Thread
robinheghan profile image
Robin Heggelund Hansen Author

I'm a member of the core team, so I do talk with Evan from time to time. But I still write PRs which he may, or may not, merge.

Thread Thread
justgage profile image
Gage

I just thought it was a core team of one, so that's cool 😁

Thread Thread
robinheghan profile image
Thread Thread
jxxcarlson profile image
James Carlson

Wonderfully informative post.

Collapse
mshaka profile image
m-shaka

Interesting article! Can I translate it into Japanese?

Collapse
robinheghan profile image
Collapse
mshaka profile image
m-shaka

I'm sorry there is something that I can't understand in Inlining section.

  1. Is the intermediary number value is accumlator?
  2. What does we were folding over other things mean actually? And why aren't ints stored? In my knowledge, all local variables are stored on the stack or the heap.

I'm looking forward to your answer

Thread Thread
robinheghan profile image
Robin Heggelund Hansen Author
  1. yes
  2. other things like Booleans, Maps, Sets, Floats or other Objects. «If we weren’t storing ints» refers to the option of storing other intermediate values like objects.
Thread Thread
mshaka profile image
m-shaka

Thanks for your quick response! I understood.

I'm so sorry I have one more question, but this is last one.

It's about this sentence.

This is like getting the benefits of a monomorphising compiler, without actually having a monomorphishing compiler.

What I understand is below:
This talks about Inline Caching. If foldl is inlined, the function as argument of foldl is also inlined, so without Inline Caching code of inlined foldl's loop is specilized for argument functions type.
Is this right?

Thread Thread
robinheghan profile image
Robin Heggelund Hansen Author

This talks about Inline Caching => inline caches is something else, inlining is just replacing a function call with the implementation of the function.

without Inline Caching code of inlined foldl's loop is specilized for argument functions type => without inlining, the foldl's loop cannot be specialized because it has to handle all possible argument types.

Thread Thread
mshaka profile image
m-shaka

Thank you! You mean monomophishing is to specialize code by inlining function call, right? If so, I can't understand what without actually having a monomorphishing compiler means...

Thread Thread
robinheghan profile image
Robin Heggelund Hansen Author

monomorphising doesn't do inling. When a monomorphising compiler sees the following code:

listmap : (a -> b) -> List a -> List b
listmap fn list =
    List.foldl (\a acc -> fn a :: acc) list
    |> List.reverse

strs = listmap String.fromInt [ 1, 2, 3 ]

strs = listmap isEven [ 1, 2, 3 ]
Enter fullscreen mode Exit fullscreen mode

it will generate two versions of listmap, one which has the type (Int -> String) -> List Int -> List String and one which has the type (Int -> Bool) -> List Int -> List Bool. It will then use the specialized listmap implementations where it can.

This allows the compiler to specialize code better, because it has precise type information.

Thread Thread
mshaka profile image
m-shaka

Oh...I misread that you used "monomorphishing" like as this article explained.

I've published my translation! zenn.dev/mshaka/articles/633027bef...

I appreciate your patience and kindness!