Ary Borenszweig

Incremental compilation for Crystal - Part 1

I strongly believe that the Crystal programming language has great potential. The only thing that, in my mind, is holding it back a bit is its slow compile times, which become more and more noticeable as your project grows.

Why is the compiler slow? And can we improve the situation? That's what I'm going to explore here, thinking about solutions as we go.

Why is the compiler slow?

The main reason is that the compiler compiles the entire program, from scratch, every time you invoke it. Well, at least it does a full semantic analysis of your program: there is some caching going on when producing object files. But semantic analysis is always done, from scratch, for the entire program.
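If you're curious, you can ask the compiler where that object-file cache lives (the output below is just an example; the actual path depends on your system):

$ crystal env CRYSTAL_CACHE_DIR
/home/you/.cache/crystal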

And when I say "the entire program", that's the code that you wrote, but also the code from the shards you use, and even the standard library. And, well, it's not really the "entire program", because only those types and methods that are effectively used need to be analyzed and compiled, but it can still be quite a lot of code.

The actual algorithms used in the compiler are good and fast. We could try to make them even faster. But as you add more and more code to your program, inevitably, no matter how fast the algorithms are, it will take more and more time to compile.

Why is semantic analysis done from scratch every time?

Not doing semantic analysis from scratch means doing it in chunks, and reusing previous knowledge.

Most, if not all, compiled programming languages will analyze and compile individual files and then link them afterwards. Can we do the same in Crystal?

Let's consider this file:

# dependency.cr

def add(x, y)
  x + y
end

Well, we have no idea what the types of x and y are. If we call it with two ints then we can type and compile that method with that information. And the same is true if we pass it two strings, or any object that has a + method taking an argument. So we can't type and compile that file just like that.
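For example, given the add above, each of these calls type-checks, and each one produces its own instantiation of the method (the comments show what the compiler infers):

add(1, 2)     # instantiated for (Int32, Int32), returns Int32
add("a", "b") # instantiated for (String, String), returns String
add(1.5, 2.5) # works for any type that has a matching + method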

That's one thing. Another thing is not knowing what types resolve to. For example in this file:

# dependency.cr

def something
  Foo.bar
end

something

Here something doesn't take any arguments so we could try to type and compile it right away. But if we do so... what's Foo? And what's the class method bar in it? We can't know. Well, maybe a require is missing in that file:

# dependency.cr

require "foo"

def something
  Foo.bar
end

something

and now if foo.cr defines Foo, all is good. But Crystal (and Ruby) allow not requiring a dependency like that; instead, it can come from somewhere else. For example:

# caller.cr

require "dependency"

class Foo
  def self.bar
    1
  end
end

Here caller.cr defines a class Foo with a class method bar, and requires dependency, which in turn defines something that uses that Foo. And this is perfectly fine, even though there's no explicit mention in something of where Foo comes from.

The conclusion so far is that it's almost always impossible to look at a single file and type it without knowing the full program.

Why does Crystal allow writing programs like that?

At this point one might wonder: is it a good idea that a file can work without specifying where its types and methods come from? But here I'm not going to try to answer that question, which is mainly subjective, and instead try to focus on how we could improve Crystal as it is right now. Of course, Ruby could be "improved" (made faster) in many ways if we changed it (say, by disallowing some of its dynamism, or forcing some type declarations), but Ruby is being improved without changing the way it is: it has a vision, and it keeps being improved while staying true to that vision. And I think that with Crystal we should strive to do the same.

Incremental compilation

So let's discard for now the idea of being able to compile files individually. What else can we do?

One idea is to compile the entire program first, and then try to reuse that knowledge for subsequent compilations. It would work like this:

  1. We compile the entire program and build a dependency graph: which files depend on which other files?
  2. We also remember the types used in methods. For example, if add(x, y) was called with two ints, we remember that it returns an int.
  3. Next time we compile the program, if we find a call to add(x, y) with two ints, we check whether the file where add was defined, or any of its dependencies (recursively), changed. If not, we can avoid typing that method again, because we already know it returns an int (a sketch of the cached data follows below).
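To make step 2 a bit more concrete, the information remembered per instantiated method could be shaped roughly like this (just a sketch; these names are illustrative, not part of the actual compiler):

# Hypothetical shape of a cache entry:
record CachedInstantiation,
  method_name : String,        # "add"
  arg_types : Array(String),   # ["Int32", "Int32"]
  return_type : String,        # "Int32"
  defining_file : String,      # "dependency.cr"
  dependencies : Array(String) # files that, when changed, invalidate this entry

On a later compilation, a cache hit would mean: same method, same argument types, and no changes to defining_file or anything in dependencies.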

Would that really work?

Looking at file dependencies in real programs

I created a branch that modifies the compiler to output a file dependency graph for a given program. You can try it by checking out that branch, building a compiler from it, and then compiling any program. Roughly, the steps are these (a sketch; the exact checkout and build details depend on your setup):
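$ git clone https://github.com/crystal-lang/crystal && cd crystal
$ git checkout <the branch linked above>
$ make                              # building the compiler needs an existing Crystal and LLVM
$ bin/crystal build your_program.cr

This will generate a dependencies.dot file which you can then turn into a PDF by using this command: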

$ dot -Tpdf dependencies.dot > dependencies.pdf

(you'll need to install graphviz first)

Running it against this program:

# foo.cr
puts "Hello world!"

will produce this graph:

[Image: the full dependency graph generated for this program]

What a lovely tree-like graph! It's very easy to see how no cycles exist here.

They say "beauty lies in the eyes of the beholder" but I hope we can all agree that the graph above is a mess.

Still, within all that mess, we can find this:

[Image: the portion of the graph around foo.cr, with an edge from foo.cr to src/kernel.cr]

kernel.cr defines puts, so it's natural that foo.cr depends on it. But what's interesting is that nobody depends on foo.cr (naturally). And that also means that if next time we only change foo.cr, we can reuse everything else from the previous compilation.

Trouble ahead

This looks promising. But when running the tool against some bigger programs, I found something. Let's take a look at this program:

# foo.cr

class Foo
  def to_s(io : IO)
    io << "foo"
  end
end

puts Foo.new

The graph near foo.cr now looks like this:

[Image: the portion of the graph around foo.cr, now with more edges, including one pointing into foo.cr]

Now foo.cr has more dependencies, but something curious is that someone is now depending on foo.cr. Who?

We could play the "trace the line" game to find it, but looking at the dot file there's this:

  "src/io.cr" -> "foo.cr"

The above means "src/io.cr" depends on "foo.cr". WAT? How can the file that defines IO depend on "foo.cr"? It should be the other way around!

Debugging this a bit, we can see that the call the program makes is this one:

puts Foo.new

Next, the definition of puts is:

def puts(*objects) : Nil
  STDOUT.puts *objects
end

So that's calling puts on an IO (STDOUT). Let's take a look at that:

class IO
  def puts(obj : _) : Nil
    self << obj
    puts
  end
end

This is calling IO#<<(obj) where obj is an instance of Foo. Let's take a look at that << method:

class IO
  def <<(obj) : self
    obj.to_s self
    self
  end
end

So this is calling obj.to_s(self), where obj is an instance of Foo. And where is that method defined? In foo.cr! Bingo!

That's why suddenly io.cr depends on foo.cr.

One way to understand this is to know that in Crystal all methods are monomorphized. Monomorwhat? It just means that if you call IO#puts with a type X, a method will be generated for that particular X. If you call it with a Y, another method will be generated for that particular Y. In the example above, an IO#<<(obj) method was generated where obj is of type Foo.
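You can watch this happen with a minimal sketch (X and Y are hypothetical classes; compile it with --emit llvm-ir and look for the generated IO#<< functions in the .ll output):

class X
  def to_s(io : IO)
    io << "x"
  end
end

class Y
  def to_s(io : IO)
    io << "y"
  end
end

STDOUT << X.new # generates an IO#<<(obj) specialized for X
STDOUT << Y.new # generates a second IO#<<(obj), specialized for Y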

This, I think, is one reason the graph above is so messy.

In other compiled, statically typed programming languages, this doesn't happen. Usually obj responds to some explicit interface and then the method lies behind a virtual table and a virtual dispatch. None of this happens in Crystal.

Does it matter?

Does it matter that io.cr depends on foo.cr? Maybe it's not a big deal. Well, it probably is: it means that changes to a file are likely to affect seemingly unrelated files. And so, if we tried the "reuse previous compilation" strategy, not a lot could be reused in the end.

The end?

This is not the end of the story. It's just the end of this blog post. I haven't yet worked out how to approach this problem, but hopefully these posts spark some interest, and some thoughts from others, about how to solve this interesting problem.

Top comments (7)

Matthew McGarvey

Does this mean typing parameters to an interface (a module) still generates overloads of the method for each concrete class that uses it?

module Writer
  def write(io : IO)
    io << "Dear diary"
    io << "Sincerely,"
    io << self.class.name
  end
end

class Foo
  include Writer
end
class Bar
  include Writer
end

def write_to_diary(writer : Writer)
  writer.write(STDOUT)
end

Does that produce overloads of write_to_diary when they use the method?

I mainly ask because it seems like IO#<< could type its parameter as Object if that was allowed, and avoid the cyclical dependency. But if that kind of thing still generates the overloads, I see why that's not worthwhile to allow.


Also, how does the cyclical dependency affect compilation? Does the whole of IO have to be recompiled now? How does that affect its other dependencies? I'm interested in learning what the ripples of a recompile look like.

Ary Borenszweig

That's a great question!

So, if you call the method like this:

write_to_diary(Foo.new)
write_to_diary(Bar.new)

then compiling with --emit llvm-ir we can see these function definitions:

  • write_to_diary<Foo>:IO::FileDescriptor
  • write_to_diary<Bar>:IO::FileDescriptor
  • Foo@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
  • Bar@Writer#write<IO::FileDescriptor>:IO::FileDescriptor

So one method per class was generated.
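(If you want to reproduce this, and assuming the snippets above are saved as diary.cr, crystal build diary.cr --emit llvm-ir will write the generated IR to a diary.ll file; the exact name mangling may vary between compiler versions.)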

However, if we have a variable whose type is Writer:

writer = Foo.new.as(Writer)
write_to_diary(writer)

(the above can also happen if you have an instance variable of type Writer and you assign a Foo to it, and then read it back)
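Something like this hypothetical Diary class (not part of the example above) is enough to trigger it:

class Diary
  # The declared type of the instance variable is Writer, even though we store a Foo in it
  @writer : Writer = Foo.new

  def flush
    write_to_diary(@writer) # reading @writer back gives a Writer, so the call multidispatches
  end
end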

then looking at the generated LLVM IR we find:

  • write_to_diary<Writer>:IO::FileDescriptor
  • Foo@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
  • Bar@Writer#write<IO::FileDescriptor>:IO::FileDescriptor

So the first call didn't produce a multidispatch. The second one did, but it could have been avoided. I think we did it like that because modules in Ruby are mainly used to mix in code into other types, not as interfaces. Eventually we added abstract methods and started using modules as interfaces. We even thought about distinguishing between modules as mix-ins and modules as interfaces... but we never got to do anything about that. One idea was to automatically detect whether a module is used as an interface or as a mixin. But maybe something more explicit could work. But... for interfaces we'd probably require explicit type annotations. Anyway, these were all just thoughts.

Now, if instead of a module you use a class as a base type, and do the same as above:

class Writer
  def write(io : IO)
    io << "Dear diary"
    io << "Sincerely,"
    io << self.class.name
  end
end

class Foo < Writer
end

class Bar < Writer
end

def write_to_diary(writer : Writer)
  writer.write(STDOUT)
end

writer = Foo.new.as(Writer)
write_to_diary(writer)

we can find these functions:

  • write_to_diary<Writer+>:IO::FileDescriptor
  • Writer+@Writer#write<IO::FileDescriptor>:IO::FileDescriptor

That is, the write call didn't produce a multidispatch.

Why does it work fine for classes? I think I now remember a reason. Let's say you have this code:

class Writer
  def write(io : IO)
    io << @a
  end
end

class Foo < Writer
  @a = 1
end

class Bar < Writer
  @a = "hello"
end

def write_to_diary(writer : Writer)
  writer.write(STDOUT)
end

writer = Foo.new.as(Writer)
write_to_diary(writer)

So we output the value of @a inside Writer, but it can be an int or a string depending on the subclass. So if the write method is only produced once, for Writer, what happens when trying to access @a? Well, the answer is that the code above doesn't compile: it will complain that you didn't define @a in Writer. It works fine if you hide @a behind a method, because that will produce a multidispatch, as shown below.
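For instance, this version compiles (a sketch: I also made Writer abstract here, so the compiler knows every subclass provides a):

abstract class Writer
  abstract def a

  def write(io : IO)
    io << a # a method call, so it multidispatches to Foo#a or Bar#a
  end
end

class Foo < Writer
  @a = 1

  def a
    @a
  end
end

class Bar < Writer
  @a = "hello"

  def a
    @a
  end
end

def write_to_diary(writer : Writer)
  writer.write(STDOUT)
end

writer = Foo.new.as(Writer)
write_to_diary(writer)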

For modules you aren't required to define their instance variables, because their methods don't go through a base type (like in the example above).

Anyway, this reply is becoming very long, but now that I think about it, we later introduced multi-dispatch for instance var reads, so maybe we could avoid doing a multidispatch on a module as much as possible. Not sure. I'm also not sure how much it will help with compilation times.

Oisín

Excellent writeup! During the Advent of Code 2022 puzzle season I solved some of the problems in different languages, but mostly Crystal and Nim. I did some very informal benchmarking and found that my programs in both languages ran quickly and had low memory consumption, and the Crystal ones usually had slightly more concise source code, but took much longer to compile. Like 10 times longer. I think part of this is due to Nim requiring all functions to have type annotations, while Crystal is a bit more relaxed.
It would be great to improve things though, because I imagine larger programs might take a very long time to build in release mode.

Johannes Müller

Thank you Ary, this is a great article! It goes into the details of compiler operations and explains a complex topic. But it's easy to follow along even without much compiler experience.
I'm eager to discuss more options for compiler optimization.
One aspect I'm wondering about: You demonstrate an example where incremental compilation doesn't work / is very complicated.
But I suppose it could work well for other code?
Maybe it would still be useful to have incremental compilation for pieces of code that can be pretty much isolated? Even if we can only cache a portion of the program's code, that should be something. I don't know how much that would help, or what would be necessary to even detect the possible cases.

Ary Borenszweig

Maybe it would still be useful to have incremental compilation for pieces of code that can be pretty much isolated?

That's a great point! And in fact I was thinking about writing about that in my next article. I don't know the answer yet, so we'll find out soon :-)

Dominic Sisneros

Have you looked at using red-green trees for the AST? This comes from the Roslyn compiler, I think.

Ary Borenszweig

Thanks. I just looked into it. It seems those are used to reuse syntax, but here we need to reuse semantics.