Making Slate [Part 2.1]

#slate #computerscience #llvm

I realized I left a big part out of my last article so today I'm here to fill you all in.

The Making of Slate [Part 2]

Meghan (she/her) ・ Sep 9 '20 ・ 4 min read

#slate #computerscience #tutorial

One of the big changes I mentioned over part 1 was the switch from using Wasm to using LLVM and I accidentally completely glossed over it. Super sorry about that one.

So,
LLVM is a pseudo-assembly and program collection that takes this "intermediate representation" as they call it and does the actual compilation into machine code. It's very type strong and looks familiar yet foreign.

And once the program is able to produce an AST from the code, it's the Slate compiler's job to then produce LLVM IR.

https://gist.github.com/nektro/a61777bbcb818d99dddf579b031e9798

This gist contains both the code we're working on and the desired LLVM IR output wer're going to produce. Thankfully, the LLVM project provides a Go library we can use. Actually generating IR takes many things to consider, so today is going to cover the first part.

Picking Declaration Order

In our code we have 4 const declarations that equal to functions with some calls in each one. This is no matter, except for if we go to compile these in the order they're written. If this is the case, then we'll create a function object for main, but then when we try to find the function object for print it isn't there (because it's lower down in the file and we haven't compiled it yet). Bummer. Unfortunately this is our problem because we don't want to copy C's behavior of having to put your functions in the order they're used.

The Current Approach

First we set up two variables. The first is a completed map that associates const names to their IR objects. This will contain the declarations that have been completed and can be referenced later at any time.

At the start this will be empty. Secondly, we also need a todo string array of the uncompiled const declarations. It will contain their names so long as they still need to processed. At the start all declarations will be placed into todo. (For our example this will be [main, print, write, syscall3])

Then, it loops through the todo array and calculates the "dependencies" of each item. Remember earlier when we were going to parse main and needed print? That info is calculated here. After that we then have the following information:

main: [print]
print: [write]
write: [syscall3]
syscall3: []

Note: write calls syscall3 because the convention is the direct call-path for x86_64 Linux, which is what this base example was made for.

syscall3 has no "dependencies" because the only call it makes is inline assembly and will just about be directly inserted into the binary with its arguments.

Now, it will make the IR conversion for all AST objects whose calculated dependencies are all elements of the completed map object and remove them from todo.

For our example, that will be syscall3. syscall3 will then get sent to the LLVM Go library and placed into completed. The compiler will then repeat this loop until todo is empty and finally exit.

Future Improvements

This approach worked because the example was very direct and there are no imports or anything like that. Additionally, it's not very type-safe.

I'm conflicted on whether I want to eventually allow name overloading, but I want it to be able to throw an error if print(string) exists but you try and call print(int), etc.

On top of that, tree shaking for the benefit of smaller binary outputs would be great. This could be achieved by starting at main and diving into each dependency until it parses all that have none. It would then work backwards and only compile the functions/objects referenced through the main pathway. This would have the added benefit of giving me dead code analysis for free.

Wrap up

I hope this explains the internals a lot better than the last article and I'll try my best to leave out jarring wholes next time. Blogging is very much its own skill and kudos to those who do this regularly.

The project can still be found here https://github.com/nektro/slate and I can found here on DEV, Github, and Twitter all @nektro .

Thanks for reading, and I'll see you next time ❤