This is the introductory post of a series, so normally you would find a motivational section here. Why should you learn how to build a compiler? I'm not going to bother with that; Crafting Interpreters did it better than I could, anyway. The short of it is, well, writing compilers is fun, and covers a lot of techniques for solving problems.
We're going to start by implementing an interpreter for the Tiny-C language. Tiny-C is a significantly stripped down C-style language where there is only one data type, no variable declarations, no functions, the only control flow options are if
and while
, and the only expressions are +
, -
, and <
. Because of this, the grammar is simple enough I can include it here inline:
Program = Statement;
Statement =
| If:{ "if" cond:ParenExpr then:Statement else:{ "else" then:Statement } }
| While:{ "while" cond:ParenExpr then:Statement }
| Block:{ "{" then:Statement* "}" }
| Expr:{ then:Statement? ";" }
;
ParenExpr = "(" Expr ")";
Expr =
| Assign:{ id:Id "=" val:Expr }
| Test:{ lhs:Expr "<" rhs:Expr }
| Sum:{ lhs:Expr "+" rhs:Term }
| Diff:{ lhs:Expr "-" rhs:Term }
;
Term =
| Id:{ 'a'..='z'+ }
| Int:{ '0'..='9'+ }
| Expr:ParenExpr
;
(Note that this is slightly modified from the original Tiny-C to better fit how we are going to build the compiler.) The simplicity of the language allows us to focus on the structure of the compiler rather than the intricacies of the language. If you don't fully understand the above grammar, don't worry: we'll be covering it in more detail in the future.
Additionally, we'll be writing the compiler in a fully IDE-ready style. This means our compiler will also serve as the intelligence engine powering a language server protocol server. Though our language is simple, we want to write the compiler as if it were a full-fat compiler for a full-fat language. By doing so, we can transfer what we learn here to real languages' intelligence engines.
Because we're trying to build a real compiler, we'll be taking full advantage of the available tooling for building compilers in Rust. Rust's ownership semantics make the compartmentalization and complex data flows through a compiler easier to see. For libraries, we'll be using at least rowan, salsa, and lsp-server.
With any luck, once we've implemented the interpreter for Tiny-C, we'll continue on to write a backend with Cranelift, and maybe even continue on to designing and writing a full language 👀
I plan to have a post in the series out at least every other Friday. Next time: we set up the repository structure and lex the language into its component tokens. We'll also be getting our first venture into code generation along the way.
Discuss this below! | on Reddit! | on Twitter! | on Discord! | Watch me live!
Related Reading
- Crafting Interpreters
- Matklad's Modern Parser Generator
- Matklad's The Essence of Lexer
- KartikTalwar's Tiny-C Compiler
- rust-analyzer's architecture overview
- the Salsa book (WIP) and hello world
Top comments (2)
Oh man, am I looking forward to this series! Especially the LSP portion. Do you plan to hand-write the parser?
I do plan on hand-writing the parser, as for the kind of error-tolerance, you want in an IDE environment, there's no real great modern parser generator.
That's the big reason for implementing such a simple language as Tiny-C: it's simple enough to do everything by hand and understand how everything works.