DEV Community

Adding our own custom statement to Rust language

Dave
K8s, Infra, Backend, and Distributed Systems!
Updated on ・8 min read

I wanted to learn more about Rust internals, so I decided to add my own custom statement and see how it goes.

Without knowing much about either the code base or compilers in general, I was able to add my statement. Buckle up, this is going to be interesting!

Goal

The goal is to add an unless cond { block } statement, which executes the block only when cond is not met:

fn num_is_odd(n: u32) -> bool {
    return n % 2 == 1
}

fn main() {
    for num in 1..10 {
        unless num_is_odd(num) {
            print!("{} ", num);
        }
    }
}

// expected output:
// 2 4 6 8 ⏎
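For reference, the behaviour we are after can already be written in stable Rust by negating the condition of an ordinary if. A minimal sketch, where collect_output is a hypothetical helper added only so the result can be inspected as a string:

```rust
fn num_is_odd(n: u32) -> bool {
    n % 2 == 1
}

/// Collects what the `unless` loop from the goal is expected to print.
/// `collect_output` is a hypothetical helper, not part of the article's code.
fn collect_output() -> String {
    let mut out = String::new();
    for num in 1..10 {
        // `unless num_is_odd(num) { ... }` should behave like this `if !...`.
        if !num_is_odd(num) {
            out.push_str(&format!("{} ", num));
        }
    }
    out
}

fn main() {
    println!("{}", collect_output());
}
```

Whatever we add to the compiler should make the unless version produce the same output as this program.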

Setup

You need to clone the rust-lang/rust repository and set up everything it needs:

$ git clone https://github.com/rust-lang/rust.git
$ cd rust
$ ./x.py build -i library/std # this builds the compiler

This should take about 30 minutes, so get comfortable!

Start investigating

Once I had everything set up, meaning I could compile my own compiler, I started searching the project for the keyword Stmt.

The interesting file containing Stmt is compiler/rustc_ast/src/ast.rs, the module that defines the abstract syntax tree (AST).

Parsing the source files produces the AST, which is easier to work with in later steps.

The file has an interesting enum containing expressions:

  pub enum ExprKind {
      .
      .
      /// A binary operation (e.g., `a + b`, `a * b`).
      Binary(BinOp, P<Expr>, P<Expr>),
      /// A unary operation (e.g., `!x`, `*x`).
      Unary(UnOp, P<Expr>),
      .
      .
      /// An `if` block, with an optional `else` block.
      ///
      /// `if expr { block } else { expr }`
      If(P<Expr>, P<Block>, Option<P<Expr>>),
      .
      .
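To make the shape of ExprKind::If concrete, here is a toy model of such an enum. It is only a sketch: Box stands in for rustc's P pointer, and ToyExpr and eval are hypothetical names that do not exist in the compiler.

```rust
/// A toy expression tree; `Box<T>` plays the role of rustc's `P<T>`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Num(i64),
    Bool(bool),
    /// `!x`
    Not(Box<ToyExpr>),
    /// `if cond { then } else { els }`, with an optional `else` part.
    If(Box<ToyExpr>, Box<ToyExpr>, Option<Box<ToyExpr>>),
}

/// Evaluates a tree; `None` means "no value" (an `if` without `else` whose
/// condition was false, or a condition that is not a boolean).
fn eval(e: &ToyExpr) -> Option<ToyExpr> {
    match e {
        ToyExpr::Num(_) | ToyExpr::Bool(_) => Some(e.clone()),
        ToyExpr::Not(inner) => match eval(inner)? {
            ToyExpr::Bool(b) => Some(ToyExpr::Bool(!b)),
            _ => None,
        },
        ToyExpr::If(cond, then, els) => match eval(cond)? {
            ToyExpr::Bool(true) => eval(then),
            ToyExpr::Bool(false) => els.as_deref().and_then(eval),
            _ => None,
        },
    }
}

fn main() {
    let e = ToyExpr::If(
        Box::new(ToyExpr::Bool(false)),
        Box::new(ToyExpr::Num(1)),
        Some(Box::new(ToyExpr::Num(2))),
    );
    println!("{:?}", eval(&e));
}
```

The three fields of If mirror the real variant: a condition, a block, and an optional else part.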

Creating Unless

Our unless has the same structure as an if statement, so we're going to model it after that. Right under If we're going to place:

      /// An `unless` block, with an optional `else` block.
      ///
      /// `unless expr { block } else { expr }`
      Unless(P<Expr>, P<Block>, Option<P<Expr>>),

We could do without the optional else expression, but we chose to keep it. Save the file and try to compile. Don't worry: it will error out quickly, so it won't take much time.

$ ./x.py build -i library/std

error[E0004]: non-exhaustive patterns: `Unless(_, _, _)` not covered
    --> compiler/rustc_ast/src/ast.rs:1192:15
     |
1192 |           match self.kind {
     |                 ^^^^^^^^^ pattern `Unless(_, _, _)` not covered
...
1265 | / pub enum ExprKind {
1266 | |     /// A `box x` expression.
1267 | |     Box(P<Expr>),
1268 | |     /// An array (`[a, b, c, d]`)
...    |
1313 | |     Unless(P<Expr>, P<Block>),
     | |     ------ not covered
...    |
1411 | |     Err,
1412 | | }
     | |_- `ast::ExprKind` defined here
     |
     = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
     = note: the matched value is of type `ast::ExprKind`

error[E0004]: non-exhaustive patterns: `&mut Unless(_, _, _)` not covered
    --> compiler/rustc_ast/src/mut_visit.rs:1204:11
     |
1204 |       match kind {
     |             ^^^^ pattern `&mut Unless(_, _, _)` not covered
     |
    ::: compiler/rustc_ast/src/ast.rs:1265:1
     |
1265 | / pub enum ExprKind {
1266 | |     /// A `box x` expression.
1267 | |     Box(P<Expr>),
1268 | |     /// An array (`[a, b, c, d]`)
...    |
1313 | |     Unless(P<Expr>, P<Block>),
     | |     ------ not covered
...    |
1411 | |     Err,
1412 | | }
     | |_- `ast::ExprKind` defined here
     |
     = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
     = note: the matched value is of type `&mut ast::ExprKind`

error[E0004]: non-exhaustive patterns: `Unless(_, _, _)` not covered
    --> compiler/rustc_ast/src/visit.rs:738:11
     |
738  |       match expression.kind {
     |             ^^^^^^^^^^^^^^^ pattern `Unless(_, _, _)` not covered
     |
    ::: compiler/rustc_ast/src/ast.rs:1265:1
     |
1265 | / pub enum ExprKind {
1266 | |     /// A `box x` expression.
1267 | |     Box(P<Expr>),
1268 | |     /// An array (`[a, b, c, d]`)
...    |
1313 | |     Unless(P<Expr>, P<Block>),
     | |     ------ not covered
...    |
1411 | |     Err,
1412 | | }
     | |_- `ast::ExprKind` defined here
     |
     = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
     = note: the matched value is of type `ast::ExprKind`

error: aborting due to 3 previous errors

Since we added a new enum variant, we need to handle it everywhere ExprKind is matched, starting with the three files mentioned in the errors:

In compiler/rustc_ast/src/ast.rs:

    pub fn precedence(&self) -> ExprPrecedence {
          match self.kind {
              .
              .
              ExprKind::If(..) => ExprPrecedence::If,
              ExprKind::Unless(..) => ExprPrecedence::If,
              .
              .
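All three errors come from the same language rule: a match on an enum without a wildcard arm must name every variant. A toy sketch of the precedence case, where ToyKind and ToyPrecedence are hypothetical stand-ins for ExprKind and ExprPrecedence:

```rust
#[derive(Debug, PartialEq)]
enum ToyPrecedence {
    If,
    Binary,
}

enum ToyKind {
    Binary,
    If,
    Unless,
}

fn precedence(kind: &ToyKind) -> ToyPrecedence {
    // No `_ =>` arm: removing the `Unless` arm below makes this fail to
    // compile with E0004, the same error rustc reported for `ExprKind`.
    match kind {
        ToyKind::Binary => ToyPrecedence::Binary,
        ToyKind::If => ToyPrecedence::If,
        // Reuse the precedence of `if`, as the article does.
        ToyKind::Unless => ToyPrecedence::If,
    }
}

fn main() {
    println!("{:?}", precedence(&ToyKind::Unless));
}
```

The absence of a wildcard arm is what lets the compiler point at every place that needs attention when a variant is added.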

In compiler/rustc_ast/src/visit.rs:

pub fn walk_expr<'a, V: Visitor<'a>>(visitor: &mut V, expression: &'a Expr) {
    walk_list!(visitor, visit_attribute, expression.attrs.iter());

    match expression.kind {
        .
        .
        ExprKind::If(ref head_expression, ref if_block, ref optional_else) => {
            visitor.visit_expr(head_expression);
            visitor.visit_block(if_block);
            walk_list!(visitor, visit_expr, optional_else);
        }
        ExprKind::Unless(ref head_expression, ref unless_block, ref optional_unless) => {
            visitor.visit_expr(head_expression);
            visitor.visit_block(unless_block);
            walk_list!(visitor, visit_expr, optional_unless);
        }
        .
        .
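The walk_expr arm simply visits each child of the node. A toy version of that traversal, assuming a simplified tree with hypothetical names (Node, count_nodes), could look like this:

```rust
/// Simplified stand-in for an AST; not rustc's real types.
enum Node {
    Lit(bool),
    Block(Vec<Node>),
    If(Box<Node>, Box<Node>, Option<Box<Node>>),
    Unless(Box<Node>, Box<Node>, Option<Box<Node>>),
}

/// Counts every node in the tree, visiting children the way `walk_expr` does.
fn count_nodes(node: &Node) -> usize {
    match node {
        Node::Lit(_) => 1,
        Node::Block(stmts) => 1 + stmts.iter().map(count_nodes).sum::<usize>(),
        // `Unless` has the same children as `If`, so one arm can serve both.
        Node::If(cond, body, els) | Node::Unless(cond, body, els) => {
            1 + count_nodes(cond)
                + count_nodes(body)
                + els.as_deref().map_or(0, count_nodes)
        }
    }
}

fn main() {
    let tree = Node::Unless(
        Box::new(Node::Lit(true)),
        Box::new(Node::Block(vec![Node::Lit(false)])),
        None,
    );
    println!("{}", count_nodes(&tree));
}
```

In this sketch the two variants share one arm through an or-pattern; the article keeps them as separate arms, which works just as well.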

In compiler/rustc_ast/src/mut_visit.rs:

pub fn noop_visit_expr<T: MutVisitor>(
    Expr { kind, id, span, attrs, tokens }: &mut Expr,
    vis: &mut T,
) {
    match kind {
        .
        .
        ExprKind::If(cond, tr, fl) => {
            vis.visit_expr(cond);
            vis.visit_block(tr);
            visit_opt(fl, |fl| vis.visit_expr(fl));
        }
        ExprKind::Unless(cond, tr, fl) => {
            vis.visit_expr(cond);
            vis.visit_block(tr);
            visit_opt(fl, |fl| vis.visit_expr(fl));
        }
        .
        .

Initially I tried negating head_expression and cond directly with !, but got:

error[E0600]: cannot apply unary operator `!` to type `&mut P<ast::Expr>`
    --> compiler/rustc_ast/src/mut_visit.rs:1250:28
     |
1250 |             vis.visit_expr(!cond);
     |                            ^^^^^ cannot apply unary operator `!`
     |
     = note: an implementation of `std::ops::Not` might be missing for `&mut P<ast::Expr>`

error[E0600]: cannot apply unary operator `!` to type `&P<ast::Expr>`
   --> compiler/rustc_ast/src/visit.rs:792:32
    |
792 |             visitor.visit_expr(!head_expression);
    |                                ^^^^^^^^^^^^^^^^ cannot apply unary operator `!`
    |
    = note: an implementation of `std::ops::Not` might be missing for `&P<ast::Expr>`

The compiler is hinting that we would need std::ops::Not to achieve this. For now, we proceed without the negation. With the three files updated, let's build the compiler again:

$ ./x.py build -i library/std
.
.
error[E0004]: non-exhaustive patterns: `Unless(_, _, _)` not covered
    --> compiler/rustc_ast_pretty/src/pprust/state.rs:1871:15
     |
1871 |         match expr.kind {
     |               ^^^^^^^^^ pattern `Unless(_, _, _)` not covered
     |
    ::: /.../rust-lang/rust/compiler/rustc_ast/src/ast.rs:1314:5
     |
1314 |     Unless(P<Expr>, P<Block>, Option<P<Expr>>),
     |     ------ not covered
     |
     = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
     = note: the matched value is of type `ExprKind`

error: aborting due to previous error

We missed that one, so in compiler/rustc_ast_pretty/src/pprust/state.rs:

    fn print_expr_outer_attr_style(&mut self, expr: &ast::Expr, is_inline: bool) {
        .
        .
        match expr.kind {
            .
            .
            ast::ExprKind::If(ref test, ref blk, ref elseopt) => {
                self.print_if(test, blk, elseopt.as_deref())
            }
            ast::ExprKind::Unless(ref test, ref blk, ref elseopt) => {
                self.print_if(test, blk, elseopt.as_deref())
            }
            .
            .

Try again:

$ ./x.py build -i library/std
.
.
error[E0425]: cannot find value `Unless` in module `kw`
    --> compiler/rustc_parse/src/parser/expr.rs:1119:40
     |
1119 |         } else if self.eat_keyword(kw::Unless) {
     |                                        ^^^^^^ not found in `kw`
     |
help: consider importing one of these items
     |
1    | use crate::parser::ExprKind::Unless;
     |
1    | use rustc_ast::ExprKind::Unless;
     |

   Compiling rustc_middle v0.0.0 (/.../rust-lang/rust/compiler/rustc_middle)
error[E0004]: non-exhaustive patterns: `Unless(_, _, _)` not covered
    --> compiler/rustc_ast_lowering/src/expr.rs:29:30
     |
29   |             let kind = match e.kind {
     |                              ^^^^^^ pattern `Unless(_, _, _)` not covered
     |
    ::: /.../rust-lang/rust/compiler/rustc_ast/src/ast.rs:1314:5
     |
1314 |     Unless(P<Expr>, P<Block>, Option<P<Expr>>),
     |     ------ not covered
     |
     = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
     = note: the matched value is of type `rustc_ast::ExprKind`

error: aborting due to previous error

kw::Unless doesn't exist yet. If we follow the definition of kw, we reach rustc_span/src/symbol.rs, and there we can add our keyword:

        If:                 "if",
        Unless:             "unless",
        Impl:               "impl",
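Conceptually, registering a keyword means that the word stops being treated as an ordinary identifier. The sketch below illustrates only that idea; it is not how rustc's symbol table is implemented, and Token, KEYWORDS and classify are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(&'static str),
    Ident(String),
}

/// Toy keyword list; adding "unless" here is the analogue of adding
/// `Unless: "unless"` to the keyword table.
static KEYWORDS: [&str; 4] = ["if", "unless", "else", "impl"];

/// Classifies a word as a keyword if it is in the list, else as an identifier.
fn classify(word: &str) -> Token {
    match KEYWORDS.iter().copied().find(|kw| *kw == word) {
        Some(kw) => Token::Keyword(kw),
        None => Token::Ident(word.to_string()),
    }
}

fn main() {
    println!("{:?}", classify("unless"));
    println!("{:?}", classify("unlessx"));
}
```

A side effect worth keeping in mind: once unless is a keyword, existing code that uses it as a variable or function name would no longer parse.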

In rustc_parse/src/parser/expr.rs:

    fn parse_bottom_expr(&mut self) -> PResult<'a, P<Expr>> {
        .
        .
        } else if self.eat_keyword(kw::If) {
            self.parse_if_expr(attrs)
        } else if self.eat_keyword(kw::Unless) {
            self.parse_if_expr(attrs)
        }
        .
        .
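The chain of eat_keyword calls is how the parser decides which kind of expression it is looking at. A miniature sketch of that dispatch over whitespace-separated words follows; ToyParser and expr_kind are hypothetical, and the real parser works on tokens rather than strings.

```rust
/// Hypothetical miniature parser over whitespace-separated words.
struct ToyParser<'a> {
    words: Vec<&'a str>,
    pos: usize,
}

impl<'a> ToyParser<'a> {
    fn new(src: &'a str) -> Self {
        ToyParser { words: src.split_whitespace().collect(), pos: 0 }
    }

    /// Consumes the next word only if it equals `kw`; reports whether it did.
    fn eat_keyword(&mut self, kw: &str) -> bool {
        if self.words.get(self.pos).map_or(false, |w| *w == kw) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Names the kind of expression that starts at the current position.
    fn expr_kind(&mut self) -> &'static str {
        if self.eat_keyword("if") {
            "if expression"
        } else if self.eat_keyword("unless") {
            "unless expression"
        } else {
            "something else"
        }
    }
}

fn main() {
    let mut parser = ToyParser::new("unless num_is_odd(num) { }");
    println!("{}", parser.expr_kind());
}
```

Note that eat_keyword only advances when the keyword matches, so a failed check leaves the position untouched for the next branch.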

Finally, in compiler/rustc_ast_lowering/src/expr.rs#L115:

                ExprKind::Unless(ref cond, ref then, ref else_opt) => match cond.kind {
                    ExprKind::Let(ref pat, ref scrutinee) => {
                        self.lower_expr_if_let(e.span, pat, scrutinee, then, else_opt.as_deref())
                    }
                    ExprKind::Paren(ref paren) => match paren.peel_parens().kind {
                        ExprKind::Let(ref pat, ref scrutinee) => {
                            // A user has written `if (let Some(x) = foo) {`, we want to avoid
                            // confusing them with mentions of nightly features.
                            // If this logic is changed, you will also likely need to touch
                            // `unused::UnusedParens::check_expr`.
                            self.if_let_expr_with_parens(cond, &paren.peel_parens());
                            self.lower_expr_if_let(
                                e.span,
                                pat,
                                scrutinee,
                                then,
                                else_opt.as_deref(),
                            )
                        }
                        _ => self.lower_expr_if(cond, then, else_opt.as_deref()),
                    },
                    _ => self.lower_expr_if(cond, then, else_opt.as_deref()),
                },
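At this point Unless is lowered exactly like If, which is why it still behaves as a plain if. A toy sketch of the lowering step makes that visible; Ast, Hir and lower are hypothetical, heavily simplified stand-ins for the real AST-to-HIR lowering.

```rust
/// Toy AST: still distinguishes `if` from `unless`.
enum Ast {
    Bool(bool),
    If(Box<Ast>, Box<Ast>),
    Unless(Box<Ast>, Box<Ast>),
}

/// Toy HIR: has no `unless` node at all.
#[derive(Debug, PartialEq)]
enum Hir {
    Bool(bool),
    If(Box<Hir>, Box<Hir>),
}

fn lower(ast: &Ast) -> Hir {
    match ast {
        Ast::Bool(b) => Hir::Bool(*b),
        // Copying the `If` arm verbatim, as done above, makes `unless`
        // a mere alias of `if`: the condition is passed through unchanged.
        Ast::If(cond, body) | Ast::Unless(cond, body) => {
            Hir::If(Box::new(lower(cond)), Box::new(lower(body)))
        }
    }
}

fn main() {
    let unless = Ast::Unless(Box::new(Ast::Bool(true)), Box::new(Ast::Bool(false)));
    println!("{:?}", lower(&unless));
}
```

Both inputs lower to the same tree here, so nothing downstream can tell them apart yet.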

When do I stop copying?

So far, we have only copied whatever If had for our Unless variant. We now have everything we need to compile the compiler, but wait!

Before you start another compile and spend a lot of time, you can quickly check the status with:

$ ./x.py check

This command is faster because it only checks whether your code compiles. Don't worry if it errors at src/tools/clippy/clippy_utils/src/sugg.rs; if you get that far, you're good to go!

Implementing Unless logic

Right now, unless is just a proxy for if, but that's not what we want: the block should run only when the condition is not met. We already tried ! above, and the compiler pointed us to std::ops::Not.

I tried that too, but that's not exactly the type we're looking for. Thank you, compiler!

Instead, we need to modify the parser, because by the time the code reaches the ast module the expressions are already built and all we have is the visitor logic.
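The E0600 errors arise because ! on a Rust value goes through std::ops::Not, and the visitors only hold references to AST nodes, not booleans. What we actually want is a new node that represents the negation. A sketch with a hypothetical toy type (Cond is not a rustc type) shows the difference:

```rust
use std::ops::Not;

/// Toy condition tree.
#[derive(Debug, Clone, PartialEq)]
enum Cond {
    /// A call such as `num_is_odd(num)`, identified by function name only.
    Call(String),
    /// The negation of another condition.
    Not(Box<Cond>),
}

/// Implementing `Not` lets `!cond` build a *new tree node* wrapping the old
/// one; it does not evaluate anything.
impl Not for Cond {
    type Output = Cond;

    fn not(self) -> Cond {
        Cond::Not(Box::new(self))
    }
}

fn main() {
    let cond = Cond::Call("num_is_odd".to_string());
    let negated = !cond.clone();
    println!("{:?} becomes {:?}", cond, negated);
}
```

This is the idea behind the parser change: wrap the condition in a negation node instead of trying to negate a value.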

So in rustc_parse/src/parser/expr.rs change:

} else if self.eat_keyword(kw::Unless) {
            self.parse_unless_expr(attrs)
}

and then:

    /// Parses an `unless` expression (`unless` token already eaten).
    fn parse_unless_expr(&mut self, attrs: AttrVec) -> PResult<'a, P<Expr>> {
        let lo = self.prev_token.span;
        let cond = self.parse_cond_expr()?;

        // Verify that the parsed `if` condition makes sense as a condition. If it is a block, then
        // verify that the last statement is either an implicit return (no `;`) or an explicit
        // return. This won't catch blocks with an explicit `return`, but that would be caught by
        // the dead code lint.
        let thn = if self.eat_keyword(kw::Else) || !cond.returns() {
            self.error_missing_if_cond(lo, cond.span)
        } else {
            let attrs = self.parse_outer_attributes()?.take_for_recovery(); // For recovery.
            let not_block = self.token != token::OpenDelim(token::Brace);
            let block = self.parse_block().map_err(|mut err| {
                if not_block {
                    err.span_label(lo, "this `if` expression has a condition, but no block");
                    if let ExprKind::Binary(_, _, ref right) = cond.kind {
                        if let ExprKind::Block(_, _) = right.kind {
                            err.help("maybe you forgot the right operand of the condition?");
                        }
                    }
                }
                err
            })?;
            self.error_on_if_block_attrs(lo, false, block.span, &attrs);
            block
        };
        let els = if self.eat_keyword(kw::Else) { Some(self.parse_else_expr()?) } else { None };
        let neg_cond = self.mk_expr(
            lo.to(self.prev_token.span),
            self.mk_unary(UnOp::Not, cond),
            AttrVec::new()
        );
        Ok(self.mk_expr(lo.to(neg_cond.span), ExprKind::If(neg_cond, thn, els), attrs))
    }

Our parser function looks pretty much like the one for If, except that we negate the condition with a Unary node, store it in neg_cond, and then create a normal If expression using the new condition.

This is not bad practice; it is actually encouraged, because it narrows down the building blocks in later stages, and improvements to one of them trickle down to the others. In our case, a new compiler optimization for If would translate to a better Unless.
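To see the effect end to end, here is a toy interpreter in which unless never exists as a node: it is turned into an if over a negated condition at construction time. All names (Expr, desugar_unless, eval_cond, run) are hypothetical, and the body is fixed to "print the number" to keep the sketch short.

```rust
/// Toy expression tree with no `Unless` variant.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `n % 2 == 1` for the current loop variable.
    IsOdd,
    Not(Box<Expr>),
    /// `if cond { print the number }`.
    IfPrint(Box<Expr>),
}

/// What the parser does on seeing `unless cond { ... }`: build an `if`
/// over the negated condition.
fn desugar_unless(cond: Expr) -> Expr {
    Expr::IfPrint(Box::new(Expr::Not(Box::new(cond))))
}

fn eval_cond(e: &Expr, n: u32) -> bool {
    match e {
        Expr::IsOdd => n % 2 == 1,
        Expr::Not(inner) => !eval_cond(inner, n),
        // A nested `if` is not a valid condition in this toy.
        Expr::IfPrint(_) => false,
    }
}

/// Runs the expression for every `n` in `1..10` and collects the output.
fn run(e: &Expr) -> String {
    let mut out = String::new();
    for n in 1..10 {
        if let Expr::IfPrint(cond) = e {
            if eval_cond(cond, n) {
                out.push_str(&format!("{} ", n));
            }
        }
    }
    out
}

fn main() {
    let program = desugar_unless(Expr::IsOdd);
    print!("{}", run(&program));
}
```

The interpreter knows nothing about unless, yet the desugared program selects the even numbers, which is the property the real change relies on.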

Compiling our goal

$ export RUST_SRC_PATH=/home/path-to-source/rust-lang/rust
$ export RUSTC_DEV=$RUST_SRC_PATH/build/x86_64-unknown-linux-gnu/stage1/bin/rustc
$ $RUSTC_DEV -vV # notice the dollar sign
rustc 1.56.0-dev # notice -dev here which tells us it's compiled from source
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.56.0-dev
LLVM version: 12.0.1

$ cat unless.rs
fn num_is_odd(n: u32) -> bool {
    return n % 2 == 1
}

fn main() {
    for num in 1..10 {
        unless num_is_odd(num) {
            print!("{} ", num);
        }
    }
}

$ $RUSTC_DEV unless.rs
$ ./unless
2 4 6 8 

GitHub

You can see the commit here, and if you want to try out the whole thing, check out the add-unless-statement branch.

Discussion (4)

Chayim Friedman

AFAIK, the Rust compiler never changes the AST. It should be a complete mirror of the source code. Desugaring happens on the HIR. But nice write-up :)

Drazen Dotlic

I think the translation should happen before HIR, no? I mean, unless cleanly translates into not if, which then translates into jumps in the HIR, no? I admittedly know only general compiler techniques, not rustc.
I concur about the nice write-up though 😃

Chayim Friedman

HIR doesn't contain jumps. MIR contains them. HIR is very similar to the AST, and is directly generated from it (then converted to type-checked HIR during typeck). HIR desugars features like async/await (to generators), for and while let loops (to loop with breaks), if let to match, and more.

Ashton Scott Snapp

interesting