Imaculate

Posted on Sep 17, 2020 • Edited on Sep 28, 2020

That's so Rusty: Metaprogramming

#rust #macros #metaprogramming

I'm writing this post while recovering from writer's cramp from all the keystrokes I've been striking. I had read about Hanselman's concept of a limited number of keystrokes in a lifetime before, but this is probably the first time it hit home. Ideally, I would like a programming language with zero boilerplate. I'm in luck, because that is something Rust does a fairly good job at. The simplest demonstration is declaring variables. Although Rust is statically typed, annotating a variable's type is optional because the compiler can usually infer it (without an annotation, an integer literal defaults to i32). Below are two valid ways of declaring an integer.

let x = 9;
let x: i32 = 9;

In addition to that, Rust allows you to write code that generates code, a concept known as metaprogramming. As advanced as it sounds, metaprogramming has practical applications that can tremendously improve the quality of code and life. We will explore how with two problems.

How would you implement a function that takes a variable number of arguments, i.e. a variadic function?
How would you implement a function that returns true if a struct has at least one vector field, i.e. has_vector_field()?

1. Variadic functions

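Variadic functions are functions that can be called with a varying number of arguments, which may or may not be of the same type. A classic example is a function that calculates the sum of numbers. The example below shows how we would like sum_of_numbers() to add up 3 or 6 numbers that can be integers or floating point numbers. In Rust this does not compile as written, because Rust functions always take a fixed number of arguments with fixed types; the rest of this section looks at ways around that.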

fn main() {
    println!("Sum of three numbers is: {}", sum_of_numbers(1, 4.2, 5));
    println!(
        "Sum of six numbers is: {}",
        sum_of_numbers(-6, 8, 10, 12, 67, 90)
    );
}

One way to achieve this would be to define a function for each possible way the arguments could be passed. Let's find out how feasible that is. We need to factor in the number of arguments a function can have, which is unlimited, but we can cap it at 100 for simplicity. These arguments could be integers or floating point numbers of varying widths, and hence of various types; we can limit the types to 10. The number of functions to define is then the number of ordered sequences of up to 100 elements, each of which can be one of 10 different types. The problem is now a matter of combinatorics.
We can get a sense of the magnitude procedurally as follows:

1. Pick n, starting from 1.
2. Find all possible type combinations (up to 10 types) for the chosen n.
3. For each combination, count all possible arrangements (permutations) of its elements, and sum those counts.
4. Add the sum for n to the total count.
5. Advance n by 1 and repeat steps 2 to 4 until n is 100.
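Working through the procedure gives a closed form. For a fixed n, the combinations from step 2 together with their arrangements from step 3 cover every way of assigning one of the 10 types to each of the n argument positions, which is 10^n. Summing over n is then a geometric series:

$$
N_{\text{functions}} = \sum_{n=1}^{100} 10^{n} = \frac{10^{101} - 10}{9} \approx 1.1 \times 10^{100}
$$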

Even without much knowledge of combinatorics, it's easy to see how fast the total count grows, even with these simplified numbers. Since this is the number of functions to define (and, because Rust has no function overloading, each of them would also need a distinct name), a lot of code would be duplicated, making it impossible to maintain, not to mention the pain of the keystrokes. Although this option is possible in theory, it is not feasible in practice.

Another alternative is to make the function take an array (or slice) argument instead. This is limiting in statically typed languages, where all elements of an array must have the same type, and Rust is one such language. To work around that, we can use an array of floating point numbers, since every numeric type can be cast to a float (very large 64-bit integers may lose some precision along the way). This adds the overhead of creating an array and filling it with floating point versions of the arguments before every call. Although this option is feasible, it is not the most elegant.
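A minimal sketch of the array-based approach, assuming a hypothetical function named sum_of_slice that takes a slice of f64. It shows where the overhead lands: the caller has to cast every argument before building the slice.

```rust
/// Hypothetical array-based alternative: all arguments must already be f64,
/// because a slice can only hold elements of a single type.
fn sum_of_slice(values: &[f64]) -> f64 {
    // Iterator::sum adds up the f64 values in the slice.
    values.iter().sum()
}

fn main() {
    // Arguments of different numeric types...
    let a: i32 = 1;
    let b: f32 = 4.2;
    let c: u8 = 5;

    // ...have to be converted by the caller before every call.
    let three = sum_of_slice(&[a as f64, b as f64, c as f64]);
    let six = sum_of_slice(&[-6.0, 8.0, 10.0, 12.0, 67.0, 90.0]);

    println!("Sum of three numbers is: {}", three);
    println!("Sum of six numbers is: {}", six);
}
```

The function body stays trivial; the cost is the casting and array construction repeated at every call site.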

Metaprogramming can give us a solution that is both feasible and elegant, through macros. Macros are similar to functions, except that their input and output are Rust code, not data. Rust supports two kinds of macros: declarative and procedural. Declarative macros, defined with macro_rules!, let you match patterns in the code passed to them and generate appropriate code for each pattern, which is exactly what this problem needs.

Here is a macro definition that solves the problem:

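// Declarative macro: the single arm matches zero or more
// comma-separated expressions and adds them up as f64.
macro_rules! sum_of_numbers {
    ( $( $x:expr ),* ) => {
        {
            let mut result = 0.0;
            // Repeated once for every matched expression $x.
            $(
                result += $x as f64;
            )*
            // Return the f64 sum (casting to an integer would drop the .2 in 4.2).
            result
        }
    };
}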

The definition starts with macro_rules!, much like a function starts with fn. The body looks like a match expression, Rust's equivalent of a switch-case statement, except that it matches against Rust code rather than values. It has one arm, ( $( $x:expr ),* ), which is the pattern. If the macro invocation matches the pattern, the macro expands to the code after =>.

Inside the outer $( ... ) is $x:expr, which matches a Rust expression and binds it to $x. After the group comes a comma, which is the separator, and then a star, which means the group repeats zero or more times. The pattern therefore matches a comma-separated list of Rust expressions. In the body, the code inside $( ... )* is emitted once for each matched expression, with $x replaced by that expression.
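To see the binding and repetition in isolation, here is a sketch with a hypothetical show_args! macro. It uses the same matcher as sum_of_numbers!, but instead of adding the arguments it records the source text that each $x was bound to, using the standard stringify! macro.

```rust
// Hypothetical helper: same pattern as sum_of_numbers!, different body.
macro_rules! show_args {
    ( $( $x:expr ),* ) => {
        // The $( ... ),* block is repeated once per matched expression,
        // and the comma separator is re-inserted between repetitions.
        vec![ $( stringify!($x) ),* ]
    };
}

fn main() {
    let width = 3;
    // Any expression matches $x:expr, not just literals.
    let args: Vec<&str> = show_args!(1, 4.2, width * 2);
    println!("{:?}", args);
}
```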

When the macro is called with arguments like sum_of_numbers!(1, 4.2, 5), it effectively expands to:

{
    let mut result = 0.0;
    result += 1 as f64;
    result += 4.2 as f64;
    result += 5 as f64;
    result
}

This is exactly the code we would have hand-written for three arguments with the naive approach, except that the compiler now generates it for us at every call.
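A self-contained sketch of the complete program, restating the macro (returning the f64 sum so the .2 in 4.2 is kept) so the file compiles on its own. Note the ! at each call site, and that a macro_rules! macro must be defined above its first use in the file. The mixed-type variables are hypothetical additions showing that any numeric expression is accepted.

```rust
// Declarative macro: sums any number of numeric expressions as f64.
macro_rules! sum_of_numbers {
    ( $( $x:expr ),* ) => {{
        let mut result = 0.0;
        // One `result += ...;` statement is generated per argument.
        $( result += $x as f64; )*
        result
    }};
}

fn main() {
    // The calls from the original example, now as macro invocations.
    println!("Sum of three numbers is: {}", sum_of_numbers!(1, 4.2, 5));
    println!(
        "Sum of six numbers is: {}",
        sum_of_numbers!(-6, 8, 10, 12, 67, 90)
    );

    // Hypothetical variables of different numeric types in one call.
    let small: u8 = 3;
    let big: i64 = 1_000;
    let half: f32 = 0.5;
    // prints "Mixed types: 1009.5"
    println!("Mixed types: {}", sum_of_numbers!(small, big, half, small * 2));
}
```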

2. `has_vector_field()` function for structs

As the name suggests, we want has_vector_field() to examine all fields of a struct and return true if any of them is a vector. Beyond demonstration purposes, it may be hard to imagine a practical use for this function, but it could help with assessing and optimizing runtime performance. For instance, it could be used to avoid cloning potentially large objects, as below.

#[derive(Clone)]
struct Book {
    title: String,
    publication_year: u32,
    authors: Vec<String>,
}

fn main() {
    let b = Book {...};
    // has_vector_field() is the function we want to build in this section.
    if !Book::has_vector_field() {
        let b_clone = b.clone();
    }
}

Let's start with the most straightforward solution. Can we implement has_vector_field() for the Book struct? Yes and no.

impl Book {
    fn has_vector_field() -> bool {
        // ....
    }
}

Yes, we can hardcode the result of the function, because we know that Book has one vector field: authors. No, because to stay correct, this function would have to be re-evaluated (and possibly updated) every time the struct's fields change. This becomes tedious when every struct in the program needs the same attention, and keeping it up to date would not be feasible even for a highly disciplined team. The best option would be to introspect the struct's fields inside the function at runtime, but Rust doesn't support runtime reflection.

There has got to be a better way; this is Rust, after all, so it is bound to have an elegant solution. First off, since we want has_vector_field() to be available on multiple structs, in other languages we would put it behind an interface that every type has to implement. Rust has traits instead of interfaces. For our example, we can implement a VectorField trait for Book as below:

pub trait VectorField {
    fn has_vector_field() -> bool;
}

impl VectorField for Book {
    fn has_vector_field() -> bool {
        // ...
    }
}
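Once the trait exists, generic code can ask any implementing type whether it has vector fields. A sketch of the cloning check from earlier written as a generic helper, using hypothetical Point and Playlist structs, a hypothetical clone_if_no_vectors function, and hand-written implementations (the hardcoded approach, since nothing generates them yet):

```rust
pub trait VectorField {
    fn has_vector_field() -> bool;
}

// Hypothetical struct without vector fields.
#[derive(Clone)]
struct Point {
    x: f64,
    y: f64,
}

impl VectorField for Point {
    fn has_vector_field() -> bool {
        false // hardcoded: x and y are plain floats
    }
}

// Hypothetical struct with a vector field.
#[derive(Clone)]
struct Playlist {
    name: String,
    songs: Vec<String>,
}

impl VectorField for Playlist {
    fn has_vector_field() -> bool {
        true // hardcoded: songs is a Vec
    }
}

// Works for any type implementing the trait: clone only when
// the type has no vector fields, otherwise return None.
fn clone_if_no_vectors<T: VectorField + Clone>(value: &T) -> Option<T> {
    if T::has_vector_field() {
        None
    } else {
        Some(value.clone())
    }
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    let list = Playlist {
        name: String::from("Mix"),
        songs: vec![String::from("Intro")],
    };
    // prints "Point cloned: true"
    println!("Point cloned: {}", clone_if_no_vectors(&p).is_some());
    // prints "Playlist cloned: false"
    println!("Playlist cloned: {}", clone_if_no_vectors(&list).is_some());
}
```

The generic helper never changes; only the two hardcoded implementations have to be kept in sync with their structs.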

We still have to implement the function, and this is where macros come to the rescue. In addition to the declarative macros described in the previous example, Rust supports another kind of macro called procedural macros. Procedural macros receive Rust code as a stream of tokens (a TokenStream) and return a new TokenStream. They must be defined in their own crate, with proc-macro = true set under [lib] in that crate's Cargo.toml. There are three kinds of procedural macros:

a. Function-like macros, invoked like my_macro!(...)
b. Custom derive macros, used as #[derive(MyTrait)]
c. Attribute macros, used as #[my_attribute]

With derive macros, we can generate trait implementations for types. Simply annotating a type with #[derive(<TraitName>)] makes the trait's functions available. The snippet below shows a bare-bones definition of the macro and how it can be used with the Book struct.

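// vectorfield_derive.rs (in the proc-macro crate)
use proc_macro::TokenStream;

#[proc_macro_derive(VectorField)]
pub fn derive_vector_fields(input: TokenStream) -> TokenStream {
    // Placeholder: ignore the input and generate no code yet.
    TokenStream::new()
}

// main.rs (in the crate that uses the macro)
// Import the derive macro; the crate name vectorfield_derive is assumed from the file name above.
use vectorfield_derive::VectorField;

#[derive(VectorField)]
struct Book {
    title: String,
    publication_year: u32,
    authors: Vec<String>,
}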

The input TokenStream can be parsed with the syn crate into an Abstract Syntax Tree (AST), here a syn::DeriveInput, for further processing.

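use proc_macro::TokenStream;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(VectorField)]
pub fn derive_vector_fields(input: TokenStream) -> TokenStream {
    // Parse the tokens of the annotated item into syn's DeriveInput AST.
    let ast = parse_macro_input!(input as DeriveInput);
    TokenStream::new()
}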

To manipulate the AST, we need to know its structure. One way is to print it during compilation, e.g. with eprintln!("{:#?}", ast) inside the macro, which requires enabling syn's extra-traits feature (it adds Debug implementations for the syntax tree types). Doing so reveals the tree structure and how to extract field information from it. A truncated version of the AST for the Book struct looks like this:

ident: Ident {
        ident: "Book",
        span: #0 bytes(277..281),
    },
    ...
    data: Struct(
        ...
        Field {
               ...
                ident: "title",
                ...
                type:
                   ...
                   ident: "String"
        },
...

Of interest are the field type identifiers; the code below shows how to extract them from the AST.

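use proc_macro::TokenStream;
use syn::{parse_macro_input, Data, DataStruct, DeriveInput, Fields, Type};

#[proc_macro_derive(VectorField)]
pub fn derive_vector_fields(input: TokenStream) -> TokenStream {
    let ast = parse_macro_input!(input as DeriveInput);
    let mut vec_field = false;
    // Only structs with named fields are supported.
    let fields = match &ast.data {
        Data::Struct(DataStruct {
            fields: Fields::Named(fields),
            ..
        }) => &fields.named,
        _ => panic!("expected a struct with named fields"),
    };

    for field in fields {
        // Plain type paths such as Vec<String> or std::vec::Vec<String>.
        if let Type::Path(tp) = &field.ty {
            // Compare the last path segment, so fully qualified paths also match.
            if tp.path.segments.last().map_or(false, |seg| seg.ident == "Vec") {
                vec_field = true;
                break;
            }
        }
    }
    // vec_field is computed here but not yet used to generate any code.
    TokenStream::new()
}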

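For structs, the data field of the DeriveInput is the Data::Struct enum variant. For simplicity, we only implement has_vector_field() for structs with named fields, so the match panics (roughly what throwing an exception would be in other languages) if it finds another variant or unnamed fields. Panicking is generally not a good idea, since it surfaces as a less precise compile error; a more polished macro would return syn::Error::new_spanned(...).to_compile_error() instead. For our simple example, though, it is practical. We then iterate through the fields, take each field's type path and compare its identifier to Vec, short-circuiting the loop as soon as we find the first vector. Because the check is purely syntactic, it won't detect vectors hidden behind type aliases or wrapped in other types such as Option<Vec<T>>.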
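To keep the example dependency-free, here is a std-only sketch that models a field type as a plain string instead of a syn::Type (a simplification: the hypothetical helpers below split text rather than walking a syn::Path). It shows why the choice of path segment matters: checking the first segment misses a fully qualified std::vec::Vec<String>, whose first segment is std, while checking the last segment catches it. Neither catches Option<Vec<u8>>.

```rust
// Hypothetical model: split a type written as text into its path segments.
fn path_segments(ty: &str) -> Vec<&str> {
    // Drop generic arguments ("Vec<String>" -> "Vec"), then split on "::".
    let base = ty.split('<').next().unwrap_or("");
    base.split("::").map(str::trim).collect()
}

// Check only the first segment of the path.
fn first_segment_is_vec(ty: &str) -> bool {
    path_segments(ty).first() == Some(&"Vec")
}

// Check only the last segment of the path.
fn last_segment_is_vec(ty: &str) -> bool {
    path_segments(ty).last() == Some(&"Vec")
}

fn main() {
    let types = ["Vec<String>", "std::vec::Vec<String>", "String", "Option<Vec<u8>>"];
    for ty in types {
        println!(
            "{:<24} first: {:<5} last: {}",
            ty,
            first_segment_is_vec(ty),
            last_segment_is_vec(ty)
        );
    }
}
```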

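Having determined the return value of has_vector_field() at compile time, the next step is to generate the TokenStream for the trait implementation. This is done with the quote! macro from the quote crate, where #name and #vec_field interpolate local variables into the generated code, as shown below.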

extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, Data, DataStruct, DeriveInput, Fields, Type};

#[proc_macro_derive(VectorField)]
pub fn derive_vector_fields(input: TokenStream) -> TokenStream {
    let ast = parse_macro_input!(input as DeriveInput);
    // Name of the struct being derived, e.g. Book.
    let name = &ast.ident;
    let mut vec_field = false;

    let fields = match &ast.data {
        Data::Struct(DataStruct {
            fields: Fields::Named(fields),
            ..
        }) => &fields.named,
        _ => panic!("expected a struct with named fields"),
    };

    for field in fields {
        if let Type::Path(tp) = &field.ty {
            // Last segment, so std::vec::Vec<T> is detected as well.
            vec_field = tp.path.segments.last().map_or(false, |seg| seg.ident == "Vec");
            if vec_field {
                break;
            }
        }
    }

    // VectorField is not path-qualified here, so the trait must be
    // in scope wherever #[derive(VectorField)] is used.
    let output = quote! {
        impl VectorField for #name {
            fn has_vector_field() -> bool {
                #vec_field
            }
        }
    };
    TokenStream::from(output)
}
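For Book, #name is replaced by Book and #vec_field by the literal true, so the derive should produce an implementation equivalent to the hand-written one in this std-only sketch (the third-party cargo-expand tool can show the real expansion). The trait definition is repeated so the example compiles on its own.

```rust
// Trait definition; it must live in an ordinary (non proc-macro) crate.
pub trait VectorField {
    fn has_vector_field() -> bool;
}

pub struct Book {
    pub title: String,
    pub publication_year: u32,
    pub authors: Vec<String>,
}

// Hand-written equivalent of what #[derive(VectorField)] generates for Book:
// #name became `Book` and #vec_field became the literal `true`.
impl VectorField for Book {
    fn has_vector_field() -> bool {
        true
    }
}

fn main() {
    // prints "Do Books have vector fields? true"
    println!("Do Books have vector fields? {}", Book::has_vector_field());
}
```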

And with that, we can call has_vector_field() on any struct with named fields, as long as it derives VectorField. The snippet below shows how Book makes use of it.

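// The derive macro comes from the proc-macro crate (crate name assumed here).
use vectorfield_derive::VectorField;

// The trait lives in an ordinary crate, since proc-macro crates can only export macros.
// Trait and derive macro can share a name because they live in different namespaces.
pub trait VectorField {
    fn has_vector_field() -> bool;
}

#[derive(VectorField)]
pub struct Book {
    title: String,
    publication_year: u32,
    authors: Vec<String>,
}

fn main() {
    // prints "Do Books have vector fields? true"
    println!("Do Books have vector fields? {}", Book::has_vector_field());
}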

With metaprogramming, we have gotten rid of code duplication and hardcoding, freeing up headspace for actual business logic. Moreover, this comes at no runtime performance cost, since the code is generated at compile time. The two examples above are just two of the many ways macros sprinkle magic, and the need for code deduplication has opened the door to some very cool macros. Unlike C macros, Rust's declarative macros are hygienic: local variables defined inside a macro, such as result in sum_of_numbers!, cannot clash with variables at the call site, which prevents some surprising bugs. This is indeed very Rusty.

DEV Community
