DEV Community

Cover image for Procedural macro in Rust 101
Matteo Bertini
Matteo Bertini

Posted on • Edited on

Procedural macro in Rust 101

Cover Image by Photo by Ricardo Gomez Angel on Unsplash

Procedural macro in Rust 101

Today, with Rust 1.51, no macro is required! (see below)

How to pick a function and make it a macro with added superpowers

One upon a time I had the need to take a list of items and return all the combinations of all couples.

This task in Python is easily accomplished using the built-in itertools.combinations:

itertools.combinations(iterable, r)
Return r length subsequences of elements from the input iterable.
Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

That offers a baseline implementation in Python (the real one is in C and based on iterators).

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)
Enter fullscreen mode Exit fullscreen mode

What we have here is a functions that takes a list and an integer r and returns a tuple of length r... hmm, not (yet) possible in plain Rust, but let's focus on my problem, I currently only need couples!

Here is a basic porting in Rust (generic enough to be modified for r > 2):

fn combinations2<T>(pool: &Vec<T>) -> Vec<[&T; 2]> {
    let r = 2; // <----- is the length of this ^
    let mut res = vec![];
    let n = pool.len();
    if r > n {
        return res;
    }
    let mut indices: Vec<_> = (0..r).collect();
    // we cannot `yield` (yet) in Rust, so we need an accumulator
    res.push([&pool[indices[0]], &pool[indices[1]]]);
    loop {
        // since this is not a generator like in Python, we cannot `return`, so the code
        // is slightly changed, `find` replaces the first `for` in Python.
        match (0..r).rev().find(|&i| indices[i] != (i + n - r)) {
            Some(i) => {
                indices[i] += 1;
                for j in (i + 1)..r {
                    indices[j] = indices[j - 1] + 1;
                }
                res.push([&pool[indices[0]], &pool[indices[1]]]);
            }
            None => break,
        }
    }
    res
}
Enter fullscreen mode Exit fullscreen mode

Months later, I tried to achieve something using basic macros by example1 in Rust, but with no luck, it's not easy to take a single integer and expand2 it into a list of something, the code burden was going too high, and I'd have to chose upfront some maximum value for the r number 3.

So I gave up till the more powerful procedural macro reached the almost stable point!

But here be dragons! syn, proc_macro, proc_macro2, quote!, WTF?

Parsing and quoting

Luckily the syn crate repository contains some very useful examples, in particular: examples/lazy-static is An example of parsing a custom syntax within a functionlike!(...) procedural macro.4

But lets start from the syn crate, that quoting from the README is:

Syn is a parsing library for parsing a stream of Rust tokens into a syntax tree of Rust source code.

So, given some Rust-like code, Syn will give us a syntax tree, ready to be used through all the basic types of the syntax tree, like Expr, Ident, Token, Type, and many others.
As noted in the README, Syn offers proc_macro2 a Proc macro shim , that allows Syn to support back to Rust 1.15.

The little quirk of proc_macro2 is that we are not allowed to use everything from a single place, because:

In general all of your code should be written against proc-macro2 rather than proc-macro. The one exception is in the signatures of procedural macro entry points, which are required by the language to use proc_macro::TokenStream.

Once we have parsed our input, we need to generate back some Rust code, and here kicks in the quote crate, that:

... provides the quote! macro for turning Rust syntax tree data structures into tokens of source code.

Our workflow will be:

  1. define some syntax for our procedural macro5, for example combinations!(comb3, 3) that will declare a function comb3(values: &[T]) -> Vec<[T; 3]>,
  2. define a struct to store the parsed parts from the macro call: comb3 and 3,
  3. use those values to parametrize the specialized code of combinations2,
  4. return the abstract new function and build back the Rust code for the compiler,
  5. call the new macro-generated function!

Putting the pieces together

The first piece of the puzzle is a struct that will contain the user values in the macro definition, in this case we have only a name and a number:

struct  Combinations {
    name: syn::Ident,
    n: syn::LitInt,
}
Enter fullscreen mode Exit fullscreen mode

And now we need some way to parse something into this struct, and here kicks in the Parse trait a Parsing interface implemented by all types that can be parsed in a default way from a token stream:

impl Parse for Combinations {
    fn parse(input: ParseStream) -> Result<Self> {
        let name = input.parse()?;
        input.parse::<syn::Token![,]>()?;
        let n = input.parse()?;
        Ok(Combinations { name, n })
    }
}
Enter fullscreen mode Exit fullscreen mode

Thanks to the Rust type inference super powers, calling input.parse()? let us parse the values into the right type or return and error if the parsing fails.
The strange line input.parse::<syn::Token![,]>()?; is a special syntax to parse simple tokens.
Now that we have parsed our input, we can write our super minimal macro, to test that everything worked as expected:

#[proc_macro]
pub fn minimal(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    quote!{
        fn #name() -> i32 {
            #n
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Ops, compilation error :/

   Compiling comb v0.1.0 (/home/naufraghi/Documents/src/circle-rs.git/comb)
error[E0308]: mismatched types
  --> comb/src/lib.rs:27:5
   |
25 |   pub fn minimal(input: TokenStream) -> TokenStream {
   |                                         ----------- expected `proc_macro::TokenStream` because of return type
26 |       let Combinations { name, n } = parse_macro_input!(input as Combinations);
27 | /     quote!{
28 | |         fn #name() -> i32 {
29 | |             #n
30 | |         }
31 | |     }
   | |_____^ expected struct `proc_macro::TokenStream`, found struct `proc_macro2::TokenStream`
   |
   = note: expected type `proc_macro::TokenStream`
              found type `proc_macro2::TokenStream`
Enter fullscreen mode Exit fullscreen mode

Remember, syn offers a drop-in replacement for everything, except TokenStream, that is supposed to be the original from the compiler interfaces. Luckily the conversion from one into() the other is simple.

#[proc_macro]
pub fn minimal(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    (quote!{
        fn #name() -> i32 {
            #n
        }
    }).into()
}
Enter fullscreen mode Exit fullscreen mode

Without the extra parentheses you will have another strange compilation error, sadly is not even possible to expand the macro to see exactly what's happening, ...

   Compiling comb v0.1.0 (/home/naufraghi/Documents/src/circle-rs.git/comb)
error: expected expression, found `.`
  --> comb/src/lib.rs:31:6
   |
31 |     }.into()
   |      ^ expected expression
error[E0308]: mismatched types
  --> comb/src/lib.rs:27:5
   |
27 | /     quote!{
28 | |         fn #name() -> i32 {
29 | |             #n
30 | |         }
31 | |     }.into()
   | |_____^ expected (), found struct `proc_macro2::TokenStream`
   |
   = note: expected type `()`
              found type `proc_macro2::TokenStream`
Enter fullscreen mode Exit fullscreen mode

Did I said expand? Yes, we can look the macro expanded code with the cargo expand command, and this is the result:

$ cargo expand --bin comb
    Checking circle v0.1.0 (/home/naufraghi/Documents/src/circle-rs.git)
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use ::std::prelude::v1::*;
#[macro_use]
extern crate std as std;
fn mini3() -> i32 { 3 }    // <----- here is our expansion!
// ...
Enter fullscreen mode Exit fullscreen mode

Summing all up, we have this pipeline:

TokenStream -> parse_macro_input!(...) -> quote!{...} -> TokenStream
Enter fullscreen mode Exit fullscreen mode

And inside quote!{} the variables are accessed with the #name syntax.

Extra powers

Here we are, we have written a lot more code to reach a point we already know how to reach with the good old macros by example, but we had a problem, how to expand a number into a list, because this is the monster of this level:

res.push([&pool[indices[0]], &pool[indices[1], ...]])
Enter fullscreen mode Exit fullscreen mode

We need some way to accumulate a little piece of syntax, &pool[indices[#i]], and at the end join the parts into a comma separated list. Digging a bit into syn, what we need is syn::punctuated:

A punctuated sequence of syntax tree nodes separated by punctuation.
Lots of things in Rust are punctuated sequences.

  • The fields of a struct are Punctuated<Field, Token![,]>. ...

What we can do is:

#[proc_macro]
pub fn punctuated(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    let indices: syn::punctuated::Punctuated<_, syn::Token![,]> = (0..n.value() as i32)
        .map(|i| quote! { #i })  // <-- add the quoted variable, `i` as a token
        .collect();
    (quote! {
      fn #name() -> Vec<i32> {
        vec![#indices]  // put the inner part of the list into something
      }
    })
    .into()
}
Enter fullscreen mode Exit fullscreen mode

We can finally call the macro punctuated!(punct4, 4); that expands into:

fn punct4() -> Vec<i32> { <[_]>::into_vec(box [0i32, 1i32, 2i32, 3i32]) }
Enter fullscreen mode Exit fullscreen mode

We now have all the pieces to make our combinations function generic on the size r of the returned array:

#[proc_macro]
/// `combinations!(comb, N)` macro will define a function `comb` that takes a `&[T]` and returns
/// a vector of arrays of length _N_, `Vec<[&T; N]>`, covering all the combinations of _N_ elements
/// from the source vector.
pub fn combinations(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    let pool_indices: syn::punctuated::Punctuated<_, syn::Token![,]> = (0..n.value() as usize)
        .map(|i| quote! { &pool[indices[#i]] })
        .collect();
    let comb = quote! {
        fn #name<T>(pool: &[T]) -> Vec<[&T; #n]> {
            let r = #n;
            let mut res = vec![];
            let n = pool.len();
            if r > n {
                return res;
            }
            let mut indices: Vec<_> = (0..r).collect();
            res.push([#pool_indices]);
            loop {
                match (0..r).rev().find(|&i| indices[i] != (i + n - r)) {
                    Some(i) => {
                        indices[i] += 1;
                        for j in (i + 1)..r {
                            indices[j] = indices[j - 1] + 1;
                        }
                        res.push([#pool_indices]);
                    },
                    None => break,
                }
            }
            res
        }
    };
    comb.into()
}
Enter fullscreen mode Exit fullscreen mode

Here we are, we can now define with a simple macro call a function with the wanted size, in the future we may also create and call a macro in a single shot, but not yet 5.

A final note, the macro can be used only from another crate, the suggested layout is highlighted in the lazy-static example.

Thanks to @dodomorandi for the review of the first draft and for the suggestions about a more idiomatic combinations2, and for his Iterator based implementation, and thanks to @lu-zero for the review and for the suggestion to avoid syn and search for a simpler method, task I left as an exercise for the reader.

Iterator based implementation

#[proc_macro]
pub fn iter_combinations(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    let pool_indices: syn::punctuated::Punctuated<_, syn::Token![,]> = (0..n.value() as usize)
        .map(|i| quote! { &self.pool[self.indices[#i]] })
        .collect();

    let initial_indices: syn::punctuated::Punctuated<_, syn::Token![,]> =
        (0..n.value() as usize).collect();

    let iter_name = proc_macro2::Ident::new(
        &format!("CombIndicesIter{}", n.value()),
        proc_macro2::Span::call_site(),
    );

    let comb = quote! {
        pub struct #iter_name<'a, T> {
            pool: &'a [T],
            indices: [usize; #n],
            started: bool,
        }

        impl<'a, T> Iterator for #iter_name<'a, T> {
            type Item = [&'a T; #n];

            fn next(&mut self) -> Option<Self::Item> {
                if !self.started {
                    self.started = true;
                    Some([#pool_indices])
                } else {
                    let n = self.pool.len();
                    (0..#n).rev().find(|&i| self.indices[i] != i + n - #n)
                        .map(|i| {
                                self.indices[i] += 1;
                                for j in (i + 1)..#n {
                                    self.indices[j] = self.indices[j - 1] + 1;
                                }
                                [#pool_indices]
                        })
                }
            }
        }

        fn #name<T>(pool: &[T]) -> impl Iterator<Item = [&T; #n]> {
            #iter_name {
                pool,
                indices: [#initial_indices],
                started: false,
            }
        }
    };
    comb.into()
}
Enter fullscreen mode Exit fullscreen mode

"Modern" Rust combinations()

Today, with Rust 1.51, there is no need to use any* macro:

Rust Playground

use std::convert::TryInto;

fn combinations<T, const R: usize>(pool: &Vec<T>) -> Vec<[&T; R]> {
    let mut res = vec![];
    let n = pool.len();
    if R > n {
        return res;
    }
    // may have used `array-init` https://docs.rs/array-init/2.0.0/array_init/index.html
    macro_rules! into_array {
        ($it:expr) => {
            $it.collect::<Vec<_>>().as_slice().try_into().expect("should be of size R")
        };
    }
    let mut indices: [usize; R] = into_array!(0..R);
    let reversed: [usize; R] = into_array!((0..R).rev());
    res.push(into_array!(indices.iter().map(|i| &pool[*i])));
    loop {
        let idx = (|| {
            for i in reversed.iter() {
                if indices[*i] != (*i + n - R) {
                    return Some(*i);
                }
            }
            None
        })();
        if let Some(i) = idx {
            indices[i] += 1;
            for j in (i + 1)..R {
                indices[j] = indices[j - 1] + 1;
            }
            res.push(into_array!(indices.iter().map(|i| &pool[*i])));
        } else {
            break;
        }
    }
    res
}

fn main() {
    let numbers = vec![1i32,2,3,4];
    let combs: Vec<[_; 2]> = combinations(&numbers);
    // `let combs = combinations::<_, 2>(&numbers)` works, but IMHO the constraint
    // on the returned type is more explanatory and futureproof
    println!("{:?}", combs);
}
Enter fullscreen mode Exit fullscreen mode

  1. Super in-depth intro by DanielKeep@github: A Practical Intro to Macros in Rust 1.0 

  2. I'd have to expand 2 -> 0, 1, and then map it to &pool[indices[0]], &pool[indices[1]] 

  3. Again by DanielKeep, The Little Book of Rust Macros: Counting 

  4. One may try to avoid syn and use a combination of macro by example and procedural macros, but this is left as an exercise for the reader. 

  5. Sadly the usage of a macro in expression position, like let res = combinations!(&data, 3), is feature gated behind #![feature(proc_macro_hygiene)] and limited to the nightly edition, so we use another approach. 

Top comments (2)

Collapse
 
pigozzifr profile image
Francesco Pigozzi

I wish to read more articles like this 😁

Collapse
 
naufraghi profile image
Matteo Bertini • Edited

Today, with Rust 1.51, there is no need to use any* macro (edit: see article bottom)