ProgramCrafter

Posted on Feb 24

Example of Rust attribute macros: data serialization (part 1 - structures)

#tutorial #rust #blockchain

Recently, I decided to create a Rust library that serializes objects into "cells" (roughly speaking, bytes) for use in the TON blockchain. There are three obvious ways to do that:

For each structure, write out buffer.store_value(self.a); buffer.store_value(self.b); ... by hand. For parsing, repeat the same work once again.

Example

impl Serializable for InternalMessageHeader {
  fn write_to(&self, cell: &mut BuilderData) -> Result<()> {
    cell
        .append_bit_zero()?              //tag
        .append_bit_bool(self.ihr_disabled)?
        .append_bit_bool(self.bounce)?
        .append_bit_bool(self.bounced)?;

    self.src.write_to(cell)?;
    self.dst.write_to(cell)?;

    self.value.write_to(cell)?;         //value: CurrencyCollection

    self.ihr_fee.write_to(cell)?;       //ihr_fee
    self.fwd_fee.write_to(cell)?;       //fwd_fee

    self.created_lt.write_to(cell)?;    //created_lt
    self.created_at.write_to(cell)?;    //created_at

    Ok(())
  }
}

(Quoted from ton-labs-block, messages.rs.) In particular, notice the ? after every call. It's definitely not normal for data serialization to be able to fail in the middle.

Create a macro that takes the order in which fields should be stored and generates the serialization code automatically. When deserialization is added, it will be generated from the same list, so it is guaranteed to match the order of writes and the fields can't get mixed up.
Use Serde and just write a custom serializer. Unfortunately, Serde doesn't let you specify the order in which fields are serialized (it follows declaration order), nor does it allow checking at compile time whether serialization is fallible or infallible.
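To make the ordering argument concrete, here is a small sketch that uses macro_rules! rather than the procedural macro this article builds: one field list drives both the writer and the reader, so the two can't drift apart. The ToyCell, Store and Load names and the byte-level encoding are hypothetical stand-ins for illustration, not part of any TON library (real cells are bit-oriented). Because the whole struct definition is passed into the macro, both self and the field names are usable inside it.

```rust
// Hypothetical stand-in for a TON cell: a flat byte buffer.
#[derive(Debug, Default)]
struct ToyCell {
    bytes: Vec<u8>,
}

// Hypothetical traits: writing is infallible, reading may fail on short input.
trait Store {
    fn store(&self, cell: &mut ToyCell);
}
trait Load: Sized {
    fn load(bytes: &[u8], pos: &mut usize) -> Option<Self>;
}

impl Store for u8 {
    fn store(&self, cell: &mut ToyCell) {
        cell.bytes.push(*self);
    }
}
impl Load for u8 {
    fn load(bytes: &[u8], pos: &mut usize) -> Option<Self> {
        let b = *bytes.get(*pos)?;
        *pos += 1;
        Some(b)
    }
}
impl Store for u32 {
    fn store(&self, cell: &mut ToyCell) {
        cell.bytes.extend_from_slice(&self.to_be_bytes());
    }
}
impl Load for u32 {
    fn load(bytes: &[u8], pos: &mut usize) -> Option<Self> {
        let chunk = bytes.get(*pos..*pos + 4)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(chunk);
        *pos += 4;
        Some(u32::from_be_bytes(buf))
    }
}

// One field list generates the struct, the writer and the reader,
// so the order of writes and reads always agrees.
macro_rules! serializable {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            fn serialize(&self) -> ToyCell {
                let mut cell = ToyCell::default();
                $( self.$field.store(&mut cell); )*
                cell
            }

            fn deserialize(bytes: &[u8]) -> Option<Self> {
                let mut pos = 0;
                // Struct-literal fields are evaluated in the order written,
                // i.e. the same order as in serialize().
                Some($name { $($field: <$ty as Load>::load(bytes, &mut pos)?),* })
            }
        }
    };
}

serializable!(Header { flags: u8, seqno: u32 });

fn main() {
    let header = Header { flags: 3, seqno: 258 };
    let cell = header.serialize();
    println!("{:?}", cell.bytes); // [3, 0, 0, 1, 2]
    assert_eq!(Header::deserialize(&cell.bytes), Some(header));
}
```

Adding a field to the serializable! invocation updates both directions at once, which is exactly the property the macro approach is after.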

Rust has two kinds of macros: macros by example (defined with macro_rules!) and procedural macros (which, among other things, can be attached as attributes to structs, enums, etc., much like #[derive(...)]). I thought procedural macros would look cleaner at the place of usage.

An attribute macro receives two token streams: whatever is inside its invocation (in #[derive(...)], that would be the list of traits) and the item it is applied to.

Required libraries

We need three external crates: quote and proc-macro2, so that instead of assembling token sequences by hand we can substitute variables into a template, plus syn to parse the code we receive (for instance, to iterate over enum variants).

[package]
name = "tlb_macro"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

# See more keys and their definitions at
#     https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
syn = {version = "^2.0.50", features = ["full"]}
quote = "^1.0.8"
proc-macro2 = "^1.0.78"

// src/lib.rs
extern crate proc_macro;
use syn::{parse_macro_input, DeriveInput, Data, Expr,
    Fields, Ident, ItemEnum, ItemStruct, Meta, MetaList, Lit, spanned::Spanned};
use quote::{quote_spanned, quote, ToTokens};
use proc_macro2::Span;

use std::collections::HashMap;

type OldTokenStream = proc_macro::TokenStream;
type V2TokenStream = proc_macro2::TokenStream;

Entry point

Let's call our attribute tlb_serializable (after the TL-B language) and decide that it should be used as follows:

#[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
pub struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128
}

Then the main function looks like this:

#[proc_macro_attribute]
pub fn tlb_serializable(attr: OldTokenStream, mut item: OldTokenStream) -> OldTokenStream {
    let struct_item = item.clone();
    let input: ItemStruct = parse_macro_input!(struct_item);
    let name = input.ident;

    let serializers = create_serialization_code(
        &attr.to_string(), &input.fields);
    item.extend(OldTokenStream::from(quote! {
        impl crate::ton::CellSerialize for #name {
            fn serialize(&self) -> ::std::vec::Vec<::std::string::String> {
                let mut result : ::std::vec::Vec<::std::string::String> = ::std::vec![];
                #serializers
                result
            }
        }
    }));

    item
}

We parse the incoming item as a struct; if it isn't one, parse_macro_input! makes our macro return a compilation error explaining what went wrong (it doesn't panic). Then we take the struct's name to substitute into the impl template (it is the type the serialization trait is implemented for) and pass the fields on to the next function. Finally, we append the impl containing the resulting sequence of store statements to the struct definition: an attribute macro replaces the item it is applied to with whatever it returns, so we must return the original struct too.
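For the Address example above, the impl appended after the struct would look roughly like the hand-written code below. The ton module and the u8/u128 impls of CellSerialize are assumptions made so that the snippet compiles on its own; the real crate's leaf impls and string format may differ. Comments mark which quote! call would produce each part.

```rust
// Assumed stand-in for the crate's own `ton` module, so crate::ton::CellSerialize resolves.
mod ton {
    pub trait CellSerialize {
        fn serialize(&self) -> Vec<String>;
    }

    // Hypothetical leaf impls: each integer becomes a single "<type> <value>" string.
    impl CellSerialize for u8 {
        fn serialize(&self) -> Vec<String> {
            vec![format!("u8 {}", self)]
        }
    }
    impl CellSerialize for u128 {
        fn serialize(&self) -> Vec<String> {
            vec![format!("u128 {}", self)]
        }
    }
}

// The struct itself passes through the attribute macro unchanged.
pub struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128,
}

// Approximate expansion of #[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)].
impl crate::ton::CellSerialize for Address {
    fn serialize(&self) -> ::std::vec::Vec<::std::string::String> {
        let mut result: ::std::vec::Vec<::std::string::String> = ::std::vec![];
        // Outer block from quote!{{ ... }} in create_serialization_code
        {
            // "u ..." parts are pushed verbatim as string literals
            result.push("u 4 3bit".to_owned());
            // One block per field, from quote_spanned!
            {
                let mut s_field = crate::ton::CellSerialize::serialize(&self.workchain);
                result.append(&mut s_field);
            }
            {
                let mut s_field = crate::ton::CellSerialize::serialize(&self.hash_high);
                result.append(&mut s_field);
            }
            {
                let mut s_field = crate::ton::CellSerialize::serialize(&self.hash_low);
                result.append(&mut s_field);
            }
        }
        result
    }
}

fn main() {
    let addr = Address { workchain: 0, hash_high: 1, hash_low: 2 };
    let parts = ton::CellSerialize::serialize(&addr);
    assert_eq!(parts, vec!["u 4 3bit", "u8 0", "u128 1", "u128 2"]);
}
```

Note that the parts appear in the order given in the attribute, not in the order of the struct's field declarations, which is the whole point of passing the order explicitly.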

The code generator itself

fn create_serialization_code(attr: &str, struct_fields: &Fields)
        -> V2TokenStream {
    let Fields::Named(ref fields) = struct_fields else {
        panic!("For unambiguous parsing, normal structs must consist of named fields");
    };
    let mut field_spans: HashMap<String, (Ident, Span)> = HashMap::new();
    for field in fields.named.iter() {
        let id = field.ident.clone().expect("unnamed field");
        field_spans.insert(id.to_string(), (id, field.span()));
    }

    // Mapping each part of serialization TL-B to block of code that stores value into cell
    let serializations = attr.split(",").map(|part_whitespaced| {
        let part = part_whitespaced.trim();
        if part.is_empty() {
            quote!{}
        } else if part.starts_with("u ") {
            quote! { result.push(#part.to_owned()); }
        } else {
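            // Any other part must name a struct field; fail with a readable message otherwise
            let (name, span) = field_spans.get(part).unwrap_or_else(|| {
                panic!("tlb_serializable: `{}` is not a field of this struct", part)
            });
            quote_spanned! {*span=>{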
                let mut s_field = crate::ton::CellSerialize::serialize(&self.#name);
                result.append(&mut s_field);
            }}
        }
    });

    // Constructing function of all those code chunks
    quote!{{
        #(#serializations)*
    }}
}
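Stripped of token streams, the per-part dispatch in create_serialization_code boils down to the following sketch, which can be run and tested as ordinary Rust. The Step enum and the plan function are hypothetical names for illustration; unlike the macro, this version reports an unknown field as an Err instead of panicking.

```rust
use std::collections::HashSet;

// Hypothetical description of one generated statement.
#[derive(Debug, PartialEq)]
enum Step<'a> {
    Literal(&'a str), // a "u ..." part, pushed into the result verbatim
    Field(&'a str),   // a field name, serialized via CellSerialize
}

// Mirrors the split / trim / dispatch logic of create_serialization_code.
fn plan<'a>(attr: &'a str, fields: &HashSet<&str>) -> Result<Vec<Step<'a>>, String> {
    let mut steps = Vec::new();
    for part in attr.split(',').map(str::trim) {
        if part.is_empty() {
            continue; // empty parts (e.g. after a trailing comma) generate nothing
        } else if part.starts_with("u ") {
            steps.push(Step::Literal(part));
        } else if fields.contains(part) {
            steps.push(Step::Field(part));
        } else {
            return Err(format!("unknown field `{}`", part));
        }
    }
    Ok(steps)
}

fn main() {
    let fields: HashSet<&str> = ["workchain", "hash_high", "hash_low"].iter().copied().collect();
    let steps = plan("u 4 3bit, workchain, hash_high, hash_low", &fields).unwrap();
    // [Literal("u 4 3bit"), Field("workchain"), Field("hash_high"), Field("hash_low")]
    println!("{:?}", steps);
    assert!(plan("workchain, hash_mid", &fields).is_err());
}
```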

create_serialization_code is quite straightforward! First, we take the list of the struct's fields and store an identifier and a span for each of them. Keeping the identifier is just handy so we don't have to create it again later, while the span lets the compiler direct any errors at the field definition instead of at the macro:

error[E0277]: the trait bound `i128: CellSerialize` is not satisfied
  --> src\main.rs:17:9
   |
17 |         hash_high: i128,
   |         ^^^^^^^^^ the trait `CellSerialize` is not implemented for `i128`

If we replace quote_spanned! with quote!, we won't even see the name of the field that caused the error:

error[E0277]: the trait bound `i128: CellSerialize` is not satisfied
  --> src\main.rs:14:5
   |
14 |     #[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `CellSerialize` is not implemented for `i128`

One more detail worth mentioning is why some places use quote!{ code } and others quote!{{ code }}. The answer is simple: the second form creates a separate block of code, which isolates any local variables declared inside it so they don't clash with the ones defined when serializing the next field.

A disputable question

Would I have been better off using macros by example? On one hand, they could parse the text describing the order of fields more easily. On the other hand, it would be hard to use field names as expressions (since those macros are hygienic). And finally, I wouldn't have learned how procedural macros work and wouldn't have written this article!

DEV Community
