DEV Community

Elizabeth Mattijsen
Elizabeth Mattijsen

Posted on • Updated on

A practical example of RakuAST

A while ago someone asked on #raku if it is possible to create a Raku character class with the valid characters being supplied by a string. This is not possible by default in Raku at the moment. But it is possible using RakuAST!

Let's first see how one can create characters classes with RakuAST by applying the .AST method to an example.

By the way, all of these examples assume that a use experimental :rakuast; or use v6.e.PREVIEW is active.

say 'my token {<[abc]>}'.AST.statements.head.expression;
Enter fullscreen mode Exit fullscreen mode

What this does is

  • create the AST (.AST) for an anonymous token with a charclass for the letters "a", "b" and "c"
  • then skips to the first statement (.statements.head)
  • then skips to its .expression

because the .AST returns a statement list, and we're only interested in the expression of the first statement.

The result is:

RakuAST::TokenDeclaration.new(
  body => RakuAST::Regex::Assertion::CharClass.new(
    RakuAST::Regex::CharClassElement::Enumeration.new(
      elements => (
        RakuAST::Regex::CharClassEnumerationElement::Character.new("a"),
        RakuAST::Regex::CharClassEnumerationElement::Character.new("b"),
        RakuAST::Regex::CharClassEnumerationElement::Character.new("c"),
      )
    )
  )
);
Enter fullscreen mode Exit fullscreen mode

So it looks like each character in the charclass is a separate RakuAST::Regex::CharClassEnumerationElement::Character object. With this knowledge, it is pretty easy to make a custom token for the characters in a given string. Let's create a subroutine "chars-matcher" that will create a token with a charclass of the characters for a given string:

sub chars-matcher($string) {
    my @elements = $string.comb.unique.map: {
        RakuAST::Regex::CharClassEnumerationElement::Character.new($_)
    }
    RakuAST::TokenDeclaration.new(
      body => RakuAST::Regex::Assertion::CharClass.new(
        RakuAST::Regex::CharClassElement::Enumeration.new(:@elements)
      )
    ).EVAL
}
Enter fullscreen mode Exit fullscreen mode

First we create an array "@elements" and fill that with an enumeration object for each unique char in the given string ($string.comb.unique). And then we create the TokenDeclaration object as from the example, but with the @elements array as the specification of the characters. And then we convert that into an actual usable token by running .EVAL on it.

An example of its usage:

my $matcher = chars-matcher("Anna Mae Bullock");
say "Tina Turner" ~~ $matcher;       # 「n」
say "Tina Turner" ~~ / $matcher+ /;  # 「na 」
Enter fullscreen mode Exit fullscreen mode

As you can see, you can use the generated token direct in a smart-match. Or you can use it as part of a more complicated regex.

If you're ready to further dabble with RakuAST, it is probably a good idea to know a little bit of how it is currently being implemented. So let's dive a little bit into that, to better understand some of the errors you might encounter when writing Raku Programming Language code to create ASTs.

Prerequisites for RakuAST

It is the intent to have all Raku source code be parsed by the (new) Raku grammar (and associated Actions) in the future, just as it is now with the legacy grammar. Since the Raku core setting (which contains the Raku code for most of the implementation of the Raku Programming Language) must also be parsed by this, it implies that the RakuAST classes must exist before there is any Raku Programming Language.

This is a chicken-and-egg problem that is solved in Rakudo by the so-called "bootstrap". This is quite a sizeable chunk of NQP code that "manually" creates enough functionality to allow the Raku core setting to build itself up to a fully functional implementation of the Raku Programming Language.

When Jonathan Worthington started the RakuAST project, they didn't want to have to implement all of that functionality in NQP yet again. So they devised a neat hack by creating a rather simple parser that would read Raku-like code, and convert that to NQP source code that would create all of the 360+ RakuAST classes when run. Of course, that Raku-like code still does not have the full Raku capabilities, but it does make the implementation task a lot easier than it would have been if it had be all written in NQP.

Example of a RakuAST class' internals

Let's have look a simple example of such a RakuAST class. For instance, the RakuAST::StrLiteral class that we've seen earlier in the first instalment.

class RakuAST::StrLiteral is RakuAST::Literal {
    has Str $.value;

    method new(Str $value) {
        my $obj := nqp::create(self);
        nqp::bindattr($obj, RakuAST::StrLiteral, '$!value', $value);
        $obj
    }

    # other methods

    method IMPL-EXPR-QAST(RakuAST::IMPL::QASTContext $context) {
        my $value := $!value;
        $context.ensure-sc($value);
        my $wval := QAST::WVal.new( :$value );
        QAST::Want.new($wval,'Ss', QAST::SVal.new(:value(nqp::unbox_s($value))))
    }
}
Enter fullscreen mode Exit fullscreen mode

For simplicity's sake, only two methods are shown. The new method, taking a single positional Str $value. Which needs some NQP to create the object and bind the value to the $!value attribute.

And we see an IMPL-EXPR-QAST method. That method will be called whenever that RakuAST object needs to generate QAST (a precursor to actual bytecode) for itself.

If you find this very interesting, you probably want to read the RakuAST README. And the actual source code of the RakuAST classes can be found in the same directory. And if you're really feeling adventurous and you have the Rakudo repository checked out, you can have a look at the generated NQP code in gen/moar/ast.nqp.

What's the point?

So why am I even mentioning this? Because the RakuAST classes look like they're actual Raku classes, but they're really NQP subroutines wrapped up to appear like Raku classes. Which results in unexpected failure modes if there's some error in your calls to RakuAST classes. In other words: the edges are a little bit sharper with RakuAST classes, and LTA error messages can and will happen. It's one of the "benefits" of living on the edge!

Of course, as a user of RakuAST classes, you should only be interested in the new method, and any other non-internal methods. Sadly, it is way too early in the bootstrap to mark internal methods with an is implementation-detail trait, so another heuristic is needed. And that would be: "consider any ALL-CAPS methods to be off-limits".

Conclusion

This installment gives an actual example of how you can use RakuAST in your code today. And gives some technical background on the implementation of RakuAST classes, which still is a little sharp around the edges.

The intended audience are those people willing to be early adopters of these exciting new features in the Raku Programming Language.

Top comments (0)