Python Lark Parser introduction

Vicente Maldonado
Originally published at Medium

Lark is a Python parsing library. Unlike parser generators such as Yacc, it doesn’t generate a source code file from a grammar; the parser is built dynamically at run time. Let’s see how it works. You import Lark:

from lark import Lark

then specify the grammar:

grammar = """
start: WORD "," WORD "!"
%import common.WORD
%ignore " "
"""

The grammar can be a Python string or read from a separate file. After that, just create a Lark class instance, initializing it with the grammar:

parser = Lark(grammar)

and you are ready to parse:

def main():
    print(parser.parse("Hello, world!"))
    print(parser.parse("Adios, amigo!"))

if __name__ == '__main__':
    main()

parser.parse returns a Tree instance containing the parse tree:

Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'world')])
Tree(start, [Token(WORD, 'Adios'), Token(WORD, 'amigo')])

That’s it, clean and simple. It’s up to you to decide what to do with the parsed string. Let’s see where we can go from there. Here is an example of a simple arithmetic expression parser:

from lark import Lark

grammar = """
start: add_expr
     | sub_expr

add_expr: NUMBER "+" NUMBER

sub_expr: NUMBER "-" NUMBER

%import common.NUMBER
%ignore " "
"""

The grammar ignores spaces. Also note that terminals are written in uppercase letters (NUMBER), while rules are written in lowercase letters (start, add_expr and sub_expr). %import and %ignore are directives. You can find the grammar reference in the Lark documentation. We can import definitions from other grammars, in this case common.lark, which ships with Lark and contains some useful definitions. The above grammar will successfully parse addition and subtraction expressions, like:

1+1
2-1
3 - 2

and nothing else. Next, create the Lark object:

parser = Lark(grammar)

and we are ready to parse:

def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))    

if __name__ == '__main__':
    main()

The output is as expected:

Tree(start, [Tree(add_expr, [Token(NUMBER, '1'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '2'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '3'), Token(NUMBER, '2')])])

Note that this example just prints the parse tree as before. Let’s transform it to something more useful:

from lark import Lark, Transformer

grammar = """
start: add_expr
     | sub_expr

add_expr: NUMBER "+" NUMBER -> add_expr

sub_expr: NUMBER "-" NUMBER -> sub_expr

%import common.NUMBER
%ignore " "
"""

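The -> add_expr and -> sub_expr suffixes on the right-hand side of the grammar rules are aliases: they name the tree node a rule produces, and therefore the transformer method that is called once the rule has been matched. (Here each alias is the same as the rule name, so the nodes would be named add_expr and sub_expr even without it.) Let’s write the methods: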

class CalcTransformer(Transformer):

    def add_expr(self, args):
        return int(args[0]) + int(args[1])

    def sub_expr(self, args):
        return int(args[0]) - int(args[1])

Each method receives the children of the matched rule as a list, args. For instance, when parsing

2-1

args[0] will contain "2" and args[1] will contain "1". In our transformer methods we convert both to integers, add or subtract them, and return the result. Now create the Lark object:

parser = Lark(grammar, parser='lalr', 
    transformer=CalcTransformer())

To accept a transformer in its constructor, the parser needs to be a LALR parser (parser='lalr'). We are finally ready to parse:

def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))

if __name__ == '__main__':
    main()

The output is now:

Tree(start, [2])
Tree(start, [1])
Tree(start, [1])

Better? 1+1 is 2, 2-1 is 1 and 3-2 is also 1.

Of course, this is just scratching the surface. If you are interested, you can find the full examples on GitHub.
