DEV Community

Python Lark Parser introduction

Vicente Maldonado. Originally published at Medium.

Lark is a Python parsing library. Unlike parser generators such as Yacc, it doesn’t generate a source code file from a grammar; the parser is built dynamically at runtime. Let’s see how it works. You import Lark:

from lark import Lark

then specify the grammar:

grammar = """
start: WORD "," WORD "!"
%import common.WORD
%ignore " "
"""

The grammar can be a Python string or read from a separate file. After that, just create a Lark class instance, initializing it with the grammar:

parser = Lark(grammar)

and you are ready to parse:

def main():
    print(parser.parse("Hello, world!"))
    print(parser.parse("Adios, amigo!"))

if __name__ == '__main__':
    main()

parser.parse returns a Tree instance containing the parse tree:

Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'world')])
Tree(start, [Token(WORD, 'Adios'), Token(WORD, 'amigo')])

That’s it, clean and simple. It’s up to you to decide what to do with the parsed string. Let’s see where we can go from there. Here is an example of a simple arithmetic expression parser:

from lark import Lark

grammar = """
start: add_expr
     | sub_expr

add_expr: NUMBER "+" NUMBER

sub_expr: NUMBER "-" NUMBER

%import common.NUMBER
%ignore " "
"""

The grammar ignores spaces. Also note that in Lark, terminal names are written in uppercase letters (NUMBER) while rule names are written in lowercase letters (start, add_expr and sub_expr). %import and %ignore are directives; you can find the full grammar reference in the Lark documentation. %import lets us reuse definitions from other grammars, in this case from common.lark, which just contains some useful definitions such as NUMBER and WORD. The above grammar will successfully parse a single addition or subtraction of two numbers, like:

1+1
2-1
3 - 2

and nothing else. Next, create the Lark object:

parser = Lark(grammar)

and we are ready to parse:

def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))    

if __name__ == '__main__':
    main()

The output is as expected:

Tree(start, [Tree(add_expr, [Token(NUMBER, '1'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '2'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '3'), Token(NUMBER, '2')])])

Note that this example just prints the parse tree, as before. Let’s transform it into something more useful:

from lark import Lark, Transformer

grammar = """
start: add_expr
     | sub_expr

add_expr: NUMBER "+" NUMBER -> add_expr

sub_expr: NUMBER "-" NUMBER -> sub_expr

%import common.NUMBER
%ignore " "
"""

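The add_expr and sub_expr names after the arrows (->) are aliases. An alias sets the name of the tree node a rule produces, and a Transformer applies the method whose name matches that node. Here each alias is the same as its rule name, so the arrows don’t change the result, but they make the link to the transformer methods explicit. Let’s write those methods: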

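class CalcTransformer(Transformer):

    # Called for each add_expr node; args holds its two NUMBER tokens.
    def add_expr(self, args):
        return int(args[0]) + int(args[1])

    # Called for each sub_expr node; args holds its two NUMBER tokens.
    def sub_expr(self, args):
        return int(args[0]) - int(args[1])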

For instance, when parsing

2-1

args[0] will contain the token "2" and args[1] the token "1" (Lark tokens are strings). In our transformer methods we convert both to integers, add or subtract them, and return the result. Now create the Lark object:

parser = Lark(grammar, parser='lalr', 
    transformer=CalcTransformer())

Passing a transformer directly to the Lark constructor is supported with the LALR parser but not with the default Earley parser, which is why we set parser='lalr'. We are finally ready to parse:

def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))

if __name__ == '__main__':
    main()

The output is now:

Tree(start, [2])
Tree(start, [1])
Tree(start, [1])

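Better? 1+1 is 2, 2-1 is 1 and 3-2 is also 1. The numbers are still wrapped in a start node because CalcTransformer has no method for the start rule.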

Of course, this is just scratching the surface. If you are interested, you can find the full examples on GitHub.
