Yuan Gao

Posted on Dec 19, 2020 • Edited on Dec 22, 2020

Advent of Code 2020: Day 18 finally using PEG grammar in Python in the way it's supposed to

#adventofcode #python

Another quick one for today. The challenge is actually about parsing expressions. So I pull out the Parsimonious library again, and this time use it for its intended use, rather than all the times I used it on previous challenges (Day 07 and Day 04)

The Challenge Part 1

Link to challenge on Advent of Code 2020 website

The challenge is to implement a simple math expression parser, but with some unconventional operator precedence rules. An example of an expression is given as:

1 + (2 * 3) + (4 * (5 + 6))

The parser must solve a big file of these, but one in which operators are evaluated left-to-right rather than multiplication before addition, so 1 + 2 * 3 + 4 * 5 + 6 equals 71 rather than the usual 33

PEG Grammar

Finally, we get to use Parsimonious for its intended purpose. We'll write a PEG grammar to deal with these expressions. I'm going to skip over the explanation pretty quickly, more details for how this works can be found in my Day 04 and Day 07 solution posts.

The first thing I note is that the formatting is very consistent, and expressions are always NUMBER SPACE OPERATOR SPACE NUMBER. So we go ahead and define these:

    OP = " " ("+" / "*") " "
    NUMBER    = ~r"\d"

Next, our operators look like this:

    OPERATION = NUMBER (OP NUMBER)+

Which is saying that the operation consists of an expression followed by one or more pairs of OP NUMBER. E.g. 1 + 2 * 3 + 4 would give us the following parts:

[1, [["+", 2], ["*", 3], ["+", 4]]

Finally, we realize that we can't just deal with NUMBER, and sometimes we have to deal with BRACKETS first, so we replace it with an EXPR which can itself be either NUMBER or BRACKETS. This gives us the whole PEG expression:

from parsimonious.grammar import Grammar

grammar = Grammar(r"""
    OPERATION = EXPR (OP EXPR)+
    EXPR = (NUMBER / BRACKETS)
    BRACKETS = "(" OPERATION ")"
    OP = " " ("+" / "*") " "
    NUMBER    = ~r"\d"
""")

When we test it on the expression 1 + (2 * 3) we get:

print(grammar.parse("""1 + (2 * 3)"""))

Output

<Node called "OPERATION" matching "1 + (2 * 3)">
    <Node called "EXPR" matching "1">
        <RegexNode called "NUMBER" matching "1">
    <Node matching " + (2 * 3)">
        <Node matching " + (2 * 3)">
            <Node called "OP" matching " + ">
                <Node matching " ">
                <Node matching "+">
                    <Node matching "+">
                <Node matching " ">
            <Node called "EXPR" matching "(2 * 3)">
                <Node called "BRACKETS" matching "(2 * 3)">
                    <Node matching "(">
                    <Node called "OPERATION" matching "2 * 3">
                        <Node called "EXPR" matching "2">
                            <RegexNode called "NUMBER" matching "2">
                        <Node matching " * 3">
                            <Node matching " * 3">
                                <Node called "OP" matching " * ">
                                    <Node matching " ">
                                    <Node matching "*">
                                        <Node matching "*">
                                    <Node matching " ">
                                <Node called "EXPR" matching "3">
                                    <RegexNode called "NUMBER" matching "3">
                    <Node matching ")">

This looks really long, but when you strip out all the extra bits, you can see that it contains the structure we want.

Next, we implement a node visitor to execute the parsed syntax tree.

class ExprVisitor(NodeVisitor):
    def visit_OPERATION(self, node, visited_children):
        left = visited_children[0]
        for op, value in visited_children[1]:
            if op == "+":
                left += value
            if op == "*":
                left *= value
        return left

    def visit_EXPR(self, node, visited_children):
        return visited_children[0]

    def visit_BRACKETS(self, node, visited_children):
        return visited_children[1]

    def visit_OP(self, node, visited_children):
        return visited_children[1][0].text

    def visit_NUMBER(self, node, visited_children):
        return int(node.text)

    def generic_visit(self, node, visited_children):
        return visited_children or node

ev = ExprVisitor()
ev.grammar = grammar

Most of the lower nodes deal with formatting the nodes so that higher up nodes have clean data to work with. The main part of this that implements the challenge rules is the visit_OPERATION() function:

    def visit_OPERATION(self, node, visited_children):
        left = visited_children[0]
        for op, value in visited_children[1]:
            if op == "+":
                left += value
            if op == "*":
                left *= value
        return left

This function will have all the EXPR children evaluated to values already (which means BRACKETS will have been resolved recursively to their values). Recall the earlier example of 1 + 2 * 3 + 4, when it gets to this function, visited_children would be

[1, [["+", 2], ["*", 3], ["+", 4]]

So it can be seen that left is set to 1 and then for each pair of operator and value, it'll add or multiply the value onto the running total depending on the operator. The way this is written, the operators will evaluate left to right, as required by the challenge.

Finally, we run the lines of the input through the parser, which returns the evaluated value:

print("sum", sum(ev.parse(line.strip()) for line in open("input.txt").readlines()))

The output is the answer the challenge is looking for. The full code is:

from parsimonious.grammar import Grammar, NodeVisitor
grammar = Grammar(r"""
    OPERATION = EXPR (OP EXPR)+
    EXPR = (NUMBER / BRACKETS)
    BRACKETS = "(" OPERATION ")"
    OP = " " ("+" / "*") " "
    NUMBER    = ~r"\d"
""")

class ExprVisitor(NodeVisitor):
    def visit_OPERATION(self, node, visited_children):
        left = visited_children[0]
        for op, value in visited_children[1]:
            if op == "+":
                left += value
            if op == "*":
                left *= value
        return left

    def visit_EXPR(self, node, visited_children):
        return visited_children[0]

    def visit_BRACKETS(self, node, visited_children):
        return visited_children[1]

    def visit_OP(self, node, visited_children):
        return visited_children[1][0].text

    def visit_NUMBER(self, node, visited_children):
        return int(node.text)

    def generic_visit(self, node, visited_children):
        return visited_children or node

ev = ExprVisitor()
ev.grammar = grammar

print("sum", sum(ev.parse(line.strip()) for line in open("input.txt").readlines()))

The Challenge Part 2

In the second part of the challenge, the precedence is switched. Now instead of evaluating left to right, additions are evaluated first.

We can use the same code as before, we just have to change the visit_OPERATION() function to evaluate the additions before multiplications. It ends up looking like this:

    def visit_OPERATION(self, node, visited_children):
        parts = [visited_children[0]]
        for op, value in visited_children[1]:
            if op == "+":
                parts[-1] += value
            if op == "*":
                parts.append(value)
        return math.prod(parts)

Instead of keeping a running total onto which each new value is added or multiplied, here, we store a list of values, and for every addition operation we see, immediately add that to the last entry on the list. If we see a multiply, append to the list. This way, all the addition operations happen as we go, leaving only values to multiply in the list to the end, when they're all multiplied together, fulfilling the rules of the challenge.

The full code is, mostly the same as before:

from parsimonious.grammar import Grammar, NodeVisitor
import math

grammar = Grammar(r"""
    OPERATION = EXPR (OP EXPR)+
    EXPR = (NUMBER / BRACKETS)
    BRACKETS = "(" OPERATION ")"
    OP = " " ("+" / "*") " "
    NUMBER    = ~r"\d"
""")

class ExprVisitor(NodeVisitor):
    def visit_OPERATION(self, node, visited_children):
        parts = [visited_children[0]]
        for op, value in visited_children[1]:
            if op == "+":
                parts[-1] += value
            if op == "*":
                parts.append(value)
        return math.prod(parts)

    def visit_EXPR(self, node, visited_children):
        return visited_children[0]

    def visit_BRACKETS(self, node, visited_children):
        return visited_children[1]

    def visit_OP(self, node, visited_children):
        return visited_children[1][0].text

    def visit_NUMBER(self, node, visited_children):
        return int(node.text)

    def generic_visit(self, node, visited_children):
        return visited_children or node

ev = ExprVisitor()
ev.grammar = grammar
print("sum", sum(ev.parse(line.strip()) for line in open("input.txt").readlines()))

That's all! Parsimonious is a fun library to play with for any parsing jobs. I can highly recommend it

Onwards!

DEV Community

Advent of Code 2020: Day 18 finally using PEG grammar in Python in the way it's supposed to

The Challenge Part 1

PEG Grammar

The Challenge Part 2

Top comments (0)

Read next

Python Asynchronous Programming: Simplifying Concurrency Like a Pro

Automate Saving the Planet... Or Just Your Computer's Energy 🐍

This Week In Python

Building a 🧾 PDF Invoice Generator with 📨 Twilio SendGrid, 🐍 Python, Flask, and 🖌️ Bootstrap