DEV Community

Tomasz Wegrzanowski

100 Languages Speedrun: Episode 76: Python SLY

So far we've tried two parser generators for Python: PLY and ANTLR 4 (which took two episodes: one, two).

Time for another one: SLY, a successor to PLY. It doesn't do anything special parsing-wise. It's just another run-of-the-mill LR-style parser generator, and its main selling point is a much nicer Python interface.

Math Language Parser

Let's start with something very simple: a program to parse and run our "math" language, the same one I created 7 versions of in the ANTLR 4 episodes. In SLY it's much more concise:

#!/usr/bin/env python3

from sly import Lexer, Parser
import sys

class MathLexer(Lexer):
  tokens = { PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN, NUM, ID }
  ignore = " \t\r\n"
  PLUS = r"\+"
  MINUS = r"-"
  TIMES = r"\*"
  DIVIDE = r"/"
  LPAREN = r"\("
  RPAREN = r"\)"
  NUM = r"-?[0-9]+(\.[0-9]*)?"
  ID = r"[a-zA-Z_][a-zA-Z0-9_]*"

class MathParser(Parser):
  tokens = MathLexer.tokens

  def __init__(self):
    self.vars = {}

  @_("expr PLUS term")
  def expr(self, p):
    return p.expr + p.term

  @_("expr MINUS term")
  def expr(self, p):
    return p.expr - p.term

  @_("term")
  def expr(self, p):
    return p.term

  @_("term TIMES factor")
  def term(self, p):
    return p.term * p.factor

  @_("term DIVIDE factor")
  def term(self, p):
    return p.term / p.factor

  @_("factor")
  def term(self, p):
    return p.factor

  @_("LPAREN expr RPAREN")
  def factor(self, p):
    return p.expr

  @_("NUM")
  def factor(self, p):
    return float(p.NUM)

  @_("ID")
  def factor(self, p):
    return self.getVar(p.ID)

  def getVar(self, name):
    if name not in self.vars:
      self.vars[name] = float(input(f"Enter value for {name}: "))
    return self.vars[name]

if __name__ == "__main__":
  path = sys.argv[1]
  with open(path) as f:
    text = f.read()
    lexer = MathLexer()
    parser = MathParser()
    result = parser.parse(lexer.tokenize(text))
    print(result)

We can run it on the same three examples:

a.math - operator precedence test:

300 + 50 * 4 + 80 / 4 - (80 - 30) * 2

miles_to_km.math - unit converter:

miles * 1.60934

circle_area.math - a test program to verify it asks for the same variable only once:

3.14159265359 * r * r

And we can try to run it:

$ ./math.py math/a.math
420.0
$ ./math.py math/miles_to_km.math
Enter value for miles: 420
675.9228
$ ./math.py math/circle_area.math
Enter value for r: 69
14957.12262374199
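The a.math result also confirms that precedence is handled correctly. With * and / binding tighter than + and -, the expression reduces to 420. Evaluating strictly left to right (still honoring the parentheses) would give 640 instead:

```latex
300 + 50 \times 4 + 80 / 4 - (80 - 30) \times 2 = 300 + 200 + 20 - 100 = 420
((((300 + 50) \times 4 + 80) / 4) - (80 - 30)) \times 2 = (370 - 50) \times 2 = 640
```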

Let's follow how it works step by step:

Lexer

The lexer is the part responsible for chopping the input text into tokens, so input like 2 + 3 * 4 becomes [NUM(2), PLUS, NUM(3), TIMES, NUM(4)].
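To make that concrete without relying on SLY's internals, here's a minimal stdlib-only sketch of the same idea. It joins the MathLexer patterns, in the same order, into one regular expression with named groups, and then walks the input with it. TOKEN_SPECS, MASTER and tokenize are names made up for this sketch. This is not SLY's actual implementation, just an approximation of its behavior:

```python
import re

# Same token patterns as MathLexer, in the same order.
# The NUM group is made non-capturing so only the named groups capture.
TOKEN_SPECS = [
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIVIDE", r"/"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NUM", r"-?[0-9]+(?:\.[0-9]*)?"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
]
IGNORE = " \t\r\n"

# One combined regex; the alternatives are tried left to right.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPECS))


def tokenize(text):
    """Yield (type, value) pairs, skipping characters listed in IGNORE."""
    pos = 0
    while pos < len(text):
        if text[pos] in IGNORE:
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"Illegal character {text[pos]!r} at index {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()


print(list(tokenize("2 + 3 * 4")))
# [('NUM', '2'), ('PLUS', '+'), ('NUM', '3'), ('TIMES', '*'), ('NUM', '4')]
```

The sketch makes one detail visible. Alternatives are tried in order, and MINUS comes before NUM, so the -? prefix in the NUM pattern never gets a chance to match: -5 comes out as MINUS followed by NUM(5). SLY's documentation says it also matches tokens in the order they're listed in the class. If that holds, the real lexer behaves the same way, and the grammar as written can't accept a negative literal (you'd write 0 - 5 instead).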

Our lexer is very tiny. We just define a set of 8 tokens in our language and a regular expression for each of them, plus an ignore string listing the whitespace characters to skip between tokens:

class MathLexer(Lexer):
  tokens = { PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN, NUM, ID }
  ignore = " \t\r\n"
  PLUS = r"\+"
  MINUS = r"-"
  TIMES = r"\*"
  DIVIDE = r"/"
  LPAREN = r"\("
  RPAREN = r"\)"
  NUM = r"-?[0-9]+(\.[0-9]*)?"
  ID = r"[a-zA-Z_][a-zA-Z0-9_]*"

We could add some error handling to the lexer with an error method. Also, SLY apparently wants us to maintain self.lineno manually for error messages. That's weirdly common for parser generators, and it feels really out of place in a language like Python, but I skipped that part.

We could also do some pre-processing here, like converting the value carried by NUM from a string to a float. It's not really necessary, though, since we can do that in the parsing stage as well.

Parser

The parser has very usual rules, but it encodes them in a very unusual way:

class MathParser(Parser):
  tokens = MathLexer.tokens

  def __init__(self):
    self.vars = {}

  @_("expr PLUS term")
  def expr(self, p):
    return p.expr + p.term

  @_("expr MINUS term")
  def expr(self, p):
    return p.expr - p.term

  @_("term")
  def expr(self, p):
    return p.term

  @_("term TIMES factor")
  def term(self, p):
    return p.term * p.factor

  @_("term DIVIDE factor")
  def term(self, p):
    return p.term / p.factor

  @_("factor")
  def term(self, p):
    return p.factor

  @_("LPAREN expr RPAREN")
  def factor(self, p):
    return p.expr

  @_("NUM")
  def factor(self, p):
    return float(p.NUM)

  @_("ID")
  def factor(self, p):
    return self.getVar(p.ID)

  def getVar(self, name):
    if name not in self.vars:
      self.vars[name] = float(input(f"Enter value for {name}: "))
    return self.vars[name]

You might have noticed a lot of methods with the same name. That's how SLY encodes alternatives. To say that expr can be one of three things (expr PLUS term, expr MINUS term, or term), you define three expr methods, each with a different @_ decorator.
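Normally, Python keeps only the last of several same-named methods, so SLY has to collect them while the class body is still running. Here's a toy, stdlib-only sketch of how that trick can work: a metaclass whose __prepare__ returns a custom namespace dict that records every decorated function and also supplies the `_` decorator. RuleNamespace, _rule, GrammarMeta and MathGrammar are hypothetical names. SLY does something along these lines with its own metaclass, but the details differ:

```python
class RuleNamespace(dict):
    """Class-body namespace that records every decorated rule, even repeats."""

    def __init__(self):
        super().__init__()
        self.rules = []  # (nonterminal, production, function), in definition order

    def __setitem__(self, key, value):
        if callable(value) and hasattr(value, "production"):
            self.rules.append((key, value.production, value))
        super().__setitem__(key, value)


def _rule(production):
    """Decorator that only tags a function with its production string."""
    def decorate(func):
        func.production = production
        return func
    return decorate


class GrammarMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        ns = RuleNamespace()
        ns["_"] = _rule  # makes `_` resolvable inside the class body
        return ns

    def __new__(mcls, name, bases, ns, **kwargs):
        attrs = dict(ns)
        attrs.pop("_", None)  # don't leave the decorator on the class
        cls = super().__new__(mcls, name, bases, attrs)
        cls.rules = ns.rules
        return cls


class MathGrammar(metaclass=GrammarMeta):
    @_("expr PLUS term")
    def expr(self, p): ...

    @_("expr MINUS term")
    def expr(self, p): ...

    @_("term")
    def expr(self, p): ...


for name, production, func in MathGrammar.rules:
    print(f"{name} : {production}")
```

MathGrammar.expr itself still ends up being just the last definition, as usual in Python. The other two survive only in the list the namespace collected.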

Each of those methods receives a match argument, p. If a certain sub-match occurs once, you can refer to it as p.expr or similar. If it occurs multiple times, you need to use p.expr0, p.expr1, and so on.

The __init__ and getVar methods are specific to our math program and aren't related to SLY.

Running it

And finally we run the program like this:

if __name__ == "__main__":
  path = sys.argv[1]
  with open(path) as f:
    text = f.read()
    lexer = MathLexer()
    parser = MathParser()
    result = parser.parse(lexer.tokenize(text))
    print(result)

There's no tokenizeFile, so we need to read the file manually. Once we've done that, we can create the Lexer and Parser objects and call result = parser.parse(lexer.tokenize(text)). The API is very simple.

Should you use SLY?

SLY is not going to win any Excellence in Parsing awards. It's just another LR parser generator, so it comes with all the usual LR issues:

- limited power,
- cryptic shift/reduce conflict messages if you mess up the grammar,
- poor error messages and error recovery for user syntax errors.

On the other hand, it has a fantastic Python API. If you want to build something fast that's not too complicated, it's much less work than figuring out ANTLR 4 or writing your own parser by hand.
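For comparison, here's roughly what "writing your own parser by hand" means for the same math grammar: a minimal recursive descent sketch in plain Python, with no SLY. It makes two assumptions to stay short and testable:

- Variables come from a dict instead of input() prompts, so unknown names raise KeyError.
- The -? prefix on numbers is dropped.

Recursive descent can't use left-recursive rules like expr PLUS term directly, so each of those becomes a loop. RecursiveDescent, tokenize and evaluate are hypothetical names:

```python
import re

# One token per match: a number, a name, or a one-character operator.
# The \s* prefix skips whitespace before each token.
TOKEN_RE = re.compile(
    r"\s*(?:([0-9]+(?:\.[0-9]*)?)|([a-zA-Z_][a-zA-Z0-9_]*)|([-+*/()]))"
)


def tokenize(text):
    """Return a list of (kind, value) pairs; operators use themselves as kind."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"Unexpected character at index {pos}")
        num, name, op = m.groups()
        if num is not None:
            tokens.append(("NUM", float(num)))
        elif name is not None:
            tokens.append(("ID", name))
        else:
            tokens.append((op, op))
        pos = m.end()
    return tokens


class RecursiveDescent:
    def __init__(self, tokens, variables):
        self.tokens = tokens
        self.pos = 0
        self.variables = variables

    def peek(self):
        """Kind of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self):
        if self.peek() is None:
            raise SyntaxError("Unexpected end of input")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        # expr : term (("+" | "-") term)*, a loop instead of left recursion
        value = self.term()
        while self.peek() in ("+", "-"):
            op, _ = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term : factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, _ = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor : "(" expr ")" | NUM | ID
        kind, value = self.take()
        if kind == "NUM":
            return value
        if kind == "ID":
            return self.variables[value]
        if kind == "(":
            inner = self.expr()
            if self.peek() != ")":
                raise SyntaxError("Expected ')'")
            self.take()
            return inner
        raise SyntaxError(f"Unexpected token {kind!r}")


def evaluate(text, variables=None):
    parser = RecursiveDescent(tokenize(text), variables or {})
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("Unexpected trailing input")
    return result


print(evaluate("300 + 50 * 4 + 80 / 4 - (80 - 30) * 2"))  # 420.0
```

Even this small version needs explicit token bookkeeping (peek, take, end-of-input and trailing-input checks) that SLY's grammar rules handle for you.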

I wouldn't recommend it for more complex languages, but there are a lot of simple cases SLY is a perfect fit for.

Code

All code examples for the series will be in this repository.

Code for the Python SLY episode is available here.
