DEV Community

Cover image for PHP Skeleton for Bison
Anton Sukhachev
Anton Sukhachev

Posted on • Updated on

PHP Skeleton for Bison

What is Bison?

Bison is a parser generator.
For example, it can help you to build a parser to parse your code into AST:

<?php

namespace App;

class Test
{
    public function test($foo) {}
}

Enter fullscreen mode Exit fullscreen mode
.
├── ZEND_AST_STMT_LIST
    ├── ZEND_AST_NAMESPACE
    │   └── ZEND_AST_ZVAL 'App'
    └── ZEND_AST_CLASS 'Test'
        └── ZEND_AST_STMT_LIST
            └── ZEND_AST_METHOD 'test'
                └── ZEND_AST_PARAM_LIST
                    └── ZEND_AST_PARAM
                        └── ZEND_AST_ZVAL 'foo'
Enter fullscreen mode Exit fullscreen mode

There is software that uses Bison:

How Bison works?

Bison

Bison takes your grammar.y file, parses it, extracts all definitions, and then constructs a bunch of tables like this:

$yytable = [
    6, 3, 7, 20, 8, 51, 28, 1, 52, 4,
    9, 13, 10, 29, 15, 30, 18, 31, 16, 19,
    32, 22, 33, 34, 23, 24, 35, 11, 37, 25,
    21, 38, 39, 26, 45, 0, 40, 42, 0, 43,
    41, 0, 0, 49, 0, 0, 0, 0, 0, 47,
    48, 0, 50, 0, 53, 54
];
Enter fullscreen mode Exit fullscreen mode

Then, this data is passed to a template that is called skeleton.
This skeleton is a special file written in M4 language that renders your parser.php file.
By default, Bison supports C/C++/D/Java languages, but you can extend it with your own skeleton file.

Simple calculator

Let's make a simple calculator with Bison and PHP. It will parse expression from stdin and print result.

First, we must install PHP skeleton package.

composer require --dev mrsuh/php-bison-skeleton
Enter fullscreen mode Exit fullscreen mode

Then we define a simple grammar file.

grammar.y

%define api.parser.class {Parser}

%token T_NUMBER

%left '-' '+'

%%
start:
  exp                { printf("%f\n", $1); }
;

exp:
  T_NUMBER           { $$ = $1; }
| exp '+' exp        { $$ = $1 + $3;  }
| exp '-' exp        { $$ = $1 - $3;  }
;

%%
Enter fullscreen mode Exit fullscreen mode

Let's build a parser from grammar.y:

bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o parser.php grammar.y
Enter fullscreen mode Exit fullscreen mode

Command options:

  • -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 - path to skeleton file
  • -o parser.php - output parser file
  • grammar.y - our grammar file

For now, parser.php does nothing. But we can see the interface LexerInterface inside it:

parser.php

interface LexerInterface
{
    public const YYEOF = 0;    
    public const YYerror = 256;    
    public const YYUNDEF = 257;
    public const T_NUMBER = 258; /** %token T_NUMBER */
    ...
}
...
Enter fullscreen mode Exit fullscreen mode

Interface contains constants with our tokens from grammar.y file and some special values for the end of file or errors.
With this interface, we can write our class Lexer to parse calculation expressions into tokens.

class Lexer implements LexerInterface {

    private array $words;
    private int   $index = 0;
    private int   $value = 0;

    public function __construct($resource)
    {
        $this->words = explode(' ', trim(fgets($resource)));
    }

    public function getLVal()
    {
        return $this->value;
    }

    public function yylex(): int
    {
         $this->value = 0;

        if ($this->index >= count($this->words)) {
            return LexerInterface::YYEOF;
        }

        $word = $this->words[$this->index++];

        if (is_numeric($word)) {
            $this->value = (int)$word;

            return LexerInterface::T_NUMBER;
        }

        return ord($word);
    }
}
Enter fullscreen mode Exit fullscreen mode

Every time we call the function Lexer::yylex(), it will return the token's identifier and store the word into the value property.
For example, the expression 10 + 20 - 30 will translate into this:

word token value
10 LexerInterface::T_NUMBER (258) 10
+ ASCII (43)
20 LexerInterface::T_NUMBER (258) 20
- ASCII (45)
30 LexerInterface::T_NUMBER (258) 30
LexerInterface::YYEOF (0)

And the last part.
We must instantiate lexer, parser and bind them.

$lexer  = new Lexer(STDIN);
$parser = new Parser($lexer);
if (!$parser->parse()) {
    exit(1);
}
Enter fullscreen mode Exit fullscreen mode

Let's assemble all parts above into a single grammar.y file.

grammar.y

%define api.parser.class {Parser}

%token T_NUMBER

%left '-' '+'

%%
start:
  exp                { printf("%f\n", $1); }
;

exp:
  T_NUMBER           { $$ = $1; }
| exp '+' exp        { $$ = $1 + $3;  }
| exp '-' exp        { $$ = $1 - $3;  }
;

%%

class Lexer implements LexerInterface {

    private array $words;
    private int   $index = 0;
    private int   $value = 0;

    public function __construct($resource)
    {
        $this->words = explode(' ', trim(fgets($resource)));
    }

    public function getLVal()
    {
        return $this->value;
    }

    public function yylex(): int
    {
         $this->value = 0;

        if ($this->index >= count($this->words)) {
            return LexerInterface::YYEOF;
        }

        $word = $this->words[$this->index++];

        if (is_numeric($word)) {
            $this->value = (int)$word;

            return LexerInterface::T_NUMBER;
        }

        return ord($word);
    }
}

$lexer  = new Lexer(STDIN);
$parser = new Parser($lexer);
if (!$parser->parse()) {
    exit(1);
}
Enter fullscreen mode Exit fullscreen mode

Build it:

bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o parser.php grammar.y
Enter fullscreen mode Exit fullscreen mode

It's time to test our parser.

php parser.php <<< "1 + 2 - 3 + 4 - 5 + 6 - 7 + 8 - 9 + 10"
7
Enter fullscreen mode Exit fullscreen mode

It works!

Some useful links:

Top comments (0)