DEV Community

Adam Crockett 🌀


Jess update #1: lexing

Jess is still very much under way, and I have been learning a few things about parsing and Rust along the way.

A crucial part of any compiler is the lexer. For those who don't know, the lexer's sole responsibility is to identify individual tokens and push them into an array, which means that if a token is unknown, the lexer can throw an error. In Jess's case we don't throw an error yet. Instead, the Jess lexer places a PANIC token in the stream so that we know something was not understood. Jess can currently understand 5 tokens: the panic token, the ambiguous token, left curly, right curly, and the start of an import statement.
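To make that concrete, here is a minimal sketch of a lexer with this shape. It is not the libjess source: the `Token` enum, its variant names, the `lex` function and the choice of `import` as the keyword that starts an import statement are all assumptions made for illustration.

```rust
// Hypothetical token set, named after the five tokens described above.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftCurly,
    RightCurly,
    ImportStart,
    // Listed for completeness; this sketch never produces it.
    Ambiguous(String),
    // Carries the text that was not understood.
    Panic(String),
}

/// Walk the source once, pushing one token per recognised piece of input.
/// Unknown input becomes a Panic token instead of an error.
pub fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();

    while let Some(&c) = chars.peek() {
        match c {
            c if c.is_whitespace() => {
                chars.next();
            }
            '{' => {
                chars.next();
                tokens.push(Token::LeftCurly);
            }
            '}' => {
                chars.next();
                tokens.push(Token::RightCurly);
            }
            _ => {
                // Collect a run of characters up to whitespace or a curly.
                let mut word = String::new();
                while let Some(&n) = chars.peek() {
                    if n.is_whitespace() || n == '{' || n == '}' {
                        break;
                    }
                    word.push(n);
                    chars.next();
                }
                tokens.push(match word.as_str() {
                    // Assumed keyword for the start of an import statement.
                    "import" => Token::ImportStart,
                    _ => Token::Panic(word),
                });
            }
        }
    }
    tokens
}
```

Calling `lex` on input containing something unknown still returns a full token stream, with a `Panic` token marking the spot, so later stages can decide how to report it.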

Ambiguous

Tokens can be split into deterministic and nondeterministic identities. A deterministic token is known to the lexer up front, such as a semicolon. Nondeterministic tokens include values such as strings, numbers, (for Jess) CSS values and more, but crucially unknown tokens are also nondeterministic. They are Ambiguous because they require a bit of probing.
Ambiguous tokens get passed to an identity checker (a mini parser), which runs various, mostly regex-based, tests to work out the identity. The identified token is then returned to the token stream. If all identity checks fail, the panic token is inserted, as described earlier.
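The identity checker could be sketched roughly like this. The `Identity` enum, the `identify` function and the specific checks are hypothetical, and the checks are hand-written string predicates rather than the regex-based tests mentioned above, purely so the example needs nothing beyond the standard library.

```rust
// Hypothetical result of probing an ambiguous lexeme.
#[derive(Debug, Clone, PartialEq)]
pub enum Identity {
    Number(String),
    Str(String),
    CssLength(String),
    // Every check failed.
    Panic(String),
}

// Digits with at most one decimal point, and at least one digit.
fn is_number(s: &str) -> bool {
    s.chars().any(|c| c.is_ascii_digit())
        && s.chars().all(|c| c.is_ascii_digit() || c == '.')
        && s.matches('.').count() <= 1
}

// Wrapped in matching single or double quotes.
fn is_string(s: &str) -> bool {
    s.len() >= 2
        && ((s.starts_with('"') && s.ends_with('"'))
            || (s.starts_with('\'') && s.ends_with('\'')))
}

// A number followed by one of a few CSS units (an illustrative subset).
fn is_css_length(s: &str) -> bool {
    ["px", "em", "rem", "%"]
        .iter()
        .any(|&unit| s.strip_suffix(unit).map_or(false, is_number))
}

/// Run each identity check in turn; fall back to Panic if none match.
pub fn identify(lexeme: &str) -> Identity {
    let text = lexeme.to_string();
    if is_string(lexeme) {
        Identity::Str(text)
    } else if is_number(lexeme) {
        Identity::Number(text)
    } else if is_css_length(lexeme) {
        Identity::CssLength(text)
    } else {
        Identity::Panic(text)
    }
}
```

The order of the checks matters once identities overlap, so a real checker would probably want the most specific tests first.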

Jess, or more specifically libjess, is panicking at most of the syntax right now because I have a lot of grammar left to define and register with the lexer, i.e. I have to teach the lexer a language. Furthermore, Jess combines CSS and JavaScript in such a way that CSS values are on a par with any other primitive type. I therefore have to decide whether the lexer should know about types at all, or just lump them into a single VAL token to be determined by the parser. To know about types, the lexer would need contextual awareness when determining a type: is this a CSS * or a JS *? That is a story for another post.

EOF

Phew, we made it to the end of the file and of our while let loop, amazing! Because we made it to the end and nothing caught fire 🔥 we get a special token: EOF, end of file, hooray... Now what? The stream is serialised to JSON and returned to the TypeScript bindings (the front end of libjess). For now this is just written to stdout, but it could also be cached, source mapped, used by a VS Code language server, dumped for debugging and, most importantly, passed into the parser, which will also be written in Rust.
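As a rough illustration of that last step, the sketch below appends an EOF token and turns the stream into a JSON string. Everything here is an assumption: the `Token` variants, the `finish` and `to_json` helpers, and the JSON shape (`type` and `text` fields) are invented for the example, and the serialisation is written by hand only to keep it dependency-free, which may well differ from how libjess does it.

```rust
// Hypothetical, trimmed-down token set for this example.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftCurly,
    RightCurly,
    Panic(String),
    Eof,
}

/// Mark the end of the stream once the lexing loop has run out of input.
pub fn finish(mut tokens: Vec<Token>) -> Vec<Token> {
    tokens.push(Token::Eof);
    tokens
}

// Escape the characters that JSON does not allow raw inside a string.
fn escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialise the token stream as a JSON array of objects.
pub fn to_json(tokens: &[Token]) -> String {
    let items: Vec<String> = tokens
        .iter()
        .map(|t| match t {
            Token::LeftCurly => r#"{"type":"LEFT_CURLY"}"#.to_string(),
            Token::RightCurly => r#"{"type":"RIGHT_CURLY"}"#.to_string(),
            Token::Panic(text) => {
                format!(r#"{{"type":"PANIC","text":"{}"}}"#, escape(text))
            }
            Token::Eof => r#"{"type":"EOF"}"#.to_string(),
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

Printing the result of `to_json(&finish(tokens))` with `println!` would write the stream to stdout, where a front end could pick it up and parse it.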

EOP

Top comments (3)

outOfBounds

I have always wanted to create a programming language. If this is part of a series consider adding a link to the earlier installments. Looking forward to more, Adam!

Adam Crockett 🌀 • Edited

Oops! I'll do that now, glad it helps. This is my first language, so you can see the journey as we go. I am nearly ready to write the parser and make the post about it, can't wait! After that comes code generation, which is going to be really fun as well.

outOfBounds

Looking forward to it :)