DEV Community

LarmkaartDev

The lexer

The lexer is a surprisingly simple component of a compiler. It mainly consists of a bunch of if-statements. The lexer will break up the code into separate lines and analyze them one by one.

1. Tokens

But before looking at the lexer, let's take a closer look at tokens. A token is a very simple table that consists of a Type and a Value:
{ Type = TYPE_NAME, Value = TOKEN_VALUE }

The Type property describes the type of token like:

  • a statement
  • a number
  • a variable

The Value property can have different meanings depending on the type:

  • the statement type
  • the number value
  • the variable name

A token can have even more properties depending on its type.
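
To make the token shape concrete, here is a minimal sketch of a token constructor. The helper name newToken, the extra parameter and the Line property are assumptions made for illustration; the article only defines Type and Value.

```lua
-- Hypothetical helper (not from the article): builds a token table.
-- `extra` is an optional table of additional properties, e.g. { Line = 1 }.
local function newToken(tokenType, value, extra)
    local token = { Type = tokenType, Value = value }
    for key, field in pairs(extra or {}) do
        token[key] = field
    end
    return token
end

local statement = newToken("statement", "var")
local number = newToken("number", "2", { Line = 1 }) -- Line is an assumed extra property

print(statement.Type, statement.Value)
print(number.Type, number.Value, number.Line)
```

A plain table literal works just as well; a constructor only helps if you want every token to be built the same way.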

2. The lexer

Let's see how the lexer will break down this example line into tokens:
var y = x + 2
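I will use the string variable in the code examples to represent the part of the line we are currently looking at. (Keep in mind that in Lua a variable called string hides the built-in string library inside its scope.)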
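
How the line gets cut into parts is not shown here, so the following is only a sketch under a simplifying assumption: every part is separated by spaces, as in the example line. The helper name splitLine is hypothetical.

```lua
-- Hypothetical helper: splits one line of source into whitespace-separated pieces.
-- Assumes pieces are separated by spaces, as in `var y = x + 2`;
-- input such as `x+2` would need a character-by-character scanner instead.
local function splitLine(line)
    local pieces = {}
    for piece in line:gmatch("%S+") do -- %S+ matches a run of non-space characters
        pieces[#pieces + 1] = piece
    end
    return pieces
end

local pieces = splitLine("var y = x + 2")
print(table.concat(pieces, " | ")) -- var | y | = | x | + | 2
```

Each element of pieces plays the role of the string variable in the snippets that follow.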

The first part is var. We will insert a variable declaration token:

if string == "var" then
    pushTokens({Type = "statement", Value = "var"}) -- puts the token at the end of the tokens list
end

Now we have created our first token using the lexer!

Next up is y. This is a variable, but the lexer doesn't know this. Luckily, it is able to look at the previous token and see that it's a declaration, so it will add the y variable to the variable list.

if prevToken.Type == "statement" and prevToken.Value == "var" then -- the previous token declares a variable
    pushTokens({Type = "variable", Value = string}) -- put new token in the token list
    table.insert(localVariables, string) -- put new variable in the variable list
end

Next we have =. This is a simple case of adding a new token with Type assigner and value =:

pushTokens({Type = "assigner", Value = "="})

Now the variable x. Let's say we defined x somewhere earlier in the code, so the lexer already knows it's a variable. If it wasn't defined, and the previous token is not a variable declaration token either, then the lexer should throw an error.

if table.find(localVariables, string) then -- does a variable with this name already exist? (table.find exists in Luau, not in stock Lua)
    pushTokens({Type = "variable", Value = string}) -- it does, so let's add it!
elseif prevToken.Type == "statement" and prevToken.Value == "var" then -- it is being declared right now
    pushTokens({Type = "variable", Value = string})
    table.insert(localVariables, string)
else
    error("Unknown variable " .. string)
end

All that's left are + and 2. These will be converted into these simple tokens:

{Type = "operator", Value = "+"}
{Type = "number", Value = "2"}
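
The checks for these two tokens are not spelled out, so here is one possible sketch. The operator set and the helper name classifySimple are assumptions (the article only shows +), and tonumber is used as a quick way to test whether a piece of text is a number.

```lua
-- Assumed operator set; the article only shows "+".
local OPERATORS = { ["+"] = true, ["-"] = true, ["*"] = true, ["/"] = true }

-- Hypothetical helper: returns a token for an operator or a number, or nil
-- when the piece is neither (so the caller can move on to the variable checks).
local function classifySimple(piece)
    if OPERATORS[piece] then
        return { Type = "operator", Value = piece }
    elseif tonumber(piece) ~= nil then
        -- Keep the original text as the Value, like the tokens in the article.
        return { Type = "number", Value = piece }
    end
    return nil
end

local plus = classifySimple("+")
local two = classifySimple("2")
local other = classifySimple("x") -- nil: neither an operator nor a number
```

Storing the number as text keeps the lexer simple; a later stage can convert it when it actually needs the numeric value.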

Now we have fully generated all of our tokens:

{Type = "statement", Value = "var"},
{Type = "variable", Value = "y"},
{Type = "assigner", Value = "="},
{Type = "variable", Value = "x"},
{Type = "operator", Value = "+"},
{Type = "number", Value = "2"}
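
Putting the pieces together, the whole walkthrough could look roughly like the sketch below. It is not the exact lexer from this series: the function name lex, splitting on whitespace, and storing known variables as a set (instead of a list searched with table.find) are all assumptions, chosen so that the code also runs on stock Lua. To have x defined "earlier in the code", the example input starts with an extra line, var x = 1.

```lua
-- Minimal sketch of the lexer described above. Helper names are illustrative.
local function lex(source)
    local tokens = {}
    local localVariables = {} -- set of declared names: localVariables[name] = true

    local function pushTokens(token) -- puts the token at the end of the tokens list
        tokens[#tokens + 1] = token
    end

    for line in source:gmatch("[^\n]+") do      -- analyze the code line by line
        for piece in line:gmatch("%S+") do      -- assumes space-separated pieces
            local prevToken = tokens[#tokens]   -- nil for the very first piece
            local declaring = prevToken ~= nil
                and prevToken.Type == "statement"
                and prevToken.Value == "var"

            if piece == "var" then
                pushTokens({ Type = "statement", Value = "var" })
            elseif piece == "=" then
                pushTokens({ Type = "assigner", Value = "=" })
            elseif piece == "+" then
                pushTokens({ Type = "operator", Value = "+" })
            elseif tonumber(piece) then
                pushTokens({ Type = "number", Value = piece })
            elseif declaring then
                localVariables[piece] = true    -- remember the new variable
                pushTokens({ Type = "variable", Value = piece })
            elseif localVariables[piece] then
                pushTokens({ Type = "variable", Value = piece })
            else
                error("Unknown variable " .. piece)
            end
        end
    end
    return tokens
end

local tokens = lex("var x = 1\nvar y = x + 2")
for _, token in ipairs(tokens) do
    print(token.Type, token.Value)
end
```

The last six tokens printed correspond to the list above. Lexing a line that uses an undeclared name, for example var y = z + 2 on its own, raises the "Unknown variable" error.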

This is a basic overview of the lexer. The more features you add to your language, the more complex the lexer will become, so be sure to keep your code nice and tidy!