The lexer is a surprisingly simple component of a compiler. It mainly consists of a bunch of if-statments. The lexer will break up the code in seperate lines and analyze them one by one.
1. Tokens
But before looking at the lexer, let's take a better look at tokens. A token is a very simple array that consists of a Type and a Value:
{ Type = TYPE_NAME, Value = TOKEN_VALUE }
The Type
property describes the type of token like:
- a statement
- a number
- a variable
The Value
property can have different meanings depending on the type:
- the statement type
- the number value
- the variable name
a token can have even more properties depending on the type.
2. The lexer
Let's see how the lexer will break down this example line into tokens:
var y = x + 2
I will use the string
variable in the code examples to represent the part we are currently looking at.
the first part is var
. We will insert a variable declaration token:
if string == "var" then
pushTokens({Type = "statement", Value = "var"}) -- puts the token at the end of the tokens list
end
Now we have created our first token using the lexer!
Next up is y
. This is a variable, but the lexer doesn't know this. Luckily, it is able to look at the previous token and see that it's a declaration, so it will add the y
variable to the variable list.
if prevToken.Type == "statement" and prevToken.Value == "var" then -- the previous token declares a variable
pushTokens({Type = "variable", Value = string}) -- put new token in the token list
table.insert(localVariables, string) -- put new variable in the variable list
end
Next we have =
. This is a simple case of adding a new token with Type assigner
and value =
:
pushTokens({Type = "assigner", Value = "="})
now the variable x
. We will say we defined x
earlier in the code somewhere, so the lexer already knows it's a variable. If it wasn't defined and the previous token is not a variable declaration token then the lexer should throw an error.
if table.find(localVariables, string) then -- A variable exist with name x?
pushTokens({Type == "variable", Value = string}) -- Let's add it!
elseif prevToken.Type == "statement" and prevToken.Value == "var" then
pushTokens({Type = "variable", Value = string})
table.insert(localVariables, string)
else
error("Unknown variable " .. string)
end
all thats's left are +
and 2
these will be converted into these simple tokens:
{Type = "operator", Value = "+"}
{Type = "number", Value = "2"}
Now we have fully generated all of our tokens:
[Type = "statement", Value = "var"},
{Type = "variable", Value = "y"},
{Type = "assigner", Value = "="},
{Type = "variable", Value = "x"},
{Type = "operator", Value = "+"},
{Type = "number", Value = "2"}
This is a basic overview of the lexer. The more additions you add to your language, the complexer the lexer will become, so be sure to keep your code nice and tidy!
Top comments (0)