DEV Community

John Nyingi

Making a Math Interpreter: The Lexer

Resources

  • Find the GitHub link here

Tokens

First, we need to define all the tokens we expect in a math input. Go ahead and add a file named Tokens.cs. It will hold an enum listing every token type, defined above the Tokens class, like so:

    public enum Token
    {
        NUMBER=0,


        ADD, // +
        MINUS, // -
        MULTIPLY, // *
        DIVISION, // /

        RBRACE, // )
        LBRACE, // (

        EOF // END OF FILE
    }

We also need a way to store the value attached to a NUMBER token, so add the following to the Tokens class:

    public class Tokens
    {
        public readonly Token _tokenType;
        public readonly object _value;

        public Tokens(Token tokenType, object value)
        {
            this._tokenType = tokenType;
            this._value = value;
        }


        public override string ToString()
        {
            return " " + this._tokenType + ":" + this._value;
        }
     }
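
To get a feel for how these tokens print, here is a minimal sketch that builds two tokens by hand. The enum is a trimmed copy of the one above, and TokenDemo and Describe are hypothetical names used only for this illustration.

```csharp
using System;

public enum Token { NUMBER = 0, ADD, MINUS, MULTIPLY, DIVISION, LBRACE, RBRACE, EOF }

public class Tokens
{
    public readonly Token _tokenType;
    public readonly object _value;

    public Tokens(Token tokenType, object value)
    {
        _tokenType = tokenType;
        _value = value;
    }

    public override string ToString()
    {
        return " " + _tokenType + ":" + _value;
    }
}

// Hypothetical demo class, not part of the interpreter itself.
public static class TokenDemo
{
    // Builds the tokens for "42 +" by hand and joins their string forms.
    public static string Describe()
    {
        Tokens number = new Tokens(Token.NUMBER, 42m); // a NUMBER token carries its value
        Tokens plus = new Tokens(Token.ADD, null);     // operator tokens carry no value
        return number.ToString() + plus.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe()); // prints " NUMBER:42 ADD:"
    }
}
```

Only NUMBER tokens need a value, which is why the other tokens are created with null.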

Next, we need a way to transform our text input into tokens. Enter the Lexer.

Lexer

We need a class that transforms text into tokens, so go ahead and create a Lexer.cs file.
Let's define the basics of the class:

    // at the top of Lexer.cs
    using System;
    using System.Collections.Generic;
    using System.Text;

    public class Lexer
    {
        private readonly List<Tokens> tokens;
        private readonly string _input;
        private Int32 pos=0;
        private char curr_input;

        public Lexer(string input)
        {
            this._input = input;
            tokens = new List<Tokens>();
            this.curr_input = input.Length > 0 ? this._input[pos] : '\0'; // set first char

        }
     }

The class has a list of tokens, the input string, a position counter pos, and curr_input, a char holding the current character.

NOTE: the constructor sets curr_input to the first character of the input. If the input is empty, it sets curr_input to the null character \0 instead.

We need a method that gets the next character (char) of the input and updates the position:

     private void Get_Next()
     {
         if(pos < this._input.Length - 1)
         {
             pos++;
             this.curr_input = this._input[pos];

         }
         else
         {
             curr_input = '\0';
         }
      }
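
To see how the cursor moves, here is a small standalone sketch that copies the constructor and Get_Next logic into a hypothetical Cursor class and records every character it visits. Cursor, CursorDemo and Walk are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Lexer's cursor fields and Get_Next logic.
public class Cursor
{
    private readonly string _input;
    private int pos = 0;
    public char Current { get; private set; }

    public Cursor(string input)
    {
        _input = input;
        Current = input.Length > 0 ? input[0] : '\0'; // set first char
    }

    public void Next()
    {
        if (pos < _input.Length - 1)
        {
            pos++;
            Current = _input[pos];
        }
        else
        {
            Current = '\0'; // no characters left
        }
    }
}

public static class CursorDemo
{
    // Collects every character the cursor visits, including the final '\0'.
    public static List<char> Walk(string input)
    {
        var seen = new List<char>();
        var cursor = new Cursor(input);
        while (cursor.Current != '\0')
        {
            seen.Add(cursor.Current);
            cursor.Next();
        }
        seen.Add(cursor.Current);
        return seen;
    }

    public static void Main()
    {
        foreach (char c in Walk("1+2"))
        {
            Console.WriteLine(c == '\0' ? "\\0" : c.ToString());
        }
    }
}
```

Walking the input 1+2 visits '1', '+', '2' and finally '\0', which is the marker the lexer later uses to emit the EOF token.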

We define this method right after the constructor. It checks whether pos (the position of the current character) can still advance within the input. If it can, the method increments the position and updates the current character. Otherwise, it sets the current character to the null character \0 to mark the end of the input.

Now we need a method which will iterate through the whole input and create tokens.


        public List<Tokens> Get_Tokens()
        {

            while (true)
            {

                if(curr_input == '\0')
                {
                    Tokens eofToken = new Tokens(Token.EOF, null);
                    tokens.Add(eofToken); // Add the End OF File TOKEN
                    break;
                }
                Get_Next();
            }


            return tokens;
        }

Our Get_Tokens method iterates through the whole input and generates the tokens. For now, it only checks for the null character, creates an EOF token and breaks out of the while loop.

Since we're already here, let's add an override of ToString() to the Lexer class so that we can see all the tokens.

       public override string ToString()
       {
            StringBuilder sb = new StringBuilder();

            foreach (var token in tokens)
            {
                sb.Append(token.ToString());
            }

            return sb.ToString();
        }

To view the EOF output, update the last Console.WriteLine in Program.cs like so:

        // generate tokens
        Lexer lexer = new Lexer(input);
        List<Tokens> tokens = lexer.Get_Tokens();

        Console.WriteLine(">> {0}", lexer.ToString());

Now when you run it and hit ENTER on an empty line, you'll see only the EOF token printed, like this: >>  EOF:

Lexing Numbers

When you think about it, a basic number consists of only the following:

  • Digits from 0 to 9
  • An optional decimal point, as in 0.455 or 9.345

With this in mind, let's create a method that generates a number token and stores its value as a decimal, a 128-bit type that is large enough for our tests.
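
As a quick check of the 128-bit claim, the sketch below prints the size of decimal. It also shows one likely side benefit for a calculator, which is an observation of ours rather than something the article states: base-10 fractions add up exactly in decimal but not in double. DecimalDemo and its method names are hypothetical.

```csharp
using System;

// Hypothetical demo class illustrating a few properties of decimal.
public static class DecimalDemo
{
    // decimal occupies 16 bytes, i.e. 128 bits.
    public static int SizeInBits()
    {
        return sizeof(decimal) * 8;
    }

    // Base-10 fractions such as 0.1 and 0.2 add up exactly in decimal.
    public static bool DecimalSumIsExact()
    {
        return 0.1m + 0.2m == 0.3m;
    }

    // The same sum in double picks up a binary rounding error.
    public static bool DoubleSumIsExact()
    {
        return 0.1 + 0.2 == 0.3;
    }

    public static void Main()
    {
        Console.WriteLine(SizeInBits());        // prints 128
        Console.WriteLine(DecimalSumIsExact()); // prints True
        Console.WriteLine(DoubleSumIsExact());  // prints False
    }
}
```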

Let's start with a representation of the characters we expect a number to contain. A list will do.
Add the list below at the top of the Lexer class, alongside the other private fields:

// LIST CHECKER
private List<char> NumberList = new List<char> { '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };

We can now define our Generate_Number method:

        private Tokens Generate_Number()
        {
            int decimal_count = 0;
            StringBuilder sb = new StringBuilder();
            while(NumberList.Contains(curr_input))
            {
                if (curr_input == '.' && decimal_count <= 1)
                {
                    decimal_count++;
                }

                if(sb.Length < 1 && decimal_count > 0)
                {
                    // You have a decimal place starting
                    // with no preceding number i.e .6767 = 0.6767
                    sb.Append("0");
                }
                sb.Append(curr_input);
                Get_Next();
            }

            string str = sb.ToString();
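            // InvariantCulture keeps '.' as the decimal separator on every machine
            decimal val = Convert.ToDecimal(str, System.Globalization.CultureInfo.InvariantCulture);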
            return new Tokens(Token.NUMBER, val);
        }
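
The same scanning idea can be tried in isolation. The sketch below is a simplified, hypothetical ScanNumber helper that works on a plain string instead of the Lexer's fields. Passing CultureInfo.InvariantCulture is an assumption on our part, so that '.' is read as the decimal separator regardless of regional settings.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical standalone version of the number-scanning logic.
public static class NumberScanDemo
{
    private const string NumberChars = ".0123456789";

    // Reads number characters from the start of the input and converts them
    // to a decimal, prefixing "0" when the text starts with '.'.
    public static decimal ScanNumber(string input)
    {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        while (pos < input.Length && NumberChars.IndexOf(input[pos]) >= 0)
        {
            if (sb.Length == 0 && input[pos] == '.')
            {
                sb.Append('0'); // ".6767" becomes "0.6767"
            }
            sb.Append(input[pos]);
            pos++;
        }
        // Assumption: invariant culture, so '.' is always the decimal separator.
        return Convert.ToDecimal(sb.ToString(), CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(ScanNumber(".6767") == 0.6767m); // prints True
        Console.WriteLine(ScanNumber("12+3") == 12m);      // prints True
    }
}
```

Scanning stops at the first character that cannot be part of a number, so for the input 12+3 only 12 is consumed and the + is left for the next step.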

Let's update the while loop in Get_Tokens like so:

            while (true)
            {
                if(curr_input == ' ' || curr_input == '\t')
                {
                    // Skip empty space
                    Get_Next();
                    continue;
                }
                else if(NumberList.Contains(curr_input))
                {
                    Tokens numberToken = Generate_Number();
                    tokens.Add(numberToken);
                }

                else if(curr_input == '\0')
                {
                    Tokens eofToken = new Tokens(Token.EOF, null);
                    tokens.Add(eofToken);
                    break;
                }
            }

Run the application again and enter a number. The output should now contain a NUMBER token followed by EOF. For example, entering 42 prints: >>  NUMBER:42 EOF:

Lexing Operators

Let's now add tokens for + - * / ( ). They all follow the same structure, so we will update our while loop in Get_Tokens:

            while (true)
            {
                if(curr_input == ' ' || curr_input == '\t')
                {
                    // Skip empty space
                    Get_Next();
                    continue;
                }
                else if(NumberList.Contains(curr_input))
                {
                    Tokens numberToken = Generate_Number();
                    tokens.Add(numberToken);
                }
                else if(curr_input == '+')
                {
                    Tokens additionToken = new Tokens(Token.ADD, null);
                    tokens.Add(additionToken);
                    Get_Next();
                }
                else if (curr_input == '-')
                {
                    Tokens minusToken = new Tokens(Token.MINUS, null);
                    tokens.Add(minusToken);
                    Get_Next();
                }
                else if (curr_input == '*')
                {
                    Tokens multiplyToken = new Tokens(Token.MULTIPLY, null);
                    tokens.Add(multiplyToken);
                    Get_Next();
                }
                else if (curr_input == '/')
                {
                    Tokens divideToken = new Tokens(Token.DIVISION, null);
                    tokens.Add(divideToken);
                    Get_Next();
                }
                else if (curr_input == '(')
                {
                    Tokens lbraceToken = new Tokens(Token.LBRACE, null);
                    tokens.Add(lbraceToken);
                    Get_Next();
                }
                else if (curr_input == ')')
                {
                    Tokens rbraceToken = new Tokens(Token.RBRACE, null);
                    tokens.Add(rbraceToken);
                    Get_Next();
                }
                else if(curr_input == '\0')
                {
                    Tokens eofToken = new Tokens(Token.EOF, null);
                    tokens.Add(eofToken);
                    break;
                }
            }
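
Because every operator branch has the same shape, one possible alternative is a lookup table. This is not the approach the article takes, just a sketch of the idea. OperatorTable and Scan are hypothetical names, the enum is a trimmed copy of the one in Tokens.cs, and numbers are left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public enum Token { NUMBER = 0, ADD, MINUS, MULTIPLY, DIVISION, LBRACE, RBRACE, EOF }

// Hypothetical table-driven variant of the operator branches.
public static class OperatorTable
{
    private static readonly Dictionary<char, Token> Operators = new Dictionary<char, Token>
    {
        { '+', Token.ADD },
        { '-', Token.MINUS },
        { '*', Token.MULTIPLY },
        { '/', Token.DIVISION },
        { '(', Token.LBRACE },
        { ')', Token.RBRACE }
    };

    // Maps each operator character to its token type and appends EOF at the end.
    public static List<Token> Scan(string input)
    {
        var result = new List<Token>();
        foreach (char c in input)
        {
            if (c == ' ' || c == '\t')
            {
                continue; // skip empty space
            }
            if (Operators.TryGetValue(c, out Token token))
            {
                result.Add(token);
            }
            else
            {
                // numbers are deliberately not handled in this sketch
                throw new InvalidOperationException($"{c} is an unsupported type");
            }
        }
        result.Add(Token.EOF);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Scan("( + )"))); // prints "LBRACE ADD RBRACE EOF"
    }
}
```

The if/else chain in the article is more explicit and easier to step through. A table mainly pays off once the list of operators grows.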

Now you can run the program and enter some operators. For example, entering 1 + 2 prints: >>  NUMBER:1 ADD: NUMBER:2 EOF:

Error Handling

Thus far, we haven't tested some edge cases:

  • What if the input has unsupported characters, such as letters?
  • What if a number has multiple decimal points?

For unknown characters, let's add the else branch below at the end of the if/else chain in the above while loop:

                else
                {
                    throw new InvalidOperationException($"{curr_input} is an unsupported type");
                }

When a user enters an invalid number structure, say 56.564.4657, Convert.ToDecimal() throws a FormatException, so that case is already handled.
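
This behavior is easy to confirm on its own. The sketch below wraps Convert.ToDecimal in a hypothetical IsValidNumber helper and uses CultureInfo.InvariantCulture as an assumption, so the result does not depend on regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for checking which strings Convert.ToDecimal accepts.
public static class FormatCheck
{
    // Returns true when the text converts to a decimal, false when
    // Convert.ToDecimal rejects it with a FormatException.
    public static bool IsValidNumber(string text)
    {
        try
        {
            Convert.ToDecimal(text, CultureInfo.InvariantCulture);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidNumber("56.564"));      // prints True
        Console.WriteLine(IsValidNumber("56.564.4657")); // prints False
    }
}
```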

Let's wrap our lexer object in Program.cs with a try-catch like so:

            try
            {
                // generate tokens
                Lexer lexer = new Lexer(input);
                List<Tokens> tokens = lexer.Get_Tokens();

                Console.WriteLine(">> {0}", lexer.ToString());
            }
            catch (Exception ex)
            {
                // print the error in red, then restore the default console color
                Console.ForegroundColor = ConsoleColor.Red;
                Console.Error.WriteLine(ex.Message);
                Console.ResetColor();
            }

So, now errors are printed in red instead of crashing the program. For example, entering abc prints: a is an unsupported type

Unit Tests

  • Right-click the solution > Add > New Project
  • Choose xUnit
  • Give it a project name like calcy.test, then click Next until it appears in your solution
  • Right-click Dependencies on the test project and select Add Project Reference
  • Check the project shown, which should be the math interpreter, then click OK
  • Rename UnitTest1 to LexerTest

Then add the following tests

    using System;
    using System.Collections.Generic;
    using Xunit;

    public class LexerTest
    {

        [Fact]
        public void TestAllTokens()
        {
            string expected = " LBRACE: RBRACE: NUMBER:4646 ADD: MINUS: MULTIPLY: DIVISION: NUMBER:565.788 EOF:";
            Lexer lexer = new Lexer("( ) 4646 + - * / 565.788");
            List<Tokens> tokens = lexer.Get_Tokens();
            string actual = lexer.ToString();

            Assert.NotEmpty(tokens);
            Assert.Equal(expected, actual);

        }

        [Fact]
        public void TestInvalidCharacters()
        {
            Lexer lexer = new Lexer("Wabebe");
            Assert.Throws<InvalidOperationException>(() => lexer.Get_Tokens());
        }

        [Fact]
        public void TestInvalidDecimalNumber()
        {
            Lexer lexer = new Lexer("35.4533.4546");
            Assert.Throws<FormatException>(() => lexer.Get_Tokens());
        }
    }

To run the tests in Visual Studio, right-click the test project and select Run Tests. With the dotnet CLI, you can use:

dotnet test .\calcy.test

In part 3, we will:

  • Create an AST
