DEV Community

KiminLee

What is the problem, what would be the best solution?

Write a new lexer.js, or update the original one?

The issue is about updating lexer.js. The maintainer of the open source project allowed me to modify the code structure. However, is it really a good idea to write a brand-new lexer.js, or should I just remove the useless parts and replace them with new code?

To decide, let's take a look at the code.

What does lexer.js really do in the program?

This program mainly has 4 parts.

  1. tokens: pairs of keyword and value. This is the list of the syntax of the language to be compiled. If a word read from a file matches one of the tokens, the program knows that the word is meaningful syntax.

  2. _lexer: converts the raw input into a meaningful object, which can later be transpiled.

  3. transpiler: takes the object from lexer.js. If all the syntax is correct, it turns the code into JavaScript code.

  4. syntax checker: checks whether the object from lexer.js is correct; if not, it reports the syntax errors.

Inside lexer.js

So, let's walk through the lexer engine step by step:

  1. Find all of the possible tokens (the words predefined in the program to be compiled later, e.g. 'print()', 'var', ...).

  2. Transform the raw data into an array of tokens (they need to stay in order: first in, first out).

  3. Compare each token against the objects that hold an identifier (the name of the token) and a regex (regular expression).

  4. If they match, create an object (token and identifier). This object will be compiled later in transpiler.js (the final version of the object).
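The four steps above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual lexer: the token table, the `lex` function, and the shape of the returned objects are assumptions made up for the example.

```javascript
// Hypothetical token table; the real project keeps its own in tokens.js.
const tokens = {
    PRINT: { match: "print" },
    VAR: { match: "var" },
    NUMBER: { match: "\\d+" },
    IDENTIFIER: { match: "[a-z_]\\w*" },
};

// Step 1: one regex that can find every possible token in the raw input.
const mainRegex = new RegExp(
    `\\b(?:${Object.values(tokens).map((t) => t.match).join("|")})\\b`,
    "gi"
);

// Used in step 3: each resolver pairs an identifier with an anchored regex.
const resolvers = Object.keys(tokens).map((identifier) => ({
    identifier,
    regex: new RegExp(`^(?:${tokens[identifier].match})$`, "i"),
}));

function lex(rawInput) {
    // Step 2: raw text becomes an array of token strings, in source order.
    const words = rawInput.match(mainRegex) || [];

    // Steps 3 and 4: find the first resolver that matches each word
    // and build the { token, identifier } object from it.
    return words.map((word) => {
        const resolver = resolvers.find((r) => r.regex.test(word));
        return { token: word, identifier: resolver.identifier };
    });
}

console.log(lex("var x 10 print"));
```

In this sketch the order of the entries in the token table matters: keywords such as `var` come before the general identifier pattern, so they are resolved first.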

I should not modify steps 1 and 2, because if they were changed, all the predefined tokens would become useless (a new token system would be needed).

Also, the lexer engine has to pass the object to transpiler.js, so that the raw code can be successfully compiled later.

So, I could only change step 3.

Things to be upgraded

So far, there is a lot of work to be done, like changing var to let or const, removing global variables, reforming the code structure...

This week, I changed the structure of the code first, so that the later upgrades can be done well.

I first removed all the global variables in lexer.js:
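As a small reminder of why the var to let/const change matters, here is a generic example (not taken from the project) of how `var` leaks one shared loop variable into closures while `let` creates a new binding per iteration. The function names are made up for the illustration.

```javascript
// `var` is function-scoped: every callback sees the same `i`.
function collectWithVar() {
    const callbacks = [];
    for (var i = 0; i < 3; i++) {
        callbacks.push(() => i);
    }
    return callbacks.map((cb) => cb());
}

// `let` is block-scoped: each iteration gets its own `i`.
function collectWithLet() {
    const callbacks = [];
    for (let i = 0; i < 3; i++) {
        callbacks.push(() => i);
    }
    return callbacks.map((cb) => cb());
}

console.log(collectWithVar()); // [ 3, 3, 3 ]
console.log(collectWithLet()); // [ 0, 1, 2 ]
```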

const colors = require("./colors");
const tokens = require("./tokens");


// State shared across the module
const regexTerms = [];
let rawRegex = "";
let mainRegex;
const resolverRegexes = [];

// Populate the terms of the main regex with every token's match pattern
Object.keys(tokens.tokens).forEach((token) => {
    regexTerms.push(`${tokens.tokens[token].match}`);
});

// Build one resolver per token: its regex plus the identifier it maps to
Object.keys(tokens.tokens).forEach((token) => {
    resolverRegexes.push({
        "match": new RegExp(`(${tokens.tokens[token].match})`, "gi"),
        "data": {
            "identifier": token
        }
    });
});

Then I created a new .js file called lexerToken.js. This new file generates the intermediate token object, which helps to create the final version of the object (used in transpiler.js).

This commit can be found here.

After this work, lexerToken only handles the token-related jobs, and lexer only handles transforming the raw data into the completed object using the token object.
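To give an idea of what such a split can look like, here is a minimal sketch of a token module without global variables. It is not the code from the commit: the function name `buildLexerTokens` and its return shape are assumptions for illustration, and only the `match` and `data.identifier` fields mirror the snippet above.

```javascript
// lexerToken.js (hypothetical layout): everything token-related lives here.
// The token table is passed in, so nothing is kept in module-level variables.
function buildLexerTokens(tokenTable) {
    const identifiers = Object.keys(tokenTable);

    // The terms that will later be joined into the main regex.
    const regexTerms = identifiers.map((id) => `${tokenTable[id].match}`);

    // One resolver per token: its regex plus the identifier it maps to.
    const resolverRegexes = identifiers.map((id) => ({
        match: new RegExp(`(${tokenTable[id].match})`, "gi"),
        data: { identifier: id },
    }));

    return { regexTerms, resolverRegexes };
}

module.exports = { buildLexerTokens };
```

With something like this, lexer.js would only need to call the function with `tokens.tokens` and use the returned object, instead of owning the arrays itself.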

Next step

The thing is, it takes a long time to compile a large amount of code, so the lexer engine needs a shorter runtime.

The main reason for the long runtime may be that there are too many loops. So, the next step is to find the extra loops, reduce them, and change the algorithm.
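One possible direction, sketched below, is to let a single combined regex report which token matched, so the lexer no longer has to test every resolver regex against every word. This is just an idea and not what the project does today: the token table and the function names are hypothetical, it assumes the token names are valid regex group names and that the patterns contain no named groups of their own, and whether it is actually faster would have to be measured.

```javascript
// Hypothetical token table; the real project keeps its own in tokens.js.
const tokens = {
    PRINT: { match: "print" },
    VAR: { match: "var" },
    NUMBER: { match: "\\d+" },
    IDENTIFIER: { match: "[a-z_]\\w*" },
};

// Build ONE regex where each token pattern is a named capture group.
function buildCombinedRegex(tokenTable) {
    const alternatives = Object.keys(tokenTable).map(
        (identifier) => `(?<${identifier}>${tokenTable[identifier].match})`
    );
    return new RegExp(`\\b(?:${alternatives.join("|")})\\b`, "gi");
}

// Walk the input once; the named group that is set tells us the identifier.
function lexSinglePass(rawInput, combinedRegex) {
    const result = [];
    for (const m of rawInput.matchAll(combinedRegex)) {
        const identifier = Object.keys(m.groups).find(
            (name) => m.groups[name] !== undefined
        );
        result.push({ token: m[0], identifier });
    }
    return result;
}

console.log(lexSinglePass("var x 10 print", buildCombinedRegex(tokens)));
```

There is still a small lookup over the group names for each match, but no extra regex is executed per token per word.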
