DEV Community

Samuel Pires
Samuel Pires

Posted on

Linters and Formatters: How code reads our code.

Throughout all my journey as a software engineer, I've always hoped in and out of different topics and questions, OOP or Functional?, Static or Dynamic Types? But one thing always seemed sure. I, and every other programmer that I knew liked a good-looking code, and I don't mean in a functional way, I'm talking about the visual part.

Identations, blank lines here and there, double and single quotes, and a bunch of other rules and styles make our code more readable and concise. And thanks to tools like auto-formatters and Linters, we can save ourselves the hustle of pressing tab once (or twice if you are a psycho) at the start of each line or things like that.

We all know what those linters and formatters are, but something that I got interested in the last weeks was HOW they worked, how exactly they read our code and sometimes can even fix some of our errors by themselves.

So in this little article, I'll guide you through a simple and general explanation of how a code formatter work, these rules can change but in most cases apply to the majority of linters and formatters out there.

1 - Abstract Syntax Tree

Abstract Syntax Tree, or AST for short, is known as a visual representation of our code structure, it doesn't know anything about how our code works or what it does, it consists in organizing each part of our code into "branches" of a tree-like structure, or even json which I will be using for further examples for better understanding.

For example lets take this simple Javascript function.

function sum(a, b) {
    return a + b;
}
Enter fullscreen mode Exit fullscreen mode

Lets use AST explorer to generate the AST for this function block, it will look something like this.

{
  "type": "Program",
  "start": 0,
  "end": 34,
  "body": [
    {
      "type": "FunctionDeclaration",
      "start": 0,
      "end": 34,
      "id": {
        "type": "Identifier",
        "start": 9,
        "end": 12,
        "name": "sum"
      },
      "expression": false,
      "generator": false,
      "async": false,
      "params": [
        {
          "type": "Identifier",
          "start": 13,
          "end": 14,
          "name": "a"
        },
        {
          "type": "Identifier",
          "start": 16,
          "end": 17,
          "name": "b"
        }
      ],
      "body": {
        "type": "BlockStatement",
        "start": 18,
        "end": 34,
        "body": [
          {
            "type": "ReturnStatement",
            "start": 21,
            "end": 32,
            "argument": {
              "type": "BinaryExpression",
              "start": 28,
              "end": 31,
              "left": {
                "type": "Identifier",
                "start": 28,
                "end": 29,
                "name": "a"
              },
              "operator": "+",
              "right": {
                "type": "Identifier",
                "start": 30,
                "end": 31,
                "name": "b"
              }
            }
          }
        ]
      }
    }
  ],
  "sourceType": "module"
}
Enter fullscreen mode Exit fullscreen mode

Yes, those 3 lines of code generated this monster of a JSON, but dont panic, lets go through slowly and you'll see that its actually very simple.

2 - Traversing the tree

First, we have our type which is "program", followed by an indication of where is the start and end of our script which can be like a "coordinate" of where that character is, and blank spaces count and tabs count differently here, but the most important part of all of this is the "body", is basically where all our code is.

Inside the body, we will see an array containing again the same pattern, a type, a start and end, another body, and so on.

Each "block" of our code resides inside the "body" of another block, and each type of code block may have some specific properties like for example let's pick only the function declaration.

"type": "FunctionDeclaration",
      "start": 0,
      "end": 34,
      "id": {
        "type": "Identifier",
        "start": 9,
        "end": 12,
        "name": "sum"
      },
      "expression": false,
      "generator": false,
      "async": false,
      "params": [
        {
          "type": "Identifier",
          "start": 13,
          "end": 14,
          "name": "a"
        },
        {
          "type": "Identifier",
          "start": 16,
          "end": 17,
          "name": "b"
        }
      ],
Enter fullscreen mode Exit fullscreen mode

We see a FunctionDeclaration type, followed by a property of the type Identifier wish contains the function name, then we have expression, generator, and async wish are boolean values, followed by an array of params, that have each function parameter contained in its object, again with type Identifier, its name, and its "coordinates".

We can use this system to read through every block of our code, and we can already start to have an Idea of how the formatters and linters can apply rules to it.

3 - Applying changes through AST

Now let us take a look at the fun part of all of this. Knowing exactly where every statement starts and ends, where every variable is, and every function name and parameter, linters will read the AST and can generate a warning based on the reading. Let's see some examples:

Unused variables / functions / imports

The linter goes through the AST storing the name for every VariableDeclaration Identifier in a temporary Array, now every time it finds this same identifier's name on the rest of the code, it will pop that entry out of the array, when it gets to the end of it, the linter loops through all the remaining items and generates the "Unused variable" warning in your console since it never got out of the array, it means that it was never called anywhere in the code. The same process happens with function declarations and function calls, as well as imports in javascript.

Formatting

When it comes to formatting its also very straightforward, the formatting tool can read through our AST and find the exact position of every character, it can increment or decrement the "start" and "end" values to manipulate the number of blank spaces or tabs that your code will have.

Conclusion

I hope that with this article I could deliver some cool information about this topic that isn't discussed as much in our community I find it very interesting, and I also hope that you could learn something new here.

Feel free to follow me here and on my socials for more tech talk and articles like this one!

Top comments (2)

Collapse
 
madhan_s profile image
Madhan S

Nice explanation. Any idea how this processes such large information for huge LOC efficiently (space & time)

Collapse
 
sammy1999 profile image
Samuel Pires

Linters at what I've seen, dont have knowledge of line size by itself, so, it treats the entire file continuously like an one liner. The performance itself isnt bad at all no matter the size of the given file, if you only run linters / formatters on the current buffer or only in warning mode (not applying corrections).