Abstract Syntax Tree (AST) sounds like one of those daunting computer science terms at first but it becomes more approachable once you grasp the basics. The goal of this post is to give you a gentle introduction to AST while exploring practical applications in JavaScript.
If you are trying to understand the basics of ASTs and their practical applications, then this article is for you. It assumes no prior knowledge of ASTs, and we'll take a straightforward approach to explaining the concepts.
Instead of delving into the various stages that a program goes through before execution, this article is dedicated to enhancing your grasp of ASTs and demonstrating their practical applications in your JavaScript development journey. We'll achieve this by delving into tools that heavily rely on ASTs.
To effectively follow along, a foundational understanding of JavaScript is required. We will explore various JavaScript tools and engage in hands-on coding in the later sections of this post.
Disclaimer: If you've already developed Babel or ESLint plugins, this article may not be as beneficial for you, as you're likely already familiar with the majority of the content covered here.
What is an Abstract Syntax Tree (AST)
An abstract syntax tree (AST) is a hierarchical data structure used in computer science and programming language theory to represent the syntactic structure of source code or expressions in a programming language. It is often used as an intermediate representation during the compilation or interpretation of code.
That's a lot of words, right? Let's make it simple.
Every piece of source code you write, whether it's intended for interpretation or compilation, undergoes a process known as parsing. During this process, the code is transformed into an Abstract Syntax Tree (AST), which serves as a structured, hierarchical representation of the code's underlying structure.
Having established that, let's look at some code and its corresponding AST:
Head over to astexplorer to get a clearer view
From the AST on the right, you'll notice the tree-like structure. We start out with the root node of type Module, which represents the whole file, and in that, we have the body, which holds other nodes of type ImportDeclaration, VariableDeclaration, and VariableDeclarator, which clearly describe each part of the code.
Here, I'm using the swc parser to turn my JavaScript code into an AST.
Please note that the AST may look a little different when you use a different parser, but the idea is the same: a tree-like structure that represents the source code.
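To make the tree shape concrete without a parser, here is a hand-written sketch of what an ESTree-style AST for `const answer = 42;` roughly looks like. It is simplified: real parsers also attach position information, and the root node here is `Program` (the ESTree convention) rather than swc's `Module`. The `listNodeTypes` helper is a hypothetical name introduced only for this illustration.

```javascript
// Simplified ESTree-style AST for: const answer = 42;
// Real parsers also attach position info (start, end, loc), omitted here.
const ast = {
  type: "Program",
  sourceType: "module",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: { type: "Literal", value: 42, raw: "42" },
        },
      ],
    },
  ],
};

// Collect node types depth-first, indented by depth, to make the tree visible.
function listNodeTypes(node, depth = 0, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => listNodeTypes(child, depth, out));
  } else if (node && typeof node.type === "string") {
    out.push("  ".repeat(depth) + node.type);
    Object.values(node).forEach((value) => {
      if (value && typeof value === "object") {
        listNodeTypes(value, depth + 1, out);
      }
    });
  }
  return out;
}

console.log(listNodeTypes(ast).join("\n"));
```

Running it prints the node types from `Program` down to `Identifier` and `Literal`, one per line, with deeper nodes indented further.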
Remember that we established earlier that every piece of source code gets parsed into an AST at some point before it is compiled or interpreted. For example, platforms like Node.js and Chromium-based browsers use Google's V8 engine behind the scenes to run JavaScript, and some AST parsing is always involved before the interpreter kicks in. I looked at V8's source and discovered that it uses its own internal parser to achieve this.
Why do we then have other JavaScript parsers like babel parser, swc parser, acorn, espree, and the like, since JavaScript engines have their own internal parsers?
They exist to provide a baseline for other tools to work with. For example, Transpilers, Minifiers, Linters, Codemods, Language Processors, and Obfuscators, use a parser behind the scenes to parse your code into an AST before applying transformations or performing any analysis whatsoever.
When it comes to the practical usage of Abstract Syntax Trees (ASTs), our primary emphasis in this article will be on two widely used applications: Transpilers and Linters, particularly within the context of JavaScript.
Here, we will see how ASTs play a crucial role in these applications, enabling developers to transform and analyze code effectively.
Code Transpilation
A transpiler, short for "source-to-source compiler", is a software tool that translates source code written in one programming language into equivalent source code in another language, or in a different version of the same language. Transpilers are commonly used for various purposes, such as language compatibility, syntax conversion, and code optimization.
A frequent scenario involves syntax conversion, especially when dealing with compatibility issues. Imagine we have an application, and some of our users are on older web browsers. If we've adopted new syntax features like the Nullish coalescing operator (??), this could render our app unusable for those users. To address this, we must transform our code into an older, compatible syntax before deploying it to production. This ensures that our app remains accessible and functional for users with older browsers.
As depicted in the image above, the Transpiler begins by parsing the code into an Abstract Syntax Tree (AST). Following this, it proceeds to transform the AST as needed before finally generating code based on the modified AST.
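The parse, transform, and generate steps can be sketched in a few lines. This is only an illustration under stated assumptions: the "parse" step is done by hand (the AST for `userPort ?? 3000` is written out as an ESTree-style object), and `transform` and `generate` are hypothetical helpers that handle just the node types used here. It is not how Babel implements the nullish coalescing transform.

```javascript
// Step 1 (parse), done by hand: the AST for `userPort ?? 3000`.
const ast = {
  type: "LogicalExpression",
  operator: "??",
  left: { type: "Identifier", name: "userPort" },
  right: { type: "Literal", value: 3000, raw: "3000" },
};

// Step 2 (transform): rewrite `a ?? b` into `a != null ? a : b`.
// Only handled when the left side is a plain identifier, so nothing
// with side effects gets evaluated twice.
function transform(node) {
  if (
    node.type === "LogicalExpression" &&
    node.operator === "??" &&
    node.left.type === "Identifier"
  ) {
    return {
      type: "ConditionalExpression",
      test: {
        type: "BinaryExpression",
        operator: "!=",
        left: node.left,
        right: { type: "Literal", value: null, raw: "null" },
      },
      consequent: node.left,
      alternate: transform(node.right),
    };
  }
  return node;
}

// Step 3 (generate): print the few node types used above.
// This toy printer ignores operator precedence and parentheses.
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "Literal":
      return node.raw;
    case "BinaryExpression":
    case "LogicalExpression":
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case "ConditionalExpression":
      return `${generate(node.test)} ? ${generate(node.consequent)} : ${generate(node.alternate)}`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(generate(ast)); // userPort ?? 3000
console.log(generate(transform(ast))); // userPort != null ? userPort : 3000
```

The important point is that the rewrite happens on the tree, not on the text: the transform never touches strings, and code is only produced at the very end.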
Babel is a very common JavaScript Transpiler in the ecosystem and you may have used it directly or indirectly. We'll talk about how it uses AST in detail as we proceed.
Code Linting
Linters are another kind of tool that relies heavily on ASTs. A linter automatically analyzes and checks source code for potential errors, style violations, and programming best practices, helping developers identify and correct issues in their code during development.
Before a linter can perform static analysis on your source code, it begins by parsing the code into an Abstract Syntax Tree (AST). Once this parsing is complete, the linter then proceeds to traverse the AST to identify and address potential issues within the code.
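As a rough sketch of that parse-then-traverse flow, the snippet below walks a hand-written, simplified AST and reports every `var` declaration. The `walk` and `lint` helpers, the message text, and the single hard-coded rule are assumptions made for illustration; a real linter such as ESLint has a far richer rule and reporting system.

```javascript
// Hand-written, simplified AST for:
//   var count = 1;
//   let total = 2;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      loc: { start: { line: 1 } },
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "count" },
          init: { type: "Literal", value: 1 },
        },
      ],
    },
    {
      type: "VariableDeclaration",
      kind: "let",
      loc: { start: { line: 2 } },
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "total" },
          init: { type: "Literal", value: 2 },
        },
      ],
    },
  ],
};

// Depth-first walk that calls `visit` for every object with a `type`.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (!node || typeof node !== "object") return;
  if (typeof node.type === "string") visit(node);
  Object.values(node).forEach((value) => walk(value, visit));
}

// A single hard-coded "rule": report every `var` declaration.
function lint(tree) {
  const messages = [];
  walk(tree, (node) => {
    if (node.type === "VariableDeclaration" && node.kind === "var") {
      messages.push({
        line: node.loc.start.line,
        message: "Unexpected var, use let or const instead.",
      });
    }
  });
  return messages;
}

console.log(lint(ast));
```

Only the declaration on line 1 is reported; the `let` declaration passes untouched.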
ESLint is a widely adopted Linter within the JavaScript community. It boasts a robust plugin system, a comprehensive library of plugins, editor extensions, and presets (which are groups of plugins) that you can easily integrate into your project. We'll be talking about ESLint in detail later in this post.
AST in Transpilers (Babel)
Now that you've seen a couple of use cases for ASTs, we'll talk about Transpilers in detail. Specifically, we'll be using Babel and building a plugin.
Babel provides us with the toolchain to transpile our code. It has a CLI, a parser, and a plugin system, which means that you can write a plugin that applies some transformation to your code. You can also publish that plugin to npm so that anyone can install and use it.
Code transpilation isn't specific to JavaScript. You can also add a level of transformation to your CSS source using tools like PostCSS. Most languages with a fairly mature ecosystem will probably have some tools to help with code transformation.
Babel takes each of your files, generates an Abstract Syntax Tree (AST) based on your code, and passes this AST along with additional information to a Plugin. The Plugin can then apply the required transformations to the AST. After the transformations are complete, the resulting AST is converted back into code. It is important to note that without a plugin, Babel does absolutely nothing. You simply get the same code as output.
To put our knowledge of ASTs into practice, we'll write a simple Babel plugin that removes console logs from our code. Most JavaScript developers are guilty of littering their code with console.log calls while debugging. Our plugin will remove console.log calls from our source code completely.
Most of the time, you'd want to use a linter to catch console logs before you commit changes to your repository instead of just removing them at build time.
Cloning Template
As you may have noticed, this isn't a comprehensive "How to Create a Babel Plugin" tutorial, so we won't spend too much time talking about how to create a plugin. Instead, we'll begin with a template that I designed specifically for this article. The template is hosted in a monorepo, which simplifies the management of multiple packages within a single project. It has two plugins in its plugins directory: one Babel plugin and one ESLint plugin. You can access the repository on GitHub.
Let's start off by cloning the repository:
git clone https://github.com/marvinjude/ast-and-practical-js-applications.git
Then, from inside the cloned directory, check out the starter branch:
git checkout starter
This repository comprises two branches: main and starter. The starter branch serves as an empty template that we'll progressively build upon throughout this article. The main branch, on the other hand, captures every modification we make to our starter branch. Feel free to cross-reference the main branch with your ongoing updates as necessary.
Installing Dependencies
Before installing dependencies and eventually running the project, you must have Node.js and pnpm installed.
To install dependencies, run:
pnpm install
Template and Files
As mentioned earlier, our template is a monorepo managed by pnpm. The plugins directory contains the two packages that we'll be working on: babel-plugin-remove-console and eslint-plugin-emojify-array. I've also installed both packages in our project's root using pnpm's workspace protocol, as you can see in package.json.
We have a handful of files and folders in our project, but we'll be focusing on a few of them in this section:
- src: source files to be transpiled
- plugins: contains the plugins that we'll mostly be working on
- .babelrc.js: Babel configuration file where we specify the plugin(s) to be used (more details below)
- pnpm-workspace.yaml: pnpm workspace configuration file where we specify which directory holds our packages
Configuring Babel
Babel relies on a configuration file that allows us to customize the plugins, presets, and other settings used during the Babel transpilation process. The primary configuration file is typically named .babelrc.js, although other formats are also supported (you can learn more about configuring Babel here).
There's a .babelrc.js file in the root of our project where we've configured Babel to use our plugin.
.babelrc.js
module.exports = {
  plugins: ["babel-plugin-remove-console"],
};
Writing a Babel plugin
Before we dive into writing the plugin, let's examine a code snippet that uses console.log and take a closer look at its corresponding Abstract Syntax Tree (AST). This should provide us with valuable insights on how to approach the development of our plugin.
Check it out on astexplorer
One great feature of astexplorer is that it lets you interactively explore the tree by clicking on or selecting code, which then automatically focuses on the corresponding AST node. For instance, when you work with a function call, like console.log, you'll notice that it's represented as a CallExpression. In our task, we aim to eliminate the CallExpression nodes that involve console.log.
Our Babel plugin, named babel-plugin-remove-console, is housed within the plugins directory. It's a standard JavaScript package, and its entry point can be found at lib/index.js, which is the core component of our plugin. When Babel processes your code, it invokes the function exported from this entry point and applies the specified transformations. It's time to write our plugin function! (Make sure to update the file below on your local branch.)
plugins/babel-plugin-remove-console/lib/index.js
module.exports = function (api) {
  const { types: t } = api;
  return {
    name: "remove-console",
    visitor: {
      CallExpression(path) {
        const { callee } = path.node;
        if (
          t.isMemberExpression(callee) &&
          t.isIdentifier(callee.object, { name: "console" }) &&
          t.isIdentifier(callee.property, { name: "log" })
        ) {
          path.remove();
        }
      },
    },
  };
};
What do we have going on here?
First, Babel calls the plugin function with api and options. Let's see what those are:
- api: This is the primary object provided to the plugin, and it grants the plugin access to various Babel methods, utilities, and information about the code being transformed. It contains properties like types, which has a bunch of utility methods like isMemberExpression on it.
- options: This is an optional parameter representing the configuration options passed to the plugin. The structure and content of the options object depend on how the plugin is configured in your Babel setup. These options allow you to customize the behaviour of the plugin based on your specific requirements.
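To make the options parameter concrete, here is a sketch of the same plugin reading a setting from it. The `methods` option is hypothetical and is not part of the article's plugin; it is introduced here only to show how a value from the Babel config would reach the plugin function as its second argument.

```javascript
// Sketch: the remove-console plugin with a hypothetical `methods` option
// listing which console methods to strip (defaults to just "log").
//
// Assumed config entry in .babelrc.js:
//   plugins: [["babel-plugin-remove-console", { methods: ["log", "warn"] }]]
function removeConsole(api, options = {}) {
  const { types: t } = api;
  const methods = options.methods || ["log"];

  return {
    name: "remove-console",
    visitor: {
      CallExpression(path) {
        const { callee } = path.node;
        if (
          t.isMemberExpression(callee) &&
          t.isIdentifier(callee.object, { name: "console" }) &&
          t.isIdentifier(callee.property) &&
          methods.includes(callee.property.name)
        ) {
          path.remove();
        }
      },
    },
  };
}

// In the plugin's entry file this function would be the export:
//   module.exports = removeConsole;
```

With no options, the sketch behaves like the plugin above; with `methods: ["log", "warn"]`, calls to console.warn would be removed as well.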
Notice that our plugin function returns an object, which is expected to match this shape. visitor is the most important property in the returned object; its name is derived from the Visitor Pattern, a software design pattern. Babel uses visitor to determine which parts of the AST to target for modification. With this pattern, we don't have to manually write a tree traversal to walk through the generated AST; we simply specify the node type and the transformation to be applied.
visitor can consist of keys that correspond to specific node types in the AST, and the values associated with these keys are functions that define the behaviour of the plugin when it encounters nodes of those types. These functions are called with a path argument representing the current node in the AST, and they determine what modifications, if any, should be made to the code.
For example, if we applied the same idea to a house in order to close the doors of all its rooms, it would look something like this:
visitor: {
  Room(path) {
    path.node.doorMode = "closed";
  },
}
In our case, we're visiting every CallExpression and removing it if it meets some conditions.
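The house analogy can be turned into a small runnable sketch that shows what a traversal driven by a visitor object does behind the scenes. Everything here is hypothetical: the `house` tree, the `traverse` helper, and the minimal `path` object (just `node` and `parent`) are stand-ins, and Babel's real traversal and path API are considerably richer.

```javascript
// Hypothetical "house" tree; each node has a type, like an AST node.
const house = {
  type: "House",
  children: [
    { type: "Room", name: "kitchen", doorMode: "open", children: [] },
    {
      type: "Hallway",
      children: [
        { type: "Room", name: "bedroom", doorMode: "open", children: [] },
      ],
    },
  ],
};

// Walk the tree and call the visitor method named after each node's type.
function traverse(node, visitor, parent = null) {
  const visit = visitor[node.type];
  if (typeof visit === "function") {
    visit({ node, parent }); // minimal stand-in for a "path"
  }
  for (const child of node.children || []) {
    traverse(child, visitor, node);
  }
}

// The visitor only says what to do with rooms; the walking is not its job.
traverse(house, {
  Room(path) {
    path.node.doorMode = "closed";
  },
});

console.log(house.children[0].doorMode); // closed
console.log(house.children[1].children[0].doorMode); // closed
```

Note how the nested bedroom is reached without the visitor knowing anything about hallways. That separation between walking the tree and acting on nodes is what the pattern buys us.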
Running Babel
Now that we've written a plugin, let's run Babel to see the effect of the plugin.
First, let us populate the files in our src directory with some code that contains console.log. Here's an example:
const name = "John W. Smith"
console.log(name)
Babel is installed in the root of our project, so we can run Babel using the command babel src --out-dir dist. For simplicity, I've added it to the build script in package.json:
pnpm build
Now, we should have all files from the src directory in dist with all console.log calls removed. Yay!
Keep in mind that this is an example plugin and may not handle some edge cases correctly, so you shouldn't rely on it in a real codebase. Use babel-plugin-transform-remove-console instead.
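To give a feel for the kind of edge cases involved, the sketch below re-implements the plugin's check against plain objects (so it runs without Babel) and applies it to a few hand-written callee nodes. The `isConsoleLog` helper and the sample nodes are assumptions made for illustration; the node shapes follow Babel's AST, where a string literal is a `StringLiteral`.

```javascript
// The plugin's condition, rewritten for plain objects (no Babel needed).
function isConsoleLog(callee) {
  return (
    callee.type === "MemberExpression" &&
    callee.object.type === "Identifier" &&
    callee.object.name === "console" &&
    callee.property.type === "Identifier" &&
    callee.property.name === "log"
  );
}

const consoleId = { type: "Identifier", name: "console" };

// console.log("hi") : matched, as intended.
const dotAccess = {
  type: "MemberExpression",
  computed: false,
  object: consoleId,
  property: { type: "Identifier", name: "log" },
};

// console["log"]("hi") : missed, the property is a StringLiteral.
const bracketAccess = {
  type: "MemberExpression",
  computed: true,
  object: consoleId,
  property: { type: "StringLiteral", value: "log" },
};

// const { log } = console; log("hi") : missed, the callee is an Identifier.
const aliased = { type: "Identifier", name: "log" };

// console[log]("hi") where `log` is a variable: matched by mistake,
// because the check never looks at `computed`.
const computedVariable = {
  type: "MemberExpression",
  computed: true,
  object: consoleId,
  property: { type: "Identifier", name: "log" },
};

console.log(isConsoleLog(dotAccess)); // true
console.log(isConsoleLog(bracketAccess)); // false
console.log(isConsoleLog(aliased)); // false
console.log(isConsoleLog(computedVariable)); // true
```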
What other plugins can you write?
Babel has all you need to move from writing a simple plugin that removes console.log to writing much more complex plugins. Most times you may not need to write your own plugin since Babel has a huge plugin library, both official and unofficial.
Babel plugins are everywhere, from removing unwanted exports from files in Gatsby to disallowing re-exports in Next.js.
For more information about building Babel plugins, check Kent's Babel plugin handbook or this awesome Babel handbook by Jamie.
AST in Linters - ESLint
Linters are indispensable tools for upholding coding standards across your codebase. Whether you aim to eradicate semicolons, champion tabs over spaces, or delve into more intricate scenarios, Linters have got you covered.
They empower you to maintain code quality, adhere to best practices, and ensure consistency throughout your projects. From simple conventions to much more intricate ones, Linters play a crucial role in enhancing your codebase's integrity.
Fun fact! There has always been an online debate about whether to use tabs or spaces. While Linters won't necessarily settle the debate, they can help teams enforce the standard they eventually agree on (hopefully, they get to agree :)
ESLint is a great Linter! It has a plugin system where each plugin can define a set of rules. Behind the scenes, each rule operates on your code's AST to flag possible violations. ESLint also lets you configure, in an ESLint config, whether violating a rule leads to an error or a warning.
ESLint config
ESLint relies on a configuration file that allows us to define the plugins and rules to be used, along with their configuration. We can also use different file formats like YAML and JSON. Here, we'll use the .js format. We have a .eslintrc.js file at the root of our project and it looks like this:
.eslintrc.js
module.exports = {
  env: {
    node: true,
    browser: true,
  },
  parserOptions: {
    ecmaVersion: "latest",
    sourceType: "module",
  },
  plugins: ["emojify-array"],
  rules: {
    "emojify-array/padded-emoji-array": [
      "error",
      {
        emoji: "🔥🔥",
      },
    ],
  },
};
Here, we have a minimal configuration, just enough to get things working. Each key serves a specific purpose. Let's see what each one does:
env
The env key specifies the environments where your JavaScript code will run. In this configuration:
- node: true indicates that Node.js-specific global variables are enabled.
- browser: true indicates that browser-specific global variables are enabled.
parserOptions
The parserOptions key is used to configure options related to JavaScript parsing and the ECMAScript version. In this configuration:
- ecmaVersion: "latest" specifies that the latest ECMAScript version should be used, allowing you to use the most recent JavaScript features.
- sourceType: "module" indicates that the code is in ECMAScript modules (ES6 modules).
plugins
The plugins key lists the ESLint plugins you want to use. Plugins provide additional rules and features. Here, we're using the plugin "emojify-array", which we'll write in the next section.
rules
The rules key defines ESLint rules and their configurations (severity level and options). ESLint plugins can have multiple rules, so we're picking the padded-emoji-array rule from our plugin and passing a severity level and some options.
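As a small illustration of the severity level and options pair, the fragment below configures the same rule as a warning instead of an error. The "✨" value is just an example option chosen for this sketch; it does not come from the project's configuration.

```javascript
// .eslintrc.js (fragment): the same rule, reported as a warning.
module.exports = {
  plugins: ["emojify-array"],
  rules: {
    // The first entry is the severity: "off", "warn", or "error" (or 0, 1, 2).
    // The second entry is the options object handed to the rule.
    "emojify-array/padded-emoji-array": ["warn", { emoji: "✨" }],
  },
};
```

With "warn", violations are still listed, but on their own they don't make the ESLint command exit with a non-zero code.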
Writing an ESLint plugin
Let's take our knowledge about Abstract Syntax Trees one step higher by writing an ESLint plugin. This time, we're writing something really fun:)
Our plugin will define a rule that forces arrays to start and end with an emoji. We'll also make the emoji configurable so that anyone using our plugin can configure the emoji to be used.
The plugin is in the plugins/eslint-plugin-emojify-array directory. In the plugin's entry point (/lib/index.js), we can define all the rules that our plugin exposes in a rules object.
module.exports = {
  rules: {
    "padded-emoji-array": require("./rules/padded-emoji-array"),
  },
};
Next, we'll create the rule module referenced above in rules/padded-emoji-array.js. This rule is responsible for ensuring that arrays start and end with an emoji, and it provides an optional configuration to customize the emoji used.
The rule module must export an object with a create function. You can also describe the rule with meta and validate its options with a schema:
- create function: defines the rule's behavior.
- meta object: provides metadata, including description and recommendations.
- schema: configures and validates options for the rule. ESLint documents it as a property of meta (meta.schema).
Our rule module is defined below:
plugins/eslint-plugin-emojify-array/lib/rules/padded-emoji-array.js
module.exports = {
  meta: {
    type: "suggestion",
    docs: {
      description: "Make sure arrays start and end with an emoji",
      recommended: false,
      url: null,
    },
    fixable: "code",
    // ESLint reads the options schema from meta.schema
    schema: [
      {
        type: "object",
        properties: {
          emoji: {
            type: "string",
          },
        },
      },
    ],
  },
  create(context) {
    // Fall back to an empty object when the rule is enabled without options
    const [optionsObject = {}] = context.options;
    const emoji = optionsObject.emoji || "🔥";

    // True only for string literals that contain a pictographic character.
    // \p{Emoji} is avoided because it also matches digits, "#" and "*".
    function containsEmoji(element) {
      return (
        element != null &&
        typeof element.value === "string" &&
        /\p{Extended_Pictographic}/u.test(element.value)
      );
    }

    return {
      ArrayExpression(node) {
        const firstElement = node.elements[0];
        const lastElement = node.elements[node.elements.length - 1];

        // Skip empty arrays and arrays that start or end with a hole
        if (!firstElement || !lastElement) return;

        const startAndEndContainsEmoji =
          containsEmoji(firstElement) && containsEmoji(lastElement);

        if (!startAndEndContainsEmoji) {
          context.report({
            node,
            message: "Array should start and end with an emoji",
            fix(fixer) {
              // Only pad the ends that are missing an emoji
              const fixes = [];
              if (!containsEmoji(firstElement)) {
                fixes.push(fixer.insertTextBefore(firstElement, `"${emoji}", `));
              }
              if (!containsEmoji(lastElement)) {
                fixes.push(fixer.insertTextAfter(lastElement, `, "${emoji}"`));
              }
              return fixes;
            },
          });
        }
      },
    };
  },
};
Let's go straight to the key part of this module, the create function! The following steps are performed:
- We access the provided options to configure the emoji that should be used (or fall back to a default emoji if none is provided).
- A helper function named containsEmoji uses a regular expression to check whether a value contains an emoji.
- The ArrayExpression node type is targeted in the code's AST. We check whether the first and last elements of the array contain emojis. If they don't, ESLint reports an issue.
- We have a fix function so that ESLint can automatically fix the issue by adding the emojis to the array when ESLint is run with the --fix flag.
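One detail worth knowing when detecting emoji with a regular expression: Unicode property escapes differ in what they count as an emoji. The sketch below compares two of them on a few sample strings (the samples are arbitrary and chosen only for illustration).

```javascript
// Two candidate patterns for "does this string contain an emoji?"
const emojiProperty = /\p{Emoji}/u;
const pictographic = /\p{Extended_Pictographic}/u;

const samples = ["🔥", "42", "apple"];

// One row per sample: does each pattern match it?
const rows = samples.map((sample) => ({
  sample,
  emoji: emojiProperty.test(sample),
  pictographic: pictographic.test(sample),
}));

console.log(rows);
```

The interesting row is "42": `\p{Emoji}` matches it, because ASCII digits (along with "#" and "*") carry the Unicode Emoji property, while `\p{Extended_Pictographic}` does not. A rule that coerces numeric array elements to strings and tests them with `\p{Emoji}` would therefore accept an array like [1, 2, 3].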
Harnessing the power of Abstract Syntax Trees, ESLint can serve as your watchful guardian to help detect potential issues within your codebase. It caters to a spectrum of use cases, ranging from straightforward checks like the one we've just explored to more practical and intricate scenarios, such as prohibiting client components from using asynchronous functions in Next.js or enforcing the rules of Hooks in a React project using eslint-plugin-react-hooks.
Running ESLint
In this section, we'll run ESLint against our code in two ways. First, we want to list potential errors and warnings, and next, we want to fix them. I've defined two scripts in package.json, lint and lint:fix; you should check package.json to see the actual commands behind the scripts.
- lint: calls ESLint on our src directory, which then shows us all possible warnings and errors in our files.
- lint:fix: fixes errors in our src directory using the fix function defined in our ESLint rule.
In one of the files in the src directory, I'll add an array without an emoji at the start and end, then run the lint script. The command should exit with an exit code of 1 after listing all errors and warnings:
Since our ESLint rule defines a fix function, we can run lint:fix to fix the error above:
ESLint Editor Extensions
Aside from the standard output you get when you run ESLint, you can take the experience further by installing the ESLint extension in your editor. With the extension installed, you get errors, warnings, and suggestions right in the editor. In our case, we should get a warning like so:
Conclusion
In conclusion, Abstract Syntax Trees (ASTs) may initially seem daunting, but they become more approachable once you grasp the basics. This post aimed to provide a gentle introduction to ASTs while exploring their practical applications in JavaScript.
Whether you are a newcomer to the concept of AST or already have some familiarity with it, I hope this article has shed light on its significance and use cases. While our examples mainly revolved around JavaScript and related tooling, it's worth noting that AST concepts can be applied to various programming languages.
Amongst many other practical utilizations of ASTs, we've focused on two common applications: Transpilers and Linters.
The world of ASTs is vast, and while we touched on a few applications, there's much more to explore. You can expand your knowledge of what we've learned so far by figuring out what problems you can solve using these tools.
Thanks for reading!
Other Resources
- babel plugin handbook by Kent C. Dodds
- The super tiny compiler by Jamie
Top comments (4)
I just worked on an ESLint plugin, but I think espree's AST is not very suitable for linting.
I was able to complete the work using a few tricks, but it took me much longer than necessary.
I also implemented AST directly to create a VSCode extension, and AST specialized for needs is definitely more convenient.
Curious to hear about the areas where ESpree falls short for you. As far as I know, it's been ESLint's default parser and I've not seen lots of complaints about it in the wild.
In JavaScript runtime, meaningless parentheses are not included in the AST. However, this was a significant inconvenience when creating a rule that manages parentheses in pairs.
Additionally, functions inside parentheses do not support features such as beforeComments, which is quite strange considering the structure of the AST.
The issue with beforeComments sounds like an ESLint-specific issue. It's most likely not directly related to ESpree. Opening an issue on their repo may be the way to go.
Seems like most parsers ignore the extra parentheses anyway. I tried @babel/parser and espree on this code block:
And it went from ExpressionStatement to FunctionExpression, ignoring the extras, so I'd assume that most parsers do the same.