About half year ago, I try to be creative and figuring out how to find what my code change could affect in a large code base, to avoid some tedious problems. And I wrote an article of idea(or nonsense🙈) about it.
I naively thought the open souce world has all the Lego pieces I need, with a clear manual in my head, I could build the complicated stuffs like space ship right away.
But, there's aways a "but". The reality is, NO. I found some ready to use pieces, like @babel/parser
, @babel/traverse
, enhanced-resolve
by webpack, and something similar but couldn't fulfill my need. One of them is built by a Russian guy, and it works well for solving to file level. Others are webpack plugins depending on webpack’s stat.json, which I chose not to rely on, as it could limit the possibilities, and it's way of resolving dependencies starting from an entry file does not fit for the idea.
So I did my own, again, and here’s how I make the packages in this ast-lab repo.
The plan
After investigation, I made my 1st phase plan to cover ES modules, it could be summarized into 2 parts:
- Use AST to find imports, exports and the relationships of a file's root declarations (variable, function, class, etc.).
- Get the data from all ES files under one path(usually repo source path), and re-organise them into a dependency map, then figure out the impact from a declartion entry.
The AST of imports
In an ES file, other than dynamic import, keyword import and export always appear in the root of the file. Here’s an example:
Thanks master Lihau create a useful tool AST explorer that uses
@babel/parser
as AST parser, my little experiment cannot work without it.
They have easy to identify type definitions, like:
-
export default a
is ExportDefaultDeclaration -
export function a {}
orexport a
are ExportNamedDeclaration -
export * as a
is ExportAllDeclaration -
import a from ‘module_path’
is ImportDeclaration - dynamic import
const importedModule = import(‘module_path’)
the import callee is Import type, the expression is just CallExpression like other function calls.
Each type may have slightly different structure, and multiple situations to handle with. Take import as example:
Though module exports has a name, when they are imported in another file, they could be used via alias, which means alias name is also required.
There are 3 types of aliases in importDeclaration:
-
import a
declared a as alias of module default export, marked as ImportDefaultSpecifier -
import * as a
declared a as alias of all things a module exported, marked as ImportNamespaceSpecifier -
import { a as b }
declared "b" as alias of a module's named export "a", same as the one without alias, it is marked as ImportSpecifier
All of them own a local.name
property to store the alias name, and for ImportSpecifier, you may get the original name from imported.name
.
Module path is stored in node’s source.value
property. But the file represented by this path could be uncertain. It could be:
- relative
js/jsx/ts/tsx/json
or other resolveable path (configured in your packing tools) - npm module from any parent paths’ node_module folder, or other specified paths
- npm module symlinked from somewhere else, like mono-repo
- fake npm module use tool’s alias feature
Now I understand why Node creator regrets about making this design, it does cause troubles.
I found Webpack’s enhanced-resolver
to help. By default it uses Node’s file system module(fs) to verify whether a file exists in user's file system. It is able to resolve customised module path, symlink, alias by simple configurations, and it will return the resolved absolute path as result.
Problem solved? Wait, there are some special cases to be aware of :
-
export * from ‘mymodule’
This syntax could be turned on by a babel plugin. To know what it actually exports, I will need to parse another file, which could make things comlicated, and possibly have circular relations. To avoid unpredictable problems, I chose to mark it as extension, and solve it later after I've resolved all the data in the project. -
export { a, default as b } from 'mymodule'
This line is quite tricky. It includes export, import information at the same time, anda
,b
actually could not be used anywhere else in the file. - Dynamic import I loved it so much, but the declaration it imported may exist in a local scope.
Exports, and other root imports could be handled in a similar way, with the right types and properties.
Use @babel/traverse to traverse AST
AST is a tree data structure. To work with tree, we usually traverse nodes starting from the root node with DFS(depth first search) or BFS(breadth first search).
I used @babel/traverse
package for AST traversing with DFS. It accepts a Visitor object, which takes a callback function or enter
, exit
callback functions as value, and types of Nodes as key. E.g.:
traverse(ast, {
Node(path) {
console.log(`I just visited a ${path.node.type} node!`);
}
});
The best part of it is it provides various useful alias to reduce the work. E.g., Function
type represents either ArrowFunction
or FunctionDeclaration
. All the visitors "matches" this node type would be executed, from top to bottom.
It also provides useful information, utils in a path
object. You may get the path object from a Visitor function:
traverse(ast, {
Scopable(path) {
if (path.isFunction()) {
console.log(`I just visited a function node through Scopable!`);
}
},
Function() {
console.log(`I just visited a function node through Function!`);
}
});
As a result, I extracted the import and export data in the following way:
const imports = [];
traverse(ast, {
ImportDeclaration({ node }) {
const modulePath = node.source.value;
node.specifiers.forEach(specifier => {
const alias = specifier.local.name;
let name;
switch (specifier.type) {
case 'ImportSpecifier':
name = specifier.imported.name;
break;
case 'ImportDefaultSpecifier':
name = 'default';
break;
case 'ImportNamespaceSpecifier':
name = '*';
break;
}
if (name) {
imports.push({
alias,
name,
source: modulePath,
});
}
});
}
});
Figuring out relationships within a file
A file may contain as many variables, functions, classes as they want. And often depend on each other, especially in functional programming. Which means, knowing what affects a file is not enough, not even useful in some cases, I got to be more precise, limiting down the coverage to declarations instead of files.
It requires the program to understand what does each declaration rely on. A word comes immediately to my mind - scope.
A declared name could be declared again, and in a context the local declared would be used instead of it's parent scopes. Thus, to figure out the right "relationships", I will need to resolve scopes.
Remember that I mentioned @babel/traverse
visits AST with DFS, this makes it perfectly fit for using stacks.
Since the declaration could happen anywhere in the scope, I chose to store all local declared declarations , and the used ones in the stack while traverse()
entering a node, and remove those local declared from the used set before exiting the scope. Then once I exit to the file scope, I know what does this root declaration depend on.
When to create the stack, and push the declarations? This question made me realize an interesting “in-consistent” design in Javascript that I never questioned about.
Here’s a simple function:
const param = ’I am the param’;
function func(param) {
return `You passed a param: ${param}.`;
}
func(‘suprise :p’);
It’s clear that the whole function block creates a scope, and in this scope it declared a new variable param
overwrites the param in its context without any variable declaration keyword.
Same happens to catch:
try {
doSomething();
} catch (err) {
console.error(`You know what? You have a new variable “err” created!`);
}
But how about other scopables, like for loop:
for (let i = 0; i < length; i++) {
console.log(`Loop counter:${i}`);
}
With the same brackets, but to declare a new param, you got to use keyword let
, var
, or const
. This is also why in AST the function type is FunctionDeclaration, catch is CatchClause, but for, if, switch, while, function calls, try... are called Statement.
This solution works for most of the cases except:
var
According to the ECMA specification, it stores the variable names in a special list called VarNames in the global context, which causes the famous hoisting issue. And it means that I have to handle it differently.eval
It's a disaster. Techniqually it's just a function accepting string as parameter, and could execute the string in runtime.
So in AST those code are still string. To understand what's going on in the string, I got to parse it to AST. What if there's another eval in this string?
And it could be called "indirectly". According to ECMA, if it's called indirectly, will be executed in the global context.
And then for a simple file like:
import { a } from 'fileA';
const b = () => a;
export default b;
I could get a map like following:
{
imports: {
a: { source: '/repo/fileA.js', name: 'a', alias: 'a' }
},
exports: ['default'],
relations: {
default: ['b'],
b: ['a']
}
}
Get the dependency map
An ES module only knows what declaration changed could affect itself, but no idea who is relying on its exports. Which means, to understand the impact of changing a function/variable/class of a module, searching through all the modules in the file system may find them all, but unnecessary for most of time, because we have practiced to keep files having dependency relationships under one folder/repository.
Let's say I have a simple repo like this:
/repo
- a.js
- b.js
- c.js
And the information I get from the steps above:
// file a.js relies on default of b.js
// it has 2 private declarations
// and exports default
const a = {
imports: {
b: [{ source: '/b.js', name: 'default', alias: 'b' }]
},
exports: ['default'],
relations: {
default: ['aPrivateFunc'],
aPrivateFunc: ['b', 'aPrivateConst'],
aPrivateConst: []
}
};
// file b.js relies on all stuff exported in c.js
// it exports default, no privates
const b = {
imports: { c: [{ source: '/c.js', name: '*', alias: 'c' }] },
exports: ['default'],
relations: { default: ['c'] }
};
// file c.js does not rely on any other files
// it exports c function/variable/class
const c = {
imports: {},
exports: ['c'],
relations: { c: [] }
};
It is clearer once draw this relationship out:
It's a Directed Graph in computer science. So firstly, we could reorganize the data above into a map could represent graph nodes and directions like this:
const relationGraph = {
'/c.js': {
'c': {
file: '/a.js',
name: 'aPrivateFunc',
affects: [relationGraph['/b.js'].default]
}
},
'/b.js': {
default: {
file: '/b.js',
name: 'default',
affects: [relationGraph['/a.js'].aPrivateFunc]
}
},
'/a.js': {
aPrivateFunc: {
file: '/a.js',
name: 'aPrivateFunc',
affects: [relationGraph['/a.js'].default]
},
default: { file: '/a.js', name: 'default', affects: [] }
}
};
And then I could do graph traversal from the declaration to check with as following:
// check the impact of c in c.js
const entry = { file: '/c.js', declaration: 'c' };
const { file, declaration } = entry;
const result = {};
const traversed = new Set();
let traversalQueue = relationGraph[file][declaration].affects;
let node;
while(node = traversalQueue.shift()) {
// Prevent circulation
if (traversed.has(node)) continue;
const { file, name, affects } = node;
if (!result[file]) {
// Use set to dedupe
result[file] = new Set([ name ]);
} else {
result[file].add(name);
}
traversed.add(node);
traversalQueue = traversalQueue.concat(relationGraph[file][name].affects);
}
return result;
The result of the example is:
{
'b.js': Set(1) {'default'}
'a.js': Set(2) {'aPrivateFunc', 'default'}
}
Top comments (3)
Good article, thanks
Have you tried running this code code on larger (~300 files) repos?
Great article!
I have a question please - is it possible to get a value of a variable in a program according to a specific line number? (when traversing the AST?)
Thank you
can for the variables without reassigning