Rafael Nunes

Posted on Jun 8, 2020 • Edited on Jun 14, 2020

Loading a directory as a tree structure in Node

#node #tree #javascript #fs

Hey you all 👋! This article is the first follow up on JSCity series. If you haven't read it yet, feel free to check it out in the post bellow:

JSCity visualizing JavaScript code

Rafael Nunes ・ May 31 '20

#react #reactthreefiber #threejs #visualization

In this post we will explore:

Loading directories using node APIs.
Building a tree structure that represents the directories loaded.
Test driven development to define the expectations around the output before implementing the code.

While in the MVP of JSCity all the processing happens in the browser (file upload, code parsing, visualization), for the second version I'm aiming to create modular packages, with the intention of increasing re-usability of these modules for future versions.

In this post, we'll be building the module that loads a local directory into a well-defined structure. The objective is to be able to export it as function of a package later.

I took this opportunity to use Typescript, but you can achieve the same goal with vanilla JavaScript.

Defining the structure

Directories in operating systems are displayed and represented in a hierarchical tree structure. The tree data structure is widely used to represent and traverse data efficiently.

The elements in a tree are called nodes and edges. A node contains some piece information, in our case information about the file or directory. In the following image, the arrows between the nodes are what we call edges.

Nodes without children are often called leaf nodes and the highest node in a tree is called the root node.

There are multiple well known algorithms to traverse a tree. These will facilitate the process of building the city. So how can we make that directory tree in node?

The node file system API allows us to read any directory with fs.readdirSync, for example. It returns the array of strings representing the subfolders and files of that folder.



console.log(fs.readdirSync(initialPath));
// [ 'example.js', 'utils' ]

We can then leverage this to build our own tree structure!

I did find node-directory-tree package that does that for us. Despite that, I believe it is a good exercise to build the tree by ourselves.

To represent a node I decided to create the TreeNode class. The properties of a TreeNode are the path in the file system and an array of TreeNode (representing the sub-directories and files). When TreeNode is a file the children array will remain empty just like the leaf nodes we learned before.



class TreeNode {
  public path: string;
  public children: Array<TreeNode>;

  constructor(path: string) {
    this.path = path;
    this.children = [];
  }
}

That's a good enough first version of our tree nodes. Let's keep going.

Defining the root node

Now let's create some tests!

I will use a folder called fixtures as the input of our tests. That folder contains just some example files.

So given an initial path, we want it to return the root node representing that directory. We want to assert that the root contains the expected properties.



describe('buildTree', () => {
  const initialPath = path.join(__dirname, 'fixtures');

  it('should return root node', () => {
    const rootNode = buildTree(initialPath);
    expect(rootNode).not.toBeNull();
    expect(rootNode).toHaveProperty('path', initialPath);
    expect(rootNode).toHaveProperty('children');
  });
});

For now this test will fail, but that's expected. We still need to build the function mentioned in the code above.

The buildTree function receives a path as input and returns the tree structure for that directory.



function buildTree(rootPath: string) {
  return new TreeNode(rootPath);
}

That is enough to get our first test to pass ✅🎉

Reading the folder and its children

We can see that the buildTree function does not really build the full tree structure yet. That's our next step. The fixtures folder used by our test looks like the following.



fixtures
├── example.js
└── utils
   └── sum.js

The output of the function should represent the following tree.

We can assert that the root, in our case fixtures, has two children: utils folder and example.js file.



it('should return root node with its exact 2 children', () => {
  const rootNode = buildTree(initialPath);
  expect(rootNode.children.length).toEqual(2);

  const childrenPath = rootNode.children.map(child => child.path);
  expect(childrenPath.includes(`${initialPath}/utils`)).toEqual(true);
  expect(childrenPath.includes(`${initialPath}/example.js`)).toEqual(true);
});

We can also assert that utils folder has the sum.js file inside of it.



it('should add utils node with its children inside root', () => {
  const rootNode = buildTree(initialPath);
  const utils = rootNode.children.find(
    child => child.path === `${initialPath}/utils`
  );

  expect(utils).not.toBeNull();
  expect(utils?.children.length).toEqual(1);
  expect(utils?.children[0]?.path).toEqual(`${initialPath}/utils/sum.js`);
});

And of course, they are going to fail at this point.

Building the tree

We now need to extend buildTree so it builds the entire tree, not only the root node.

The Depth-first search aka DFS algorithm is a well-known technique to traverse a tree. In the iterative DFS algorithm we will need to use a Stack, which has the first-in-last-out (FILO) approach.

With DFS, our step by step looks like this:

We first add the root to the stack.
We loop while the stack is not empty (that means we still have nodes to visit).
We pop an item from the stack to be our new currentNode.
We use fs.readdirSync(currentNode.path) to get the node's sub-directories and files.
For each one of them, we create a node and add it to the currentNode.children array. If it's a directory we also push it in the stack to visit it later.

In the end, we've visited all the directories, files and sub-directories and built our tree. The implementation looks like the this:



function buildTree(rootPath: string) {
  const root = new TreeNode(rootPath);

  const stack = [root];

  while (stack.length) {
    const currentNode = stack.pop();

    if (currentNode) {
      const children = fs.readdirSync(currentNode.path);

      for (let child of children) {
        const childPath = `${currentNode.path}/${child}`;
        const childNode = new TreeNode(childPath);
        currentNode.children.push(childNode);

        if (fs.statSync(childNode.path).isDirectory()) {
          stack.push(childNode);
        }
      }
    }
  }

  return root;
}

We used fs.readdirSync as before to discover the children of a folder. We also used fs.statSync to read the stats of the current path, it allows us to ask whether or not that child I'm looking at is a directory.

Green tests, yay 🙌, we have solved the problem of building the tree structure! When we log our root we are able to see its properties.



TreeNode {

  path: 'test/fixtures',

  children: [

    TreeNode {

      path: 'test/fixtures/example.js',

      children: []

    },

    TreeNode {

      path: 'test/fixtures/utils',

      children: [Array]

    }

  ]

}

What's next?

We got the desired output, but there is more we could do. For example, we can add a filter to exclude files of certain extension from our tree. I'll do that since I want to visualize .js files only.

There is also the possibility of adding properties like type, extension, size (...) to our TreeNode.

The next chapter will leverage this newly created structure to parse every JavaScript file in it and compute metrics about the code!

Was this post useful to you? I'm always keen to hear suggestions and comments. 👋

Top comments (6)

John • Sep 23 '20

Very useful.

This is what I need for building a web-based operating system-like file system.

Thanks for your work.

Rafael Nunes • Sep 23 '20

Thanks mate! I'm glad that was useful !

Himanshu Jha • Dec 26 '20

Hello Sir, thanks for this wonderful article. Can you please throw some light on how we can add a filter to exclude files of certain extension from our tree?

Rafael Nunes • Dec 31 '20

Hello! Sorry for the delay.
I believe you could use another Node API to solve that. The path API allows you to find the extension of a file given its path.

import path from 'path';

function buildTree(rootPath: string) {
  const BLOCKED_EXTENSIONS = ['.txt']
  const root = new TreeNode(rootPath);

  const stack = [root];

  while (stack.length) {
    const currentNode = stack.pop();

    if (currentNode) {
      const children = fs.readdirSync(currentNode.path);

      for (let child of children) {
        const childPath = `${currentNode.path}/${child}`;
        const childNode = new TreeNode(childPath);

        const isDirectory = fs.statSync(childNode.path).isDirectory();
        const extension = path.extname(childNode.path);
        const isFileBockled = BLOCKED_EXTENSIONS.includes(extension);

        if (!isFileBockled) {
          currentNode.children.push(childNode);
        }

        if (isDirectory && !isFileBockled) {
          stack.push(childNode);
        }
      }
    }
  }

  return root;
}

We could do something like the snippet above, you can have a const with the blocked extensions or you could have a const with the only ones you allow. Having that combined with path.extname we can filter out any file based on their extension.

Adrián UB • Feb 10 '21

Good article

Can you clarify how we can add a parameter to only search a specific number of subfolders?

Rafael Nunes • Feb 11 '21

Thanks!

Do you mean a limited number of subfolders as in the depth, on how deep you go down in the folder structure?
Or is it more like the total number of subfolders explored?

I believe for both ways we would need to work in our code to solve this problem.

The file system methods like readdirSync do not accept a limit number as far as I know. So to implement it we could probably define a constant called LIMIT and keep track of the depth of subfolders or the count of subfolders depending on your case.

        const isDirectory = fs.statSync(childNode.path).isDirectory();
        const isUnderLimit = depth <= LIMIT;

        if (isDirectory && isUnderLimit) {
          stack.push(childNode);
        }

It could be something like the code above to prevent adding more nodes after a certain threshold. Of course, we would need to increment the depth or count every time we explore a new subfolder.

I hope that helps 😄