DEV Community

Zack Jackson

Bundler Design: A Comprehensive Exploration of Modern Bundling Technologies and Techniques

Lynx, our company’s (ByteDance) self-developed cross-platform framework (similar to React Native), has a compilation tool that diverges significantly from traditional web compilation tool chains. Unlike conventional tools, it doesn’t support dynamic styles or scripts, essentially eliminating bundleless and code splitting. The module system relies on JSON instead of JS, and there’s no browsing server environment.

Given the need for real-time compilation on the web (build system), dynamic compilation (WebIDE), server-side compilation and delivery, and multi-version switching, we were compelled to create a universal bundler. This bundler had to be flexible, customizable, and operable both locally and in the browser. During its development, we faced several challenges but ultimately succeeded in creating a new universal bundler based on esbuild, resolving most of the issues we encountered.

Update: Since this article was written, we have released Rspack. The information here is still valuable, even if it is not fully up to date. I have translated and republished it in English because I believe it is a great in-depth read.

Note: The original article is in Chinese, authored by @hardfist.

What is a Bundler?

A bundler’s task is to package code organized as modules into one or more files. Common bundlers include webpack, rollup, esbuild, etc. While these mostly target JS-based module systems, they don’t exclude other forms of organization (e.g., wasm, the JSON usingComponents field of mini programs, imports of CSS and HTML). The generated output may also vary, such as multiple JS files produced by code splitting, or separate JS, CSS, and HTML files.

Most bundlers share similar core working principles, but they may emphasize specific functions. Here’s a brief overview:

  • Webpack: Known for supporting web development, especially with built-in HMR support. It has a robust plugin system and excellent compatibility with various module systems (amd, cjs, umd, esm, etc.). However, this compatibility can be a double-edged sword, leading to webpack-oriented programming. Its rich ecosystem is a plus, but it’s criticized for not being clean enough, lacking support for esm format generation, and being unsuitable for library development.
  • Rollup: Focused on library development, it’s based on the ESM module system and offers strong support for tree shaking. The product is clean and supports multiple output formats, making it suitable for library development. However, it has some drawbacks, such as needing to rely on plugins for cjs support, requiring more hacks, lacking HMR support, and depending on various plugins for application development.
  • Esbuild: Praised for its performance, esbuild offers built-in support for CSS, images, React, TypeScript, etc., and boasts an incredibly fast compilation speed (reportedly up to 100 times faster than webpack and rollup). Its main disadvantage is a relatively simple plugin system and an ecosystem that’s not as mature as those of webpack and rollup.
  • Rspack: A reimplementation of webpack’s API in Rust. It was released only after this post was written.

How bundlers work

The implementation of a bundler is very similar to that of most compilers, and it likewise adopts a three-stage design. We can compare the two:

  • LLVM: Each language is compiled into LLVM IR by a compiler front end, various optimizations are performed on the LLVM IR, and then machine code for different CPU instruction sets is generated from the optimized LLVM IR according to the target processor architecture.
  • Bundler: Each module is first compiled into a module graph, optimizations such as tree shaking, code splitting, and minification are performed on the module graph, and finally JS code is generated from the optimized module graph in the specified format.

Comparison of LLVM and Bundler

The similarities between traditional LLVM and bundlers allow many compilation optimization strategies to be applied in bundlers. Esbuild exemplifies this approach taken to the extreme. To understand how a bundler works, we’ll use rollup as an example, given its streamlined functions and structure. The bundle process of rollup is divided into two steps: rollup and generate, corresponding to the bundler front-end and back-end respectively.

Example: Rollup Process

Source Files

// src/index.js
import lib from './lib';
console.log('lib:', lib);

// src/lib.js
const answer = 42;
export default answer;

Generating the Module Graph

const rollup = require('rollup');
const util = require('util');
async function main() {
  const bundle = await rollup.rollup({
    input: ['./src/index.js'],
  });
  console.log(util.inspect(bundle.cache.modules, { colors: true, depth: null }));
}
main();

Output

[
  {
    code: 'const answer = 42;\nexport default answer;\n',
    ast: xxx,
    dependencies: [],
    id: '/Users/admin/github/neo/examples/rollup-demo/src/lib.js',
    ...
  },
  {
    ast: xxx,
    code: "import lib from './lib';\n\nconsole.log('lib:', lib);\n",
    dependencies: [ '/Users/admin/github/neo/examples/rollup-demo/src/lib.js' ],
    id: '/Users/admin/github/neo/examples/rollup-demo/src/index.js',
    ...
  }
]

This output represents the parsed AST structure of each module included in the generated product, as well as the dependencies between modules. After building the module graph, rollup can continue to build products based on the module graph according to the user’s configuration.
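To make the idea of a module graph concrete, here is a minimal sketch of a bundler front end working on in-memory sources. It is not how rollup is implemented: it finds imports with a regular expression instead of a real parser, it requires explicit file extensions, and the names files, resolveImport, and buildModuleGraph are hypothetical.

```javascript
// In-memory source files keyed by absolute path (hypothetical example data).
const files = {
  '/src/index.js': "import lib from './lib.js';\nconsole.log('lib:', lib);\n",
  '/src/lib.js': 'const answer = 42;\nexport default answer;\n',
};

// Matches static `import ... from '...'` statements; a real bundler uses a parser.
const IMPORT_RE = /import\s+[^'"]*?from\s+['"]([^'"]+)['"]/g;

// Resolve a relative specifier against the importing module's path.
function resolveImport(specifier, importer) {
  return new URL(specifier, `file://${importer}`).pathname;
}

// Walk imports depth-first from the entry and record each module once.
function buildModuleGraph(entry) {
  const graph = new Map();
  const visit = (id) => {
    if (graph.has(id)) return; // already visited; this also breaks import cycles
    const code = files[id];
    if (code === undefined) throw new Error(`Module not found: ${id}`);
    const dependencies = [...code.matchAll(IMPORT_RE)].map((m) => resolveImport(m[1], id));
    graph.set(id, { id, code, dependencies });
    dependencies.forEach(visit);
  };
  visit(entry);
  return graph;
}

const graph = buildModuleGraph('/src/index.js');
console.log([...graph.values()].map(({ id, dependencies }) => ({ id, dependencies })));
```

Each entry has the same shape as the rollup output above (id, code, dependencies), minus the AST. The later stages of a bundler only need this graph, not the file system.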

Generating the Content

const result = await bundle.generate({
  format: 'cjs',
});
console.log('result:', result);

Generated Content

exports: [],
facadeModuleId: '/Users/admin/github/neo/examples/rollup-demo/src/index.js',
isDynamicEntry: false,
isEntry: true,
type: 'chunk',
code: "'use strict';\n\nconst answer = 42;\n\nconsole.log('lib:', answer);\n",
dynamicImports: [],
fileName: 'index.js',

Plugin System:

Most bundlers offer a plug-in system to allow users to customize the bundler’s logic. For example, rollup’s plug-in system is divided into input and output plug-ins. The input plug-in corresponds to generating the Module Graph, while the output plug-in corresponds to generating the product according to the Module Graph. Here, we’ll focus on the input plug-in, the core of the bundler plug-in system, using esbuild as an example.

Core Hooks of Input Plug-in Systems

  1. onResolve: Determines the actual module address based on a module ID.
  2. onLoad: Loads module content according to the module address.

The three bundlers differ in where transform work happens. Esbuild only provides a load hook, so any transform work is done inside onLoad. Rollup additionally provides a transform hook, while webpack delegates transform work to loaders.
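How a resolve hook and a load hook cooperate can be sketched with a toy plugin host. This is only an illustration loosely modeled on esbuild's filter and namespace matching, not esbuild's implementation: createPluginHost and its resolve and load methods are hypothetical names, and the hooks are synchronous to keep the example short.

```javascript
// A toy plugin host; `createPluginHost` and its methods are hypothetical names.
function createPluginHost(plugins) {
  const resolvers = [];
  const loaders = [];
  const build = {
    onResolve: (options, callback) => resolvers.push({ options, callback }),
    onLoad: (options, callback) => loaders.push({ options, callback }),
  };
  plugins.forEach((plugin) => plugin.setup(build));

  // Run hooks in registration order; the first non-null result wins.
  const run = (hooks, args) => {
    for (const { options, callback } of hooks) {
      // A hook without a namespace option matches every namespace.
      const namespace = options.namespace ?? args.namespace;
      if (namespace !== args.namespace || !options.filter.test(args.path)) continue;
      const result = callback(args);
      if (result != null) return result;
    }
    return null;
  };

  return {
    resolve: (path, namespace = 'file') => run(resolvers, { path, namespace }),
    load: (path, namespace = 'file') => run(loaders, { path, namespace }),
  };
}

const host = createPluginHost([
  {
    name: 'env',
    setup(build) {
      build.onResolve({ filter: /^env$/ }, (args) => ({ path: args.path, namespace: 'env-ns' }));
      build.onLoad({ filter: /.*/, namespace: 'env-ns' }, () => ({
        contents: JSON.stringify({ NODE_ENV: 'production' }),
        loader: 'json',
      }));
    },
  },
]);

const resolved = host.resolve('env');
const loaded = host.load(resolved.path, resolved.namespace);
console.log(resolved, loaded);
```

Paths that no plugin claims return null here. A real bundler would fall back to its built-in file system resolution at that point.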

Esbuild Plug-in System

The esbuild plug-in system’s most distinctive feature is support for virtual modules. Here are some examples to demonstrate the role of the plugin:

Example 1: Loader

A common requirement in webpack is to use various loaders to process non-JS resources. Here’s how to use the esbuild plugin to implement a simple less-loader:

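import fs from 'fs';
import type { Plugin } from 'esbuild';
// Assumption: the unqualified render() in the original is less's render function
import lessCompiler from 'less';

export const less = (): Plugin => {
  return {
    name: 'less',
    setup(build) {
      // Escape the dot so only paths ending in ".less" match
      build.onLoad({ filter: /\.less$/ }, async (args) => {
        const content = await fs.promises.readFile(args.path, 'utf8');
        const result = await lessCompiler.render(content);
        return {
          contents: result.css,
          loader: 'css',
        };
      });
    },
  };
};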

Example 2: Sourcemap, Cache, and Error Handling

A more mature plug-in must consider sourcemap mapping after transformation, custom cache to reduce repeated overhead, and error handling. Here’s an example using svelte:

let sveltePlugin = {
  name: 'svelte',
  setup(build) {
    let svelte = require('svelte/compiler')
    let path = require('path')
    let fs = require('fs')
    // Assumption: lru-cache v10+, which exports the class by name
    let { LRUCache } = require('lru-cache')
    // Use an LRU cache so memory does not keep growing in watch mode
    let cache = new LRUCache({ max: 500 })

    build.onLoad({ filter: /\.svelte$/ }, async (args) => {
      // Load the file from the file system
      let source = await fs.promises.readFile(args.path, 'utf8')
      let filename = path.relative(process.cwd(), args.path)

      // Use the path as the cache key
      let cached = cache.get(args.path)
      if (cached && cached.input === source) {
        return cached.output // Cache hit: skip the transform below
      }

      // This converts a message in Svelte's format to esbuild's format
      let convertMessage = ({ message, start, end }) => {
        let location
        if (start && end) {
          let lineText = source.split(/\r\n|\r|\n/g)[start.line - 1]
          let lineEnd = start.line === end.line ? end.column : lineText.length
          location = {
            file: filename,
            line: start.line,
            column: start.column,
            length: lineEnd - start.column,
            lineText,
          }
        }
        return { text: message, location }
      }

      // Convert Svelte syntax to JavaScript
      try {
        let { js, warnings } = svelte.compile(source, { filename })
        // Return the sourcemap inline; esbuild merges sourcemaps across the whole chain
        let contents = js.code + `//# sourceMappingURL=` + js.map.toUrl()
        // Report warnings and errors to esbuild, which passes them on to the caller
        let output = { contents, warnings: warnings.map(convertMessage) }
        cache.set(args.path, { input: source, output })
        return output
      } catch (e) {
        return { errors: [convertMessage(e)] }
      }
    })
  }
}

require('esbuild').build({
  entryPoints: ['app.js'],
  bundle: true,
  outfile: 'out.js',
  plugins: [sveltePlugin],
}).catch(() => process.exit(1))

Virtual Module

The esbuild plugin system’s support for virtual modules represents a significant advancement over rollup. Generally, a bundler needs to process two types of modules: those whose paths correspond to real disk file paths, and those whose paths don’t correspond to real ones. The latter, known as virtual modules, generate content based on the path form. Virtual modules have a wide range of applications.

Example: Glob Import

A common scenario is when developing a REPL like rollupjs.org/repl/, where code samples are loaded into memfs and then built based on memfs in the browser. If there are many files involved, importing them one by one can be cumbersome. Support for glob import can simplify this process.

File Structure


examples
 index.html
 index.tsx
 index.css

Code

import examples from 'glob:./examples/**/*';
import { vol } from 'memfs';
vol.fromJSON(examples, '/'); // Mount the local examples directory to memfs

This functionality can be achieved by tools like vite or babel-plugin-macro. Here’s how esbuild can implement it:

  1. Analyze Custom Path in onResolve: Pass the metadata to onLoad through pluginData and path, and customize a namespace to prevent normal file load logic from loading the returned path.
  2. Get Metadata in onLoad: Customize the logic of loading and generating data according to the metadata, and hand over the generated content to esbuild’s built-in loader for processing.

Code Implementation


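import fs from 'fs';
import path from 'path';
import glob from 'glob'; // Assumption: the callback-style API of glob v7
import type { Plugin } from 'esbuild';

const globReg = /^glob:/;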
export const pluginGlob = (): Plugin => {
  return {
    name: 'glob',
    setup(build) {
      build.onResolve({ filter: globReg }, (args) => {
        return {
          path: path.resolve(args.resolveDir, args.path.replace(globReg, '')),
          namespace: 'glob',
          pluginData: {
            resolveDir: args.resolveDir,
          },
        };
      });
      build.onLoad({ filter: /.*/, namespace: 'glob' }, async (args) => {
        const matchPath: string[] = await new Promise((resolve, reject) => {
          glob(
            args.path,
            {
              cwd: args.pluginData.resolveDir,
            },
            (err, data) => {
              if (err) {
                reject(err);
              } else {
                resolve(data);
              }
            }
          );
        });
        const result: Record<string, string> = {};
        await Promise.all(
          matchPath.map(async (x) => {
            const contents = await fs.promises.readFile(x);
            result[path.basename(x)] = contents.toString();
          })
        );
        return {
          contents: JSON.stringify(result),
          loader: 'json',
        };
      });
    },
  };
};

Esbuild filters on filter and namespace for performance reasons: the checks run inside esbuild’s Go code, so JavaScript callbacks that cannot match are never invoked.

The virtual module can not only obtain content from the disk but also calculate content directly in memory, and even import the module as a function call.

Memory Virtual Module

The virtual module concept in esbuild is not only powerful but also highly flexible, allowing for various innovative applications.

1. Environment Variables as Virtual Module

You can create a virtual module that represents the environment variables of the system. Here’s an example:

let envPlugin = {
  name: 'env',
  setup(build) {
    // Intercept import paths called "env" so esbuild doesn't attempt
    // to map them to a file system location. Tag them with the "env-ns"
    // namespace to reserve them for this plugin.
    build.onResolve({ filter: /^env$/ }, args => ({
      path: args.path,
      namespace: 'env-ns',
    }))

    // Load paths tagged with the "env-ns" namespace and behave as if
    // they point to a JSON file containing the environment variables.
    build.onLoad({ filter: /.*/, namespace: 'env-ns' }, () => ({
      contents: JSON.stringify(process.env),
      loader: 'json',
    }))
  },
}

// In application code
import { NODE_ENV } from 'env'

2. Function Virtual Module

You can use the module name as a function, complete compile-time calculations, and even support recursive function calls. This example demonstrates a Fibonacci sequence:

build.onResolve({ filter: /^fib\((\d+)\)/ }, args => {
  return { path: args.path, namespace: 'fib' }
})
build.onLoad({ filter: /^fib\((\d+)\)/, namespace: 'fib' }, args => {
  let match = /^fib\((\d+)\)/.exec(args.path), n = +match[1]
  // The trailing args.path keeps every generated import path unique
  let contents = n < 2 ? `export default ${n}` : `
    import n1 from 'fib(${n - 1}) ${args.path}'
    import n2 from 'fib(${n - 2}) ${args.path}'
    export default n1 + n2`
  return { contents }
})

// In application code
import fib5 from 'fib(5)'

3. Stream Import

You can develop without downloading node_modules, similar to the streaming import approach of Deno and Snowpack:

import { Plugin } from 'esbuild';
import { fetchPkg } from './http';
export const UnpkgNamepsace = 'unpkg';
export const UnpkgHost = 'https://unpkg.com/';
export const pluginUnpkg = (): Plugin => {
  const cache: Record<string, { url: string; content: string }> = {};
  return {
    name: 'unpkg',
    setup(build) {
      build.onLoad({ namespace: UnpkgNamepsace, filter: /.*/ }, async (args) => {
        const pathUrl = new URL(args.path, args.pluginData.parentUrl).toString();
        let value = cache[pathUrl];
        if (!value) {
          value = await fetchPkg(pathUrl);
        }
        cache[pathUrl] = value;
        return {
          contents: value.content,
          pluginData: {
            parentUrl: value.url,
          },
        };
      });
      build.onResolve({ namespace: UnpkgNamepsace, filter: /.*/ }, async (args) => {
        return {
          namespace: UnpkgNamepsace,
          path: args.path,
          pluginData: args.pluginData,
        };
      });
    },
  };
};

// In application code
import react from 'react';

Scenarios and Flexibility

The virtual module allows for different scenarios:

  • Local Development: Completely use local file loading.
  • Local Development without Node Modules: Suitable for super-large monorepo projects where installing all node_modules is too slow.
  • Web-Side Real-Time Compilation: For performance and network issues, where third-party libraries are fixed, and business code may change.
  • Dynamic Compilation on the Web Side: Intranet WebIDE scenarios, where both third-party libraries and business code are not fixed.

Converting CommonJS (CJS) to ECMAScript Modules (ESM)

Converting CJS to ESM for use in the browser has long been a complex task. While rollup offers a solution, it comes with inherent problems and additional complexities. Esbuild takes an alternative approach: it uses a module wrapper similar to Node’s for CJS compatibility and introduces a minimal runtime to support CJS. This method ensures better compatibility with CJS and removes the need for additional plugins, which makes it a promising direction for performing this conversion in the browser.

Virtual Module Support in Rollup

Rollup’s support for virtual modules is rather hacky: a '\0' is prepended to the dependency path. This is intrusive to the path and unfriendly to some FFI scenarios (C++ strings treat '\0' as a terminator). In more complex virtual module scenarios, the '\0' in the path easily causes problems.
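For reference, the convention looks roughly like this. The plugin object below uses rollup's resolveId and load hook names and the '\0' prefix convention, but it is exercised by hand here rather than through rollup itself, and the module id virtual:env is a hypothetical example.

```javascript
const VIRTUAL_ID = 'virtual:env';
const RESOLVED_ID = '\0' + VIRTUAL_ID; // the NUL prefix marks the id as virtual

const virtualEnvPlugin = {
  name: 'virtual-env',
  resolveId(source) {
    // Returning null lets other plugins (or the bundler itself) resolve the id.
    return source === VIRTUAL_ID ? RESOLVED_ID : null;
  },
  load(id) {
    return id === RESOLVED_ID ? 'export const NODE_ENV = "production";' : null;
  },
};

const resolvedId = virtualEnvPlugin.resolveId('virtual:env');
console.log(JSON.stringify(resolvedId));

// A consumer that treats NUL as a string terminator would stop reading here,
// before any of the meaningful characters.
console.log(resolvedId.split('\0')[0].length);
```

The second log line hints at the FFI problem: everything useful in the id sits after the terminator character.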

File System

A local bundler accesses the local file system, but the browser has no local file system. The usual solution is to make the bundler independent of any specific fs, so that all file access goes through a configurable fs. Rollup’s REPL (https://rollupjs.org/repl/) does this. We therefore only need to switch the module loading logic from fs to memfs in the browser; the onLoad hook can be used to replace the file reading logic.
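A minimal sketch of this idea: the loader depends only on a tiny fs-like interface, so the caller can inject a disk-backed or memory-backed implementation. createMemoryFs and loadModule are hypothetical names, and a real setup would more likely inject memfs than this hand-written stand-in.

```javascript
// A minimal in-memory file system exposing only what the loader needs.
function createMemoryFs(initialFiles) {
  const files = new Map(Object.entries(initialFiles));
  return {
    existsSync: (path) => files.has(path),
    readFileSync: (path) => {
      if (!files.has(path)) throw new Error(`ENOENT: no such file, open '${path}'`);
      return files.get(path);
    },
  };
}

// The loader never imports a file system itself; the caller injects one.
function loadModule(fs, path) {
  return { id: path, code: fs.readFileSync(path) };
}

const memoryFs = createMemoryFs({ '/src/lib.js': 'export default 42;\n' });
console.log(loadModule(memoryFs, '/src/lib.js'));
```

The same loadModule function could be given any object with a compatible readFileSync, which is what makes the bundler portable between Node and the browser.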

Node Module Resolution

When we switch file access to memfs, the next problem is how to map the id passed to require or import to an actual file path. In node, the algorithm that maps an id to a real file address is module resolution. It is fairly complex to implement and needs to consider the following situations. For the detailed algorithm, see https://tech.bytedance.net/articles/69/

  • file|index|directory three situations
  • js, json, addon multi-file suffix
  • The difference between esm and cjs loader
  • main field processing
  • Conditional exports processing
  • exports subpath
  • NODE_PATH processing
  • recursive lookup
  • Symlink processing

In addition to the complexity of node module resolution itself, we may also need to consider features that are not part of node’s algorithm but are popular in the community and supported by webpack, such as main and module field fallback, alias support, and support for ts and other suffixes, as well as compatibility with package managers such as yarn, pnpm, and npm. Implementing this set of algorithms from scratch is costly, and node’s module resolution algorithm keeps being updated. Webpack’s enhanced-resolve module implements essentially all of the above and supports a custom fs, so it can easily be ported to memfs.
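To give a feel for the core of the algorithm, here is a heavily simplified sketch over an in-memory file tree. It only covers file, extension, index, package.json main, and the upward node_modules lookup; exports, symlinks, NODE_PATH, and the esm/cjs differences are deliberately left out. The tree contents and the helper names (resolveFile, resolveDirectory, resolve) are hypothetical, and this is not how enhanced-resolve is implemented.

```javascript
// Hypothetical in-memory tree: path -> file contents.
const tree = new Map([
  ['/app/src/index.js', "require('left-pad'); require('./util');"],
  ['/app/src/util/index.js', 'module.exports = {};'],
  ['/app/node_modules/left-pad/package.json', JSON.stringify({ main: 'lib/index.js' })],
  ['/app/node_modules/left-pad/lib/index.js', 'module.exports = () => {};'],
]);

const EXTENSIONS = ['', '.js', '.json'];
const dirname = (p) => p.slice(0, p.lastIndexOf('/')) || '/';
const join = (dir, rel) => new URL(rel, `file://${dir}/`).pathname;

// Try the path as a file, then with each known extension.
function resolveFile(path) {
  for (const ext of EXTENSIONS) {
    if (tree.has(path + ext)) return path + ext;
  }
  return null;
}

// Try package.json "main", then fall back to an index file.
function resolveDirectory(dir) {
  const pkgPath = `${dir}/package.json`;
  if (tree.has(pkgPath)) {
    const { main } = JSON.parse(tree.get(pkgPath));
    if (main) {
      const target = join(dir, main);
      const hit = resolveFile(target) ?? resolveFile(`${target}/index`);
      if (hit) return hit;
    }
  }
  return resolveFile(`${dir}/index`);
}

function resolve(id, importer) {
  const from = dirname(importer);
  if (id.startsWith('./') || id.startsWith('../') || id.startsWith('/')) {
    const target = join(from, id);
    return resolveFile(target) ?? resolveDirectory(target);
  }
  // Bare specifier: look in node_modules of each ancestor directory.
  for (let dir = from; ; dir = dirname(dir)) {
    const target = `${dir === '/' ? '' : dir}/node_modules/${id}`;
    const hit = resolveFile(target) ?? resolveDirectory(target);
    if (hit) return hit;
    if (dir === '/') return null;
  }
}

console.log(resolve('./util', '/app/src/index.js'));
console.log(resolve('left-pad', '/app/src/index.js'));
```

Even this reduced version needs several fallbacks, which is part of why reusing an existing resolver with a pluggable fs is attractive.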

Main Field

The main field is also a relatively complicated problem, mainly because there is no unified specification and community libraries do not fully follow the conventions that do exist. It mainly concerns how packages are distributed: apart from the main field, which is officially supported by nodejs, bundlers and third-party community libraries have not reached a consensus on fields such as module and browser. Open questions include:

  • How to configure the entries for cjs and esm, esnext and es5, node and browser, dev and prod
  • Whether the code in module or main should be es5 or esnext (this determines whether code in node_modules needs to go through a transformer)
  • Whether the code in module should point to the browser implementation or the node implementation (this determines the priority of main and module for a node bundler and for a browser bundler)
  • How to distribute and process the code that differs between node and browser, etc.
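One common way bundlers deal with this ambiguity is a configurable priority list of fields; webpack's resolve.mainFields and esbuild's mainFields options work along these lines. The sketch below is a simplified illustration only: pickEntry and the sample package.json are hypothetical, and the object form of the browser field and the exports field are ignored.

```javascript
// Return the first package.json field from the priority list that is a string.
function pickEntry(pkg, mainFields) {
  for (const field of mainFields) {
    if (typeof pkg[field] === 'string') return { field, entry: pkg[field] };
  }
  return { field: null, entry: 'index.js' }; // Node's default when nothing is set
}

// Hypothetical package.json shipping three builds of the same library.
const pkg = {
  main: 'dist/index.cjs.js',
  module: 'dist/index.esm.js',
  browser: 'dist/index.browser.js',
};

// A browser-oriented bundler might prefer browser, then module, then main.
console.log(pickEntry(pkg, ['browser', 'module', 'main']));
// A node-oriented bundler might prefer main over module.
console.log(pickEntry(pkg, ['main', 'module']));
```

The same package therefore resolves to different files depending on the bundler's target, which is exactly why the lack of consensus on these fields matters.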

Unpkg

Next, we need to handle modules in node_modules. There are two approaches. One is to mount the whole of node_modules into memfs and then use enhanced-resolve to load the corresponding modules from memfs. The other is to use unpkg and convert each node_modules id into an unpkg request. Each has its own use cases. The first suits a relatively fixed set of third-party modules (if the set is not fixed, memfs cannot hold an unbounded number of node_modules), and since memfs access is much faster than network requests, it is well suited to build systems. The second suits cases where the set of third-party modules is not fixed and compilation speed has no strict real-time requirement, such as WebIDE scenarios like codesandbox, where users can freely choose the npm modules they want.
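The second approach boils down to turning a bare import specifier into a URL. A minimal sketch, assuming unpkg's package@version/file URL layout; parseSpecifier, toUnpkgUrl, and the versions map are hypothetical, and no network request is made here.

```javascript
const UNPKG_HOST = 'https://unpkg.com/';

// Split "react-dom/server" or "@scope/pkg/sub/path" into name and subpath.
function parseSpecifier(specifier) {
  const parts = specifier.split('/');
  const nameLength = specifier.startsWith('@') ? 2 : 1;
  return {
    name: parts.slice(0, nameLength).join('/'),
    subpath: parts.slice(nameLength).join('/'),
  };
}

// Build the request URL; pin the version when one is known for the package.
function toUnpkgUrl(specifier, versions = {}) {
  const { name, subpath } = parseSpecifier(specifier);
  const version = versions[name] ? `@${versions[name]}` : '';
  return `${UNPKG_HOST}${name}${version}${subpath ? `/${subpath}` : ''}`;
}

console.log(toUnpkgUrl('react', { react: '17.0.2' }));
console.log(toUnpkgUrl('@babel/runtime/helpers/extends'));
```

Pinning versions from a lockfile-like map keeps builds reproducible, whereas an unpinned URL resolves to whatever version the CDN currently treats as the default.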

Shims and Polyfills

Another problem a web bundler runs into is that most community modules are developed for node and rely heavily on node’s native APIs, which browsers do not support, so running these modules directly in the browser causes problems. There are two situations.

Simulating Node Utility APIs

In the first, the modules only depend on node utility APIs such as util and path, and do not actually depend on the node runtime. In this case we can simulate these APIs in the browser. Browserify was designed for this kind of scenario and provides a large number of node API polyfills for the browser, such as path-browserify and stream-browserify. These polyfills allow developers to utilize Node-like functionality within the browser environment, bridging the gap between server-side and client-side code.

Separating Browser and Node Logic

The other situation is to handle browser and node logic separately. Although the node code never needs to run in the browser, we do not want the node implementation to increase the size of the browser bundle or cause errors. Here we need to mark the node-related modules as external. This means that the code specific to the Node environment is isolated from the browser bundle, ensuring that the browser does not have to load unnecessary code that it cannot execute.

A Little Trick

A little trick for this situation, useful when configuring externals is troublesome or the bundler configuration cannot be modified, is to wrap require in eval, for example eval('require')('os'). Most bundlers will then skip bundling the required module. This trick allows developers to bypass certain bundling constraints, providing more flexibility in handling platform-specific dependencies.
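Why the trick works can be seen with a toy dependency scanner. Real bundlers walk an AST rather than using a regular expression, but the principle is similar: only direct require calls with a string literal argument are picked up as dependencies. findRequires is a hypothetical helper.

```javascript
// Collect string-literal arguments of direct require(...) calls.
function findRequires(code) {
  const pattern = /\brequire\(\s*['"]([^'"]+)['"]\s*\)/g;
  return [...code.matchAll(pattern)].map((match) => match[1]);
}

const source = `
const path = require('path');
const os = eval('require')('os'); // hidden from static analysis
`;

console.log(findRequires(source)); // logs [ 'path' ]
```

The os module never shows up in the scan, so it is not bundled and is only looked up at runtime, where node's own require is available.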

In conclusion, shims and polyfills play a crucial role in bridging the differences between the Node and browser environments. By simulating Node APIs or separating logic based on the platform, developers can create more portable and efficient code that runs smoothly across different environments. Whether it’s through the use of existing libraries like Browserify or clever coding tricks, these techniques are essential tools in the modern web development toolkit.

Web Assembly

Web Assembly (Wasm) is a binary instruction format that allows code written in languages like C++ to be executed in a web browser. This technology has opened up new possibilities for web development, enabling performance-critical applications to run at near-native speed. However, integrating Web Assembly into a web project can present challenges, particularly when it comes to managing file size.

C++ Modules and Web Assembly

In many businesses, including the one described here, C++ modules are essential. In a local environment, C++ can be compiled into a static library and called through Foreign Function Interface (FFI). However, in the browser, it needs to be compiled into Web Assembly to run. This compilation process can result in large file sizes. For example, the Web Assembly module of esbuild is about 8MB, and a custom-compiled static library may also be around 3MB. These large file sizes can have a significant impact on the overall package size, affecting load times and user experience.

Code Splitting Solution

A potential solution to this problem is to apply code splitting techniques to the Web Assembly modules. By dividing the Web Assembly code into “hot” and “cold” segments, developers can optimize the loading process:

  • Hot Code: This segment includes the code that is most likely to be used during the initial access. By loading only this portion of the Web Assembly module at first, the initial load time can be reduced, enhancing the user experience.
  • Cold Code: This segment includes the code that is less likely to be used or only needed in specific scenarios. By deferring the loading of this portion, the overall package size for the initial load can be minimized.

Applications of esbuild

esbuild is a versatile tool that offers three core functionalities, each of which can be used independently or in combination with the others:

  • Minifier: Reduces the file size of code by removing unnecessary characters and spaces.
  • Transformer: Converts code from one form to another, such as from TypeScript to JavaScript.
  • Bundler: Combines multiple files and dependencies into a single file for easier distribution.

Below are some specific applications where esbuild can be utilized:

Enhanced Registration and Minification Tools

By leveraging esbuild’s transformation function, developers can use esbuild-register to replace the registration of unit test frameworks like ts-node. This substitution significantly boosts the speed of the process. An example of this can be found at github.com/aelbore/esb. Additionally, since ts-node now supports custom registers, it's possible to directly replace the register with esbuild-register. When it comes to minification, esbuild's performance surpasses that of terser by more than 100 times, making it a highly efficient choice.

Node Bundler: A Case for Bundling in the Node Community

Unlike the front-end community, the node community rarely employs bundling solutions. This discrepancy arises partly because node services may utilize operations like file system (fs) and addons that are not bundle-friendly. Additionally, most bundling tools are designed with the front-end in mind, leading to the need for extra configuration when applied to the node domain.

However, bundling node applications or services offers significant advantages:

  1. Reduction in Node Modules Volume: Bundling can decrease the size of node_modules, accelerating installation. By installing only the bundled code instead of numerous dependencies, the installation volume is greatly reduced, speeding up the process. Tools like pnpm and yarn exemplify this approach (source).
  2. Enhanced Cold Start Speed: Bundling can improve the cold start speed of applications by reducing the JS code size through tree shaking. Since parsing overhead is a significant factor in the cold start speed of large applications, this reduction is especially beneficial for applications sensitive to cold start times. Additionally, avoiding file IO further enhances the speed, making bundling suitable for scenarios like serverless that are sensitive to cold start.
  3. Avoidance of Upstream Semver Disruption: Semver (Semantic Versioning) is a community specification with strict code requirements. Bundling the code can prevent upstream dependencies from causing application bugs, a crucial consideration for high-security applications like compilers.

Given these benefits, the author strongly advocates for bundling node applications, and esbuild offers ready support for node bundles.

Challenges and Limitations of esbuild

While esbuild offers many advantages, it’s essential to recognize some of its challenges and limitations:

Debugging Difficulties:

  • Language Barrier: esbuild’s core code is written in Golang, and users interact with compiled binary code and JavaScript glue code.
  • Debugging Constraints: Breakpoint debugging the binary code is nearly impossible using tools like lldb or gdb.
  • High Debugging Requirements: Debugging esbuild’s code requires pulling down the code, recompiling, and debugging, a process that is both demanding and complex.

Limited Target Support:

  • ES6 Targeting: esbuild’s transformer currently only targets ES6.
  • ES5 Considerations: Many developers, especially in certain regions, still need to consider ES5 scenarios. As a result, esbuild cannot be used as the final product on its own and often requires collaboration with tools like Babel, TSC, or SWC to transition from ES6 to ES5.

Performance and Size Issues with Golang WebAssembly (Wasm):

  • Performance Degradation: The performance of Wasm compiled by Golang is suboptimal, with a 3–5 times degradation compared to native performance.
  • Package Size: The Wasm package compiled by Go is relatively large (8M+), making it unsuitable for scenarios sensitive to package size.

Smaller Plugin API:

  • Limited Hooks: Compared to the extensive plugin API support in Webpack and Rollup, esbuild only supports two plugin hooks: onLoad and onResolve.
  • Functionality Gaps: While these hooks enable a wide range of tasks, they are still relatively limited. For instance, post-processing of chunks after code splitting is not supported.

Top comments (1)

Henrique Limas

That is a great text, thank you for sharing. How the bundlers differ (if they differ) when running the bundled code? It seems on webpack for instance it runs require functions to fetch the modules and this can take some time depending on the application size. We can warmup them separately (as you suggested in another post) to try to speed up, but is it needed in all the bundlers also?