Brian Kirkpatrick

JS MODULE LOADERS; or, a brief journey through hell

Introduction

There's a saying in defense circles: "amateurs talk strategy; professionals talk logistics". In other words, what seems like the most mundane element of complex engineering tasks (moving stuff on time from point A to point B) is a surprisingly critical element of success.

If I had to force an analogy here, I'd say for the developer community that "amateurs talk code, professionals talk integration". It turns out that writing code (especially from scratch) is surprisingly easy, whereas putting code together (especially code you didn't write yourself) is surprisingly difficult.

So, in the world of JavaScript, how do we put code together? Well, it depends. In the year of our lord two-thousand and twenty-two, 26 years after JavaScript was released, we still don't have a consistent way to integrate units of code together. We don't even have a consistent way to define what those units of code are!

The Problems

You'll note the word "consistent", though. There are many ways you could go about it, but few ways that are truly interoperable. Let's break this into three specific problems:

  1. How are packages managed?

  2. How are modules exported?

  3. How are modules specified?

For example, the answer to #1 could be NPM, Yarn, or some kind of CDN. It could also be as simple as git submodules. (For reasons I won't dive too deeply into, I prefer the latter approach, in particular because it is completely decoupled from the module you are developing--and even the language you are developing in.)

The answer to #2 could be something like AMD/RequireJS modules, or CommonJS/Node, or browser-level script tags within a global scope (yuck!). Of course, Browserify or WebPack could help you here if you're really a big fan of the latter. I'm a big fan of AMD/RequireJS but there's no arguing that being able to run (and test) a codebase from the command line (locally or remotely) is HUGELY advantageous, both for development (just messing around) and for deployment (e.g., automated testing from a CI job).

The answer to #3 is a little more subtle, in no small part because with something like CommonJS/Node it's entirely implicit. With AMD/RequireJS, you have specific "require", "exports", and "module" parameters to a "define()" function. These exist in CommonJS/Node, too, but they're implied. Try printing "module" to console.log sometime and look at all the juicy details you've been missing.

SFJMs and UMD

But this doesn't include the contents of your package.json (if any) and even with AMD/RequireJS there's no specific standard for attaching metadata and other module properties. That's one reason I put together the SFJM standard in a previous dev.to article:

https://dev.to/tythos/single-file-javascript-modules-7aj

But regardless of your approach, the module loader (i.e., the export problem outlined in #2 above) is going to be sticky. That's one reason the UMD standard has emerged, for which there is an excellent writeup by Jim Fisher:

https://jameshfisher.com/2020/10/04/what-are-umd-modules/

UMD specifies a header to be pasted in front of your define-like closure. A few major libraries use it for certain build configurations, including THREE.js:

https://github.com/mrdoob/three.js/blob/dev/build/three.js

The Header

The UMD header has several variations, but we'll consider the following one from Jim Fisher's writeup:

// myModuleName.js
(function (root, factory) {
    if (typeof define === 'function' && define.amd) {
        // AMD. Register as an anonymous module.
        define(['exports', 'b'], factory);
    } else if (typeof exports === 'object' && typeof exports.nodeName !== 'string') {
        // CommonJS
        factory(exports, require('b'));
    } else {
        // Browser globals
        factory((root.myModuleName = {}), root.b);
    }
}(typeof self !== 'undefined' ? self : this, function (exports, b) {
    // Use b in some fashion.

    // attach properties to the exports object to define
    // the exported module properties.
    exports.action = function () {};
}));

There are effectively three use cases captured here: AMD/RequireJS; CommonJS/Node; and browser globals. Let's be honest, though--it's ugly. (This isn't a knock on Jim; it's a general UMD problem.) Among other things, here's what bugs me:

  • It's just plain bulky--that's a lot of text to paste at the top of every module

  • It actually tries too hard--I've never found a need to support browser globals, I just need my AMD/RequireJS-based single-file JavaScript modules to be able to run/test in a CommonJS/Node environment

  • The dependency listings are explicitly tied into the header--so it's not actually reusable. You have to customize it for every module! Compare this to simply specifying const b = require('b'); within the closure factory itself and clearly there's a big difference.

  • I'm not interested in treating use cases equally. I'm writing in AMD/RequireJS, and capturing CommonJS/Node loading is the edge case.

The main problem here with the last point is that AMD/RequireJS already gives us a very clean closure and an explicit module definition interface. It's CommonJS/Node that requires the hack. So, can we streamline the header and focus on adapting the latter to the former? Preferably in a way that is agnostic to dependencies? Well, since I'm writing this article, you can probably tell the answer is "yes".

My Approach

Let's start with symbols. What's available, and what isn't? Let's start with an AMD/RequireJS module already defined and working. If you put yourself in the mind of the CommonJS/Node interpreter, the first thing you'll realize is that, while "require", "exports", and "module" are already defined implicitly, the "define" factory is not. So, this is the root of our problem: we need to define a "define" (ha ha) factory that guides CommonJS/Node to interpret the module definition closure in a consistent way.

There's a good example of the conditional for this from UMD that we can borrow (and adjust slightly):

// RequireJS sets define.amd to an object, so test it for truthiness rather than comparing against true
if (typeof(define) !== "function" || !define.amd) {

Interestingly, you can't just check to see if define exists. You need to make sure it doesn't actually exist AS THE AMD IMPLEMENTATION, because CommonJS/Node may retain the "define" symbol outside of this context--for example, in the scope of another module that is "require()"-ing this one. Bizarre, but true.
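
One plausible explanation, sketched here as an assumption rather than a confirmed diagnosis: a bare "define = ..." assignment in non-strict CommonJS code creates a global, and that global closes over the "module" of whichever file installed it. The two stand-in objects "moduleA" and "moduleB" below are hypothetical and simply play the role of two files. The check uses truthiness on "define.amd", as the UMD header does.

```javascript
// Two stand-in "module" objects, playing the role of two CommonJS files.
const moduleA = { exports: {} };
const moduleB = { exports: {} };

// File A runs first and installs its shim as a global (which is what a bare
// `define = ...` does in non-strict CommonJS code). It closes over A's module.
globalThis.define = (factory) => (moduleA.exports = factory());

// File B's header, written with an existence-only check:
const needsShimNaive = typeof define !== "function";
// File B's header, also checking for the AMD marker:
const needsShimStrict = typeof define !== "function" || !define.amd;

console.log(needsShimNaive);  // false: B would reuse A's shim
console.log(needsShimStrict); // true: B installs its own shim

// With the naive check, B's factory result lands on A's module object.
define(() => ({ owner: "B" }));
console.log(moduleA.exports.owner); // "B"
console.log(moduleB.exports.owner); // undefined
```

Checking the AMD marker as well means every file reinstalls a shim bound to its own "module", which avoids this particular mix-up.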

So, now our goal is to define "define()". How can this be adapted to a CommonJS/Node scope? We need to ensure that an identical "define()" interface exists:

  • It should take a single parameter, an anonymous function (which we will call the "factory" here) within whose closure the module contents are defined.

  • That function should have the following interface: "require" (a function that resolves/returns any module dependencies based on path); "exports" (an Object that defines what symbols will be available to external modules); and "module" (a definition of module properties that includes "module.exports", which points to "exports").

  • Define should call that function and return the export symbols of the module. (In the case of a SFJM-compatible definition, this will also include package.json-like module metadata, including a map of dependencies.)

The last point is interesting because a) there are already multiple references to the module exports, and b) even AMD/RequireJS supports multiple/optional routes for export symbols. And this is one of the stickiest issues at the heart of cross-compatibility: the "exports" symbol can persist and be incorrectly mapped by CommonJS/Node if not explicitly returned!

Thanks, Exports, You're The Real (thing preventing us from reaching) MVP

Jesus, what a nightmare. For this reason, we are going to adjust how our factory closure works:

  • We are going to explicitly "disable" the "exports" parameter by passing an empty Object ("{}") as the second parameter to the factory.

  • We are going to explicitly return the module exports from the factory implementation

  • We are going to explicitly map the results of the factory call to the (file-level) "module.exports" property.

The combination of these adjustments means that, while AMD/RequireJS supports multiple routes, we are going to constrain our module implementations to explicitly returning export symbols from the factory call to route them to the correct CommonJS/Node symbol.

If you don't do this--and I lost some hair debugging this--you end up with a very "interesting" (read: batshit insane in only the way CommonJS/Node can be) bug in which the parent module (require()'ing a dependency module) gets "wires crossed" and has export symbols persist between scopes.

It's bizarre, particularly because it ONLY HAPPENS OUTSIDE THE REPL! So, you can run equivalent module methods from the REPL and they're fine--but trying to map it within the module itself (and then, say, calling it from the command line) will break every time.

So, what does this look like, practically? It means the "define" definition we are putting into the conditional we wrote above looks something like this:

define = (factory) => module.exports = factory(require, {}, module);

It also means our module closure starts with explicitly disabling the "exports" symbol so poor old CommonJS/Node doesn't get wires crossed:

define(function(require, _, module) {
    let exports = {};

Sigh. Some day it will all make sense. But then it won't be JavaScript. ;)

Examples

What does this look like "in the wild", then? Here's a GitHub project that provides a reasonably clear example:

https://github.com/Tythos/umd-light/

A brief tour:

  • "index.js" shows how the entry point can be wrapped in the same closure that uses the "require()" call to transparently load the dependency

  • "index.js" also shows us how to add a SFJM-style hook for (from CommonJS/Node) running an entry point ("main") should this module be called from the command line

  • ".gitmodules" tells us that the dependency is managed as a submodule

  • "lib/" contains the submodules we use

  • "lib/jtx" is the specific submodule reference (don't forget to submodule-init and submodule-update!); in this case it points to the following utility of JavaScript type extensions, whose single-file JavaScript module can be seen here:

https://github.com/Tythos/jtx/blob/main/index.js

  • This module uses the same "UMD-light" (as I'm calling it for now) header.

The Problem Child

And now for the wild card. There is, in fact, yet another module export approach we haven't mentioned: ES6-style module import/export usage. And I'll be honest--I've spent an unhealthy portion of my weekend trying to figure out if there's any reasonably uncomplicated way to extend cross-compatibility to cover ES6/MJS implementations. My conclusion: it can't be done--at least, not without making major compromises. Consider:

  • They're incompatible with the CommonJS/Node REPL--so you lose the ability to inspect/test from that environment

  • They're incompatible with a define closure/factory--so there go all of those advantages

  • They directly contradict many of the design principles (not to mention the implementation) of the web-oriented AMD/RequireJS standard, including asynchronous loading (it's in the name, people!)

  • They have... interesting assumptions about pathing that can be very problematic across environments--and since it's a language-level standard you can't extend/customize it by submitting MRs to (say) the AMD/RequireJS project (something I've done a couple of times)--not to mention the nightmare this causes in your IDE if path contexts get mixed up!

  • The tree-shaking you should be able to reverse-engineer from partial imports (e.g., symbol extraction) saves you literally zero anything in a web environment where your biggest cost is just getting the JS from the server and through the interpreter.
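
The clash with a define closure is easy to check for yourself. In this small sketch (the helper name "parsesInsideFunction" is made up for illustration), "new Function" compiles a string as a function body without running it, which is enough to show that a static import declaration cannot live inside a factory function while a "require()" call can.

```javascript
// Returns true if the source text is legal as the body of a function.
function parsesInsideFunction(source) {
    try {
        new Function(source); // compiles the body without running it
        return true;
    } catch (err) {
        if (err instanceof SyntaxError) {
            return false;
        }
        throw err;
    }
}

// A require() call is an ordinary expression, so it is fine inside a closure.
console.log(parsesInsideFunction("const path = require('path');")); // true

// A static import declaration is only legal at the top level of an ES module.
console.log(parsesInsideFunction("import path from 'path';"));      // false
```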

If anything, your best bet seems (like THREE.js) to only use them to break a codebase into pieces (if it's too big for a single-file approach, which I try to avoid anyway), then aggregate those pieces at build time (with WebPack, Browserify, etc.) into a module that uses a CommonJS/Node, AMD/RequireJS, or UMD-style header to ensure cross-compatibility. Sorry, ES6 import/export, but you may have actually made things worse. ;(

Top comments (2)

joseph stein

Thanks for your article, a really good read. I am going through a problem with module loading right now, and I'm still new to JavaScript in a lot of ways. This article is barely in my understanding range, so please forgive me if this sounds stupid. Why is this the fault of ESM? I think it just needs a little polish. I think you're on the right track that ESM modules should have a header attached to pass on info. This still could happen. I think that every JavaScript library should have this header passed into it, and it needs to be standardized; unfortunately, I am not the person to do it. I don't have the time to try to explain it. I really don't see the point of builders and tree shaking. Everything gets large over time, and IMHO it's much easier if things are smaller, easier pieces to debug. I think minification and compilation are still essential, though, but they only need to be done once, with the result put on a CDN. I've been looking for info like this and it's been a great help. I personally would like to see a library, AKA a module loader, that tries to comply with the latest standards and does everything you mentioned. I think it could be as simple as passing a package.json file along with the ESM module. I'm currently researching whether I could create such a beast, and checking for the closest library already out there that does this.

Brian Kirkpatrick

Yes, we've reached a point by now where ESM is pretty much the default (and doesn't even need to be flagged explicitly in as many places now). This is good for the ecosystem as a whole, even if it did take a while. For header-like data it might be worth looking at import.meta symbols, which (in some frameworks like Astro) can also be used to extend glob introspection and other hooks.