Deep dive into Monorepos

#lerna #bazel #yarn #monorepo

What are the best tools to use with a monorepo? As with any other software design question, the answer is: it depends on what you need.
But before we talk about tools, we should clarify what a monorepo is. Too often, when discussing monorepos, the answers resemble the people in the image on the cover of this article.

Allow me to share a few thoughts and clarifications on this.

What's inside a monorepo?

The one commonality among all monorepos is that somewhere in their folder structure, you will find multiple package.json files, each with its own name. So, while we commonly use the term monorepo, a more accurate term is probably a multi-package repo.

The "monorepo" (and now quotes are appropriate) may contain all of your company projects, or can just have a single project broken into packages.

Look under the Facebook GitHub organization and you will find that most of their open-source projects, such as Jest, React, Create React App (CRA), and Docusaurus, are built as monorepos. So clearly, Facebook's open source does not live in one mono (single) repo but across many repos, each holding multiple packages.

So your initial consideration should be the scope of your monorepo. Your monorepo may contain:

All of the organization's backend and frontend applications.
Your backend applications or microservices and their shared code, split into packages.
Utilities shared between your backend and frontend applications (e.g., types, schemas).
Components shared between multiple frontend applications (such as a design system).

Goals

Next, you should consider the goals you are trying to achieve by using a multi-package repo. Some of the answers are:

Increase code quality using small, loosely coupled units
Publish packages independently
Decrease the installation footprint for the consumer (i.e., install package X only if you consume it, or choose between installing package Y and package Z)
Reduce maintenance overhead across repos
Reduce build / CI time by building and testing just in time (only what a change requires)

It is essential to decide where you are heading as this impacts the decisions you make later.

Organizing a monorepo

I found three dimensions that impact the organization of the monorepo.

a) Installation (bootstrapping)
b) Development flow
c) Publishing

Installation

In a multi-package repo, each package.json lists the npm packages it depends on. These include runtime dependencies (dependencies) and development-time dependencies (devDependencies).

The two methods for installing the dependencies are:

Install in a node_modules folder inside each package
Install all shared dependencies in a node_modules folder at the root of the project, and keep inside each package only the dependencies whose versions collide.

The latter method is called "hoisting," and it works because of the Node.js module resolution algorithm. When resolving a module, Node.js looks for a node_modules folder in the current directory and then "bubbles" up through the parent directories until it finds the package.
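To make the "bubbling" concrete, here is a minimal sketch of the lookup order. It is a conceptual model under stated assumptions (POSIX-style absolute paths, a hypothetical /repo layout), not Node's actual resolver; the function names lookupPaths and resolvePackage are my own.

```typescript
// Conceptual sketch only: mimics the directory walk, not Node's real resolver.
// Assumes POSIX-style absolute paths.

/** Lists the node_modules folders searched from `fromDir`, nearest first. */
function lookupPaths(fromDir: string): string[] {
  const segments = fromDir.split("/").filter((segment) => segment.length > 0);
  const candidates: string[] = [];
  for (let depth = segments.length; depth >= 0; depth--) {
    // A directory that is itself named node_modules is skipped.
    if (depth > 0 && segments[depth - 1] === "node_modules") continue;
    const dir = "/" + segments.slice(0, depth).join("/");
    candidates.push(dir === "/" ? "/node_modules" : `${dir}/node_modules`);
  }
  return candidates;
}

/** Returns the first location where `name` is installed, or undefined. */
function resolvePackage(
  name: string,
  fromDir: string,
  installed: Set<string>
): string | undefined {
  for (const dir of lookupPaths(fromDir)) {
    const candidate = `${dir}/${name}`;
    if (installed.has(candidate)) return candidate;
  }
  return undefined;
}

// Hypothetical layout: jest is hoisted to the root, lodash is local to package-a.
const installedPackages = new Set([
  "/repo/node_modules/jest",
  "/repo/packages/package-a/node_modules/lodash",
]);

// Resolves to /repo/node_modules/jest (found after bubbling up to the root).
console.log(resolvePackage("jest", "/repo/packages/package-a/src", installedPackages));
// Resolves to /repo/packages/package-a/node_modules/lodash (found locally).
console.log(resolvePackage("lodash", "/repo/packages/package-a/src", installedPackages));
```

Because the root node_modules is always on the lookup path of every package, a dependency installed there is visible to all of them.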

Hoisting npm packages is useful in saving time and disk space. (As we all learned, node_modules are the heaviest object in the universe.)

Installing into a local node_modules folder ensures that each package correctly declares its requirements in its package.json manifest. It can also avoid phantom dependencies (code importing a package that its manifest never declares) and doppelgangers.
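A phantom dependency is easy to check for once you have the list of modules a package imports. The following is a rough sketch, not a real linter: the PackageJson shape, the function names, and the sample data are all hypothetical, and it does not recognize Node built-ins imported without the node: prefix.

```typescript
// Sketch: flag imports that the package's manifest never declares.
interface PackageJson {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

/** Reduces "lodash/fp" to "lodash" and "@babel/core/lib" to "@babel/core". */
function packageNameOf(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

/** Returns the sorted names of imported packages missing from the manifest. */
function findPhantomDependencies(manifest: PackageJson, imports: string[]): string[] {
  const declared = new Set([
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
  ]);
  const phantoms = new Set<string>();
  for (const specifier of imports) {
    // Skip relative paths, absolute paths, and prefixed Node built-ins.
    if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
      continue;
    }
    const name = packageNameOf(specifier);
    if (!declared.has(name)) phantoms.add(name);
  }
  return Array.from(phantoms).sort();
}

// Hypothetical package: chalk and @babel/core only work here because they
// happen to be hoisted to the root by some other package.
const packageA: PackageJson = {
  name: "package-a",
  dependencies: { lodash: "^4.17.21" },
  devDependencies: { jest: "^29.0.0" },
};

console.log(
  findPhantomDependencies(packageA, ["lodash/fp", "./utils", "@babel/core", "node:path", "chalk"])
);
// Result: ["@babel/core", "chalk"]
```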

The standard tools for hoisting npm packages are Lerna and Yarn (using workspaces). Lerna can also hoist when npm is the client, but in general it does a lesser job than Yarn.
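The hoisting rule described above (shared versions go to the root, colliding versions stay inside the package) can be sketched as a small planning function. This is a deliberate simplification and not how Yarn or Lerna actually decide: it compares version strings for exact equality instead of evaluating semver ranges, and it hoists the most requested version, breaking ties by first appearance. All names and versions are illustrative.

```typescript
// Simplified hoisting plan: exact version strings, no semver range logic.
type DependencyMap = Record<string, string>;

interface HoistPlan {
  root: DependencyMap; // installed once in the root node_modules
  local: Record<string, DependencyMap>; // installed inside each package
}

function planHoisting(packages: Record<string, DependencyMap>): HoistPlan {
  // Count how many packages request each version of each dependency.
  const usage = new Map<string, Map<string, number>>();
  for (const deps of Object.values(packages)) {
    for (const [dep, version] of Object.entries(deps)) {
      const versions = usage.get(dep) ?? new Map<string, number>();
      versions.set(version, (versions.get(version) ?? 0) + 1);
      usage.set(dep, versions);
    }
  }

  // Hoist the most requested version; ties go to the version seen first.
  const root: DependencyMap = {};
  usage.forEach((versions, dep) => {
    let best = "";
    let bestCount = 0;
    versions.forEach((count, version) => {
      if (count > bestCount) {
        best = version;
        bestCount = count;
      }
    });
    root[dep] = best;
  });

  // Whatever disagrees with the hoisted version stays local to the package.
  const local: Record<string, DependencyMap> = {};
  for (const [pkg, deps] of Object.entries(packages)) {
    const own: DependencyMap = {};
    for (const [dep, version] of Object.entries(deps)) {
      if (root[dep] !== version) own[dep] = version;
    }
    local[pkg] = own;
  }
  return { root, local };
}

const plan = planHoisting({
  "package-a": { lodash: "4.17.21", react: "18.2.0" },
  "package-b": { lodash: "4.17.21", react: "17.0.2" },
  "package-c": { react: "18.2.0" },
});
console.log(JSON.stringify(plan, null, 2));
```

In this example lodash 4.17.21 and react 18.2.0 land in the root, and only package-b keeps its own copy of react 17.0.2.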

Development workflow

The package.json must include the runtime dependencies for each NPM package (or else it will fail to run).
But what about the development dependencies?

There are two main options on how to build and test our packages:

Centralized development
Federated development

Centralized

In a centralized development workflow, the development tools are installed and configured only once.

├── babel.config.json
├── jest.config.js
├── node_modules
│   ├── jest
│   └── typescript
├── packages
│   ├── package-a
│   │   └── package.json
│   └── package-b
│       └── package.json
└── tsconfig.json

The test configuration, for example, searches the entire repository for .spec or .test files and runs them.
The build can also run centrally, but it needs to generate a build directory for each package (for publishing). In some cases, you may need to create dedicated scripts to achieve this.
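As an illustration of the centralized test discovery, the sketch below picks the .spec and .test files out of a list of repository paths. It is not the matcher a test runner such as Jest uses internally; the regular expression, the function name, and the file paths are assumptions made for the example.

```typescript
// Sketch: one central rule selects test files across every package.
// Matches names such as sum.spec.ts or button.test.jsx.
const TEST_FILE_PATTERN = /\.(spec|test)\.[jt]sx?$/;
const INSIDE_NODE_MODULES = /(^|\/)node_modules\//;

function selectTestFiles(files: string[]): string[] {
  return files.filter(
    (file) => TEST_FILE_PATTERN.test(file) && !INSIDE_NODE_MODULES.test(file)
  );
}

// Hypothetical repository listing.
const repositoryFiles = [
  "packages/package-a/src/sum.ts",
  "packages/package-a/src/sum.spec.ts",
  "packages/package-b/src/button.test.jsx",
  "packages/package-b/node_modules/mocha/lib/mocha.test.js",
  "packages/package-b/src/latest.json",
];

console.log(selectTestFiles(repositoryFiles));
// Selects sum.spec.ts and button.test.jsx; the file under node_modules is ignored.
```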

A centralized development workflow is useful when building multiple packages that use the same technology. In such a case, it reduces the maintenance overhead, as there is only a single place for configuring the workflow.
A centralized workflow can become a maintenance nightmare if you are adding multiple different technologies.

Federated

In a federated workflow, each package is autonomous in the tools, process, and configuration it uses. Here is an example of such a folder structure:

└── packages
    ├── package-a
    │   ├── jest.config.js
    │   ├── node_modules
    │   │   ├── jest
    │   │   └── typescript
    │   ├── package.json
    │   └── tsconfig.json
    └── package-b
        ├── babel.config.json
        ├── node_modules
        │   ├── babel
        │   └── mocha
        └── package.json

A federated model supports heterogeneous packages, such as backend and frontend applications. This comes at the price of higher maintenance overhead.

Lerna can be used to support both workflows. It supports the federated workflow by letting each package define its own commands in its package.json and then running them centrally with the lerna run command. It can also run ad hoc commands using the lerna exec syntax.
Nx takes a slightly different approach: it uses a centralized configuration file (workspace.json, heavily inspired by the Angular CLI configuration) and specifies a dedicated toolchain for each package.
Bazel is another tool that supports a monorepo development workflow. Bazel not only decentralizes the workflow but can also harness multiple machines to work concurrently. Bazel is suited to very large monorepos and can easily become overkill for smaller ones.
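The idea behind running per-package commands centrally can be modeled in a few lines: look at each package's scripts and run the requested one wherever it is defined. The sketch below only plans the commands and is a conceptual model, not Lerna's implementation; the WorkspacePackage type, the planRun function, and the sample scripts are hypothetical.

```typescript
// Conceptual model of "run script X in every package that defines it".
interface WorkspacePackage {
  name: string;
  scripts?: Record<string, string>;
}

interface PlannedCommand {
  packageName: string;
  command: string;
}

function planRun(script: string, packages: WorkspacePackage[]): PlannedCommand[] {
  const planned: PlannedCommand[] = [];
  for (const pkg of packages) {
    const command = pkg.scripts?.[script];
    // Packages that do not define the script are silently skipped.
    if (command !== undefined) {
      planned.push({ packageName: pkg.name, command });
    }
  }
  return planned;
}

// Each package keeps its own toolchain, as in the federated layout.
const workspace: WorkspacePackage[] = [
  { name: "package-a", scripts: { build: "tsc", test: "jest" } },
  { name: "package-b", scripts: { test: "mocha" } },
];

console.log(planRun("test", workspace)); // package-a runs jest, package-b runs mocha
console.log(planRun("build", workspace)); // only package-a defines a build script
```

The caller issues one command, yet each package decides what that command means for it.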

Publishing

The last part of using the package is publishing it to a public or private NPM registry. Packages published to a registry are versioned. You can publish all packages with a single version or each package with its own version. Publishing a version also implies publishing a changelog. According to the method selected, there will be a single changelog for all packages or a separate one for each published package.
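The difference between the two versioning methods can be sketched as follows. This is a simplified model under my own assumptions: plain major.minor.patch versions without pre-release tags, and a single-version mode in which every package receives the new shared version. The function names are illustrative and do not belong to any publishing tool.

```typescript
// Sketch: single shared version versus independent versions.
type Bump = "major" | "minor" | "patch";
type Versions = Record<string, string>;
type Changes = { [packageName: string]: Bump | undefined };

const BUMP_RANK: Record<Bump, number> = { patch: 0, minor: 1, major: 2 };

function bumpVersion(version: string, bump: Bump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

/** Single version: the largest requested bump is applied to all packages. */
function nextSharedVersion(shared: string, packageNames: string[], changes: Changes): Versions {
  let largest: Bump | undefined;
  for (const name of Object.keys(changes)) {
    const bump = changes[name];
    if (bump === undefined) continue;
    if (largest === undefined || BUMP_RANK[bump] > BUMP_RANK[largest]) largest = bump;
  }
  const next = largest === undefined ? shared : bumpVersion(shared, largest);
  const result: Versions = {};
  for (const name of packageNames) result[name] = next;
  return result;
}

/** Independent versions: only the changed packages move. */
function nextIndependentVersions(current: Versions, changes: Changes): Versions {
  const result: Versions = {};
  for (const name of Object.keys(current)) {
    const bump = changes[name];
    result[name] = bump === undefined ? current[name] : bumpVersion(current[name], bump);
  }
  return result;
}

// package-a gained a feature; package-b is unchanged.
console.log(nextSharedVersion("1.4.2", ["package-a", "package-b"], { "package-a": "minor" }));
// Both packages move to 1.5.0.
console.log(nextIndependentVersions({ "package-a": "1.4.2", "package-b": "0.3.0" }, { "package-a": "minor" }));
// package-a moves to 1.5.0, package-b stays at 0.3.0.
```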

Lerna is probably the leading tool for multi-package publishing (in fact, this is what the Facebook repos use it for). Under the hood, Lerna uses the npm and Yarn pack and publish commands.

Conclusion

Deciding to go "monorepo" is only the first step in the journey. It leads to a set of further decisions for optimizing the repo for the specific needs of your project.