Jakub Skoneczny

Posted on Aug 4, 2021 • Originally published at blog.jskoneczny.pl

Control your Monorepo 🗄️

#architecture #codequality #discuss #javascript

You might have heard the term monorepo before. For those who haven't, a monorepo is an architectural pattern where you keep multiple projects inside a single Git repository.

Imagine working on a semi-large project that includes some back-end, web front-end, and mobile applications. The most common approach would be to create different repositories for each of those applications. Then, developers would work on each part separately, developing, committing, and pushing to those repositories.

But, as the work goes along, you start to notice some issues with your workflow:

you see that you have some code duplication between your projects
detecting critical/breaking changes becomes difficult since many problems come up only in the staging environment

How to get around code duplication?

The most common approach to deal with duplication is to move code "up" and separate it into reusable functions or maybe reusable typings. But, since your whole project consists of three separate repositories, there is no common spot to place reusable code.

The only way to lift code "up" is to create another repository for that reusable code. The package or library kept in that repository must then be built and published to the npm registry.

Since our applications would import this package, any change to the shared library would require publishing a new version of it to npm.

We would have to keep track of releases and bump the version of that reusable package according to the changes, probably using semantic versioning.
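To make that versioning chore concrete, here is a minimal sketch of a semantic-versioning bump in TypeScript. The `ChangeKind` type and the `bumpVersion` helper are names I made up for illustration; they are not part of any tool mentioned in this post, and the sketch ignores pre-release and build suffixes.

```typescript
// Hypothetical helper (not from any library): bump a MAJOR.MINOR.PATCH
// version string according to the kind of change made to the shared package.
type ChangeKind = "breaking" | "feature" | "fix";

function bumpVersion(version: string, change: ChangeKind): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error("Expected a MAJOR.MINOR.PATCH version, got: " + version);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (change) {
    case "breaking":
      // Incompatible change: bump MAJOR, reset MINOR and PATCH.
      return `${major + 1}.0.0`;
    case "feature":
      // Backward-compatible addition: bump MINOR, reset PATCH.
      return `${major}.${minor + 1}.0`;
    case "fix":
      // Backward-compatible bug fix: bump PATCH only.
      return `${major}.${minor}.${patch + 1}`;
  }
}
```

With this sketch, changing the shape of an exported type in the shared package would be a `"breaking"` change, and every application that depends on the package would need to opt in to the new major version separately.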

How to deal with late bug detection?

Maintaining multiple apps and packages in separate repositories brings bigger problems than keeping track of versions. Imagine the following situation:

you are working on a back-end application, and you need to change the response shape of some endpoint
you commit your changes, the PR passes all the necessary tests, and your code ships to the staging environment
after deployment, you realize that the part of the front-end application that depends on that endpoint has stopped working 😮

Did it happen because you haven't tested your changes locally with the front-end application? Yes. But did it also occur because your workflow is not resilient enough? Also yes!

It's hard to test everything, so we developers have CI/CD tools to take some weight off our shoulders.

We create automated pipelines that run tests and code analysis on every push. For example, in our case, we could have had two pipelines configured: one running all of the checks for the front-end application, the other doing the same for the back-end application.

Unfortunately, when we have two separate pipelines for two different applications, the fact that both are passing doesn't give us much confidence. What about that reusable library, which we had moved to a separate repository? Is it even tested? Does the front-end use the same version of that package as the back-end? These are the kinds of questions we have no answers for. Of course, our code is bug-free, and all the tests are passing, but will those two applications work together?

Even the most minor change, like extending a response with an extra field, may be a breaking change if the front-end does strict runtime validation against its static types (runtypes, zod, etc.).
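Here is a small sketch of how that can happen. To keep it dependency-free, I use a hand-rolled validator as a stand-in for a library such as zod or runtypes, and I assume the validator is configured to reject unknown keys. The `UserResponse` shape, the `parseUserStrict` function, and both payloads are hypothetical.

```typescript
// Hypothetical response shape the front-end expects from the back-end.
interface UserResponse {
  id: number;
  name: string;
}

// Hand-rolled strict validator (an assumption for illustration): it rejects
// wrong field types AND any key that is not part of UserResponse.
function parseUserStrict(payload: unknown): UserResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("Expected an object");
  }
  const record = payload as Record<string, unknown>;

  const allowedKeys = ["id", "name"];
  const unknownKeys = Object.keys(record).filter(
    (key) => allowedKeys.indexOf(key) === -1
  );
  if (unknownKeys.length > 0) {
    throw new Error("Unexpected field(s): " + unknownKeys.join(", "));
  }

  const id = record["id"];
  const name = record["name"];
  if (typeof id !== "number" || typeof name !== "string") {
    throw new Error("Invalid field types");
  }
  return { id, name };
}

// Response before the back-end change: accepted by the validator.
const oldPayload: unknown = JSON.parse('{"id": 1, "name": "Ada"}');

// Response after the back-end adds one field: the same validator now throws.
const newPayload: unknown = JSON.parse(
  '{"id": 1, "name": "Ada", "email": "ada@example.com"}'
);
```

Calling `parseUserStrict(oldPayload)` returns the user, while `parseUserStrict(newPayload)` throws, even though the back-end only added data. Neither application's own pipeline would notice this on its own.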

Monorepos to the rescue

What if we had put our applications together in the same repository? Code duplication would no longer be a problem since we could move all the reusable code to another module or directory. Late bug detection would also not be a problem anymore because the pipelines for our front-end and back-end applications would run simultaneously. Linting, type checking, and static code analysis would also run globally.

In fact, we would ensure that both our applications are compatible with each other at any point in time, since no breaking change could be made to one package without updating the others.

There are also other advantages of using a monorepo over separate repositories:

we could have common configs and enforce the style and linting rules across multiple applications,
developers working on the project would have better visibility into the codebase,
dependency management would be simplified as we could enforce an exact version of the same package used in multiple applications,
we could manage our git history better since changes to multiple packages can be packed into a single commit
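The dependency-management point from the list above can be sketched as a simple check that could run in a pipeline. The `Manifest` type and `findVersionMismatches` function are hypothetical and operate on in-memory data; a real check would read each package's package.json instead.

```typescript
// Hypothetical in-memory view of one package's name and dependencies.
type Manifest = { name: string; dependencies: Record<string, string> };

// Returns every dependency that is requested with more than one version
// across the given manifests, mapped to the distinct versions found.
function findVersionMismatches(manifests: Manifest[]): Record<string, string[]> {
  const versionsByDep = new Map<string, string[]>();

  manifests.forEach((manifest) => {
    Object.keys(manifest.dependencies).forEach((dep) => {
      const version = manifest.dependencies[dep];
      if (version === undefined) return;
      const versions = versionsByDep.get(dep) ?? [];
      if (versions.indexOf(version) === -1) versions.push(version);
      versionsByDep.set(dep, versions);
    });
  });

  const mismatches: Record<string, string[]> = {};
  versionsByDep.forEach((versions, dep) => {
    if (versions.length > 1) mismatches[dep] = versions;
  });
  return mismatches;
}

// Made-up example data: two apps that disagree on the lodash version.
const exampleManifests: Manifest[] = [
  { name: "web", dependencies: { react: "17.0.2", lodash: "4.17.21" } },
  { name: "api", dependencies: { lodash: "4.17.20" } },
];
```

For `exampleManifests`, the function reports `lodash` with both versions and says nothing about `react`, which only one package uses. In separate repositories there is no single place where such a check could see all manifests at once.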

Disadvantages of using a monorepo

Despite the many visible pros of using a monorepo, this architectural pattern comes with some limitations. The most significant one is the lack of control over which packages developers can access. If all of the applications and packages are stored in the same repository, then anyone with access to that repository can look into the whole codebase. Some companies enforce strict access control and restrict access to the parts of the codebase that are irrelevant to a given developer.

The other big concern is performance. Since there is a lot of code in one place, build times are longer, and Git has many commits to track. Watching for changes and rebuilding only the packages that have changed can shorten build times and pipelines. I've heard that some tools allow you to fetch only one package along with its dependencies to speed up Git locally, but I haven't tested them out.
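The idea of rebuilding only what changed can be sketched as a walk over the internal dependency graph. This is my own simplified illustration, not how any particular tool implements it; the `DependencyGraph` type, the `affectedPackages` function, and the package names are all hypothetical.

```typescript
// Hypothetical graph: package name -> internal packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns the changed packages plus everything that depends on them,
// directly or transitively, sorted alphabetically.
function affectedPackages(graph: DependencyGraph, changed: string[]): string[] {
  const affected = new Set<string>(changed);

  // Keep sweeping until a full pass adds no new package.
  let grew = true;
  while (grew) {
    grew = false;
    Object.keys(graph).forEach((pkg) => {
      const deps = graph[pkg] ?? [];
      if (!affected.has(pkg) && deps.some((dep) => affected.has(dep))) {
        affected.add(pkg);
        grew = true;
      }
    });
  }
  return Array.from(affected).sort();
}

// Made-up monorepo layout for illustration.
const exampleGraph: DependencyGraph = {
  "shared-types": [],
  "ui-kit": ["shared-types"],
  web: ["ui-kit", "shared-types"],
  api: ["shared-types"],
  docs: [],
};
```

With this layout, a change to `ui-kit` only requires rebuilding `ui-kit` and `web`, while a change to `shared-types` touches everything except `docs`. The repeated sweep is quadratic in the worst case, which is fine for a sketch but not what you would want for a very large graph.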

Monorepo tooling

There are great tools and utilities for building a monorepo with multiple modules inside and a pleasant developer experience. Here I list the most popular ones that I've had the opportunity to get familiar with:

Yarn workspaces

Yarn workspaces link your packages together, which means that they can depend on one another. In addition, they set up a single node_modules folder instead of duplicating dependencies across the different packages in the project.

Details on how to set up Yarn workspaces can be found in Yarn's official docs.

I would recommend yarn workspaces to anyone who uses yarn as a dependency manager. It is easy to set up and maintain.

Nx

Nx is an advanced set of extensible dev tools for monorepos, emphasizing modern full-stack web technologies. It provides nifty features like incremental builds and generating dependency graphs.
Nx comes with a CLI that allows you to quickly generate and add new packages, applications, or libraries into your project.

More on that can be found in the Nx docs

Rush.js

Rush.js is a robust monorepo infrastructure open sourced by Microsoft.
One of its key features is that Rush.js installs all dependencies for all projects into a shared folder and then uses isolated symlinks to reconstruct an accurate node_modules folder for each project.

Rush.js also helps to ensure there are no phantom or duplicated dependencies. Along with the pnpm package manager, it allows you to save disk space by installing your dependencies only once.

It also allows you to manage your packages and to build and publish them. At the moment, Rush.js is my favorite of the tools I've mentioned.

More on Rush.js can be found in the official docs.
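A phantom dependency is a package that a project imports without declaring it in its own package.json. The sketch below shows the idea of detecting one; it is not how Rush.js works internally, and the `findPhantomDependencies` function and its inputs are hypothetical. It also does not special-case Node.js built-in modules such as `fs`, which a real check would have to skip.

```typescript
// Returns package names that are imported but not declared.
// `importSpecifiers` are the strings used in import statements;
// `declared` are the dependency names from the project's package.json.
function findPhantomDependencies(
  importSpecifiers: string[],
  declared: string[]
): string[] {
  const phantom: string[] = [];

  importSpecifiers.forEach((specifier) => {
    const first = specifier.charAt(0);
    // Relative and absolute paths point at local files, not packages.
    if (first === "." || first === "/") return;

    // Scoped packages ("@scope/name") use two path segments, others use one.
    const segmentCount = first === "@" ? 2 : 1;
    const packageName = specifier.split("/").slice(0, segmentCount).join("/");

    const isDeclared = declared.indexOf(packageName) !== -1;
    const alreadyReported = phantom.indexOf(packageName) !== -1;
    if (!isDeclared && !alreadyReported) phantom.push(packageName);
  });

  return phantom;
}

// Made-up example: lodash is imported but never declared.
const exampleImports = ["./utils", "lodash/merge", "@scope/ui/button", "react"];
const exampleDeclared = ["react", "@scope/ui"];
```

For the example data, only `lodash` is reported. Such an import can still work by accident when another package happens to pull `lodash` into a shared, flat node_modules folder, which is exactly why it is easy to miss.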

Final thoughts

The monorepo is a controversial architectural pattern. It comes with significant advantages as well as some big challenges. Even though many of the biggest companies use monorepos (Google, Facebook, Microsoft), this pattern has many opponents.

What do you guys think? Do you have some thoughts about monorepos? Do you have some good or bad experiences with them? I would like to know your opinions, and I am looking forward to the discussion.

I hope you liked this introduction to monorepos. 🙂 Feel free to comment or ping me with DM! ✉️

Thanks for reading! If you are interested in the latest tech news, you can follow my account since I plan to post here regularly. I also tweet on a regular basis, so you can follow my Twitter account as well!

Top comments (7)

Andrei Gatej • Aug 6 '21

Thanks for sharing, it was a great read!

I have one question: would you recommend using one of the monorepo tools for a project that will have only the 'client' (the frontend app) and 'server' (the backend app) directories?

Thanks!

Jakub Skoneczny • Aug 6 '21

Hi, yes! It doesn't need to be a frontend and backend application only :)

I think that would also be a huge benefit for you if you use monorepo for ANY bigger project :) For example, for Frontend, you could definitely split your application into a reusable UI library, a Storybook application with a style guide, a library with icons, reusable hooks, etc.

Andrei Gatej • Aug 6 '21

That’s great! I plan to start a new project and the last time I had a “client” and a “server” directory in the same repo, it felt quite “wrong” to manually enter each folder and start the app.

So I’ll be exploring yarn workspaces, thank you!

roshan092 • Aug 7 '21

I think monorepo is fine for grouping similar projects/microservices into a single codebase. But putting all code together in one repo is overkill. Introduces a lot of other complexity. Works for large companies like Google who have dedicated teams to develop tooling to support this. Not feasible for others.

Jakub Skoneczny • Aug 7 '21

You are probably right that without advanced tooling it would be painful to work with because of the performance.
I've heard that preconstruct lets you put some packages into development mode and prebuild the others, so the whole development process is much faster because you only work with a subset of the whole monorepo. But I haven't heard of any tool that would let you fetch only part of the repository and not the whole of it :/

stefanonepa • Aug 7 '21

Did you try lerna?

Jakub Skoneczny • Aug 7 '21

I did once use Lerna with yarn workspaces, but Lerna was there just to manage releases and versioning.