You might have heard the term monorepo before. For those who haven't, a monorepo is an architectural pattern where you keep multiple projects inside a single Git repository.
Imagine working on a semi-large project that includes a back-end, a web front-end, and mobile applications. The most common approach would be to create a separate repository for each of those applications. Developers would then work on each part separately, developing, committing, and pushing to those repositories.
But, as the work goes along, you start to notice some issues with your workflow:
- you see that some code is duplicated between your projects
- detecting critical/breaking changes becomes difficult, since many problems show up only in the staging environment
The most common way to deal with duplication is to move code "up" and extract it into reusable functions or shared type definitions. But, since your whole project consists of three separate repositories, there is no common spot to place reusable code.
The only way to lift that code "up" is to create yet another repository for it. The package or library kept in that repository must then be built and published to the npm registry.
Of course, since our applications would import this package, any change to the common library would require publishing a new version of it to npm.
We would have to keep track of releases and bump the version of that reusable package according to the changes, probably using semantic versioning.
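To make the versioning chore concrete, here is a minimal sketch of a semantic-versioning bump. The helper name `bumpVersion` and the `ChangeKind` type are made up for this illustration, and the sketch only handles plain MAJOR.MINOR.PATCH strings (no pre-release or build tags).

```typescript
// Hypothetical helper: bumps a plain "MAJOR.MINOR.PATCH" version string.
type ChangeKind = "major" | "minor" | "patch";

function bumpVersion(version: string, change: ChangeKind): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  if (change === "major") {
    // Breaking change: reset minor and patch.
    return `${major + 1}.0.0`;
  }
  if (change === "minor") {
    // Backwards-compatible feature: reset patch.
    return `${major}.${minor + 1}.0`;
  }
  // Backwards-compatible fix.
  return `${major}.${minor}.${patch + 1}`;
}

console.log(bumpVersion("1.4.2", "minor")); // "1.5.0"
```

Every consumer of the shared package then has to pick up the new version on its own, which is exactly the bookkeeping described above.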
Maintaining multiple apps and packages in separate repositories brings bigger problems than just keeping the versioning straight. Imagine the following situation:
- you are working on a back-end application, and you need to change the response shape of some endpoint
- you commit your changes, the PR passes all the necessary tests, and your code ships to the staging environment
- after deployment, you realize that the part of the front-end application that relies on that endpoint has stopped working 😮
Did it happen because you haven't tested your changes locally with the front-end application? Yes. But did it also occur because your workflow is not resilient enough? Also yes!
It's hard to test everything, so we developers have CI/CD tools to take some weight off our shoulders.
We create automatic pipelines that run tests and perform code analysis on every push. For example, in our case, we could have had two pipelines configured: one running all of the checks for the front-end application, the other doing the same for the back-end application.
Unfortunately, with two separate pipelines for two different applications, the fact that both are passing doesn't give us much confidence. What about that reusable library, which we had moved to a separate repository? Is it even tested? Does the front-end use the same version of that package as the back-end? Those are the kinds of questions we have no answer for. Of course, our code is bug-free, and all the tests are passing, but will those two applications work together?
Even the most minor change, like extending a response with an extra field, may be a breaking change if the front-end performs strict runtime validation of its static types (runtypes, zod, etc.).
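Here is a small sketch of how that can happen. It uses a hand-written strict validator as a stand-in for what a validation library does in a strict mode; it is not the API of runtypes or zod, and the `UserResponse` shape and `parseUserStrict` name are assumptions made for the example.

```typescript
// Hypothetical response shape that the front-end expects.
interface UserResponse {
  id: number;
  name: string;
}

// Hand-written strict validator: rejects wrong types AND unknown fields.
function parseUserStrict(data: unknown): UserResponse {
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected an object");
  }
  const record = data as Record<string, unknown>;

  const allowedKeys = ["id", "name"];
  const extraKeys = Object.keys(record).filter((key) => !allowedKeys.includes(key));
  if (extraKeys.length > 0) {
    throw new Error(`Unexpected field(s): ${extraKeys.join(", ")}`);
  }

  const { id, name } = record;
  if (typeof id !== "number" || typeof name !== "string") {
    throw new Error("Invalid field types");
  }
  return { id, name };
}

// The old payload parses fine.
parseUserStrict({ id: 1, name: "Ada" });

// After the back-end "only" adds a field, the same call throws:
// parseUserStrict({ id: 1, name: "Ada", email: "ada@example.com" });
```

Both pipelines stay green in this scenario, because neither repository contains a test that exercises the two sides together.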
What if we had put our applications together in the same repository? Code duplication would no longer be a problem since we could move all the reusable code to another module or directory. Late bug detection would also not be a problem anymore because the pipelines for our front-end and back-end applications would run simultaneously. Linting, type checking, and static code analysis would also run globally.
In fact, we could ensure that our applications stay compatible with each other at any point in time, since no breaking change could be made to one package without updating the others.
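As a rough illustration, assume the response type lives in one shared module that both applications import. The sketch below squeezes three would-be packages into a single file; the names `OrderResponse`, `getOrderHandler`, and `formatOrder` are all hypothetical.

```typescript
// --- shared package (hypothetical, e.g. packages/shared-types) ---
interface OrderResponse {
  orderId: string;
  totalCents: number;
}

// --- back-end package: produces the response ---
function getOrderHandler(orderId: string): OrderResponse {
  return { orderId, totalCents: 1999 };
}

// --- front-end package: consumes the response ---
function formatOrder(order: OrderResponse): string {
  const dollars = (order.totalCents / 100).toFixed(2);
  return `Order ${order.orderId}: $${dollars}`;
}

// If totalCents were renamed in OrderResponse, both functions above would
// stop compiling in the same commit, instead of failing later in staging.
console.log(formatOrder(getOrderHandler("A-1"))); // "Order A-1: $19.99"
```

This only works as a safety net if type checking runs across the whole repository, which is what the global checks mentioned above would provide.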
There are also other advantages of using a monorepo over separate repositories:
- we could have common configs and enforce the style and linting rules across multiple applications,
- developers working on the project would have better visibility into the codebase,
- dependency management would be simplified as we could enforce an exact version of the same package used in multiple applications,
- we could manage our git history better since changes to multiple packages can be packed into a single commit
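The dependency-management point can be sketched as a simple check over the manifests of all packages in the repository. This is only an illustration: the `PackageManifest` shape is a simplified assumption, and real tools do this far more thoroughly.

```typescript
// Simplified, hypothetical view of a package.json.
interface PackageManifest {
  name: string;
  dependencies: Record<string, string>;
}

interface VersionMismatch {
  dependency: string;
  versions: string[];
}

// Reports every dependency that is declared with more than one version.
function findVersionMismatches(manifests: PackageManifest[]): VersionMismatch[] {
  const versionsByDependency = new Map<string, Set<string>>();
  for (const manifest of manifests) {
    for (const [dependency, version] of Object.entries(manifest.dependencies)) {
      const versions = versionsByDependency.get(dependency) ?? new Set<string>();
      versions.add(version);
      versionsByDependency.set(dependency, versions);
    }
  }

  const mismatches: VersionMismatch[] = [];
  versionsByDependency.forEach((versions, dependency) => {
    if (versions.size > 1) {
      mismatches.push({ dependency, versions: [...versions].sort() });
    }
  });
  return mismatches;
}
```

With all manifests in one repository, a check like this can run in the same pipeline as everything else and fail the build when two applications drift apart.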
Despite the many visible pros of using a monorepo, this architectural pattern comes with some limitations. The most significant one is the lack of control over which packages developers have access to. If all of the applications and packages are stored in the same repository, then anyone with access to that repository can look into the whole codebase. Some companies enforce strict access control and restrict access to the parts of the codebase that are irrelevant to a given developer.
The other big concern is performance. Since there is a lot of code in one place, build times are longer, and Git has many more commits to track. Watching for changes and rebuilding only the packages that have changed can shorten build times and pipelines. I've heard that some tools allow you to fetch only one package along with its dependencies to speed up Git locally, but I haven't tested them out.
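The idea of rebuilding only what changed can be sketched as a walk over the dependency graph: start from the changed packages and collect everything that depends on them, directly or indirectly. The graph format and package names below are assumptions for the example, not how any particular tool stores this information.

```typescript
// Hypothetical graph: package name -> workspace packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns the changed packages plus everything that depends on them.
function affectedPackages(graph: DependencyGraph, changed: string[]): string[] {
  // Invert the graph: package name -> packages that depend on it.
  const dependents = new Map<string, string[]>();
  for (const [pkg, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), pkg]);
    }
  }

  // Breadth-first walk from the changed packages through their dependents.
  const affected = new Set<string>(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return [...affected].sort();
}

const exampleGraph: DependencyGraph = {
  "shared-types": [],
  "ui-kit": ["shared-types"],
  backend: ["shared-types"],
  web: ["shared-types", "ui-kit"],
  mobile: ["ui-kit"],
};

console.log(affectedPackages(exampleGraph, ["ui-kit"])); // ["mobile", "ui-kit", "web"]
```

In this example a change to `ui-kit` leaves `backend` untouched, so its build and tests could be skipped.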
There are great tools and utilities for building a monorepo with multiple modules inside and a pleasant developer experience. Here I list the most popular ones that I've had an opportunity to get familiar with:
Yarn workspaces link your dependencies together, which means that your packages can depend on one another. In addition, Yarn sets up a single node_modules folder instead of duplicating dependencies across the different packages in the project.
Details on how to set up Yarn workspaces can be found in Yarn's official docs.
I would recommend Yarn workspaces to anyone who uses Yarn as a dependency manager. It is easy to set up and maintain.
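For a sense of how little setup is involved, here is a sketch of the two fields a workspace root needs. It is written as a TypeScript object for consistency with the other examples; in a real repository this would simply be the root package.json. The name `my-monorepo` and the folder globs are placeholders.

```typescript
// Only the fields relevant here; a real package.json has many more.
interface RootManifest {
  name: string;
  private: boolean;
  workspaces: string[];
}

// "my-monorepo", "packages/*" and "apps/*" are hypothetical placeholders.
const rootManifest: RootManifest = {
  name: "my-monorepo",
  private: true, // Yarn 1 requires the workspace root to be private
  workspaces: ["packages/*", "apps/*"], // globs pointing at the workspace folders
};

// The JSON that would live in the root package.json.
const rootPackageJson = JSON.stringify(rootManifest, null, 2);
console.log(rootPackageJson);
```

Running the install command at the root then installs dependencies for every folder matched by those globs.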
Nx is an advanced set of extensible dev tools for monorepos, emphasizing modern full-stack web technologies. It provides nifty features like incremental builds and generating dependency graphs.
Nx comes with a CLI that allows you to quickly generate and add new packages, applications, or libraries into your project.
More on that can be found in the Nx docs.
Rush.js is a robust monorepo infrastructure open-sourced by Microsoft.
One of its key features is that Rush.js installs all dependencies for all projects into a shared folder and then uses isolated symlinks to reconstruct an accurate node_modules folder for each project.
Rush.js also helps to ensure there are no phantom or duplicated dependencies. Together with the pnpm package manager, it allows you to save disk space by installing your dependencies only once.
It also allows you to manage your packages, build them, and publish them. At the moment, Rush.js is my favorite among the tools I've mentioned.
More on Rush.js can be found in the official docs.
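A phantom dependency, mentioned above, is a package that your code imports without declaring it in its own package.json; it only works because some other package happened to install it. The sketch below shows the idea of detecting them. It is not how Rush.js works internally, the function names are made up, and it ignores details such as Node built-ins imported without the `node:` prefix.

```typescript
// Maps an import specifier to a package name, or null for non-package imports.
function packageNameOf(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return null; // relative, absolute, or built-in import
  }
  const segments = specifier.split("/");
  // Scoped packages ("@scope/name") use two segments, others use one.
  const depth = specifier.startsWith("@") ? 2 : 1;
  return segments.slice(0, depth).join("/");
}

// Lists packages that are imported but missing from the declared dependencies.
function findPhantomDependencies(declared: string[], importSpecifiers: string[]): string[] {
  const declaredSet = new Set(declared);
  const phantom = new Set<string>();
  for (const specifier of importSpecifiers) {
    const name = packageNameOf(specifier);
    if (name !== null && !declaredSet.has(name)) {
      phantom.add(name);
    }
  }
  return [...phantom].sort();
}

// Hypothetical package that declares two dependencies but also imports lodash.
console.log(
  findPhantomDependencies(
    ["react", "@acme/shared-types"],
    ["react", "react/jsx-runtime", "./utils", "@acme/shared-types/user", "lodash/fp", "node:fs"],
  ),
); // ["lodash"]
```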
The monorepo is a controversial architectural pattern. It comes with significant advantages as well as some big challenges. Even though many of the biggest companies use monorepos (Google, Facebook, Microsoft), this pattern has many opponents.
What do you guys think? Do you have some thoughts about monorepos? Do you have some good or bad experiences with them? I would like to know your opinions, and I am looking forward to the discussion.
I hope you liked this introduction to monorepos. 🙂 Feel free to comment or ping me with a DM! ✉️
Thanks for reading! If you are interested in the latest tech news, you can follow my account, since I plan to post here regularly. I also tweet on a regular basis, so you can follow my Twitter account as well!