
My take on the monorepo argument after seeing it play out on a few different teams. These experiences come from monorepos holding anywhere from 2 to 50 projects, spanning web and mobile dev.

While monorepos make sharing modules, updating dependencies, and working on FE/BE a lot easier, you trade that convenience for extra tooling around your CI/CD system. For example,

  • you'll have to make sure you only run the builds for the parts of the projects that have actually changed. No one wants to sit there and wait while you loop over every folder running the build command. Yes, I have seen a project starting with Z take 1.5 hours before CI starts building it (alphabetical sorting).
  • you're at the mercy of your CI system for how you can split up the builds into separate pieces when they're all in the same repo. (Jenkins was quite annoying about this but this was before Jenkinsfile)
  • as another user mentioned, git tags and git related things get really wonky depending on how you've scripted it. Multiple projects can't create the same git tag since it'll just move the tag around.
  • I can't share anything between js, ruby, java, and terraform files so I get no value there if they're all in the same repo.
  • if a dependency is updated, you still have to test and release all the services that depend on it all at once, lest you have projects sitting in the repo accidentally broken. You can make this a lot easier by investing more in automated testing though.
  • all teams using the repo need to follow the same branching strategy. Some teams might want to use git-flow, cut release branches, and others want to ship from master.
  • the argument that all you need is one git clone doesn't provide THAT much value. I hand new devs a Google doc with a repo(s) to clone and it takes them 10 minutes to clone multiple repos. Saving them 5 minutes here by using a monorepo gets me nothing.
  • at any given time, I only use 10% or less of all company repos so I don't quite care about having the remaining 90% of them up to date.
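To make the first bullet and the dependency bullet concrete, here is a minimal Python sketch of picking what to build. It assumes each project lives in its own top-level folder and that you keep a map of which project depends on which. The project names, the `DEPENDS_ON` map, and both function names are made up for illustration. In CI the changed-file list would come from something like `git diff --name-only`.

```python
from collections import deque
from pathlib import PurePosixPath


def changed_projects(changed_files, project_dirs):
    """Map changed file paths to the top-level project folders that own them."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in project_dirs:
            touched.add(parts[0])
    return touched


def affected_projects(changed_files, depends_on):
    """Return changed projects plus everything that transitively depends on them.

    depends_on maps each project to the projects it imports from.
    """
    # Invert the graph: for each project, which projects depend on it?
    dependents = {name: set() for name in depends_on}
    for project, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(project)

    affected = changed_projects(changed_files, set(dependents))
    queue = deque(affected)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical layout: two apps that both use a shared library.
DEPENDS_ON = {
    "shared-ui": [],
    "web-app": ["shared-ui"],
    "mobile-app": ["shared-ui"],
    "zeta-service": [],
}

print(sorted(affected_projects(["web-app/src/index.js"], DEPENDS_ON)))
print(sorted(affected_projects(["shared-ui/button.js"], DEPENDS_ON)))
```

A change inside one app rebuilds only that app, while a change to the shared library also pulls in everything that uses it. The project starting with Z is never touched unless it actually changed.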

A good implementation I saw had the back end, front end, deployment scripts, and infrastructure in the same repo but this was limited to one product so it was just 4 subfolders. That was a pleasant experience. It would have gotten annoying if multiple products were also in this repo though.

Anyway, my 2 cents.

 

We switched to a monorepo at my last company. I really, really liked it. The reasons why:

When dealing with a frontend and a backend that are disconnected, it makes working with another developer a lot easier. You can both check out the same branch to start, then start building around that feature branch, without managing feature branches across two different repos.

It makes using shared components a lot easier. We had a few products that we were working on that shared a lot of components. We tried using a separate git repo for the components and installing that as a package via npm, but that was impossible to maintain across multiple branches and versions. Once we switched, we just set up a webpack alias and accessing ~components/COMPONENTNAME was all you had to do to get the component. We also did the same for shared Ruby components/classes.

It makes documentation a lot easier too, you can have a separate folder for the documentation at the root level and set up builds based on that folder.

You also don't run into mismatches with the frontend and backend repos, where you haven't checked out the correct branch in one or the other and so don't have something you expect to exist.

There were other benefits, but those were the biggest ones I could think of. I'd be happy to answer any questions about this, if anyone has any!

 

Any problems with issue clutter or generally too much going on in one environment?

 

We didn't use Issues so much, which might be a consideration. This was for the company's code and so we tracked bugs/issues in a project management tool. I think proper standards and structure can mitigate a lot of that though, especially with GitHub's new tooling around issues/suggestions.

As far as too much going on, there were a few cases where we could have named something better. Sometimes two projects would have the same file name, and if you weren't paying attention to that, you could edit the wrong code. Better naming/more attention could have fixed that though. From a devops side, I think it made a lot of things easier. New developers could get set up quickly (just a single git clone) and all the code for deploys was in the same repo as the application code.

 

Very interesting subject.

Everyone here seems to see only pros in using a monorepo.
For my part, though I can understand the pros within a single project, a startup, or a small company, I see a lot of cons in other cases.

I work in a huge company with thousands of developers working on a wide variety of projects. Some of them are just for research while some others are for production.
I can't imagine a monorepo in a company where many languages are used.
How do you deal with your git hooks?
How do you deal with your pipeline?
How do you deal with git tags?

There are a lot of situations where a monorepo makes life more complex or even impossible.

Imagine the git tree in a company with hundreds of projects!

I would love to get feedback from people who have used a monorepo "in real life".

 

I know Google uses a monorepo, but how they manage all that complexity is something I can't even imagine...

 

Google doesn't use git, so a lot of the tools and features you might be used to don't work there. That's good, though. Developers and bots make changes to Google's code base at a very fast rate (many changes per second!), so there's no way that a central hook registry would be able to run everything necessary for every change, nor could a central pipeline take care of everything in the repository 😱.

Each directory has the equivalent of hooks (presubmit checks) that are checked using independent microservices (often maintained by independent teams) and must be run before the code is merged in head. There is also infrastructure for watching sub-trees, so tools and pipelines can be triggered when code is updated - often creating a cascade of even more changes!
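As a rough illustration of the "each directory has its own hooks" idea, here is a small Python sketch that collects the checks registered for every directory between the repo root and a changed file. This is not how Google's tooling is actually implemented; the registry, the directory names, and the check names are all invented for the example.

```python
from pathlib import PurePosixPath

# Hypothetical registry: directory -> checks registered for that subtree.
# The empty string stands for the repository root.
PRESUBMIT_CHECKS = {
    "": ["license-header"],
    "payments": ["security-review"],
    "payments/api": ["api-compat"],
    "search": ["perf-benchmark"],
}


def checks_for(changed_file, registry=PRESUBMIT_CHECKS):
    """Collect checks from the repo root down to the file's own directory."""
    directory = PurePosixPath(changed_file).parent
    # Walk from the root towards the file so broader checks come first.
    chain = [directory, *directory.parents][::-1]
    checks = []
    for folder in chain:
        key = "" if str(folder) == "." else str(folder)
        checks.extend(registry.get(key, []))
    return checks


print(checks_for("payments/api/handler.py"))
print(checks_for("search/index.py"))
```

The point of the design is that nothing central has to know about every check: a team adds an entry for its own subtree and changes elsewhere in the repo never trigger it.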

You can compare such large-scale monorepos to the Internet. It gives us the illusion of having a single domain namespace, but DNS TLDs are responsible for sub-parts of it. Same for IP addresses. In Google's Piper, directory trees have their own hooks, ownership and tools. It works beautifully, actually, but there's also a lot of complexity and technical debt to manage since Google isn't that new.

I worked at Google for a number of years and I'm happy to answer questions if you have any. Their source code management practices have been published publicly, so I wouldn't be disclosing anything new. Happy coding!

The point here is having good tooling for a use case. Git isn't the right tool for a monorepo if you want those other features (and most non-small development orgs will probably need them). Sorry for all the hedging (most, probably). Every use case is different.
This also leads into writing good tools and defining good requirements/use cases but now I'm getting ahead of myself.

Git in this case acts just like a smart file storage. Microsoft and Facebook use git for very large repos and it works for them, for example.

Facebook uses Mercurial. Microsoft did a lot of work in the last year or two to make git work well with big repos.

 

You don't use them the same way. There is a layer of tooling between your local system and the main line. All the hooks, tags, and the like are local to you. Once your commit has passed code review and automated testing, it gets integrated into the main line by the tooling. You never, ever run a merge to master by hand, nor do you have permission to.

 

One of my former colleagues Ivan Moore wrote an excellent post on it

A monorepo helps reduce the cost of software development. It does this in three different ways: by being simpler to use, by providing better discoverability, and by allowing atomicity of updates.

 

Monorepos sidestep the dependency tracking problem. When it comes to code there is no such thing as free lunch. The price of modularity is dependency management.

If you split up the codebase into pieces then you need something to reconstitute the pieces. That something is usually complication in the build scripts to pull in all the right versions to put together the final deployable artifact. I think it makes sense for codebases that are truly modular to break things up and pay the build system overhead but most of the time it's easier to put everything into one place.

I can't think of many cons other than that you are forcing potentially unnecessary coordination between different teams. At large enough scale you'll also need to worry about virtual file systems so that you don't pull in parts of the repo you don't need. Fortunately Microsoft has open sourced their virtual file system for git so this is a solvable problem when you get big enough.

 

Very nicely argued! I was beginning to lose patience with all the monorepo posts. I suppose monorepo is the next blockchain? 😬

 

Monorepos have some use. Blockchains are pure nonsense.

 

I'll be interested to see what answers you get to this one. One of the projects I work on did the double whammy of moving to git and turning what had been a number of different version controlled projects into a single repository.

For an organisation split into geographically distinct teams each working in a different part of a large codebase it hasn't felt like a natural fit.

Even using sparse checkouts we keep working round long checkout times, merge conflicts between teams seemingly unrelated to the changes made, timeouts fetching or checking out code and a lot of branches and check-ins in the logs that make it tricky to pick out the thread.

None of these are fatal and all can be worked around, but I guess it is just a nagging feeling that things are tougher than they need to be.

 

We started a monorepo project recently.
Our reasons are:

  • Manage multiple apps in a single repo
  • Use a single node_modules for multiple apps
  • Ability to use shared libraries across apps
  • Ability to build each app separately
  • Ability to e2e test all apps together
  • Teams can manage their tasks more efficiently

  • What else can I say 😊 all the best things come together.

 

Any guides on using a single node_modules across multiple apps?

Do you just make a folder outside node_modules and it magically requires the modules?

 

Lol :))
No brother, I've used Nx Workspace for Angular.

Nx generates a monorepo-style workspace with its CLI.

BUT the monorepo design pattern is still the same as with other approaches.

See this:

blog.scottlogic.com/2018/02/23/jav...

 

I am using the monorepo approach for two repos. One is company-wide, the other accepts outside contributions.

The pros are a very simplified release model. It makes collaboration easy and it is truly a single source of truth.

Cons are there too - no breaking changes allowed, very extensive automated QA for every contribution, continuous delivery and deployment are a must, separation of integration from contribution... a lot of infra and tooling around.

 

I'm curious about what others have to say about the pros. So far I'm not really convinced except for bundling back and frontend applications of the same web app in the same repo. :-/

I think as software developers we should be striving for modularization and separation of concerns. Having everything in the same repository doesn't look like that to me, and it feels a bit odd that there is that big directory of everything which keeps on growing and growing the more projects you add to the mix. Is there an end in sight? What about pull / push performance?

Directory aesthetics aside, one counterargument against monorepo that crosses my mind might be access control.

Probably depends on the situation but not everyone in the organization needs or should have free access to everything. That's easier to control when repositories are separate.

Also if your company works for various clients and one client requests access to their source code (and change history), maybe because they want to take over or hand it over to another software development company, you cannot just give them access to that one project directory of theirs in that big monorepo. That would be awkward. It possibly also causes a whole bunch of contract breaches. And if the commit history contains cross-references to other clients' code and source files due to monolithic monorepo commits... that could also cause awkward situations.

Personally I've dealt with monorepos only at a time when SVN didn't make the vast majority of programmers cringe in disgust. It's interesting that this is becoming fashionable again.

 

Very interesting talk about how Google handles a monorepo at stupidly large scale.
Not applicable to every use case (not applicable to a LOT of use cases actually), but it's worth checking out for the concepts behind it.

 

For a normal project with under 5 devs, it may be a good approach.

The cons would be:

  • more complex build pipeline - for example if you have the front end and backend, a push in the backend code would trigger a deploy for the front end.
  • harder to split to packages/services
  • impossible to build separate versions/releases
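On the versions/releases point, one possible workaround is to prefix tags with the project name, since git allows slashes in tag names (for example `frontend/v1.4.0`). Below is a minimal Python sketch of that convention. The `project/vMAJOR.MINOR.PATCH` format, the project names, and the helper names are assumptions for illustration, not something the thread describes; the tag list would come from something like `git tag --list`.

```python
import re

# Assumed convention: <project>/v<major>.<minor>.<patch>
TAG_PATTERN = re.compile(
    r"^(?P<project>[\w.-]+)/v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def latest_versions(tags):
    """Return the highest (major, minor, patch) seen for each project prefix."""
    latest = {}
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match is None:
            continue  # ignore tags that don't follow the convention
        version = tuple(int(match[part]) for part in ("major", "minor", "patch"))
        project = match["project"]
        if project not in latest or version > latest[project]:
            latest[project] = version
    return latest


def next_tag(project, tags, bump="patch"):
    """Compute the next tag for one project without touching the others."""
    major, minor, patch = latest_versions(tags).get(project, (0, 0, 0))
    if bump == "major":
        major, minor, patch = major + 1, 0, 0
    elif bump == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
    return f"{project}/v{major}.{minor}.{patch}"


# Hypothetical existing tags, including one un-prefixed legacy tag.
TAGS = ["frontend/v1.4.0", "frontend/v1.10.2", "backend/v2.0.1", "v3.0.0"]

print(next_tag("frontend", TAGS))
print(next_tag("backend", TAGS, bump="minor"))
```

Versions are compared as integer tuples, so `1.10.2` correctly sorts above `1.4.0`. This only addresses naming; whether per-project releases are practical still depends on the build pipeline.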

Also, I see "atomic updates" listed as a pro. I don't think it is a pro to have to get into projects you do not know just to update how a library is used. I think a better approach is to depend on a library by version, and let each user of your project migrate on their own terms, sometimes partially and slowly.

Out of the ordinary: Here is a talk by Google about Google's monorepo

 

I teach three pros to the giant monorepo:

  1. Everything advances in lockstep. You will use the latest version of all libraries when you build. There are no questions about what versions of various libraries went into a build. If you have a commit hash, you have everything.
  2. Developers can move around the codebase easily. The tooling that worked in directory X should also work in directory Y. An improvement in tooling in one place propagates everywhere.
  3. It removes a lot of sense of ownership when your code is just part of a giant pile of code, which can stem or at least limit certain organizational pathologies.

But any of this discussion only really applies to a large repository. Here large refers to at least 500MB of source code, tens of thousands of files. Below that scale there is simply no reason to mess with submodules or anything but a single repository. The pain points you might have from one repository at that size are related to tooling around the repository, such as continuous build or testing. The source control system itself isn't a bottleneck there. Your effort is better spent fixing the tooling, because if it's painful now, it's only going to get worse later. At these scales the only reason to have multiple repositories is to control access to parts of the source code.

Once you get past that size, you pick a direction. If you choose many repositories, you have to build an orchestration system to handle all the interacting versions and getting everyone onto shared tooling can be a lot harder. If you choose a monorepo, you deal with slower and slower operations and lots of hard work on the version control system itself. Both paths work if you put in the engineering effort. Google and Facebook went monorepo. Amazon does multiple repos.

There's an underlying worldview that I need to challenge though. When people start talking about these things, the implicit idea is, "Google/Facebook/Amazon/big company do this, so it must be a good idea." These companies are outliers. Their scale has made working with the systems frankly annoying. What they do is not best practices for anyone without their problems. Giant mining machines have wheels so they can move (measured in meters per minute), but no one would think that the procedures for operating one should be emulated for an articulated bus or tractor trailer.

When someone says, "Yeah, but that's how Google does it," your first response should be, "Why do they do it that way?" In many cases the answer is, "The system got out of control in this particular dimension and it was the only way we could think of to keep it operational."

 
 

git submodules might be an approach to retain the best of both worlds.

 

One thing I'd like to see emphasized more in the other comments is that a "monorepo" approach doesn't necessarily mean "Put EVERYTHING you have in one repo".
From what I understand, you just put the projects/modules into a monorepo where it makes sense for you.
So even multiple monorepos and multiple single modules can happily live side by side.
It doesn't have to span a whole project, but only some dependencies, which is just fine.

 

It's easier in a monorepo to make changes that cut across subprojects.
Imagine the main product (for example a plugin for WordPress) and a bunch of subproducts (addons to that plugin). You refactor the main product code, or changes in the product require synced changes in the subproducts. It's easier in this case to maintain everything - code, issues. A single commit will include all changes - review will be much easier (bigger diff, but easier to see it in the same place). Building artifacts is unified - you won't need to copy pretty much the same build tools/steps (using gulp or npm, for example) in every single repo for every subproduct.

So, imo, a monorepo is good for tightly coupled directories of code that don't really work without each other.

 
 

When you want to set up a new project you need to do several things like set up the CI, configure ACL for various things, etc. With a monorepo you only have one repo to configure.
