I’m a software engineer and want to tell you how my team at LOGIBALL moved from one Svn repository and multiple Git repositories into one Git-based monorepo that follows the expanding/contracting style that Google has. The build tool, before and after the move, is Maven.
We had used Subversion (Svn henceforth) for ten years for multiple products and projects. Another team had been using Git for a year at that point. Because of that team’s positive experience, Git was selected as our next company-wide VCS.
My team develops a bunch of ‘geo’ related services (mapping, geocoding, routing) and, for the last few months, some Vaadin-based clients. Vaadin is a framework for creating rich Internet applications. The clients we build depend heavily on customer requests, which vary widely in how and what to show on maps or in routing results. Each of our Vaadin clients was based on a number of shared libraries that we had made. Each client and each library had its own Git repository, in the classic style for Git organizations.
The geo services developed in Svn used the classic trunk, branches, tags (TBT) layout. All services and libraries were within the same trunk, each as a separate directory. In both cases, clients and services, libraries were shared at a binary level and as released versions only, distributed via an on-prem Archiva installation.
Before recommending a monorepo layout I thought about the alternate repo design popular in the Microservices scene [1]: one deployable unit (library or service) in each repository. But my team didn’t like the idea of handling a bunch of small repositories. We tried it anyway in the client development. The result: we also didn’t like it.
To make the migration smooth, I wrote a detailed proposal for my team, with the aim of getting everyone’s agreement before pushing ahead with it. This proposal described the pros and cons of moving to a monorepo versus a multiple-repository implementation. During the discussion we also considered two repositories instead of a monorepo: one for service development and one for client development. But in that case we would still have had to share binaries between them, because some libraries are used by both the services and the clients. Then we would have had the cons of both worlds.
When we decided to go with the monorepo layout, I created a second proposal, this one about the directory layout within the monorepo. This was an important topic because of the warning about the risk of a chaotic directory layout (see the inline callout with that title). I tried to create a generic layout that makes it easy to add new components, including ones in different languages. In my opinion the resulting paths must also be readable and help people navigate through the repository. That proposal was also accepted.
My experience is that a collective decision about a change like this is very important. Without that participation in the decision, the risk of the team rejecting the change would have been very large.
The following points are our main goals:
- Get rid of repetitive Maven configuration
- Delete old Git repositories for the client development
- Get rid of Svn and switch to Git
- Share code at source level and get rid of Maven Snapshots
- Implement Trunk-Based Development
- Atomic commits
- Every service and client must be independently releasable
- It must be possible to safely work on a subset of the monorepo (git sparse checkout) at any moment in time
- Every service/client should be able to be built independently (from scratch) with all of its dependencies without depending on a binary repository
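The sparse checkout goal can be tried out in a throwaway sandbox. The following is only a minimal sketch using Git's built-in `git sparse-checkout` command (Git 2.25 or newer); it is not the mr/checkout.sh script, and every directory name in it is made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; all paths below are hypothetical examples.
sandbox=$(mktemp -d)
git init -q "$sandbox/origin"
cd "$sandbox/origin"
for dir in component/service/geocoding component/library/geo-common component/webapp/webapp-one; do
  mkdir -p "$dir"
  echo "placeholder" > "$dir/pom.xml"
done
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial layout"

# Clone, then contract the working copy to one service and the library it needs.
git clone -q "$sandbox/origin" "$sandbox/work"
cd "$sandbox/work"
git sparse-checkout init --cone
git sparse-checkout set component/service/geocoding component/library/geo-common

# Only the selected directories remain in the working copy.
ls component
```

Running `git sparse-checkout set` again with a longer list expands the working copy; `git sparse-checkout disable` brings everything back.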
During the migration itself there were no hard constraints. We did not have to be “release ready” during the migration, and it was OK if no repository was accessible for one or two days. The only real constraint was that I had to carry out the migration myself, which meant I did a modest amount of overtime.
We use the Short-Lived Feature Branches variant of Trunk-Based Development in Bitbucket Server. Our workflow is this:
- Create branch for user story/feature, with the intention of the branch living for a day
- One developer (or a pair) works on the feature
- Merge from master just before creating the pull request
- Create pull request for review
- After the reviewers accept and the CI build is successful, the branch is auto-merged into master
- If the merge is successful, the branch is deleted automatically.
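The Git side of this workflow can be replayed locally. This is a rough sketch in a scratch repository, not our Bitbucket Server setup: the branch name and file names are invented, the pull request, review and CI steps are only marked by a comment, and the `--no-ff` merge is an assumption about what the auto-merge amounts to.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master   # make sure the trunk is called master
git config user.name "Example Developer"
git config user.email "dev@example.com"
echo "v1" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# 1. Create a short-lived branch for one user story (hypothetical name).
branch="feature/STORY-1-geocoder-timeout"
git checkout -q -b "$branch"
echo "timeout=5s" > geocoder.properties
git add geocoder.properties
git commit -q -m "STORY-1: make geocoder timeout configurable"

# Meanwhile a teammate's change lands on master.
git checkout -q master
echo "v2" > README.txt
git commit -q -a -m "Update README"

# 2. Merge from master just before creating the pull request.
git checkout -q "$branch"
git merge -q --no-edit master

# 3. Pull request, review and CI happen on the server (not shown here).

# 4. What the auto-merge amounts to, followed by the branch deletion.
git checkout -q master
git merge -q --no-ff --no-edit "$branch"
git branch -q -d "$branch"
```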
The biggest challenge was to determine how to release every component separately from a Maven-based monorepo. Every component has its own version number and release time. Because of this problem I contacted Paul and asked for help, and he helped me to work it out. The solution is to create a release branch which contains only one service or webapp and the libraries it needs. That branch is created just in time from master, and the majority of developers are then locked out of it, with bug fixes allowed in only via a cherry-pick mechanism.
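To make the idea concrete, here is a hedged sketch of such a release branch in a scratch repository. It is not the mr/release script: contracting the branch with `git rm` is an assumption made for illustration, and the branch, directory and file names are all hypothetical. The cherry-pick step uses plain `git cherry-pick -x`, which records the original commit id in the message.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Developer"
git config user.email "dev@example.com"
mkdir -p component/service/routing component/library/geo-common component/webapp/webapp-one
echo "routing 1" > component/service/routing/Service.txt
echo "common"    > component/library/geo-common/Lib.txt
echo "webapp"    > component/webapp/webapp-one/App.txt
git add .
git commit -q -m "Monorepo with one service, one library, one webapp"

# Just-in-time release branch: start from master and keep only the routing
# service and the library it needs (assumed here to be done with git rm).
git checkout -q -b release/routing-1.4
git rm -r -q component/webapp
git commit -q -m "Contract release branch to routing and its libraries"

# A bug fix lands on master first...
git checkout -q master
echo "routing 1 (fixed)" > component/service/routing/Service.txt
git commit -q -a -m "Fix rounding error in route length"
fix=$(git rev-parse HEAD)

# ...and is then cherry-picked onto the release branch.
git checkout -q release/routing-1.4
git cherry-pick -x "$fix" > /dev/null
git log -1 --format=%B
```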
There were also some smaller challenges. They all popped up after the migration:
- The team works mainly on Windows, so we had to get the mr/checkout.sh technology working with Git for Windows.
- After moving the webapps into the monorepo the build times exploded (from 2 to 25 minutes). There were two reasons:
  - I’m not so deeply involved in the client development, so I didn’t know that my teammates create a webapp-playground for every client library, and these were also built in CI. A non-default Maven profile helps to ignore them during CI.
  - In the past we had a CI job for every component, so the build times were always small (< 2 minutes). With the Maven parameter --threads I got them back down. After we resolved these two issues we were at ~5 minutes per build.
- After skipping the installation of artifacts in the local Maven repository, the goal jetty:run would not work anymore. Because of that we currently don’t skip the installation. It is an open issue that we still have to resolve.
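The two build-time fixes map onto standard Maven command-line options. The sketch below only prints the commands by default; the profile name `playground` and the `verify` phase are assumptions, while `--threads` (with `1C` meaning one thread per CPU core) and `--activate-profiles` are regular Maven options.

```shell
#!/usr/bin/env bash

# Print the command when DRY_RUN=1 (the default here), run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# CI build: parallel, and without the (hypothetical) playground profile,
# which is not active by default and therefore simply left out.
ci_build() {
  run mvn --threads 1C verify
}

# Developer build that opts in to the playground webapps.
playground_build() {
  run mvn --threads 1C --activate-profiles playground verify
}

ci_build
playground_build
```

Setting `DRY_RUN=0` runs the real builds from the monorepo root.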
The first step of the migration was to implement the defined directory layout. We achieved this in Svn a few weeks before the migration to Git. Moving from Svn to Git was not that hard because of the git svn command. The hardest part was to create the authors.txt file containing a mapping between the Svn and Git users for all developers for that call:
    $ git svn clone --stdlayout --authors-file=authors.txt http://svn.example.com/repository monorepo
The clone took about three hours in our case. The duration depends heavily on the repository size and the number of commits, branches and tags. After git svn clone finished, some cleanup was needed:
    $ cp -Rf .git/refs/remotes/origin/tags/* .git/refs/tags/
    $ rm -Rf .git/refs/remotes/origin/tags
    $ cp -Rf .git/refs/remotes/origin/* .git/refs/heads/
    $ rm -Rf .git/refs/remotes/origin
    $ git branch -d trunk
    # Delete all tags which do not end with a valid version like '-1.12.3'
    $ git tag -d `git tag | grep -v '\-[0-9]*\.[0-9]*\.[0-9]*$'`
    # Set remote origin
    $ git remote add origin git@my-git-server:myrepository.git
    $ git push origin --all
    $ git push origin --tags
As I said, we did this in one three-hour go, but we always had the option of doing it in a number of phases, say three one-hour goes (by using ranges of Svn revision numbers).
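A first draft of the authors.txt file can be generated from the Svn log and then corrected by hand. This is a sketch under assumptions: it maps every Svn login to a placeholder name and an `@example.com` address, both of which have to be replaced with the real Git identities. The log lines in the demonstration are invented.

```shell
#!/usr/bin/env bash

# Reads `svn log --quiet` output on stdin and prints one
# "login = Name <email>" line per distinct committer.
authors_from_svn_log() {
  awk -F'|' '/^r[0-9]+ \|/ {
    name = $2
    gsub(/^ +| +$/, "", name)   # trim the padding around the login
    printf "%s = %s <%s@example.com>\n", name, name, name
  }' | sort -u
}

# Against a real server this would be used as:
#   svn log --quiet http://svn.example.com/repository | authors_from_svn_log > authors.txt

# Self-contained demonstration with canned log output.
printf '%s\n' \
  '------------------------------------------------------------------------' \
  'r3 | alice | 2017-05-24 10:15:00 +0200 (Wed, 24 May 2017)' \
  '------------------------------------------------------------------------' \
  'r2 | bob | 2017-05-22 09:00:00 +0200 (Mon, 22 May 2017)' \
  '------------------------------------------------------------------------' \
  'r1 | alice | 2017-05-20 08:30:00 +0200 (Sat, 20 May 2017)' \
  | authors_from_svn_log
```

The demonstration prints one line for alice and one for bob, in the `login = Name <email>` format that `git svn clone --authors-file` expects.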
The next step was to share code at source level. With Maven this is not strictly possible: code is still shared through jars, but those jars are now created during a single build run. So the installation into the local Maven repository and the deployment into the remote Maven repository are deactivated, and jars are used from their target/ folders. All modules are now defined in one tree and have the version HEAD-SNAPSHOT.
To get rid of the repetitive Maven configuration, our root and master pom.xml had to be merged. The master pom.xml contained global configuration like the company name and the Maven repositories to deploy snapshots and releases to. The root pom.xml collected all of our service and library projects so that they could be built in one call. My team had wanted that root pom.xml in the past; I think that file shows that my team already wanted to work in a monorepo.
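With all modules in one tree, a single component can be built together with the modules it depends on by using Maven's `--projects` and `--also-make` options, so nothing has to come from a binary repository. The wrapper below is a sketch that only prints the command by default; the module path and the `verify` phase are assumptions, not our actual setup.

```shell
#!/usr/bin/env bash

# Print the command when DRY_RUN=1 (the default here), run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build one module plus every module in the tree that it depends on.
# $1 is the module directory relative to the monorepo root.
build_component() {
  module_path=$1
  run mvn --projects "$module_path" --also-make verify
}

# Hypothetical module path, for illustration only.
build_component component/service/geocoding
```

Because the dependencies are built in the same run, their jars are resolved from the build itself rather than from an installed or deployed snapshot.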
The fifth step was the integration of the existing Git repositories, including their history, into the monorepo. For that I checked out every Git repository and created a local branch named monorepo-integration. In that branch the sources were moved to the new directory layout. Then I added it as a remote repository to the monorepo, which was also checked out locally.
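    # In the old Git repository (webapp-one in this example)
    $ git checkout -b monorepo-integration
    $ mkdir -p component/webapp/webapp-one
    # Move every tracked top-level entry into the new layout
    # (a plain "mv ." cannot move a directory into itself)
    $ git ls-tree --name-only HEAD | xargs -I{} git mv {} component/webapp/webapp-one/
    $ git commit -m "Move webapp-one to the monorepo directory layout"
    # In the monorepo directory
    $ git remote add webapp-one ../old-git-repositories/webapp-one/
    $ git fetch webapp-one
    $ git merge --allow-unrelated-histories webapp-one/monorepo-integration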
With the sixth step the initial monorepo tooling was introduced. It consists of the two scripts mr/checkout.sh and mr/release. The mr/checkout.sh script was developed by Paul and is used to realize sparse checkouts. It is described in the blog entry Maven In A Google Style Monorepo. The mr/release script is used to create and update release branches.
The last step was to activate CI. We use Jenkins and configured a Multibranch-Pipeline Job to observe the status of the master and feature branches. Additionally, for every release branch a Multibranch-Pipeline Job is configured to create releases. For more details see the chapter Status.
Our migration steps, again:
- Define directory structure for the monorepo.
- Migrate service repository from Svn to Git
- Share code on source level
- Merge root and master pom.xml
  - The root pom.xml was only used to collect all modules together so that the team could check out the service trunk and build all services directly
  - The master pom.xml contains global configuration like the default JDK version or the version of widely used Maven plugins (like the Maven compiler plugin)
- Move other Git repositories into this monorepo
  - There were twelve Git repositories for the webapps
  - Note we checked in HEAD revision here, and history for those remains in the old (now read-only) repos.
- Introduce initial monorepo tooling.
  - This is mainly the mr/checkout.sh script Paul developed, modified some more to manage the expanding/contracting checkouts (sparse checkouts in Git)
  - The mr/release script is for creating release branches and performing releases.
- Activate CI (Jenkins with agents, Multibranch-Pipeline Job for master and feature branches, Pipeline Jobs for release branches)
- Use Feature Flags/Toggles and Branch by Abstraction for changes that take longer to implement
- Training required for the development teams
- Change to directed graph build system (Buck or Bazel), from Maven
- February: We thought we should learn more about microservices and bought the book Building Microservices by Sam Newman.
- 17th March: First thoughts about monorepos after reading Sam’s Twitter conversation with Paul
- March - May: Discuss this topic with teammates and colleagues of another team. Also read articles.
- 22nd May: Team decides to go with monorepo
- 24th May: Team decides the directory layout and implements it in Svn
- June - July: Work out a migration path to the monorepo and test it (all in addition to our normal business deliverable commitments)
- 17th to 31st July: Methodical migration from Svn to Git
- 1st to 3rd August: Migration of the Vaadin clients’ Git repositories into the monorepo too
Since changing to a monorepo, the complete four-person team has never all been working on it at the same time, because of vacations and work on other projects. So the following numbers will change.
- ~12 commits/day
- ~1 pull request/day (feature branch to master)
- ~10 builds/day
Jenkins is our CI daemon. We use one Jenkins instance with multiple virtual and physical agents company-wide. My team is currently using only one agent to observe one master and between one and four feature branches. There is no need to expand it now.
I’ve uploaded a skeleton version of our monorepo to GitHub for people to use as they see fit. None of our production sources, of course, but the tech to do the expansion/contraction is there, and ready to use. If you have any questions about the example monorepo please open an issue on GitHub.
1. Via Sam Newman’s Twitter feed I read this and then researched more on monorepos, which brought me back to Paul’s proof of concept work.