
Our experience: Monorepo with Java, Maven and GitHub Actions, including basic example

Kenneth Gunnerud ・9 min read

Example code can be found here.

Monorepos come up from time to time in discussions, especially since many of the big companies structure their code this way. The last project I worked on was nowhere near that big, and we decided not to take it as far as many others have. This post is about how we did it, how it worked for us, and which pitfalls we ran into.

First off, we moved to a monorepo while the organization was slowly moving code over to GitHub. At the same time, GitHub announced GitHub Actions and GitHub Package Registry. Before this, we had a typical Bitbucket + Jenkins (per team) + Nexus setup, all behind Citrix and inaccessible from the internet. The organization has a few hundred developers, but since every team has a lot of freedom in these choices, we decided to try a monorepo.

We only structured OUR product as a monorepo, not the whole organization. So this was a team choice, not an organizational one.

Reasons:

  • Atomic changes. One pull request can change many places at once, e.g. contract + producer + consumer + documentation. Before, we had to open four pull requests, usually with days in between, so reviewers had to go back and forth between what was done in the contract and producer while reviewing the consumer code, and then verify the documentation.
  • Easier to search the code base for examples.
  • Common code / reuse. It is very easy to create common code in a monorepo, just make sure to have good conventions. E.g. we used Spring Boot's Auto Configuration in our common libraries and did not create too many new Maven modules. Instead we bundled many "features" into one common library, with their transitive dependencies declared as provided. I'll add an example of this in the GitHub example.
  • Management of dependencies. We were very invested in Spring Boot, and before the monorepo, updates usually only reached applications that had other changes, not the "stale" ones. With the monorepo, we only updated the parent pom and triggered a full redeploy (usually manually, since we don't have that many apps). This made it easy to stay on the latest dependencies all the time.
  • Changes could be traced across applications to the same commit. This made fault situations easier to handle when multiple applications had been changed as part of the same feature.
  • Large-scale refactoring. When we decided to change our formatting conventions to the IntelliJ default, we did it for the whole product at once. We did other refactoring as we went along, but none that touched as many applications and libraries.
  • One place for code, tools, contracts and documentation. Yes, we even moved our documentation from Confluence to GitHub, using AsciiDoc and the Asciidoctor Maven Plugin to update a GitHub Pages site whenever the documentation changed. We had to get our non-technical people to learn AsciiDoc, and they caught on more easily than anticipated.

Our situation:

  • Mostly an independent project in the beginning, so few others depended on us. But the end result is a product that basically everyone in our organization has to consume.
  • Max 12 developers (I think it was).
  • About 20 different applications within our product.
  • Very autonomous with a high degree of freedom for our team.
  • Everything on Kubernetes and everyone could deploy to production.
  • Trunk-based development. PR to master, then deploy to dev and prod if the tests go green. A branch named dev/* can optionally deploy to dev only.
  • We don't share our Java POJOs with other teams, since that creates a dependency on all fields in the contract, not just the ones you use. It's up to the consumers to implement their own POJOs with just the fields they require, following the tolerant reader pattern: only fail on breaking changes in things you depend on. Internally we do share our own contracts, since all contract changes affect producers and consumers anyway and trigger redeploys. So we made a single module for all our internal contracts (contract-json / contract-avro), though it adds some overhead to the build if the contracts get very big.
  • We changed our test strategy from the traditional test pyramid (unit tests at class level, component tests that started the context, and integration tests) to one inspired by Spotify. The reason was that we iterated rather fast and often changed or refactored code, which always leads to people pulling their hair out over existing unit tests. We moved to tests on the level of input + state = output, e.g. given, when, then on a more functional level, since these were our use cases within every application. This worked great for us and increased our iteration speed, since most changes did not alter existing functionality but only added to it or refactored the code.
  • We removed external components from our tests. E.g. Embedded Kafka was often used, but it took a long time to run and was often error-prone (e.g. race conditions). We could have fixed that, but these tests mostly gave us nothing except testing Spring Kafka itself: our regular, more functional tests hit the same listener anyway, so the only difference was whether Spring Kafka or the test itself called the listener. There was also more need for these tests when we were new to Kafka, but as we became more familiar with it, they never caught faults that would happen in the environment. I wish we had added Embedded Kafka to just one or two apps for learning purposes instead of all of them, but you learn as you go.
  • Almost no manual testing (there was some on the front end).
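
The tolerant reader idea from the list above can be sketched in plain Java. This is only an illustration under assumptions: the contract and its field names (orderId, amount, internalNote) are made up, and the payload is represented as an already-parsed Map rather than going through a real JSON library.

```java
import java.util.Map;

final class TolerantReaderSketch {

    // The consumer's own POJO: only the two fields this consumer actually uses.
    static final class OrderView {
        final String orderId;
        final long amount;

        OrderView(String orderId, long amount) {
            this.orderId = orderId;
            this.amount = amount;
        }

        // Reads only what the consumer depends on and ignores everything else.
        static OrderView from(Map<String, Object> payload) {
            return new OrderView(
                    required(payload, "orderId", String.class),
                    required(payload, "amount", Number.class).longValue());
        }
    }

    // Fails only when a field we depend on is missing or has an unexpected type.
    static <T> T required(Map<String, Object> payload, String field, Class<T> type) {
        Object value = payload.get(field);
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException("Breaking change in field: " + field);
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        // The producer added "internalNote"; this consumer neither knows nor cares.
        Map<String, Object> payload = Map.of(
                "orderId", "A-1", "amount", 250, "internalNote", "added by producer");
        OrderView view = OrderView.from(payload);
        System.out.println(view.orderId + " " + view.amount);
    }
}
```

New or renamed fields the consumer never reads cannot break it. Only removing or retyping orderId or amount would.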

Tips:

  • Check out the tooling before starting with a monorepo! E.g. you don't want to build the whole project on every change. GitHub Actions supports scoping workflows to paths (e.g. paths: apps/app1/**). This saved us, but I have seen people create their own shell scripts that check the git commit log.
  • Use a CODEOWNERS file to automatically assign people with specific domain/application knowledge. Code owners can be scoped to a path, e.g.: /apps/app1/* @GitHubUser1. See: GitHub Doc. There is also an example in the code provided.
  • Decide on some conventions early: code style, formatting, monorepo layout, testing, etc. We went pretty heavy on Spring Boot and some basic conventions. Spring Boot was too heavy in some cases, but it made every application understandable for everybody and easy to jump into (of course you still had to learn what the app did, but the style and the libraries were the same).
  • I've heard Gradle might be better for monorepos. Bazel definitely is, but that was too big a leap for us at the time.
  • If you want integration tests, maybe they can be of a more common nature? Often you use an abstraction on top, e.g. Hibernate, Spring Kafka or Kafka-Client. Instead of re-testing these libraries in every application, you could make one test module per technology, e.g. a test/kafka-integration module using Testcontainers, if you don't feel comfortable without such tests. These would not have to run on every application change, but rather ad hoc or when upgrading third-party dependencies such as Spring Kafka. I won't cover this here.
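
To make the first tip concrete, here is a rough sketch of what path scoping boils down to: mapping changed file paths to the apps that need a build. It is not how GitHub Actions implements its path filters, only an illustration. The class and method names are hypothetical, and it assumes an apps/<name>/... layout.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ChangedAppsSketch {

    // Returns the names of apps whose directory contains at least one changed file.
    // Paths outside apps/<name>/ (docs, libs, root files) are ignored.
    static Set<String> changedApps(List<String> changedPaths) {
        Set<String> apps = new TreeSet<>();
        for (String path : changedPaths) {
            String[] parts = path.split("/");
            if (parts.length >= 3 && parts[0].equals("apps")) {
                apps.add(parts[1]);
            }
        }
        return apps;
    }

    public static void main(String[] args) {
        // In a real script the paths could come from something like `git diff --name-only`.
        List<String> changed = List.of(
                "apps/foo/src/main/java/Foo.java",
                "apps/bar/pom.xml",
                "docs/index.adoc");
        System.out.println(changedApps(changed));
    }
}
```

Note that this sketch deliberately ignores changes under libs, so a library change alone selects no app.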

Let's get to it:

First decide on a convention for the monorepo. In this example, the format will be:

root
 .github
 .tools
 apps
   bar
   baz
   foo
 docs
 libs
   utils-common
   contracts-json

.github

Contains the workflows for GitHub Actions and a configuration file for Dependabot. Dependabot is awesome.

Explanations of workflows:
We had one workflow per app. I think this can be shortened, but it works for us.

To build, we use: mvn clean package --projects :bar --also-make --threads=2 --batch-mode
--projects :bar means the workflow builds only project :bar (only that app). One commit can trigger many workflows, e.g. changes to foo and bar will trigger the workflow for both, in parallel.
Next, --also-make (or -am) builds all the dependencies :bar has within your project, so this will build utils-common and contracts-json, but not docs or other modules you might have in libs. See documentation here.

We went for 2 threads because, at the moment, that's how many CPU cores you get on GitHub Actions.

Batch mode runs Maven non-interactively, so it does not print every KB downloaded to the console when downloading dependencies.
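
What --projects together with --also-make selects can be pictured as a transitive walk over the in-repo dependency graph. The sketch below is not Maven's implementation. The dependency graph is hypothetical, with module names borrowed from the directory tree in this post, and it ignores build ordering and parent POMs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AlsoMakeSketch {

    // Hypothetical in-repo dependency graph: module -> modules it depends on.
    static final Map<String, List<String>> DEPENDS_ON = Map.of(
            "bar", List.of("utils-common", "contracts-json"),
            "foo", List.of("utils-common"),
            "utils-common", List.of(),
            "contracts-json", List.of(),
            "docs", List.of());

    // Modules selected by "--projects :<selected> --also-make":
    // the selected module plus everything it depends on, transitively.
    static Set<String> buildSet(String selected) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(selected);
        while (!pending.isEmpty()) {
            String module = pending.pop();
            if (result.add(module)) {
                for (String dep : DEPENDS_ON.getOrDefault(module, List.of())) {
                    pending.push(dep);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(buildSet("bar"));
    }
}
```

Modules that bar does not depend on, such as docs or foo, never enter the set, which is why a change to one app stays cheap to build.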

.tools

Contains different tools for your product, e.g. we had a lot of IntelliJ .http files to call our APIs. These also served as examples for other teams on how to call our APIs, but were mostly for our own use.

apps

All applications. I have seen examples where people call this components or services, but we went for apps.

Every app depends on libs/utils-common, which contains code that is enabled or disabled based on what is on the classpath and on properties (through Spring Auto Config).
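
The mechanism behind this can be sketched without Spring. In a Spring Boot library you would normally express these conditions with annotations such as @ConditionalOnClass and @ConditionalOnProperty. The stdlib-only version below just illustrates the idea, and the class, method and property names in it are hypothetical.

```java
import java.util.Properties;

final class ConditionalFeatureSketch {

    // True when the named class can be found on the classpath.
    static boolean classPresent(String className) {
        try {
            Class.forName(className, false, ConditionalFeatureSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // A feature is active when its marker class is present
    // and it has not been switched off through a property.
    static boolean featureEnabled(String markerClass, String propertyKey, Properties props) {
        boolean switchedOn = Boolean.parseBoolean(props.getProperty(propertyKey, "true"));
        return classPresent(markerClass) && switchedOn;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("common.kafka.enabled", "true"); // hypothetical property name
        // If the Kafka client is a "provided" dependency of the common library,
        // the class is only present when the consuming app adds it itself.
        System.out.println(featureEnabled(
                "org.apache.kafka.clients.consumer.KafkaConsumer", "common.kafka.enabled", props));
    }
}
```

An app that does not pull in the optional dependency simply never activates the feature, so one common library can serve apps with very different needs.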

docs

Contains all the documentation, both internal and external. We even included postmortems, but kept them in another folder that was not published to GitHub Pages. Since our repo was private, only we could access them.

We published using the gh-pages branch, but it seems this is not available for every repository. Read here. The example is still included.

libs

All the common libraries.

We had multiple common libraries here. E.g. we had contracts that were only relevant for a few applications, which we received as XSD from external providers. If we had put them in one common module, all our apps would have had to build them. Instead we made separate modules for them, included as dependencies only in the apps that needed them.

Note

We had a module named test. It held applications that were only relevant for tests (e.g. an application that creates Kafka topics before all the other apps start when running locally with docker-compose) and left room for integration tests (as mentioned above; we never used it for that, but we could, and will when the need arises).

We also had multiple other . folders, like .docker for all files to run the parts or the whole repo using docker and docker-compose.

Our experience

After migrating most of our code base to the monorepo, developers often hated having to work on non-monorepo applications. The developer experience was just, for us, better and easier. This is probably not only because of the new structure of our code: GitHub Actions has also solved many of the tooling problems that monorepos often have.

There was a lot to learn and a culture that had to change, especially since we could not pause our regular development while migrating, which basically split the team somewhat.

We migrated app by app, and this was a bit cumbersome in places. For example, we had some common libraries and contracts that lived on Nexus and were still heavily used. When we copied these to the monorepo, we basically had two places to maintain them (since some apps still depended on the ones in Nexus and migrated apps on those in the monorepo). This could have been solved more elegantly if all dependencies were accessible from the internet, but our Nexus was on the internal network. If it had been accessible, we could have added a settings.xml file to resolve the Nexus dependencies from the monorepo, or the other way around. If the monorepo were to be the master, we would have had to publish the dependencies to GitHub Package Registry. Development on the apps that depended on Nexus was also done behind Citrix, so there were many walls to climb. We ended up duplicating for a while. It worked fine, but it was an extra inconvenience.

Overall, the decision to migrate to a monorepo was the right one given our situation, and the team is happier for it. But many things had to change for it to work, e.g. our test strategy. It would have worked without changing it, but then we would have ended up with tons of Maven modules if we had done it the same way as before.

Appendix

Discussion

tomb85

Great article, one thing I did not get: what ensures that when one of the libs changes, all dependent apps will be rebuilt?

Kenneth Gunnerud Author

Good question, nothing :O. We usually made one change in every app we wanted to update. This was because we sometimes wanted a more controlled rollout. For example: app1 was critical and app2 was not, so we made a change (usually in a file called buildtrigger) in app2 that was rolled out to test and production. If all went well, we changed a line in buildtrigger for app1 as well.

For all practical purposes, we usually deployed 10 of the non-critical apps on a dependency upgrade. If that went fine, we deployed the rest, which were customer-facing. But I guess this depends on your requirements.
Another thing we sometimes did was create a new branch to deploy everything to dev, e.g. dev/dependency-upgrades, which triggered builds and deploys only to the dev environments. This worked because of a condition on the prod deploy (only deploy master to prod). Then we deployed all the apps (including the critical ones). If that went fine, we merged to master, which re-triggered a deploy to dev + prod (so dev got two deploys in our case, one from the dev/* branch and one from master). When we merged to master, we usually squashed, since we sometimes ended up with a few commits fixing errors that occurred during the update.

One option would be to create a workflow for the root Maven pom file that in turn uses a GitHub Actions workflow dispatch or repository dispatch to trigger the individual workflows.

Since we used a lot of Spring Boot, dependency updates were not frequent enough for this to be a big problem for us. But as a project grows beyond our size, I can see that alternatives to our solution would be wanted.