Bogdan Polovko for BetterHealthcare

Monorepo and Trunk Based Development Using NX, GitHub Actions, and Argo CD

At BetterHealthcare we have a highly performant system built on a microservices architecture and deployed on Kubernetes. As a small team, we ran into a problem: as the number of services grew, managing dependencies and deployments became much more costly and difficult, and it significantly slowed down our sprint velocity. The system we had in place used a multi-branch, multi-repo structure, where each application lived in a separate GitHub repo and each repo had 3 branches: development, staging, and production. You can see the inefficiency of this system. If we needed to update and deploy all 40 services to production, we would need to create 120 PRs (40 services × 3 branches). The biggest inefficiency was that the code was rebuilt on each branch. Because we rebuilt the image every time, we were not doing image promotion and had inconsistent deployments across the environments. In practice, 120 PRs meant 120 Jenkins jobs across 40 pipelines. Not viable!

Idea

To fix this problem and become more effective and agile as a small team, we wrote down a set of requirements for the new system and focused on 3 main components:

  • Migrate to a monorepo where both backend and front-end apps can be stored and managed

  • Migrate from GitFlow to Trunk Based Development

  • Have a single, heavily optimized build pipeline (harder to build, but easier to manage) where code is built once and can be promoted to any environment without rebuilding

For the monorepo management tool, we evaluated several options and decided to go with NX as it suited us best. For CI we settled on GitHub Actions, and for CD we ended up using Argo CD.
So I came up with something like this:

Image description

And more detailed:

Image description

To achieve this we had to go through some cultural change. All features needed to be ready to roll out to production at any time. That required heavy use of feature flags on the frontend, and more discipline on the backend along with feature flags there too. Developer behavior had to be taken into account during the design phase to eliminate any possibility of the wrong code being deployed to production.
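
As a rough illustration of that feature-flag discipline, the sketch below gates unfinished code behind a flag that is off unless explicitly enabled for an environment, so merging to trunk never exposes it in production. The `FlagStore` class, the flag name, and the page function are hypothetical and not taken from our codebase.

```typescript
// The three environments mentioned in this post.
type Environment = "development" | "staging" | "production";

// Hypothetical flag definition: a flag lists the environments where it is on.
interface FeatureFlag {
  readonly name: string;
  readonly enabledIn: ReadonlyArray<Environment>;
}

class FlagStore {
  private readonly flags: ReadonlyArray<FeatureFlag>;

  constructor(flags: ReadonlyArray<FeatureFlag>) {
    this.flags = flags;
  }

  // Unknown flags are treated as off, so unfinished code stays dark by default.
  isEnabled(name: string, env: Environment): boolean {
    for (const flag of this.flags) {
      if (flag.name === name) {
        return flag.enabledIn.indexOf(env) !== -1;
      }
    }
    return false;
  }
}

// Hypothetical flag: the new flow is merged to trunk but not yet live in production.
const store = new FlagStore([
  { name: "new-booking-flow", enabledIn: ["development", "staging"] },
]);

function bookingPage(env: Environment): string {
  return store.isEnabled("new-booking-flow", env)
    ? "new booking flow"
    : "current booking flow";
}
```

Releasing the feature then becomes a flag change (adding `"production"` to `enabledIn` in this sketch) rather than a separate deployment.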

Implementation

  1. I created a template, migrated the first app, and added the most critical scripts: build, docker, test, serve, build-schema (we use GraphQL Managed Federation), and docker-retag.
    Image description

  2. Built the workflows: test-and-build, main-build (development), promote-staging, and promote-production
    Image description

  3. Tested and optimized the workflows (pnpm turned out to be one of the best tools for fast installation of all packages, because we now had to store both front-end and backend packages in the same repo)
    Image description

  4. In parallel, DevOps prepared a repo for all the k8s manifests and Helm templates, and set up Argo CD on our clusters

  5. We deployed the first application
    Image description

  6. After testing and validation, we started gradually moving services to the monorepo and retiring the previous repositories
    Image description
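
To make the "build once, promote by retagging" idea behind the docker-retag script and the promote-staging and promote-production workflows more concrete, here is a minimal model of it. It is only a sketch under assumptions: the tag naming scheme, the registry host, and the function names are hypothetical, and the post does not show how docker-retag is actually implemented. The point it illustrates is that promotion adds a new tag to the same image digest instead of producing a new build.

```typescript
type Stage = "development" | "staging" | "production";

// One tag in a registry. The digest identifies the exact image bytes.
interface ImageTag {
  readonly repository: string; // hypothetical, e.g. "registry.example.com/api"
  readonly stage: Stage;
  readonly commitSha: string;
  readonly digest: string;
}

// The only place an image is built: the main-build (development) workflow.
function buildOnce(repository: string, commitSha: string, digest: string): ImageTag {
  return { repository, stage: "development", commitSha, digest };
}

// Promotion never rebuilds: it returns a new tag that points at the same digest.
function promote(source: ImageTag, target: Stage): ImageTag {
  return {
    repository: source.repository,
    stage: target,
    commitSha: source.commitSha,
    digest: source.digest,
  };
}

// Assumed tag format "<repository>:<stage>-<commit sha>".
function reference(image: ImageTag): string {
  return `${image.repository}:${image.stage}-${image.commitSha}`;
}

// Example values are made up for illustration.
const dev = buildOnce("registry.example.com/api", "abc1234", "sha256:0123");
const staging = promote(dev, "staging");
const production = promote(staging, "production");
```

Because `production.digest` equals `dev.digest`, what runs in production is byte-for-byte what was tested in development, which is exactly what rebuilding on every branch could not guarantee.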

The builds turned out to be very fast:
Image description

Promotion to staging and production was even faster:
Image description

The best part is that a production release can now happen at any time with a single action: creating a release in the GitHub repo. The changelog is autogenerated, and JIRA is automated to track all the code that was deployed.
Image description

In Argo CD, we enabled auto-sync for the development and staging environments. In production we still sync in batches to keep more control. In the future we plan to configure ApplicationSets and enable auto-sync there as well to optimize it further.
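
The difference between the environments can be sketched as a function that produces an Argo CD `Application` manifest as a plain object. The `syncPolicy.automated` field is what turns on auto-sync, and leaving it out means syncs are triggered manually. Everything else here is an assumption made for illustration: the repo URL, the path layout, the namespaces, and the `prune` and `selfHeal` values are not our real configuration.

```typescript
type Environment = "development" | "staging" | "production";

// Subset of the Argo CD Application resource that matters for this sketch.
interface ArgoApplication {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace: string };
  spec: {
    project: string;
    source: { repoURL: string; path: string; targetRevision: string };
    destination: { server: string; namespace: string };
    syncPolicy?: { automated: { prune: boolean; selfHeal: boolean } };
  };
}

// Hypothetical GitOps repo holding the k8s manifests and Helm templates.
const GITOPS_REPO = "https://github.com/example-org/gitops-manifests.git";

function applicationFor(service: string, env: Environment): ArgoApplication {
  const app: ArgoApplication = {
    apiVersion: "argoproj.io/v1alpha1",
    kind: "Application",
    metadata: { name: `${service}-${env}`, namespace: "argocd" },
    spec: {
      project: "default",
      source: {
        repoURL: GITOPS_REPO,
        path: `services/${service}/${env}`, // assumed directory layout
        targetRevision: "main",
      },
      destination: { server: "https://kubernetes.default.svc", namespace: env },
    },
  };

  // Auto-sync only outside production; production stays manual for batch control.
  if (env !== "production") {
    app.spec.syncPolicy = { automated: { prune: true, selfHeal: true } };
  }
  return app;
}
```

Serializing the returned object to YAML would give a manifest that could be committed to the GitOps repo.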

Conclusion

The biggest gain came from the nx affected feature, which lets us build only what a commit actually affects. The pnpm cache, docker-buildx, and caching throughout the pipeline also improved it a lot.

In numbers: a development, staging, or production deployment used to take 10 minutes on average per app. The current setup deploys to development in 3 minutes for the first app plus another 25–30 seconds for each additional application deployed in the same run. For staging and production promotion, the time dropped from around 10 minutes on average to 2 minutes plus 10–13 seconds for each additional app being promoted. That is a significant improvement in efficiency, speed, and quality!

With the old setup, updating packages in all 40 services would take at least 6.7 hours to deploy (40 × 10 minutes = 400 minutes), not counting test runs, code pushes, and PR creation. With the current setup, that can easily be done in under 30 minutes: by the numbers above, a development deployment of all 40 services works out to roughly 19 to 23 minutes (3 minutes + 39 × 25–30 seconds). Further improvements like esbuild, direct pushes to the GitOps repo instead of auto-merging PRs, more automation with JIRA, and more developer experience tooling will speed up the development process even more!
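
To show why building only what is affected saves so much time, here is a simplified sketch of the idea: start from the projects a commit changed and walk the dependency graph backwards to find everything that depends on them. This is not NX's implementation, and the project names and graph below are made up. NX also works out the changed projects from the changed files, which this sketch takes as given.

```typescript
// Maps each project to the projects it depends on.
type DependencyGraph = { readonly [project: string]: ReadonlyArray<string> };

function affectedProjects(
  graph: DependencyGraph,
  changed: ReadonlyArray<string>
): string[] {
  // Invert the edges: for each project, which projects depend on it?
  // Object.create(null) avoids collisions with built-in object keys.
  const dependents: { [project: string]: string[] } = Object.create(null);
  for (const project of Object.keys(graph)) {
    for (const dependency of graph[project] || []) {
      const list = dependents[dependency] || [];
      list.push(project);
      dependents[dependency] = list;
    }
  }

  // Walk from the changed projects to everything that transitively depends on them.
  const affected: { [project: string]: true } = Object.create(null);
  const queue = changed.slice();
  while (queue.length > 0) {
    const current = queue.pop() as string;
    if (affected[current]) {
      continue;
    }
    affected[current] = true;
    for (const dependent of dependents[current] || []) {
      queue.push(dependent);
    }
  }
  return Object.keys(affected).sort();
}

// Hypothetical monorepo with front-end and backend projects side by side.
const graph: DependencyGraph = {
  "web-app": ["ui-kit", "api-client"],
  "api-client": ["shared-types"],
  "gateway": ["shared-types"],
  "billing-service": [],
  "ui-kit": [],
  "shared-types": [],
};

const afterTypesChange = affectedProjects(graph, ["shared-types"]);
const afterUiChange = affectedProjects(graph, ["ui-kit"]);
```

In this made-up graph, a change to `ui-kit` only requires rebuilding `ui-kit` and `web-app`, while `billing-service` is never rebuilt by either change.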
