Do you know that one web service you have that builds multiple executables, a database migration script, and downloads a million libraries?
You wait an hour just to watch it get to 80% when it fails. For the third time by now, which means you have to make more pipeline modifications. And of course, wait another hour to see if that worked.
Oh, and think about all those poor developers waiting on their Pull Request build to finish before merging to master.
As a DevOps consultant, I spend much of my life waiting on builds. I see them in all flavors, shapes, and sizes. It’s not uncommon to catch me sitting there with a thousand-yard stare, after hitting that build button for the 100th time that day, expecting a different result.
What they don’t tell you before you join this business...
I once joked that my spiritual name would be “The one who stares at build logs.”
Anyway, there are things you can do to make your life and your developers’ lives easier by making your build pipeline faster. Every minute you shave off increases development cadence and reduces resource costs. Yes, and it helps your mental stability.
So here is a list of things you can do to speed up overall pipeline runtime:
Downloading modules at build time takes a significant portion of a build. Whether you are using NPM / Maven / Gradle / PIP, dependencies tend to get bloated, and you pay for it in wait time.
You can use caching to speed things up, instead of starting from scratch on every build.
Back in the old days, before everyone started to use Docker-based agents for their builds (Jenkins / GitLab / whatever you cool kids use today), this wasn’t always a problem. Builds sometimes shared these libraries, such as a shared .m2 folder for Maven, and the first build to introduce a library took on the download time.
That introduced other issues, such as conflicting versions, and race conditions when running multiple builds at the same time, but that is a story for another time.
There are several solutions you can use with Docker-based build environments:
- Use a shared volume that includes the cache and attach it to the container.
- Pre-build “build images” that include all the third-party libraries.
- Cache third party packages in a local repository such as Nexus / Artifactory.
One other thing that I would recommend is to lock in versions of your dependencies. Not only will you save time by not downloading new versions, but it will also help you avoid conflicts.
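Locked versions also make cache invalidation simple: the cache only needs to change when the lockfile does. Here is a minimal sketch of that idea, assuming your pipeline lets you name a cache by an arbitrary key. The function name and the "deps" prefix are made up for illustration.

```python
import hashlib
from pathlib import Path


def cache_key(lockfile: Path, prefix: str = "deps") -> str:
    """Return a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    # A short prefix of the digest keeps the key readable in build logs.
    return f"{prefix}-{digest[:16]}"
```

A build would restore the dependency cache stored under this key, and only download and re-save when no cache exists for it yet.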
“We only run tiny microservices at our company,” - Said no one, ever.
If you are building actual microservices that take just a few minutes to compile, then I’m proud of you. You are one of a select few that got what microservices are all about, and were able to pull it off.
The rest of us may still need to live with not-so-micro services, sometimes monolithic applications that take a long time to compile and test.
Even if you are using microservices, sometimes a monorepo makes sense, and then you have the problem of building everything, even if only one module changed.
In this case, the solution is straightforward but not always simple - build only the modules that are relevant for that commit.
The Angular ecosystem is an excellent example: it has a nifty little tool, Nx, that helps you build only the modules whose files changed, while respecting dependencies between them.
Given two git commits, Nx calculates the changes and then outputs the affected modules for you to build.
With Maven, you will need to do some of the heavy lifting yourself, but the dependency tree and other tricks allow you to calculate what to build.
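If you end up doing that heavy lifting yourself, the calculation could look something like the sketch below. It is not how Nx works internally, just the general idea: map changed files to modules, then add everything downstream of them. The module names, directory layout, and dependency map are hypothetical, and the file list is assumed to come from something like `git diff --name-only` between two commits.

```python
from collections import deque


def affected_modules(changed_files, module_dirs, depends_on):
    """Return the sorted list of modules to rebuild for a set of changed files.

    changed_files: repo-relative paths of the files that changed
    module_dirs:   dict of module name -> directory that holds the module
    depends_on:    dict of module name -> list of modules it depends on
    """
    changed_files = list(changed_files)

    # Modules that directly contain a changed file.
    changed = {
        name
        for name, directory in module_dirs.items()
        for path in changed_files
        if path.startswith(directory.rstrip("/") + "/")
    }

    # Invert the dependency graph: module -> modules that depend on it.
    dependents = {name: set() for name in module_dirs}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Walk outwards from the changed modules to everything downstream.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


# Hypothetical layout: web depends on api, api depends on common.
module_dirs = {"api": "services/api", "web": "services/web", "common": "libs/common"}
depends_on = {"api": ["common"], "web": ["api"], "common": []}

print(affected_modules(["libs/common/util.py"], module_dirs, depends_on))
# ['api', 'common', 'web']
print(affected_modules(["services/web/index.ts"], module_dirs, depends_on))
# ['web']
```

A change in a shared library rebuilds everything that sits on top of it, while a change in a leaf module rebuilds only that module.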
When possible, use parallel processing.
For the build phase, Maven allows you to pass the -T flag to specify the number of threads available for the build. In Gradle, it’s the --parallel flag, with --max-workers to cap the number of workers.
If you have the resources, use them.
Another phase that can benefit from parallelization is unit testing. These are usually individual, well-scoped tests that should have no problem running alongside each other. In Ruby, you can use gems like knapsack_pro; with JUnit 4.7+ run through the Maven Surefire plugin, set the parallel parameter -
<configuration> <parallel>all</parallel> </configuration>
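When tests run on several workers, the slowest worker sets the total time, so it pays to split tests by how long they take rather than by count. The sketch below shows one simple greedy way to do that, assuming you have per-test durations from a previous run. It only illustrates the idea and is not the algorithm of any particular tool; the test names and timings are invented.

```python
import heapq


def split_into_shards(durations, shard_count):
    """Greedily assign tests to shards so total runtimes are roughly balanced.

    durations:   dict of test name -> seconds taken in a previous run
    shard_count: number of parallel workers
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    # Min-heap of (total seconds assigned so far, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    shards = [[] for _ in range(shard_count)]

    # Place the longest tests first, each onto the currently lightest shard.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


timings = {"a": 10, "b": 8, "c": 3, "d": 2, "e": 1}
print(split_into_shards(timings, 2))
# [['a', 'd'], ['b', 'c', 'e']]
```

Both shards end up with 12 seconds of work, where a naive split by test count could leave one worker idle while the other is still running.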
Sometimes it’s just a matter of underpowered build machines.
More CPUs means more threads.
More memory, well, gives your jobs more memory.
It’s perfectly OK to try to save money on your DevOps infrastructure, but make sure that you fully understand how it affects the bigger picture. Faster builds, or more builds in parallel, make for quicker development and delivery.
This is especially true now that you can have auto-scaled / on-demand workers that don’t have to be up 24/7. Take some of that cost-saving and move it towards faster builds.
Starting up and connecting to third-party services during unit tests is, in most cases, redundant and wrong. You can use mocks to simulate a connection to these services and run the tests against them.
For unit tests, you don’t need an actual Redis service; use a mock instead. You are testing your own code, not the Redis driver.
Oh, and mocks require fewer resources to run and administer, which is always a bonus.
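To make the Redis case concrete, here is a small sketch using Python's standard unittest.mock. The function under test, get_display_name, is made up for this example, and the stand-in assumes a client whose get call returns bytes or None, as redis-py does by default.

```python
from unittest.mock import Mock


def get_display_name(cache, user_id):
    """Code under test: read a name from the cache, fall back to a default."""
    value = cache.get(f"user:{user_id}:name")
    return value.decode("utf-8") if value is not None else "anonymous"


def test_cached_name_is_returned():
    cache = Mock()  # stands in for a Redis client; no server is started
    cache.get.return_value = b"Dana"
    assert get_display_name(cache, 42) == "Dana"
    cache.get.assert_called_once_with("user:42:name")


def test_missing_name_falls_back():
    cache = Mock()
    cache.get.return_value = None
    assert get_display_name(cache, 7) == "anonymous"
```

Both tests run in milliseconds and need no network, container, or cleanup. Whether the code really talks to Redis correctly is a question for an integration test later in the pipeline.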
Time goes on, and code keeps piling in. We developers and engineers tend to be hoarders, and as much fun as it is to delete old code, we usually avoid doing it.
Tests take time, and running an unnecessary test is just a waste.
So go ahead and do some deleting.
Speaking of unit tests, make sure that your unit tests are just that.
It’s common to see unit tests that cover not just a small piece of code but the whole application and, at times, interactions with external components.
Take a close look at your unit tests, and see if one is, in fact, an integration test. If this is the case, try moving this test further down the pipeline.
You still run the test in a later phase, but if the build is bound to fail in an earlier stage, it’s better to find that out sooner.
This one tip can shave off a few minutes in the deployment phase, and if you haven’t done it already, it’s well worth your time optimizing.
When you have some sort of load balancing or “service” layer on top of your application, you usually find a health check mechanism to tell which instance/pod/container is ready to accept traffic.
It goes like this:
- An instance running the application is added to the load balancer, or a pod launches in a deployment.
- The load balancer starts polling the app with the configured health check. There is usually a grace period before starting to poll, to let the app warm up.
- If the application is healthy for enough polling requests, it is considered up, and the application is ready to accept traffic. Usually, there is a configurable threshold for the number of successful poll requests.
- As the application lives, the load balancer keeps polling it, and if the health check fails enough times, again, with a different threshold, the application is taken out of the pool.
- It goes back to the pool if it passes the same number of polls as in the third step.
These settings are usually generic and, in many cases, remain with default values.
In your scenario, it may not be necessary to wait 5 minutes (just an example) for the application, as it shouldn’t take more than 1 minute for it to be ready.
Here are some of the things you may want to go over and tweak:
- Make sure that your health check reflects the actual status of your application.
- Give your applications time to start by setting the delay.
- Tweak the timeouts, intervals, and thresholds for deciding when the app is down or when it’s up and ready to accept traffic.
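To see how much these settings cost you per deployment, a bit of arithmetic helps. The sketch below estimates the earliest moment an instance can be marked ready, assuming the first poll happens right after the initial delay and the app passes every poll from then on. Load balancers differ in exactly how they count, so treat it as an approximation; the numbers are illustrative, not defaults of any specific product.

```python
def seconds_until_in_service(initial_delay, interval, healthy_threshold):
    """Earliest time after start at which a healthy app is marked ready.

    Assumes the first poll fires right after the initial delay and that
    every poll succeeds, so (healthy_threshold - 1) more intervals pass
    before the threshold is reached.
    """
    return initial_delay + interval * (healthy_threshold - 1)


# Generous, generic settings: 5 minute delay, 30 s interval, 3 successes.
print(seconds_until_in_service(300, 30, 3))  # 360

# Tuned for an app that is ready in well under a minute.
print(seconds_until_in_service(20, 5, 2))  # 25
```

With a rolling deployment that replaces instances one at a time, that difference is paid once per instance.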
In Kubernetes, you can find two layers of checks - the liveness probe, which determines whether to kill and restart the container, and the readiness probe, which tells the service when to add the pod to the active endpoint list. A pod can be live but not ready.
A word of caution - make sure that these changes make sense for production and don’t cause issues such as flapping.
Don’t waste time on updates and system installations - keep your images fresh and up to date.
Consider running a daily task that builds the base images used by your applications.
The same goes for instance AMIs and other VM images.
You gain two things from that:
- You control these updates before things go to production, which gives you another safety net. You want to fail earlier in your pipeline and not when auto-scaling is launching new instances.
- Builds and deployments run faster as there are fewer updates to install.