DEV Community

Peeter Tomberg for Engineering as a Service

Posted on • Originally published at fvst.dev on

Unlocking the Need for Speed: The Surprising Solution That Supercharges Engineering Departments

We’ve been spending our time helping engineering departments of all sizes, and one standard, easy improvement keeps popping up — CI speed. This article explores why CI speed matters and how to improve it in any programming language. One of the primary advantages of a faster CI is that it combats context switching, which can seriously hurt productivity. A faster CI process also helps cultivate a work environment that views errors as learning opportunities, enables quick recovery from mistakes, and accelerates deployment to production. We’ll discuss various techniques to improve CI speed, such as caching and smaller Docker images, with practical examples in Node that translate to other languages and toolsets.

Why does CI speed matter?

The main benefit is to battle context switching. When a developer opens a pull request and then needs to wait 10 minutes for feedback, they switch their focus to something else. It reminds me of an old xkcd comic


https://xkcd.com/303/

But you can always switch to another task! And that’s the problem: you switch context, and context switching kills productivity.

When you finally come back to the task at hand, at best all you have to do is merge the PR, but at worst you have to go back to the branch and make code changes, which requires switching your context back to this task.

All my developers are superhuman, and they can handle context-switching!

If that’s the case, there are still benefits to speeding up CI.

Speed is crucial for building an environment that looks at mistakes as learning experiences. You should have a way to recover from any error quickly — if someone pushes something into production that should not be there, they should know they can fix their mistake in 3 minutes instead of 30. Rollbacks are not always an option.

Speed is crucial for pushing more stuff to production. CI is more than just running tests; it’s a pipeline for getting your code off your developer’s machine into a test environment and production. The faster this pipeline is, the more effective the people who rely on it are.

Speed has monetary value. For example, GitHub Actions rounds job times up to the next minute: if your job runs for 1.01 minutes, you get billed for 2 minutes. Shaving off just 2 seconds halves that job’s bill.

So how can we make CI faster?

In this article, I’ll explain the steps you can take to improve speed using Node and the tooling around Node as examples. I’ll review why these examples make things faster and how to apply them to your language and toolset.

Package managers and global caches.

Most projects on GitHub Actions use actions/setup-node. This action has built-in support for caching the package manager’s global cache; it does not cache node_modules. Whenever your job runs, your packages are restored into the global cache, and running npm/yarn install copies them from the global cache into node_modules.
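As a sketch, enabling that built-in global cache looks something like this (action and Node versions are illustrative):

```yaml
# Caches the package manager's global cache (e.g. ~/.npm), not node_modules
- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: 'npm'   # or 'yarn' / 'pnpm'
- run: npm ci      # still copies packages from the global cache into node_modules
```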

There are a couple of problems with this approach.

  • Yarn 1 only uses this cache if the internet is not available. https://github.com/yarnpkg/yarn/issues/6398
  • You are doubling the required IO operations. First, the entire cache is written to the global cache, and then it is copied from the global cache into your local dependencies. Disk is slow when it comes to copying thousands of small files. CircleCI tackles this issue by using a ramdisk, essentially keeping the files in memory instead of writing them to disk.

The easiest solution is not to use the global cache and simply cache node_modules using actions/cache. This approach does have some disadvantages:

  • Lifecycle scripts like postinstall only run when a package is installed for the first time. This can become problematic if you use packages that write outside of the node_modules folder, or that do work based on the state of your code base and should run each time a job starts.
  • The cache can become invalid if not purged after updating the node version. This can easily be solved by adding the node version to the cache key.
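A minimal sketch of that approach, with the Node version baked into the cache key so a runtime upgrade invalidates the cache (action versions are illustrative):

```yaml
- uses: actions/cache@v4
  id: node-modules-cache
  with:
    path: node_modules
    # runner OS + Node version + lockfile hash: any of these changing busts the cache
    key: node-modules-${{ runner.os }}-node20-${{ hashFiles('package-lock.json') }}
- if: steps.node-modules-cache.outputs.cache-hit != 'true'
  run: npm ci
```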

Tooling and caches

A lot of tooling has built-in support for caching. In the Node world, the community-agreed-upon cache folder is node_modules/.cache/, which some libraries already use and others can be configured to use.
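ESLint is one example of a tool that can be pointed at that folder; a sketch of a workflow step doing so:

```yaml
# --cache persists lint results between runs; the location follows the
# community convention, so it rides along with a cached node_modules
- name: Lint
  run: npx eslint --cache --cache-location node_modules/.cache/eslint/ .
```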

P.S. Make sure your toolchain is optimized for the CI environment. For example, many setups run Jest with the --maxWorkers=50% option. Since standard GitHub-hosted runners have 2 CPU cores, you’re artificially limiting the test suite to a single core, reducing your speed significantly.
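On a standard 2-core GitHub-hosted runner you might pass an explicit worker count instead (the count here assumes 2 cores; adjust it to your runner):

```yaml
- name: Run tests
  run: npx jest --maxWorkers=2   # 50% would pin the suite to a single core here
```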

If you use Docker, ensure your images are as small as possible.

Having smaller Docker images speeds up your infrastructure quite a bit.

  • Smaller images are faster to upload to a registry, speeding up the CI flow.
  • Smaller images are faster to download from a registry, speeding up redeploy/scale operations. This is highly noticeable if you’re doing multi-region deploys.
  • Smaller images cost less, both from a storage point of view and a bandwidth point of view.

The first step to making your Docker images smaller is to use an appropriate base image. I prefer Alpine images, and most languages have official Alpine-based variants.

The second step is to use multi-stage builds to separate the build process from the final image. The flow I usually use is:

  • In the build stage, I install all dependencies and run the build. I then purge all devDependencies to reduce the size of the node_modules folder — these dependencies should not be required to run the application.
  • In a cleanup stage, I copy over node_modules from the build stage and purge extra files like markdown files, test suites, etc. A helpful tool for this is node-prune. Since it is a Go project, the cleanup stage usually uses a Go base image.
  • In the final stage, I copy over the built application from the build stage and the node_modules from the cleanup stage.
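A sketch of that three-stage flow (the image tags, the dist/ output folder, and the entry point are assumptions; adjust them to your project):

```dockerfile
# Build stage: install everything, build, then drop devDependencies
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npm prune --omit=dev

# Cleanup stage: node-prune is a Go tool, so use a Go base image
FROM golang:1.21-alpine AS cleanup
RUN go install github.com/tj/node-prune@latest
COPY --from=build /app/node_modules /node_modules
RUN node-prune /node_modules

# Final stage: only the built app and the pruned node_modules
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=cleanup /node_modules ./node_modules
CMD ["node", "dist/index.js"]
```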

To validate that I don’t have any wasted space, I love the dive utility (https://github.com/wagoodman/dive), which shows me how much space each layer takes and what it adds.

P.S. Make sure you use a .dockerignore file; it helps with speed, size, and security!
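A minimal .dockerignore along these lines keeps local artifacts out of the build context (the entries are illustrative):

```
node_modules
.git
dist
*.md
.env
Dockerfile
```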

Take a look at your workflows.

To save time, a lot of CI workflows run jobs in parallel — one job to run the test suite, one to run static analysis, and one to run the linting. After adding caching, it’s common for checking out the code and setting up the environment on the task runner to take longer than the actual work. So instead of three jobs in parallel, a single job that does it all is usually more effective. If one job finishes in under a minute instead of three jobs each finishing in under a minute, per-minute billing means you’ve cut that part of your CI bill by 66%.
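A sketch of collapsing the three parallel jobs into one (the script names are assumptions about what your package.json defines):

```yaml
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm run lint       # linting
      - run: npm run typecheck  # static analysis
      - run: npm test           # test suite
```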

Think through whether you need to run all tasks on the main branch. If you require all branches to be up to date with main before merging, re-running the test suite on main doesn’t add anything. If you want static analysis on main, move that job to a scheduled job instead of running it on each commit.
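Moving static analysis to a schedule is a small trigger change; a sketch:

```yaml
on:
  schedule:
    - cron: '0 3 * * *'   # nightly at 03:00 UTC instead of on every push to main
```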

Final words

With all these steps, you can significantly improve the performance of your Engineering department. Make sure you’re utilizing all the resources of the task runner, reduce the amount of disk writes you do, use a ramdisk for IO where possible, and keep your Docker images small!
