
Benoit COUETIL πŸ’« for Zenika


🦊 GitLab CI: The Majestic Single Server Runner

Initial thoughts

While many believe that autoscaling runners are the ultimate solution for improving pipeline speed, the reality is that a single server of reasonable size can outperform an entire swarm of EC2 or Kubernetes nodes in most day-to-day scenarios.

This article focuses on benchmarking; for deployment details, see GitLab CI: Deploy a Majestic Single Server Runner on AWS.

Remember the blog post The Majestic Monolith? It advocates simplicity in architecture for as long as possible.

Drawing parallels between the monolith vs. microservices paradigm and the single server runner vs. autoscaling runner paradigm, we find wisdom in Hansson's words:

The [autoscaling runner] patterns that make sense for organizations orders of magnitude larger than yours, are often the exact opposite ones that’ll make sense for you. It’s the essence of cargo culting. If I dance like these behemoths, surely I too will grow into one. I’m sorry, but that’s just not how the tango goes.

The problem with prematurely turning your [single server runner into an autoscaled runner] is chiefly that it violates the #1 rule of distributed computing: Don’t distribute your computing! At least if you can in any way avoid it.

In this article, we advocate for maintaining a single server runner for an extended period before considering an autoscaling setup. Our aim is to explore the core of CI runners and maximize the potential of GitLab's simplest runner architecture. Join us as we venture into uncharted territory, where few GitLab articles have dared to tread!

Buckle up your seatbelt

We assume you already know the basics of GitLab runner topologies; if not, you can read GitLab Runners Topologies: Pros and Cons.

1. Are autoscaling runners better?

Obviously, any starting project will have a fast pipeline for some time, provided we follow general advice like that described in GitLab CI Optimization: 15+ Tips for Faster Pipelines, and provided our runner power is at least decent.

We will therefore discuss the benefits of autoscaling runners in the context of large pipelines. These pipelines typically involve mono-repos, numerous automated tests and checks, and/or a substantial code base.

Autoscaling runners have virtually unlimited CPU and RAM, which might make them seem like a holy grail. However, their advantages come with a trade-off: cache handling. Autoscaling runners are stateless by nature, meaning they operate without maintaining persistent data locally: they spin up on demand and are destroyed a few minutes later. This poses challenges when it comes to caching, which is often essential for achieving fast pipelines, especially for tasks that greatly benefit from it, such as Java or JavaScript jobs.

Moving forward, we will consider pipelines where caching is an important aspect. It is an area where a powerful single server runner may compete with autoscaling runners. Let's focus in this article on a JavaScript pipeline example, as node_modules are well-known for being space-intensive (and cacheable).

node_modules vs ai

node_modules vs universe

2. Advantages of a single server GitLab runner

Here are the main advantages of a single server GitLab runner:

  • Simplicity and control: Managing a single-server Runner is simpler and requires less initial setup compared to setting up a Kubernetes cluster.

  • Custom environments: You can configure the single-server Runner to match your specific environment.

  • Cost-efficiency: A single-server Runner may be more cost-effective than maintaining a Kubernetes cluster, as the latter comes with additional infrastructure overhead.

  • Out-of-the-box Docker cache: Due to the single Docker host nature, single-server runners offer built-in job container image caching. We can use the DockerHub free tier when the if-not-present pull policy is configured.
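As a sketch, the last point translates to a one-line setting in the runner's `config.toml` (the runner name here is illustrative):

```toml
# /etc/gitlab-runner/config.toml (illustrative excerpt)
[[runners]]
  name = "majestic-single-server"  # hypothetical name
  executor = "docker"
  [runners.docker]
    # Reuse images already present on the single Docker host,
    # avoiding repeated DockerHub pulls (and rate limits)
    pull_policy = "if-not-present"
```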

3. Traditional cache tasks sequence

We already described the cache sequence in GitLab Runners Topologies: Pros and Cons. Let's build on what we said over there.

Cache is a significant topic for CI/CD Engineers, one that can consume a tremendous amount of time when fine-tuning for performance.

To find optimizations, it's important to have a clear understanding of the entire caching process in GitLab. In this case, we assume that all jobs are using cache and performing cache pull and push operations.

When the remote cache is not configured, the process is as follows:

  • Unzip the local cache from past jobs/pipelines (if any).
  • Run the job scripts.
  • Zip the produced cache for future jobs/pipelines.

When the remote cache is configured, the process is as follows:

  • Download the remote cache from past jobs/pipelines (if any).
  • Unzip the cache (if downloaded).
  • Run the job scripts.
  • Zip the produced cache if changed.
  • Upload the produced cache if changed, for future jobs/pipelines.
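For reference, the sequence above is driven by a `cache` declaration in `.gitlab-ci.yml`; a minimal sketch (image and paths are illustrative) might look like this:

```yaml
# Illustrative job caching node_modules, keyed on the lockfile
install:
  image: node:20  # hypothetical image
  script:
    - yarn install --frozen-lockfile
  cache:
    key:
      files:
        - yarn.lock        # same lockfile => same cache key
    paths:
      - node_modules/
    policy: pull-push      # download before the job, upload after (the default)
```

Jobs that only consume the cache can set `policy: pull` to skip the zip/upload steps entirely.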

Without autoscaling, when the infrastructure is undersized, some jobs may have to wait for available hardware resources.

On autoscaling infrastructure, when the current resources are not sufficient, some jobs may have to wait for new servers to spin up.

The official documentation provides straightforward guidance. To ensure efficient cache usage with runners, you should consider one of the following approaches:

  • Use a single runner for all your jobs, or at least for similar ones.
  • Use multiple runners with distributed caching, where the cache is stored in S3 buckets. Shared runners on GitLab.com follow this approach. These runners can be in autoscale mode, but it's not mandatory.
  • Use multiple runners with the same architecture and have them share a common network-mounted directory for storing the cache. This directory should use NFS or a similar solution. These runners may be in autoscale mode.
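The second approach (distributed S3 cache) is configured on the runner side; a sketch in `config.toml`, with placeholder bucket and region, would be:

```toml
# Illustrative distributed cache on S3 (bucket and region are placeholders)
[[runners]]
  [runners.cache]
    Type = "s3"
    Shared = true            # share the cache between runners
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      BucketName = "my-runner-cache"  # hypothetical bucket
      BucketLocation = "eu-west-1"    # hypothetical region
```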


4. Searching for areas of optimizations

Let's focus on a JavaScript application and its pipelines, with at least 2GB of node_modules, with enough jobs needing them to consider caching.

A main advantage of a single server runner is the opportunity to be stateful. A stateful GitLab runner maintains persistent data across its lifecycle, typically holding information such as cached dependencies.

Now that we know what is happening when caching with GitLab, we can go down the list of tasks and see what we can optimize with a stateful single server.

We will purposefully overlook other improvement solutions that can benefit both architectures and consider them implemented, such as pushing/pulling cache only when needed, and assigning the optimum cache key to share data when suitable.

a. Download/upload remote cache from past jobs/pipelines

Download/upload remote cache has a high impact on jobs: 2GB of zipped cache is heavy.

The obvious optimization is to rely on local files. This is the main advantage of a single server, right? So no remote cache configured for the single server runner.

b. Unzip/zip cache

Using standard local cache is already an optimization, but there are still the zipping/unzipping processes.

In itself, unzipping is not that time-consuming. But this operation will be run in parallel for multiple jobs. And it is clearly a possible bottleneck for a single server unzipping node_modules multiple times.

An optimization we can implement is to rely on a custom cache, stored in a folder mounted into jobs. Jobs become stateful, with direct access to cached folders from the previous run. The runner scripts to achieve this are shared in GitLab CI: Deploy a Majestic Single Server Runner on AWS.

The downsides of this approach are that we have to tweak the runner to mount a folder, and some operations are needed at runtime (in job scripts), which highly depend on the technologies / languages / package managers in use. For JavaScript, you will have to share node_modules, which are not thread-safe on write; for Maven, you will have to use the new thread-safe Maven repository. We have done this multiple times in multiple contexts, but each time a few hours of custom work are needed.
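As a rough sketch of the mounted-folder idea (the real scripts are in the companion article; all paths here are illustrative), a job could reuse the persistent folder like this:

```yaml
# Illustrative job using a runner-mounted persistent folder.
# Assumes the runner's config.toml mounts it for every job, e.g.:
#   [runners.docker]
#     volumes = ["/srv/custom-cache:/custom-cache:rw"]
build:
  image: node:20  # hypothetical image
  before_script:
    # Reuse node_modules from the previous run on this server
    - mkdir -p "/custom-cache/$CI_PROJECT_ID/node_modules"
    - ln -s "/custom-cache/$CI_PROJECT_ID/node_modules" node_modules
  script:
    - yarn install --frozen-lockfile  # fast when most packages are already present
    - yarn build
```

Keep in mind the thread-safety caveat above: concurrent jobs writing to the same shared node_modules need extra care (for example, letting only the first job of a pipeline write to it).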

c. Run the job scripts

Once prerequisites (cache/state) are in place, a single server lacks the raw compute power to compete against autoscaling runners. There is not much specific improvement possible here 🤷‍♂️.

5. Architectures benchmark conditions

We have discovered single server optimizations that are beneficial when a large cache is required. Now, let's compare these optimizations with an autoscaling architecture πŸ€“

Let's consider a project with the following characteristics, where we have implemented the suggested optimizations:

  • A monorepo gathering 30+ modules consisting of 200,000 lines of JavaScript code.
  • Only GitLab jobs that are containerized.
  • The cache significantly accelerates jobs, regardless of the parallel demand. We have extensively tested the pipeline without cache under various scenarios, and it consistently performs slower.
  • The first job generates the node_modules, while all subsequent jobs only consume them.
  • GitLab CI Optimization: 15+ Tips for Faster Pipelines has been fully implemented.

The original pipeline, taken from a real production web application, begins as follows:

javascript benchmark full pipeline

For the experiment, deployment and end-to-end test jobs are excluded, as their execution speed is independent of the runner's performance. Therefore, the modified pipeline looks as follows:

javascript benchmark build pipeline

This modified pipeline consists of 35 Yarn jobs that rely on 2.4GB of node_modules.

To determine the limit of a single server runner in this scenario and compare it with an autoscaling architecture, we will benchmark the runners executing this pipeline on two different architectures:

  1. A single AWS EC2 instance with 16 CPUs, 64GB RAM, and a local SSD disk. The Docker executor is configured to handle up to 20 jobs in parallel.
  2. An EKS managed Kubernetes cluster that autoscales between 5 and an unlimited number of EC2 instances. The cluster uses a standard GitLab cache on AWS S3 in the same zone. The EC2 instances have similar specifications to the single one, except the local SSD is replaced with GP3 volumes with standard specifications.
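On the single server side, the 20-job parallelism boils down to two values in `config.toml` (the rest of the file is omitted):

```toml
# Illustrative excerpt: single Docker-executor runner handling 20 parallel jobs
concurrent = 20  # global job concurrency for this runner host, as in the benchmark

[[runners]]
  executor = "docker"
  limit = 20     # cap for this particular runner entry
```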

Individually on each architecture, after pre-loading the cache, multiple pipelines are launched simultaneously, and we record the total duration until the last job of the last pipeline is completed. We conduct the experiment for a single pipeline initially, and then repeat the process for N pipelines in parallel. This continues until the Kubernetes runner, with its unlimited power, surpasses the local cache efficiency of the single server runner.


6. Results

The graph below illustrates the duration of batches of jobs, with lower durations indicating better performance.

benchmark results

Surprisingly, we found that a single server runner outperforms a Kubernetes cluster with equivalent node specifications until approximately 200 jobs are requested simultaneously! This is typically beyond the average daily usage of most development teams.

Equally important, with 40 jobs or fewer to process, the single server runner is twice as fast. This scenario is quite common, even during the busiest days, for most teams.

However, when dealing with 300 jobs at once, a Kubernetes cluster proves to be twice as fast in completing the tasks. It is worth noting that achieving this performance requires the involvement of 30 servers, as opposed to just one server for the single server runner.

During our tests, we discovered that when there are 500 jobs to be processed simultaneously for the same branch in the same project (spanning across 14 pipelines), GitLab refuses to launch additional pipelines until the initial ones are completed.

7. High availability considerations

A single server is not inherently highly available, but this might not be a critical issue for many scenarios.

Here are some practical considerations involving human interventions:

  • Infrastructure-as-Code (IaC): Provision your runner with IaC, enabling quick responses to cloud unavailability. In case of any issues, you can fix the problem promptly or create a fresh runner elsewhere.

  • Shared runners as backup: For modest teams deploying infrequently, using GitLab.com shared runners as a backup is a simple option. This can be configured easily in project/group settings.

  • Multiple single runners for multiple repos: Larger companies can deploy multiple runners, each targeting a specific repository using runner tags. If one runner becomes unavailable, configurations can be adjusted to temporarily target another.

  • Autoscaling group (if supported): If your cloud provider allows it, consider having the VM in an autoscaling group of one. This way, a new runner will be provisioned if the previous VM disappears. Note that this won't solve software problems if the VM is still running.

For teams requiring true high availability and minimal intervention, an active/active runner setup may be considered. Performance could be comparable to a single runner with local cache, and the choice between local and remote cache depends on benchmarking and the team's familiarity with cache handling.

Wrapping up

In conclusion, our analysis highlights the advantages of using a single server runner over an autoscaling runner in most day-to-day scenarios. While autoscaling runners may offer unlimited CPU and RAM, they come with challenges in cache handling and can be unnecessary for smaller teams.

In our benchmarking experiment, we compared a single server runner with a Kubernetes cluster. Surprisingly, the single server runner outperformed the Kubernetes cluster until around 200 simultaneous jobs. It was also twice as fast as the cluster when there were 40 or fewer jobs to process. However, for 300 jobs, the Kubernetes cluster proved to be twice as fast but required the involvement of 30 servers.

Based on these findings, teams should consider maintaining a single server runner for an extended period before transitioning to an autoscaling setup. By maximizing the potential of GitLab's simplest runner architecture, teams can achieve efficient and fast pipelines without the complexity of autoscaling runners.

So buckle up and embrace the power of the majestic single server runner in your CI/CD process! Implementation details are in GitLab CI: Deploy a Majestic Single Server Runner on AWS 🚀


Illustrations generated locally by Automatic1111 using Lyriel model with Hydro tech LoRA


This article was enhanced with the assistance of an AI language model to ensure clarity and accuracy in the content, as English is not my native language.

Top comments (3)

Nik Paushkin

It depends on how you cook your k8s setup. Take a look at our new GitLab runner SaaS: it's beating anything in the world in cost/performance terms, and it's built entirely on Kubernetes. It's also running each job in a clean new VM, so the security is even better than on a single server.

Benoit COUETIL πŸ’«

Interesting, what do you do differently than most people?

Nik Paushkin

A lot of stuff really: from host preparation, fine-tuned container runtime and filesystem to our unique billing model. You pay only for cpu-seconds and memory GB-seconds consumed (not just allocated) by each pipeline job. We call it Burstable Resources, and such billing concept isn't available on any cloud platform. So if you just take a normal k8s GitLab runner and run it in some GKE cluster, you won't be able to achieve such cost/performance profile even with tuned autoscaler.