Zafri Zulkipli

Posted on Jan 17, 2022

Parallel Behat

#behat #docker #cicd #python

TL;DR: Our company has dockerized pretty much everything, so to tackle the long wait for the CI pipeline, we split the big list of Behat files into chunks and run them in parallel in Docker using Python multiprocessing.

Background:

In my workplace, we deploy on a daily basis. Each time we want to deploy, we have to wait at least 20 minutes for the pipeline to finish. On average, it takes around 25 to 30 minutes from the time we push the code to version control, through all the checks, to the deployment to production.

The main reason for this delay is that we have a lot of Behat .feature files. We put a lot of emphasis on integration testing, so the number of test files keeps growing.

All this waiting is frustrating whenever we want something in production quickly. It's even more frustrating when we're working with 3rd parties and they have to wait for us to apply changes.

The search for cutting down time

We searched for various ways to reduce the time. Some of the suggestions were:

1. Remove deprecated features and their .feature files
2. Refactor the .feature files to remove redundancy
3. Search for 3rd party packages that deal with parallelism

Point 1: we've been doing this regularly, but it did not reduce the time as much as we expected.

Point 2: no one really wants to take the time to find redundant feature files and refactor them.

Point 3: the 3rd party packages we found were unable to satisfy all our requirements.

All things considered, we decided to create our own parallel Behat runner.

The journey to achieve parallelism

Step 1: How do we even run processes in parallel?

TL;DR: I don't know the intricate logic behind parallel processing myself. What I do know is that I've been using the Python multiprocessing package and it has worked wonders!
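As a rough illustration of what the multiprocessing package does, here is a minimal sketch (not our pipeline code). The names fake_test_run and run_in_parallel are made up for this example, and time.sleep stands in for a real test run. Each Process runs its target function in a separate OS process; start() launches it and join() waits for it to finish.

```python
import multiprocessing
import time


def fake_test_run(name, seconds):
    """Stand-in for a slow test run (hypothetical; it only sleeps)."""
    time.sleep(seconds)
    message = f"{name} finished"
    print(message)
    return message


def run_in_parallel(jobs):
    """Start one process per (name, seconds) job and wait for all of them."""
    processes = [
        multiprocessing.Process(target=fake_test_run, args=job)
        for job in jobs
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    # exitcode is 0 for a process whose target returned normally
    return [process.exitcode for process in processes]


if __name__ == "__main__":
    # Both jobs sleep for 1 second, but they overlap instead of running back to back
    started = time.perf_counter()
    run_in_parallel([("chunk-1", 1), ("chunk-2", 1)])
    print(f"took {time.perf_counter() - started:.1f}s")
```

The return value of the target function is discarded by Process, which is why the sketch reports exit codes instead.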

Step 2: Add a script that takes specific input to execute

Python multiprocessing needs something to run in each process. So I created a simple bash script that takes a comma-separated list of .feature folder names and executes all the files within them. It is basically a self-contained Docker process that executes the tests and then exits.

Step 3: Chunk it!

This part is easy. Let's say we have .feature folders A, B, C, D, E, F. Splitting them into two chunks gives [[A,B,C], [D,E,F]].
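The chunking step could look something like the sketch below. This is not our actual get_chunks; the function name chunk_folders and its signature are assumptions for illustration. It splits the list into contiguous chunks whose sizes differ by at most one.

```python
def chunk_folders(folders, chunk_count):
    """Split folders into chunk_count contiguous chunks of near-equal size."""
    if chunk_count < 1:
        raise ValueError("chunk_count must be at least 1")

    # Every chunk gets `base` folders; the first `extra` chunks get one more
    base, extra = divmod(len(folders), chunk_count)
    chunks = []
    start = 0
    for index in range(chunk_count):
        size = base + (1 if index < extra else 0)
        chunks.append(folders[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    print(chunk_folders(["A", "B", "C", "D", "E", "F"], 2))
```

When there are fewer folders than chunks, some chunks come back empty, so whatever consumes them needs to skip empty chunks.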

Step 4: Putting it all together

Finally, we use Python multiprocessing to run the script once per chunk. By splitting the run into 2 sub-processes, we got the time down from 20 minutes to 14 minutes. We're still testing larger numbers of sub-processes. The hypothesis is that more sub-processes mean less execution time, provided we have enough CPU cores.
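One rough way to act on that hypothesis is to cap the number of sub-processes by the number of CPU cores and by the number of folders. The helper below is only a sketch under that assumption; pick_sub_process_count is a hypothetical name, and the core count of the machine is only a proxy for how many Docker test runs it can really handle at once.

```python
import os


def pick_sub_process_count(requested, folder_count, cpu_count=None):
    """Cap the requested sub-process count by available cores and folders."""
    if cpu_count is None:
        # os.cpu_count() can return None when the count cannot be determined
        cpu_count = os.cpu_count() or 1
    # Never go below 1, even if there are no folders
    return max(1, min(requested, folder_count, cpu_count))


if __name__ == "__main__":
    print(pick_sub_process_count(requested=8, folder_count=6))
```

The real limit still has to be measured, since each test run also competes for memory, disk and database access.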

The code

For company privacy reasons, I will not be sharing the exact code. This is a simplified, obfuscated version. Some of it may not make sense to you, and edge cases are not handled here, but they are handled in our actual version.

sub_process_test.py

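import multiprocessing
import subprocess


def main():
    sub_process_count = 2
    # get_chunks() is not shown; it returns one list of folder names per sub-process
    chunks = get_chunks(sub_process_count)
    processes = []

    for index, chunk in enumerate(chunks):
        # Sub-process ids start at 1; folder names are passed as one comma-separated string
        current_process = multiprocessing.Process(
            target=run_behat_for_certain_folders,
            args=(index + 1, ','.join(chunk)),
        )
        current_process.start()
        processes.append(current_process)

    # Wait for every sub-process to finish
    for process in processes:
        process.join()


def run_behat_for_certain_folders(pid, folder_names):
    # Skip empty chunks
    if folder_names:
        subprocess.call(["./sub_process_runner.sh", folder_names, str(pid)])


if __name__ == "__main__":
    main()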

sub_process_runner.sh

#!/bin/bash

# $1: folder name, $2: sub-process id
run_sub_process() {
  docker-compose -f "$docker_compose_path" -p "$prefix" exec -T test ./vendor/bin/behat \
    --stop-on-failure \
    "./your-behat-folder/$1"

  # status check here, obfuscated
}

# docker initialization here, obfuscated

# Split the comma-separated folder list ($1) and run each folder in turn
for i in ${1//,/ }
do
    run_sub_process "$i" "$2"
done

sub_process_test.py will call sub_process_runner.sh A,B,C 1, where the 1st arg is the comma-separated list of folders it needs to run and the 2nd arg is the sub-process id.
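To make the call format concrete, here is a small sketch of how that argument list could be built, and how a failing chunk could fail the whole pipeline. The helper names are hypothetical, and the sketch assumes the runner script exits with a non-zero status when Behat fails, which the obfuscated status check above does not show.

```python
import subprocess
import sys


def build_runner_command(chunk_id, folders):
    """Build the argument list for one runner call (folder list, then id)."""
    return ["./sub_process_runner.sh", ",".join(folders), str(chunk_id)]


def run_chunk(chunk_id, folders):
    """Run one chunk and exit with the script's status so the parent can see it."""
    if not folders:
        return
    # Assumption: the script's exit status is non-zero when a test fails
    status = subprocess.call(build_runner_command(chunk_id, folders))
    sys.exit(status)


def overall_exit_code(exit_codes):
    """Return 0 only if every sub-process exited with 0."""
    return 0 if all(code == 0 for code in exit_codes) else 1


if __name__ == "__main__":
    print(build_runner_command(1, ["A", "B", "C"]))
```

With run_chunk as the Process target, the parent could call sys.exit(overall_exit_code([p.exitcode for p in processes])) after joining, so the CI job fails when any chunk fails.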

Conclusion

We achieved our goal of reducing the CI pipeline waiting time. Hopefully this post brings some insight to the reader, whatever that insight may be.
