Emilien Mottet

Posted on Mar 3, 2021 • Edited on May 9, 2023

A journey into the depths of gitlab-ci

#git #gitlab #ci #subtree

The beginning of an adventure

After spending a lot of time on the awesome exercism.io platform, I had learned many new languages and received great advice. Once I had accumulated a large number of exercises, I set out to automate and industrialize the way I test them. Along the way I discovered some advanced features of gitlab-ci. Here is my feedback.

Exercism

Exercism is a community site where people who want to learn a new language can receive expert advice from volunteer mentors.
To learn a language, it is advisable to follow a track: a logical sequence of exercises that gives steady progression in the language.

Here is the list of all the tracks (learning paths) available on the platform.

Exercism offers a CLI to download and upload exercises. This system works well, but it does not replace version control with git, so I use git to version my exercises.

The exercism folder contains one subfolder per language, and each language folder contains one folder per exercise.

Here is an example of the layout of an exercism folder.

exercism
+-- c
|   +-- hello-world
|       +-- hello-world.c
|       +-- hello-world-test.c
+-- go
|   +-- leap
|   |   +-- leap.go
|   |   +-- leap_test.go
|   +-- hamming
|       +-- hamming.go
|       +-- hamming_test.go
+-- ruby
    +-- sieve
        +-- sieve.rb
        +-- sieve_test.rb

Each exercise comes with tests to pass, following the TDD method. Once the tests pass, the solution can be submitted and feedback from a mentor can be requested.

After accumulating many solutions, I wanted a CI that would keep my exercises stable and up to date when new tests are added.

Moreover, all the exercises of a given track/language are tested in the same way. For example, Go exercises are tested with the command go test, and Java exercises with gradle test.

To follow the DRY principle, I wanted a CI template that I could apply to all the exercises in the same language, and that would also cover every new exercise.

I compared several continuous integration solutions available for open source projects (Jenkins, CircleCI, Travis CI, Codeship, Gitlab CI). I chose gitlab-ci because it offered the most interesting features for this use case.
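The idea of one test command per track can be sketched as a small lookup function. This is only an illustration: the function name test_command_for is hypothetical, and the only commands taken from this article are go test, gradle test, and make (used for the C exercises further down).

```shell
#!/bin/sh
# Hypothetical helper: map a track (language) name to its test command.
# Only the go, java and c entries come from the article.
test_command_for() {
    case "$1" in
        go)   echo "go test" ;;
        java) echo "gradle test" ;;
        c)    echo "make" ;;
        *)
            # Unknown track: report on stderr and signal failure.
            echo "no test command known for track '$1'" >&2
            return 1
            ;;
    esac
}

# Usage: print the command that would be run for the go track.
test_command_for go
```

Keeping this mapping in one place is what makes a per-language CI template possible: the exercise changes, the command does not.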

🦊 The Powerful Feature : Parent Child Pipelines

Gitlab offers a free CI that can easily be integrated into a git project, and even into a Github project. Cool, my repo is on github!

Parent Child Pipelines

They make it possible to run a CI from a CI, and therefore to create a meta-CI. Despite some limitations that we will see later, Parent Child Pipelines let me spread my CI over 3 nested levels:

At the root of the exercism folder, a pipeline that runs one CI per language. This is the simplest file: it only has to trigger the CI of each language:

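# One job per track: each one triggers the pipeline of that track's own project.
emacs-lisp:
  stage: test
  trigger:
    project: EmilienMottet/exercism-emacs-lisp-sol
    strategy: depend  # the job mirrors the status of the triggered pipeline
  rules:
    - changes:
        - emacs-lisp/**/*  # also matches files in the exercise subfolders

x86-64-assembly:
  stage: test
  trigger:
    project: EmilienMottet/exercism-x86-64-assembly-sol
    strategy: depend
  rules:
    - changes:
        - x86-64-assembly/**/*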

Full file

At the root of each language, a pipeline that runs one CI per exercise:

{
   "generate_c_gitlab_ci": {
      "artifacts": {
         "paths": [
            ".c-*-gitlab-ci.yml"
         ]
      },
      "image": {
         "entrypoint": [
            ""
         ],
         "name": "bitnami/jsonnet:latest"
      },
      "needs": [
         {
            "job": "build_vars",
            "pipeline": "$PARENT_PIPELINE_ID"
         }
      ],
      "script": [
         "jsonnet -m . --ext-str exercism_projects=\"$( echo $DIR_TO_BE_TESTED | sed -E 's/ /\\n/g' )\" --ext-str lang=\"c\" \".c-gitlab-ci.jsonnet\""
      ],
      "stage": "build"
   },
   "test_c_armstrong-numbers": {
      "stage": "test",
      "trigger": {
         "include": [
            {
               "artifact": ".c-armstrong-numbers-gitlab-ci.yml",
               "job": "generate_c_gitlab_ci"
            }
         ],
         "strategy": "depend"
      }
   },
   "test_c_hello-world": {
      "stage": "test",
      "trigger": {
         "include": [
            {
               "artifact": ".c-hello-world-gitlab-ci.yml",
               "job": "generate_c_gitlab_ci"
            }
         ],
         "strategy": "depend"
      }
   }
}

Template file

At the root of each exercise, a pipeline that runs the tests provided by the platform:

{
   "default": {
      "image": "gcc:latest"
   },
   "test-c-armstrong-numbers-exercism": {
      "script": [
         "cd armstrong-numbers",
         "make"
      ]
   }
}

Template file

Note the use of the rules field in the first file: with changes, a push runs the CI job only if something was modified under the given path.

To be DRY, I need to template my CI, and jsonnet is a good fit. Here is an example provided by the gitlab team: https://gitlab.com/gitlab-org/project-templates/jsonnet.
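The effect of such a path filter can be imitated locally with a few lines of shell. This is a rough sketch and not GitLab's implementation: the function name has_changes_under is hypothetical, and it simply checks whether any changed path starts with a given directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one changed path
# (one per line on stdin) lies under the directory given as $1.
has_changes_under() {
    dir="$1"
    while IFS= read -r path; do
        case "$path" in
            # In a case pattern, * also matches '/', so nested files count.
            "$dir"/*) return 0 ;;
        esac
    done
    return 1
}

# Usage: decide whether the go track needs to be tested.
if printf '%s\n' 'go/leap/leap.go' 'README.md' | has_changes_under go; then
    echo "go track needs testing"
fi
```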

A template for each situation

A gitlab-ci.yml file can be described in jsonnet format.

For example:

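// Exercise names arrive as a newline-separated string; drop any '/' left by `ls -d */`.
local exercism_projects = std.map(function(x) std.strReplace(x, '/', ''), std.split(std.extVar('exercism_projects'), '\n'));
local lang = std.extVar('lang');

// One output file per exercise, holding that exercise's child pipeline.
local CTestJob(name) = {
  ['.' + lang + '-' + name + '-gitlab-ci.yml']: {
    default: {
      image: 'gcc:latest',
    },
    ['test-' + lang + '-' + name + '-exercism']: {
      script: [
        'cd ' + name,
        'make',
      ],
    },
  },
};

// Merge the per-exercise objects; with `jsonnet -m`, each top-level key becomes a file.
std.foldl(function(x, y) x + y, std.map(CTestJob, exercism_projects), {})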

This template generates one gitlab-ci.yml file per C exercise.

The moral: jsonnet templates are perfect for avoiding repetition!

The almost perfect CI, multi project pipeline

Parent Child Pipelines are awesome, but they have several flaws. Two stood out for me:
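The naming scheme of the generated files can be reproduced in shell to check what the template is expected to produce. This is a sketch for illustration only: the function name child_pipeline_file is hypothetical, and it only mirrors the file-name expression of the template, not the generation itself.

```shell
#!/bin/sh
# Hypothetical helper mirroring the template's file name:
#   '.' + lang + '-' + name + '-gitlab-ci.yml'
child_pipeline_file() {
    lang="$1"
    # Drop a trailing slash such as the one produced by `ls -d */`
    # (the jsonnet template removes every '/' instead).
    name="${2%/}"
    printf '.%s-%s-gitlab-ci.yml\n' "$lang" "$name"
}

# Usage: name of the child pipeline file for the C hello-world exercise.
child_pipeline_file c hello-world/
```

These are the names that the trigger jobs of the language-level pipeline refer to in their artifact field.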

Nested child pipelines are limited to 2 levels, and in my case I need 3. The limitation can be bypassed by using multi-project pipelines.
The files modified by the last commit have to be managed by hand to avoid rebuilding all the exercises.

For the modified files, the list can be recomputed in a preliminary step.

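build_vars:
  stage: .pre
  script:
    - |
      if [ "$CI_PIPELINE_TRIGGERED" = "true" ]; then
        # Triggered pipeline: test every exercise directory.
        DIR_TO_BE_TESTED=$(ls -d */)
      else
        # Otherwise keep only the top-level directories touched by the last commit.
        DIR_TO_BE_TESTED=$(git diff --name-only $CI_COMMIT_SHA^ $CI_COMMIT_SHA */ | cut -d'/' -f1 | sort | uniq)
        # Quoted so that the test also works when several directories are listed.
        if [ -z "$DIR_TO_BE_TESTED" ]; then
          DIR_TO_BE_TESTED=$(ls -d */)
        fi
      fi
    # Left unquoted on purpose: the list is flattened onto a single line for the dotenv file.
    - echo DIR_TO_BE_TESTED=$DIR_TO_BE_TESTED >> build.env
    - cat build.env
  artifacts:
    reports:
      dotenv: build.env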

Once the list of modified exercises has been computed, it is exported as a dotenv artifact so that the jobs generated afterwards can read it.
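The core of that step, reducing a list of changed files to the exercise directories that contain them, can be tried outside of the CI. The sketch below assumes the paths are given one per line on standard input; the function name changed_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reduce changed file paths (one per line on stdin)
# to the unique, sorted top-level directories that contain them.
changed_dirs() {
    cut -d'/' -f1 | sort -u
}

# Usage: two files changed in leap and one in hamming.
printf '%s\n' 'leap/leap.go' 'leap/leap_test.go' 'hamming/hamming.go' | changed_dirs
```

In the CI job, the input comes from git diff --name-only instead of printf.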

Submodule vs subtree

Because of the limit on nesting depth, and because I wanted to separate the different tracks, I created a git repository for each track (a manual step that could be automated with ansible).

The classic way to manage these nested projects would be git submodules. However, keeping a consistent set of git submodules up to date can become quite tedious.

There is an alternative: the subtree!

git-subtree - Merge subtrees together and split repository into subtrees

The advantage I found in subtrees is that I can mirror only the tracks, and automate this mirroring simply with git hooks, all without modifying my existing repository.

Creating a subtree could not be simpler:

git subtree add --prefix=clojure --squash git@github.com:EmilienMottet/exercism-clojure-sol.git master

Then, to make sure the subtrees stay up to date, just add a post-commit hook.

Here is my hook file .git/hooks/post-commit:

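#!/bin/sh

# Push the subtree stored under directory $1 to its dedicated repository.
subtree_push(){
    git subtree push -P "$1" git@github.com:EmilienMottet/exercism-"$1"-sol.git master
}

# Start one push per top-level directory, in the background,
# so that the commit itself returns immediately.
for dir in */; do
    subtree_push "$( basename "$dir" )" 2> /dev/null &
done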

At each commit, each subtree is updated in the background.
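Because the hook hides errors and runs in the background, it can help to preview the commands first. Here is a dry-run sketch that prints the push command for a track instead of running it; the function name subtree_push_cmd is hypothetical, and the remote naming is the one used in the hook.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the push command for one track
# instead of executing it.
subtree_push_cmd() {
    track="${1%/}"   # accept both "clojure" and "clojure/"
    printf 'git subtree push -P %s git@github.com:EmilienMottet/exercism-%s-sol.git master\n' \
        "$track" "$track"
}

# Usage: preview the command for the clojure track.
subtree_push_cmd clojure/
```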

Conclusion

I hope this article has made you want to try exercism.io and, why not, join this community.

Industrialize as early as possible. Exercism has its own version management system, but complementing it with git lets me go further. For example, I can:

do my exercises on several machines
save a lot of time thanks to the automated tasks
have a backup system
manage the history simply
share my work and make it visible on github.com.

Test yourself: TDD is the basis of every exercise. It is important to test yourself and your code all the time. Moreover, I want to keep my exercises compatible with the latest version of each language and, in the future, with new tests added by the community. Up-to-date solutions make very good examples, and they can earn you a lot of stars ;)

DRY! Do not repeat, Do not repeat (oops)! First, factoring things out brings clarity. It also preserves knowledge and makes it easier to pass on. How many times have you run a set of commands almost automatically, only to forget them after several months without using them? The time spent writing a template is quickly won back!

Thank you for reading :)
