DEV Community

Benjamin Rancourt

Posted on • Originally published at benjaminrancourt.ca on

How to use CI to update Git submodules 🧩

If you are using Git submodules in your project, you probably want to keep them up to date so that their latest version is always available. Unfortunately, by design, Git tracks each submodule by a specific commit SHA (e.g. eb41d76), and that reference must be committed to your repository.

And there does not seem to be a way yet to always use the latest submodule commit without updating its reference with a command like git submodule update --remote and committing the result... 😐
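To see what tracking a submodule by its commit SHA looks like in practice, here is a small Python sketch (standard library only) that reads the output of git ls-tree HEAD and reports the commit each submodule is pinned to. In that output, a submodule appears as a "gitlink" entry with mode 160000 and type commit. The function name submodule_pins and the sample SHAs are hypothetical.

```python
from typing import Dict


def submodule_pins(ls_tree_output: str) -> Dict[str, str]:
    """Map each submodule path to the commit SHA recorded in the superproject.

    Expects the text printed by `git ls-tree HEAD`, where each line looks like
    '<mode> <type> <object>\t<path>'. Submodules (gitlinks) use mode 160000
    and type "commit"; regular files and folders are blob/tree entries.
    """
    pins: Dict[str, str] = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        mode, obj_type, sha = meta.split()
        if mode == "160000" and obj_type == "commit":
            pins[path] = sha
    return pins


if __name__ == "__main__":
    # Hypothetical `git ls-tree HEAD` output for projectB
    sample = (
        "100644 blob 8f3c1a2b4d5e6f708192a3b4c5d6e7f809102132\t.gitmodules\n"
        "100644 blob 1b2c3d4e5f60718293a4b5c6d7e8f90112233445\t.gitlab-ci.yml\n"
        "160000 commit eb41d76aa7c6c0f0e5b1f3d2c4a5b6c7d8e9f001\tprojectA\n"
    )
    print(submodule_pins(sample))
```

Running git submodule update --remote moves the submodule checkout forward, but the SHA in this listing only changes once the updated reference is committed to the superproject.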

That is not very convenient when you want to automate your workflow: you have to check out the code manually just to update the submodule SHA, right? 🤔

Imagine if you could simply check out your project and be guaranteed that its submodules are at their latest version... 😌

I am aware that sometimes you do not want a submodule to be able to introduce a breaking change, but if you are 99.99% sure that it cannot break your project, wouldn't it be nice to forget about updating its reference every time it changes? 🤭

It turned out to be not as simple as I would have liked, but I managed to make it work. 😇

Solutions

To simplify my explanation, let's say we have two projects, projectA and projectB, where projectB includes projectA as one of its submodules. 😬

#1 - Update the submodules from within

Do you remember the .git:push job I wrote in my post How to push to a Git repository from a GitLab CI pipeline? Well, I have found another use case for it! 😅

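update:submodule:
  extends: .git:push
  rules:
    - if: $UPDATE_SUBMODULE == "true"
  script:
    # Move into the clone of the repository prepared by the .git:push template
    - cd "${CI_COMMIT_SHA}"
    # Check out every submodule at the commit recorded in projectB
    - git submodule update --init --recursive
    # Move each submodule to the latest commit of its remote branch
    - git submodule update --remote
    # Show which submodule references changed
    - git status
  stage: install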

Job used in projectB to update all its submodules

Since I did not want all the other jobs to run when this job was running, I added the following rules to them:

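  rules:
    # Skip this job in the submodule update pipelines
    - if: $UPDATE_SUBMODULE == "true"
      when: never
    # Otherwise, run the job normally: if no rule matches, GitLab does not add the job
    - when: on_success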

Do not run these jobs when $UPDATE_SUBMODULE is "true"
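One subtlety with these rules: GitLab checks them in order, the first matching rule decides, a matching rule without when defaults to on_success, and a job whose rules never match is not added to the pipeline at all. The simplified Python model below (an illustration, not GitLab code; Rule and job_is_added are hypothetical names) shows that a rules list containing only the when: never rule would also exclude the job from regular pipelines, which is why such a list usually ends with a catch-all rule like - when: on_success.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Variables = Dict[str, str]


@dataclass
class Rule:
    # None stands for a rule without an "if" clause, which always matches
    condition: Optional[Callable[[Variables], bool]]
    # GitLab's default when a matching rule does not set "when"
    when: str = "on_success"


def job_is_added(rules: List[Rule], variables: Variables) -> bool:
    """Simplified model: the first matching rule decides; no match, no job."""
    for rule in rules:
        if rule.condition is None or rule.condition(variables):
            return rule.when != "never"
    return False


def update_requested(variables: Variables) -> bool:
    # Mirrors: if: $UPDATE_SUBMODULE == "true"
    return variables.get("UPDATE_SUBMODULE") == "true"


# Only the "when: never" rule...
only_never = [Rule(update_requested, when="never")]
# ...versus the same rule followed by a catch-all "- when: on_success"
with_fallback = [Rule(update_requested, when="never"), Rule(None)]

if __name__ == "__main__":
    for name, rules in [("only_never", only_never), ("with_fallback", with_fallback)]:
        print(name,
              "scheduled update:", job_is_added(rules, {"UPDATE_SUBMODULE": "true"}),
              "regular pipeline:", job_is_added(rules, {}))
```

With only_never, the job is excluded from both kinds of pipelines; with with_fallback, it is excluded only from the submodule update pipeline.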

By running this job from a pipeline schedule with the UPDATE_SUBMODULE variable set to true, this solution worked: my submodules were updated at the interval I wanted.
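If you prefer to create the schedule from a script rather than from the GitLab UI, the pipeline schedules API can do it in two calls: one to create the schedule and one to attach the UPDATE_SUBMODULE variable to it. Here is a minimal Python sketch that only builds those two requests; the server URL, project ID, schedule ID, cron expression and function names are hypothetical, and the token is assumed to be a personal access token sent in the PRIVATE-TOKEN header.

```python
import urllib.parse
import urllib.request


def _post_form(url: str, token: str, fields: dict) -> urllib.request.Request:
    # Form-encoded POST, authenticated with a personal access token
    body = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"PRIVATE-TOKEN": token})


def create_schedule_request(server_url: str, project_id: int, token: str,
                            ref: str, cron: str,
                            cron_timezone: str = "UTC") -> urllib.request.Request:
    """Request creating a pipeline schedule on the given ref."""
    url = f"{server_url}/api/v4/projects/{project_id}/pipeline_schedules"
    return _post_form(url, token, {
        "description": "Update submodules",
        "ref": ref,
        "cron": cron,
        "cron_timezone": cron_timezone,
    })


def schedule_variable_request(server_url: str, project_id: int, token: str,
                              schedule_id: int, key: str,
                              value: str) -> urllib.request.Request:
    """Request attaching a variable to an existing pipeline schedule."""
    url = (f"{server_url}/api/v4/projects/{project_id}"
           f"/pipeline_schedules/{schedule_id}/variables")
    return _post_form(url, token, {"key": key, "value": value})


if __name__ == "__main__":
    # Hypothetical values: run every hour on the main branch of project 42
    create = create_schedule_request("https://gitlab.example.com", 42,
                                     "<personal-access-token>", "main", "0 * * * *")
    print(create.get_method(), create.full_url)
    # Sending it with urllib.request.urlopen(create) returns the new schedule
    # as JSON; its "id" field is the schedule_id used below.
    variable = schedule_variable_request("https://gitlab.example.com", 42,
                                         "<personal-access-token>", 1,
                                         "UPDATE_SUBMODULE", "true")
    print(variable.get_method(), variable.full_url)
```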

The first pipeline was executed to update a submodule, while the second was skipped because the commit was pushed with the -o ci.skip Git push option

A commit that updates a subdirectory to a new SHA

Unfortunately, with this solution, an update to projectA is only picked up the next time the projectB schedule runs; until then, projectB keeps pointing to the previous commit of projectA...

Since I did not want to introduce any delay, nor depend on my projectA pipeline completing in less than X minutes, I kept looking for a better solution. 🔎

#2 - Trigger the pipeline of projectB from projectA

If we want projectB to know when projectA has been updated, one solution is for projectA to notify projectB.

Fortunately, GitLab offers a keyword to launch a downstream pipeline: trigger. It is pretty straightforward, because you just need to configure two more keywords:

  • project: the full path of the downstream project
  • branch: the branch of the downstream project on which to run the pipeline

as in the example below.

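update:references:
  trigger:
    # Branch of projectB on which to run the downstream pipeline
    # (CI_DEFAULT_BRANCH is projectA's default branch, so projectB needs a branch with the same name)
    branch: '${CI_DEFAULT_BRANCH}'
    # Full path of the downstream project
    project: ranb2002/projectB
    # Make this job wait for the downstream pipeline and mirror its status
    strategy: depend
  stage: trigger
  variables:
    # Forwarded to the downstream pipeline so that only the update job runs
    UPDATE_SUBMODULE: 'true'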
Use of the trigger keyword in the projectA GitLab CI file

Since I do not want to run the entire projectB pipeline, I also pass variables from one pipeline to the other, such as the UPDATE_SUBMODULE variable used in the previous solution.

There is also another keyword, strategy, which, when set to depend, links the two pipelines together: the trigger job waits for the downstream pipeline and takes on its status. This is useful if you want to know, from the upstream project, whether the downstream pipeline succeeded.

Pipeline of projectA (upstream), with the pipeline of projectB (downstream)

#3 - Use the GitLab API to trigger the downstream pipeline

I cannot remember what I did wrong, but the previous solution did not initially work as I expected. I eventually fixed it, but only after finding another solution that uses a dedicated GitLab API endpoint:

trigger:downstream:
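  script:
    # Start a pipeline in the downstream project through the pipeline trigger API,
    # authenticated with the job token and with UPDATE_SUBMODULE set to true
    - curl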
      --request POST
      --form "ref=${CI_DEFAULT_BRANCH}"
      --form "token=${CI_JOB_TOKEN}"
      --form "variables[UPDATE_SUBMODULE]=true"
      "${CI_SERVER_URL}/api/v4/projects/${DOWNSTREAM_PROJECT_ID}/trigger/pipeline"
  stage: update

Use of the GitLab API to trigger the downstream pipeline

It is a little more verbose than the previous example, but you only need to provide the ID of the project to update (DOWNSTREAM_PROJECT_ID), since the other variables are predefined by GitLab CI.
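For comparison, here is the same trigger call sketched in Python with only the standard library; it builds the request without sending it. Unlike curl --form, which sends multipart/form-data, urllib.parse.urlencode produces an application/x-www-form-urlencoded body, which the endpoint should accept just as well. The function name and the fallback values used outside CI are hypothetical.

```python
import os
import urllib.parse
import urllib.request
from typing import Dict, Optional


def trigger_pipeline_request(server_url: str, project_id: int, ref: str,
                             token: str,
                             variables: Optional[Dict[str, str]] = None
                             ) -> urllib.request.Request:
    """Build a POST to the pipeline trigger API.

    Pipeline variables are sent as variables[KEY]=value form fields,
    like the --form "variables[UPDATE_SUBMODULE]=true" argument of curl.
    """
    fields = {"ref": ref, "token": token}
    for key, value in (variables or {}).items():
        fields[f"variables[{key}]"] = value
    body = urllib.parse.urlencode(fields).encode()
    url = f"{server_url}/api/v4/projects/{project_id}/trigger/pipeline"
    return urllib.request.Request(url, data=body, method="POST")


if __name__ == "__main__":
    # In a GitLab job these come from predefined variables; the fallbacks are placeholders
    req = trigger_pipeline_request(
        os.environ.get("CI_SERVER_URL", "https://gitlab.example.com"),
        int(os.environ.get("DOWNSTREAM_PROJECT_ID", "0")),
        os.environ.get("CI_DEFAULT_BRANCH", "main"),
        os.environ.get("CI_JOB_TOKEN", "<job-token>"),
        {"UPDATE_SUBMODULE": "true"},
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it
```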

#4 - Use the GitLab API to update a submodule directly

While searching the Internet, I came across another GitLab API endpoint that did exactly what I wanted, without having to adapt two pipelines: the Repository submodules API!
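Stripped of the CI details, the call boils down to the request below, sketched in Python with only the standard library (built but not sent). The endpoint is PUT /projects/:id/repository/submodules/:submodule, where :submodule is the URL-encoded path of the submodule, and the body carries branch, commit_sha and commit_message. The function name, branch and SHA in the example are hypothetical.

```python
import urllib.parse
import urllib.request


def update_submodule_request(server_url: str, project_id: int,
                             submodule_path: str, branch: str,
                             commit_sha: str, commit_message: str,
                             token: str) -> urllib.request.Request:
    """Build a PUT request for the Repository submodules API."""
    # The path is part of the URL, so encode everything, slashes included:
    # "lib/projectA" becomes "lib%2FprojectA"
    encoded_path = urllib.parse.quote(submodule_path, safe="")
    url = (f"{server_url}/api/v4/projects/{project_id}"
           f"/repository/submodules/{encoded_path}")
    # urlencode escapes spaces, "&" and other special characters in the message
    body = urllib.parse.urlencode({
        "branch": branch,
        "commit_sha": commit_sha,
        "commit_message": commit_message,
    }).encode()
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"PRIVATE-TOKEN": token})


if __name__ == "__main__":
    req = update_submodule_request(
        "https://gitlab.example.com", 20554554, "ghost", "main",
        "eb41d76aa7c6c0f0e5b1f3d2c4a5b6c7d8e9f001",
        "Update the ghost submodule", "<personal-access-token>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it
```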

As this is the approach I finally chose, I adapted the example from the documentation to meet my needs, so be warned: the example below may look complicated.

For a much simpler example, I suggest you read the official documentation on the subject. 🛑

trigger:
  image: curlimages/curl:7.76.1
  script:
    # Put the header name and its colon into a variable: a literal "PRIVATE-TOKEN: " inside this unquoted multi-line command would be read by YAML as a mapping key (the "Linefeed-Limbo", https://stackoverflow.com/a/51187502)
    - PRIVATE_TOKEN="PRIVATE-TOKEN:"

    # Get the current datetime (in local time with the TZ variable)
    - CURRENT_DATE="$(date +'%F %T')"

    # Append the date to the commit message (and remove any newline characters)
    - MESSAGE="${COMMIT_MESSAGE:-$CI_COMMIT_MESSAGE} (${CURRENT_DATE})"
    - MESSAGE=$(echo "${MESSAGE}"|tr -d '\n')

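    # Use the Repository submodules API to update the submodule reference in the repository
    # (--data-urlencode escapes spaces and special characters such as "&" in the commit message;
    # a SUBMODULE_PATH containing slashes must itself be URL-encoded, e.g. "lib%2FprojectA")
    - curl
      --data-urlencode "branch=${BRANCH:-$CI_DEFAULT_BRANCH}"
      --data-urlencode "commit_sha=${COMMIT_SHA:-$CI_COMMIT_SHA}"
      --data-urlencode "commit_message=${MESSAGE}"
      --header "${PRIVATE_TOKEN} ${GITLAB_TOKEN}"
      --request PUT
      "${CI_SERVER_URL}/api/v4/projects/${PROJECT_ID}/repository/submodules/${SUBMODULE_PATH}"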
  stage: trigger
  rules:
    - changes:
        - ghost/
        - images/
  variables:
    PROJECT_ID: '20554554'
    SUBMODULE_PATH: 'ghost'
    TZ: ':America/Toronto'


Now, I am sure you want an explanation of the above code, right? 😅

First, the heart of the solution is the curl command, which calls the GitLab endpoint. Let's break down the variables it uses:

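  • SUBMODULE_PATH: the path of the submodule exactly as it was committed in the projectB repository (URL-encoded if it contains slashes)
  • PROJECT_ID: the ID of projectB
  • CI_SERVER_URL: the URL of the GitLab instance
  • GITLAB_TOKEN: a GitLab personal access token with the api scope, which grants write access to the API
  • BRANCH and COMMIT_SHA: the projectB branch to update and the projectA commit the submodule should point to (they default to CI_DEFAULT_BRANCH and CI_COMMIT_SHA)
  • MESSAGE: the commit message I want (more explanations below)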

There are also quite a few lines just to set the MESSAGE variable, because I want to append the current date to it (in my local time zone, set with the TZ variable). If COMMIT_MESSAGE is not provided, it defaults to CI_COMMIT_MESSAGE, the message of the projectA commit that triggered the pipeline.
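To make that logic easier to follow, here is a Python equivalent of the MESSAGE lines, written as a pure function so it can be tested with a fixed date; the function name is hypothetical. Like ${COMMIT_MESSAGE:-$CI_COMMIT_MESSAGE}, it falls back when COMMIT_MESSAGE is unset or empty, and like tr -d '\n', it removes every newline, which also glues the lines of a multi-line message together.

```python
from datetime import datetime
from typing import Optional


def build_commit_message(commit_message: Optional[str],
                         ci_commit_message: str,
                         now: datetime) -> str:
    """Mirror of the MESSAGE lines of the CI job above."""
    # ${COMMIT_MESSAGE:-$CI_COMMIT_MESSAGE}: fall back when unset or empty
    base = commit_message or ci_commit_message
    # date +'%F %T' prints the date as YYYY-MM-DD HH:MM:SS
    stamped = f"{base} ({now.strftime('%Y-%m-%d %H:%M:%S')})"
    # tr -d '\n' deletes every newline character
    return stamped.replace("\n", "")


if __name__ == "__main__":
    print(build_commit_message(None, "Update the theme\n", datetime.now()))
```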

Conclusion

You now have four ways to update a Git submodule from a GitLab CI pipeline! Which one will you choose? 😉
