wayofthepie

Posted on Feb 2, 2020 • Edited on Feb 8, 2020

Hacking Together A GitHub Actions Runner Orchestrator

#github #docker #bash

Run a single build and cleanup
A simple orchestrator
Conclusion

In the last post, we built a Docker image that lets us connect and disconnect actions runners whenever we want. But we still have to launch one manually each time we need it.

In this post, I'm going to hack together an orchestrator that launches an actions runner per commit, with bash and cron!

Run a single build and cleanup

Right now when we run a container and connect an actions runner, it will stay alive until we manually kill it, building any new commits. It would be nicer if it ran a single build then cleaned itself up.

Back in the previous post we found the Runner.Common Constants.cs. I mentioned that there were some other interesting pieces of code in that class, outside of what we were looking for at the time. One of those is the CommandLine Flags class. This contains the help option, which looks like it maps to config.sh --help. But it also contains a once option.

Digging about a bit more and seeing where this is used uncovers a variable called RunOnce in Runner.Listener CommandSettings.

...

public bool RunOnce => TestFlag(Constants.Runner.CommandLine.Flags.Once);

...

This looks promising. It is most likely an option we can pass to ./run.sh. Let's find out by updating entrypoint.sh!

#!/usr/bin/env bash

OWNER=$1
REPO=$2
PAT=$3
NAME=$4

cleanup() {
    token=$(curl -s -XPOST -H "authorization: token ${PAT}" \
        https://api.github.com/repos/${OWNER}/${REPO}/actions/runners/registration-token | jq -r .token)
    ./config.sh remove --token $token
}

token=$(curl -s -XPOST \
    -H "authorization: token ${PAT}" \
    https://api.github.com/repos/${OWNER}/${REPO}/actions/runners/registration-token | jq -r .token)

./config.sh \
    --url https://github.com/${OWNER}/${REPO} \
    --token ${token} \
    --name ${NAME} \
    --work _work

# add the --once option here
./run.sh --once

cleanup
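If the script exits before it reaches that final cleanup call, the runner stays registered. One possible hardening is to register the cleanup with trap so that it runs whenever the script exits. This is only a sketch: cleanup here is a stand-in for the real function above, and run_with_cleanup is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Stand-in for the real cleanup function (which would call ./config.sh remove).
cleanup() {
    echo "cleanup ran"
}

# Hypothetical helper: runs the given command in a subshell with an
# EXIT trap, so cleanup fires whether the command succeeds or fails.
run_with_cleanup() {
    (
        trap cleanup EXIT
        "$@"
    )
}

# Example: cleanup still runs although the command fails.
# Prints "cleanup ran" and then "exit status: 1".
run_with_cleanup false
echo "exit status: $?"
```

In entrypoint.sh this could wrap the ./run.sh --once call. Note that bash only runs a trap once the foreground command has returned, so this does not help if the container is killed outright.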

Rebuild and launch the runner:

$ docker run -ti --rm actions-image ${OWNER} ${REPO} ${PAT} ${NAME}

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration


√ Runner successfully added
√ Runner connection is good

# Runner settings


√ Settings Saved.


√ Connected to GitHub

2020-02-02 18:05:11Z: Listening for Jobs

For the next steps to work, you will need to have an actions workflow set up. Here is an example:

name: CI

on: [push]

jobs:
  build:

    runs-on: self-hosted

    steps:
    - uses: actions/checkout@v2
    - name: Run a one-line script
      run: echo Hello, world!
    - name: Run a multi-line script
      run: |
        echo Add other actions to build,
        echo test, and deploy your project.
        sleep 10

Add this to the repository at .github/workflows/ci.yml. It should appear under the Actions tab in the repo. Now you can add a new commit to the repo, and your connected action runner should start the workflow:

...
2020-02-02 18:06:42Z: Running job: build
2020-02-02 18:07:00Z: Job build completed with result: Succeeded

# Runner removal

√ Runner removed successfully
√ Removed .credentials
√ Removed .runner

Awesome! It ran the action workflow on the new commit, exited, and cleaned up. Exactly what we wanted. Now we can build a simple orchestrator.

A simple orchestrator

I'm going to keep this dead simple. Here is the workflow I am thinking of:

Every minute check for new commits
If a new commit is detected, launch a runner

That's it! This has a few problems. For example, if there are multiple new commits, this will launch only a single runner, which will build one commit, and the other commits may never be built. However, once we get this flow working we can improve it as we go.
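One possible way to handle the multiple-commit case later is to work out every commit that arrived since the last one we saw, rather than only the newest. A minimal sketch, assuming we can get a newest-first list of SHAs, one per line; unseen_commits is a hypothetical helper and the short SHAs are made up:

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads SHAs from stdin, newest first, and prints
# those that come before the previously seen SHA ($1). If the previous
# SHA is empty or never appears, every SHA read is printed.
unseen_commits() {
    local prev="$1" sha
    while IFS= read -r sha; do
        [ "${sha}" = "${prev}" ] && break
        printf '%s\n' "${sha}"
    done
}

# Example with made-up short SHAs: "ccc" and "bbb" are new since "aaa".
printf '%s\n' ccc bbb aaa 999 | unseen_commits aaa
```

Each line printed could then correspond to one runner launch, although whether one runner per commit is the right ratio is a separate question.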

Detecting new commits

The simplest way I can think of to detect new commits is the /repos/:owner/:repo/commits API. It returns a lot of info, but all we need is the SHA1 hash of the latest commit. For example:

$ curl -s -H "authorization: token ${PAT}"  \
    https://api.github.com/repos/wayofthepie/gh-app-test/commits \
    | grep sha \
    | head -n1 
"sha": "e0654f66fcb8061a439e404f1bf1b2c8b7116537",

Simple! We can make this a bit cleaner with jq:

$ curl -s -H "authorization: token ${PAT}"  \
    https://api.github.com/repos/wayofthepie/gh-app-test/commits \
    | jq -r .[0].sha
e0654f66fcb8061a439e404f1bf1b2c8b7116537

jq -r .[0].sha says: take the first object in the array, get the value of its sha field, and print it as a raw string (not wrapped in quotes).
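To see the filter in isolation, here is a small sketch that runs it against a hand-written payload instead of the live API. The latest_sha function name and the sample JSON are made up for illustration; the real response carries many more fields per commit.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a commits-API style JSON array from stdin
# and prints the sha of its first element as a raw string.
latest_sha() {
    jq -r '.[0].sha'
}

# Made-up, trimmed-down payload: an array of commit objects, newest first.
sample='[{"sha":"e0654f6"},{"sha":"c7e9b46"}]'

# Prints: e0654f6
printf '%s' "${sample}" | latest_sha
```

For an empty array the same filter prints null, so a caller may want to treat that value as "no commit found" rather than as a SHA.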

With this, we can write a shell script (orc.sh) that detects new commits and launches a runner:

#!/usr/bin/env bash
##
# Detects new commits in a given repo by checking every minute
# and storing a reference to the last seen commit in a file called
# prev.
##

PAT=$1
OWNER=$2
REPO=$3
prev=prev

# make sure we have values for all our arguments
[ -z "${PAT}" ] || [ -z "${OWNER}" ] || [ -z "${REPO}" ] && {
    echo "Incorrect usage, example: ./orc.sh personal-access-token owner some-repo"
    exit 1
}

# make sure the prev file exists
touch ${prev}

# get the latest commit
latest_commit=$(curl -s -H "authorization: token ${PAT}"  \
    https://api.github.com/repos/${OWNER}/${REPO}/commits |\
    jq -r .[0].sha)

# read the last commit we saw
prev_commit=$(cat ${prev})

echo "Latest commit is ${latest_commit}"
echo "Previous commit is ${prev_commit}"

# compare the commits, running an action runner if they differ
if [ "${latest_commit}" != "${prev_commit}" ]; then
    echo "Detected new commit, starting runner"
    docker run -d --rm actions-image \
        ${OWNER} \
        ${REPO} \
        ${PAT} \
        $(uuidgen)
fi

# update the previous commit store with the latest commit
echo ${latest_commit} > ${prev}

To test it, create a new commit in the repo you are connecting your actions runners to and run this script against that repo, e.g.:

$ ./orc.sh ${PAT} ${OWNER} ${REPO}
Latest commit is 4ceb832f5386c812c0a296968197a1d242f8e8fd
Previous commit is c7e9b463a90a4c5eeab16ba9ccf2afdc1b28153b
Detected new commit, starting runner
68101fbf5c801afaa378ebbb6dc988c5e264fa784bc2c1353025f125f56bbc22

Awesome! To see the container:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
68101fbf5c80        actions-image       "./entrypoint.sh way…"   26 seconds ago      Up 25 seconds                           adoring_perlman

And the logs:

$ docker logs -f adoring_perlman

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration


√ Runner successfully added
√ Runner connection is good

# Runner settings


√ Settings Saved.


√ Connected to GitHub

2020-02-02 18:38:25Z: Listening for Jobs

An issue with our process

Hmm, it doesn't seem to be starting any jobs, and if we look at the repo we see that the check has failed!

This is a problem. It seems you always need at least one self-hosted runner running, or commits will initially fail. For now we can manually kick off the checks again from the UI with the Re-run checks button, and our runner should start the job.

$ docker logs -f adoring_perlman
...
2020-02-02 18:38:25Z: Listening for Jobs
2020-02-02 18:42:37Z: Running job: build
2020-02-02 18:43:00Z: Job build completed with result: Succeeded

# Runner removal


√ Runner removed successfully
√ Removed .credentials
√ Removed .runner

Given this issue, we need to change tactics!

Changing tactics

From a bit of testing it seems we can get around this issue by configuring a runner that we never actually run. Let's update the entrypoint script to allow registration only:

#!/usr/bin/env bash

OWNER=$1
REPO=$2
PAT=$3
NAME=$4

# if set this script will only run ./config.sh
# it will not run the actions runner
REGISTER_ONLY=$5

cleanup() {
    token=$(curl -s -XPOST -H "authorization: token ${PAT}" \
        https://api.github.com/repos/${OWNER}/${REPO}/actions/runners/registration-token | jq -r .token)
    ./config.sh remove --token $token
}

token=$(curl -s -XPOST \
    -H "authorization: token ${PAT}" \
    https://api.github.com/repos/${OWNER}/${REPO}/actions/runners/registration-token | jq -r .token)

./config.sh \
    --url https://github.com/${OWNER}/${REPO} \
    --token ${token} \
    --name ${NAME} \
    --work _work

if [ -z "${REGISTER_ONLY}" ]; then
    ./run.sh --once
    cleanup
fi

The code up to this point can be found here. Now let's build:

$ docker build -t actions-image .
Sending build context to Docker daemon    105kB
Step 1/9 : FROM ubuntu
 ---> 775349758637
...
Successfully built f82508c0ebb8
Successfully tagged actions-image:latest

And run it, but this time so that it only registers the runner and does not stay waiting for jobs:

$ docker run -ti --rm actions-image ${OWNER} ${REPO} ${PAT} test true

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration


√ Runner successfully added
√ Runner connection is good

# Runner settings


√ Settings Saved.

We are setting the fifth parameter to true here, which in turn sets the ${REGISTER_ONLY} variable in our entrypoint.sh script, meaning ./run.sh is never called. Now if we go to Settings -> Actions for our repo in the UI, we should see an offline runner called test.
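The same check can probably be done without the UI, using GitHub's REST endpoint for listing a repository's self-hosted runners (/repos/:owner/:repo/actions/runners). A minimal sketch of reading a runner's status out of such a response; runner_status is a hypothetical helper and the sample JSON is a hand-written, trimmed-down stand-in for the real response:

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a "list self-hosted runners" JSON response
# from stdin and prints the status of the runner named $1 (empty output
# if no runner has that name).
runner_status() {
    jq -r --arg name "$1" '.runners[] | select(.name == $name) | .status'
}

# Made-up, trimmed-down response with a single offline runner.
sample='{"total_count":1,"runners":[{"id":1,"name":"test","status":"offline"}]}'

# Prints: offline
printf '%s' "${sample}" | runner_status test
```

Against the live API, the input would come from a curl call with the same authorization header used elsewhere in this post, piped into runner_status test.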

Great! Now let's commit to our repo again, and hopefully the check run will be queued instead of failing.

Awesome! Finally let's spin up a runner to actually run the build:

$ ./orc.sh ${PAT} ${OWNER} ${REPO}
Latest commit is 1fb4c5fd212e8aba177a15b6016c68b7c005b74a
Previous commit is
Detected new commit, starting runner
4ecebf7f0a5fdd75c216d490ef67afa7a9d3e0c3b30848c150b148238eccbd7c

$ docker logs suspicious_ardinghelli -f

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration


√ Runner successfully added
√ Runner connection is good

# Runner settings


√ Settings Saved.


√ Connected to GitHub

2020-02-02 19:14:33Z: Listening for Jobs
2020-02-02 19:14:36Z: Running job: build
2020-02-02 19:14:54Z: Job build completed with result: Succeeded

# Runner removal


√ Runner removed successfully
√ Removed .credentials
√ Removed .runner

Woop! All working again. Now let's add some improvements to orc.sh and add a scheduler.

Adding cron

Let's get orc.sh running every minute with cron. Just a small update first, so that it writes the prev file to the directory the script itself is located in:

#!/usr/bin/env bash
##
# Detects new commits in a given repo by checking every minute
# and storing a reference to the last seen commit in a file called
# prev.
##

PAT=$1
OWNER=$2
REPO=$3

# the directory of this script 
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
prev="${cur_dir}/prev"

# make sure we have values for all our arguments
[ -z "${PAT}" ] || [ -z "${OWNER}" ] || [ -z "${REPO}" ] && {
    echo "Incorrect usage, example: ./orc.sh personal-access-token owner some-repo"
    exit 1
}

# make sure the prev file exists
touch ${prev}

# get the latest commit
latest_commit=$(curl -s -H "authorization: token ${PAT}"  \
    https://api.github.com/repos/${OWNER}/${REPO}/commits |\
    jq -r .[0].sha)

# read the last commit we saw
prev_commit=$(cat ${prev})

echo "Latest commit is ${latest_commit}"
echo "Previous commit is ${prev_commit}"

# compare the commits, running an action runner if they differ
if [ "${latest_commit}" != "${prev_commit}" ]; then
    echo "Detected new commit, starting runner"
    docker run -d --rm actions-image \
        ${OWNER} \
        ${REPO} \
        ${PAT} \
        $(uuidgen)
fi

# update the previous commit store with the latest commit
echo ${latest_commit} > ${prev}

I took the cur_dir var from this post: https://stackoverflow.com/questions/59895/how-to-get-the-source-directory-of-a-bash-script-from-within-the-script-itself.

Latest code up to this point is here.

To add orc.sh to cron, first copy it to /var/tmp (we can pick a better location later) and then run crontab -e to edit the crontab for your user:

$ cp orc.sh /var/tmp
$ crontab -e 

# Edit this file to introduce tasks to be run by cron.
#
...
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
* * * * * /var/tmp/orc.sh ${PAT} ${OWNER} ${REPO}

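Replace ${PAT}, ${OWNER} and ${REPO} in that line with your actual values, since cron jobs do not inherit variables from your interactive shell. You can make sure it is running by checking the cron logs. It should run every minute: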

$ journalctl -u cron -f
-- Logs begin at Sat 2019-08-10 18:05:10 IST. --
...
Feb 02 19:30:01 sky CRON[31363]: (chaospie) CMD (/var/tmp/orc.sh ${PAT} ${OWNER} ${REPO} )
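Because cron starts a new orc.sh every minute whether or not the previous one has finished, two runs could in principle read and write the prev file at the same time. One way to guard against that is a lock directory. This is a sketch only: the lock path and the acquire_lock and release_lock names are assumptions, not part of the original script.

```shell
#!/usr/bin/env bash

# mkdir is atomic: it fails if the directory already exists, so only
# one process can hold the lock at a time.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1"
}

# Hypothetical lock location; a temp dir keeps this example self-contained.
lock="$(mktemp -d)/orc.lock"

if acquire_lock "${lock}"; then
    echo "got lock, a real script would check for commits here"
    # A second attempt while the lock is held is refused.
    acquire_lock "${lock}" || echo "second attempt refused"
    release_lock "${lock}"
else
    echo "another run is in progress, exiting"
fi
```

In orc.sh the lock would need a fixed path so that separate cron invocations see the same directory, and a run that dies without releasing it would leave a stale lock behind.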

Now if you commit, a container should be created and run your actions! Note that because orc.sh does not know about any previous commit on its first run, it will always launch a container the first time.
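One possible way to avoid that first-run launch, and also to avoid launching when the API call fails and jq prints null, is to validate both values before comparing them. A minimal sketch; is_sha and should_launch are hypothetical names, not part of the original orc.sh:

```shell
#!/usr/bin/env bash

# Succeeds only if $1 looks like a full 40-character hex SHA-1.
is_sha() {
    [[ "$1" =~ ^[0-9a-f]{40}$ ]]
}

# Succeeds only when the latest value is a valid SHA, a previous SHA
# was recorded, and the two differ. An empty prev (first run) or a
# failed lookup (empty or "null") never triggers a launch.
should_launch() {
    local latest="$1" prev="$2"
    is_sha "${latest}" && is_sha "${prev}" && [ "${latest}" != "${prev}" ]
}

a=4ceb832f5386c812c0a296968197a1d242f8e8fd
b=c7e9b463a90a4c5eeab16ba9ccf2afdc1b28153b

should_launch "${a}" "${b}" && echo "launch"        # new commit
should_launch "${a}" ""     || echo "skip: no prev" # first run
should_launch "null" "${b}" || echo "skip: bad sha" # failed lookup
```

With a check like this, the prev file should probably only be updated when the latest value is a valid SHA, otherwise a failed lookup would overwrite the last good commit.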

Conclusion

We now have a super simple way of launching actions runners per commit. However, the way we are doing this has a lot of problems:

orc.sh will launch a container when it initially runs, whether there is something to build or not.
We launch a runner per commit. If there are multiple workflows defined in a repo, we will only ever build one and the others will sit idle. We can fix this by using information from the checks API to figure out when to launch a runner and how many we may need.
The cron job checks for new commits only once a minute, and we can do better than this. For example, we could use webhooks so the builds run in real time.
In my testing I have seen some cases where a runner fails to be cleaned up and sometimes you cannot even remove them from the UI. When this has happened I've had to register a new runner with the same name and then remove that.
Instead of bash + cron it would be better to create a webservice for this, especially if we are going down the webhooks route.
This can currently only run on a single machine and isn't really scalable as is.
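As a rough illustration of the checks API idea from the list above, the response from listing a commit's check runs could be reduced to a count of runs that are still queued. This is a sketch under assumptions: queued_count is a hypothetical helper, the sample JSON is hand-written and trimmed down, the job names are made up, and treating each queued check run as one needed runner is only a guess at a reasonable policy.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a "list check runs" JSON response from
# stdin and prints how many check runs are still queued.
queued_count() {
    jq '[.check_runs[] | select(.status == "queued")] | length'
}

# Made-up, trimmed-down response: two queued runs and one completed.
sample='{"total_count":3,"check_runs":[
  {"name":"build","status":"queued"},
  {"name":"lint","status":"queued"},
  {"name":"docs","status":"completed"}]}'

# Prints: 2
printf '%s' "${sample}" | queued_count
```

The orchestrator could then launch that many containers instead of always launching exactly one.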

In the next post I'll tackle some of these issues. The code up to this point can be found here.
