No, not preposterous. Powerful.
DevOps is all about rapid delivery. Using Python to make your pipelines "smart" can help you achieve that goal.
Let's say you have a suite of Angular libraries your team created to use for their applications. A robust pipeline for this project will include the following:
- validation jobs that make sure
  - there's no "fit" or "fdescribe" to narrow down unit tests
  - there are no "dist" or ".angular" folders present
  - eslint passes
- tests
  - unit tests
  - integration tests
- security scans
  - npm audit
  - SAST scans (SonarQube, Fortify, etc.)
- publication
  - snapshots
  - release candidates
  - releases
At an enterprise level, you could be dealing with upwards of 20 libraries. You could have a job for each library, but speaking from experience, that will quickly turn your pipeline into a big nasty pile of spaghetti. Nobody wants to deal with a big nasty pile of spaghetti.
Worse still, the time it'll take for your pipelines to run will slow things down to an agonizing crawl and have your team pulling their hair out in frustration every time they push a change up.
Or you can have an easy-to-read, easy-to-maintain Python script that figures out which libraries actually changed and only runs the jobs for those libraries.
Let's use a mock scenario to illustrate my point. For the sake of brevity we'll just have our pipeline:
- Run eslint
- Run unit tests
- Publish snapshots off of our feature branches
Here's a project that demonstrates this using both GitHub Actions and GitLab CI/CD. I'm including both because I think most people use GitHub Actions but I know many companies also use GitLab for their enterprise applications (plus I'm way more familiar with GitLab CI/CD).
The major points I'll be going over will be:
- Setting up a runner image
- Setting up your CI file
  - GitHub Actions
  - GitLab CI/CD
- Handling secrets
  - Creating an auth token for npmjs.org
  - Creating Actions secrets in GitHub
  - Creating masked environment variables in GitLab
- Pythonizing your pipeline
Setting up a runner image
There is a boatload of Docker images to choose from when picking a pipeline runner image. Personally, I like having total control over my pipelines and prefer to use my own image. I like Alpine because it's tiny compared to the more popular Ubuntu. Tiny is good because it loads faster and has a smaller attack surface.
Based on the listed requirements above, our image will need to support Python, the packages our pipeline script will be using, nodejs, and a browser for our unit tests.
Here's a good example of an image that will serve our needs:
```dockerfile
FROM alpine:latest

RUN apk add --no-cache --update python3-dev gcc libc-dev libffi-dev git && \
    ln -sf /usr/share/zoneinfo/America/Chicago /etc/localtime && \
    ln -sf python3 /usr/bin/python && \
    ln -sf pip3 /usr/bin/pip

COPY ./scripts/requirements.txt /tmp

RUN python -m ensurepip && \
    python -m pip install --no-cache --upgrade -r /tmp/requirements.txt

RUN apk add --update --repository http://dl-cdn.alpinelinux.org/alpine/v3.16/main nodejs=16.17.1-r0 npm && \
    apk add --no-cache chromium --repository http://dl-cdn.alpinelinux.org/alpine/v3.16/community

ENV CHROME_BIN=/usr/bin/chromium-browser CHROME_PATH=/usr/lib/chromium
```
Super duper. Assuming you know your way around Docker, we can publish our image to whatever registry we use. If you're not familiar with Docker, don't fret. Their documentation is outstanding and well worth spending time reading through.
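One thing the Dockerfile references but never shows is scripts/requirements.txt. For the script in this article it would need at least GitPython (for the diff logic) and pylint (for the pylint job). Here's a minimal sketch; the exact contents are my assumption rather than something pulled from the original project:

```text
# scripts/requirements.txt (assumed contents)
gitpython
pylint
```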
Setting up your CI file
GitHub Actions
As far as I can tell, GitHub doesn't support using a custom pipeline runner image directly. You'll need to run your job on a natively supported runner and set your custom image as the job's `container`. Kinda icky, but it will still serve you.
Disclaimer: I'm still learning my way around GitHub Actions so the below yml file is far from perfect. It works but I'd love to hear about ways it could be improved/optimized.
.github/workflows/ci.yml
```yaml
name: CI

on: push

jobs:
  CI:
    runs-on: ubuntu-latest
    container: stevewhitmore/nodejs-python
    steps:
      - uses: actions/checkout@v3
      - name: Cache node modules
        id: cache-npm
        uses: actions/cache@v3
        env:
          cache-name: cache-node-modules
        with:
          path: ~/.npm
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-
      - name: Allow me to run my script
        run: git config --global safe.directory '*'
      - name: pylint
        run: |
          python -m pylint --version
          PYTHONPATH=${PYTHONPATH}:$(dirname %d) python -m pylint scripts/ci.py
      - name: eslint
        run: python scripts/ci.py eslint
      - name: Unit Tests
        run: python scripts/ci.py unit_tests
      - name: Publish Snapshots
        run: |
          echo "//registry.npmjs.org/:_authToken=${{ secrets.NPM_TOKEN }}" > .npmrc
          python scripts/ci.py publish_snapshots
```
GitLab CI/CD
.gitlab-ci.yml
```yaml
image: stevewhitmore/nodejs-python

stages:
  - validation
  - test
  - snapshot

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .npm/

pylint:
  stage: validation
  script:
    - python -m pylint --version
    - PYTHONPATH=${PYTHONPATH}:$(dirname %d) python -m pylint scripts/ci.py
  except:
    - tags

eslint:
  stage: validation
  cache:
    key: ${CI_COMMIT_REF_SLUG}
  script:
    - python scripts/ci.py eslint
  except:
    - tags

unit_tests:
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}
  script:
    - python scripts/ci.py unit_tests
  except:
    - tags

npm_publish_snapshot:
  stage: snapshot
  cache:
    key: ${CI_COMMIT_REF_SLUG}
  script:
    - echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc
    - python scripts/ci.py publish_snapshots
  except:
    - main
    - tags
```
Handling secrets
NPM needs to know where your packages (libraries) will be registered, and there needs to be some kind of authentication. It gets this information from the `.npmrc` file. Assuming you're publishing to npmjs.org, your pipeline's `.npmrc` file will look pretty much like the one created by the `echo` line in the CI files above.

Note: This `.npmrc` file is created and only exists during the lifecycle of the pipeline job. DON'T use the `.npmrc` file from your personal workstation!

The `NPM_TOKEN` is an environment variable we'll pass in from the project's settings. Be sure to mask this variable or you'll be inviting the world to publish npm packages on your behalf. GitHub does this automatically, but GitLab requires an additional step.
Creating an auth token for npmjs.org
- Sign into npmjs.org and click on your username on the far right. Select "Access Tokens"
- Click "Generate New Token" on the far right
- Name your token and select the "Automation" option. This is the ideal option because it will bypass two-factor authentication (which you absolutely should have set up).
- Copy the token to a file so you don't lose it.
No, you can't use this token. It has been deleted.
Creating Actions secrets in GitHub
- Go to the project Settings > Secrets > Actions. Click "New repository secret"
- Fill out the "Name" and "Secret" fields with `NPM_TOKEN` and your auth token, then click "Add secret"
Creating masked environment variables in GitLab
- Go to the project Settings > CI/CD
- Expand the "Variables" section and click "Add Variable"
- Add your auth token to your project as `NPM_TOKEN`. Make sure the "Masked" checkbox is checked.
Pythonizing your pipeline
Now for the fun part. Let the Pythonization commence!
Create a folder named "scripts" at the root of your project. In that folder, create a file named `ci.py`.
Each Python script call in our CI file passed in an argument. Each of those arguments triggers a specific function that lives in the script file.
For example, the `unit_tests` job has the following line:

```bash
python scripts/ci.py unit_tests
```

We're passing `unit_tests` to the `ci.py` file.
scripts/ci.py
```python
import sys

# ...

def unit_tests():
    """Runs unit tests on libraries with changes"""
    npm_command("test")

locals()[sys.argv[1]]()
```
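That last `locals()[sys.argv[1]]()` line is what turns the CI argument into a function call. It works, but it raises a fairly cryptic `KeyError` if a job passes an argument that doesn't match a function name. If you want a friendlier failure, a hedged alternative is an explicit dispatch dictionary; the `COMMANDS` table below is my own sketch, not the author's code:

```python
# Hypothetical stricter entry point for scripts/ci.py:
# map CI arguments to functions explicitly instead of relying on locals().
COMMANDS = {
    "eslint": eslint,
    "unit_tests": unit_tests,
    "publish_snapshots": publish_snapshots,
}

if __name__ == "__main__":
    job_name = sys.argv[1]
    if job_name not in COMMANDS:
        sys.exit(f"Unknown CI command: {job_name}")
    COMMANDS[job_name]()
```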
Let's assume `npm_command()` handles whatever npm command you pass in (shocking, I know). It would look something like this:
```python
def npm_command(command):
    """Runs npm commands depending on input"""
    npm_install()
    diffs = get_diffs()
    for library in diffs:
        # Assumes the root package.json defines a script per library,
        # e.g. "test-<library>" and "eslint-<library>"
        subprocess.check_call(f"npm run {command}-{library}", shell=True)
```
That `get_diffs()` function uses the GitPython package to compare the changes on your branch with the default origin branch (main). It finds all the git diffs, plucks out the library name, and returns a set of library names to avoid duplication.
```python
def get_diffs():
    """Gets the git diffs to determine which libraries to run operations on"""
    path = os.getcwd()
    repo = Repo(path)
    repo.remotes.origin.fetch()
    diffs = str(repo.git.diff('origin/main', name_only=True)).splitlines()
    updated_libraries = []
    for diff in diffs:
        if diff.startswith("projects"):
            path_parts = diff.split("/")
            updated_libraries.append(path_parts[1])
    return set(updated_libraries)
```
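To make that concrete, here's the same parsing logic run against a made-up list of diff paths (the library names are hypothetical):

```python
# Pretend these are the paths returned by repo.git.diff('origin/main', name_only=True)
diffs = [
    "projects/button-lib/src/lib/button.component.ts",
    "projects/button-lib/src/lib/button.component.spec.ts",
    "projects/card-lib/package.json",
    "README.md",
]

# Only paths under "projects/" count, and the second path segment is the library name
updated_libraries = {diff.split("/")[1] for diff in diffs if diff.startswith("projects")}

print(updated_libraries)  # {'button-lib', 'card-lib'}
```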
Great, but what about something a little more complex, like publishing snapshots? NPM doesn't allow for duplicate version numbers, so how would we handle that?
Let's take another look at that `npm_publish_snapshot` job.

```bash
python scripts/ci.py publish_snapshots
```

So it'll call the `publish_snapshots()` function in our script.
```python
def publish_snapshots():
    """Publishes npm snapshots on libraries with changes"""
    npm_command("publish snapshots")
```
Not super helpful so far. Let's take another look at that `npm_command()` function.
```python
def npm_command(command):
    """Runs npm commands depending on input"""
    npm_install()
    diffs = get_diffs()
    for library in diffs:
        if command == "publish snapshots":
            handle_snapshot_publication(library)
        else:
            subprocess.check_call(f"npm run {command}-{library}", shell=True)
```
Now there's an if/else block in our loop. The `handle_snapshot_publication()` function should append a unique snapshot version to the changed library, build it, then publish it.
```python
def handle_snapshot_publication(library):
    """Updates version with snapshot, builds, and publishes snapshot"""
    package_json_path = f"./projects/{library}/package.json"
    with open(package_json_path, "r", encoding="UTF-8") as package_json:
        contents = json.load(package_json)
    version = contents["version"]
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    is_snapshot_version = re.match("\\s*([\\d.]+)-SNAPSHOT-([\\d-]+)", version)
    if is_snapshot_version:
        version = version.split("-")[0]
    contents["version"] = f"{version}-SNAPSHOT-{timestamp}"
    with open(package_json_path, "w", encoding="UTF-8") as package_json:
        package_json.write(json.dumps(contents, indent=2))
    subprocess.check_call(f"npm run build-{library}", shell=True)
    subprocess.check_call(f"npm publish --access=public ./dist/{library}", shell=True)
```
The above function reads the changed library's `package.json` file, parses out the version number, and replaces it with the version number plus a timestamp. So version `1.2.3` becomes `1.2.3-SNAPSHOT-{year/month/day-hour/minute/second}` (e.g. `1.2.3-SNAPSHOT-20221110-075530`). It also has a check, `is_snapshot_version`, in case you're rerunning a job, which prevents funky versions like `1.2.3-SNAPSHOT-{timestamp}-SNAPSHOT-{timestamp}` from being generated.
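If it helps to see that rerun guard in isolation, here's what the regex check does to a version string left over from a previous snapshot run (the timestamp is just an example):

```python
import re

version = "1.2.3-SNAPSHOT-20221110-075530"  # already snapshotted on a previous run

# Same pattern used in handle_snapshot_publication()
if re.match("\\s*([\\d.]+)-SNAPSHOT-([\\d-]+)", version):
    version = version.split("-")[0]

print(version)  # 1.2.3
```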
Let's clean that up a little bit so each function does one thing.
```python
def append_snapshot_version(library):
    """Appends "-SNAPSHOT-" plus timestamp (down to the second) to library version"""
    package_json_path = f"./projects/{library}/package.json"
    with open(package_json_path, "r", encoding="UTF-8") as package_json:
        contents = json.load(package_json)
    version = contents["version"]
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    is_snapshot_version = re.match("\\s*([\\d.]+)-SNAPSHOT-([\\d-]+)", version)
    if is_snapshot_version:
        version = version.split("-")[0]
    contents["version"] = f"{version}-SNAPSHOT-{timestamp}"
    with open(package_json_path, "w", encoding="UTF-8") as package_json:
        package_json.write(json.dumps(contents, indent=2))


def handle_snapshot_publication(library):
    """Updates version with snapshot, builds, and publishes snapshot"""
    append_snapshot_version(library)
    subprocess.check_call(f"npm run build-{library}", shell=True)
    subprocess.check_call(f"npm publish --access=public ./dist/{library}", shell=True)
```
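For completeness, two functions referenced above never appear in the article: the `eslint` entry point and `npm_install()`. Following the same pattern, they presumably look something like the sketch below; the `npm ci` call in particular is my assumption, not the author's code:

```python
def eslint():
    """Runs eslint on libraries with changes"""
    npm_command("eslint")  # runs "npm run eslint-<library>" for each changed library


def npm_install():
    """Installs dependencies before any per-library npm scripts run"""
    subprocess.check_call("npm ci", shell=True)
```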
By now you should be seeing the pattern. You can see from the job outputs that the pipeline is only running the jobs on the changed libraries:
GitHub Actions job output (screenshot)

GitLab CI/CD job output (screenshot)
Enjoy a less painful pipeline with the power of Python!
Top comments (3)
As much as I agree this is awesome, I don't see exactly where python shines here. I mean, you could do the same with nodejs, hence avoiding setting up yet another language framework (one less requirement). Linting with a js linter would also be easier in this case.
Let's be clear, I love python and would also choose it above nodejs, but would you care to explain a bit more your arguments in its favor?
PS great article and good job!
You know what, you make a very good point. I opted to go with Python because my team uses a common pipeline that serves many projects written in different languages and different frameworks. I just followed that pattern for this particular scenario and I'm a bit embarrassed to admit that this had not even occurred to me!
I'd be interested in hearing more about these tools. I agree less code is always better. My team went this route because we had pretty custom needs and we felt using our own scripts would be the most straightforward route.