The Goal 🥅
Over the past month, I developed a containerized Streamlit webapp in Python that I then deployed manually to AWS. With a proof-of-concept in place, it is time to start automating the testing, building, and deployment of my application and its infrastructure.
The goal of this automation step is to push a new container image to an Amazon Elastic Container Registry (ECR) repository whenever changes to the application files are committed and pushed to GitHub. There's a lot of new stuff to learn in this project, so I stuck with a familiar CI/CD platform: GitHub Actions.
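Concretely, that means a GitHub Actions workflow that runs on pushes and ends with a `docker push` to ECR. A bare-bones skeleton (illustrative only; the real trigger and steps get built up over the rest of this post) looks something like:

```yaml
name: Build and push to ECR
on:
  push:
    branches: ['dev']
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # ...login, build, tag, and push steps covered below...
```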
Planning 📝
From the manual deployment, I knew I would need to (a rough sketch of the equivalent commands follows this list):
- log in to Amazon ECR;
- build the application image from the `./app` directory;
- tag the image with the correct registry and repository;
- and push the image so that it can be used in a cluster.
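Strung together as a single naive workflow step, those four tasks look roughly like the sketch below. This is only a sketch: the `skyboy` image name comes from my repo, `REGISTRY_ALIAS` is a placeholder, and AWS credentials are assumed to already be configured on the runner.

```yaml
- name: Build and push (naive all-in-one sketch)
  env:
    REGISTRY_ALIAS: my-alias   # placeholder: the public ECR registry alias
  run: |
    # 1. Log in to the public ECR registry (public ECR auth tokens are issued from us-east-1)
    aws ecr-public get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin public.ecr.aws
    # 2. Build the application image from the ./app directory
    docker build -t skyboy ./app
    # 3. Tag it with the target registry and repository
    docker tag skyboy "public.ecr.aws/$REGISTRY_ALIAS/skyboy:latest"
    # 4. Push it so it can be pulled into a cluster
    docker push "public.ecr.aws/$REGISTRY_ALIAS/skyboy:latest"
```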
I started by logging into the AWS console and creating a public ECR repository in my dev account. (The POC is deployed in my production account.) This choice - a public repo vs. a private repo - will become important soon.
I also created a 'dev' branch on my git repository, so that I could push and test the workflow without committing code to the 'main' branch.
Taking (ok, finding) Action(s) 🎯
While it's possible to simply run bash commands on the virtual machine that is provisioned for an Action workflow run, there are a lot of community-authored actions available that abstract away complex API calls and simplify automated tasks. My first stop was the GitHub Marketplace to find actions related to Docker and ECR.
After getting a handle on my (many) options, I searched for and found a couple blog posts about this process, to see how other people had set up their workflows. I took some notes, discovered a few features I wanted to incorporate into my process, and read some documentation. The heavy lifting would be done with the `docker/build-push-action`, which has a lot of features and builds images with Docker's Buildx. Down the rabbit hole I go...
Authentication and Authorization ✔️
There are several ways to authenticate to AWS. The simplest is to provide an Access Key ID and Secret Access Key for programmatic CLI access. The challenges with this method are safeguarding the AWS credentials and ensuring least-privilege access for the entity using them. I discovered an alternative way to grant access to AWS resources: assuming an IAM role using GitHub's OpenID Connect (OIDC) provider.
Setting up the provider in the IAM console was not difficult. I did spend quite a few minutes figuring out how to get OpenSSL to output the thumbprint of GitHub's security certificate, since it was recommended to validate the thumbprint that the IAM console calculated. Having accomplished that, I created a role and attached a permissions policy and trust policy that limited that role's access. With the role's ARN entered as a GitHub repository secret, I was able to add a credentials block to my growing YAML workflow configuration:
```yaml
- name: Configure dev AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  if: github.ref == 'refs/heads/dev'
  with:
    role-to-assume: ${{ secrets.DEV_ROLE_ARN }}
    aws-region: ${{ secrets.AWS_REGION }}
    role-session-name: DevSession
- name: Configure prod AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  if: github.ref == 'refs/heads/main'
  with:
    role-to-assume: ${{ secrets.PROD_ROLE_ARN }}
    aws-region: ${{ secrets.AWS_REGION }}
    role-session-name: ProdSession
```
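For reference, the OIDC flow also requires the workflow to be granted permission to request an ID token; without it, the `configure-aws-credentials` action can't exchange GitHub's token for AWS credentials. A minimal permissions block (set at the workflow or job level) looks like this:

```yaml
permissions:
  id-token: write   # allow the job to request the OIDC JWT
  contents: read    # still needed for actions/checkout
```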
I used the `if` conditional in the hopes of setting up this workflow to run on both the 'dev' and 'main' branches of my repo. While the condition works, I ended up not using these actions because of the second step in the authorization process: logging into the ECR repository.
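In principle, GitHub's expression syntax could collapse those two conditional steps into a single one by selecting the role ARN based on the branch. I haven't verified this, so treat it as a sketch:

```yaml
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    # pick the prod role on main, otherwise fall back to the dev role
    role-to-assume: ${{ github.ref == 'refs/heads/main' && secrets.PROD_ROLE_ARN || secrets.DEV_ROLE_ARN }}
    aws-region: ${{ secrets.AWS_REGION }}
    role-session-name: ${{ github.ref == 'refs/heads/main' && 'ProdSession' || 'DevSession' }}
```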
The action I initially chose for the ECR login step seemed straightforward enough: without requiring any inputs, it would use the credentials returned by the OIDC provider to grant access to the desired repo. However, after assembling the rest of the workflow and watching a few runs fail, I dug into the action's open issues and discovered that it appears to have been written only for private ECR repos. 🤦
No matter: there are other actions to log into container image repositories. I followed a suggestion and looked at Docker's login action, which presented options for both private and public ECR repos. Unfortunately, the public repo option does not make use of the `configure-aws-credentials` action, which meant that - for now - the work I did to set up OIDC was for naught. I created an IAM user with limited permissions in my dev account, passed the credentials into GitHub Secrets, and I was almost out of the woods:
```yaml
- name: Login to Amazon Public ECR
  uses: docker/login-action@v1
  with:
    registry: public.ecr.aws
    username: ${{ secrets.DEV_ACCESS_KEY_ID }}
    password: ${{ secrets.DEV_SECRET_ACCESS_KEY }}
  # env:
  #   AWS_REGION: us-east-2
```
As it turned out, this login action didn't work with the region input active; with no region specified, login worked and I could move on.
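That behavior makes sense in hindsight: public ECR auth tokens are only issued from us-east-1, regardless of where the images are served from. For context, the login the action performs is roughly equivalent to this run step (assuming AWS credentials are already available on the runner):

```yaml
- name: Login to Amazon Public ECR (CLI equivalent)
  run: |
    # fetch a public ECR token and pipe it straight into docker login
    aws ecr-public get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin public.ecr.aws
```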
Secrets Are Hard 🥵
I ran up against a seemingly intractable problem as I was going over the image build process: my application depends on an API token being passed in via a .toml file within the app's directory structure. TOML, a human-readable configuration format, has no mechanism for referencing environment variables. I didn't want to commit a config file with my API token hard-coded into it to GitHub, and after more than an hour of research, I was at a loss for how to insert that value appropriately before building the image.
After sleeping on this problem, I came up with a simple solution that keeps the API key protected. In the same directory as the `config.toml` file (which does not get committed or pushed, thanks to .gitignore), I created a copy of that file called `config.template`. Where the hard-coded token would go, the `.template` file reads 'TOKEN_PLACEHOLDER'. I passed the API token to the workflow runner as an environment variable, and used a `sed` command to substitute in the token and create `config.toml` in the directory structure on the runner before building the image:
```yaml
- name: Create config.toml file
  env:
    TOKEN: ${{ secrets.MAPBOX_API_TOKEN }}
  run: sed "s/TOKEN_PLACEHOLDER/$TOKEN/g" ./app/.streamlit/config.template > ./app/.streamlit/config.toml
```
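One defensive addition (not part of my original workflow, just an idea): fail fast if the substitution ever silently breaks and the placeholder survives into `config.toml`.

```yaml
- name: Verify token substitution
  run: |
    # fail the job if the placeholder is still present after the sed step
    if grep -q "TOKEN_PLACEHOLDER" ./app/.streamlit/config.toml; then
      echo "config.toml still contains TOKEN_PLACEHOLDER" >&2
      exit 1
    fi
```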
Almost Done... Don't Forget the Cache! 💵
One of the interesting features of the `docker/build-push-action` is the ability to cache the container layers. GitHub allows up to 10GB of cached data per repository, and persisting the container layers means faster build times after the first run.
Fortunately, all of the necessary inputs and file paths have already been published by the action's authors, so setting up that process (and moving the cache after the build to keep it from growing to the maximum limit) was an easy addition to the build action:
```yaml
- name: Set up Docker Buildx
  id: buildx
  uses: docker/setup-buildx-action@master
- name: Cache Docker layers
  uses: actions/cache@v2
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-
- name: Build Docker image
  uses: docker/build-push-action@v2
  with:
    context: ./app
    builder: ${{ steps.buildx.outputs.name }}
    push: true
    tags: ${{ secrets.DEV_ECR_REGISTRY }}/skyboy:latest
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache-new
- name: Move cache
  run: |
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```
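As an aside, Buildx also has a GitHub Actions cache backend (`type=gha`) that talks to the cache service directly and removes the need for the local cache directory and the move-cache workaround. Support depends on the Buildx/BuildKit versions available on the runner, so treat this as a variant to verify rather than a drop-in replacement:

```yaml
- name: Build Docker image
  uses: docker/build-push-action@v2
  with:
    context: ./app
    push: true
    tags: ${{ secrets.DEV_ECR_REGISTRY }}/skyboy:latest
    # cache layers in the GitHub Actions cache instead of a local directory
    cache-from: type=gha
    cache-to: type=gha,mode=max
```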
Wrap-up 🦥
(Yeah, it's a sloth. Why not?)
Over the course of this process, the workflow ran ten times, with two successful runs. The first successful run took 2m 37s to complete, and the second - after another failed attempt to re-implement the OIDC provider - took only 1m 7s, demonstrating the benefit of layer caching.
The final modification I made to my workflow configuration was to modify the trigger:
```yaml
on:
  push:
    branches:
      - 'dev'
    paths:
      - 'app/**'
```
The `paths` syntax is a clever way to prevent this workflow from triggering unless changes are pushed to any of the files in the `app` directory. I tested this after adding that syntax by editing and pushing the repo's `README.md` file. Since the README file is in the root directory, the workflow was not triggered.
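One related tweak worth considering (not something I've added yet): as written, the filter also means that edits to the workflow file itself won't trigger a run, so including the workflows directory in `paths` may be useful:

```yaml
paths:
  - 'app/**'
  - '.github/workflows/**'
```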
This was quite the journey, sending me down several deep rabbit holes and throwing plenty of errors to troubleshoot. I'd like to figure out how to make this single workflow configuration function on both the 'dev' and 'main' branches; I have a couple ideas to explore in that regard. I would also like to find a way to use the OIDC provider to authenticate to AWS. I imagine there are some other best practices that might be good to implement as well. For now, in the spirit of having an MVP, I'm pleased that this workflow runs successfully!
Next up: provisioning the webapp's AWS infrastructure with Terraform.
Onward and upward!