When we first started developing Tilt, we broke ALL THE TIME.
Either Kubernetes changed, or we had a subtle misunderstanding of how the API worked. Our changes would pass unit tests, but fail against a real Kubernetes cluster.
I built out an integration test suite that used the latest version of Tilt to deploy real sample projects against a real cluster.
At the start, it was slow and flaky. But the tooling around running Kubernetes in CI has come a long way, especially in the last 1-2 years. Now it's less flaky than our normal unit tests 😬. Every new example repo we set up uses a one-time Kubernetes cluster to run tests against.
A few of our friends have been asking us how we set it up and how to run their own clusters in CI. I've now explained it enough times that I should probably write down what we learned.
Here are three ways to set it up, with the pros and cons of each!
Here's how I set up our first integration test framework.
I created a dedicated gcr.io bucket for us to store images, and a GCP service account with permission to write to it.
I added the GCP service account credentials as a secret in our CI build.
I set up the cluster with kubeadm-dind-cluster, a set of Bash scripts to set up Kubernetes with Docker-in-Docker techniques.
All our test projects had Tilt build images, push them to the gcr.io bucket, then deploy servers that used these images.
I barely got this working. A huge breakthrough! It caught so many subtle bugs and race conditions.
I wouldn't call the Bash scripts readable. But they are hackable, cut-and-pasteable. There were examples of how to run it on CircleCI and TravisCI.
kubeadm-dind-cluster has been deprecated in favor of more modern approaches like kind. But I learned a lot from its Bash scripts. We still use a lot of the techniques in this project today.
There were other downsides though:
When drive-by contributors sent us PRs, the integration tests failed. They didn't have access to write to the gcr.io bucket. This made me so sad. Contributors felt unwelcome. I never figured out a way to make this secure.
We didn't reset the gcr.io bucket between test runs. So it was hard to guarantee that images weren't leaking between tests. For example, if image pushing failed, we wanted to be sure we weren't picking up a cached image from a previous test.
When I revisited this, I wanted to make sure:
Anyone could write to the image registry.
The image registry would reset between runs.
By this time, kind was taking off as the default choice for testing Kubernetes itself.
kind also comes with the ability to run a local registry, so you can push images to the registry on localhost:5000 and pull them from inside the cluster.
I set up a new CI pipeline that:
Creates a VM.
Installs all our dependencies, including Docker.
Creates a kind cluster with a local registry, using their script.
This worked well! And because the registry was local, it was faster than pushing to a remote registry. We still use this approach to test ctlptl with kind. Here's the CI config.
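For readers who haven't seen the kind local-registry setup, here's a minimal sketch of the moving parts, written in Python for readability. It follows the approach I know from kind's local-registry documentation (a containerd mirror patch plus a registry container on kind's Docker network). The registry name `kind-registry` and the helper function names are illustrative assumptions, not our actual CI config.

```python
from typing import List


def kind_config(registry_name: str = "kind-registry", registry_port: int = 5000) -> str:
    """Render a kind config that maps localhost:<port> to the registry container."""
    return (
        "kind: Cluster\n"
        "apiVersion: kind.x-k8s.io/v1alpha4\n"
        "containerdConfigPatches:\n"
        "- |-\n"
        f'  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:{registry_port}"]\n'
        # Inside the Docker network the registry listens on its own port 5000.
        f'    endpoint = ["http://{registry_name}:5000"]\n'
    )


def setup_commands(registry_name: str, registry_port: int, config_path: str) -> List[List[str]]:
    """Return the commands to run, in order. Nothing is executed here."""
    return [
        # Start a throwaway registry, published only on the host's localhost.
        ["docker", "run", "-d", "--restart=always",
         "-p", f"127.0.0.1:{registry_port}:5000",
         "--name", registry_name, "registry:2"],
        # Create the cluster using the containerd patch rendered above.
        ["kind", "create", "cluster", "--config", config_path],
        # Attach the registry to kind's Docker network so nodes can reach it by name.
        ["docker", "network", "connect", "kind", registry_name],
    ]
```

In a CI job you would write `kind_config()` to a file and pass each command from `setup_commands(...)` to `subprocess.run(cmd, check=True)`. Because the registry is created fresh in every job, anyone can push to it and nothing carries over between runs.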
But I wasn't totally happy! Most of our team is more comfortable managing containers than managing VMs. VMs are slower. Upgrading dependencies is more heavyweight. We wondered: can we make this work in containers?
The last approach (and the one we use in most of our projects) uses some of the tricks that kubeadm-dind-cluster used.
The CI pipeline:
Creates a container with our code.
Sets up a remote Docker environment outside the container. (This avoids the pitfalls of running Docker inside Docker.)
Creates a kind cluster with a local registry inside the remote Docker environment.
Uses socat networking jujitsu to expose the remote registry and Kubernetes cluster inside the local container.
The socat element makes this a bit tricky. But if you want to fork and hack it, check out this Bash script.
But once it's set up: it's fast, robust, and easy to upgrade dependencies.
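To make the socat step a little less mysterious, here's a rough sketch of one way the forwarding can work. It is an illustration of the general technique, not a copy of our Bash script. It assumes a helper container (hypothetically named `portforward-helper`) running on the remote Docker host's network with socat installed, for example from the `alpine/socat` image. The local socat listens on a port and pipes each connection through `docker exec` into that helper, which connects to the same port on the remote side.

```python
from typing import List


def helper_command(helper: str = "portforward-helper") -> List[str]:
    """Start a long-lived helper container on the remote Docker host's network.

    The container name is a made-up example; it only needs to be a plain
    name with no spaces or quotes.
    """
    return [
        "docker", "run", "-d", "--name", helper, "--net=host",
        "--entrypoint=/bin/sh", "alpine/socat",
        "-c", "while true; do sleep 1000; done",
    ]


def forward_command(port: int, helper: str = "portforward-helper") -> List[str]:
    """Build a socat argv that exposes remote localhost:<port> on local localhost:<port>."""
    # Each incoming connection is tunneled over `docker exec` stdin/stdout.
    remote = f"docker exec -i {helper} socat STDIO TCP:localhost:{port}"
    return ["socat", f"TCP-LISTEN:{port},reuseaddr,fork", f"EXEC:'{remote}'"]
```

You would run `forward_command(port)` once for the registry port and once for the Kubernetes API server port, each as a background process. After that, `localhost:5000` inside the CI container reaches the registry that actually lives in the remote Docker environment.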
Hacking this together with Bash was the hard part.
ctlptl is a CLI for declaratively setting up local Kubernetes clusters.
I eventually folded all the logic in the Bash script into ctlptl. As of ctlptl 0.5.0, it will try to detect when you have a remote Docker environment and set up the socat networking.
The Go code in ctlptl is far more verbose than the Bash script, going by line count. But it includes error handling, cleanup logic, and idempotency, which makes it more suitable for local dev. (CI environments don't need any of this because we tear them down at the end anyway.)
We use image-management tools that auto-detect the registry location from the cluster, which helps with the configuration burden. I like the trend of using Kubernetes as a general-purpose config-sharing system so that tools can interoperate, rather than having to configure each tool individually.
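As an illustration of how that auto-detection can work, here's a small sketch. It assumes the cluster advertises its registry through the `local-registry-hosting` ConfigMap in the `kube-public` namespace, which is the convention I'm aware of for this; the post itself doesn't name the mechanism, so treat that as an assumption. The function names are made up for the example, and the parser only handles the flat `key: "value"` lines this document normally contains, not full YAML.

```python
import json
from typing import Dict, Optional


def parse_hosting_doc(doc: str) -> Dict[str, str]:
    """Parse flat `key: value` lines (a tiny subset of YAML) into a dict."""
    fields: Dict[str, str] = {}
    for line in doc.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values like localhost:5000 survive.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def registry_host(configmap_json: str) -> Optional[str]:
    """Return the advertised registry host, or None if the cluster doesn't say.

    `configmap_json` is the output of:
      kubectl get configmap local-registry-hosting -n kube-public -o json
    """
    data = json.loads(configmap_json).get("data") or {}
    doc = data.get("localRegistryHosting.v1")
    if doc is None:
        return None
    return parse_hosting_doc(doc).get("host")
```

A tool that finds a host this way can tag and push images to it without any per-tool registry configuration.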
We currently use ctlptl to set up clusters and test the services on real Kube clusters in all of our example projects.
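For a sense of how little setup that leaves in the CI job, here's a minimal sketch of driving ctlptl from a script. The config shape (`ctlptl.dev/v1alpha1`, `product`, `registry`) follows ctlptl's documented cluster format as I understand it; the registry name `ctlptl-registry` and the Python wrapper functions are illustrative assumptions, not our actual pipeline.

```python
import subprocess


def cluster_config(product: str = "kind", registry: str = "ctlptl-registry") -> str:
    """Render a declarative ctlptl cluster definition with a local registry."""
    return (
        "apiVersion: ctlptl.dev/v1alpha1\n"
        "kind: Cluster\n"
        f"product: {product}\n"
        f"registry: {registry}\n"
    )


def apply_cluster(config: str) -> None:
    """Pipe the config to `ctlptl apply -f -`. Requires ctlptl and Docker on the PATH."""
    subprocess.run(["ctlptl", "apply", "-f", "-"], input=config, text=True, check=True)
```

Calling `apply_cluster(cluster_config())` at the start of a test job should give you a cluster and a registry to push to; the rest of the job is just running your tests.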
It's been a long journey! But I hope the examples here will make that journey a lot shorter for the next person 🙈.