I've always been a bit annoyed at how difficult it can be to avoid shipping test code and dependencies with Python applications. A typical build process might look something like:
- create a virtual environment
- install service dependencies
- install test dependencies
- run tests
- package up code and dependencies into an RPM.
At this point, my service dependencies and test dependencies are intermingled in the virtual environment. To detangle them, I now have to do something like destroy the venv and create a new one, reinstalling the service dependencies.
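Here's a rough sketch of that kind of build script, just to make the problem concrete (file names and paths are illustrative, not from any particular project):

```shell
# Create a virtual environment and install both service and test dependencies into it.
python -m venv venv
venv/bin/pip install -r requirements.txt
venv/bin/pip install -r test_requirements.txt

# Run the test suite against that same environment.
venv/bin/pytest tests

# The venv now contains pytest and friends too, so it has to be recreated from
# requirements.txt alone before packaging code and dependencies into an RPM.
```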
Regardless of the packaging method, I don't want to pull down dependencies when I deploy my service.
At Twilio, we are in the process of embracing container-based deployments. Docker containers are great for Python services: you no longer have to worry about multiple Python versions or virtual environments. You just use an image with exactly the version of Python your service needs and install your dependencies directly into the system.
One thing I've noticed is that while many services are built and packaged as Docker images, few use exclusively Docker-based development environments. Virtual environments and pyenv `.python-version` files abound!
I recently started writing a new Python service with the knowledge that this would be exclusively deployed via containers. This felt like the right opportunity to go all in on containers and build out a strategy for Docker-first localdev. I set out with the following goals:
- don't ship tests and test dependencies with the final image
- tests run as part of the Docker build
- failing tests will fail the build
- IDE (PyCharm) integration
A bit of research (aka Googling) suggested that multi-stage builds might be useful in this endeavor. Eventually I ended up with a Dockerfile that looks something like this:
```dockerfile
FROM python:3 as builder
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src ./src

FROM builder as tests
COPY test_requirements.txt ./
RUN pip install -r test_requirements.txt
COPY tests ./tests
RUN pytest tests

FROM builder as service
COPY docker-entrypoint.sh ./
ENTRYPOINT ["./docker-entrypoint.sh"]
EXPOSE 3000
```
When building an image from this Dockerfile, Docker will build three images, one for each of the FROM statements in the Dockerfile. If you've worked with Dockerfiles before, you know that statement ordering is critical for making efficient use of layer caching, and multi-stage builds are no different. Docker builds each of the stages in the order they are defined. All of the intermediate stages are ephemeral; only the last image is output by the build process.
In this case, the first stage (`builder`) builds an image with all the service dependencies and code. The second stage (`tests`) installs the test requirements and test code, and runs the tests. If the tests pass, the build process continues on to the next stage. If the tests fail, the entire build fails. This ensures that only images with passing tests are built! Finally, the last stage (`service`) builds on top of our `builder` image, adding the entrypoint script, defining the entrypoint command, and exposing port 3000.
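Concretely, a single build command drives all of the stages; the image tag below is just a placeholder:

```shell
# The classic builder runs the builder, tests, and service stages in order, so a
# pytest failure in the tests stage aborts the whole build and nothing gets tagged.
docker build -t my-service .

# BuildKit-based builds may skip stages the final target doesn't depend on; if so,
# the tests stage can be built explicitly to force the test run.
docker build --target tests .
```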
So how did I do with respect to the initial goals?
- don't ship tests and test dependencies with the final image ✓
- tests run as part of the Docker build ✓
- failing tests will fail the build ✓
- IDE (PyCharm) integration ❌
I've met most of the goals, but what about the actual development experience? If I open up PyCharm and import my source code, it complains that I have unsatisfied dependencies :( Fortunately, PyCharm Professional has the ability to select a Python interpreter from inside a Docker image! Cool, but I have to build the image before I can use its interpreter. And thanks to goal #3, if my tests are failing, I can't build my image...
Lucky for us, we can tell `docker build` to build one of our intermediate stages explicitly, stopping the build after the desired stage. Now if I run `docker build --target builder -t builder .`, I can select the interpreter from the `builder` image.
Uh oh! The builder image doesn't include my test dependencies! Of course, that's the whole point of the builder image. Let's add another stage we can use for running and debugging our tests.
```dockerfile
FROM python:3 as builder
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src ./src

FROM builder as localdev
COPY test_requirements.txt ./
RUN pip install -r test_requirements.txt
COPY tests ./tests
ENTRYPOINT ["pytest"]
CMD ["tests"]

FROM localdev as tests
RUN pytest tests

FROM builder as service
COPY docker-entrypoint.sh ./
ENTRYPOINT ["./docker-entrypoint.sh"]
EXPOSE 3000
```
With the `localdev` stage, I can build an image with all my service and test code and dependencies. I can even make the `localdev` container run the tests by default when the container is run. By using the interpreter from this image, I can now debug my failing tests.
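Here's roughly what that looks like day to day (the image tag and test file name are placeholders):

```shell
# Build an image from just the localdev stage.
docker build --target localdev -t my-service-localdev .

# With no arguments, the container runs the default ENTRYPOINT + CMD: pytest tests
docker run --rm my-service-localdev

# The CMD is easy to override, e.g. to run a single test file or a specific test.
docker run --rm my-service-localdev tests/test_example.py -k "some_case"
```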
Let's take a look again at the initial goals:
- don't ship tests and test dependencies with the final image ✓
- tests run as part of the Docker build ✓
- failing tests will fail the build ✓
- IDE (PyCharm) integration ✓
Hooray!
Except there's one thing still bothering me: changes to the service code trigger a reinstallation of our test dependencies. Yuck! Let's take another whack at our Dockerfile:
```dockerfile
FROM python:3 as service_deps
COPY requirements.txt ./
RUN pip install -r requirements.txt

FROM service_deps as test_deps
COPY test_requirements.txt ./
RUN pip install -r test_requirements.txt

FROM service_deps as builder
COPY src ./src

FROM test_deps as tests_builder
COPY src ./src
COPY tests ./tests

FROM tests_builder as localdev
ENTRYPOINT ["pytest"]
CMD ["tests"]

FROM tests_builder as tests
RUN pytest tests

FROM builder as service
COPY docker-entrypoint.sh ./
ENTRYPOINT ["./docker-entrypoint.sh"]
EXPOSE 3000
```
OK, that seems pretty complicated, so here's a graph of our image topology:
```
            service_deps
             /        \
        builder     test_deps
           |            |
        service    tests_builder
                    /        \
               localdev     tests
```
I don't love that the `builder` and `tests_builder` stages both copy over the source directory, but the real question is: does this still meet our initial goals while avoiding excessive re-installs of test dependencies? Yeah, it seems to work pretty well. Thanks to Docker's layer caching, we rarely have to re-install dependencies.
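As a quick sanity check of the caching behavior (file and tag names are placeholders):

```shell
# Touch only service code, then rebuild the localdev target.
touch src/app.py
docker build --target localdev -t my-service-localdev .

# Only the `COPY src ./src` layer and the layers after it are rebuilt; the pip
# install layers in service_deps and test_deps come straight from the cache.
```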
That's it! If you have any questions or suggestions, please let me know!
Top comments (3)
This is pretty awesome. I wonder if you could copy from the `builder` stage to the `tests_builder` stage by modifying `tests_builder` to pull the source in with `COPY --from=builder` instead of copying it from the build context. Disclaimer: I'm not 100% sure this would fit your use case, but it's worth a shot. If you try this out, let us know how it worked for you.
You can definitely do this, but Docker still has to create a new layer for it. Copying it from the builder stage could definitely help if building the src dir is more complicated than copying over a single directory.
That's true, I didn't think of the extra layer Docker would add in the process.