After building the pipeline, I wanted to run my software on a brand new server, where I hadn't set up Python, pip, or the libraries my program needs, such as pandas.
I was wondering how I could make sure my program would work exactly the same way on the server, or on some other developer's laptop, without having to worry about which version of Python they have. This is where Docker containers are very useful, and I thought it would be a great opportunity to combine Docker with my project.
There are some amazing beginner-friendly tutorials on Docker, like this video crash course, and this article that focuses on Docker in data science. Since I still have a lot to learn myself, my post is not going to be a complete Docker guide, and I recommend reading more about Docker to get comfortable with the concept.
Let's cover the basics briefly:
- Image = a blueprint of our application. Imagine taking a snapshot of your program right after clicking the "run" button. The image is built from a Dockerfile that contains all the commands necessary to run your program and that copies the required libraries and dependencies. Inside the image, the application has everything it needs to work properly (operating system, application code, system tools, etc.).
- Container = an instance of our image. As explained on the Docker website:
"Container images become containers at runtime...[and they] isolate software from its environment..."
This post will teach you:
- How to write a Dockerfile that copies everything needed to run your Python project.
- How to build an image from that Dockerfile.
- How to run that image in a Docker container.
All the instructions follow my previous post about the weather data pipeline that I created.
Now the only thing you have to install is Docker Desktop on your computer.
The Dockerfile is the recipe of commands that builds our image.
I created a file called "Dockerfile", without any extension:
- Line 1: Base image. This preconfigured Docker image has Python and pip pre-installed and configured.
- Line 3: Set the working directory inside the container.
- Line 5: Copy your project's requirements.txt into the image. Keep in mind that each line of a Dockerfile creates a separate layer. Docker knows when the input to a given layer has changed, so when your application code changes but your requirements do not, Docker doesn't waste time re-installing everything in requirements.txt each time you build the image.
COPY requirements.txt ./
- Line 6: Install the dependencies while keeping the image small. The --no-cache-dir flag stops pip from storing its download cache inside the image.
RUN pip install --no-cache-dir -r requirements.txt
- Line 8: Copy the project's folder into an image layer. My Python files are stored locally in a folder called src.
COPY src/ .
- Line 9: Make a new directory called data_cache. This is where all the current weather data will be stored after it is retrieved.
RUN mkdir data_cache
- Line 11: Run main.py by default when the container starts. CMD tells Docker which command to execute at startup.
CMD [ "python", "main.py" ]
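Putting the steps above together, the complete Dockerfile looks roughly like this. Note that the base image tag (python:3.9-slim) and the working directory (/app) are my assumptions, since the walkthrough refers to Lines 1 and 3 without showing them:

```dockerfile
# Line 1: base image with Python and pip pre-installed (exact tag is an assumption)
FROM python:3.9-slim

# Line 3: working directory inside the container (path is an assumption)
WORKDIR /app

# Lines 5-6: install dependencies in their own layer so Docker can cache it
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Line 8: copy the project's source folder
COPY src/ .

# Line 9: directory for the retrieved weather data
RUN mkdir data_cache

# Line 11: default command when the container starts
CMD [ "python", "main.py" ]
```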
Run this to build the Docker image:
$ docker build . -t <yourDockerID>/weather
The following command runs your image. As I mentioned before, the API key you get from OpenWeatherMap has to remain private, which is why I previously showed how to store it in an environment variable.
Now, instead of using the .env file to set the environment variable, we can pass it in with Docker's --env flag.
$ docker run --env api-token=<yourapifromOpenWeatherMap> -it <yourDockerID>/weather
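Inside the container, main.py can then read the token from the environment instead of from a .env file. Here is a minimal sketch; the variable name api-token matches the docker run command above, but the helper function is my own illustration, not necessarily how the original main.py is written:

```python
import os


def get_api_token() -> str:
    """Read the OpenWeatherMap token passed in via `docker run --env`."""
    # Environment variables are exposed to the process as a plain mapping,
    # so a hyphenated name like "api-token" is just a dictionary key here.
    token = os.environ.get("api-token")
    if token is None:
        raise RuntimeError("api-token environment variable is not set")
    return token
```

If the variable is missing, failing fast with a clear error is friendlier than letting the API request fail later with an authentication error.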
You can also see your image by listing all local images from the command line:
$ docker images
Now you can proudly say that you have built your own image of your Python application. I uploaded mine to DockerHub, so you can simply download it and use it to retrieve weather data from the web API.
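If you want to publish your own image to DockerHub as I did, the usual flow is to log in and push the tagged image; replace the placeholder with your own Docker ID:

```
$ docker login
$ docker push <yourDockerID>/weather
```

Anyone can then fetch it on another machine with docker pull <yourDockerID>/weather and run it with the same docker run command shown above.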