Docker for Setting Up the Developer Environment

Setting up a Dev Environment for Data Science

When working on a Python project, a common way to manage dependencies is to run pip freeze > requirements.txt or pip install -r requirements.txt, coupled with virtualenv to manage project-level dependencies (which is what keeps the project environment maintainable).
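For reference, that conventional workflow usually looks something like this (a minimal sketch; the .venv folder name is just a common convention):

# Create and activate a project-level virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install pinned dependencies, or capture the currently installed ones
pip install -r requirements.txt
pip freeze > requirements.txt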

Often, when handing the project over to somebody else, we end up shipping a packaged .exe version of it, which is not good for testing or analysis, and simply running pip install -r requirements.txt in the repository is not enough.

However, there are often system-level dependencies and configuration that are not captured this way. As we go along developing the code, we keep installing system-level dependencies required by the packages.
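For example (purely an illustration, assuming a project that uses the Python graphviz bindings), the Graphviz system package has to be installed separately, because pip only installs the Python wrapper:

# System-level dependency that requirements.txt does not capture
sudo apt-get update && sudo apt-get install -y graphviz

# The Python bindings alone cannot render graphs without the system package
pip install graphviz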

To replicate all of those system-level dependencies, Docker is an easy fit: the image can be shared through Docker Hub just like any other container, and the developer environment can then be pulled and run anywhere.
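Sharing then looks roughly like this (a sketch; your-dockerhub-user and dev_ds_env are placeholder names, and docker login is needed before pushing):

# Tag the local image and push it to Docker Hub
docker tag dev_ds_env your-dockerhub-user/dev_ds_env:latest
docker push your-dockerhub-user/dev_ds_env:latest

# Anyone on the team can then pull and run the same environment
docker pull your-dockerhub-user/dev_ds_env:latest
docker run -p 8888:8888 your-dockerhub-user/dev_ds_env:latest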

Now we move to the next stage: figuring out how to layer instructions and put together a Dockerfile that sets up a Docker container with your exact specifications, so your results can be replicated.

We have observed the following issues:

  1. Somebody builds a tool with a different flavor of Python and several package changes (each package has dependent packages that may resolve to different, possibly older, versions once the new framework is installed).
  2. Struggling to manage virtualenvs for different projects, dealing with broken virtualenvs, or simply forgetting which one to activate.

The use case we will explore here is setting up a Python machine with all the tools required for a project, and then running a Jupyter notebook server inside it to access all those resources:


# This is just an example

FROM python:3

# Make sure pip is available (the python:3 image already ships with pip)
RUN apt-get update && apt-get install -y python3-pip

# Copy the dependency list into the image
COPY requirements.txt .

# Install the project-level dependencies
RUN pip install -r requirements.txt

# Install jupyter
RUN pip3 install jupyter

# Create a new system user
RUN useradd -ms /bin/bash demo

# Change to this new user
USER demo

# Set the container working directory to the user home folder
WORKDIR /home/demo

# Start the jupyter notebook
ENTRYPOINT ["jupyter", "notebook", "--ip=0.0.0.0"]


Your requirements.txt should list the packages to install; it is the file that tells pip which modules the project needs:

tensorflow==2.4.1
seaborn
scikit-learn==0.24.2
spacy==3.0.6
tfx==0.29.0
agate==1.6.1
asn1crypto==0.24.0
autopep8==1.3.5
Babel==2.9.1
backcall==0.2.0
bleach==3.3.0
census==0.8.17


This is my regular go-to list.

After putting all of this together, run the following in the terminal:

docker build -t dev_ds_env .

Note that the trailing full stop (the build context) is required.
This builds an image containing a Jupyter notebook server with all the tools from the requirements.txt file installed; the COPY instruction is what brings requirements.txt into the image.

docker run -p 8888:8888 dev_ds_env

Now access it from your machine at localhost:8888. It will ask you to copy and paste the token printed in the container logs.
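If you also want your notebooks to survive the container being removed, a common pattern is to mount your project folder into the container's working directory (a sketch; adjust the path to your project, and mind file permissions since the container runs as the demo user):

# Mount the current directory into the container's home folder
docker run -p 8888:8888 -v "$(pwd)":/home/demo dev_ds_env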

Thank you all
