Setting Up a Dev Environment for Data Science
When working on a Python project, a common way to manage dependencies is to run pip freeze > requirements.txt to record them and pip install -r requirements.txt to restore them, coupled with virtualenv to keep each project's environment isolated. However, when handing a project to somebody else, shipping a packaged executable is not good for testing or analysis, and simply running pip install -r requirements.txt in the repository is often not enough to reproduce the environment.
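As a quick refresher, that workflow looks roughly like this (a minimal sketch; the environment name venv and the package names are placeholders):
# Install virtualenv and create an isolated environment for the project
pip install virtualenv
virtualenv venv
source venv/bin/activate
# Install the project's packages, then record the exact versions
pip install pandas scikit-learn
pip freeze > requirements.txt
# On another machine, recreate the environment from the file
pip install -r requirements.txt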
However, there are often system-level dependencies, specific to your operating system configuration, that requirements.txt does not capture. As we develop the code, we install these system-level dependencies as required by the packages we use.
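For example, a package may need OS libraries that pip cannot install for you; on Debian or Ubuntu that usually means apt (the package names below are only illustrative, your stack will differ):
# System-level dependencies that requirements.txt does not capture
sudo apt-get update
sudo apt-get install -y build-essential graphviz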
To replicate all the system-level dependencies as well, Docker is a natural fit: the whole environment is captured in an image, that image can be shared through Docker Hub like any other container image, and anybody who pulls it can run the same development environment.
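Sharing an image might look like this (a sketch; your-dockerhub-user is a placeholder, and dev_ds_env is the image we build later in this post):
# Tag the local image with your Docker Hub namespace and push it
docker tag dev_ds_env your-dockerhub-user/dev_ds_env
docker push your-dockerhub-user/dev_ds_env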
Now we move to the next stage: figuring out how to layer instructions and put together a Dockerfile that sets up a Docker container to your exact specifications, so your results can be replicated.
We have observed the following issues:
- Somebody builds a tool on a different flavor of Python with several package changes; each package has dependent packages that may resolve to different, possibly older, versions than the newly installed framework expects.
- Struggling to manage virtualenvs for different projects, dealing with broken virtualenvs, or simply forgetting to activate one.
The use case we will explore here is setting up a Python machine with all the tools required for a project, and then setting up a Jupyter notebook server within it to access all those resources:
# This is just an example
FROM python:3
# The python:3 image already ships with pip, so this line is only
# needed if you also want extra OS-level packages
RUN apt-get update && apt-get install -y python3-pip
# Copy the dependency list into the image and install the packages
COPY requirements.txt .
RUN pip install -r requirements.txt
# Install jupyter
RUN pip install jupyter
# Create a new system user
RUN useradd -ms /bin/bash demo
# Change to this new user
USER demo
# Set the container working directory to the user home folder
WORKDIR /home/demo
# Start the jupyter notebook
ENTRYPOINT ["jupyter", "notebook", "--ip=0.0.0.0"]
Your requirements.txt should list the packages to install; pip reads this file and installs every module listed in it:
tensorflow==2.4.1
seaborn
scikit-learn==0.24.2
spacy==3.0.6
tfx==0.29.0
agate==1.6.1
asn1crypto==0.24.0
autopep8==1.3.5
Babel==2.9.1
backcall==0.2.0
bleach==3.3.0
census==0.8.17
This is regularly my go-to list.
Once the Dockerfile and requirements.txt are in place, run this in the terminal:
docker build -t dev_ds_env .
Note that the trailing full stop is required: it tells Docker to use the current directory as the build context.
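Once the build finishes, you can confirm the image exists (dev_ds_env is the tag we chose above):
docker images dev_ds_env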
This should set up a Jupyter notebook server that you can access, with all the tools in the requirements.txt file installed; the COPY instruction is what gets the requirements file into the image. To run the environment:
docker run -p 8888:8888 dev_ds_env
Now access it from your machine: open localhost:8888 in a browser. It will ask you to copy and paste the token printed in the terminal when the server started.
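If you want your notebooks to survive the container being removed, you can also mount a host directory into the container (a sketch; /home/demo/work assumes the demo user created in the Dockerfile above):
# Map the notebook port and mount the current directory for persistence
docker run -p 8888:8888 -v "$(pwd)":/home/demo/work dev_ds_env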
Thank you all