Setting Up a Dev Environment for Data Science
When working on a Python project, a common way to manage dependencies is to run pip freeze > requirements.txt or pip install -r requirements.txt, coupled with virtualenv to keep dependencies isolated at the project level (which is required for maintaining the project environment).
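To make concrete what a requirements file records, here is a minimal sketch that lists the installed distributions as name==version lines using the standard library. It only approximates what pip freeze writes (pip applies its own filtering and handles editable installs differently), and the function name freeze_lines is made up for this illustration.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the listing is stable between runs
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```

Every line is a Python package pin; nothing in this listing describes the operating system or the libraries installed outside of pip.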
Often, to reproduce the project on somebody else's machine, we end up handing over an .exe build of the file, which is not good for testing or analysis. It is also not enough to run pip install -r requirements.txt in the repository.
The reason is that some configurations depend on specific system-level dependencies that requirements.txt does not capture. As we go along developing the code, we install system-level dependencies as the packages require them.
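As a rough illustration of what falls outside requirements.txt, the sketch below collects a few system-level facts with the standard library. The function name system_snapshot and the default tool names (gcc, git) are assumptions chosen for this example, not something the project prescribes.

```python
import platform
import shutil


def system_snapshot(tools=("gcc", "git")):
    """Collect system-level facts that requirements.txt does not record."""
    return {
        "python_version": platform.python_version(),
        "os": platform.system(),
        "machine": platform.machine(),
        # shutil.which returns None when a tool is not on PATH
        "tools": {tool: shutil.which(tool) for tool in tools},
    }


if __name__ == "__main__":
    for key, value in system_snapshot().items():
        print(f"{key}: {value}")
```

Running this on two machines and comparing the output is a quick way to spot differences that pip alone will not reproduce.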
To replicate all the system-level dependencies, Docker can be used: the image can be shared through Docker Hub, much like containers are shipped, and the developer environment can then be run from it.
Now we move to the next stage, which is figuring out how to layer instructions and put together a Dockerfile that sets up a Docker container with your exact specifications, so that your results can be replicated.
We have observed the following issues:
- Somebody builds a tool in a different flavor of Python with several package changes (each package has dependent packages, which can resolve to a different version, possibly an older one than the newly installed framework expects).
- Struggling to manage virtualenvs for different packages. A broken virtualenv, or simply forgetting to use one, is also a big risk here.
The use case we will explore here is setting up a Python machine with all the tools required for a project, and then setting up a Jupyter notebook server inside it to access all those resources:
# This is just an example
FROM python:3
RUN apt-get update && apt-get install -y python3-pip
COPY requirements.txt .
RUN pip install -r requirements.txt
# Install jupyter
RUN pip3 install jupyter
# Create a new system user
RUN useradd -ms /bin/bash demo
# Change to this new user
USER demo
# Set the container working directory to the user home folder
WORKDIR /home/demo
# Start the jupyter notebook
ENTRYPOINT ["jupyter", "notebook", "--ip=0.0.0.0"]
requirements.txt should list the packages to install, one per line, so that pip knows which modules the project requires:
tensorflow==2.4.1
seaborn
scikit-learn==0.24.2
spacy==3.0.6
tfx==0.29.0
agate==1.6.1
asn1crypto==0.24.0
autopep8==1.3.5
Babel==2.9.1
backcall==0.2.0
bleach==3.3.0
census==0.8.17
This is my regular go-to list.
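Since the image is only as reproducible as this file, it can help to check it before building. The sketch below flags lines that are not exact name==version pins. It is a simplified check, not a full parser: real requirement syntax allows extras, markers, and other operators, and the function name check_requirements and the sample lines are hypothetical.

```python
import re

# Simplified pattern: a package name followed by an exact "==" pin.
# Real requirement syntax allows much more (extras, markers, ranges).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.!+*_-]+$")


def check_requirements(lines):
    """Return (line, problem) pairs for lines that are not exact pins."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or PINNED.match(line):
            continue
        if " " in line:
            problems.append((line, "contains a space; possibly a missing '=='"))
        else:
            problems.append((line, "not pinned to an exact version"))
    return problems


if __name__ == "__main__":
    # Hypothetical sample lines, including one unpinned and one malformed entry
    sample = ["tensorflow==2.4.1", "seaborn", "scikit-learn 0.24.2"]
    for line, problem in check_requirements(sample):
        print(f"{line}: {problem}")
```

An unpinned entry still installs, but it may resolve to a different version each time the image is rebuilt.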
Once the Dockerfile and requirements.txt are in place, run the following in the terminal:
docker build -t dev_ds_env .
The trailing full stop is a must: it tells Docker to use the current directory as the build context.
The COPY command works like magic here: it brings requirements.txt into the image so that pip can install from it. Running the image should then start a Jupyter notebook server that you can access with all the tools in the requirements.txt file installed:
docker run -p 8888:8888 dev_ds_env
Now access it from your machine: open localhost:8888 in a browser. It will ask you to copy and paste the token you were given, which is printed in the terminal output of the docker run command.
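If you would rather not pick the token out of the URL by eye, a small helper can do it. This is a minimal sketch that assumes the server prints a URL carrying a token query parameter. The sample URL and its token value are made up for illustration, and the function name extract_token is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(url):
    """Return the value of the 'token' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("token")
    return values[0] if values else None


if __name__ == "__main__":
    # Hypothetical URL in the shape printed at startup; the token is made up
    sample = "http://127.0.0.1:8888/?token=abc123"
    print(extract_token(sample))
```

Paste the returned value into the login page at localhost:8888.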
Thank you all