Install Anaconda3
- Go to Anaconda's website and install Anaconda for your operating system.
- After Anaconda is installed, then make sure that Python3 and Pip3 are on your PATH variable. In a terminal run
which python3
and thenwhich pip3
. Those commands should return a file path to your Anaconda installation. -
conda
is Anaconda's package manager. Make sure that your conda installation worked:conda --version
. If that returns a version number, then you have installed Anaconda3 and conda correctly.
Run JupyterLab inside a virtual environment
(RECOMMENDED)
Why JupyterLab instead of Jupyter Notebooks?
JupyterLab builds on Jupyter Notebooks and includes more features. In other words, JupyterLab is the next generation of Jupyter Notebooks.
Why use virtual environments?
Every data science project you do will require some combination of external libraries, sometimes with specific versions that differ from the specific versions you used for other projects. If you were to have a single Python installation, these libraries would conflict and cause you all sorts of problems.
The standard solution is to use virtual environments, which are sandboxed Python environments that maintain their own versions of Python libraries (and, depending on how you set up the environment, of Python itself).
As a matter of good discipline, you should always work in a virtual environment, and never using the "base" Python installation.
(Source: Data Science from Scratch, pages 37 and 41)
Step 1: Create an environment.yml
file in your project root directory
You can create and activate virtual environments with conda. (See Managing environments.)
First, create an environment.yml
file in your project root directory. Here is an example of an environment.yml
file:
name: name-of-virtual-environment
channels:
- conda
- conda-forge
dependencies:
- python=3.9.5
- pip
- python-graphviz
- pip:
- scikit-learn==0.24.0
- pandas==1.2.0
- seaborn==0.11.1
- imbalanced-learn==0.7.0
- numpy==1.19.5
- matplotlib==3.3.3
- kmodes==0.11.0
- six==1.15.0
- pydotplus==2.0.2
- statsmodels==0.12.1
- ipykernel==6.15.2
- jupyterlab==3.4.5
NOTES:
- Replace
name-of-virtual-environment
with the name that you want to give to your virtual environment. - Notice the
ipykernel
andjupyterlab
packages at the end of the above list. I will give more details about those later.
This next environment.yml
example is for web development, so you can skip this if you only want to know about data science.
Just as a reference, here is an environment.yml
file that could be used with a Python backend for web development, if you are not using Docker as your development environment:
name: name-of-virtual-environment
dependencies:
- python=3.6
- pip=21.0
# You need to list pip as a dependency and then list any
# packages that need to be installed by pip after all the
# dependencies that are to be installed by conda.
# https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#using-pip-in-an-environment
# https://stackoverflow.com/questions/61715343/solving-environment-failed-when-trying-to-set-up-python-virtual-envrironment
- pip:
- fastapi
- uvicorn
- aiofiles
- plotly==4.14.3
- pandas==1.1.5
Step 2: Create your virtual environment
In a terminal window cd
into your project root directory and run:
conda env create -f environment.yml
This will install the virtual environment that is specified in your environment.yml
file. In case you see a prompt asking you to confirm before proceeding, type y
and press Enter to continue creating the environment. Depending on your system configuration, it may take a while for the process to complete.
Step 3: Verify that the new virtual environment was installed correctly
In your terminal type:
conda env list
You should see the name of your virtual environment in that list, which is the name
specified in your environment.yml
file.
Step 4: Activate your virtual environment
conda activate <name-of-virtual-environment>
NOTE: Replace <name-of-virtual-environment>
with the name that you gave to your virtual environment.
Once your virtual environment has been activated, your command prompt should be prefixed with the name of your virtual environment. For example:
(name-of-virtual-environment) ~/data-science-projects$
Step 5: Run JupyterLab inside your virtual environment
Run the following command inside your activated virtual environment:
jupyter-lab
For example:
(name-of-virtual-environment) ~/data-science-projects$ jupyter-lab
When you run jupyter-lab
the JupyterLab server will run and an instance of JupyterLab will open up in a browser window. The following are a couple of indicators that you are running JupyterLab from inside a virtual environment:
- The output in the terminal where you ran
jupyter-lab
should have the following two lines (which indicate that JupyterLab is being served from the/path/to/anaconda3/envs/
directory, which is where the virtual environments are stored):
[I 2022-09-07 09:31:23.835 LabApp] JupyterLab extension loaded from /path/to/anaconda3/envs/<name-of-virtual-environment>/lib/python3.9/site-packages/jupyterlab
[I 2022-09-07 09:31:23.835 LabApp] JupyterLab application directory is /path/to/anaconda3/envs/<name-of-virtual-environment>/share/jupyter/lab
- When JupyterLab first loads up in your browser, under the "Launcher" tab you will see options for "Python 3 (ipykernel)" under the "Notebook" and "Console" headings.
So what is ipykernel
?
The Jupyter Notebook and other frontends automatically ensure that the IPython kernel is available. However, if you want to use a kernel with a different version of Python, or in a virtualenv or conda environment, you’ll need to install that manually. (Source Installing the IPython kernel)
Since we are using JupyterLab, instead of Jupyter Notebooks, we are able to run it inside of a virtual environment in a pretty simple and automated way. I don't think that is possible with Jupyter Notebooks, but I could be wrong.
If you need to use Jupyter Notebooks, then I think you will have to install ipykernel
and run Jupyter Notebooks inside a virtual environment a different (and more manual) way. The Preface of the book "Data Science for Marketing Analytics" has a good explanation of the process you would have to use.
How to deactivate a virtual environment
When you are done working on a particular project you can deactivate the virtual environment with:
conda deactivate
See Deactivating an environment
How to delete a virtual environment
First make sure your environment is deactivated. Then run this command:
conda env remove --name <name-of-virtual-environment>
Make sure to replace <name-of-virtual-environment>
with the name of the virtual environment that you want to delete.
You can verify that your virtual environment has been deleted by running:
conda env list
You should not see your virtual environment listed.
How to install JupyterLab and run it outside of a virtual environment
(NOT RECOMMENDED)
I do not recommend running JupyterLab outside of a virtual environment. However, I wanted to share an example of how to setup a data science project that does not use a virtual environment to show how it compares to the proper way (above).
Step 1: Install JupyterLab with pip
:
pip install jupyterlab
Note: If you install JupyterLab with conda or mamba, it is recommended to use the conda-forge channel.
Step 2: Check that JupyterLab installed correctly:
jupyter-lab --version
Step 3: Once installed, launch JupyterLab with:
jupyter-lab
Top comments (0)