
How to set up the Kaggle Python Docker Image using Docker Desktop for Mac

Melvin Kok

While Kaggle has published a great guide on how to use their Docker images, the instructions for macOS are slightly outdated and need some modifications to work properly.

The problem is that Kaggle's guide uses docker-machine, which Docker has removed from later versions of Docker Desktop. The fix is simple, but it would have saved me 30 minutes of Googling if someone had written about this problem, so here I am.

This is not a replacement for any of the great guides to setting up Docker for Data Science out there. I still recommend reading Kaggle's guide for a better understanding of the process. I just aim to point out the steps that work with the updated Docker Desktop for Mac.

Instructions

Updated 17 May 2020

Step 1:

Install Docker Desktop for Mac from the Docker website.

Step 2:

Start up Docker and adjust the VM preferences. This menu can be found by clicking on the Docker menubar icon and selecting "Preferences...".

The recommendation is to increase the CPU count, disk size and memory to allow the VM to better handle data science operations.
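To confirm that the new settings took effect, you can ask the Docker daemon what it sees. This is only a rough sketch: docker_vm_resources is a helper name I made up for illustration, and it assumes Docker Desktop is already running.

```shell
# Hypothetical helper (not part of Docker or Kaggle's guide):
# prints the CPU count and memory available to the Docker VM.
docker_vm_resources() {
  local cpus mem_bytes
  # NCPU and MemTotal are fields reported by `docker info`.
  cpus=$(docker info --format '{{.NCPU}}') || return 1
  mem_bytes=$(docker info --format '{{.MemTotal}}') || return 1
  # MemTotal is reported in bytes; convert to MiB for readability.
  echo "CPUs: $cpus, memory: $((mem_bytes / 1024 / 1024)) MiB"
}
```

Run docker_vm_resources in a terminal after changing the preferences and restarting Docker.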

Step 3:

Pull the image you wish to use. You can get the Kaggle Python image by running

$ docker pull kaggle/python

This step will take a while, as the image is quite large and takes time to download.
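If you are unsure whether the pull finished, a quick check like the one below may help. It is a sketch only: have_image is a name I picked for illustration, and it relies on docker image inspect returning a non-zero exit status when the image is not present locally.

```shell
# Hypothetical helper: succeeds if the given image exists locally.
have_image() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Report whether the Kaggle Python image is available.
check_kaggle_image() {
  if have_image kaggle/python; then
    echo "kaggle/python is available locally"
  else
    echo "kaggle/python not found, run: docker pull kaggle/python"
    return 1
  fi
}
```

Calling check_kaggle_image prints one of the two messages, depending on whether the image has been downloaded.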

Step 4:

Put these lines in your .bashrc, .zshrc, or equivalent shell startup file:

# Kaggle Docker shorthand functions
# "$PWD" is quoted so that directories with spaces in their names still mount correctly.
kpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
}
ikpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python ipython
}
kjupyter() {
  # Open the notebook in the browser after giving the server a moment to start
  (sleep 3 && open "http://localhost:8888")&
  docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
}

These shorthand functions let you use kpython in place of python and ikpython in place of ipython, while kjupyter starts a Jupyter Notebook session. Each command runs inside the specified Docker image, which in this case is kaggle/python. Replace the image name if necessary!
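If you expect to switch images often, one option is to read the image name from an environment variable rather than editing each function. The sketch below does this for kpython only; KAGGLE_IMAGE is a variable name I made up, not something Docker or Kaggle defines, and it falls back to kaggle/python when unset.

```shell
# Sketch: same kpython as above, but the image name comes from the
# hypothetical KAGGLE_IMAGE variable (default: kaggle/python).
kpython() {
  docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it \
    "${KAGGLE_IMAGE:-kaggle/python}" python "$@"
}
```

With this in place, kpython script.py runs the script in kaggle/python, while KAGGLE_IMAGE=some/other-image kpython script.py uses a different image for that one call.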

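The change I've made to these functions is replacing $(docker-machine ip docker2) in the original instructions with localhost. Docker Desktop for Mac publishes container ports on localhost, so there is no docker-machine IP to look up.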

Optional adjustments:

Default browser where Jupyter Notebook is opened

Running kjupyter opens http://localhost:8888 in your default browser.
If you want it to open in a different browser, add -a "<browser app name>" after open. For example, if I wish to open the link in Microsoft Edge instead:

kjupyter() {
  (sleep 3 && open -a "Microsoft Edge" "http://localhost:8888")&
  docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
}

If you do not want it to automatically open the link, remove the (sleep 3 && open ...)& line from the function.

Sleep time

The kjupyter function waits 3 seconds before opening the link to give the Jupyter Notebook session time to start up. However, you may find that your session takes more or less time to start. Simply adjust sleep 3 to whatever delay you prefer.
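If guessing a delay feels fragile, an alternative is to poll the address until something answers. This is a sketch, not part of the original functions: wait_for_url is a helper name I made up, and it assumes curl is available (it ships with macOS). Any HTTP response, including a redirect to the login page, counts as "up".

```shell
# Hypothetical helper: wait_for_url URL [TRIES]
# Polls URL once per second until it responds or TRIES attempts fail.
wait_for_url() {
  local url=$1 tries=${2:-30} i=0
  while [ "$i" -lt "$tries" ]; do
    # curl exits 0 as soon as the server returns any HTTP response.
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

To use it, swap the sleep 3 && open "http://localhost:8888" part of kjupyter for wait_for_url "http://localhost:8888" && open "http://localhost:8888", keeping the surrounding parentheses and trailing &.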

Final notes

I hope these instructions aren't a duplicate of someone else's out there. Hopefully Kaggle updates the guide provided on their GitHub for the updated Docker Desktop and this will no longer be relevant.

Do contact me if there are any problems with the instructions.

References

How to get started with data science in containers: originally written by Jamie Hall and posted by the Kaggle Team. I got the bulk of the instructions from here.
How to setup a Data Science workflow with Kaggle Python Docker Image on Laptop: I got the bash functions from this article.
