Max Myroshnychenko

Organizing jupyter notebooks (OLD)

Jupyter notebooks are great, but I hit a few serious problems once I had accumulated a handful of them.

Some problems you may encounter:

  1. Remembering what was in each one is a chore
  2. It's tempting to pile more and more code into the same notebook, so notebooks grow in size and complexity
  3. They break over time. If they rely on functions defined outside the notebook, changes to those functions will break the notebooks
  4. Since notebooks are not typically checked by an IDE, spotting errors such as those in problem 3 is nearly impossible unless you re-run them regularly
  5. Notebook anxiety. As a consequence of problems 3 and 4, you are not likely to re-run them, because something surely broke while you weren't looking!
  6. The output of some plotting packages (HoloViews, Altair) is not always saved as part of the notebook
  7. Sharing results with collaborators takes an extra step of extracting plots and putting them together in something like a PDF or HTML file

An ideal solution would:

  1. Automatically save the output of a notebook in a common, shareable format
  2. Include figures, and not just matplotlib-based ones
  3. Keep a running list of what's in each notebook
  4. Re-run changed notebooks from scratch to make sure all cells are still runnable
  5. Make it easy and stress-free to re-run even the ones that didn't change
  6. Re-run all relevant notebooks in a project with one command

Solution: jupyter-book

Jupyter Book addresses all of these problems and then some.
For example output, see my small demo. It's very easy to set up. Really, it takes just two commands, jupyter-book create coolproject and jupyter-book build coolproject, but let's go through them in more detail.

How-to

= 1 = Make a new repository (let's call it coolproject) and clone it. Navigate to it in a terminal and then:

pip install jupyter-book
jupyter-book create coolproject

This will make the coolproject folder inside your repo. Navigate to it in your terminal and fire up jupyter:

jupyter notebook

= 2 = Work on some Jupyter code. Say you made awesome_code.ipynb, and it's related to the topic wowtopic. Also create wowtopic.ipynb, which only has some text in a markdown cell. You'll use this to introduce the topic. It can be empty for now.
= 3 = In addition to awesome_code.ipynb you created in step 2, you'll find the file toc.yml inside this new folder. Edit it to add awesome_code.ipynb. Since it's related to wowtopic, you'll want toc.yml to have:

- file: wowtopic
  sections:
    - file: awesome_code
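
As the number of notebooks grows, it is easy to forget to add one to toc.yml. A small check can list the notebooks that are missing from it. This is only a sketch and not part of Jupyter Book: it assumes the simple "- file: name" layout shown above, and the helper names listed_files and unlisted_notebooks are made up for illustration.

```python
import re
from pathlib import Path


def listed_files(toc_text):
    """Return the names that follow 'file:' in a toc.yml-style text."""
    # Assumption: entries look like '- file: name', as in the example above.
    pattern = r"^[ \t]*-?[ \t]*file:[ \t]*(\S+)[ \t]*$"
    return re.findall(pattern, toc_text, flags=re.MULTILINE)


def unlisted_notebooks(book_dir, toc_name="toc.yml"):
    """Return notebook names (without .ipynb) in book_dir that the toc omits."""
    book_dir = Path(book_dir)
    listed = set(listed_files((book_dir / toc_name).read_text(encoding="utf-8")))
    notebooks = {path.stem for path in book_dir.glob("*.ipynb")}
    return sorted(notebooks - listed)
```

Calling unlisted_notebooks("coolproject") from the top of the repository would return the names of any notebooks in the book folder that toc.yml does not mention yet.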

= 4 = Compile the Jupyter code to HTML: go to the top folder of your repository and run the build command (jb is a short alias for jupyter-book):

jb build coolproject

That's it! Now you have an instantly shareable HTML representation of all your notebooks.

View results

To recap, the top-level folder that contains your repo is called coolproject. It has a subfolder that is also called coolproject. In it, you have a bunch of Jupyter notebooks and a toc.yml file that organizes them into topics.

The last command above turns all of them into a shareable, interlinked group of HTML files. Open the file coolproject/coolproject/_build/html/index.html with Chrome or Firefox and never get lost in Jupyter files again!
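
If you would rather not click through folders, the landing page can be opened from Python. The snippet below is a minimal sketch that relies only on the folder layout described above. The function name built_index is hypothetical, and it assumes the script is run from the folder that contains the repository.

```python
import webbrowser
from pathlib import Path


def built_index(repo_root, book="coolproject"):
    """Path of the built landing page, following the layout described above."""
    return Path(repo_root) / book / "_build" / "html" / "index.html"


if __name__ == "__main__":
    # Assumption: the repository folder "coolproject" sits in the current directory.
    index = built_index("coolproject")
    if index.exists():
        # Open the page in the default browser via a file:// URL.
        webbrowser.open(index.resolve().as_uri())
    else:
        print(f"{index} not found; run the build step first")
```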

There is an HTML file in that folder for every Jupyter file you put in toc.yml. Share any of them with your collaborators. No need to save figures separately. You can even hide code so it doesn't get in the way. Perfection!
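
Hiding code is controlled per cell through cell tags. In recent Jupyter Book releases, a code cell tagged hide-input has its code collapsed in the built page; whether the version used in this post honors the same tag is an assumption, so check the documentation for your version. Tags can be added by hand in the notebook interface, or in bulk with a sketch like the one below. The helper names tag_code_cells and tag_notebook_file are made up for illustration.

```python
import json
from pathlib import Path


def tag_code_cells(notebook, tag="hide-input"):
    """Add `tag` to every code cell of a notebook given as a parsed-JSON dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # leave markdown and raw cells untouched
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
    return notebook


def tag_notebook_file(path, tag="hide-input"):
    """Rewrite the .ipynb file at `path` in place with the tag added."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    tagged = tag_code_cells(notebook, tag)
    path.write_text(json.dumps(tagged, indent=1), encoding="utf-8")
```

For example, tag_notebook_file("coolproject/awesome_code.ipynb") would tag every code cell in that notebook. Since it overwrites the file, commit your work first and rebuild the book afterwards.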
