
Kevin White for Deepnote

Originally published at deepnote.com

The good, the bad, & the ugly: how to share Jupyter notebooks

Why do data teams need to share Jupyter notebooks? Let us count the ways.

Code reviews, team presentations, delivering insights to non-programmers, general hotdogging (we all need to show off now and then) --- the list goes on and on.

But sharing a Jupyter notebook is problematic (i.e., painful). You end up jumping through annoying, time-consuming hoops just to share the results of your analysis. And the workarounds you use often defeat the purpose of sharing your notebook in the first place.

As with most things in data work, there's a good, bad, and ugly way of getting it done. Let's look at the different ways you can share a Jupyter notebook and weigh the advantages and disadvantages of each approach.

What it means to share

Before we start comparing and contrasting different Jupyter notebook sharing options, let's be clear on what we mean by "share." There's a whole spectrum of sharing, ranging from "I literally just want your eyeballs on this output" to "Go ahead and edit my code."

Sometimes you want people to run your notebook. Maybe you even want them to be able to toggle a few dropdown menus so they can actually explore the data. Other times it's more of a look-but-don't-touch situation.

At the end of the day, we share to collaborate. And successful data collaboration depends on:

  • Speed (our ability to share work quickly)

  • Reproducibility (our ability to duplicate it)

  • Interactivity (our ability to work together on it)

Those are the criteria we'll use to evaluate different sharing options. So, without further ado, here are your choices for sharing a data notebook.

Ugly: the file option

Downloading and emailing IPYNB files --- the bane of many a data professional's existence.

Since Jupyter notebooks typically run on your local machine, you can't simply send someone a link to your notebook. Instead, you have to download the file (which takes way too long) and send it off to a teammate so they can fire it up on their machine (which also takes way too long).

Unfortunately, the file brings none of its dependencies with it: no data, no database connections, no installed packages. Maybe you add your database password to help speed things along (the exact kind of behavior that keeps your security team up at night), but your colleague still has to set up their environment from scratch. Only after the necessary Python packages have been installed and the environment configuration is complete can they run your notebook.
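One common way to keep a password out of the notebook file is to read it from an environment variable at runtime. Here is a minimal sketch of that idea, not a prescribed setup: the variable name `ANALYTICS_DB_PASSWORD` and the helper `get_db_password` are hypothetical, and your team may well prefer a proper secrets manager.

```python
import os


def get_db_password(var_name="ANALYTICS_DB_PASSWORD", env=None):
    """Return the database password from an environment variable.

    `var_name` is a hypothetical name; use whatever your team agrees on.
    `env` defaults to the real process environment and exists mainly
    so the function can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    password = env.get(var_name)
    if not password:
        # Fail loudly so nobody is tempted to paste the secret into a cell.
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this notebook."
        )
    return password
```

Your colleague still has to set the variable on their own machine, so this fixes the security problem, not the speed problem.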

It's not exactly what you'd call high-speed --- but it's better than exporting your notebook as a static file or copying and pasting screenshots. With those options, reproducibility and interactivity go right out the window.

Say you need to share your notebook with a non-technical stakeholder --- they're not interested in running it, just getting to the insights you uncovered. You download it as a PDF and send it over. Then they spot a mistake that needs to be fixed. Or they have a follow-up request not covered in your analysis. Or your results lead to another question that needs to be answered.

No matter the issue, you'll have to go back to your notebook, rerun it, re-export it, resend it, and repeat until the stakeholder is satisfied. On and on and on. You (and your teammates) get stuck in a never-ending loop of busy work that makes exploring and collaborating on data a grind.
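To make the loop concrete, here is a rough sketch of the rerun and re-export steps scripted with `jupyter nbconvert`. It assumes Jupyter and nbconvert are installed; the helper names and the file name `analysis.ipynb` are made up for illustration, and PDF export additionally needs a LaTeX installation.

```python
import subprocess


def build_export_command(notebook_path, output_format="pdf"):
    """Build a `jupyter nbconvert` command that reruns a notebook, then exports it."""
    return [
        "jupyter", "nbconvert",
        "--to", output_format,  # "pdf" needs LaTeX installed; "html" does not
        "--execute",            # rerun every cell before exporting
        notebook_path,
    ]


def rerun_and_export(notebook_path, output_format="pdf"):
    """Run the export; raises CalledProcessError if nbconvert fails."""
    subprocess.run(build_export_command(notebook_path, output_format), check=True)


# Show the command without running it (the file name is hypothetical).
print(" ".join(build_export_command("analysis.ipynb")))
```

Scripting it saves a few clicks, but you still have to resend the file every time, and the stakeholder still can't touch anything.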

Bad: the view option

The challenge of sharing a Jupyter notebook is nothing new --- that's why there's a cottage industry built around making it easier to view them.

GitHub repositories are a great way to organize static data notebooks and make them accessible to teammates, but therein lies the rub: They're static. Notebooks are rendered on GitHub, not run. Reproducing the work, collaborating in real time, commenting --- none of it is an option. Ditto for tools like nbviewer.
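Why static? An IPYNB file is a JSON document that stores each code cell's source alongside the outputs saved from its last run, so a viewer only has to display what is already in the file. The sketch below reads those saved outputs from a tiny hand-written notebook; the function name `stored_outputs` and the example notebook are illustrative assumptions, and it handles plain text (stream) outputs only.

```python
import json


def stored_outputs(notebook_json):
    """Return the text of saved stream outputs from a notebook's code cells."""
    notebook = json.loads(notebook_json)
    texts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                text = output.get("text", "")
                # The notebook format allows a string or a list of strings here.
                texts.append("".join(text) if isinstance(text, list) else text)
    return texts


# A tiny hand-written notebook (illustrative, not a real analysis).
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(21 * 2)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
        }
    ],
})

print(stored_outputs(example))  # ['42\n']
```

Nothing in that snippet executes `print(21 * 2)`; the `42` comes straight from the file, which is all a renderer needs.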

There's also Binder, which claims to help you "turn a Git repo into a collection of interactive notebooks." But "interactive" is a stretch. Once you get past the time it takes to load a repository and the custom libraries you need, you realize each visitor gets their own isolated, temporary copy of the environment.

After you close that URL, the notebook goes poof. You may feel like you're sharing the real deal, but as soon as you want to actually do anything collaborative, it instantly falls apart. Tools like this are good for quickly reproducing a notebook, but interactivity is ultimately an illusion.

Good: the cloud option

Cloud-based technology allows you to share fully executable notebooks with a link --- no muss, no fuss.


This isn't the same as JupyterHub, which is a DIY option that requires organizations to install and manage their own Jupyter notebook servers (i.e., you have to manage everything and deal with computing power and storage limits). Cloud-based notebooks are hosted for you, giving you an easy and scalable way to quickly share and reproduce projects.

But keep in mind that not all cloud-based notebooks are created equal --- the ability to easily share your notebook doesn't always mean you can collaborate on it. Take Google Colab, for instance. You and your teammates can't share the same execution environment simultaneously or leave comments for each other. And when one person edits and saves the notebook, their copy can overwrite whatever their colleagues were working on.

Then there's the question of permissions. Not all cloud-based notebooks have the same level of granularity when it comes to who can access notebooks and how they can use them (e.g., run a notebook but not alter the code).


Your best bet is a cloud-based data notebook that's truly collaborative by design (hint: That's what we made).


It looks like this:

  • Sharing the same environment with collaborators at the same time, complete with database connections and environment configuration

  • Editing code with collaborators in real time and leaving comments for each other

  • Assigning granular access levels to collaborators, from view-only to full code access and everything in between

  • Giving collaborators a shared workspace where they can easily store, organize, and find their teammates' notebooks to view, work on, or duplicate a project

  • Publishing shareable notebooks as articles, dashboards, and interactive apps with just a click to make sharing insights with stakeholders that much easier

Sharing data notebooks seems like such a simple task, but the truth is it's as complicated as any machine learning model. Combining shared environments and access controls --- and wrapping it all up in a package that's fast and accessible to people of all technical levels --- is not easy. Some sharing options do a passable imitation, but the devil's in the details.

Data collaboration is a computational puzzle every team has to solve on its own. Next time you need to share a Jupyter notebook, consider a solution that's built for real teamwork.

Simplify sharing Jupyter notebooks with Deepnote

Get started for free to see how easy it is to share and collaborate on data notebooks.
