DEV Community

Daniel

The Five Worst Things About Jupyter Notebooks

I used to love Jupyter Notebooks. I still think they are a wonderful tool for many tasks, such as exploratory data analysis and presenting insights to colleagues nicely and easily. However, while they are great for data science some of the time, at other times they are a headache. Like any software tool, they have their downsides. Here are the five worst things about Jupyter Notebooks for data science:

1. It is almost impossible to practice good code versioning

Jupyter Notebooks are terrible for code versioning. The problem is that they are stored as JSON files, which are basically nested dictionaries holding not just your code but also cell outputs, execution counts, and metadata. This means that when you diff two versions of a notebook, the actual code changes are buried in JSON noise. This makes working in a team with several notebooks extremely tedious and difficult.
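One common workaround is to clear outputs and execution counts before committing, so that a diff shows only source changes. Here is a minimal sketch of that idea, assuming the version 4 notebook layout (a top-level `cells` list where code cells carry `outputs` and `execution_count`). The function name `strip_outputs` and the in-memory example notebook are made up for illustration; dedicated tools such as nbstripout handle this more thoroughly.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    # Round-tripping through JSON is a cheap deep copy for JSON-compatible data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook standing in for a real .ipynb file (hypothetical content).
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
}

stripped = strip_outputs(example_notebook)
print(json.dumps(stripped, indent=1, sort_keys=True))
```

For a real file you would load it with `json.load`, pass the result through the function, and write it back with `json.dump`. The source of each cell is left untouched, so re-running the notebook regenerates the outputs.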

2. The non-linear workflow of Jupyter: its best and worst part

Jupyter Notebooks have a non-linear workflow. This means that you can execute cells out of order, which can lead to confusion and errors, because the results depend on the order in which cells were run rather than the order in which they appear. This is of course also one of the big selling points of Jupyter, but it is only useful for early data analysis and exploration, and therefore ends up being a downside more often than not.
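To make the problem concrete, here is a small simulation rather than a real notebook: each string stands in for a cell, and a plain dict plays the role of the kernel's shared namespace. The cell contents and the helper `run_cells` are invented for this sketch.

```python
# Each string stands in for one notebook cell (hypothetical contents).
cells = {
    "A": "threshold = 10",
    "B": "filtered = [x for x in [5, 15, 25] if x > threshold]",
    "C": "threshold = 20",
}


def run_cells(order):
    """Execute the cells in the given order in one shared namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace


top_to_bottom = run_cells(["A", "B", "C"])
out_of_order = run_cells(["A", "C", "B"])

print(top_to_bottom["filtered"])  # [15, 25]
print(out_of_order["filtered"])   # [25]
```

The same three cells give two different results depending on run order, and nothing in the saved notebook tells a colleague which order was used. Restarting the kernel and running everything top to bottom before sharing is the usual safeguard.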

3. Jupyter is bad for running long asynchronous tasks

Jupyter is not well suited for running long, asynchronous tasks. This is because all cells in a notebook run in the same kernel, which executes one cell at a time. This means that if one cell is running a long task, it will block the execution of other cells until it finishes.

This can be a major problem when you're working with data that takes a long time to process, or when you're working with real-time data that needs to be updated regularly. In these cases, it can be much better to use a tool like Dask, which is designed for parallel computing.
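As a rough illustration of the submit-now, collect-later pattern that such tools build on, here is a sketch using only the standard library's `concurrent.futures`. It is a stand-in, not Dask itself, and `slow_task` is a hypothetical placeholder for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(seconds: float) -> str:
    """Stand-in for a long-running job (hypothetical; replace with real work)."""
    time.sleep(seconds)
    return "done"


executor = ThreadPoolExecutor(max_workers=1)

# submit() returns a Future immediately, so the submitting cell finishes right away.
start = time.perf_counter()
future = executor.submit(slow_task, 0.5)
submit_time = time.perf_counter() - start

# Other cells can run here while the task is still in progress.
still_running = not future.done()

# Collect the result later; result() blocks until the task has finished.
result = future.result()
executor.shutdown()
print(f"submit took {submit_time:.3f}s, result: {result}")
```

Threads mainly help when the task spends its time waiting (I/O, sleeping, remote calls). In standard CPython, CPU-heavy pure-Python work in a thread still competes with the rest of the kernel, which is where process-based or distributed schedulers become more attractive.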

4. Jupyter can be slow

Jupyter can be slow to start up, and it can be slow to execute code. This is because Jupyter is an interactive tool: the entire notebook, including stored outputs, has to be loaded to provide the interactive features, and every variable you create stays in the kernel's memory until you delete it or restart the kernel.
If you're working with large data sets or large notebooks, this can be a major problem. Jupyter is simply not designed to be used with large data sets.
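When a notebook starts to feel sluggish on larger data, it can help to check which variables are holding memory in the kernel. The helper below, `largest_variables`, is a hypothetical sketch and not part of Jupyter. It relies on `sys.getsizeof`, which is shallow, so treat the numbers as a rough hint rather than an exact measurement.

```python
import sys


def largest_variables(namespace: dict, top: int = 3):
    """Return (name, size_in_bytes) pairs for the biggest objects, largest first.

    sys.getsizeof is shallow: it counts the container itself,
    not the objects the container refers to.
    """
    sizes = [
        (name, sys.getsizeof(value))
        for name, value in namespace.items()
        if not name.startswith("_")
    ]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


# A small dict standing in for the kernel's namespace (made-up variables).
fake_namespace = {
    "raw_rows": list(range(100_000)),
    "lookup": {"a": 1},
    "label": "sales",
}

report = largest_variables(fake_namespace)
for name, size in report:
    print(f"{name}: {size} bytes")
```

In a live notebook you could pass `globals()` instead of the example dict, then `del` the variables you no longer need.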

5. No IDE integration

This is just my opinion, but not having linting and code styling warnings is a big downside of Jupyter. IDE features are simply too convenient: without things like jumping between function declarations and code style checks, Jupyter is a lesser developer experience compared to a full-fledged IDE.
Now, this is a bit of a lie, because I have been using Jupyter through PyCharm Professional. Being able to use PyCharm's debugger in cells is often the best of both worlds.
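If your editor does not lint notebooks directly, one workaround is to pull the code cells out into plain Python source and point a linter at that. The sketch below assumes the notebook has already been loaded as a dict and that IPython magics start a line with `%` or `!`. The function name `notebook_to_source` and the example notebook are made up for illustration; `jupyter nbconvert --to script` is the existing command-line route to a similar result.

```python
import ast


def notebook_to_source(notebook: dict) -> str:
    """Join the code cells of a notebook dict into one Python source string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The source may be stored as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Hypothetical notebook content used only for this example.
example_notebook = {
    "cells": [
        {"cell_type": "code", "source": ["%matplotlib inline\n", "import math"]},
        {"cell_type": "markdown", "source": ["# Area of a circle"]},
        {"cell_type": "code", "source": ["def area(r):\n", "    return math.pi * r ** 2"]},
    ]
}

script = notebook_to_source(example_notebook)
ast.parse(script)  # raises SyntaxError if the extracted code is not valid Python
print(script)
```

Writing `script` to a `.py` file gives any linter or formatter something it can work with, at the cost of losing the link back to individual cells.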

One more thing

It's often important to consider where computations are run. For code that’s easy to put into Docker, deploying to a cloud solution is easy. For notebooks, there are also good options, though you’re more locked into specific solutions.

If you want to run Jupyter notebooks in the cloud, it’s definitely worth looking into Amazon SageMaker and/or Kubeflow.

In conclusion, Jupyter Notebooks are not the ideal tool for data science projects. They are ideal for prototyping, but for your own sanity, migrate away from them before writing serious production code.


Top comments (6)

Git-Ilan

I also prefer Pycharm!

Dendi Handian

I guess Pycharm has more stable Jupyter notebook integration than VS Code.

Daniel

It's the best! (sorry vs code lovers <3)

oh hi mark

I gotta say, I've been using Hex.tech and it’s really awesome. It handles a few of these things, esp. IDE features and seamlessness between SQL and Python (no affiliation).

Daniel

Looks super cool! I'll definitely check it out

Nelson Cárdenas Bolaño

In VSCode you can write notebooks and use Flake8 or things like that to lint your code and take advantage of the VSCode Extensions, but I recognize notebooks generate a lot of problems.