Maybe we still don't have a definitive answer, but at least we have an observational study. The link points to a summary page that has a link to the paper. But I'll summarize the summary below. 😄
After observing 5 Data Scientists at work and interviewing 15 more, the authors identified 9 major pain points with computational notebooks. I'll list only a subset:
- Setup of the notebook, libraries and data sets.
- Exploration and visualization.
- Writing code.
- Version management.
- Sharing and collaboration.
- Reproduction and reuse.
- Production deployment.
If you're a seasoned Data Scientist, you're probably already either giggling or facepalming, because the list above basically enumerates every step of your usual workflow, and we have issues with every single one of them!
Now I have a question: what's right with computational notebooks? 😄
And yet notebooks are indispensable for Data Analysis and are a huge improvement over traditional, purely code-centric development. Still, we have a lot to borrow from "the old ways", in particular advanced IDE features and integration with Version Control and Continuous Integration systems.
The paper's authors say there's huge demand for new, advanced tools that integrate both IDE features and Notebook features. And I have the impression that JetBrains is evaluating such an opportunity, and maybe is even already designing or developing a Data Science-tailored IDE with Notebook features and support for the R language and major libraries. 😊
UPDATE: Apparently there's already an R language plugin for JetBrains IDEs that JetBrains now officially supports and improves. It does bring some Notebook features into a full-featured IDE experience. Still, I think they might go for a separate Data Science-oriented IDE product.