If you are a data scientist or machine learning software engineer as I am, you probably write most of your code in Jupyter Notebooks. For those of you who are not -- Jupyter is a great system that allows you to combine markdown-based text and executable code in one web-based and web-editable document called notebook.
The way most data scientists work is to install their own copy of Python dev environment (such as Anaconda or even better Miniconda) on their computer, start Jupyter server and then edit/run code on your own PC/Mac. In more demanding situations, your dev environment can be hosted on some high-performance compute server, and accessed through the web. However, using cloud resources as notebook dev environment sounds like even better alternative. This is exactly what Azure Notebooks are --- public Jupyter server hosted in the Azure cloud, which you can use from anywhere through the browser to write your code.
There are many advantages of using Azure Notebooks instead of local Jupyter installation, and I will try to cover them here.
Whether you want to learn Python, or do some experimentation with F# --- you typically need to install your development environment, be that Visual Studio or Anaconda. This requires some time and disk space, and while it will probably pay off if you are into serious development --- spending time on environment setup is not something you want to do just to try a piece of code, or if you drop by to a party at your friend's house and want to show them your latest data analysis result. In those cases, you can just log into online Notebook environment and start coding right away in any of the supported languages: Python 2 and 3, R or F#.
Coming back to the point of visiting your friend's house --- you probably want to be able to have access to your code immediately, without the need to carry the USB stick or download it from OneDrive. Azure Notebooks allow you to keep all your projects online. Notebooks are organized into Projects, which are similar to GitHub repositories, but without version control, and any project can be made either private, or public.
Azure Notebooks is a great way to share the code with other people. Each project has a unique link, and if you want to share it -- just use this link (also making sure the project is marked as public). With the link other people will be able to:
- view your code
- clone it into their own copy of your project, and start playing and editing notebooks online
Because, unlike in Google Colab, you share on per-project basis, you can share several notebooks through one link, and you can also include data, README, and other useful information into the project.
If you need to configure your Python environment in some way or install specific packages - you can also do it through config files, or including
pip install commands into the notebook. Installing packages on F# notebooks is done through
Paket manager, and is documented here.
Notebook is a great way to add thorough instructions to your code, or to add executable code to your text. There are plenty of scenarious where it can be useful:
- Writing instructions or explaining some concepts that are related to algorithms. For example, if you want to explain what affine transformation is, you can provide the explanation first (which can also include formulae, because Azure Notebooks support LaTeX), and then include some executable examples of applying affine transformation to a sample picture. Readers would not only be able to see how the code works, but they will be able to modify the code in place and further play with it
- Writing a text with some arguments supported by data, for example, in data-driven journalism or computational journalism. You can write an article in the form of an Azure Notebook, which will automatically gather data from public sources, produce some live graphs, and compute resulting figures that will be used to drive reader to the conclusion.
One great feature that differentiates Azure Notebooks from all other similar services is the pre-installed RISE extension, which allows you to make presentations. You can mark cells as separate slides, or as continuations to previous slide - and then in presentation mode it will look like animation.
While you will not be able to make fancy presentations in terms of design, but for many cases content and simplicity matter. Azure Notebooks are especially great for academic-style presentations, especially because you can use LaTeX formulae inside any text.
If you are the owner of a GitHub-hosted Python project, you probably want to give people the simple opportunity to try your project out. One of the easiest way to do so is to provide a set of notebooks, which you will then be able to clone immediately in Azure Notebooks. Azure Notebooks support direct cloning from any GitHub repository - so all you need to do is place a piece of code into the
Readme.md file for your repo:
<a href="https://notebooks.azure.com/import/gh/<git_user>/<repo>"> <img src="https://notebooks.azure.com/launch.png" /></a>
In most of the data science tasks, you first spend some time writing your code and making it work on small-scale data, and then you run it using more powerful compute options. For example, if you need GPU for neural network training, it may be wise to start development on non-GPU machine, and then switch to GPU once your code is ready.
Azure Notebooks allow you to do it seamlessly. When starting/opening a library, you can chose to run it on Free compute, or you can select any compatible virtual machine from an Azure subscription associated to your account (by compatible I mean Data Science Virtual Machine under Ubuntu). So, in most of my tasks, I will start with free compute option, develop most of my code, and then switch to the VM. Azure Notebooks will make sure that the same project environment (including notebooks and other project files) is transferred (or, to be more precise, mounted) to the target VM and used there seamlessly.
I have to mention that the free compute option that you are getting is also quite good, with 4 Gb of memory and 1 Gb of disk space.
I personally teach a couple of courses at University, and I find Azure Notebooks to be extremely useful in teaching. I also have to give a lot of presentations, labs, workshops as a Cloud Advocate. Here's how you can use Notebooks:
Give lectures using presentation feature. I find Azure Notebooks slides far easier to maintain and write, because you do not need to focus on design, but on content, which stays in markdown format. Often it is much easier to manipulate plain text, and inserting mathematical formulae in TeX is far easier than using Word Equations. On the down side, inserting diagrams and pictures is more painful, so do not use Notebooks for marketing presentations.
Write Textbook with Samples If you write slides in Notebooks, you can also include some additional text which is not included into the slides, but which elaborates more on the topic. Thus the same notebook can be used both as slides and as a textbook. Moreover, such a textbook will include executable examples, which act as demos, or starting point for student's own work.
Give Labs/Exams. Prepare all the materials and initial code for your lab/exam into one Azure Notebooks project, and then share it immediately with all students through one link. Your students will then clone the code, and start working on it. To collect results, you can get their individual project links from students (if the work is not precisely time-bound), or ask them to send you or upload
.ipynb-file somewhere (if you want to make sure that student finished working on his code on time).
I also use more advanced things like Azure Functions to collect the results from labs, but that is another story, which I will probably also share one day...
While Azure Notebooks is a great tool, there are some things that you need to keep in mind when using them:
- Network access is somehow limited. Because with Azure Notebooks you are getting some free compute, it would not be very wise to let you do whatever you want on the internet, like sending spam. For this reason, network access inside Azure Notebooks is limited to a certain set of resources, that include all Azure resources, GitHub, Kaggle, OneDrive, and some more. So, if you want your notebook to get some data from the outside, you may want to place the data on GitHub/OneDrive, or upload it manually into the project directory of a notebook through the web interface.
- Inserting Images/Diagrams into the text. Because you enter all the text as plain text, you lack the simple PowerPoint/Word features of copy-pasting images and drawing diagrams. So, if you want to insert a picture or a diagram, you need to export it to JPEG/PNG, upload somewhere on the internet (I use GitHub most of the time), and insert it into the text using Markdown syntax. So, if you are creating a marketing/motivation presentation -- use PowerPoint, and for scientific/academic presentation or a university course -- use notebooks.
- GPU Access is not included into Free Compute options for Azure Notebooks at this time.
A good way to learn about specific features of Azure Notebooks, like installing packages, getting external data and drawing a graph, is to look at the sample notebooks. A great collection of Jupyter samples can be found here, and remember -- you can run any Jypyter notebook in Azure by uploading
.ipynb file into a project.
Azure Notebooks are a great tool which can be useful in many scenarious that I have tried to cover here. If you have some more use-cases - feel free to share them in the comments, I would be very interested to know!
While while there are some other options for running notebooks in the cloud, including Google Colab and Binder, the comparison shows that Azure Notebooks is a great tool that has the most useful features.
I hope you will enjoy using Azure Notebooks in your everyday life as much as I do!
P.S. Some more useful documentation may be found here: http://aka.ms/aznb