As I've been doing more and more dotnet development in Polyglot Notebooks, I've found myself wanting more options for creating rich data visualizations inside of VS Code, the same way I can use Plotly for data visualization with Python and Jupyter Notebooks.
Thankfully, Microsoft SandDance exists and fills some of that gap in terms of doing rich data visualization from dotnet code.
In this article I'll talk more about what SandDance is, show you how you can use it inside of a Polyglot Notebook in VS Code, and show you a simple way you can use it without needing a Polyglot Notebook.
SandDance is a Microsoft Research project designed to fluidly visualize data in an interactive manner. SandDance supports several visualization types, as well as aggregating, sorting, and colorizing data points, to help the user perform complex data analysis.
SandDance can be integrated into a variety of different environments from a web application to a Power BI integration, but most interesting to me is the extension to Polyglot Notebooks that allows you to directly pipe any data source you want into SandDance for analysis.
In order to work with SandDance, we'll need to install the SandDance.InteractiveExtension NuGet package. Along with that we'll install a few other extensions that make it easier to load and work with tabular data.
    #r "nuget:SandDance.InteractiveExtension,*-*"
    #r "nuget:DataView.InteractiveExtension,*-*"
    #r "nuget:Microsoft.ML.DataView"
    #r "nuget:Microsoft.Data.Analysis"
Once these NuGet packages are downloaded and installed into the notebook you can start loading your data with a C# code cell:
    using System.IO;
    using System.Collections.Generic;
    using Microsoft.Data.Analysis;
    using Microsoft.ML;

    // Load a CSV file into a DataFrame
    string contents = File.ReadAllText("Titanic.csv");
    DataFrame ratings = DataFrame.LoadCsvFromString(contents);

    // Display the first 3 rows of the DataFrame below the cell
    ratings.Head(3)
This will load the entire CSV file into an interactive DataFrame and then display the first three rows of the DataFrame below the cell.
The Microsoft.Data.Analysis.DataFrame class is one I want to explore at more length, but at first glance it appears to be very analogous to a Pandas DataFrame in Python.
Once you have a DataFrame, you can convert it to a TabularDataResource and then call the ExploreWithSandDance extension method to begin visualizing your data.
    using Microsoft.DotNet.Interactive.Formatting.TabularData;

    TabularDataResource tabular = ratings.ToTabularDataResource();
    SandDanceDataExplorer explorer = tabular.ExploreWithSandDance();
    explorer.Display()
This will open SandDance immediately below the cell using the data source you provided.
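The same two steps should work for data that never lived in a CSV file. Here is a minimal sketch, assuming the NuGet packages from the earlier cell are already loaded in the notebook. The `passengers` records and their column names are made up for illustration and are not taken from the Titanic file. The only new piece is building the DataFrame from columns in memory; the conversion and display calls are the ones used above.

```csharp
using System.Linq;
using Microsoft.Data.Analysis;
using Microsoft.DotNet.Interactive.Formatting.TabularData;

// Hypothetical in-memory records standing in for any non-CSV data source
var passengers = new[]
{
    new { Name = "Passenger A", Age = 29, Fare = 7.25 },
    new { Name = "Passenger B", Age = 41, Fare = 53.10 },
    new { Name = "Passenger C", Age = 8,  Fare = 21.08 },
};

// Build one DataFrame column per property
var names = new StringDataFrameColumn("Name", passengers.Select(p => p.Name));
var ages = new PrimitiveDataFrameColumn<int>("Age", passengers.Select(p => p.Age));
var fares = new PrimitiveDataFrameColumn<double>("Fare", passengers.Select(p => p.Fare));

DataFrame frame = new DataFrame(names, ages, fares);

// Same conversion and display steps as the CSV example
TabularDataResource tabularFrame = frame.ToTabularDataResource();
tabularFrame.ExploreWithSandDance().Display()
```

With only three rows the visualization will not be very interesting, but the pattern is the point: anything you can shape into DataFrame columns can be handed to SandDance the same way.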
The user experience of SandDance inside Polyglot Notebooks currently feels a bit cramped to me. Aside from the steps in the next section, I do not know of an easy way of maximizing the SandDance output to a full window or customizing the behavior of the SandDance widget at the moment.
That being said, both SandDance and Polyglot Notebooks are relatively young technologies and will likely evolve and grow over time.
If SandDance is interesting to you and you don't want to use it inside of Polyglot Notebooks, there's a VS Code extension that allows you to run SandDance for any .csv file you have saved to disk.
First, install the SandDance extension into VS Code.
Next, right-click on any .csv file to show the View in SandDance context menu option.
From there you'll be able to explore data in SandDance as if you had imported it with Polyglot Notebooks.
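If your data starts out in dotnet code rather than in a file, one way to use the extension is to write the data out as a .csv first. The following is a rough sketch using only the base class library. The `rows` records and the `passengers.csv` file name are hypothetical placeholders, and the `Escape` helper is my own small addition for quoting values that contain commas or quotes.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical sample records; replace these with your own data
var rows = new[]
{
    new { Name = "Smith, Jane", Age = 34, Fare = 71.28 },
    new { Name = "Doe, John",   Age = 27, Fare = 7.25 },
};

// Quote a value when it contains a comma, a quote, or a line break
static string Escape(string value) =>
    value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
        ? "\"" + value.Replace("\"", "\"\"") + "\""
        : value;

// Header row first, then one comma-separated line per record
var lines = new List<string> { "Name,Age,Fare" };
lines.AddRange(rows.Select(r => string.Join(",",
    Escape(r.Name),
    r.Age.ToString(CultureInfo.InvariantCulture),
    r.Fare.ToString(CultureInfo.InvariantCulture))));

// Write the file next to the notebook or project so it shows up in the VS Code explorer
File.WriteAllLines("passengers.csv", lines);
```

Once the file exists on disk, the View in SandDance context menu option described above should be available for it like any other .csv file.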
In fact, I often prefer this way of working with SandDance because it gives you more screen space to work with, and SandDance as an extension respects your VS Code theme while the version that integrates with Polyglot Notebooks does not.
SandDance is an amazing experience, but I don't see it meeting all of my data visualization needs at the moment.
Specifically, I can see myself wanting to make small, lightweight visualizations with a number of pre-configured settings and wanting those visualizations to be embedded in my notebook. That way I could make a scatter plot of a specific type, embed it in my dashboard, and share it with other people for a more guided experience, as I can in a Jupyter Notebook with visualization libraries like Plotly.
Still, SandDance is a great tool to have in your toolbox and I am hopeful about the direction this technology is growing.