Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. On a recent IoT proof-of-concept (PoC) project I was part of, we streamed data into Azure Databricks using Event Hubs and had to present some of that data on a dashboard. We followed a microservice architecture, and our data-access API was developed in ASP.NET Core and deployed to a Docker container. Development was done in Visual Studio 2017 with C# on .NET Core.
This post assumes you have already created an Azure Databricks workspace and an Apache Spark cluster within that workspace. If you haven't, work through the following quickstart first.
Next, I'll walk you through how to query Azure Databricks from .NET Core.
Note: ODBC connectivity to clusters requires the Azure Databricks Premium Plan.
- Go to the Databricks JDBC / ODBC Driver Download page.
- Fill out the form and submit it. You will receive an email that includes multiple download options.
- In the email, select the driver you want and download it. I'm using the 64-bit Windows Simba Spark 1.2 driver.
- Unzip the download and run the installer to register the driver on your machine.
Here are some of the parameters required to configure the ODBC driver:
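As a rough reference, a DSN for the Simba Spark driver typically uses parameters like the following. The host, HTTP path, and token below are placeholders — take the real values from your cluster's JDBC/ODBC tab and from a personal access token generated in your workspace:

```ini
[Databricks]
Driver=Simba Spark ODBC Driver
Host=adb-1234567890123456.7.azuredatabricks.net  ; placeholder - your workspace hostname
Port=443
HTTPPath=sql/protocolv1/o/0/0123-456789-abcde00  ; placeholder - from the cluster's JDBC/ODBC tab
SSL=1                                            ; encrypt the connection
ThriftTransport=2                                ; use HTTP transport
AuthMech=3                                       ; username/password authentication
UID=token                                        ; literal string "token" when using a PAT
PWD=<personal-access-token>
```

With `AuthMech=3`, Databricks expects the literal username `token` and your personal access token as the password.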
Test the connection after you have configured everything. You should see something like this.
Open Visual Studio and Create a .NET Core Console application.
Next, install the System.Data.Odbc NuGet package (`dotnet add package System.Data.Odbc`).
The code sample below runs a simple query against data in Databricks that was populated from a notebook.
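Here is a minimal sketch of such a query using `System.Data.Odbc`. The DSN name (`Databricks`) and the table and column names (`sensor_readings`, `deviceId`, `temperature`) are assumptions for illustration — substitute your own DSN and whatever table your notebook created:

```csharp
using System;
using System.Data.Odbc;

class Program
{
    static void Main()
    {
        // "Databricks" is a hypothetical DSN name configured against the
        // Simba Spark driver; replace it with the DSN you created.
        var connectionString = "DSN=Databricks";

        using (var connection = new OdbcConnection(connectionString))
        {
            connection.Open();

            // Hypothetical table created from a Databricks notebook.
            var sql = "SELECT deviceId, temperature FROM sensor_readings LIMIT 10";

            using (var command = new OdbcCommand(sql, connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["deviceId"]}: {reader["temperature"]}");
                }
            }
        }
    }
}
```

Because the driver does the heavy lifting, the C# side looks like any other ADO.NET data access — which is exactly why ODBC was a convenient bridge between .NET Core and the cluster.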
Given this was a PoC on leveraging Databricks for real-time event streaming, we did not implement hot/cold storage. If we had, Databricks would have served as our cold storage and an Azure SQL database as our hot storage.