DEV Community

Itay Sagui
Itay Sagui

Posted on • Originally published at saguiitay.com on

Using dimensions when querying Azure Data Explorer using Power BI

In the previous post I discussed querying Azure Data Explorer using Power BI. I’ve shown how we can use the Azure Data Explorer connector to retrieve data from our cluster in Direct Query mode, and add a couple of filters and a pie chart. One of the things I mentioned, is that the report generates multiple queries to our ADX cluster:

  1. One query to retrieve the list of States (for the filter visualization)
  2. One query to retrieve the list of Sources (for the filter visualization)
  3. One query for the actual data needed for the pie-chart – the aggregated number of events by EventType

If we have a lot of visualizations, this number will grow, and will impact the performance of our report. So, what can we do to reduce the number of queries? Well, based on our data model, quite a lot. I’ll touch a bit about the different options we have to reduce impact on our clusters (thus reduce our COGs), but for now let’s focus on the main issue.

Creating dimensions tables for Azure Data Explorer in Power BI
The number of States and the number of Sources is not expected to change (probably never, but certainly not with high frequency). There’s no reason that each time a user views our report, we’ll retrieve those values. But that’s exactly what happens when our filters are based on a Direct Query data source. Instead, what we want to do is import those dimensions once in our report (and perhaps refresh them periodically) and reuse them for all our users. Let’s give it a try.

We’ll start with creating 2 new data sources using the Azure Data Explorer connector. Notice how this time, we provide a KQL query, instead of just a table name, and how we select the Import mode (instead of Direct Query):

We’ll use the following queries. For retrieving the list of unique States:

StormEvents
| distinct State
Enter fullscreen mode Exit fullscreen mode

and for the unique list of Sources:

StormEvents
| distinct State
Enter fullscreen mode Exit fullscreen mode

Relationships and Visualizations

If we try just to update our visualizations to use the new data source, we’ll lose the cross-filter interactivity – the pie-chart won’t be updated based on our filters selections. We first need to create the relationships between our Dimensions tables (States & Sources) and our original Facts table.

We’ll create both relationships as many-to-many – remember, ADX does not have strong relationships between tables (but you can simulate them using the join operator). Although we can probably set a one-to-many relationship in this case, since the number of values in our dimension tables is rather low, this has no benefit, and the same queries will be generated either way:

Image description

Image description

We can now go back to our report, and select different values in our filters. Notice how the data in the pie-chart gets updated based on our filters selection:

Image description

If you’ll investigate the query generated by Power BI, you’ll see something along the following lines:

StormEvents
| where (["State"] == "CALIFORNIA") and (["Source"] in ("Broadcast Media","AWOS","ASOS","Amateur Radio","Airplane Pilot"))
| summarize ["a0"]=countif(isnotnull(["EventId"])) by ["EventType"]
| limit 1000001
Enter fullscreen mode Exit fullscreen mode

You can see that the ADX Connector added a filter for the California state, as well as a multi-valued filter (using the “in” keyword) for the selected Sources.

There are a few things that are worth mentioning:

  • You can achieve the same behavior with Direct Query dimension tables, but the generated queries will be much less efficient, and will use the “join” KQL operator, instead of just simple filters. In the example above, I’ve used 2 separate dimension tables. This does have the drawback, that even when I select a State, I still see the full list of Sources – there’s no cross filtering between these 2 filters.
  • That being said, I’d recommend using the “Add an Apply button to each slicer to apply the changes when you’re ready” (from the “Query reduction” settings section – PowerBI tends to be quite chatty, which might impact your cluster.
  • Lastly, when I created the dimension tables, I’ve used a query to retrieve the list of unique States and Sources. If you have huge amounts of data, or if calculation of the dimensions is more complex, that might be too slow. In such cases, I’d suggest you look into creating a Materialize View, or using an Update Policy to pre-create the data.

Top comments (0)