DAX Overview

#analytics #dataanalysis #datascience #learn

What is DAX?

DAX is an expression language for distilling your data into bite-sized insights.

Where is DAX used?

Power BI, Power Pivot, SSAS Tabular Mode, and Azure Analysis Services.

What is DAX Good at?

Aggregations:
Dax is used for turning millions of values into one result, such as total annual sales.
Filtering:
DAX is optimized to slice and dice your data, based on filters.

What is DAX Not Good at?

Operational Reporting:
These reports are detail heavy and used to support day-to-day operations. This is the opposite of Analytical Reporting. DAX isn't designed for listing out line items on sales order providing a list of time sheet entires, it is designed fo aggregating data, for combining it together into summaries, not detailed lists that have to be pixel perfect.
Wide Tables:
The more columns you add to table, the worse the performance. So if you have got a table with 50 columns, DAX is going to have much worse compression and much worse performance than normal.
Many-to-Many Relationships:
DAX is optimized for when you have a few transaction tables and some dimension tables surrounding those. DAX doesn't directly support many-to-many relationships between tables. There are ways around this either by changing your formulas or by using bidirectional filtering but it can be a pain or it can lead to results that you are not expecting.

Think in Columns, Not Rows

Now that we understand why DAX is useful, we need to tackle the biggest hurdle to using it, and that is learning to think in columns instead of rows. In order to proceed, we are going to learn some four letters acronyms.

Two uses for Database

OLTP

Online Transactional Processing.

Applications designed for day to day use and frequent updates.

This would be an accounting system, ERP system, or online commerce system. So, they need to support a large number of transactions at a given time. This means that they are often normalized or broken up into multiple tables by entity to minimize number of writes that have to be done.
This means that instead of just the product table, you might have a product table, a product sub-category table, and a product category table. But the benefit is, let's say you need to change the name of a category. By breaking it up like this, you only have to modify one row in one table instead of potentially thousands of rows in a table.

OLAP

Online Analytical Processing

Applications designed for daily reporting and heavy reads.

This is your traditional data warehouse. In these systems, you are doing a lot of complex reporting. So, you want to optimize for reads, not writes. You don't care about updates, because these types of systems are generally updated on a nightly basis in bulk.
These types of systems are stored oftentimes in something called a star schema where you have a ton on transactions in a table in the middle with a bunch of reference tables around it, like the shape of a star.

Optimizing for OLAP

So, why we care about this OLTP and OLAP?
The reason we care is that when we are doing analytics, there are certain things that you may want to optimize for:

Single-Column Aggregations
This may sound like a fancy term, but all we mean is that oftentimes you are going to be displaying metrics and KPIs that consist of one column plus some filters. For example, it might be all of sales for the year 2021, the count of orders by geography, or the total revenue for each department.
A lot of analytical reporting involves taking a column, applying a filter or two, and then mushing it together. So we want our analytical engine to be really good at that.
Large Number of Rows
With an analytical system, you are often including all of the historical data going years back, so you need something that can work with hundreds of thousands or even millions of rows. So we are often trying to pull columns of data, and those columns happen to be much larger than usual because of all this historical data.
Repeated Values
When you are dealing with your accounting system, you only want to change the customer name once. But when you are dealing with a reporting system, you want to minimize the number of tables you have to join together to get all your information. As a result it is common to flatten much of the data into one giant transaction table.
This means you might put the customer name on each transaction instead of in its own table. As a result you would easily have the same value repeated thousands or millions of times in a given column. So, system should be able to deal with repeated values.
Quickly Apply Filters
Finally, whenever someone is doing analytical reporting, they often want to look at data from a number of different angles by different dates, different departments, different geographies, different product lines, etcetera.
All of this is basically a matter of applying and un-applying filters. Any such system then needs to be able to modify filters quickly and efficiently.

A relational row-based database is like a Swiss Army Knife. It is pretty good at a lot of different things, but sometimes, you have a narrowly-defined purpose. Sometimes you need a scalpel instead of a Swiss Army Knife. A columnar database is like that scalpel.
Columnar databases are highly optimized for analytical reporting.

DEV Community