OLAP Cubes

#olap #database #data #productivity

WHAT

An OLAP cube is an aggregation of a fact metric on a number of dimensions. CUBE refers to a multi-dimensional dataset.

The advantage of OLAP cubes is that they are easy to communicate to business users.

The data should be stored at the finest - atomic - level of the data.

OPERATIONS

ROLL-UP

Aggregate or combine values and reduce number of rows or columns.
Example: Sum up sales of each city by country.

DRILL-DOWN

Decompose values and increase number of rows or columns
Example: Decompose sales of a city into sales of districts.

SLICING

Reducing N dimensions to N-1 dimensions by restricting on dimension to a single value.
Example: Taking out all data for one month.

DICING

Same dimensions but computing a sub-cube by restricting some of the values of the dimensions.

QUERY OPTIMIZATION

Business users want to slice, dice, roll-up and drill-down. Every combination will most likely go through all the facts table. This is not very performant.

We can use GROUP BY CUBE(groupme, andme, andmetoo). This will make 1 pass through the facts table and aggregate all possible combinations. This output is most of the time enough to answer all aggregations from business users, without processing the whole facts table. It also has the advantage that it is faster and easier to read than to write several queries and use UNION three times to get them together.

SELECT dimDate.month, dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate on (dimDate.date_key = factSales.date_key)
JOIN dimStore on (dimStore.store_key = factSales.store_key)
GROUP by cube (dimDate.month, dimStore.country);

Example output:
total, total by month, total by country, total by month and country