An OLAP cube is an aggregation of a fact metric on a number of dimensions. CUBE refers to a multi-dimensional dataset.
The advantage of OLAP cubes is that they are easy to communicate to business users.
The data should be stored at the finest - atomic - level of the data.
Aggregate or combine values and reduce number of rows or columns.
Example: Sum up sales of each city by country.
Decompose values and increase number of rows or columns
Example: Decompose sales of a city into sales of districts.
Reducing N dimensions to N-1 dimensions by restricting on dimension to a single value.
Example: Taking out all data for one month.
Same dimensions but computing a sub-cube by restricting some of the values of the dimensions.
Business users want to slice, dice, roll-up and drill-down. Every combination will most likely go through all the facts table. This is not very performant.
We can use GROUP BY CUBE(groupme, andme, andmetoo). This will make 1 pass through the facts table and aggregate all possible combinations. This output is most of the time enough to answer all aggregations from business users, without processing the whole facts table. It also has the advantage that it is faster and easier to read than to write several queries and use UNION three times to get them together.
SELECT dimDate.month, dimStore.country, sum(sales_amount) as revenue FROM factSales JOIN dimDate on (dimDate.date_key = factSales.date_key) JOIN dimStore on (dimStore.store_key = factSales.store_key) GROUP by cube (dimDate.month, dimStore.country);
total, total by month, total by country, total by month and country
Pre-aggregate the OLAP cubes and save them on a special purpose Non-relational database.
Compute the OLAP cubes on the fly from existing relational databases where the dimensional model resides.
Code examples in Python notebooks
To have all the above information in one view, I made a sketchnote.
Top comments (0)