In this blog, we'll explore:
- Data warehousing
- Data warehousing architecture
- Data ingestion pipelines
Data warehousing is used by organizations to build large-scale analytical solutions.
Modern data warehousing: A combination of conventional data warehousing and big data analytics.
Conventional data warehousing: Data is copied to a relational database and queried there.
Big data analytics: Handles large volumes of data in more than one format, loaded in real time and stored in a data lake, where distributed engines like Apache Spark process it.
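To make the conventional approach concrete, here is a minimal sketch of "copy data to a relational DB and query over it". It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real warehouse; the `orders` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical transactional records; in practice these would be
# copied from a source system rather than hard-coded.
orders = [
    ("2024-01-05", "north", 120.0),
    ("2024-01-06", "south", 80.0),
    ("2024-01-07", "north", 50.0),
]


def load_and_query(rows):
    """Copy rows into a relational table, then run an analytical query."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
    conn.execute(
        "CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


totals = load_and_query(orders)
print(totals)
```

The query returns one total per region. A real warehouse would hold far more data, but the copy-then-query pattern is the same.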
Data ingestion and processing
Data from one or more transactional data stores, real-time streams, or other sources is loaded into a data lake.
Load operations involve an ETL (extract, transform, load) or ELT (extract, load, transform) process in which the data is cleaned, filtered, and restructured for analysis.
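The clean, filter, and restructure steps can be sketched in plain Python. This is only an illustration: the record fields (`id`, `customer`, `amount`) and the rule "drop rows with a missing or non-positive amount" are assumptions, not something a particular tool prescribes.

```python
# Hypothetical raw records, as they might arrive from a transactional
# store or a stream. All values are strings, and some are dirty.
raw_records = [
    {"id": "1", "customer": "  Alice ", "amount": "120.50"},
    {"id": "2", "customer": "Bob", "amount": ""},      # missing amount
    {"id": "3", "customer": "carol", "amount": "-5"},  # invalid amount
    {"id": "4", "customer": "Dave", "amount": "30"},
]


def clean(record):
    """Trim whitespace, normalize names, and parse numeric fields."""
    amount_text = record["amount"].strip()
    return {
        "id": int(record["id"]),
        "customer": record["customer"].strip().title(),
        "amount": float(amount_text) if amount_text else None,
    }


def keep(record):
    """Filter rule (assumed): amount must be present and positive."""
    return record["amount"] is not None and record["amount"] > 0


def restructure(records):
    """Reshape rows into a customer -> amount lookup for analysis."""
    return {r["customer"]: r["amount"] for r in records}


def run_etl(records):
    cleaned = [clean(r) for r in records]
    filtered = [r for r in cleaned if keep(r)]
    return restructure(filtered)


analysis_ready = run_etl(raw_records)
print(analysis_ready)
```

Only the Alice and Dave rows survive the filter. In an ELT flow the same transformations would run after the raw data has already landed in the data lake.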
Analytical data stores
Data stores for large-scale analytics.
Analytical data model
Encapsulates the relationships between data values and dimensional entities to support drill-up/drill-down analysis.
The model is often described as a cube, in which numeric values are aggregated across one or more dimensions.
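As a rough sketch of the cube idea, the snippet below sums a numeric value (revenue) across whichever dimensions you pick. The fact rows, the dimension names (`product`, `region`, `year`), and the `aggregate` helper are all hypothetical; real analytical models are usually built with dedicated tools.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, year, revenue)
sales = [
    ("bike", "north", 2023, 100.0),
    ("bike", "south", 2023, 150.0),
    ("bike", "north", 2024, 120.0),
    ("helmet", "north", 2024, 30.0),
]

DIMENSIONS = ("product", "region", "year")


def aggregate(rows, by):
    """Sum revenue across the chosen dimensions.

    Fewer dimensions in `by` is a drill-up; more is a drill-down.
    """
    indexes = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[3]  # revenue is the fourth column
    return dict(totals)


by_product = aggregate(sales, by=("product",))               # drill up
by_product_year = aggregate(sales, by=("product", "year"))   # drill down
print(by_product)
print(by_product_year)
```

Going from `by_product` to `by_product_year` is a drill-down: the same revenue, split along one more dimension.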
Visualizations built on the model show trends, comparisons, and key performance indicators (KPIs), and can take the form of a PowerPoint presentation, graph, or report.
To ingest data at scale, we need pipelines that orchestrate the ETL process. Pipelines can be created and run in Azure Data Factory.
A pipeline consists of one or more activities that operate on data.
An input dataset provides the source data.
The dataset can be stored in Azure Blob Storage, and activities use linked services such as a SQL database or Azure Databricks to manipulate the data incrementally. The output is saved in an output dataset.
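The input dataset, activities, and output dataset relationship can be modeled in a few lines of Python. To be clear, this is not the Azure Data Factory SDK or its JSON pipeline format; `Activity`, `Pipeline`, and the two sample activities are hypothetical names used only to show how data moves step by step through a pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Dataset = List[Row]


@dataclass
class Activity:
    """One step that takes a dataset and returns a transformed dataset."""
    name: str
    run: Callable[[Dataset], Dataset]


@dataclass
class Pipeline:
    """An ordered list of activities applied one after another."""
    activities: List[Activity]

    def execute(self, input_dataset: Dataset) -> Dataset:
        data = input_dataset
        for activity in self.activities:
            data = activity.run(data)  # each activity builds on the last
        return data


def drop_incomplete(rows: Dataset) -> Dataset:
    # Assumed rule: rows without an amount are unusable.
    return [r for r in rows if r.get("amount") is not None]


def flag_large(rows: Dataset) -> Dataset:
    # Assumed threshold of 50, purely for illustration.
    return [{**r, "is_large": r["amount"] >= 50} for r in rows]


pipeline = Pipeline([
    Activity("drop_incomplete", drop_incomplete),
    Activity("flag_large", flag_large),
])

input_dataset = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 20.0},
]
output_dataset = pipeline.execute(input_dataset)
print(output_dataset)
```

In Azure Data Factory, each activity would instead run against a linked service, but the shape is the same: source data in, a chain of activities, output dataset out.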
Thanks for reading <3