
DP-900 Part 11


In this blog, we'll explore:

  1. Data warehousing
  2. Data warehousing architecture
  3. Data ingestion pipelines

Data warehousing

Organizations use data warehousing to build large-scale analytical solutions.

Modern data warehousing : a combination of conventional data warehousing and big data analytics.
Conventional data warehousing : copy data into a relational database and run queries over it.

Big data analytics : handles large volumes of data in more than one format, loaded in real time or in batches and stored in a data lake, from which a distributed processing engine such as Apache Spark processes it.
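To make the conventional approach concrete, here is a minimal sketch of "copy data to a relational DB and query over it". It uses SQLite as a stand-in for the warehouse's relational store; the `sales` table and its rows are invented for illustration:

```python
import sqlite3

# An in-memory SQLite database stands in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# "Copy data to a relational DB": load rows extracted from a transactional source.
rows = [("East", 100.0), ("West", 250.0), ("East", 50.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# "Query over it": an analytical aggregation by region.
totals = {
    region: total
    for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
}
print(totals)  # {'East': 150.0, 'West': 250.0}
```

In a real solution the relational store would be a dedicated analytical database rather than SQLite, but the pattern is the same: copy in, then query.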

Data warehousing architecture

  1. Data ingestion and processing
    Data from one or more transactional data stores, real-time streams, or other sources is loaded into a data lake.
    Load operations involve ETL or ELT; data is cleaned, filtered, and restructured for analysis.

  2. Analytical data stores
    Data stores optimized for large-scale analytics.

  3. Analytical data model
    Encapsulates relationships between data values and dimensional entities to support drill-up/drill-down analysis.
    The model is often described as a cube, in which numeric values are aggregated across one or more dimensions.

  4. Data visualization
    Shows trends, comparisons, and key performance indicators (KPIs); can take the form of a chart, graph, or report.
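To illustrate the analytical-model idea from step 3, here is a small sketch of aggregating a numeric measure across dimensions, the way a cube supports drill-up and drill-down. The fact rows (year, region, sales) are invented:

```python
from collections import defaultdict

# Invented fact rows: (year, region, sales) - two dimensions and one measure.
facts = [
    (2023, "East", 100),
    (2023, "West", 200),
    (2024, "East", 150),
    (2024, "West", 250),
]

def aggregate(facts, *dims):
    """Sum the sales measure, grouped by the given dimension indices."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[2]
    return dict(totals)

# Drill up: totals by year only.
by_year = aggregate(facts, 0)        # {(2023,): 300, (2024,): 400}
# Drill down: totals by year and region.
by_year_region = aggregate(facts, 0, 1)
```

Dedicated analytical engines precompute and store these aggregations so that switching between levels of detail is fast; the grouping logic above is just the concept in miniature.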

Data ingestion pipelines

To ingest data at scale, we need pipelines that orchestrate the ETL process. Pipelines can be created and run in Azure Data Factory.

A pipeline consists of one or more activities that operate on data.
An input dataset provides the source data.

Datasets are often stored in Azure Blob Storage; activities running on linked services such as Azure SQL Database or Azure Databricks transform the data incrementally, and the output is saved to another dataset.
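Conceptually, a pipeline chains activities, each taking a dataset as input and producing a new dataset as output. Here is a plain-Python sketch of that idea; the activity functions and records are invented, and this mimics the concept only, not the Azure Data Factory API:

```python
# A "dataset" is just a list of records here; each "activity" is a function
# that takes a dataset and returns a transformed dataset.

def clean(records):
    # Drop records with missing values.
    return [r for r in records if r.get("amount") is not None]

def enrich(records):
    # Add a derived field (hypothetical conversion rate of 1.1).
    return [{**r, "amount_usd": r["amount"] * 1.1} for r in records]

def run_pipeline(input_dataset, activities):
    """Run each activity in order, passing its output dataset onward."""
    data = input_dataset
    for activity in activities:
        data = activity(data)
    return data  # the output dataset

source = [{"amount": 10}, {"amount": None}, {"amount": 20}]
output = run_pipeline(source, [clean, enrich])
```

In Azure Data Factory the orchestration, scheduling, and connections to stores like Blob Storage are handled by the service; the pipeline definition describes which activities run and in what order, much like the `activities` list above.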

Thanks for reading <3
