
Newanga Wickramasinghe

Posted on • Originally published at blog.newanga.me

Azure DP-900 Short Notes: Explore Core Data Concepts

👉 Learn Module: Explore Core Data Concepts

Identify the need for data solutions

👉Data is a collection of facts such as numbers, descriptions, and observations used to make decisions.

👉Three types of data

  1. Structured data
    • Tabular data that is represented by rows and columns in a database.
    • Database systems that hold tables in this form are called relational databases.
  2. Semi-structured data
    • Information that doesn't reside in a relational database but still has some structure to it.
    • Ex: JSON documents, key-value stores, and graph databases (a small JSON sketch follows this list).
  3. Unstructured data
    • Data with no predefined structure.
    • Ex: Audio, video, and binary data files.
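
As a quick illustration of semi-structured data, the snippet below parses a small JSON document in Python. The document, its field names, and its values are made up for this example; the point is that the data has identifiable fields and nesting without a fixed relational schema.

```python
import json

# A made-up JSON document: it has named fields and nesting,
# but no rigid schema enforced by a relational table.
raw = """
{
  "id": 101,
  "name": "Contoso Coffee Maker",
  "tags": ["kitchen", "appliance"],
  "dimensions": {"width_cm": 20, "height_cm": 35}
}
"""

product = json.loads(raw)                   # parse the text into a Python dict
print(product["name"])                      # -> Contoso Coffee Maker
print(product["dimensions"]["height_cm"])   # -> 35
```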

👉Based on the type of data, there are multiple ways to store and access data in the Azure cloud.

👉Stored data needs to be processed. There are two types of data processing solutions.

  1. Transaction processing systems
    • The primary function of business computing.
    • Work performed by transactional systems is often referred to as Online Transaction Processing (OLTP).
    • Data is divided into small, discrete units so it can be processed quickly.
    • For example, in a relational database, tables are split into separate groups of columns; this is called normalization (a minimal sketch appears after this list).
  2. Analytical systems
    • Support business users who need to query data and gain a big picture view.
    • Capture raw data and generate insights that support future business decisions.
    • Common tasks of an analytical system:
      1. Data Ingestion - Capturing the raw data.
      2. Data Processing - Converting captured data into a common format to be processed.
      3. Data Querying - Querying data to analyze.
      4. Data Visualization - Generating charts such as bar charts and line charts from the queried data in order to visualize it.
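
The following sketch shows the idea behind the normalization mentioned above. The orders and customers are invented data; in a real relational database the split is done with separate tables and foreign keys, but the principle of storing each customer's details only once is the same.

```python
# Hypothetical denormalized rows: the customer's details repeat on every order.
orders_raw = [
    {"order_id": 1, "customer": "Alice", "city": "Colombo", "total": 25.0},
    {"order_id": 2, "customer": "Alice", "city": "Colombo", "total": 40.0},
    {"order_id": 3, "customer": "Bob",   "city": "Kandy",   "total": 15.0},
]

# Normalization: move the customer columns into their own "table" and keep
# only a customer_id reference in each order.
customers = {}   # name -> customer record (stands in for a customers table)
orders = []      # stands in for an orders table
for row in orders_raw:
    cust = customers.setdefault(row["customer"], {
        "customer_id": len(customers) + 1,
        "name": row["customer"],
        "city": row["city"],
    })
    orders.append({
        "order_id": row["order_id"],
        "customer_id": cust["customer_id"],
        "total": row["total"],
    })

print(list(customers.values()))   # each customer stored exactly once
print(orders)                     # orders refer to customers by id
```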

Identify types of data and data storage

👉Relational Data and Non-relational Data have different characteristics.

  1. Relational Data

    • Most well-understood model for holding data.
    • Data normalization helps to reduce any data redundancy within the database.
  2. Non-relational Data

    • Store data in a format that more closely matches the original structure.
    • Data duplication is often present, which increases the storage required.
    • Because of this duplication, a single modification may require updating data in multiple locations (see the sketch below).
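
A minimal illustration of that duplication, using invented order documents as they might appear in a document (non-relational) store: each document keeps the nested shape of the original data, so the customer's details are embedded in every order.

```python
# Made-up order documents: the customer's details are embedded (and therefore
# duplicated) in every order document.
order_docs = [
    {"order_id": 1, "customer": {"name": "Alice", "city": "Colombo"}, "total": 25.0},
    {"order_id": 2, "customer": {"name": "Alice", "city": "Colombo"}, "total": 40.0},
]

# If Alice's city changes, every document embedding her details must be updated.
for doc in order_docs:
    if doc["customer"]["name"] == "Alice":
        doc["customer"]["city"] = "Galle"

print(order_docs)
```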

👉Two different types of workloads.

  1. Transactional workloads

    • A transaction is a sequence of operations that is treated as a single, atomic unit of work.
    • Most commonly use relational databases.
    • A transactional database must adhere to the ACID properties (a minimal sketch follows this list).
      1. Atomicity = A transaction is treated as a single unit, which either succeeds completely or fails completely.
      2. Consistency = A transaction can only take the data in the database from one valid state to another.
      3. Isolation = Concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
      4. Durability = Once a transaction has been committed, it will remain committed even if there's a system failure.
  2. Analytical workloads

    • Read-only systems that store vast volumes of historical data or business metrics.
    • Used for data analysis and decision making.
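
To make the ACID idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table and the amounts are invented; the point is that the two UPDATE statements inside the `with conn:` block are committed together or, if either raises an error, rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('A', 100.0), ('B', 50.0)")
conn.commit()

try:
    # The connection used as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'A'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'B'")
except sqlite3.Error:
    pass  # neither update persists if the transaction fails

print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('A', 70.0), ('B', 80.0)]
```

The application only marks the transaction boundary; the database engine enforces the rest of the ACID guarantees.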

Describe the difference between batch and streaming data

👉Data processing is converting data into meaningful information.

👉There are two types of data processing; a small sketch contrasting them follows the list below.

  1. Batch Processing
    • New data elements are collected into a group and the whole group is then processed at a future time as a batch.
    • Data Scope = Processes all of the data in the dataset.
    • Data Size = Large datasets.
    • Performance = Latency of a few hours is typical.
    • Analysis = Suited to performing complex analytics.
  2. Streaming and real-time data
    • In stream processing, each new piece of data is processed when it arrives.
    • Beneficial for dynamic data.
    • Ideal for time-critical operations that require an instant real-time response.
    • Data Scope = Access to the most recent data received.
    • Data Size = Individual records or micro-batches.
    • Performance = Latency in the order of seconds or milliseconds.
    • Analysis = Simple response functions.
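
As a rough contrast between the two approaches, the sketch below processes the same invented sensor readings once as a batch (an aggregate over the whole group) and once as a stream (each record handled as it arrives). The field names and values are assumptions made for illustration only.

```python
from datetime import datetime

# Invented sensor readings; field names and values are only for illustration.
readings = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s1", "temp": 22.0},
    {"sensor": "s2", "temp": 19.8},
]

def process_batch(batch):
    # Batch processing: wait until the group is collected, then work on all of it.
    avg = sum(r["temp"] for r in batch) / len(batch)
    print(f"[batch]  processed {len(batch)} readings, average temp {avg:.1f}")

def process_event(event):
    # Stream processing: handle each element as soon as it arrives.
    print(f"[stream] {datetime.now():%H:%M:%S} {event['sensor']} -> {event['temp']}")

process_batch(readings)        # one run over the accumulated dataset
for event in readings:         # in a real stream, events keep arriving continuously
    process_event(event)
```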
