
Sakshi


Azure DP-900 Part 1

Hiii everyone!

I have been busy as hell for the last few months, and I finally have some time now.

This blog is the first part of my DP-900 preparation series.
I am covering things in brief for revision purposes, so please read the documentation on MS Learn first. I would recommend using these blogs for revision.

LET'S BEGIN, TADAA!

EXPLORE CORE DATA CONCEPTS

As the name says data fundamentals, we should first know what data is.
Data is a collection of facts, numbers and descriptions.

This course defines 3 types of data.

  1. Structured data

Data that adheres to a fixed schema, usually in a tabular format, is structured data. All of the data has the same fields or properties.
Rows represent data entities, and columns represent their attributes.

Example: A table that stores customers and their details

  2. Semi-structured data

It has some structure, but allows variation between entities.
Example: A customer may have more than one email ID
One common format for this is JSON

  3. Unstructured data

Images, audio files, video data and documents do not have a specific structure, so they fall in this category.
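To make the difference between structured and semi-structured data a bit more concrete, here is a small Python sketch. The customer names, fields and email addresses are made up for illustration, they are not from the course.

```python
import json

# Structured data: every row follows the same fixed schema (same columns).
columns = ("customer_id", "name", "email")
rows = [
    (1, "Asha", "asha@example.com"),
    (2, "Ravi", "ravi@example.com"),
]

# Semi-structured data: each document carries its own field names,
# so one customer can have several emails and another can have none.
documents = json.loads("""
[
  {"customer_id": 1, "name": "Asha",
   "emails": ["asha@example.com", "asha.work@example.com"]},
  {"customer_id": 2, "name": "Ravi", "phone": "555-0100"}
]
""")

# Every structured row has exactly one value per column.
all_rows_match_schema = all(len(row) == len(columns) for row in rows)

# The semi-structured documents do not share one set of fields.
field_sets = [sorted(doc.keys()) for doc in documents]

print(all_rows_match_schema)
print(field_sets)
```

The structured rows only make sense together with the column list, while each JSON document describes itself, which is why it can vary from one customer to the next.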

Now that we are done with the types of data, let's look at DATA STORES...

DATA STORE

Organizations store all 3 types of data, and there are two broad categories of data store:

  1. File Store
  2. Databases

In a file store, the file format used depends on the type of data and on the applications that need to read and modify it.

CSV (comma-separated values) and TSV (tab-separated values) files hold structured data as delimited text.
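As a quick sketch of what delimited text looks like in practice, the snippet below parses the same made-up table once as CSV and once as TSV using Python's standard `csv` module. The sample rows and the `read_delimited` helper are only for illustration.

```python
import csv
import io

# The same table written with two different delimiters.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
tsv_text = "id\tname\tcity\n1\tAsha\tPune\n2\tRavi\tDelhi\n"


def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_delimited(csv_text, ",")
tsv_rows = read_delimited(tsv_text, "\t")

print(csv_rows[0])
print(csv_rows == tsv_rows)
```

Only the delimiter changes between the two files. Note that every value comes back as a string, because delimited text does not store data types.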

JSON

It is good for both structured and semi-structured data.
A hierarchical document schema is used to define objects that have multiple attributes. Hope you are familiar with the JSON format.
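If you want a quick refresher, here is a minimal sketch of a hierarchical JSON document in Python. The `customer` object and its fields are invented for this example.

```python
import json

# One object with nested attributes: an address object and a list of contacts.
customer = {
    "name": "Asha",
    "address": {"city": "Pune", "country": "India"},
    "contacts": [
        {"type": "email", "value": "asha@example.com"},
        {"type": "phone", "value": "555-0100"},
    ],
}

# Serialize to JSON text, then parse it back into Python objects.
text = json.dumps(customer, indent=2)
restored = json.loads(text)

# Walk the hierarchy: object -> nested object -> attribute.
city = restored["address"]["city"]
contact_types = [contact["type"] for contact in restored["contacts"]]

print(text)
print(city, contact_types)
```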

XML

It is a human-readable format. Tags enclosed in angle brackets are used to define elements and attributes.
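Here is a small sketch of elements and attributes, read with Python's built-in `xml.etree.ElementTree`. The `customers` document is a made-up example.

```python
import xml.etree.ElementTree as ET

# <customer> is an element, id="1" is an attribute on that element.
xml_text = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ravi</name>
  </customer>
</customers>
""".strip()

root = ET.fromstring(xml_text)

# Map each customer's id attribute to the text of its <name> element.
names_by_id = {
    customer.get("id"): customer.findtext("name")
    for customer in root.findall("customer")
}

# The second customer has no <email> element, so findtext returns None.
second_email = root.findall("customer")[1].findtext("email")

print(names_by_id)
print(second_email)
```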

Blobs

Images, videos and audio are stored as blobs (binary large objects).
Application-specific documents are stored this way too.
Blobs are typically used for unstructured data.
The file is stored as raw binary.
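To show what "raw binary" means, this sketch writes some bytes to a file and reads them back unchanged. A local temporary file stands in for a blob store here, and the byte values are arbitrary, so treat it as an illustration only and not as how any Azure service works.

```python
import os
import tempfile

# Arbitrary bytes standing in for an image, audio clip or document.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.bin")

    # "wb" and "rb" open the file in binary mode, so no text decoding happens.
    with open(path, "wb") as handle:
        handle.write(payload)
    with open(path, "rb") as handle:
        restored = handle.read()

print(restored == payload)
print(len(restored))
```

The store does not care what the bytes mean. It is up to the application that reads them to interpret them as an image, a video or anything else.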

Optimized file format

These formats enable compression, indexing and efficient storage.

  • As the name suggests, they are optimized, so we can expect them to take less space
  1. Avro
    A row-based format created by Apache
    Each record has a header that describes the structure of its data
    The header is stored as JSON, and the data is stored as binary
    It is good for compressing data, which minimizes storage and network bandwidth requirements

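  2. ORC
    Optimized Row Columnar format
    It organizes data into columns rather than rows
    A file contains stripes of data, and each stripe holds the data for a column or set of columns
    Each stripe has an index, the data and a footer

  3. Parquet
    A columnar data format
    A file contains row groups, and the data for each column is stored together within a row group
    Parquet files contain metadata that describes the rows in each chunk
    It supports compression and encoding schemes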
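The row-based versus column-based idea behind these formats can be sketched with plain Python lists. This is only a toy layout with made-up records. It does not read or write real Avro, ORC or Parquet files, which need dedicated libraries.

```python
# Three made-up records with the same fields.
rows = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": "Delhi"},
    {"id": 3, "name": "Meena", "city": "Pune"},
]

# Row-based layout (the idea behind Avro):
# all the values of one record are kept together.
row_layout = [tuple(record.values()) for record in rows]

# Column-based layout (the idea behind ORC and Parquet):
# all the values of one column are kept together.
column_layout = {key: [record[key] for record in rows] for key in rows[0]}

# Reading a single column only touches that column's values.
cities = column_layout["city"]

print(row_layout)
print(column_layout)
print(cities)
```

Keeping a column's values together is handy when a query needs only a few columns, and similar values sitting side by side also tend to compress well.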

Thanks for reading :)
SEE YOU IN THE NEXT BLOG
