Hii! This is the next part in DP900 series. In this we are going to explore databases.
A database (DB) is a central system in which data is stored is queried.
It is used to query and store structured data.
Data is stored in tables, represent entities. Each table has a primary key. Keys in relational DB helps in normalizing database.
It has no relational or tabular schema.
It includes database which stores data as key-value, we can relate it with JSON format.
We have column family databases, columns can be split into 1 to many columns.
We also have Graph databases, which store entities as node, uses link to define relationship.
A transactional system records transaction that encapsulates specific events, organization wants to track.
Here transaction can be taken as unit of small work.
Work performed by this system is Online Transaction Processing (OLTP).
OLTP relies on system optimized for CRUD operations, to support transactional workloads.
OLTP enforces transaction which support ACID properties.
Now what are these ACID properties...
A stands for ATOMICITY
Each transaction is treated as single unit, which succeeds or fails completely.
For example: Transfer of money from one account to other bank account should either succeed or fail, nothing else.
C stands for Consistency
Transaction can only take data from one valid state to another.
I stands for Isolation
Concurrent transaction must not interfere with each other, and result in consistent DB state.
D stands for durability
Once transaction committed will be committed.
OLTP supports live applications.
It used read only system, stores vast volume of historical data metrics.
- Data files are stored
Extract, Transform and Load (ETL) process copies data from files to OLTP databases to data warehouse.
Data warehouse is optimized for reading data. Its schema is based on fact table that contains numeric values you want to analyze.
Data in warehouse may be aggregate to OLAP model. Aggregate numeric values are calculated from fact table.
Data in data lake, warehouse and analytical model is queries.
Some terms we should know :
Data Lakes : Large volume of file based data must be collected and analyzed.
Data warehouses : Stores data in relational database which is optimized for read operations.
OLAP model : Data storage for analytical workloads
Example : Calculate Sales by region, city, address etc
Thanks for reading this blog :)
Hope you liked it
I have one request to everyone reading this, please do not refer it for preparation. I am writing this as a revision content.
Also if you find any error in content, mention that in comments, will correct it asap.