Data is generated everywhere, every second of every day. As a backend developer, I used to dodge anything related to data engineering. However, a recent project pushed me to start learning more about it.
So, I came across these terms: Data Lake, Data Warehouse, and Data Mart. I will break them down into simple terms that I can understand.
The format will be as follows:
- Definition: (Definition)
- Characteristics: (Characteristics)
- Why it exists: (Why it exists)
- Tools: (Tools that can be used to implement it)
Data Lake:
- Definition: A huge storage space for raw data of any kind (for example: JSON files, videos, database dumps), where everything is stored as-is without being organized first.
- Characteristics:
  - Stores raw data without modification.
  - Stores structured, semi-structured, and unstructured data.
  - Can be used for the entire data lifecycle.
- Why it exists: Data is valuable nowadays and can be used for many things, including uses you haven't thought of yet. So, store it now and use it later when you need it.
- Tools:
  - Free: Hadoop Distributed File System (HDFS)
  - Paid: Amazon S3, Azure Data Lake Storage, Google Cloud Storage
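To make the "store everything raw" idea concrete, here is a minimal sketch in Python. It assumes a temporary local directory standing in for real object storage, and the `land_raw` helper, the `raw/<source>/<date>/` layout, and the file names are all hypothetical choices for illustration, not a standard.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes, day: date) -> Path:
    """Write a payload into the lake exactly as received, grouped by source and date."""
    target_dir = lake_root / "raw" / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # no parsing, no cleaning: bytes in, bytes out
    return target


# Assumption: a temporary local directory stands in for object storage.
lake_root = Path(tempfile.mkdtemp(prefix="lake_"))
day = date(2024, 1, 15)

# Structured, semi-structured, and unstructured data all land the same way.
csv_path = land_raw(lake_root, "billing", "invoices.csv", b"id,amount\n1,9.99\n", day)
json_path = land_raw(
    lake_root, "app_events", "events.json",
    json.dumps([{"user": 1, "action": "login"}]).encode(), day,
)
bin_path = land_raw(lake_root, "uploads", "clip.bin", bytes([0, 1, 2, 3]), day)

# List everything that landed, relative to the lake root.
landed = sorted(p.relative_to(lake_root).as_posix() for p in lake_root.rglob("*") if p.is_file())
print(landed)
```

Nothing here validates or reshapes the data. That is the point of the lake: the cleanup happens later, when the data is loaded into a warehouse.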
Data Warehouse:
- Definition: An organized storage place where data is structured and cleaned.
- Characteristics:
  - Stores data in a structured way.
  - Requires transformed and cleaned data.
  - Time-variant data, meaning records are tied to a point in time and history is kept so trends can be analyzed. Older data can be archived after a period of time (example: 1 year) and moved to the Data Lake.
- Why it exists: Since data is stored in a structured way, it can be used for reporting and analysis.
- Tools:
  - Free: PostgreSQL, MySQL, MariaDB (limitations: they don't scale well to HUGE datasets and aren't optimized for analytics workloads)
  - Paid: Amazon Redshift, Google BigQuery
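Here is a rough sketch of what "transformed and cleaned, then stored in a structured way" can look like. It assumes an in-memory SQLite database as a stand-in for a real warehouse, and the `raw_orders` records, the `clean` helper, and the `orders` table are made up for the example.

```python
import sqlite3

# Hypothetical raw records, as they might come out of a data lake.
raw_orders = [
    {"id": "1", "customer": " Alice ", "amount": "19.90", "ordered_at": "2024-01-03"},
    {"id": "2", "customer": "bob", "amount": "5.00", "ordered_at": "2024-01-17"},
    {"id": "3", "customer": "ALICE", "amount": "n/a", "ordered_at": "2024-02-01"},  # bad amount
    {"id": "4", "customer": "Bob", "amount": "12.50", "ordered_at": "2024-02-09"},
]


def clean(record):
    """Return a typed, normalized row, or None if the record cannot be repaired."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return (int(record["id"]), record["customer"].strip().lower(), amount, record["ordered_at"])


# Assumption: in-memory SQLite stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL,"
    " ordered_at TEXT NOT NULL)"
)

# Only rows that survive cleaning are loaded.
rows = [row for row in map(clean, raw_orders) if row is not None]
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# A typical reporting query: revenue per month.
monthly_revenue = warehouse.execute(
    "SELECT substr(ordered_at, 1, 7) AS month, ROUND(SUM(amount), 2)"
    " FROM orders GROUP BY month ORDER BY month"
).fetchall()
print(monthly_revenue)
```

Because every row has the same typed columns, the monthly report is a single query. The record with the unparseable amount never reaches the table, which is the "cleaned" part of the definition.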
Data Mart:
- Definition: A subset of a Data Warehouse, with a focus on specific topics.
- Characteristics:
  - Users don't need advanced technical knowledge.
  - Subset of a Data Warehouse, smaller and topic-focused.
  - Users have read-only access to specific information.
- Why it exists: Gives users quick and easy access to data on specific topics.
- Tools:
  - Free: Microsoft Power BI (limited features)
  - Paid: Microsoft Power BI, Tableau, QlikView, Looker
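The "topic-focused, read-only subset" idea can be sketched with a database view. This is only an illustration: it assumes in-memory SQLite as the warehouse, the `orders` table and `sales_mart` view are hypothetical names, and `PRAGMA query_only` is used here as a simple way to simulate read-only access on one connection.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY, department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (department, product, amount) VALUES (?, ?, ?)",
    [
        ("sales", "laptop", 1200.0),
        ("sales", "mouse", 25.0),
        ("support", "warranty", 99.0),
        ("sales", "laptop", 1150.0),
    ],
)

# The "mart": a sales-only, pre-aggregated slice of the warehouse.
conn.execute(
    "CREATE VIEW sales_mart AS"
    " SELECT product, COUNT(*) AS order_count, SUM(amount) AS revenue"
    " FROM orders WHERE department = 'sales' GROUP BY product"
)
conn.commit()

# From here on, this connection can read but not write.
conn.execute("PRAGMA query_only = ON")

sales_report = conn.execute(
    "SELECT product, order_count, revenue FROM sales_mart ORDER BY product"
).fetchall()
print(sales_report)

# Writes are rejected while query_only is on.
try:
    conn.execute("INSERT INTO orders (department, product, amount) VALUES ('sales', 'desk', 300.0)")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
print(write_blocked)
```

A user of `sales_mart` only sees sales data, already summarized per product, and doesn't need to know how the underlying table is laid out. In a real setup the read-only part would more likely come from database permissions than from a pragma.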