DEV Community

Madhav Ganesan

Azure Data Lake Storage

Key Concepts

Data Lakehouse

It is a modern data management architecture that combines the low-cost, flexible storage of data lakes with the data management and query performance features of data warehouses. It enables efficient data storage, processing, and analytics in a single architecture.

Delta Lake

Delta Lake is an open-source storage framework designed for building lakehouse architectures.

Key characteristics:

  • ACID transactions for data reliability.
  • Scalable metadata handling.
  • Data versioning (time travel) for querying and restoring earlier versions of a table.
  • Integrated with big data ecosystems like Apache Spark.
  • Serves as the core technology for a Lakehouse architecture.
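To make the ACID and versioning points concrete, here is a minimal in-memory sketch of the idea behind Delta Lake's transaction log. It assumes a hypothetical `ToyDeltaLog` class that is not part of any Delta Lake library. Each commit is one log entry listing the data files added and removed, loosely mirroring the `add`/`remove` actions in the Delta transaction protocol; a commit is rejected if another writer committed first (a simplified form of optimistic concurrency control); and any earlier version can be rebuilt by replaying the log (time travel). Real Delta Lake stores the log as JSON files plus periodic checkpoints and performs finer-grained conflict detection.

```python
from dataclasses import dataclass, field


class ConcurrentCommitError(Exception):
    """Raised when a writer commits on top of a table version it did not read."""


@dataclass
class ToyDeltaLog:
    # Hypothetical toy model, not the Delta Lake API.
    # Each entry is one commit: {"add": [...], "remove": [...]} of data file names.
    # Table version N is the state after replaying entries 0..N.
    entries: list = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        # -1 means the table has no commits yet.
        return len(self.entries) - 1

    def commit(self, read_version: int, add=(), remove=()) -> int:
        # Simplified optimistic concurrency: the writer states which version it
        # read; if another writer committed since then, the commit is rejected.
        if read_version != self.latest_version:
            raise ConcurrentCommitError(
                f"read version {read_version}, but table is at {self.latest_version}"
            )
        # The commit is appended as one entry, so a reader replaying the log
        # sees all of its file changes or none of them.
        self.entries.append({"add": list(add), "remove": list(remove)})
        return self.latest_version

    def snapshot(self, version=None) -> set:
        # Time travel: rebuild the set of live data files as of `version`.
        if version is None:
            version = self.latest_version
        if not -1 <= version <= self.latest_version:
            raise ValueError(f"version {version} does not exist")
        files = set()
        for entry in self.entries[: version + 1]:
            files.difference_update(entry["remove"])
            files.update(entry["add"])
        return files


log = ToyDeltaLog()
v0 = log.commit(read_version=-1, add=["part-000.parquet"])
v1 = log.commit(read_version=v0, add=["part-001.parquet"])
# Compaction-style commit: replace two small files with one larger file.
v2 = log.commit(
    read_version=v1,
    add=["part-002.parquet"],
    remove=["part-000.parquet", "part-001.parquet"],
)
print(sorted(log.snapshot()))  # ['part-002.parquet']
print(sorted(log.snapshot(version=1)))  # ['part-000.parquet', 'part-001.parquet']

try:
    # A second writer that read version 1 tries to commit after v2 landed.
    log.commit(read_version=v1, add=["part-003.parquet"])
except ConcurrentCommitError as err:
    print("rejected:", err)  # rejected: read version 1, but table is at 2
```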

Unity Catalog

  • Unified governance solution for data and AI assets on Azure Databricks.
  • Provides centralized access control, auditing, lineage tracking, and data discovery across Databricks workspaces.
  • Enables simplified security and governance for multi-cloud environments.
  • Comparison: Unity Catalog governs data and AI assets within Databricks, whereas AWS IAM is a general-purpose identity and access management service for AWS resources.
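The centralized access-control idea can be pictured with a small hypothetical sketch. `ToyCatalogACL` and the principal and object names below are illustrative assumptions, not the Databricks API (in Databricks itself, grants are issued with SQL `GRANT` statements). The sketch follows Unity Catalog's three-level namespace (`catalog.schema.table`) and its rule that reading a table requires `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` on the table, with privileges granted on a parent object inherited by the objects beneath it. Every check is also appended to a simple audit log.

```python
class ToyCatalogACL:
    """Hypothetical in-memory model of Unity Catalog-style grants.

    Securables use the three-level namespace catalog.schema.table.
    This illustrates the idea only; it is not the Databricks API.
    """

    def __init__(self):
        # (principal, securable) -> set of privilege names
        self._grants = {}
        # Every access check is recorded as (principal, action, object, allowed).
        self.audit_log = []

    def grant(self, privilege, securable, principal):
        self._grants.setdefault((principal, securable), set()).add(privilege)

    def _has(self, principal, privilege, securable):
        # A privilege granted on a catalog or schema is inherited by the
        # objects beneath it, so check the securable and each of its parents.
        parts = securable.split(".")
        for depth in range(1, len(parts) + 1):
            parent = ".".join(parts[:depth])
            if privilege in self._grants.get((principal, parent), set()):
                return True
        return False

    def can_select(self, principal, table):
        catalog, schema, _ = table.split(".")
        allowed = (
            self._has(principal, "USE CATALOG", catalog)
            and self._has(principal, "USE SCHEMA", f"{catalog}.{schema}")
            and self._has(principal, "SELECT", table)
        )
        self.audit_log.append((principal, "SELECT", table, allowed))
        return allowed


acl = ToyCatalogACL()
acl.grant("USE CATALOG", "main", "analysts")
acl.grant("USE SCHEMA", "main.sales", "analysts")
acl.grant("SELECT", "main.sales", "analysts")  # inherited by every table in main.sales
print(acl.can_select("analysts", "main.sales.orders"))  # True
print(acl.can_select("analysts", "main.hr.salaries"))  # False: no USE SCHEMA on main.hr
```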

Delta Table (Data Table Architecture)

  • Default data table format in Azure Databricks.
  • Optimized for data lakes, supporting:
      ◦ Streaming ingestion
      ◦ Batch processing
      ◦ Efficient querying and updates
  • Provides schema enforcement, versioning, and optimized storage.
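Schema enforcement can be illustrated with a minimal sketch, assuming a hypothetical `ToyDeltaTable` class whose schema maps column names to Python types (this is not the Delta Lake API). As with a Delta table, an append containing an unknown column or a mismatched type is rejected as a whole, while columns omitted from a row are stored as nulls. Real Delta Lake also offers opt-in schema evolution (for example the `mergeSchema` write option), which the sketch leaves out.

```python
class SchemaMismatchError(ValueError):
    """Raised when appended rows do not match the table schema."""


class ToyDeltaTable:
    """Hypothetical sketch of schema enforcement on write (not the Delta Lake API).

    `schema` maps each column name to the exact Python type it accepts;
    None stands for a SQL NULL and is accepted in any column.
    """

    def __init__(self, schema):
        self.schema = dict(schema)
        self.rows = []

    def append(self, rows):
        rows = list(rows)
        # Validate the whole batch first, so a bad batch leaves the table unchanged.
        for row in rows:
            for column, value in row.items():
                if column not in self.schema:
                    raise SchemaMismatchError(f"unknown column {column!r}")
                expected = self.schema[column]
                # Exact type check, so e.g. a bool is not accepted as an int.
                if value is not None and type(value) is not expected:
                    raise SchemaMismatchError(
                        f"column {column!r} expects {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
        # Columns missing from a row are stored as NULL (None).
        for row in rows:
            self.rows.append({column: row.get(column) for column in self.schema})


orders = ToyDeltaTable({"order_id": int, "amount": float})
orders.append([{"order_id": 1, "amount": 9.5}, {"order_id": 2}])
print(orders.rows)  # [{'order_id': 1, 'amount': 9.5}, {'order_id': 2, 'amount': None}]

try:
    orders.append([{"order_id": 3, "amount": "12.00"}])
except SchemaMismatchError as err:
    print(err)  # column 'amount' expects float, got str
print(len(orders.rows))  # 2
```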

Delta Live Tables (Data Pipeline Framework)

  • Proprietary framework in Azure Databricks.
  • Designed to simplify ETL (Extract, Transform, Load) pipeline creation and management.

Features:

  • Infers dependencies between datasets from their definitions and processes them in the correct order.
  • Automatically deploys and scales infrastructure to maintain timely and accurate data processing.
  • Optimized for real-time and batch data processing workflows.
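The dependency handling described above can be sketched with Python's standard `graphlib` module. In this toy model each dataset is declared as a hypothetical `(upstream_names, compute)` pair, and `run_pipeline` is an illustrative helper, not part of Delta Live Tables. It orders the datasets topologically so that every dataset is computed after its inputs, and it rejects cycles and undefined inputs. Delta Live Tables derives this graph from the dataset definitions themselves and also manages the infrastructure that runs it, which is not modeled here.

```python
from graphlib import TopologicalSorter


def run_pipeline(datasets):
    """Materialize toy datasets in dependency order (not the Delta Live Tables API).

    `datasets` maps each dataset name to (upstream_names, compute), where
    `compute` receives a dict of the already-computed upstream results.
    """
    graph = {name: set(upstream) for name, (upstream, _) in datasets.items()}
    results = {}
    # static_order() yields a dataset only after all of its upstreams and
    # raises graphlib.CycleError (a ValueError) if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        if name not in datasets:
            raise KeyError(f"undefined upstream dataset {name!r}")
        upstream, compute = datasets[name]
        results[name] = compute({dep: results[dep] for dep in upstream})
    return results


def raw_orders(upstream):
    return [{"amount": 10.0}, {"amount": -1.0}, {"amount": 5.0}]


def clean_orders(upstream):
    # Drop invalid rows from the raw dataset.
    return [o for o in upstream["raw_orders"] if o["amount"] > 0]


def daily_revenue(upstream):
    return sum(o["amount"] for o in upstream["clean_orders"])


# Declared out of order on purpose; run_pipeline works out the sequence.
pipeline = {
    "daily_revenue": (["clean_orders"], daily_revenue),
    "clean_orders": (["raw_orders"], clean_orders),
    "raw_orders": ([], raw_orders),
}
print(run_pipeline(pipeline)["daily_revenue"])  # 15.0
```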

Stay Connected!
If you enjoyed this post, don’t forget to follow me on social media for more updates and insights:

Twitter: madhavganesan
Instagram: madhavganesan
LinkedIn: madhavganesan
