DEV Community

Sujithra


Athena Basics

What is Athena?

  1. Interactive query service for analyzing data stored in Amazon S3
  2. Serverless, so there is no infrastructure to set up or manage
  3. Scales automatically with the volume of data being queried
  4. Leverages columnar storage formats (such as Parquet and ORC) and parallel processing
  5. Cloud-based, in-memory query engine
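Because Athena is serverless, running a query comes down to submitting SQL and waiting for the result; there is no cluster to start. The sketch below assumes a boto3 Athena client (`boto3.client("athena")`) and uses its `start_query_execution`, `get_query_execution`, and `get_query_results` calls. The helper name `run_athena_query` is hypothetical, and for brevity it reads only the first page of results.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and return result rows as lists of strings.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    # No infrastructure to provision: the query is simply submitted.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} finished with state {state}")

    # Only the first page is read here; larger results need NextToken paging.
    # For a SELECT, the first row holds the column names.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

For example, `run_athena_query(boto3.client("athena"), "SELECT status, count(*) FROM weblogs GROUP BY status", "my_db", "s3://my-results-bucket/athena/")` would return a header row followed by data rows. The database, table, and bucket names are placeholders.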

Business Role for Athena

  1. User-friendly query system for data stored in S3
  2. Central metadata store architecture like the Hive metastore (the AWS Glue Data Catalog)
  3. Works with unstructured, semi-structured, and structured data stored in S3
  4. Common examples of queried data include JSON, CSV, Apache Parquet, and Apache ORC files
  5. Emphasis is on large captured data files such as web logs, IoT data, and other external data

Creating Tables in Athena

  1. Athena creates tables using the Apache Hive Data Definition Language (DDL)
  2. Hive is an open-source big data toolset for analytics
  3. Tables are created with SQL-like statements such as CREATE EXTERNAL TABLE
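The following is a minimal sketch of the Hive-style DDL used to define a table over S3 data. It is generated from Python so that each part of the statement is easy to see. The function name `create_external_table_ddl`, the example table and column names, and the S3 path are hypothetical. The JSON SerDe class and the `STORED AS` clauses are standard Hive/Athena DDL. The CSV clause assumes simple, unquoted comma-separated values.

```python
# Storage clauses for the file formats mentioned in this post.
FORMAT_CLAUSES = {
    "json": "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'",
    "csv": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",  # unquoted CSV only
    "parquet": "STORED AS PARQUET",
    "orc": "STORED AS ORC",
}


def create_external_table_ddl(table, columns, data_format, location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for Athena.

    columns: list of (name, hive_type) pairs, e.g. [("ip", "string")].
    """
    if data_format not in FORMAT_CLAUSES:
        raise ValueError(f"unsupported format: {data_format}")
    # Backticks quote column names in Hive DDL.
    column_sql = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_sql}\n)\n"
        f"{FORMAT_CLAUSES[data_format]}\n"
        f"LOCATION '{location}'"
    )
```

The table only describes where the files live and how to parse them. The data itself stays in S3 in its original format.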

Schema on Read

  • The schema is applied, and the data's structure checked, when a query runs rather than when data is loaded
  • Loading is much faster because the structure is not validated up front
  • Multiple schemas can serve different needs over the same data
  • A better option when the schema is not known at loading time
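Schema on read can be illustrated without Athena at all. In this toy sketch (plain Python, not Athena's implementation), raw JSON lines are stored untouched. Each "query" then applies its own schema at read time. Missing fields become `None`, much as Athena returns NULL. A malformed line is only discovered when it is read. Skipping such lines is a simplifying assumption here; Athena's handling of malformed records depends on the SerDe and its settings.

```python
import json

# Raw data is stored exactly as it arrived; nothing is validated on load.
RAW_LINES = [
    '{"ip": "10.0.0.1", "status": "200", "bytes": "512"}',
    '{"ip": "10.0.0.2", "status": "404"}',
    'not valid json',
]


def read_with_schema(lines, schema):
    """Apply a schema (column name -> converter) at read time."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed data is only noticed when it is read
        # Missing columns become None, like NULL in a query result.
        rows.append({
            col: (convert(record[col]) if col in record else None)
            for col, convert in schema.items()
        })
    return rows


# Two different schemas over the same stored data.
status_schema = {"status": int}
traffic_schema = {"ip": str, "bytes": int}
```

Because nothing was validated at load time, adding a third schema later requires no reloading. Only the read step changes.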

Parallel Processing of Queries

  • Operations within a single SQL query run in parallel
  • Concurrent users can query the same columns at the same time
  • A single query operation can be parallelized horizontally and vertically across multiple nodes
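Parallel query execution works by splitting the data, computing partial results independently, and then merging them. The sketch below is only an analogy that uses Python threads on one machine; it is not how Athena's engine is implemented. `parallel_group_count` is a hypothetical helper that behaves like a `GROUP BY status` count.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_values(chunk):
    """Partial aggregate computed independently for one slice of the data."""
    return Counter(chunk)


def parallel_group_count(values, workers=4):
    """Split data into slices, aggregate each slice in parallel, then merge."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_values, chunks):
            total.update(partial)  # final merge of the partial results
    return total
```

The merge step is cheap because each worker returns only a small summary, not its raw rows. The same design lets a distributed engine spread one scan across many nodes.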

Governed Tables

  • Governed tables are tables in a data lake created by AWS Lake Formation
  • They are similar to managed tables in Hive
  • When a governed table is dropped, both the table definition in the metastore and the data files are deleted
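The difference between dropping an external table and dropping a governed (managed) table can be modeled with two dictionaries: one for the metastore and one for S3 storage. The `Catalog` class is hypothetical. It only illustrates the drop semantics described above and does not call Lake Formation or Glue.

```python
class Catalog:
    """Toy model of a metastore plus object storage (hypothetical)."""

    def __init__(self):
        self.metastore = {}  # table name -> (location, managed flag)
        self.storage = {}    # location -> list of data files

    def create_table(self, name, location, files, managed):
        self.metastore[name] = (location, managed)
        self.storage[location] = list(files)

    def drop_table(self, name):
        location, managed = self.metastore.pop(name)  # definition always removed
        if managed:
            # Governed/managed tables: the data files are deleted too.
            self.storage.pop(location, None)
        # External tables: only the definition goes; the files stay in S3.
```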

Iceberg Table

  • An Iceberg table uses Apache Iceberg, an open table format designed for large analytic datasets
  • Manages a large collection of data files as a single table
  • In Athena, Iceberg tables must be registered in the AWS Glue Data Catalog
  • Parquet is the default data file format for Iceberg tables in Athena
  • Dropping the table deletes both the metastore entry and the data files
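Here is a minimal sketch of building the DDL for an Athena Iceberg table. In Athena, Iceberg tables are created with `CREATE TABLE` (not `CREATE EXTERNAL TABLE`) and marked with the table property `'table_type' = 'ICEBERG'`. The following are assumptions for illustration: the explicit `'format' = 'parquet'` property, the helper name `create_iceberg_table_ddl`, and the example table, columns, and S3 location.

```python
def create_iceberg_table_ddl(table, columns, location, partition_by=None):
    """Build an Athena CREATE TABLE statement for an Iceberg table.

    columns: list of (name, type) pairs; partition_by: optional list of
    partition expressions such as "day(ts)".
    """
    column_sql = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    parts = [f"CREATE TABLE {table} (\n  {column_sql}\n)"]
    if partition_by:
        parts.append(f"PARTITIONED BY ({', '.join(partition_by)})")
    parts.append(f"LOCATION '{location}'")
    # table_type marks the table as Iceberg; format names the data file format.
    parts.append("TBLPROPERTIES ('table_type' = 'ICEBERG', 'format' = 'parquet')")
    return "\n".join(parts)
```

For example, `create_iceberg_table_ddl("analytics.events", [("id", "bigint"), ("ts", "timestamp")], "s3://my-bucket/events/", partition_by=["day(ts)"])` would describe a table partitioned by day through Iceberg's partition transforms. Here `analytics` stands in for a Glue database.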

Summary

  • Athena is a serverless, cloud-based, in-memory query service
  • Athena also supports federated queries against data sources outside S3
  • Uses a common metadata store architecture for table definitions
  • Queries data stored in common formats such as JSON, CSV, ORC, and Parquet
  • Uses standard SQL
  • Supports external, governed, and Iceberg tables
