DEV Community

Cover image for Amazon Athena- Serverless
William-Muko
William-Muko

Posted on

Amazon Athena- Serverless

What is Amazon Athena?

Amazon Athena is a serverless, interactive query service that allows you to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to manage, and you pay only for the queries that you run. Since you only pay for the queries that you run, it can be a cost-effective solution for running ad-hoc queries or for data analysis projects that are not part of a regular workflow.

Athena's key feature is the ability to analyze data stored in S3 without having to move or transform it first. This makes it an ideal solution for analyzing large, raw data sets such as logs, sensor data, and other type of unstructured data.

One of the key benefits of Athena is its ability to work with a variety of data formats, including CSV, JSON, ORC, Parquet, and Avro. This makes it easy to work with data from different sources and in different formats, without the need for data transformation or ETL processes.

In addition, Athena integrates with other AWS services such as Amazon QuickSight for data visualization, Amazon Glue for data cataloging, and Amazon CloudWatch for monitoring and troubleshooting. This allows you to build a complete data analytics solution on the AWS platform.

Another great feature of Athena is its ability to work with partitioned data. This means that you can partition your data based on specific attributes, such as date or location, which improves query performance and reduces costs. This makes Athena a great option for working with large, partitioned datasets.

Athena also supports a variety of SQL functions and operators, including aggregate functions, filtering, and sorting. This makes it easy to perform complex data analysis and gain insights from your data.

How to use Amazon Athena

  • Set up an S3 bucket: Before you can use Athena, you will need to have your data stored in an S3 bucket. This can be done by manually uploading your data or using a service like Amazon Kinesis or Amazon Glue to automatically move your data into S3.
  • Create a table: Once your data is in S3, you can create a table in Athena that references the location of your data in S3. You will need to define the schema of your data, including the column names and data types. Athena supports a variety of data formats, including CSV, JSON, ORC, Parquet, and Avro.
  • Run queries: Once your table is set up, you can use the Athena query editor to run SQL queries against your data. You can also use the Athena API to run queries programmatically.
  • Analyze your data: Athena allows you to perform a wide range of data analysis tasks, including filtering, sorting, and aggregating your data. You can also use the results of your queries to create visualizations using Amazon QuickSight.
  • Monitor and Optimize: You can monitor the performance of your queries and optimize them using the Athena Query Monitoring and the Query Details page. You can also use the AWS Glue Data Catalog, to improve the performance and optimize the cost of your queries.

It's worth noting that Athena is a serverless service, so there are no servers or infrastructure to manage or provision. Additionally, Athena only charges you for the amount of data scanned per query, so you only pay for what you use.

Ways to Interact with Amazon Athena

It's worth noting that there are several ways to interact with Athena, you can use various AWS SDKs, the AWS Management Console, the Command line Interface, and the Athena API to interact with Athena, Additionally, you can use third-party tools such as Presto CLI or Dataform to interact with Athena.
In this post, I'll share some insights on how you can use the Command line Interface to interact with Amazon Athena.

You can access Amazon Athena using the command line interface (CLI) by using the AWS Command Line Interface (AWS CLI) tool or you can use AWS Cloudshell on the AWS Management console.
AWS Cloudshell comes with other pre-installed tools like Python, AWS CLI, Node.js etc..

Here are the steps to take when you opt to use the AWS Command Line Interface(AWS CLI) tool.

  • First, you will need to install the AWS CLI tool on your computer. You can do this by following the instructions on the AWS documentation

  • Once the AWS CLI tool is installed, you will need to configure it by running the aws configure command and providing your AWS access key and secret key.

  • Now you can run Athena commands using the aws athena command.

For example, to list the tables in your Athena database you can use the command:

aws athena list-tables --query 'TableList[*].Name' --output text

To run a query on Athena using the CLI, you can use the start-query-execution command. For example, to run a query that select all rows from a table named 'mytable':

aws athena start-query-execution --query-string "SELECT * FROM mytable" --result-configuration OutputLocation=s3://mybucket/queryresults/

To get the results of the query, you can use the get-query-results command. For example, to get the results of the query execution with the execution ID '7890547':

aws athena get-query-results --query-execution-id 7890547

SQL queries you run on Amazon Athena

  • Count the number of rows in a table: SELECT COUNT(*) FROM mytable;
  • Select specific columns from a table: SELECT column1, column2 FROM mytable;
  • Filter rows based on a specific condition: SELECT column1, column2 FROM mytable WHERE column2 > 100;
  • Sort the results of a query: SELECT column1, column2 FROM mytable ORDER BY column2 DESC;
  • Group data by a specific column: SELECT column1, COUNT(*) FROM mytable GROUP BY column1;
  • Join two tables: SELECT a.column1, b.column2 FROM table1 a JOIN table2 b ON a.id = b.id;
  • Use aggregate functions: SELECT AVG(column1), SUM(column2) FROM mytable;
  • Use a subquery: SELECT column1 FROM mytable WHERE column2 IN (SELECT column2 FROM mytable2);

In conclusion, if you are looking for an easy and cost-effective way to analyze large data sets in S3, Amazon Athena is an excellent option to consider. Its serverless architecture, standard SQL support, and integration with other AWS services make it a powerful and flexible solution for data analysis.

Top comments (0)