Run queries from your data lakes
Amazon Redshift announced a new feature on 25 July 2023 that allows developers, data engineers, data analysts and data scientists to run SQL queries easily using Query Editor v2 in a provisioned environment or Amazon Redshift Serverless by querying data from AWS Glue Data Catalog.
Lesson Objectives
In this lesson, you will learn how to:
- Change the system-level configuration of the data catalog auto mount.
- Show a list of schemas in the database named in the awsdatacatalog.
- Show a list of tables in a schema.
- Show a list of columns in a table.
How does this work?
This new AWS service integration means that developers and analysts will no longer have to create an external schema from Amazon Redshift in order to be able to query data in data lakes (i.e. Amazon S3 bucket) from AWS Glue Data Catalog.
I have included an image below to demonstrate that an external schema will no longer need to be created.
Hence, you will be able use Query Editor v2 to query data that is cataloged in your AWS Glue Data Catalog.
(Note: Querying the AWS Glue Data Catalog is not available in all Amazon Redshift Regions).
Solution Overview
This is my interpretation of the solution architecture with this new integration between Amazon Redshift Serverless to query directly from AWS Glue Data Catalog using Query Editor v2.0.
Pre-requisites
If you would like to learn more about getting started using AWS Glue and AWS Glue Studio you may read my previous blog posts below:
How to ETL with AWS Glue and AWS Glue Studio to transform your data - Part 1
How to ETL with AWS Glue and AWS Glue Studio to transform your data - Part 2
Tutorial: Querying the AWS Glue Data Catalog using Amazon Redshift Serverless.
Step 1: Sign into your AWS account.
If you do not have an AWS account you may create one here.
Step 2: Sign into the AWS Management Console as an IAM Admin User.
If you do not have an IAM Admin user role you may create one here.
Step 3: Navigate to the AWS Glue dashboard.
I have previously created a database called 'community'.
I have a table called 'aws_community_builders_apj' which I cataloged in the AWS Glue Data Catalog last year.
Step 3: I already created an instance of Amazon Redshift Serverless by navigating to the Amazon Redshift dashboard.
If you wanted to know how to get started with Amazon Redshift Serverless you may read my previous blog here.
Step 4: Once an instance of Amazon Redshift Serverless is created, click on the namespace. In this example, I will click 'default-namespace'.
Step 5: Navigate to the left-handside menu and select 'Query Editor v2'.
Step 6: Double click on serverless-workspace and enter your database credentials as a Federated User. Enter the database name, username and your password.
Step 7: In Query Editor v2 enter these SQL commands.
This command will display whether awsdatacatalog is mounted for the Amazon Redshift Serverless data warehouse.
SHOW data_catalog_auto_mount;
Step 8: ALTER command will change the system level configuration when Amazon Redshift Serverless is paused or resumed.
ALTER SYSTEM SET data_catalog_auto_mount = on;
Step 9: Show a list of schemas that is cataloged in the AWS Glue Data Catalog.
SHOW SCHEMAS FROM DATABASE awsdatacatalog;
Step 10: Show a list of tables in the schema for community.
SHOW TABLES FROM SCHEMA awsdatacatalog.community;
Step 11: Show a list of columns within the table.
SHOW COLUMNS FROM TABLE awsdatacatalog.community.aws_community_builders_apj;
Step 12: If you would like to query the AWS Glue Data Catalog, you will have to grant your user the necessary permission. e.g.The IAM user role to access the database.
GRANT USAGE ON DATABASE awscatalog to'IAM:myIAM USER"
Conclusion
As organizations innovate and wish to analyze all of their data to provide quick answers to their business questions, we have learnt how to empower developers, data analysts, data scientists and business analysts to access data from the AWS Glue Data Catalog without creating an external schema and using Query Editor v2 from Amazon Redshift Serverless or a provisioned data warehouse to generate SQL queries with insights in seconds.
Resource
You may read more about this new capability in the AWS Big Data Blog.
Reference
Until the next lesson, happy learning! 😀
Last Week - AWS Summit New York, 26 July 2023 keynote VP of analytics, databases and machine learning, Dr Swami Sivasubramanian
You may watch the AWS Summit New York keynote from VP of analytics, databases and machine learning Dr Swami Sivasubramanian.
You may also read all the latest announcements and highlights from analytics, database and Generative AI here.
Last Week - AWSonAir on Twitch, 26 July 2023
Watch on-demand on AWSonAir on Twitch below, the latest announcements including AWS Glue from AWS Summit New York.
Next Month - AWS Innovate Data Edition - 30 August 2023
You may join us and register for AWS Innovate, Data Edition in APJ on 30 August 2023. You may register here and how to unlock the value of data and develop your data strategy as you hear from AWS Experts.
Top comments (0)