DEV Community

Cover image for Getting started - Amazon Redshift Serverless automatic mounting of AWS Glue Data Catalog
Wendy Wong for AWS Heroes

Posted on • Updated on

Getting started - Amazon Redshift Serverless automatic mounting of AWS Glue Data Catalog

Run queries from your data lakes

Amazon Redshift announced a new feature on 25 July 2023 that allows developers, data engineers, data analysts and data scientists to run SQL queries easily using Query Editor v2 in a provisioned environment or Amazon Redshift Serverless by querying data from AWS Glue Data Catalog.

Lesson Objectives

In this lesson, you will learn how to:

  • Change the system-level configuration of the data catalog auto mount.
  • Show a list of schemas in the database named in the awsdatacatalog.
  • Show a list of tables in a schema.
  • Show a list of columns in a table.

How does this work?

This new AWS service integration means that developers and analysts will no longer have to create an external schema from Amazon Redshift in order to be able to query data in data lakes (i.e. Amazon S3 bucket) from AWS Glue Data Catalog.

I have included an image below to demonstrate that an external schema will no longer need to be created.

external scehema

Hence, you will be able use Query Editor v2 to query data that is cataloged in your AWS Glue Data Catalog.

(Note: Querying the AWS Glue Data Catalog is not available in all Amazon Redshift Regions).

Solution Overview

This is my interpretation of the solution architecture with this new integration between Amazon Redshift Serverless to query directly from AWS Glue Data Catalog using Query Editor v2.0.

query glue

Pre-requisites

If you would like to learn more about getting started using AWS Glue and AWS Glue Studio you may read my previous blog posts below:

Tutorial: Querying the AWS Glue Data Catalog using Amazon Redshift Serverless.

Step 1: Sign into your AWS account.

If you do not have an AWS account you may create one here.

Step 2: Sign into the AWS Management Console as an IAM Admin User.

If you do not have an IAM Admin user role you may create one here.

Step 3: Navigate to the AWS Glue dashboard.

I have previously created a database called 'community'.

database

I have a table called 'aws_community_builders_apj' which I cataloged in the AWS Glue Data Catalog last year.

table

Step 3: I already created an instance of Amazon Redshift Serverless by navigating to the Amazon Redshift dashboard.

If you wanted to know how to get started with Amazon Redshift Serverless you may read my previous blog here.

serveerless

Step 4: Once an instance of Amazon Redshift Serverless is created, click on the namespace. In this example, I will click 'default-namespace'.

names

Step 5: Navigate to the left-handside menu and select 'Query Editor v2'.

Query v2

Step 6: Double click on serverless-workspace and enter your database credentials as a Federated User. Enter the database name, username and your password.

Step 7: In Query Editor v2 enter these SQL commands.

This command will display whether awsdatacatalog is mounted for the Amazon Redshift Serverless data warehouse.

SHOW data_catalog_auto_mount;
Enter fullscreen mode Exit fullscreen mode

Step 8: ALTER command will change the system level configuration when Amazon Redshift Serverless is paused or resumed.

ALTER SYSTEM SET data_catalog_auto_mount = on;
Enter fullscreen mode Exit fullscreen mode

rwsumed

Step 9: Show a list of schemas that is cataloged in the AWS Glue Data Catalog.

SHOW SCHEMAS FROM DATABASE awsdatacatalog;
Enter fullscreen mode Exit fullscreen mode

show sc

Step 10: Show a list of tables in the schema for community.

SHOW TABLES FROM SCHEMA awsdatacatalog.community;
Enter fullscreen mode Exit fullscreen mode

schmea

Step 11: Show a list of columns within the table.

SHOW COLUMNS FROM TABLE awsdatacatalog.community.aws_community_builders_apj;
Enter fullscreen mode Exit fullscreen mode

apj tables

Step 12: If you would like to query the AWS Glue Data Catalog, you will have to grant your user the necessary permission. e.g.The IAM user role to access the database.

GRANT USAGE ON DATABASE awscatalog to'IAM:myIAM USER"
Enter fullscreen mode Exit fullscreen mode

Conclusion

As organizations innovate and wish to analyze all of their data to provide quick answers to their business questions, we have learnt how to empower developers, data analysts, data scientists and business analysts to access data from the AWS Glue Data Catalog without creating an external schema and using Query Editor v2 from Amazon Redshift Serverless or a provisioned data warehouse to generate SQL queries with insights in seconds.

Resource

You may read more about this new capability in the AWS Big Data Blog.

Reference

Until the next lesson, happy learning! 😀

Last Week - AWS Summit New York, 26 July 2023 keynote VP of analytics, databases and machine learning, Dr Swami Sivasubramanian

You may watch the AWS Summit New York keynote from VP of analytics, databases and machine learning Dr Swami Sivasubramanian.

You may also read all the latest announcements and highlights from analytics, database and Generative AI here.

Last Week - AWSonAir on Twitch, 26 July 2023

Watch on-demand on AWSonAir on Twitch below, the latest announcements including AWS Glue from AWS Summit New York.

Next Month - AWS Innovate Data Edition - 30 August 2023

You may join us and register for AWS Innovate, Data Edition in APJ on 30 August 2023. You may register here and how to unlock the value of data and develop your data strategy as you hear from AWS Experts.

AWS innovate

Top comments (0)