In today's data-driven world, organizations often find themselves working with multiple cloud platforms to leverage the unique strengths of each. A common scenario is using AWS for data storage and processing, while utilizing Snowflake for advanced analytics and data warehousing. But how do you efficiently bridge these two powerful platforms? In this guide, we'll walk you through the process of connecting AWS Glue tables to Snowflake, enabling seamless data access and analytics across your cloud environments.
Prerequisites
Before we dive in, make sure you have:
- An AWS account with appropriate permissions
- A Snowflake account with admin access
- Basic knowledge of AWS IAM, S3, and Glue
- Familiarity with Snowflake
The Connection Process: A Step-by-Step Guide
1. Creating an IAM Role for S3 Access
Our journey begins in AWS, where we'll create an IAM role that grants Snowflake access to your S3 bucket and AWS Key Management Service (KMS).
- Navigate to the IAM console in AWS
- Create a new role
- Attach the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:region:account-id:key/key-id"
    }
  ]
}
- Add this assume role policy to the trust relationships:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::000000000000:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}
Remember to replace the placeholder values with your own details. We'll replace the 000000000000 principal later with the IAM user ARN that Snowflake generates for the storage integration.
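If you manage this setup in code, the access policy above is easy to template so the bucket name and KMS key ARN only have to be typed once. Here's a minimal Python sketch; the `s3_access_policy` helper and the example values are hypothetical, not part of any AWS tooling:

```python
import json

# Hypothetical helper: fill in the bucket and KMS key placeholders from the
# policy in step 1, so the document can be pasted (or pushed via boto3)
# without copy-paste mistakes.
def s3_access_policy(bucket: str, kms_key_arn: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }

# Example values are made up.
policy = s3_access_policy("my-data-lake", "arn:aws:kms:us-east-1:111122223333:key/abcd")
print(json.dumps(policy, indent=2))
```

The resulting document can be pasted into the console, or attached programmatically with boto3's `put_role_policy` call.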
2. Setting Up a Storage Integration in Snowflake
Now, let's hop over to Snowflake to create a storage integration using our newly minted IAM role.
Execute this SQL in Snowflake:
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::account-id:role/role-name'
  STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket-name/');

DESC INTEGRATION s3_int;
After creation, run DESC INTEGRATION s3_int; and note down the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the output. You'll need these for the next step.
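If you fetch the DESC output programmatically (for example with snowflake-connector-python, whose cursors return DESC rows as (property, property_type, property_value, property_default) tuples), a small helper can pull out the two values you need. A sketch with made-up sample values; `integration_props` is a hypothetical name:

```python
# Hypothetical helper: given rows fetched from `DESC INTEGRATION s3_int;`,
# keep just the two properties needed for the IAM trust policy in step 3.
def integration_props(rows):
    wanted = {"STORAGE_AWS_IAM_USER_ARN", "STORAGE_AWS_EXTERNAL_ID"}
    return {prop: value for prop, _ptype, value, _default in rows if prop in wanted}

# Sample rows in the assumed (property, type, value, default) shape;
# the ARN and external ID below are made up.
rows = [
    ("STORAGE_PROVIDER", "String", "S3", ""),
    ("STORAGE_AWS_IAM_USER_ARN", "String", "arn:aws:iam::123456789012:user/abc1-s-xyz", ""),
    ("STORAGE_AWS_EXTERNAL_ID", "String", "AB12345_SFCRole=2_abcdefg=", ""),
]
props = integration_props(rows)
print(props["STORAGE_AWS_EXTERNAL_ID"])
```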
3. Enhancing Security with External ID
Back in AWS, let's add an extra layer of security to our IAM role using the external ID we just obtained.
Update the trust relationship of your S3 access role with:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::account-id:user/username"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "0000_external_id_from_snowflake"
        }
      }
    }
  ]
}
Use the values from the DESC INTEGRATION output to fill in the trust policy: the Principal should be the full STORAGE_AWS_IAM_USER_ARN (which supplies account-id and username), and 0000_external_id_from_snowflake should be replaced with the STORAGE_AWS_EXTERNAL_ID.
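The same substitution can be scripted. A minimal sketch that rebuilds the trust document from the two values noted in step 2; `snowflake_trust_policy` is a hypothetical helper, and the ARN and external ID shown are made up:

```python
import json

# Hypothetical helper: rebuild the trust relationship from step 1 using the
# STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID noted in step 2.
def snowflake_trust_policy(iam_user_arn: str, external_id: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": iam_user_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }

doc = snowflake_trust_policy(
    "arn:aws:iam::123456789012:user/abc1-s-xyz",  # made-up ARN
    "AB12345_SFCRole=2_abcdefg=",                 # made-up external ID
)
print(json.dumps(doc, indent=2))
```

With boto3, the document could then be applied to the role via the IAM client's `update_assume_role_policy` call.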
4. Creating a Glue Catalog Connection in Snowflake
Now, let's set up a connection to the AWS Glue Data Catalog in Snowflake.
Run these SQL commands:
CREATE DATABASE glue_db;
CREATE SCHEMA glue_db.glue_schema;
CREATE CATALOG INTEGRATION glue_catalog
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'your_glue_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::account-id:role/glue-access-role'
  GLUE_CATALOG_ID = 'account-id'
  GLUE_REGION = 'your-aws-region'
  ENABLED = TRUE;

DESC CATALOG INTEGRATION glue_catalog;

Note down the GLUE_AWS_IAM_USER_ARN and GLUE_AWS_EXTERNAL_ID values from the DESC CATALOG INTEGRATION output; these are the catalog IAM user ARN and external ID you'll use in the next step. (The role ARN references a role we'll create in step 5; specifying it before the role exists is fine.)
5. Creating an IAM Role for Glue Access
Back in AWS, we need to create another IAM role specifically for Glue access.
- Create a new IAM role
- Attach this policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "arn:aws:glue:region:account-id:catalog",
        "arn:aws:glue:region:account-id:database/*",
        "arn:aws:glue:region:account-id:table/*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:region:account-id:key/key-id"
    }
  ]
}
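The three Glue resource ARNs in the policy above follow a fixed pattern, so they can be generated rather than hand-edited. A tiny sketch; `glue_resources` is a hypothetical helper:

```python
# Hypothetical helper: build the Glue catalog, database, and table ARNs
# from step 5's policy for a given region and account ID.
def glue_resources(region: str, account_id: str) -> list:
    base = f"arn:aws:glue:{region}:{account_id}"
    return [f"{base}:catalog", f"{base}:database/*", f"{base}:table/*/*"]

print(glue_resources("us-east-1", "123456789012"))
```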
- Add this assume role policy to the trust relationships:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "CATALOG_AWS_IAM_USER_ARN"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "CATALOG_EXTERNAL_ID"
        }
      }
    }
  ]
}
Replace the CATALOG_AWS_IAM_USER_ARN and CATALOG_EXTERNAL_ID placeholders with the catalog IAM user ARN and external ID you noted from the DESC output earlier.
6. Creating an Iceberg Table in Snowflake
Finally, let's create an Iceberg table in Snowflake that reads its metadata through our new Glue catalog connection. Externally managed Iceberg tables access their data files through an external volume (configured much like the storage integration in step 2, with the same IAM trust setup), so make sure you have one that points at your S3 bucket.
Execute this SQL:
CREATE OR REPLACE ICEBERG TABLE glue_db.glue_schema.my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'glue_catalog'
  CATALOG_TABLE_NAME = 'your_glue_table';
7. Enabling Auto-Refresh
Snowflake recently released automatic refresh for Iceberg tables, which keeps table metadata in sync with the Glue catalog:
https://docs.snowflake.com/user-guide/tables-iceberg-auto-refresh
CREATE CATALOG INTEGRATION auto_refresh_catalog_integration
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_catalog_namespace'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789123:role/my-catalog-role'
  GLUE_CATALOG_ID = '123456789123'
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = 60;
You can change the interval later with ALTER CATALOG INTEGRATION auto_refresh_catalog_integration SET REFRESH_INTERVAL_SECONDS = 120;
Conclusion: Unleashing the Power of Cross-Cloud Analytics
Congratulations! You've successfully built a bridge between AWS Glue and Snowflake. This powerful connection allows you to:
- Query AWS Glue data directly from Snowflake
- Combine AWS data with other sources in Snowflake for comprehensive analytics
- Leverage Snowflake's powerful querying capabilities on your AWS data
- Maintain data in different cloud environments while centralizing analytics
Remember to monitor your usage and costs, and always follow best practices for data security and compliance. This setup opens up new possibilities for data analysis and processing, allowing you to get the best of both AWS and Snowflake worlds.
Happy analyzing!