Michael Obed

Creating and Integrating a Data Pipeline Using Amazon S3 and the Snowflake Data Warehouse (Using SnowSQL)

Data pipelines are widely used in Data Engineering and Analytics to fetch data from external sources, for instance AWS Redshift, S3 (Simple Storage Service), GCP (Google Cloud Platform), Oracle, Azure, and many other industry-standard technologies.

Image description

A data pipeline, according to Snowflake, is concerned with moving data from a source to a destination (such as a data warehouse or large storage service) while simultaneously optimizing and transforming the data. As a result, the data arrives in a state that can be analyzed and used to develop business insights.

This article covers how to integrate AWS S3 with Snowflake, which is convenient when working with large data sources.

Requirements:

  • Knowledge of SQL for working in Snowflake.
  • A Snowflake account and an AWS account.
  • A data file, preferably in CSV (Comma-Separated Values) format. A single file uploaded through the S3 console can be up to 160 GB in size; beyond that, AWS requires other tools (such as the AWS CLI or multipart upload), which are beyond this article's scope.

Step 1
Set up an AWS account and sign in as the root user.

Image description

Step 2
After a successful login to your account, go to Services as shown. New users can locate Storage by scrolling down, then select S3.

Image description

Image description

Step 3
Upon selecting S3, we expect a display like the following – the design might change over time.

Image description

Click Create bucket, then give it a name – in our case, mybootcampbucket:

Image description

Image description

Click Create bucket to finish the process.

Image description

To upload the file, select the newly created bucket (highlighted).

Image description

Upload the file now as depicted below.

Image description

Image description

Step 4
After uploading the file/dataset, a policy needs to be set up. The policy grants the permissions that the external integration will use once it is associated with an identity (an IAM role) in the respective AWS account.
To set up the policy, click Services, then head to Security, Identity, & Compliance. See below:

Image description

Select IAM (Identity and Access Management), then open Policies.

Image description

Image description

After clicking Create policy, give the policy a name – in our case, Bootcamp2023.

Image description

Under the policy editor, click the JSON tab; policies are written in JavaScript Object Notation (JSON), a key-value data representation syntax. A sketch of such a policy is shown after the screenshots below.

Image description

Image description

Image description
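For reference, a minimal policy for this setup could look roughly like the following. The bucket name mybootcampbucket comes from the earlier step; the exact actions you grant (read-only here) depend on what Snowflake needs to do in the bucket, so treat this as a sketch rather than a definitive policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::mybootcampbucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::mybootcampbucket"
    }
  ]
}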

After setting up the policy, next we create the Role.

Image description

Give your role a name; in this scenario, our role is named Bootcamp_2023.

Set up the permissions.

Image description

Image description

I have highlighted This account to note that we are granting trust to this AWS account in particular.
Next, select the policy to attach to the role.

Image description

Finish up with the role creation.

Image description

Now copy the role's ARN (Amazon Resource Name), which identifies the resource uniquely; it will be needed on the Snowflake side, as sketched below.

Image description
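Once your Snowflake account is ready (Step 5), the usual way to use this ARN is in a storage integration created with SnowSQL. A minimal sketch, assuming the hypothetical integration name s3_bootcamp_integration and a placeholder account ID in the role ARN:

-- Storage integration that lets Snowflake assume the IAM role created above
CREATE STORAGE INTEGRATION s3_bootcamp_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/Bootcamp_2023'  -- paste your own role ARN here
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybootcampbucket/');

-- Returns STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID,
-- which belong in the role's trust relationship in IAM
DESC INTEGRATION s3_bootcamp_integration;

After updating the role's trust relationship with those two values, Snowflake can assume the role and read from the bucket.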

Step 5 – Setting Up/Creating a Snowflake Account.
To create a Snowflake account, head to Snowflake.

Image description

After creating an account, we need to create our warehouse, as shown below; the equivalent SnowSQL is sketched after the screenshot.

Image description
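The warehouse can also be created from SnowSQL. A small sketch, using the hypothetical name BOOTCAMP_WH:

-- An extra-small warehouse that suspends itself when idle to save credits
CREATE WAREHOUSE IF NOT EXISTS BOOTCAMP_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;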

Finally, we display the pipeline dataset loaded from AWS S3; the equivalent SnowSQL steps are sketched after the screenshot.

Image description
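To reproduce that load in SnowSQL, the typical sequence is a file format describing the CSV, an external stage that points at the bucket through the storage integration, a COPY INTO to load the data, and a SELECT to verify it. All object names here (BOOTCAMP_DB, pipeline_data, and so on) are illustrative, and the table columns must be adjusted to match your own CSV:

CREATE DATABASE IF NOT EXISTS BOOTCAMP_DB;
USE DATABASE BOOTCAMP_DB;
USE SCHEMA PUBLIC;

-- Describe how the CSV file is laid out
CREATE OR REPLACE FILE FORMAT csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- External stage backed by the storage integration from Step 4
CREATE OR REPLACE STAGE bootcamp_stage
  URL = 's3://mybootcampbucket/'
  STORAGE_INTEGRATION = s3_bootcamp_integration
  FILE_FORMAT = (FORMAT_NAME = 'csv_format');

-- Target table; adjust the columns to match your dataset
CREATE OR REPLACE TABLE pipeline_data (
  id INTEGER,
  name STRING,
  value NUMBER(10, 2)
);

-- Load every CSV file currently in the bucket
COPY INTO pipeline_data
  FROM @bootcamp_stage
  PATTERN = '.*[.]csv';

-- Verify the load
SELECT * FROM pipeline_data LIMIT 10;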

That is all for this article.
