GargeeBhatnagar for AWS Community Builders

Integration of Glue [Serverless Data Integration Service] with CI/CD

“I evaluated a solution for deploying script code changes sequentially, with approval, across separate accounts such as dev, qa and prod, while keeping the Glue configuration correct in each. When I used one common bucket to supply the Python script to dev, qa and prod, a code change reached every environment at the same time. The requirement, however, is to update the code in dev first, then in qa after approval, and then in prod after a further approval. This is only possible when I deploy the Glue Python script using CodePipeline and the Glue configuration using a CloudFormation stack. The solution meets the requirement, is secured with KMS and is inexpensive to run.”

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months. AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can easily find and access data using the AWS Glue Data Catalog.

AWS CodePipeline is a continuous delivery service you can use to model, visualize, and automate the steps required to release your software. You can quickly model and configure the different stages of a software release process. CodePipeline automates the steps required to release your software changes continuously.

In this post, you will see how to integrate Glue with a CI/CD pipeline across accounts. I create an S3 bucket, a KMS key, a CodeCommit repo and a CodePipeline pipeline to deploy the Python script to the S3 bucket. I also create a Glue connection and a Glue job using a CloudFormation template.

Architecture Overview

The architecture diagram shows the overall deployment architecture and data flow: two AWS accounts, CodeCommit, CodeDeploy, CodePipeline, S3, CloudFormation, KMS and Glue.

Solution overview

The blog post consists of the following phases:

  1. In Main Account, Create CodeCommit Repo, S3 Bucket, KMS Key and CodePipeline
  2. In Dev Account, Create Glue Connection and Glue Job Using CloudFormation Template
  3. Testing Deployment of Python Script Code Using CodePipeline

Phase 1: In Main Account, Create CodeCommit Repo, S3 Bucket, KMS Key and CodePipeline

  1. Open the console and create the CodeCommit repo. Create the S3 bucket, enable encryption on it and add a bucket policy for cross-account access. Also create a customer managed KMS key and the pipeline. Creating the pipeline automatically creates the pipeline IAM role as well.
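The cross-account bucket policy from this phase can be sketched in Python as a plain policy document. This is a minimal illustration, not the policy used in the demo: the bucket name, the account ID and the choice of `s3:ListBucket` plus `s3:GetObject` are assumptions, and the function name is hypothetical.

```python
import json


def build_cross_account_bucket_policy(bucket_name: str, dev_account_id: str) -> dict:
    """Build a bucket policy that lets another account read the script objects.

    Both arguments are placeholders for illustration only.
    """
    # Trust the whole dev account; a real setup may prefer a specific role ARN.
    principal = {"AWS": f"arn:aws:iam::{dev_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowDevAccountListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowDevAccountGetObject",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and account ID.
    policy = build_cross_account_bucket_policy("example-glue-scripts", "111111111111")
    print(json.dumps(policy, indent=2))
```

Because the bucket is encrypted with a customer managed key, the key policy in the Main Account also has to allow the Dev Account to decrypt (`kms:Decrypt`); otherwise reads of the script object fail even when the bucket policy allows them.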

Phase 2: In Dev Account, Create Glue Connection and Glue Job Using CloudFormation Template

  1. Open the CloudFormation console and create a Glue connection and a Glue job from the template.yaml file, supplying the required custom parameters.
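The template.yaml file itself is not reproduced in this post. As a rough idea of what such a template contains, the sketch below builds a hypothetical minimal equivalent as a Python dictionary and prints it as JSON, which CloudFormation also accepts. The resource names, the JDBC connection type, the Glue version and all argument values are assumptions; a real JDBC connection would also need credentials and network settings, which are left out here.

```python
import json


def build_glue_template(script_location: str, role_arn: str, jdbc_url: str) -> dict:
    """Build a minimal CloudFormation template with one Glue connection and one Glue job.

    All names and values are illustrative placeholders, not the demo's template.
    """
    connection_name = "example-connection"  # hypothetical name
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "GlueConnection": {
                "Type": "AWS::Glue::Connection",
                "Properties": {
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "ConnectionInput": {
                        "Name": connection_name,
                        "ConnectionType": "JDBC",  # assumption: the post does not state the type
                        "ConnectionProperties": {"JDBC_CONNECTION_URL": jdbc_url},
                    },
                },
            },
            "GlueJob": {
                "Type": "AWS::Glue::Job",
                # Create the connection before the job that refers to it by name.
                "DependsOn": "GlueConnection",
                "Properties": {
                    "Name": "example-job",  # hypothetical name
                    "Role": role_arn,
                    "GlueVersion": "4.0",  # assumption
                    "Command": {
                        "Name": "glueetl",
                        "PythonVersion": "3",
                        # The script lives in the Main Account bucket filled by the pipeline.
                        "ScriptLocation": script_location,
                    },
                    "Connections": {"Connections": [connection_name]},
                },
            },
        },
    }


if __name__ == "__main__":
    template = build_glue_template(
        "s3://example-glue-scripts/scripts/job.py",            # hypothetical path
        "arn:aws:iam::111111111111:role/example-glue-role",    # hypothetical role
        "jdbc:postgresql://example-host:5432/exampledb",       # hypothetical URL
    )
    print(json.dumps(template, indent=2))
```

Keeping the script location as an input is what lets the same template be reused per environment, with only the parameter values changing.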

Phase 3: Testing Deployment of Python Script Code Using CodePipeline

  1. The Python script for the Glue job is stored in the CodeCommit repo and deployed to the S3 bucket by the pipeline. Once the code is deployed to S3, the change is reflected in the Glue job created in the other account. The Glue configuration can also be changed by updating the CloudFormation stack.
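The dev, then qa, then prod flow with an approval between environments can be sketched as the `stages` portion of a CodePipeline definition. This is an illustrative fragment rather than the pipeline built in the demo: the repo name, branch, per-environment bucket names, object key and the helper name `build_stages` are all assumptions, and fields such as role ARNs and the artifact store are omitted.

```python
def build_stages(repo_name: str, branch: str, env_buckets: list) -> list:
    """Return pipeline stages: one source stage, then one S3 deploy stage per
    environment, with a manual approval stage before every environment after the first.

    env_buckets is an ordered list of (environment, bucket_name) pairs (placeholders).
    """
    stages = [{
        "name": "Source",
        "actions": [{
            "name": "Source",
            "actionTypeId": {"category": "Source", "owner": "AWS",
                             "provider": "CodeCommit", "version": "1"},
            "configuration": {"RepositoryName": repo_name, "BranchName": branch},
            "outputArtifacts": [{"name": "SourceOutput"}],
        }],
    }]
    for index, (env, bucket) in enumerate(env_buckets):
        if index > 0:
            # Gate promotion to the next environment on a manual approval.
            stages.append({
                "name": f"Approve-{env}",
                "actions": [{
                    "name": "ManualApproval",
                    "actionTypeId": {"category": "Approval", "owner": "AWS",
                                     "provider": "Manual", "version": "1"},
                }],
            })
        stages.append({
            "name": f"Deploy-{env}",
            "actions": [{
                "name": "DeployScript",
                "actionTypeId": {"category": "Deploy", "owner": "AWS",
                                 "provider": "S3", "version": "1"},
                # Extract unpacks the source artifact into the bucket.
                "configuration": {"BucketName": bucket, "Extract": "true"},
                "inputArtifacts": [{"name": "SourceOutput"}],
            }],
        })
    return stages


if __name__ == "__main__":
    # Hypothetical repo, branch and bucket names.
    envs = [("dev", "example-scripts-dev"), ("qa", "example-scripts-qa"),
            ("prod", "example-scripts-prod")]
    for stage in build_stages("example-glue-repo", "main", envs):
        print(stage["name"])
```

Because stages run in order and an approval stage blocks until someone approves it, a code change cannot reach qa or prod before it has been deployed to dev.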

Clean-up

In Dev Account: delete the CloudFormation stack and the IAM role. In Main Account: delete the S3 bucket, KMS key, IAM role, CodeCommit repo and pipeline.

Pricing

I reviewed the pricing and estimated the cost of this example.
Cost of CodeCommit = $0.0
Cost of Key Management Service = $1.00
Cost of Simple Storage Service = $0.01
Cost of Glue = $0.0
Total Cost = $(0.0 + 1.00 + 0.01 + 0.0) = $1.01

Summary

In this post, I showed how to integrate Glue with a CI/CD pipeline across accounts.

For more details on AWS Glue, check out Get started with AWS Glue or open the AWS Glue console. To learn more, read the AWS Glue documentation.

For more details on AWS CodePipeline, check out Get started with AWS CodePipeline or open the AWS CodePipeline console. To learn more, read the AWS CodePipeline documentation.

Thanks for reading!

Connect with me: Linkedin

Top comments (1)

charles-bate

Nice demo! please where is the cloud formation code used for the demo