Hey all!
In this post I would like to go over a Terraform module for AWS Glue.
AWS Glue is a full ETL/ELT service that provides connections to and from various sources, for example MySQL to S3. While you create a connection between the two, you can add any transformations you want, using either a code or a no-code approach. The best thing is that Glue is a full-blown service that makes development easy with Glue notebooks, where you can test your job directly.
I had a hard time finding a Terraform module that combines creating an AWS Glue crawler, an AWS Glue catalog, and an AWS Glue job.
This example creates all of these in one single block. A simple use case is loading CSV files, processing them, and publishing the result to a database for other tables to use.
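For context, the three pieces mentioned above correspond to three resource types in the Terraform AWS provider. The module's internals are not reproduced here, so the following is only a minimal sketch of what such a module might wrap. The resource labels, the `glue_role_arn` variable, and the `example-bucket` paths are placeholders chosen for illustration, not values taken from the module.

```hcl
# Hypothetical input: ARN of an existing IAM role that Glue can assume.
variable "glue_role_arn" {
  type = string
}

# Catalog database that holds the table metadata.
resource "aws_glue_catalog_database" "this" {
  name = "call-statistics"
}

# Crawler that scans files in S3 and registers tables in the database.
resource "aws_glue_crawler" "this" {
  name          = "main-application-name-crawler" # assumed naming scheme
  database_name = aws_glue_catalog_database.this.name
  role          = var.glue_role_arn

  s3_target {
    path = "s3://example-bucket/data/csv-table" # placeholder path
  }
}

# Job that runs the ETL script.
resource "aws_glue_job" "this" {
  name     = "first-job"
  role_arn = var.glue_role_arn

  command {
    script_location = "s3://example-bucket/etl_scripts/etl-csv-to-dynamoDB.py" # placeholder path
    python_version  = "3"
  }

  glue_version      = "3.0"
  worker_type       = "G.1X"
  number_of_workers = 2
}
```

Wiring these together by hand for every project is the repetition the module is meant to remove.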
To use the module, simply point the module source at this GitHub repo and initialize it with terraform init: aws-terraform-glue-db-crawler
module "glue" {
  source = "github.com/Luk3rson/aws-terraform-glue-db-crawler"
  create = terraform.workspace == "dev"

  glue_database_name    = "call-statistics"
  name_prefix           = "main-application-name"
  job_name              = "first-job"
  s3_target_bucket_name = "et-test-bucket"
  s3_database_location  = "${local.s3_database_location}/${local.glue_database_name}/${local.glue_table}"
  target_path           = "data/csv-table"
  glue_script_path      = "etl_scripts/etl-csv-to-dynamoDB.py"
  kms_key               = "1234abcd-12ab-34cd-56ef-1234567890ab"

  # Job
  worker_type       = "G.1X"
  number_of_workers = "2"
  glue_version      = "3.0"
  execution_class   = "FLEX"
  default_arguments = {
    "--TempDir"             = "s3://etl-test-bucket/temp_dir/"
    "--class"               = "GlueApp"
    "--encryption-type"     = "sse-s3"
    "--job-language"        = "python"
    "--job-bookmark-option" = "job-bookmark-enable"
  }
}
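The block above reads three values from `local`, so they need to be defined somewhere in the calling configuration. A minimal sketch might look like the following. All three values are assumptions for illustration: the bucket path is a placeholder, and the table name is only guessed from `target_path`.

```hcl
# Hypothetical values; adjust them to your own bucket layout and naming.
locals {
  s3_database_location = "s3://example-bucket/glue-databases" # placeholder location
  glue_database_name   = "call-statistics"                    # same name as glue_database_name above
  glue_table           = "csv-table"                          # assumed table name
}
```

With these values, `s3_database_location` in the module call resolves to `s3://example-bucket/glue-databases/call-statistics/csv-table`.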
I hope this module is helpful and speeds up moving your development work to production!