AWS Glue is a fully managed ETL (extract, transform, and load) service that can categorize your data, clean it, enrich it, and move it between various data stores. AWS Glue consists of a central data repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python code, and a scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there's no infrastructure to manage.
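As a rough illustration of the job monitoring mentioned above, the sketch below starts a job run and polls its state through the boto3 Glue client calls `start_job_run` and `get_job_run`. The helper name `run_job_and_wait`, the polling interval, and the choice of which states count as terminal are assumptions made for this example, not part of AWS Glue itself.

```python
import time

# Assumption for this sketch: job run states after which polling should stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue_client, job_name, poll_seconds=30.0, sleep=time.sleep):
    """Start a run of an existing Glue job and poll until it finishes.

    glue_client is expected to behave like boto3.client("glue").
    Returns a (run_id, final_state) tuple.
    """
    # start_job_run returns the identifier of the new run.
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        # get_job_run reports the current state of that run.
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        sleep(poll_seconds)
```

With AWS credentials configured, the helper could be called as `run_job_and_wait(boto3.client("glue"), "my-etl-job")`, where `my-etl-job` is a placeholder for a job that already exists in the account. Passing the client in as a parameter keeps the helper easy to exercise with a stub.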

AWS Glue consists of a number of components.

AWS Glue was introduced in August 2017.

With the ability to scale on demand, AWS Glue helps you focus on high-value activities that maximize the value of your data.
To increase agility and optimize costs, AWS Glue provides built-in high availability and pay-as-you-go billing.

Data registered in the AWS Glue Data Catalog is available to many AWS services, including:

  • Amazon Redshift Spectrum
  • Amazon EMR (Hadoop, Hive, HBase, Presto, Spark, Impala, etc.)
  • Amazon Athena
  • AWS Glue scripts
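
To make the idea of a shared catalog more concrete, here is a minimal sketch that reads the table definitions from one Data Catalog database using the boto3 Glue client's `get_tables` paginator. The helper name `list_catalog_tables` and the shape of its return value are assumptions chosen for illustration.

```python
def list_catalog_tables(glue_client, database_name):
    """Return {table_name: [(column_name, column_type), ...]} for one database.

    glue_client is expected to behave like boto3.client("glue").
    """
    tables = {}
    # get_tables is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            # Column definitions live under the table's StorageDescriptor.
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            tables[table["Name"]] = [
                (col["Name"], col.get("Type", "")) for col in columns
            ]
    return tables
```

With AWS credentials configured, this could be called as `list_catalog_tables(boto3.client("glue"), "sales_db")`, where `sales_db` is a hypothetical database name. The table metadata returned here is the same metadata that the services listed above rely on when they query the underlying data.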

