Mohamed Elbishbeashy

Deploying AWS Glue Custom Connectors using AWS CDK

AWS Glue custom connectors let you connect AWS Glue to data sources that the built-in AWS Glue connection types do not support. You can either develop your own connector or reuse one from the Glue connectors marketplace.

To create custom connections, you first define a Glue custom connector and then create connection instances from it. This pattern describes how to deploy both the AWS Glue custom connector and its connection instances using the AWS Cloud Development Kit (CDK).

As an example, we will deploy a custom connector that points to the SQL Server JDBC driver, version 11.2 (the latest at the time of writing), and then create a connection instance from this connector. We will store the connection instance properties in AWS Secrets Manager to keep the pattern generic.

Target architecture

  1. The configuration of the Glue custom connector and connection is coded in Python in a CDK stack. CDK generates a CloudFormation stack from this configuration.
  2. CDK deploys the CloudFormation stack to create the AWS Glue custom connector, the connection, and the connection secret.
  3. The AWS Glue connection uses the driver JAR from the Amazon S3 bucket and the connection secret from AWS Secrets Manager.

Tools

AWS Glue: AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

Amazon Simple Storage Service: Amazon Simple Storage Service (Amazon S3) is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.

AWS CloudFormation: AWS CloudFormation enables you to create and provision AWS infrastructure deployments predictably and repeatedly. AWS CloudFormation enables you to use a template file to create and delete a collection of resources together as a single unit (a stack).

AWS Cloud Development Kit (CDK): CDK accelerates cloud development using common programming languages to model your applications.

AWS Secrets Manager: AWS Secrets Manager helps you manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycle.

Code

The code section below is a CDK code sample for creating an AWS Glue connector. The main connection input property that tells CDK to create a connector rather than a connection is match_criteria, which should be set to "template-connection" as follows:

match_criteria=["template-connection"]

Additionally, the connection_type property should be set to "CUSTOM":

connection_type="CUSTOM"

The catalog_id connection property should point to the AWS account ID where the stack will be deployed.
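The complete stack at the end of this post reads the account ID from the AWS_ACCOUNT_ID environment variable, which yields None when the variable is unset. The following is a minimal sketch of validating that value before it reaches catalog_id. The helper name resolve_account_id is hypothetical and not part of CDK; the only fact it relies on is that AWS account IDs are 12-digit strings.

```python
import os
import re


def resolve_account_id(env=os.environ, var_name="AWS_ACCOUNT_ID"):
    """Return the account ID from the environment, or raise if it is unusable.

    Hypothetical helper: anything that is not a 12-digit string is rejected
    before it can be passed to catalog_id.
    """
    value = env.get(var_name)
    if value is None:
        raise ValueError(f"{var_name} is not set")
    if not re.fullmatch(r"[0-9]{12}", value):
        raise ValueError(f"{var_name} must be a 12-digit AWS account ID, got {value!r}")
    return value
```

Failing early here is a design choice: a missing variable surfaces as a clear Python error instead of a less obvious failure later in the deployment.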

Note that you need to set the name property, because CDK uses the connector name to link custom connections to this connector. In this example, we set it as follows:

name="SynapseConnector"

In this code example, we parametrized the JDBC connection parameters such as host_url, database, user, password and authentication so that the connector is generic and can be used as a connection to any SQL Server or Synapse database. These parameters can be stored in a secret in AWS Secrets Manager and can be used while creating connection instances from this connector.

        cfn_connector = glue.CfnConnection(self, "SynapseConnector",
            catalog_id="<account-id>",
            connection_input=glue.CfnConnection.ConnectionInputProperty(
                connection_type="CUSTOM",
                connection_properties={
                        "CONNECTOR_CLASS_NAME" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                        "CONNECTOR_TYPE" : "Jdbc",
                        "CONNECTOR_URL" : "s3://<bucket-prefix>/mssql-jdbc-11.2.0.jre8.jar",
                        "JDBC_CONNECTION_URL"  : "[[\"default=jdbc:sqlserver://${host_url};database=${database};user=${user};password=${password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;authentication=${authentication}\"],\",\"]"
                      },
                description="Synapse Connector",
                match_criteria=["template-connection"],
                name="SynapseConnector",
            )
        )
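The JDBC_CONNECTION_URL value above is a hand-escaped string, which is easy to get wrong. As a sketch, the same string can be produced with json.dumps from the standard library. The names JDBC_URL_TEMPLATE and jdbc_connection_url_property are hypothetical, and the nested-list layout is copied from the sample above rather than from any Glue specification.

```python
import json

# The URL template from the connector sample, split across lines for readability.
JDBC_URL_TEMPLATE = (
    "jdbc:sqlserver://${host_url};database=${database};user=${user};"
    "password=${password};encrypt=true;trustServerCertificate=false;"
    "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;"
    "authentication=${authentication}"
)


def jdbc_connection_url_property(url_template):
    """Serialize the template into the nested-list string used in the sample."""
    # Compact separators reproduce the hand-written form, which has no spaces.
    return json.dumps([["default=" + url_template], ","], separators=(",", ":"))
```

The result of jdbc_connection_url_property(JDBC_URL_TEMPLATE) can be assigned to "JDBC_CONNECTION_URL" in connection_properties, so the connector and the connection share one definition of the URL.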

Optionally, you can create a secret within the same CDK stack to hold the connection information.

The advantage of creating the secret within the same stack is that you can directly link it to the created connection. The code block below shows sample code for creating the secret within the CDK stack:

        connection_secret = secretsmanager.Secret(self, "CustomconnectionSM",
            secret_object_value={
                "host_url": SecretValue.unsafe_plain_text("database.sql.azuresynapse.net"),
                "database": SecretValue.unsafe_plain_text("database"),
                "user": SecretValue.unsafe_plain_text("username"),
                "password": SecretValue.unsafe_plain_text("dummy password"),
                "authentication": SecretValue.unsafe_plain_text("ActiveDirectoryPassword")
            }
        )
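Because the secret keys must line up with the ${...} placeholders in the connector's URL template, a quick local check can catch a misspelled or missing key before deployment. This is a minimal sketch that only compares names; it is not how AWS Glue itself resolves the placeholders, and missing_secret_keys is a hypothetical helper.

```python
import re

# Matches placeholders such as ${host_url} and captures the name inside.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def missing_secret_keys(url_template, secret_keys):
    """Return the placeholders in the template that the secret does not define."""
    wanted = PLACEHOLDER.findall(url_template)
    # dict.fromkeys keeps template order and drops repeated placeholders.
    return [name for name in dict.fromkeys(wanted) if name not in secret_keys]
```

An empty list means every placeholder has a matching key in the secret.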

A connection created from the connector above uses the same configuration as the connector. To tell CDK to link this connection to the connector, set the match_criteria parameter to include the connector name as follows:

match_criteria = ["Connection", cfn_connector.connection_input.name]

Additionally, to link the connection to the created secret, you need to set the SECRET_ID connection property as follows:

"SECRET_ID" : connection_secret.secret_name
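Since the connection repeats the connector's properties and only adds SECRET_ID, one way to avoid the two copies drifting apart is to derive one from the other. The sketch below uses plain dictionaries and lists; both helper names are hypothetical, and the resulting values would still be passed to glue.CfnConnection as in the samples in this post.

```python
def connection_properties(connector_properties, secret_id):
    """Copy the connector's properties and add the SECRET_ID link."""
    props = dict(connector_properties)  # shallow copy; the connector dict is not modified
    props["SECRET_ID"] = secret_id
    return props


def connection_match_criteria(connector_name):
    """Match criteria that tie a connection to its connector by name."""
    return ["Connection", connector_name]
```

For example, connection_properties(connector_props, connection_secret.secret_name) yields the dictionary for the connection, where connector_props is whatever dictionary was used for the connector.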

For the complete Python code for the CDK stack, please refer to the Additional information section.

Best practices

  • Parameterize the custom connector connection string as much as possible to make it reusable among connections of the same type. You can store the parameters specific to each connection in a secret within AWS Secrets Manager.
  • As a security best practice, don’t include the connection password in the secret creation code. Instead, create the secret with a dummy password and then define an automated or manual mechanism for updating the password within AWS Secrets Manager.
  • It is better to let CDK generate the secret name. Deleting a secret from Secrets Manager does not take effect immediately, but only after a recovery window of 7 to 30 days. During that window, you cannot create another secret with the same name.
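The second practice above leaves open how the dummy password is later replaced. As one possible sketch, the function below rewrites only the password key of the JSON secret. It assumes a client that behaves like boto3.client("secretsmanager"), using its get_secret_value and put_secret_value calls; the function name update_secret_password is hypothetical, and the client is passed in so the logic can be exercised without AWS access.

```python
import json


def update_secret_password(client, secret_id, new_password):
    """Replace only the 'password' key of a JSON secret.

    `client` is assumed to offer get_secret_value and put_secret_value,
    as boto3.client("secretsmanager") does.
    """
    current = json.loads(client.get_secret_value(SecretId=secret_id)["SecretString"])
    current["password"] = new_password
    client.put_secret_value(SecretId=secret_id, SecretString=json.dumps(current))
    # Return key names only, so the password is never echoed back to the caller.
    return sorted(current)
```

In practice the new password would come from a secure source such as an interactive prompt or a rotation job, never from source control.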

Additional information

The complete Python code for the CDK stack:

import os
from aws_cdk import (
    Stack,
    aws_glue as glue,
    aws_secretsmanager as secretsmanager,
    SecretValue,
    CfnParameter
)
from constructs import Construct

class CdkCustomConnectionStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

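        # Account that owns the Glue Data Catalog; AWS_ACCOUNT_ID must be set before deployment.
        account_id = os.environ.get("AWS_ACCOUNT_ID")
        # S3 location of the SQL Server JDBC driver JAR.
        driver_jar_path = "s3://<s3-bucket-prefix>/mssql-jdbc-11.2.0.jre8.jar"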

        cfn_connector = glue.CfnConnection(self, "SynapseConnector",
            catalog_id=account_id,
            connection_input=glue.CfnConnection.ConnectionInputProperty(
                connection_type="CUSTOM",
                connection_properties={
                        "CONNECTOR_CLASS_NAME" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                        "CONNECTOR_TYPE" : "Jdbc",
                        "CONNECTOR_URL" : driver_jar_path,
                        "JDBC_CONNECTION_URL"  : "[[\"default=jdbc:sqlserver://${host_url};database=${database};user=${user};password=${password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;authentication=${authentication}\"],\",\"]"
                      },
                description="description",
                match_criteria=["template-connection"],
                name="SynapseConnector",
            )
        )

        connection_secret = secretsmanager.Secret(self, "CustomconnectionSM",
            secret_object_value={
                "host_url": SecretValue.unsafe_plain_text("database.sql.azuresynapse.net"),
                "database": SecretValue.unsafe_plain_text("database"),
                "user": SecretValue.unsafe_plain_text("username"),
                "password": SecretValue.unsafe_plain_text("dummy password"),
                "authentication": SecretValue.unsafe_plain_text("ActiveDirectoryPassword")
            }
        )

        cfn_connection = glue.CfnConnection(self, "SynapseConnection",
            catalog_id=account_id,
            connection_input=glue.CfnConnection.ConnectionInputProperty(
                connection_type="CUSTOM",
                connection_properties={
                        "CONNECTOR_CLASS_NAME" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                        "CONNECTOR_TYPE" : "Jdbc",
                        "CONNECTOR_URL" : driver_jar_path,
                        "JDBC_CONNECTION_URL"  : "[[\"default=jdbc:sqlserver://${host_url};database=${database};user=${user};password=${password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;authentication=${authentication}\"],\",\"]",
                        "SECRET_ID" : connection_secret.secret_name 
                      },
                description="description",
                match_criteria = ["Connection", cfn_connector.connection_input.name],
                name="SynapseConnection",
            )
        )