This is a method for processing only a portion of data from a data source in a Glue Job by adding a WHERE clause to the access with JDBC.
It seams impossible to do with DynamicFrame, so I use DataFrame.
When connecting with DataFrame, it seams necessary to explicitly specify the connection information with JDBC. I use glueContext.extract_jdbc_conf
to retrieve the information from the Glue Connection.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
sparcContext = SparkContext()
glueContext = GlueContext(sparcContext)
logger = glueContext.get_logger()
sparkSession = glueContext.spark_session
CONNECTION_NAME = "test-connection" # Name of Glue Connection
DB_NAME = "testdb" # Name of the PostgreSQL database to connect to
# Retrieve the information from the Glue Connection
jdbc_conf = glueContext.extract_jdbc_conf(connection_name=CONNECTION_NAME)
logger.info(jdbc_conf)
# Outputs as follows:
# {'enforceSSL': 'false', 'skipCustomJDBCCertValidation': 'false', 'url': 'jdbc:postgresql://hogehoge.xxxxxxxxxxxx.ap-northeast-1.rds.amazonaws.com:5432', 'customJDBCCertString': '', 'user': 'postgres', 'customJDBCCert': '', 'password': 'xxxxxxxxxxxx', 'vendor': 'postgresql'}
# The string in jdbc_conf["url"] is in the format
# jdbc:postgresql://HOST:5432
# and missing the database name at the end.
# When passed to DynamicFrame.fromDF, adding the database name is necessary.
jdbc_url = jdbc_conf["url"] + "/" + DB_NAME
# The SQL is written as follows:
query = "(SELECT * FROM public.testtable WHERE id <= 100) t"
dyf0 = DynamicFrame.fromDF(sparkSession.read.format("jdbc").
option("url", jdbc_url).
option("driver", "org.postgresql.Driver").
option("user", jdbc_conf["user"]).
option("password", jdbc_conf["password"]).
option("dbtable", query).load(), glueContext, "dyf0")
# Do something
When this Glue Job is executed, the following query is issued to PostgreSQL.
SELECT "id","col1","col2","col3" FROM (SELECT * FROM public.testtable WHERE id <= 100) t
Top comments (0)