We wanted to run AWS Glue ETL (extract, transform, load) jobs against a SQL Server database in Azure that sits behind an IP-whitelist firewall.
Because AWS Glue is a serverless data integration service, its jobs don't run from a fixed public IP address, so maintaining a range of public IPs to whitelist in the Azure firewall wasn't reasonable.
As this StackOverflow answer describes, it's possible to create a Glue connection whose traffic to external resources (like Azure) all leaves through an assigned Elastic IP address.
In this post, I'll expand on their workaround with detailed steps and screenshots for the AWS console.
- Head to the VPC console and hit Create VPC in your desired region.
- Look for and select the VPC and more option, which also creates the necessary subnets, route tables, and network connections for us - you can see these under the Preview section.
- Configure the other options the way you prefer, but most importantly, select 1 public subnet and 1 private subnet. Hit Create VPC.
- With the VPC and other essentials created, take note that the subnets, and route tables created for us have a similar name to the VPC name we provided in Step 2 (e.g,
- Head to NAT gateways and create a NAT gateway. Select the public subnet in which to create it, and keep the connectivity type as Public. Here we allocate an Elastic IP to the NAT gateway - this is the IP we'll add to our whitelist in Azure!
- Navigate to Route tables, and at this point it's handy to filter resources by VPC.
- Select the route table that is associated with your private subnet - conveniently, its name tag should include private.
Edit routes for the private route table and add a route for all destinations (AKA 0.0.0.0/0) with the NAT gateway created in Step 5 as its target.
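For anyone who prefers scripting to clicking, the NAT gateway and route steps above can be sketched with boto3. This is a minimal sketch, not the method from the StackOverflow answer: the function name and its arguments are hypothetical, it assumes the VPC and both subnets already exist, and it assumes the private route table has no default route yet (`create_route` fails if one is already there).

```python
def route_private_subnet_through_nat(ec2, public_subnet_id, private_route_table_id):
    """Allocate an Elastic IP, attach it to a new public NAT gateway, and
    point the private route table's default route at that gateway.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the public IP that should go into the Azure whitelist.
    """
    # Allocate the Elastic IP that Azure will see as the source address.
    address = ec2.allocate_address(Domain="vpc")

    # Create the NAT gateway in the PUBLIC subnet with public connectivity.
    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=address["AllocationId"],
        ConnectivityType="public",
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]

    # The gateway takes a while to come up; wait before routing through it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # Send all outbound traffic (0.0.0.0/0) from the PRIVATE route table
    # through the NAT gateway.
    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return address["PublicIp"]
```

Calling it with a real client, your public subnet ID, and your private route table ID returns the Elastic IP to whitelist. Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real account.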
- We're now ready to head into AWS Glue and create a connection. We'll select Azure SQL DB as our data source here, but this should work for any other source (e.g., Snowflake).
- I won't get into the details of the data source connection (Azure SQL URL, etc.); however, we'll pay attention to the Network options section.
- Here we'll choose the VPC we created above, and most importantly select the private subnet for this Glue connection. Continue and create the Glue connection for your data source.
- Noting the public IP we allocated in Step 5, we can add it to our whitelist in Azure.
- We're done! Now all AWS Glue jobs using the connection we created above will use the allocated public IP when communicating with external (outside of AWS) resources.
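The Glue connection from the steps above can also be created with boto3. The sketch below is an approximation under stated assumptions: it uses a generic `JDBC` connection rather than the console's Azure SQL DB source type, it reads credentials from a Secrets Manager secret through the `SECRET_ID` property, and every name, URL, and ID is a hypothetical placeholder. The security groups and availability zone are not covered in the steps above, but the Network options section asks for them too. The part that matters for the static IP is `SubnetId`, which must be the private subnet.

```python
def build_glue_connection_input(name, jdbc_url, secret_id, private_subnet_id,
                                security_group_ids, availability_zone):
    """Build the ConnectionInput for glue.create_connection.

    Placing the connection in the PRIVATE subnet is what forces job traffic
    through the NAT gateway and out via its Elastic IP.
    """
    return {
        "Name": name,
        # Assumption: a generic JDBC connection stands in for the console's
        # Azure SQL DB option.
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            # Assumption: credentials live in a Secrets Manager secret.
            "SECRET_ID": secret_id,
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": private_subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def create_glue_connection(glue, connection_input):
    """Create the connection; `glue` is assumed to be boto3.client("glue")."""
    glue.create_connection(ConnectionInput=connection_input)
```

Separating the request body from the API call makes it easy to inspect what will be sent before anything is created in the account.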