DEV Community

Tim

Connecting A Data Factory To An Existing Runtime In Azure

With Azure Data Factory, we can connect on-premises assets to Azure resources, such as an on-premises database to an Azure database. This currently requires an on-premises runtime that communicates with both the on-premises resource and the Azure resource. In the video Connect Azure Data Factory To Existing Runtime, we see how to connect an Azure Data Factory to an existing runtime. (To set up a runtime on premises, review the video How To Setup A Runtime for Azure Data Factory; this video assumes that we already have a runtime set up on one of our servers and linked to a data factory.) While not discussed in the video, we should review the security of our design: we want to be extremely careful when connecting on-premises assets to the cloud. If we will be sending data from an on-premises asset, we should make sure that both the server with the data and the server running the runtime (if different) are isolated.
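When a second data factory connects to an existing runtime, it refers to that runtime by its Azure resource ID. The Python sketch below builds and parses such an ID. The layout follows the usual Azure Resource Manager pattern for Data Factory integration runtimes, but the `RuntimeRef` class and all example names (subscription, resource group, factory, runtime) are hypothetical, so compare the result against the ID the portal shows for the runtime being shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeRef:
    """Hypothetical value type identifying an integration runtime in a data factory."""

    subscription_id: str
    resource_group: str
    factory_name: str
    runtime_name: str

    def resource_id(self) -> str:
        # Assumed standard ARM layout for a Data Factory integration runtime; verify in the portal.
        return (
            f"/subscriptions/{self.subscription_id}"
            f"/resourceGroups/{self.resource_group}"
            f"/providers/Microsoft.DataFactory/factories/{self.factory_name}"
            f"/integrationruntimes/{self.runtime_name}"
        )

    @staticmethod
    def parse(resource_id: str) -> "RuntimeRef":
        # Expect 10 segments: 5 keywords, each followed by its value
        # (the value after "providers" is the Microsoft.DataFactory namespace).
        parts = resource_id.strip("/").split("/")
        if len(parts) != 10:
            raise ValueError(f"unexpected resource ID: {resource_id}")
        # ARM segment names are case-insensitive, so compare keywords in lower case.
        keywords = [p.lower() for p in parts[0::2]]
        if keywords != ["subscriptions", "resourcegroups", "providers", "factories", "integrationruntimes"]:
            raise ValueError(f"not an integration runtime ID: {resource_id}")
        if parts[5].lower() != "microsoft.datafactory":
            raise ValueError(f"not a Data Factory resource: {resource_id}")
        return RuntimeRef(parts[1], parts[3], parts[7], parts[9])


shared = RuntimeRef("00000000-0000-0000-0000-000000000000", "rg-onprem-bridge", "adf-shared", "shir-01")
print(shared.resource_id())
```

Parsing the ID back before using it is a cheap way to catch a copy-paste mistake (for example, an ID that points at a factory rather than at a runtime).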

Before connecting an Azure Data Factory to an existing runtime, we should make sure that the following are true for our environment:

  • We've considered all security angles of running a runtime that connects an on-premises asset to Azure's cloud.
  • We've looked at scale ahead of time: we know the data load we expect and how we'll parallelize it across assets, environments, etc. Keep in mind that the runtime may become the bottleneck if we're using it for multiple assets or environments. Part of this analysis is ensuring that we're transferring the least amount of data required to meet our objective.
  • We have monitoring in place to discover issues as quickly as possible; if anything in production depends on this runtime, an outage could be costly.
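To make the scale check concrete, here is a rough utilization estimate in Python. It assumes we have measured a sustained throughput for the runtime host and know our load window; `Workload`, `runtime_utilization`, and every number below are hypothetical planning inputs, not values Azure reports in this form. A result above 1.0 means the runtime would become the bottleneck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One asset the runtime will move data for (hypothetical planning record)."""

    name: str
    gb_per_day: float  # expected volume after trimming to only the data we need


def runtime_utilization(workloads: list[Workload], runtime_gb_per_hour: float, window_hours: float) -> float:
    """Share of the runtime's capacity used within the load window; > 1.0 means it can't keep up."""
    if runtime_gb_per_hour <= 0 or window_hours <= 0:
        raise ValueError("throughput and window must be positive")
    total_gb = sum(w.gb_per_day for w in workloads)
    capacity_gb = runtime_gb_per_hour * window_hours
    return total_gb / capacity_gb


plan = [Workload("crm-db", 40.0), Workload("erp-db", 80.0)]
# Assumed: 20 GB/hour measured on the runtime host, 4-hour nightly window.
print(f"utilization: {runtime_utilization(plan, 20.0, 4.0):.2f}")
```

In this made-up example the two databases need 120 GB but the window allows only 80 GB, so we would trim the data, widen the window, or split the load across runtimes before relying on it.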

These prerequisites are a must: without this analysis up front, we may face costly issues later.

Once we have our runtime set up, we can grant permissions from this runtime to other Azure Data Factories. We should consider how many data factories we want to share it with, and whether sharing it complies with our other standards. For example, we should not connect one runtime to our Sandbox, QA, Preproduction, and Production environments, for several reasons: scaling, testing, and security, to name a few. The only exception would be if we were demonstrating what NOT to do. The same logic applies if we're connecting a runtime to multiple assets within an environment: we want to consider scale early, as adjustments made later will be more costly.
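The environment rule can also be checked mechanically before any permissions are granted. This Python sketch takes a hypothetical sharing plan, a list of (runtime, data factory, environment) entries we would maintain ourselves, and reports any runtime that would span more than one environment, along with how many factories each runtime serves as an early scale signal. The function name and plan entries are illustrative only.

```python
from collections import defaultdict


def review_sharing_plan(plan: list[tuple[str, str, str]]) -> tuple[dict[str, list[str]], dict[str, int]]:
    """Return (runtimes spanning more than one environment, factory count per runtime)."""
    environments: dict[str, set[str]] = defaultdict(set)
    factories: dict[str, set[str]] = defaultdict(set)
    for runtime, factory, environment in plan:
        environments[runtime].add(environment)
        factories[runtime].add(factory)
    # Any runtime mapped to 2+ environments breaks the one-environment rule.
    mixed = {rt: sorted(envs) for rt, envs in environments.items() if len(envs) > 1}
    counts = {rt: len(fs) for rt, fs in factories.items()}
    return mixed, counts


# Hypothetical plan: shir-01 is (wrongly) shared between QA and Production.
plan = [
    ("shir-01", "adf-qa", "QA"),
    ("shir-01", "adf-prod", "Production"),
    ("shir-02", "adf-sandbox-a", "Sandbox"),
    ("shir-02", "adf-sandbox-b", "Sandbox"),
]
mixed, counts = review_sharing_plan(plan)
print(mixed)   # runtimes that violate the one-environment rule
print(counts)  # factories served per runtime
```

Keeping the plan as data means the same review can run whenever someone proposes linking another factory, rather than relying on memory of which runtime serves which environment.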
