
Data Ingestion with Azure Event Hubs using Python

Justin Beall (dev3l) ・ 2 min read

Extract, Transform, Load (ETL) is a data integration pattern I have used throughout my career. Decoupling each step is easier than ever with Microsoft Azure. Using Azure Event Hubs, we can begin scaffolding an ephemeral pipeline by creating a mechanism that ingests data however it was extracted.
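To make "decoupling each step" concrete, here is a minimal in-process sketch. It assumes a standard-library `queue.Queue` as a stand-in for the event hub. The extract side only knows how to put rows on the queue, and the transform/load side only knows how to take them off. The function names and the uppercase "transform" are hypothetical placeholders, not part of the original post.

```python
import queue
import threading
from typing import Iterable, List, Optional

SENTINEL: Optional[str] = None  # hypothetical end-of-stream marker


def extract(out_q: "queue.Queue[Optional[str]]", rows: Iterable[str]) -> None:
    """Extract step: push raw rows onto the queue, then signal completion."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transform_and_load(in_q: "queue.Queue[Optional[str]]", store: List[str]) -> None:
    """Transform + load step: consume rows until the sentinel arrives."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        store.append(row.strip().upper())  # placeholder transform


def run_pipeline(rows: Iterable[str]) -> List[str]:
    """Run extract and transform/load on separate threads joined only by the queue."""
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    store: List[str] = []
    consumer = threading.Thread(target=transform_and_load, args=(q, store))
    consumer.start()
    extract(q, rows)
    consumer.join()
    return store
```

Swapping the in-memory queue for an event hub keeps the same shape. The extract code becomes a producer and the transform/load code becomes a consumer, and the two can then run on different machines.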

I have been exposed to many flavors of the ETL pattern throughout my career. They ranged from manually inserting data (typically when something went wrong) to complex, snowflake message-queue architectures (RabbitMQ + workers). Ease was never the goal. Our requirements swung between "it must be solved RIGHT NOW" (manual) and "it can NEVER FAIL" (eventual consistency).

Microsoft has turned its ship around with "Any Developer, Any App, Any Platform". Enabling developers and reducing complexity are hallmarks of the ecosystem: convention over configuration. My favorite of the twelve principles behind the Manifesto for Agile Software Development is Simplicity, and Azure has nailed it.

Simplicity--the art of maximizing the amount of work not done--is essential.

1. Setup in Azure

First, we need a Microsoft Azure account. I was able to connect my GitHub account to their free tier with very little effort.

(Screenshot: linking a GitHub account to Azure)

2. Create Event Hub namespace and topic

(Figure: Event Hubs architecture)

Follow the quickstart "Create an event hub using Azure portal". When you finish the guide, be sure to copy your Event Hub name and primary connection string. We will need both in a bit!
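One way to keep the copied values out of source code is to read them from environment variables and sanity-check the connection string before connecting. The sketch below assumes hypothetical variable names `EVENT_HUB_CONNECTION_STR` and `EVENT_HUB_NAME`. It also assumes the portal string has the usual `Endpoint=...;SharedAccessKeyName=...;SharedAccessKey=...` shape.

```python
import os
from typing import Dict, Mapping, Tuple

# Keys expected in a shared-access-key connection string.
REQUIRED_KEYS = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey")


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Each segment is split on the first '=' only, because base64 keys
    often end in '=' padding.
    """
    parts: Dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    missing = [k for k in REQUIRED_KEYS if k not in parts]
    if missing:
        raise ValueError(f"connection string is missing: {', '.join(missing)}")
    return parts


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (connection string, event hub name); variable names are hypothetical."""
    conn_str = env["EVENT_HUB_CONNECTION_STR"]
    eventhub_name = env["EVENT_HUB_NAME"]
    parse_connection_string(conn_str)  # fail fast on a bad copy/paste
    return conn_str, eventhub_name
```

Validating up front turns a truncated paste into an immediate, readable error instead of a failure at connect time.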

3. Send and Receive Events in Python

Now that we have an Azure Event Hub, let's send some messages. As a starting point, I followed the guide "Send events to or receive events from Event Hubs using Python".
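Sending is a short exercise with the `azure-eventhub` (v5) package. The sketch below assumes `conn_str` and `eventhub_name` are the values copied in step 2. It also assumes each extracted record is a dict serialized as one JSON event. The helper names `to_payloads` and `send_records` are hypothetical. The SDK import sits inside the function so the serialization helper can be used without the package installed.

```python
import json
from typing import Iterable, List


def to_payloads(records: Iterable[dict]) -> List[str]:
    """Serialize each extracted record as compact, key-sorted JSON (one event each)."""
    return [json.dumps(r, separators=(",", ":"), sort_keys=True) for r in records]


def send_records(conn_str: str, eventhub_name: str, records: Iterable[dict]) -> int:
    """Send records in as few batches as the size limit allows; return the count sent."""
    from azure.eventhub import EventData, EventHubProducerClient  # pip install azure-eventhub

    producer = EventHubProducerClient.from_connection_string(
        conn_str=conn_str, eventhub_name=eventhub_name
    )
    sent = 0
    with producer:
        batch = producer.create_batch()
        for payload in to_payloads(records):
            try:
                batch.add(EventData(payload))
            except ValueError:
                # Batch is full: flush it and start a new one. A single event
                # larger than an empty batch will raise again and propagate.
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(EventData(payload))
            sent += 1
        if len(batch) > 0:
            producer.send_batch(batch)
    return sent
```

Usage might look like `send_records(conn_str, eventhub_name, [{"id": 1, "name": "widget"}])`. Batching keeps the number of network round trips low when the extract step produces many small records.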

Example

Feel free to use the python-azure-eventhub repository on GitHub as a starting template!

(Screenshot: python-azure-eventhub README)
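On the receiving side, `EventHubConsumerClient.receive` blocks and invokes a callback per event. The sketch below assumes the default `$Default` consumer group and reads every partition from the beginning (`starting_position="-1"`). It stops after a fixed number of seconds by closing the client from the main thread. The names `make_on_event` and `receive_for` are hypothetical. Checkpoints stay in memory here because no checkpoint store is configured.

```python
import threading
import time
from typing import Callable, List


def make_on_event(sink: List[str]) -> Callable:
    """Build an on_event callback that collects each event body as text."""
    def on_event(partition_context, event):
        if event is None:  # only happens when max_wait_time is set and nothing arrived
            return
        sink.append(event.body_as_str())
        # Without a checkpoint store, this progress is kept in memory only.
        partition_context.update_checkpoint(event)
    return on_event


def receive_for(conn_str: str, eventhub_name: str, seconds: float = 10.0,
                consumer_group: str = "$Default") -> List[str]:
    """Receive events for a fixed period and return their bodies."""
    from azure.eventhub import EventHubConsumerClient  # pip install azure-eventhub

    received: List[str] = []
    client = EventHubConsumerClient.from_connection_string(
        conn_str, consumer_group=consumer_group, eventhub_name=eventhub_name
    )
    # receive() blocks, so run it on a worker thread.
    worker = threading.Thread(
        target=client.receive,
        kwargs={"on_event": make_on_event(received), "starting_position": "-1"},
        daemon=True,
    )
    worker.start()
    time.sleep(seconds)
    client.close()  # closing the client makes receive() return
    worker.join()
    return received
```

In a long-running pipeline you would typically call `client.receive(...)` directly inside `with client:` and let it run until shutdown. The timed variant is just convenient for a quick end-to-end check after sending.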

Output

(Screenshot: example output)

Conclusion

Cloud architecture lets us focus on our application rather than on managing machines. We ride on the robustness of the Microsoft Azure platform, often at a lower total cost of ownership than internally managed infrastructure. With the Event Hubs scaffolding in place, the first step toward a robust ETL pipeline is complete. The next step is to hook into Azure Event Grid and chain the messages sent to our event hub using the publish-subscribe (pub-sub) pattern.
