Software is the core of every business, and reliable software requires a robust infrastructure. But what comes to mind when you hear the word infrastructure? If it's virtual machines, load balancers, and network switches, then you're forgetting a critical piece of the puzzle - the data infrastructure. This includes databases, streaming services, and other services that help with storing and moving data.
In this article, I'll highlight the benefits of automating your data infrastructure, warn you about the burden of automation, and guide you through the "how" of automating your data infrastructure.
Pieces of data infrastructure
Before the cloud era, organizations had monolithic relational databases running on bare-metal servers or virtual machines. The database server and the machine it was hosted on had names (they were treated as pets), and together these formed the data infrastructure for the entire company. In the cloud era, multicloud is an opportunity: your current version of the truth might be stored in a relational database running on Amazon Web Services (AWS), while you keep the history of all changes in long-term object storage on Google Cloud Platform (GCP).
To bridge the data gap across technologies, you could use Aiven for Apache Kafka®, capable of syncing data in streaming mode across clouds, alongside multiple other services to manage data access, observability, and governance. All of these combine to form your precious data infrastructure.
Oh, and don't forget the Virtual Private Cloud (VPC), the subnets, and the network gateways that you set up within your data infrastructure!
The burden of automation (done wrong)
You might have started with a few services as a proof of concept, and in-house shell scripts seemed like the perfect tools for the job. As time passed, you needed more services, and those shell scripts kept growing more complex. At some point, you might have added a manual process for deploying services that sit behind a firewall. The firewall in front of that database only accepts static IP addresses, not hostnames, so you also end up creating and managing those addresses by hand.
By this point, you're in automation hell, and those once-blessed shell scripts are just pieces of a broken, semi-automated process.
But there is hope! And that hope is the Infrastructure as Code (IaC) approach.
Benefits of using the Infrastructure as Code (IaC) approach
Adopting in-house shell scripts once seemed like automation. This sort of automation, however, can be unpredictable. It can also lack accountability, since you can't tell who made a recent change.
The goal of IaC is to manage your infrastructure in a manner that is based on software development practices. This includes version control, testing, continuous integration, continuous deployment, and so on. Let's take a look at some of the benefits of using the IaC approach to manage and automate your data infrastructure.
Reliability for data infrastructure
In the cloud era, failure of the underlying hardware is not a matter of if but when. How can you ensure the reliability of the database services you're running when some hardware fails? This is why the underlying hardware is abstracted as software resources. The automation tool creates, changes, or destroys any resources that deviate from the expected state. This loose coupling of hardware and software ensures that you can build and rebuild systems dynamically and reliably.
A predictable automation tool
Imagine you have a script that deploys a database service. What happens if you run that script 10 times? Does it create 10 services? To avoid this, you'd need to add logic to check if a service already exists before creation. With an IaC approach, this logic is already built into the tool you're using, ensuring that your actual system follows the expected state as defined in the code.
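To make this concrete, here's a minimal sketch of what that declarative definition might look like with the Aiven Provider for Terraform (the project variable, cloud, plan, and service name below are illustrative placeholders). Running `terraform apply` against it any number of times still leaves you with exactly one service:

```hcl
# A single declarative definition of a PostgreSQL service.
# Re-applying this configuration converges on this one service
# instead of creating a new copy each time.
resource "aiven_pg" "example_db" {
  project      = var.aiven_project_name # assumed to be defined elsewhere
  cloud_name   = "google-europe-west1"
  plan         = "startup-4"
  service_name = "example-postgres"
}
```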
Consistency across environments
How do you ensure that your staging and production environments are identical in terms of compute, network, and storage capacity? How does your distributed operations team ensure that they are configuring identical internal developer platforms for their developers? Automation tools can create repeatable, consistent software environments from version-controlled software blueprints - infrastructure as code. This is true of both application and data infrastructure.
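As an illustration, a single version-controlled blueprint can be parameterized by environment, so staging and production are built from the same definition rather than from two hand-maintained scripts. The variable names, plans, and service names here are assumptions for the sketch:

```hcl
# One blueprint, parameterized by environment; the same definition
# produces matching staging and production services.
variable "environment" {
  type = string # e.g. "staging" or "production"
}

resource "aiven_pg" "main_db" {
  project      = var.aiven_project_name
  cloud_name   = "aws-eu-west-1"
  plan         = var.environment == "production" ? "business-8" : "startup-4"
  service_name = "orders-db-${var.environment}"
}
```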
Security from automation
Inconsistency leads to misconfiguration, and misconfiguration leads to security issues. If you're storing long-lived database admin credentials in spreadsheets and sharing them with your colleagues, you're leaving yourself wide open to attacks and system compromise. Whether you're invoking your automation tool from the command line or running it in a continuous integration (CI) process, the access used to build and configure servers should be dynamic and short-lived. Automation tools can control and audit access, as well as revoke specific access in the event of a breach.
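For example, a Terraform configuration can take the Aiven API token as a sensitive variable injected at run time, rather than hardcoding credentials in the repository. This is a minimal sketch; how you supply the variable (CI secret, environment variable, or a secrets manager) is up to you:

```hcl
# The API token is injected at run time (for example from a CI secret
# or an environment variable) instead of being committed to the
# repository or shared in a spreadsheet.
variable "aiven_api_token" {
  type      = string
  sensitive = true
}

provider "aiven" {
  api_token = var.aiven_api_token
}
```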
Cost and agility
The time of your engineering team is finite and valuable. Rather than building and configuring systems, you'd probably prefer them to be adding value to your business by building and fixing your applications. While integrating an automation tool might seem expensive at the start, it saves your engineering team countless hours by automating the work of building and managing your IT infrastructure.
When deploying dozens of cloud resources as part of your data infrastructure, you need to ensure that they are created in a specified order to handle dependency issues. You might need to stand up a source and a target Apache Kafka® cluster before setting up an Apache Kafka® MirrorMaker 2 replication flow. The same goes for deleting resources.
An IaC automation tool does the heavy lifting of figuring out these dependencies in the background, allowing you to quickly deploy multiple services across clouds and regions and saving your organization both time and money. A simplified sketch of how this works follows below.
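Terraform infers ordering from references between resources: because the topic below refers to the Kafka service, the cluster is created first and deleted last. This is a simplified sketch with placeholder names and plans; a full MirrorMaker 2 setup is covered in the cookbook linked below:

```hcl
# The topic references the Kafka service by name, so Terraform knows it
# must create the cluster before the topic, and destroy them in reverse.
resource "aiven_kafka" "source" {
  project      = var.aiven_project_name
  cloud_name   = "aws-eu-west-1"
  plan         = "business-4"
  service_name = "source-kafka"
}

resource "aiven_kafka_topic" "clicks" {
  project      = var.aiven_project_name
  service_name = aiven_kafka.source.service_name
  topic_name   = "clicks"
  partitions   = 3
  replication  = 2
}
```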
Tools to automate your IT infrastructure
Terraform is an open-source IaC tool that helps you automate the provisioning of application and data infrastructure in a multi-cloud deployment model. While there are many choices when it comes to automation tools, Terraform has been battle-tested in production by a number of customers. Terraform uses providers to work with virtually any platform or service with an accessible API.
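Getting started with a provider takes only a few lines in your Terraform configuration. For instance, declaring the Aiven provider looks like this (the version constraint is illustrative; pin whatever range suits you):

```hcl
terraform {
  required_providers {
    aiven = {
      source  = "aiven/aiven"
      version = ">= 4.0.0" # illustrative version constraint
    }
  }
}
```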
Check out the following recipes about automating your data infrastructure with Aiven Provider for Terraform.
- Deploy PostgreSQL® services to multiple clouds and regions
- Cross-cluster replication with Apache Kafka® MirrorMaker 2
- Apache Kafka® as source and sink for Apache Flink® job
Visit Aiven Terraform Cookbook for more recipes.
Wrapping up
Now that you understand the benefits of automation and have been introduced to the IaC approach to automating your data infrastructure, what's your next step? Start small and do a proof of concept for your organization. If you need a testing ground that manages your data infrastructure with open-source technologies, give Aiven a try.
To get the latest news about Aiven and our services, plus a bit extra on all things open source, subscribe to our monthly newsletter! Daily news about Aiven is available on our LinkedIn and Twitter feeds.