What is infrastructure drift?

#infrastructure #devops #terraform

I’ve spent most of my career as an application/web developer, which mostly means gluing X to Y to make a thing, and I love doing it! My exposure to operations has come through DevOps, seen through the lens of an application developer.

I’ve recently joined CloudQuery, which is an open-source cloud asset inventory powered by SQL. What this means is that I’m learning more about infrastructure than I ever have before. Along the way, I’m learning about things like infrastructure as code through Terraform and the concept of infrastructure drift.

What is drift?

Simply put, it’s when the actual state of your infrastructure is different from the state defined by your IaC (Infrastructure-as-Code). For example, if you’ve deployed five Lambda functions via Terraform with the default 128 MB of memory, but three of them are now running with 256 MB, then those three have drifted from the declared state.
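To make the Lambda example concrete, here is a minimal sketch of what "comparing declared state to actual state" could look like. The function names and memory values are made up for illustration; in practice the declared side would come from your IaC state and the actual side from your cloud provider.

```python
# Hypothetical data: memory (MB) declared in IaC vs. observed in the account.
declared = {"fn-a": 128, "fn-b": 128, "fn-c": 128, "fn-d": 128, "fn-e": 128}
actual = {"fn-a": 128, "fn-b": 256, "fn-c": 256, "fn-d": 128, "fn-e": 256}


def find_drift(declared, actual):
    """Return {name: (declared_value, actual_value)} for every mismatch."""
    return {
        name: (want, actual.get(name))
        for name, want in declared.items()
        if actual.get(name) != want
    }


drifted = find_drift(declared, actual)
for name, (want, got) in sorted(drifted.items()):
    print(f"{name}: declared {want} MB, actual {got} MB")
```

Three of the five functions show up as drifted, which matches the scenario described above.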

What causes drift?

My reading has led me to believe that you can have two kinds of drift. The first, like the example above, affects Resources Managed by IaC: resources that were deployed by a tool like Terraform or CloudFormation and have since changed. The second is Resources Not Managed by IaC: resources that were deployed manually or via another process that doesn’t use IaC. The cause is the same in both cases: someone, or some process outside your IaC, adding or changing individual resources.
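One way I find it helpful to think about the two kinds of drift is as set operations over resource IDs. This is only a sketch with hypothetical IDs, and it assumes you already know which IDs are in your IaC state, which exist in the cloud account, and which have changed since deployment.

```python
def classify(state_ids, cloud_ids, changed_ids):
    """Split cloud resources into unmanaged, managed-but-drifted, and in-sync.

    state_ids:   IDs tracked in the IaC state (hypothetical input)
    cloud_ids:   IDs that actually exist in the cloud account
    changed_ids: IDs whose attributes differ from what IaC declared
    """
    managed = state_ids & cloud_ids
    return {
        "unmanaged": cloud_ids - state_ids,          # not managed by IaC
        "managed_drifted": managed & changed_ids,    # managed, but changed
        "managed_in_sync": managed - changed_ids,    # managed and matching
    }


# Hypothetical IDs, for illustration only.
example = classify(
    state_ids={"i-1", "i-2", "sg-1"},
    cloud_ids={"i-1", "i-2", "sg-1", "vol-9"},
    changed_ids={"i-2"},
)
for kind, ids in example.items():
    print(kind, sorted(ids))
```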

Is drift bad?

Not necessarily. Things happen in production, and sometimes you have to tweak resources by hand to keep things up and alive when apps are on fire. And sometimes you need resources that live outside of your IaC, such as security tooling that needs separate oversight.

But how do you know if you have drifted?

And this is why I went down this rabbit hole: CloudQuery now has drift detection built on top of its open-source cloud asset inventory. You can detect drift with another tool, or with Terraform itself, but these only work against resources managed with Terraform. With CloudQuery you can see not only the drift from your Terraform state, but also an easy-to-identify list of the resources in your cloud vendor account that aren’t managed by Terraform at all.
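For the "via Terraform itself" route, here is a rough sketch of wrapping terraform plan from Python. It relies on Terraform's -detailed-exitcode flag, where exit code 0 means no changes, 1 means an error, and 2 means the plan is not empty. The function names and the workdir parameter are my own, and note that a non-empty plan can also mean unapplied configuration changes, not only drift.

```python
import subprocess


def interpret_plan_exit_code(code):
    """Map `terraform plan -detailed-exitcode` exit codes to a short label."""
    return {
        0: "in sync",
        1: "error",
        2: "changes detected",  # drift, or config changes not yet applied
    }.get(code, "unknown")


def check_terraform_plan(workdir):
    """Run a plan in workdir (hypothetical path) and label the outcome.

    Assumes the terraform binary is on PATH and the directory is initialised.
    Only resources tracked in the Terraform state are covered by this check.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Calling check_terraform_plan("path/to/your/terraform") would return one of the labels above, but it still says nothing about resources Terraform doesn't know about.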

How do I resolve the drift?

If you head to the documentation for cloudquery drift scan, you can see an example result like this:

=== DRIFT RESULTS  ===
5 Resources not managed by Terraform
aws:ec2.ebs_volumes:
- vol-id1
- vol-id2
aws:ec2.instances:
- i-id1
- i-id2
aws:ec2.security_groups:
- sg-id1
93 Resources managed by Terraform (equal IDs)
=== SUMMARY ===
Total number of resources: 98
- 5 not managed by Terraform
- 93 managed by Terraform (equal IDs)
- 94.89% covered by Terraform

You can see the resources that are unmanaged by Terraform, as well as any resources that have drifted from your Terraform state. At this point, you can decide which resources to leave unmanaged, or you can follow the Manage Resource Drift guide in the Terraform documentation to bring them under management and make your Terraform configuration match production. And for a little piece of extra automation to make your life simpler, you could run CloudQuery as part of your CI to check that everything is in the expected state before or after a deployment.
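As a sketch of what that CI check might look like, the snippet below parses the summary lines from the example result above and decides whether a build should fail. It assumes the output format matches that example exactly, which may not hold for your CloudQuery version; the function names and the max_unmanaged threshold are hypothetical.

```python
import re

# Summary section copied from the example drift result shown earlier.
SAMPLE_OUTPUT = """\
=== SUMMARY ===
Total number of resources: 98
- 5 not managed by Terraform
- 93 managed by Terraform (equal IDs)
"""


def parse_summary(text):
    """Return (unmanaged, managed) counts from the summary lines."""
    unmanaged = re.search(r"^- (\d+) not managed by Terraform", text, re.MULTILINE)
    managed = re.search(r"^- (\d+) managed by Terraform", text, re.MULTILINE)
    if not (unmanaged and managed):
        raise ValueError("summary lines not found in scan output")
    return int(unmanaged.group(1)), int(managed.group(1))


def coverage_percent(unmanaged, managed):
    """Share of resources managed by Terraform, as a percentage."""
    total = unmanaged + managed
    return 100.0 * managed / total if total else 100.0


def gate(text, max_unmanaged=0):
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    unmanaged, _ = parse_summary(text)
    return 0 if unmanaged <= max_unmanaged else 1


unmanaged, managed = parse_summary(SAMPLE_OUTPUT)
print(f"coverage: {coverage_percent(unmanaged, managed):.1f}%")
print("exit code:", gate(SAMPLE_OUTPUT))
```

In a real pipeline you would feed the captured scan output into gate and pass its return value to sys.exit. Allowing a non-zero max_unmanaged is one way to tolerate resources you have deliberately left outside Terraform.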

DEV Community
