Paweł Piwosz for AWS Community Builders

Drift detection

This episode explains what drift is and why detecting it is crucial for keeping your infrastructure properly configured and secure.

What is drift

I am sure many of you have at least heard this term. Let's keep it simple. Drift is a change in the configuration that is not wanted by the process, is undocumented and is most often made manually. In other words, someone logged into AWS (for example) and added something to a Security Group or an IAM Policy. "Just for a moment, to check something". And this "small change" opens a lot of possibilities for attackers for a long period of time, until it is detected. If it is detected.

It is because of drift that many teams are afraid to run their IaC templates, so as not to "break something". Well, the main issues here are that a process is missing, best practices are not followed and IaC is used as "scripted Operator's hands".

So, let me be harsh. If anyone in your team says "Do we really need to run this? I am not sure what will happen", don't be angry at him. Be angry at his manager :)

With IaC we have two states, or phases, to take care of. They are quite distinct and easy to define: the first is before deployment, the second is the lifecycle after deployment.

Implementing a proper mechanism based on best practices for the first phase is relatively easy. I have spoken about it multiple times at different events (and I'll probably write something about it). The second one, however, is tricky. While the first is triggered by some event (a push to the repo, for example), the second must be executed continuously. Another factor here is the quality of drift detection, which is... questionable in many aspects.

But we have Spacelift! Let's see what it can do for us.

Preparations

Before we start checking for drift, let's add more resources to our template. Simple resources: a Security Group and an EC2 instance.

First, the Security Group. A simple one.

data "aws_vpc" "default-vpc" {
  default = true
}

resource "aws_security_group" "my-sg" {
  name        = "test-sg"
  description = "test drifts"
  vpc_id      = data.aws_vpc.default-vpc.id

  ingress {
    description = "test entry"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.10.10.10/32"]
  }

  egress {
    description = "test entry"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

For those of you who are unfamiliar with Terraform: in the first block (a data source, in fact, not a resource) we gather information about the default VPC in the Region, and then we use this information to create the Security Group in this VPC.
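If you want to see which VPC the data source actually resolved, an output is a cheap way to check. This is only a sketch: the output name `default_vpc_id` is an arbitrary choice and not part of the template used in this series.

```hcl
# Illustrative only: expose the ID that the data source looked up,
# so it shows up after the run finishes.
output "default_vpc_id" {
  description = "ID of the default VPC found by the data source"
  value       = data.aws_vpc.default-vpc.id
}
```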

Now it is time for EC2.

data "aws_ami" "ubuntu-recent" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

data "aws_subnet_ids" "list-of-subnets" {
  vpc_id = data.aws_vpc.default-vpc.id
}

locals {
  subnet_ids_list = tolist(data.aws_subnet_ids.list-of-subnets.ids)
}

resource "aws_network_interface" "ec2-eni" {
  subnet_id       = local.subnet_ids_list[0]
  security_groups = [aws_security_group.my-sg.id]
}

resource "aws_instance" "drift-test" {
  ami           = data.aws_ami.ubuntu-recent.id
  instance_type = "t2.micro"

  network_interface {
    network_interface_id = aws_network_interface.ec2-eni.id
    device_index         = 0
  }
}

What we've done here is as follows:

  • get the ID of the most recent Ubuntu AMI
  • get the list of Subnets available in VPC (we had to manipulate this a little with locals and tolist)
  • create the ENI (network interface)
  • create EC2

Simple :)
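One caveat if you follow along with a newer AWS provider: the `aws_subnet_ids` data source was deprecated in favor of `aws_subnets` and, as far as I know, is no longer available from version 5 of the provider. A possible replacement is sketched below, assuming a provider version that ships `aws_subnets`. It would take the place of the `aws_subnet_ids` data source, the `locals` block and the ENI resource above. Because `ids` is already a list there, the `tolist` step should not be needed.

```hcl
# Sketch for newer AWS provider versions: look up the subnets of the
# default VPC with a filter instead of the vpc_id argument.
data "aws_subnets" "list-of-subnets" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default-vpc.id]
  }
}

# The ENI takes the first subnet directly from the list of IDs.
resource "aws_network_interface" "ec2-eni" {
  subnet_id       = data.aws_subnets.list-of-subnets.ids[0]
  security_groups = [aws_security_group.my-sg.id]
}
```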

After we apply the template through Spacelift, let's look at our resources in the Resources tab.

We should see 7 resources created.

Time to create a drift

Let's log in to the console and mess around a bit. Here is what I've done:

  • Change the instance type
  • Change inbound rule in Security Group
  • Add another inbound rule
  • Remove outbound rule

Ok, we generated some drift.

Drift detection

Drift detection is the process that allows us to catch drift. And you can probably see the difficulty already. Drift can happen at any time, so detection should be continuous. But how do we achieve that? Every Terraform run holds the state lock while it works, so if we detect drift with Terraform continuously, the state will be locked practically all the time. We don't want that.

Ah well, this is the topic for another story :)

Let's create our check using Spacelift.

Schedules

Navigate to Settings and select the Scheduling tab in your stack. Click Add new schedule.

Well, there is bad news. This does not work on public workers at the moment. So we need to set up a private worker. We will do that in the next episode, and then we'll continue with drift!

But let's explore the configuration now. We can add different types of actions; we selected Drift detection. The Reconcile option is very powerful and provides a kind of self-healing approach. It means that if we enable the option, then whenever drift is detected, the stored template is applied to restore the desired state of the infrastructure. Desired here means the state stored in VCS, in the template.

Ignore state might be risky. Normally, we shouldn't run detection if our stack is in a state other than Finished, as this might cause problems.

And finally, Schedule takes a cron-like expression that defines when the detection runs.
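The same settings can also be kept as code instead of being clicked in the UI. Below is a minimal sketch, assuming the Spacelift Terraform provider (`spacelift-io/spacelift`) is configured and its `spacelift_drift_detection` resource is used. The variable `stack_id`, the resource name `nightly` and the cron expression are hypothetical values picked for illustration. Check the provider documentation for the counterpart of the Ignore state option before relying on it.

```hcl
# Hypothetical input: the ID of the stack that should be checked.
variable "stack_id" {
  type = string
}

resource "spacelift_drift_detection" "nightly" {
  stack_id  = var.stack_id
  reconcile = false         # detect only; true would apply the stored template when drift is found
  schedule  = ["0 3 * * *"] # cron-like expression: every day at 03:00
}
```

Starting with `reconcile = false` is the more careful choice here: you first see what drift gets reported, and only later let the schedule change the infrastructure on its own.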

Takeaways

We learned what drift is and what its main causes are. We discussed how drift can be detected and checked whether Spacelift can detect it. We also learned that, as of today (3rd of March 2023), we have to do some additional work, which we will do in the next episode.