Over time, AWS accounts can accumulate resources that are no longer necessary but continue to incur costs. One common example is orphaned EBS snapshots left behind after volumes are deleted. Managing these snapshots manually can be tedious and costly.
This guide shows how to automate the cleanup of orphaned EBS snapshots using Python (Boto3) in an AWS Lambda function, which is then triggered using AWS EventBridge on a schedule or event.
By the end, you’ll have a complete serverless solution to keep your AWS environment clean and cost-effective.
Prerequisites
Installing AWS CLI and Terraform
First, let’s ensure the essential tools are installed.
AWS CLI
The AWS CLI allows command-line access to AWS services. Install it according to your operating system:
- macOS: `brew install awscli`
- Windows: use the AWS CLI installer
- Linux: use your package manager (e.g., `sudo apt install awscli` for Ubuntu)
Verify installation:
aws --version
Terraform
Terraform is a popular Infrastructure as Code (IaC) tool for defining and managing AWS resources.
- macOS: `brew install terraform`
- Windows: use the Terraform installer
- Linux: download the binary and move it to `/usr/local/bin`
Verify installation:
terraform -version
Configuring AWS Access
Configure your AWS CLI with access keys to allow Terraform and Lambda to authenticate with AWS services.
Get Access Keys from your AWS account (AWS IAM Console).
Configure AWS CLI:
aws configure
Follow the prompts to enter your Access Key, Secret Access Key, default region (e.g., `us-east-1`), and output format (e.g., `json`).
Next, since we are going to build the entire stack with Terraform, please fork the repository located here, which contains the full code for the project.
Clone it to your local machine and open it in a code editor.
I used Visual Studio Code.
Delete the following two files from the project; they will be recreated when you run Terraform from your code editor:
- orphan-snapshots-delete.zip
- .terraform.lock.hcl
Next, let's configure the S3 backend:
Create an S3 Bucket for Terraform State
1. Go to the S3 Console:
- Sign in to your AWS account and navigate to the S3 service.
2. Create a New Bucket:
- Click Create bucket.
- Give the bucket a unique name, such as `my-terraform-state-bucket`.
- Choose an AWS Region that matches your infrastructure region for latency reasons.
3. Configure Bucket Settings:
- Keep Block Public Access settings enabled to restrict access to the bucket.
- Versioning: Enable versioning to maintain a history of changes to the state file. This is useful for disaster recovery or rollbacks.
- Leave other settings as default.
4. Create the Bucket:
- Click Create bucket to finalize the setup.
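If you prefer the command line, the same bucket can be created with the AWS CLI. The bucket name and region below are placeholders; substitute the values you intend to use:

```shell
# Create the state bucket (the name must be globally unique; us-east-1 assumed).
aws s3api create-bucket \
  --bucket my-terraform-state-bucket \
  --region us-east-1

# Enable versioning so a history of state files is retained.
aws s3api put-bucket-versioning \
  --bucket my-terraform-state-bucket \
  --versioning-configuration Status=Enabled
```

Note that for any region other than us-east-1, `create-bucket` also requires `--create-bucket-configuration LocationConstraint=<region>`.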
Create a DynamoDB Table for State Locking (Optional but Recommended)
Using a DynamoDB table for state locking ensures that only one Terraform process can modify the state at a time, preventing conflicts.
1. Go to the DynamoDB Console:
- In your AWS Console, go to DynamoDB.
2. Create a New Table:
- Click Create table.
- Name your table, e.g., `terraform-state-locking`.
- Partition Key: Set the partition key to `LockID` with the String data type.
3. Configure Settings:
- Leave default settings (such as read and write capacity) unless you have specific requirements.
- Create the table by clicking Create table.
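The equivalent AWS CLI command, if you would rather skip the console (the table name is a placeholder; use whatever name you reference in your backend configuration):

```shell
# Create the lock table; LockID (string) is the partition key Terraform expects.
aws dynamodb create-table \
  --table-name terraform-state-locking \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```

On-demand billing (`PAY_PER_REQUEST`) is a reasonable default here, since state locking generates only a handful of reads and writes per Terraform run.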
Configure IAM Permissions for Terraform
Terraform needs specific permissions to interact with S3 and DynamoDB (if using locking).
This step is necessary only if you are operating under least-privilege access. If you already have administrator access, you can skip it.
1. Create or Use an IAM User:
- If you don’t have an IAM user for Terraform (You can use your own IAM user and attach these policies to it), create one in the IAM Console.
- Attach policies that grant permissions to access S3 and DynamoDB.
2. Attach S3 and DynamoDB Policies:
Use an inline policy or add the following permissions:
- Access to the S3 bucket.
- Access to the DynamoDB table (if using locking).
Example IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-terraform-state-bucket",
        "arn:aws:s3:::my-terraform-state-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:GetItem",
        "dynamodb:DeleteItem",
        "dynamodb:DescribeTable"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/terraform-state-locking"
    }
  ]
}
Note that `s3:ListBucket` applies to the bucket itself, so the bucket ARN without the `/*` suffix must be listed alongside the object ARN.
After completing all the prerequisites, let's examine the Python and Terraform code that will perform the actual magic.
Step 1: Python Code for Orphaned Snapshot Cleanup
In the code editor, open the `orphan-snapshots-delete.py` file.
The complete function code is as follows:
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    ec2_cli = boto3.client("ec2")

    # List all snapshots owned by this account.
    response = ec2_cli.describe_snapshots(OwnerIds=["self"], DryRun=False)

    # Collect snapshots whose source volume no longer exists.
    snapshot_id = []
    for each_snapshot in response["Snapshots"]:
        try:
            ec2_cli.describe_volume_status(
                VolumeIds=[each_snapshot["VolumeId"]], DryRun=False
            )
        except ec2_cli.exceptions.ClientError as e:
            if e.response["Error"]["Code"] == "InvalidVolume.NotFound":
                snapshot_id.append(each_snapshot["SnapshotId"])
            else:
                raise e

    # Delete the orphaned snapshots.
    for each_snap in snapshot_id:
        try:
            ec2_cli.delete_snapshot(SnapshotId=each_snap)
            logger.info(f"Deleted SnapshotId {each_snap}")
        except ec2_cli.exceptions.ClientError as e:
            return {
                "statusCode": 500,
                "body": f"Error deleting snapshot {each_snap}: {e}",
            }

    return {"statusCode": 200}
This Lambda function uses Boto3, AWS's Python SDK, to list all EBS snapshots owned by the account, check the status of each snapshot's source volume, and delete the snapshots whose volume no longer exists.
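To see the control flow without touching a real AWS account, the same orphan-detection logic can be exercised against a stubbed client. Everything here (`FakeEC2`, the `ClientError` shim, `find_orphans`) is hypothetical test scaffolding, not part of the deployed function:

```python
# Stub that mimics the two EC2 calls the Lambda makes. ClientError here is a
# local shim with the same .response shape as botocore's exception.
class ClientError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class FakeEC2:
    def __init__(self, snapshots, live_volumes):
        self._snapshots = snapshots
        self._live = set(live_volumes)

    def describe_snapshots(self, OwnerIds, DryRun=False):
        return {"Snapshots": self._snapshots}

    def describe_volume_status(self, VolumeIds, DryRun=False):
        # Raise the same error code AWS returns for a deleted volume.
        if VolumeIds[0] not in self._live:
            raise ClientError("InvalidVolume.NotFound")
        return {"VolumeStatuses": [{"VolumeId": VolumeIds[0]}]}


def find_orphans(ec2_cli):
    """Return SnapshotIds whose source volume no longer exists."""
    orphans = []
    for snap in ec2_cli.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
        try:
            ec2_cli.describe_volume_status(VolumeIds=[snap["VolumeId"]])
        except ClientError as e:
            if e.response["Error"]["Code"] == "InvalidVolume.NotFound":
                orphans.append(snap["SnapshotId"])
            else:
                raise
    return orphans


snapshots = [
    {"SnapshotId": "snap-1", "VolumeId": "vol-live"},
    {"SnapshotId": "snap-2", "VolumeId": "vol-gone"},
]
print(find_orphans(FakeEC2(snapshots, live_volumes=["vol-live"])))  # ['snap-2']
```

Only `snap-2` is flagged, because its volume is missing from the stub's "live" set; this is exactly the branch the Lambda takes when `describe_volume_status` fails with `InvalidVolume.NotFound`.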
Step 2: Terraform Configuration for Serverless Infrastructure
Using Terraform, we’ll create a Lambda function, IAM role, and policy to deploy this script to AWS. Additionally, we’ll set up an EventBridge rule to trigger Lambda on a regular schedule.
Terraform Setup and Provider Configuration
This section configures Terraform, including setting up remote state management in S3.
Open the Terraform file `main.tf` in your code editor and review the code as shown in the following sections.
Note:
- Change the `required_version` value to match your `terraform -version` output.
- Update the `bucket`, `key`, and `dynamodb_table` values in the S3 backend to match what you created in the previous steps.
terraform {
  required_version = ">=1.5.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.72.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-files-0110"
    key            = "delete-orphan-snapshots/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf_state_file_locking"
  }
}

provider "aws" {
  region = var.aws_region
}
IAM Role and Policy for Lambda
This IAM configuration sets up permissions for Lambda to access EC2 and CloudWatch, enabling snapshot deletion and logging.
resource "aws_iam_role" "lambda_role" {
  name               = "terraform_orphan_snapshots_delete_role"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Effect": "Allow"
    }
  ]
}
EOF
}
resource "aws_iam_policy" "iam_policy_for_lambda" {
  name   = "terraform_orphan_snapshots_delete_policy"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumeStatus",
        "ec2:DescribeSnapshots",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
resource "aws_iam_role_policy_attachment" "attach_iam_policy_to_iam_role" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = aws_iam_policy.iam_policy_for_lambda.arn
}
Packaging and Deploying the Lambda Function
Here, we package the Python code and deploy it as a Lambda function.
data "archive_file" "lambda_zip" {
  type        = "zip"
  source_file = "${path.module}/python/orphan-snapshots-delete.py"
  output_path = "${path.module}/python/orphan-snapshots-delete.zip"
}

resource "aws_lambda_function" "lambda_function" {
  filename      = data.archive_file.lambda_zip.output_path
  function_name = "orphan-snapshots-delete"
  role          = aws_iam_role.lambda_role.arn
  handler       = "orphan-snapshots-delete.lambda_handler"
  runtime       = "python3.12"
  timeout       = 30
}
EventBridge Rule for Lambda Invocation
AWS EventBridge allows you to create scheduled or event-based triggers for Lambda functions. Here, we'll configure EventBridge to invoke our Lambda function on a schedule, such as every 24 hours.
You can learn more about EventBridge and scheduled events in the AWS documentation here.
resource "aws_cloudwatch_event_rule" "schedule_rule" {
  name                = "orphan-snapshots-schedule-rule"
  description         = "Trigger Lambda every day to delete orphaned snapshots"
  schedule_expression = "rate(24 hours)"
}

resource "aws_cloudwatch_event_target" "target" {
  rule = aws_cloudwatch_event_rule.schedule_rule.name
  arn  = aws_lambda_function.lambda_function.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.lambda_function.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.schedule_rule.arn
}
Step 3: Applying the Terraform Configuration
After defining the infrastructure, initialize and apply the Terraform configuration:
terraform init
terraform apply
Step 4: Testing and Monitoring the Lambda Function
To verify that the solution works:
- Manually Trigger the Event (optional): For initial testing, trigger the Lambda function manually from the AWS Lambda console.
- Monitor CloudWatch Logs: The Lambda function writes logs to CloudWatch, where you can review entries to verify snapshot deletions.
- Adjust the Schedule as Needed: Modify the `schedule_expression` to set a custom frequency for snapshot cleanup.
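The manual trigger and log check can also be done from the command line. This assumes the function name from the Terraform configuration above and the default log group naming Lambda uses:

```shell
# Invoke the function once and inspect its response.
aws lambda invoke \
  --function-name orphan-snapshots-delete \
  response.json
cat response.json

# Follow the function's CloudWatch logs (AWS CLI v2).
aws logs tail /aws/lambda/orphan-snapshots-delete --follow
```

A successful run returns `{"statusCode": 200}` in `response.json`, and each deleted snapshot appears as a `Deleted SnapshotId ...` log line.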
Enhancements
The following enhancements could be implemented in this project:
- Instead of a scheduled EventBridge rule, EventBridge could detect EBS volume deletions and trigger the Lambda function to delete the corresponding snapshots immediately.
- Pagination could be added to the Python function to handle accounts with a large number of snapshots.
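The pagination enhancement can be sketched without an AWS account: `describe_snapshots` returns results in pages linked by a `NextToken`, and the loop below drains every page. `FakePaginatedEC2` is a hypothetical stand-in for the real boto3 client so the loop can run locally:

```python
# Hypothetical stub mimicking boto3's NextToken-paged describe_snapshots.
class FakePaginatedEC2:
    def __init__(self, snapshots, page_size=2):
        self._snaps = snapshots
        self._page_size = page_size

    def describe_snapshots(self, OwnerIds, NextToken=None):
        start = int(NextToken or 0)
        page = self._snaps[start:start + self._page_size]
        resp = {"Snapshots": page}
        # Only include NextToken while more pages remain, as AWS does.
        if start + self._page_size < len(self._snaps):
            resp["NextToken"] = str(start + self._page_size)
        return resp


def all_snapshots(ec2_cli):
    """Drain every page of describe_snapshots results."""
    snaps, token = [], None
    while True:
        kwargs = {"OwnerIds": ["self"]}
        if token:
            kwargs["NextToken"] = token
        resp = ec2_cli.describe_snapshots(**kwargs)
        snaps.extend(resp["Snapshots"])
        token = resp.get("NextToken")
        if not token:
            return snaps


fake = FakePaginatedEC2([{"SnapshotId": f"snap-{i}"} for i in range(5)])
print(len(all_snapshots(fake)))  # 5
```

With the real client, the same effect is available more concisely via `ec2_cli.get_paginator("describe_snapshots")`.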
Wrapping Up
By combining Python (Boto3), Lambda, AWS EventBridge, and Terraform, we’ve created a fully automated, serverless solution to clean up orphaned EBS snapshots. This setup not only reduces cloud costs but also promotes a tidy, efficient AWS environment. With scheduled invocations, you can rest assured that orphaned resources are consistently removed.
Try this solution in your own AWS account and experience the benefits of automation in cloud resource management!
Please feel free to share your thoughts on this article in the comments section. Thank you for reading.