DEV Community

Marc El Naameh

Cleaning Up Unused ENIs On AWS

Running out of IP addresses in your subnets is a real issue that many teams face these days. Most of the time, those IPs are reserved but unused!

This post addresses the problem of unused Elastic Network Interfaces (ENIs) and what to do to free up our IP address pool for other services.

The solution consists of a Lambda function that is triggered daily and a CloudWatch alarm that alerts us to any errors generated by our Lambda.

First I will go through the solution and how to create it using the AWS Console; then I will build the same solution using Infrastructure as Code (Terraform).


Creating the Lambda

Before going through the code, let's set up some of the configuration parameters:

1- Timeout: this is the most important one. Make sure that it is more than 1 minute (the right value will depend on your workloads).

2- Memory: 200 MB should be enough.

3- VPC: there is no need to put the Lambda inside a VPC.
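
The same settings can also be applied from code rather than the console. The following is only a minimal sketch: it assumes the function already exists under the hypothetical name `clean_eni`, that the caller passes in a Boto3 Lambda client, and that 120 seconds is an acceptable illustration of "more than 1 minute".

```python
def configure_cleanup_lambda(lambda_client, function_name="clean_eni",
                             timeout_seconds=120, memory_mb=200):
    """Apply the timeout and memory settings described above.

    `lambda_client` is expected to be boto3.client("lambda").
    `function_name` and the default timeout are illustrative assumptions.
    """
    if timeout_seconds <= 60:
        raise ValueError("timeout should be more than 1 minute")
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,  # seconds
        MemorySize=memory_mb,     # MB
    )
```

Passing the client in, instead of creating it inside the function, keeps the sketch easy to try against a stub before pointing it at a real account.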

Lambda code using Python:
First, import the AWS SDK for Python (Boto3):

PS: To find out more about the AWS SDK, check out this link

import boto3

client = boto3.client('ec2')

Next step: get the subnets that have a specific tag.

key = type
value = private

There are multiple approaches for this. The way that I will be doing it is:
1- Describe all the resources based on tags
2- Add filters on resource type and tag (Key and Value)

# Get the tags of subnets that have a specific tag (type = private).
tags = client.describe_tags(
    Filters=[
        {'Name': 'resource-type', 'Values': ['subnet']},
        {'Name': 'tag:type', 'Values': ['private']},
    ]
)

A formatted output of this method:

{
   "Tags":[
      {
         "Key":"type",
         "ResourceId":"subnet-062715a13f1fffa54",
         "ResourceType":"subnet",
         "Value":"private"
      },
      {
         "Key":"type",
         "ResourceId":"subnet-0ee66ce86ffe0c073",
         "ResourceType":"subnet",
         "Value":"private"
      }
   ],
   "ResponseMetadata":{}
}

Next, get the subnet IDs from the result dictionary by parsing the data. Example:

# Extract the subnet IDs from the describe_tags response.
list_subnets = [tag['ResourceId'] for tag in tags['Tags']]
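
A single `describe_tags` call may not return every match when many subnets carry the tag, because the API is paginated. As a hedged sketch, the lookup and the parsing step can be combined into one function that walks all pages. The function name matches the one the handler calls later, but the `client`, `tag_key`, and `tag_value` parameters are assumptions added here for illustration.

```python
def get_tagged_subnets(client, tag_key="type", tag_value="private"):
    """Return the IDs of all subnets carrying the given tag.

    `client` is expected to be boto3.client("ec2"). A paginator is used
    so that results beyond the first page of describe_tags are included.
    """
    paginator = client.get_paginator("describe_tags")
    pages = paginator.paginate(
        Filters=[
            {"Name": "resource-type", "Values": ["subnet"]},
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
        ]
    )
    subnet_ids = []
    for page in pages:
        # Each page has the same shape as the formatted output shown earlier.
        subnet_ids.extend(tag["ResourceId"] for tag in page["Tags"])
    return subnet_ids
```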

Now that we have the subnet IDs, the next step is to retrieve the unused network interfaces in those subnets and delete them.

To narrow down our results to only the ones needed, two filters need to be added:
1- A filter to get ENIs from specific subnets
2- A filter to get ENIs that are unused (status available)

NOTE: in the Values of the subnet-id filter, put the subnet ID that you retrieved before.

# subnetid is one of the subnet IDs retrieved in the previous step.
eni = client.describe_network_interfaces(
    Filters=[
        {'Name': 'subnet-id', 'Values': [subnetid]},
        {'Name': 'status', 'Values': ['available']},
    ]
)
# Delete each available ENI through the EC2 client.
for interface in eni["NetworkInterfaces"]:
    client.delete_network_interface(
        NetworkInterfaceId=interface['NetworkInterfaceId']
    )
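
Because deletion cannot be undone, it can help to see what would be removed before removing it. The sketch below is one possible way to do that, not the article's original code: `dry_run` is a hypothetical Python-side switch (it is not the EC2 API's `DryRun` parameter), and the function takes the EC2 client as an argument so it can be exercised against a stub.

```python
def delete_available_enis(client, subnet_id, dry_run=True):
    """Find ENIs in `subnet_id` whose status is 'available' and delete them.

    `client` is expected to be boto3.client("ec2"). With dry_run=True the
    function only returns the IDs it would delete and deletes nothing.
    """
    paginator = client.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(
        Filters=[
            {"Name": "subnet-id", "Values": [subnet_id]},
            {"Name": "status", "Values": ["available"]},
        ]
    )
    eni_ids = [
        eni["NetworkInterfaceId"]
        for page in pages
        for eni in page["NetworkInterfaces"]
    ]
    if not dry_run:
        for eni_id in eni_ids:
            client.delete_network_interface(NetworkInterfaceId=eni_id)
    return eni_ids
```

Running it once with `dry_run=True` and reviewing the returned IDs is a cheap check that the filters match only the interfaces you expect.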

For the handler, we call the two previous pieces of code, wrapped as the functions get_tagged_subnets() and delete_available_eni(), for the Lambda to work properly.


# Delete Available Network Interfaces in Specific Subnets
def lambda_handler(event, context):
    for subnet_id in get_tagged_subnets():
        delete_available_eni(subnet_id)

    return {
        "statusCode": 200,
    }
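
The handler relies on two functions whose definitions are only shown as loose snippets earlier. One way the pieces might fit together in a single `main.py` is sketched below. It differs from the handler above in two assumed details: the EC2 client is passed to each function explicitly, and the handler accepts an optional `client` argument (Lambda itself only passes `event` and `context`) and reports the deleted IDs in its return value.

```python
def get_tagged_subnets(client):
    """Return the IDs of subnets tagged type=private."""
    tags = client.describe_tags(
        Filters=[
            {"Name": "resource-type", "Values": ["subnet"]},
            {"Name": "tag:type", "Values": ["private"]},
        ]
    )
    return [tag["ResourceId"] for tag in tags["Tags"]]


def delete_available_eni(client, subnet_id):
    """Delete every ENI in the subnet whose status is 'available'."""
    eni = client.describe_network_interfaces(
        Filters=[
            {"Name": "subnet-id", "Values": [subnet_id]},
            {"Name": "status", "Values": ["available"]},
        ]
    )
    deleted = []
    for interface in eni["NetworkInterfaces"]:
        eni_id = interface["NetworkInterfaceId"]
        client.delete_network_interface(NetworkInterfaceId=eni_id)
        deleted.append(eni_id)
    return deleted


def lambda_handler(event, context, client=None):
    # `client` is an assumption added so the handler can be run against a
    # stub; when Lambda invokes it, the real EC2 client is created here.
    if client is None:
        import boto3
        client = boto3.client("ec2")
    deleted = []
    for subnet_id in get_tagged_subnets(client):
        deleted.extend(delete_available_eni(client, subnet_id))
    return {"statusCode": 200, "deleted": deleted}
```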

Role of the Lambda

The Lambda needs specific permissions to function properly.

The IAM policy, following the least-privilege principle, is:

{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeTags",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "1"
        }
    ],
    "Version": "2012-10-17"
}

Lambda Trigger

Invoking the Lambda with Amazon EventBridge is divided into two steps:
1- Creating a rule that is triggered on a schedule
2- Assigning the Lambda as a target for the rule
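
For readers who prefer the API to the console, the two steps can be sketched with Boto3 as follows. This is an illustration under assumptions: the clients are passed in by the caller, the rule name is borrowed from the Terraform section later in this post, and the target `Id` and `StatementId` values are made up for the example. Unlike the console, the API does not grant EventBridge permission to invoke the function automatically, so the sketch adds it explicitly.

```python
def schedule_daily(events_client, lambda_client, function_arn,
                   rule_name="Clean-Eni-Lambda-Rule"):
    """Create a daily EventBridge rule and point it at the Lambda.

    `events_client` is expected to be boto3.client("events") and
    `lambda_client` boto3.client("lambda").
    """
    # Step 1: a rule that fires on a fixed schedule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 day)",
        State="ENABLED",
    )
    # Allow EventBridge to invoke the function through this rule.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Step 2: register the Lambda as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "clean-eni-lambda", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```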

SNS for any errors

If the Lambda generates any errors, receiving an email so we can troubleshoot them is a must.

Three AWS resources are needed:

  1. SNS Topic
  2. SNS Subscription
  3. CloudWatch Alarms

Creating the SNS topic is very straightforward.
Select the Standard type, name it, and leave the rest as default.

The SNS subscription is even easier!
Select the topic you wish to subscribe to, choose the email protocol, and add your email address.

For the CloudWatch alarm, the screenshots below explain how to set it up:

[Screenshots: Alarm1, Alarm2, Alarm3]
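
As a rough programmatic counterpart to the console steps, the topic, subscription, and alarm could be created with Boto3 along these lines. Treat it as a sketch under assumptions: the clients are supplied by the caller, and the topic name, alarm name, and threshold values are taken from the Terraform section later in this post rather than from the console walkthrough.

```python
def create_error_alarm(sns_client, cloudwatch_client, function_name, email):
    """Create an SNS topic, an email subscription, and a Lambda error alarm.

    `sns_client` is expected to be boto3.client("sns") and
    `cloudwatch_client` boto3.client("cloudwatch").
    """
    # 1. SNS topic (Standard is the default type).
    topic_arn = sns_client.create_topic(Name="alarm-error")["TopicArn"]
    # 2. Email subscription; the recipient must confirm it by email.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    # 3. Alarm that fires when the function reports at least one error
    #    within a 60-second period.
    cloudwatch_client.put_metric_alarm(
        AlarmName="Lambda-clean-eni-error",
        AlarmDescription="Lambda error rate is too high",
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[topic_arn],
    )
    return topic_arn
```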

Infrastructure as code

To benefit from consistency and speed, and to reduce human error, let's deploy our infrastructure using Terraform:

Lambda Function:

PS: Your Python code, named main.py, should be under a src directory in the same directory as your Terraform project.

module "lambda_clean_eni" {
  source = "terraform-aws-modules/lambda/aws"

  function_name = "clean_eni"
  description   = "Delete unused (available) ENIs in subnets that contain EKS clusters"
  handler       = "main.lambda_handler"
  runtime       = "python3.9"
  publish       = true
  role_name     = "Lambda-Clean-ENI"

  memory_size = 200
  timeout     = 600

  attach_cloudwatch_logs_policy = true

  attach_policy_jsons    = true
  number_of_policy_jsons = 1
  policy_jsons = [
    data.aws_iam_policy_document.clean_eni.json,
  ]

  source_path = "${path.module}/src"
  hash_extra  = filesha256("${path.module}/src/main.py")


  allowed_triggers = {
    DailyRule = {
      principal  = "events.amazonaws.com"
      source_arn = aws_cloudwatch_event_rule.clean_eni.arn
    }
  }

  attach_network_policy = true
}

Lambda IAM Role Policy:

data "aws_iam_policy_document" "clean_eni" {
  statement {
    sid = "1"
    actions = [
      "ec2:DeleteNetworkInterface",
      "ec2:DescribeNetworkInterfaces",
      "ec2:DescribeTags",
    ]
    effect    = "Allow"
    resources = ["*"]
  }
}

EventBridge:

resource "aws_cloudwatch_event_rule" "clean_eni" {
  name                = "Clean-Eni-Lambda-Rule"
  description         = "Fires once every day"
  schedule_expression = "rate(1 day)"
}

resource "aws_cloudwatch_event_target" "clean_eni" {
  rule = aws_cloudwatch_event_rule.clean_eni.name
  arn  = module.lambda_clean_eni.lambda_function_arn
}

CloudWatch Alarms:

module "alarm_lambda_clean_eni" {
  source  = "terraform-aws-modules/cloudwatch/aws//modules/metric-alarm"

  create_metric_alarm = true

  alarm_name                = "Lambda-clean-eni-error"
  alarm_description         = "Lambda error rate is too high"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  insufficient_data_actions = []
  evaluation_periods        = 1
  threshold                 = 1
  alarm_actions             = [aws_sns_topic.alarm-error.arn]

  metric_query = [{
    id          = "1"
    return_data = true
    label       = "Error Count"
    metric = [{
      namespace   = "AWS/Lambda"
      metric_name = "Errors"
      period      = 60
      stat        = "Sum"
      unit        = "Count"
      dimensions = {
        FunctionName = module.lambda_clean_eni.lambda_function_name
      }
    }]
  }]
}

SNS Topic and Subscription:

resource "aws_sns_topic" "alarm-error" {
  name = "alarm-error"
}

resource "aws_sns_topic_subscription" "alarm-error-sub" {
  topic_arn = aws_sns_topic.alarm-error.arn
  protocol  = "email"
  endpoint  = "your@email.com"
}

Summary: With this solution, we are able to mitigate the problem of running out of IP addresses in our subnets.

Side Note: You can customize your filters however you like, so feel free to explore how to make them fit your environment!
