Kodsama

AWS cost last resort, the killer lambda

Having full control over your Amazon Web Services (AWS) costs isn't that easy.

This is a continuation of my previous article, Controlling AWS Lambda Costs.

Now that you have set up alerts and limited possible overflows, it is time for the last resort: starting to kill things.

Of course, one could always tear down services manually when spending gets too high, but many of us would rather spend a few hours once to automate this by creating a killer lambda. 😄

[Image: XKCD comic, automation]

Here are the steps I would recommend:

 

Setting up Billing Alarms

[Image: XKCD comic, alarm]

The first thing to do is set up budget alerts that cover your whole AWS bill.

Billing alarms are an essential tool for monitoring your AWS costs. They are an easy way to get notified when your estimated monthly AWS bill is about to cross a set threshold.

The official documentation is great.

  1. Create a Budget:

    • Go to the AWS Management Console.
    • Navigate to the AWS Budgets dashboard.
    • Click on "Create a budget."
    • Follow the steps to create a budget. Set the budget amount to your desired limit.
  2. Configure Alerts:

    • Set up alert notifications for when your budget threshold is reached. You can choose to receive alerts via email or SNS (Simple Notification Service).
    • To use SNS, create an SNS topic if you don't have one, and add subscribers to the topic (e.g., your email address).

Now, when you reach the budget (or the forecasted budget), you will get an email, or you can trigger other actions from the alert through SNS. 😉
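If you prefer scripting over clicking through the console, the same budget and SNS alert could be created with boto3. This is only a minimal sketch: the budget name, the threshold, and the helper names `build_budget_request` and `create_budget` are assumptions made for illustration, and the SNS topic is assumed to already exist with a policy that lets AWS Budgets publish to it.

```python
def build_budget_request(account_id: str, limit_usd: float, sns_topic_arn: str,
                         threshold_pct: float = 100.0) -> dict:
    """Build the keyword arguments for the Budgets `create_budget` call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost-budget",  # hypothetical name
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",  # "FORECASTED" alerts on the expected spend
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                # The SNS topic is what will later trigger the killer lambda
                "Subscribers": [{"SubscriptionType": "SNS", "Address": sns_topic_arn}],
            }
        ],
    }


def create_budget(account_id: str, limit_usd: float, sns_topic_arn: str) -> None:
    """Create the budget in AWS (requires credentials and the budgets permissions)."""
    import boto3  # imported here so the builder above can be inspected without AWS access

    budgets = boto3.client("budgets")
    budgets.create_budget(**build_budget_request(account_id, limit_usd, sns_topic_arn))
```

Keeping the request builder separate from the call makes it easy to print and review the payload before anything is created in the account.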

 

Using a killer lambda

[Image: XKCD comic, MacGyver]

To prevent API Gateway and Lambda from being invoked once a specific budget is reached, we will combine several AWS tools: Budgets, SNS, Lambda, DynamoDB, and EventBridge Scheduler.

The idea is this:

  1. The Billing Alert will send a message on SNS (see Setting up Billing Alarms).
  2. The SNS message will trigger a killer lambda.
  3. The killer lambda will store the current parameters of each lambda and EC2 instance in a DynamoDB table so they can be restored later.
  4. The killer lambda will:
    • Limit the API Gateway rate
    • Change lambda parameters to prevent their invocation (be careful to not kill the recovery lambda!)
    • Stop EC2 instances
  5. At the beginning of each billing cycle, the recovery lambda will be triggered (via EventBridge Scheduler).
  6. The recovery lambda will read DynamoDB and restore the lambda parameters.

NOTE: We need to prevent the killer lambda from killing the recovery lambda. To do so, create the recovery lambda first, so that its name can be added to the killer lambda's exemption list (LAMBDAS_TO_NOT_DISABLE).
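Step 2 of the flow relies on the standard shape of an SNS-triggered Lambda event. The handlers in this article ignore the event, but if you want to log which alert fired, a small helper along these lines might do. This is a sketch; `extract_sns_messages` is a hypothetical name and the sample event in the usage note is made up.

```python
def extract_sns_messages(event: dict) -> list:
    """Return (subject, message) pairs from an SNS-triggered Lambda event."""
    pairs = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        # Subject may be missing depending on the publisher; Message is the alert text
        pairs.append((sns.get("Subject"), sns.get("Message", "")))
    return pairs
```

Inside `lambda_handler` this could be used as `for subject, message in extract_sns_messages(event): logger.info(f"Triggered by: {subject}")` before any resource is touched.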

 

General setup

Step 1: Set up AWS Budgets and send alerts to SNS

See Setting up Billing Alarms

Step 2: Create the DynamoDB table

The DynamoDB table needs the following attributes:

  • Table Name: LambdaAndApiSettings
  • Primary Key / Partition Key: ResourceID (String)

You can create the table using the AWS Management Console:

  1. Go to the DynamoDB section.
  2. Click on "Create table".
  3. Set the table name to LambdaAndApiSettings.
  4. Set the partition key to ResourceID with type String (no sort key is needed).
  5. Click "Create".
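The same table could also be created from code. The sketch below uses the table name and the ResourceID partition key listed above; on-demand billing and the helper names `build_table_request` and `create_settings_table` are assumptions chosen for illustration.

```python
def build_table_request(table_name: str = "LambdaAndApiSettings") -> dict:
    """Build the keyword arguments for the DynamoDB `create_table` call."""
    return {
        "TableName": table_name,
        # ResourceID is the partition key; both lambdas read and write items by this key
        "KeySchema": [{"AttributeName": "ResourceID", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "ResourceID", "AttributeType": "S"}],
        # Assumption: on-demand billing, since the table is only touched a few times a month
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_settings_table(table_name: str = "LambdaAndApiSettings") -> None:
    """Create the table in AWS and wait until it is ready (requires credentials)."""
    import boto3  # imported here so the builder above can be inspected without AWS access

    client = boto3.client("dynamodb")
    client.create_table(**build_table_request(table_name))
    client.get_waiter("table_exists").wait(TableName=table_name)
```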

Step 3: (after creating lambdas) Give proper permissions in IAM

Ensure that the IAM role associated with your Lambda function has the necessary permissions. You need to attach a policy to the role that allows access to DynamoDB and Lambda APIs.

Required Permissions:

  • DynamoDB: AmazonDynamoDBFullAccess
  • Lambda: AWSLambda_FullAccess
  • EC2 instances: AmazonEC2FullAccess
  • API Gateway: AmazonAPIGatewayAdministrator
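The managed policies above are the quickest way to get going. If they feel broader than you need, a narrower inline policy could be derived from the API calls the two lambdas make. The sketch below is an assumption based on the calls that appear in this article's code (plus `ec2:StartInstances` for restoring instances); the statement IDs and the `build_policy` helper are hypothetical, and you may need to adjust the resources to your account.

```python
import json


def build_policy(table_name: str = "LambdaAndApiSettings") -> dict:
    """Build an IAM policy document covering the calls made by the killer and recovery lambdas."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SettingsTable",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": f"arn:aws:dynamodb:*:*:table/{table_name}",
            },
            {
                "Sid": "LambdaControl",
                "Effect": "Allow",
                "Action": [
                    "lambda:ListFunctions",
                    "lambda:GetFunctionConcurrency",
                    "lambda:PutFunctionConcurrency",
                    "lambda:DeleteFunctionConcurrency",
                    "lambda:GetPolicy",
                    "lambda:AddPermission",
                ],
                "Resource": "*",
            },
            {
                "Sid": "ApiGatewayStages",
                "Effect": "Allow",
                # get_stages is a GET and update_stage is a PATCH on the REST API resources
                "Action": ["apigateway:GET", "apigateway:PATCH"],
                "Resource": "arn:aws:apigateway:*::/restapis/*",
            },
            {
                "Sid": "Ec2StartStop",
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances", "ec2:StopInstances", "ec2:StartInstances"],
                "Resource": "*",
            },
        ],
    }


# Print the JSON so it can be pasted into the IAM console as an inline policy
print(json.dumps(build_policy(), indent=2))
```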

 

Recovery lambda

[Image: XKCD comic, reset]

To re-enable the services, you can set up a scheduled Lambda function that runs at the beginning of each budget period (e.g., monthly). This function will reset the throttling limits on the API Gateway stages and re-enable the Lambda functions.

Step 1: Create the Lambda Function

  • In the AWS Lambda console, create a new Lambda function.
  • Set the runtime to Python.
  • Use the following Python code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: Alexandre Martins (a.k.a Kodsama)
"""
This Lambda function restores the previous state of API Gateway stages, specified Lambda functions, 
and EC2 instances using settings saved in DynamoDB. It retrieves the saved settings from DynamoDB 
and re-applies them to the respective resources. The settings were initially saved by the disabling 
Lambda to ensure the ability to restore the original state.
"""

import boto3
import os
import logging
import json
from botocore.client import BaseClient
from botocore.exceptions import ClientError

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize clients
api_client = boto3.client("apigateway")
lambda_client = boto3.client("lambda")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("LambdaAndApiSettings")


def lambda_handler(event: dict, context: object):
    """
    The main handler function for the Lambda function. It restores the previous state of API Gateway stages,
    specified Lambda functions using settings saved in DynamoDB.

    Args:
        event (dict): The event data that triggered the Lambda function [UNUSED].
        context (object): The context object containing metadata about the Lambda invocation.

    Returns:
        dict: A dictionary containing the status code and message.
    """
    try:
        # Get environment variables
        rest_api_id = os.environ.get("API_GATEWAY_ID")
        self_name = context.function_name  # Get the name of the current Lambda function
        simulate = os.environ.get("SIMULATE", "false").lower() == "true"
        # Lambda
        restore_all_lambdas = os.environ.get("RESTORE_ALL_LAMBDAS", "false").lower() == "true"
        lambdas_restore_list = os.environ.get("LAMBDAS_TO_RESTORE", "").split(",")
        lambdas_restore_list = [] if lambdas_restore_list == [""] else lambdas_restore_list
        lambdas_blacklist = os.environ.get("BLACKLISTED_LAMBDAS", "").split(",")
        lambdas_blacklist = [] if lambdas_blacklist == [""] else lambdas_blacklist

        # Process API Gateway stages
        if rest_api_id:
            # Argument order must match the signature: (api_client, rest_api_id, simulate)
            restore_api_gateway_stages(api_client, rest_api_id, simulate)

        # Process Lambda functions
        if restore_all_lambdas:
            restore_all_lambdas_function(lambda_client, self_name, lambdas_blacklist, simulate)
        elif lambdas_restore_list:
            restore_specified_lambdas(lambda_client, lambdas_restore_list, self_name, lambdas_blacklist, simulate)
        else:
            logger.warning("No Lambda functions specified for restoration. Skipping restoring Lambda functions.")

        logger.info(
            "All done! API Gateway stages, specified Lambda functions have been restored successfully."
        )
        return {"statusCode": 200, "body": "API Gateway, Lambda functions, and EC2 instances restoration complete"}
    except Exception as e:
        logger.error(f"Error occurred: {str(e)}", exc_info=True)
        return {
            "statusCode": 500,
            "body": "Error occurred while restoring API Gateway, Lambda functions, and EC2 instances",
        }


def restore_api_gateway_stages(api_client: BaseClient, rest_api_id: str, simulate: bool):
    """
    Restore the stages of an API Gateway using settings saved in DynamoDB.

    Args:
        api_client (BaseClient): The API Gateway client.
        rest_api_id (str): The ID of the API Gateway.
        simulate (bool): Whether to simulate restoring the API Gateway stages.

    Raises:
        Exception: If an error occurs while restoring the stages.
    """
    assert isinstance(api_client, BaseClient), f"api_client must be a boto3 client instance, not {type(api_client)}"
    assert (
        isinstance(rest_api_id, str) and rest_api_id
    ), f"rest_api_id must be a non-empty string, not {type(rest_api_id)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    if not rest_api_id:
        logger.info(f"No API Gateway ID provided. Skipping.")
        return

    try:
        logger.info(f"Restoring stages for API Gateway with ID: {rest_api_id}")
        stages = api_client.get_stages(restApiId=rest_api_id)
        for stage in stages["item"]:
            stage_name = stage["stageName"]
            logger.info(f"Restoring stage: {stage_name}")

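            # Retrieve saved API Gateway stage settings from DynamoDB
            saved = get_saved_settings(f"api-{rest_api_id}-{stage_name}")
            defaults = {
                "burstLimit": os.environ.get("DEFAULT_BURST_LIMIT", "1000"),
                "rateLimit": os.environ.get("DEFAULT_RATE_LIMIT", "500"),
            }
            # Fall back to the defaults when nothing was saved, or when the killer
            # lambda stored the placeholder string "default" for an unset limit
            stage_settings = {
                key: defaults[key] if saved.get(key, "default") == "default" else saved[key]
                for key in defaults
            }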

            if simulate:
                logger.info(f"Simulation mode: API Gateway stage {stage_name} would be restored.")
                continue

            # Restore API Gateway stage
            api_client.update_stage(
                restApiId=rest_api_id,
                stageName=stage_name,
                patchOperations=[
                    {
                        "op": "replace",
                        "path": "/*/*/throttling/burstLimit",
                        "value": str(stage_settings["burstLimit"]),
                    },
                    {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": str(stage_settings["rateLimit"])},
                ],
            )
            logger.info(f"Stage {stage_name} restored.")
    except api_client.exceptions.ClientError as e:
        logger.error(f"Failed to restore API Gateway stages: {e}")
        if e.response["Error"]["Code"] == "AccessDeniedException":
            logger.error(
                "Access denied. Ensure the IAM role has the following permissions: AmazonAPIGatewayAdministrator."
            )
        raise


def restore_all_lambdas_function(
    lambda_client: BaseClient,
    self_name: str,
    lambdas_blacklist: list,
    simulate: bool = True,
):
    """
    Restore all Lambda functions using settings saved in DynamoDB.

    Args:
        lambda_client (BaseClient): The Lambda client.
        self_name (str): The name of the current Lambda function.
        lambdas_blacklist (list): A list of blacklisted Lambda function names.
        simulate (bool): Whether to simulate restoring the Lambda functions.

    Raises:
        Exception: If an error occurs while restoring the Lambda functions.
    """
    assert isinstance(
        lambda_client, BaseClient
    ), f"lambda_client must be a boto3 client instance, not {type(lambda_client)}"
    assert isinstance(self_name, str) and self_name, f"self_name must be a non-empty string, not {type(self_name)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    logger.info("Restoring all Lambda functions in the account.")

    paginator = lambda_client.get_paginator("list_functions")
    all_lambdas_restore_list = []
    for page in paginator.paginate():
        for function in page["Functions"]:
            all_lambdas_restore_list.append(function["FunctionName"])

    restore_specified_lambdas(lambda_client, all_lambdas_restore_list, self_name, lambdas_blacklist, simulate)


def restore_specified_lambdas(
    lambda_client: BaseClient,
    lambdas_restore_list: list,
    self_name: str,
    lambdas_blacklist: list,
    simulate: bool = True,
):
    assert isinstance(lambda_client, BaseClient), f"lambda_client must be a boto3 client instance, not {type(lambda_client)}"
    assert isinstance(lambdas_restore_list, list), f"lambdas_restore_list must be a list, not {type(lambdas_restore_list)}"
    assert isinstance(self_name, str) and self_name, f"self_name must be a non-empty string, not {type(self_name)}"
    assert isinstance(lambdas_blacklist, list), f"lambdas_blacklist must be a list, not {type(lambdas_blacklist)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    logger.info(f"Restore Lambdas: {lambdas_restore_list}")

    if not lambdas_restore_list:
        logger.warning("Attempting to restore an empty list of Lambdas")
        return

    for function_name in lambdas_restore_list:
        if function_name == self_name or function_name in lambdas_blacklist:
            logger.info(f"Skipping blacklisted or current Lambda function: {function_name}")
            continue

        try:
            saved_settings = get_saved_settings(function_name)
            if not saved_settings:
                saved_settings = {"Concurrency": os.environ.get("DEFAULT_CONCURRENCY", None), "Permissions": []}

            if simulate:
                logger.info(f"Simulation mode: Lambda function {function_name} would be restored.")
                continue

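            if saved_settings["Concurrency"] is not None:
                # DynamoDB returns Decimal and env vars are strings; the API expects an int
                lambda_client.put_function_concurrency(
                    FunctionName=function_name, ReservedConcurrentExecutions=int(saved_settings["Concurrency"])
                )
                logger.info(f"Restored concurrency for Lambda function: {function_name}")
            else:
                # Nothing was reserved before: remove the zero limit set by the killer lambda
                lambda_client.delete_function_concurrency(FunctionName=function_name)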

            for permission in saved_settings["Permissions"]:
                principal = permission["Principal"]
                if isinstance(principal, dict) and "Service" in principal:
                    principal = principal["Service"]

                try:
                    lambda_client.add_permission(
                        FunctionName=function_name,
                        StatementId=permission["Sid"],
                        Action=permission["Action"],
                        Principal=principal,
                    )
                    logger.info(f"Restored permission {permission['Sid']} for Lambda function: {function_name}")
                except ClientError as e:
                    if e.response['Error']['Code'] == 'ResourceConflictException':
                        logger.warning(f"Permission with StatementId {permission['Sid']} already exists for Lambda function: {function_name}. Skipping...")
                    else:
                        raise e

        except ClientError as e:
            logger.error(f"Failed to restore Lambda function {function_name}: {e}")
            if e.response["Error"]["Code"] == "AccessDeniedException":
                logger.error(
                    f"Access denied. Ensure the IAM role has the following permissions: lambda:PutFunctionConcurrency, lambda:AddPermission."
                )
            raise


def get_saved_settings(resource_id: str) -> dict:
    """
    Retrieve settings from DynamoDB. If no data is found, return an empty dict so the caller can apply defaults.

    Args:
        resource_id (str): The unique identifier for the resource.

    Returns:
        dict: The retrieved settings or empty dict if non existing.
    """
    try:
        response = table.get_item(Key={"ResourceID": resource_id})
        if "Item" in response:
            logger.debug(f"Found settings for {resource_id}")
            return response["Item"]["Settings"]
        else:
            logger.warning(f"No saved settings found for {resource_id}, using default settings.")
            return {}
    except dynamodb.meta.client.exceptions.ClientError as e:
        logger.error(f"Failed to retrieve settings from DynamoDB for {resource_id}: {e}")
        if e.response["Error"]["Code"] == "AccessDeniedException":
            logger.error("Access denied. Ensure the IAM role has the following permissions: AmazonDynamoDBFullAccess.")
        raise
    except Exception as e:
        logger.error(f"Error retrieving saved data for {resource_id}: {e}")
        raise

Environment Variables

  • API_GATEWAY_ID: The ID of the API Gateway whose stages are to be restored.
  • SIMULATE: When set to true, the lambda will simulate actions without making actual changes.
  • RESTORE_ALL_LAMBDAS: When set to true, all Lambda functions will be restored, except those in the blacklist.
  • LAMBDAS_TO_RESTORE: A comma-separated list of specific Lambda function names to restore.
  • BLACKLISTED_LAMBDAS: A comma-separated list of Lambda function names that should not be restored.
  • RESTORE_ALL_EC2_INSTANCES: When set to true, all EC2 instances will be restored, except those in the blacklist.
  • EC2_TO_RESTORE: A comma-separated list of specific EC2 instance IDs to restore.
  • BLACKLISTED_EC2_INSTANCES: A comma-separated list of EC2 instance IDs that should not be restored.
  • DEFAULT_BURST_LIMIT: The default burst limit for API Gateway throttling, used if no data is found in DynamoDB.
  • DEFAULT_RATE_LIMIT: The default rate limit for API Gateway throttling, used if no data is found in DynamoDB.
  • DEFAULT_CONCURRENCY: The default Lambda concurrency value, used if no data is found in DynamoDB.
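Note that the recovery code shown above does not read the three EC2 variables. A minimal sketch of how EC2_TO_RESTORE and BLACKLISTED_EC2_INSTANCES might be consumed is given below. The helper names are hypothetical, and the RESTORE_ALL_EC2_INSTANCES case is left out because it depends on how the killer lambda stored the instance IDs.

```python
import os


def parse_csv_env(name: str) -> list:
    """Split a comma-separated environment variable into a clean list of values."""
    raw = os.environ.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def select_ec2_to_restore(candidates: list, blacklist: list) -> list:
    """Keep the candidate instance IDs that are not blacklisted, preserving order."""
    return [instance_id for instance_id in candidates if instance_id not in blacklist]


def restore_ec2_instances(ec2_client, instance_ids: list, simulate: bool = True) -> list:
    """Start the given instances and return the IDs that were actually started."""
    if simulate or not instance_ids:
        # Simulation mode (or nothing to do): make no API call
        return []
    ec2_client.start_instances(InstanceIds=instance_ids)
    return instance_ids
```

In the recovery handler this could be wired up as `restore_ec2_instances(boto3.client("ec2"), select_ec2_to_restore(parse_csv_env("EC2_TO_RESTORE"), parse_csv_env("BLACKLISTED_EC2_INSTANCES")), simulate)`.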

Step 2: Give the Lambda function full access in IAM

See General setup

Step 3: Create an EventBridge Scheduler schedule

  1. Go to the EventBridge Scheduler.
  2. Navigate to "EventBridge Schedule".
  3. Click "Create Schedule".
  4. Choose a schedule name and description.
  5. Select a recurring schedule with a cron-based expression. If your budget period is monthly, you can use a cron expression like cron(0 0 1 * ? *) to run the function at midnight on the first day of each month.
  6. Select no flexible time window and click Next.
  7. Click on Invoke an AWS Lambda and select the recovery lambda as the target.
  8. Ensure the schedule has the necessary permissions to invoke the Lambda function. Click on Next.
  9. Review the schedule and create it!
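The schedule can also be created with the boto3 `scheduler` client. This is a sketch under assumptions: the schedule name, the helper names, and the two ARNs are placeholders, and the role is assumed to be one that EventBridge Scheduler can assume and that is allowed to invoke the recovery lambda.

```python
def build_schedule_request(lambda_arn: str, role_arn: str, name: str = "monthly-recovery") -> dict:
    """Build the keyword arguments for the EventBridge Scheduler `create_schedule` call."""
    return {
        "Name": name,  # hypothetical schedule name
        # Midnight on the first day of every month (UTC unless a timezone is set)
        "ScheduleExpression": "cron(0 0 1 * ? *)",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": lambda_arn,    # ARN of the recovery lambda
            "RoleArn": role_arn,  # role the scheduler assumes to invoke the lambda
            "Input": "{}",        # the recovery handler ignores its event
        },
    }


def create_recovery_schedule(lambda_arn: str, role_arn: str) -> None:
    """Create the schedule in AWS (requires credentials and scheduler permissions)."""
    import boto3  # imported here so the builder above can be inspected without AWS access

    boto3.client("scheduler").create_schedule(**build_schedule_request(lambda_arn, role_arn))
```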

 

Killer lambda

[Image: XKCD comic, boom]

Step 1: Create a Lambda Function

  • In the AWS Lambda console, create a new Lambda function.
  • Set the runtime to Python.
  • Use the following Python code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: Alexandre Martins (a.k.a Kodsama)
"""
This Lambda disables API Gateway stages, specified Lambda functions, and stops EC2 instances, saving their settings to DynamoDB. 
It processes environment variables to determine which resources to disable and whether to simulate the action. 
The settings of the resources are saved to DynamoDB before any changes are made, ensuring that current states are preserved.
"""

import boto3
import os
import logging
import json
from botocore.client import BaseClient

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize clients
api_client = boto3.client("apigateway")
lambda_client = boto3.client("lambda")
ec2_client = boto3.client("ec2")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("LambdaAndApiSettings")


def lambda_handler(event: dict, context: object):
    """
    The main handler function for the Lambda function. It disables API Gateway stages, specified
    Lambda functions, and stops EC2 instances, saving their settings to DynamoDB.

    Args:
        event (dict): The event data that triggered the Lambda function [UNUSED].
        context (object): The context object containing metadata about the Lambda invocation.

    Returns:
        dict: A dictionary containing the status code and message.
    """
    try:
        # Get environment variables
        rest_api_id = os.environ.get("API_GATEWAY_ID")
        self_name = context.function_name  # Get the name of the current Lambda function
        simulate = os.environ.get("SIMULATE", "false").lower() == "true"
        # Lambda
        lambdas_disable_all = os.environ.get("DISABLE_ALL_LAMBDAS", "false").lower() == "true"
        lambdas_disable_list = os.environ.get("LAMBDAS_TO_DISABLE", "").split(",")
        lambdas_disable_blacklist = os.environ.get("LAMBDAS_TO_NOT_DISABLE", "").split(",")
        lambdas_disable_list = [] if lambdas_disable_list == [""] else lambdas_disable_list
        lambdas_disable_blacklist = [] if lambdas_disable_blacklist == [""] else lambdas_disable_blacklist
        # EC2
        ec2_disable_all = os.environ.get("DISABLE_ALL_EC2_INSTANCES", "false").lower() == "true"
        ec2_disable_list = os.environ.get("EC2_TO_DISABLE", "").split(",")
        ec2_disable_blacklist = os.environ.get("EC2_TO_NOT_DISABLE", "").split(",")
        ec2_disable_list = [] if ec2_disable_list == [""] else ec2_disable_list
        ec2_disable_blacklist = [] if ec2_disable_blacklist == [""] else ec2_disable_blacklist

        # Process API Gateway stages
        if rest_api_id:
            # Argument order must match the signature: (api_client, rest_api_id, simulate)
            disable_api_gateway_stages(api_client, rest_api_id, simulate)

        # Process Lambda functions
        if lambdas_disable_all:
            disable_all_lambdas(lambda_client, self_name, lambdas_disable_blacklist, simulate)
        elif lambdas_disable_list:
            disable_specified_lambdas(
                lambda_client, lambdas_disable_list, self_name, lambdas_disable_blacklist, simulate
            )
        else:
            logger.warning(
                "No Lambda functions specified and DISABLE_ALL_LAMBDAS is not true. Skipping disabling Lambda functions."
            )

        # Process EC2 instances
        if ec2_disable_all:
            disable_all_ec2(ec2_client, simulate)
        elif ec2_disable_list:
            disable_specified_ec2(ec2_client, ec2_disable_list, ec2_disable_blacklist, simulate)
        else:
            logger.warning(
                "No EC2 instances specified and DISABLE_ALL_EC2_INSTANCES is not true. Skipping disabling EC2 instances."
            )

        logger.info(
            "All done! API Gateway stages, specified Lambda functions, and EC2 instances have been processed successfully."
        )
        return {"statusCode": 200, "body": "API Gateway, Lambda functions, and EC2 instances processing complete"}
    except Exception as e:
        logger.error(f"Error occurred: {str(e)}", exc_info=True)
        return {
            "statusCode": 500,
            "body": "Error occurred while processing API Gateway, Lambda functions, and EC2 instances",
        }


def disable_api_gateway_stages(api_client: BaseClient, rest_api_id: str, simulate: bool):
    """
    Disable the stages of an API Gateway and save their current settings to DynamoDB.

    Args:
        api_client (BaseClient): The API Gateway client.
        rest_api_id (str): The ID of the API Gateway.
        simulate (bool): Whether to simulate disabling the API Gateway stages.

    Raises:
        Exception: If an error occurs while disabling the stages.
    """
    assert isinstance(api_client, BaseClient), f"api_client must be a boto3 client instance, not {type(api_client)}"
    assert (
        isinstance(rest_api_id, str) and rest_api_id
    ), f"rest_api_id must be a non-empty string, not {type(rest_api_id)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    if not rest_api_id:
        logger.info(f"No API Gateway ID provided. Skipping.")
        return

    try:
        logger.info(f"Disabling stages for API Gateway with ID: {rest_api_id}")
        stages = api_client.get_stages(restApiId=rest_api_id)
        for stage in stages["item"]:
            stage_name = stage["stageName"]
            logger.info(f"Disabling stage: {stage_name}")

            # Save current API Gateway stage settings to DynamoDB
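            # Stage-wide throttling lives under the "*/*" key of methodSettings
            method_settings = stage.get("methodSettings", {}).get("*/*", {})
            # Stored as strings so that float rate limits can be written to DynamoDB
            stage_settings = {
                "burstLimit": str(method_settings.get("throttlingBurstLimit", "default")),
                "rateLimit": str(method_settings.get("throttlingRateLimit", "default")),
            }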
            save_to_dynamodb(f"api-{rest_api_id}-{stage_name}", stage_settings)

            if simulate:
                logger.info(f"Simulation mode: API Gateway stage {stage_name} would be disabled.")
                continue

            # Disable API Gateway stage
            api_client.update_stage(
                restApiId=rest_api_id,
                stageName=stage_name,
                patchOperations=[
                    {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": "0"},
                    {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": "0"},
                ],
            )
            logger.info(f"Stage {stage_name} disabled.")
    except api_client.exceptions.ClientError as e:
        logger.error(f"Failed to disable API Gateway stages: {e}")
        if e.response["Error"]["Code"] == "AccessDeniedException":
            logger.error(
                "Access denied. Ensure the IAM role has the following permissions: AmazonAPIGatewayAdministrator."
            )
        raise


def disable_all_lambdas(
    lambda_client: BaseClient, self_name: str, lambdas_disable_blacklist: list, simulate: bool = True
):
    """
    Disable all Lambda functions except the current one and blacklisted ones, and save their settings to DynamoDB.

    Args:
        lambda_client (BaseClient): The Lambda client.
        self_name (str): The name of the current Lambda function.
        lambdas_disable_blacklist (list): A list of blacklisted Lambda function names.
        simulate (bool): Whether to simulate disabling the Lambda functions.

    Raises:
        Exception: If an error occurs while disabling the Lambda functions.
    """
    assert isinstance(
        lambda_client, BaseClient
    ), f"lambda_client must be a boto3 client instance, not {type(lambda_client)}"
    assert isinstance(self_name, str) and self_name, f"self_name must be a non-empty string, not {type(self_name)}"
    assert isinstance(
        lambdas_disable_blacklist, list
    ), f"lambdas_disable_blacklist must be a list, not {type(lambdas_disable_blacklist)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    logger.info("Disabling all Lambda functions in the account.")

    paginator = lambda_client.get_paginator("list_functions")
    all_lambdas_disable_list = []
    for page in paginator.paginate():
        for function in page["Functions"]:
            all_lambdas_disable_list.append(function["FunctionName"])

    disable_specified_lambdas(lambda_client, all_lambdas_disable_list, self_name, lambdas_disable_blacklist, simulate)


def disable_specified_lambdas(
    lambda_client: BaseClient,
    lambdas_disable_list: list,
    self_name: str,
    lambdas_disable_blacklist: list,
    simulate: bool = True,
):
    """
    Disable specified Lambda functions except the current one and blacklisted ones, and save their settings to DynamoDB.

    Args:
        lambda_client (BaseClient): The Lambda client.
        lambdas_disable_list (list): A list of Lambda function names.
        self_name (str): The name of the current Lambda function.
        lambdas_disable_blacklist (list): A list of blacklisted Lambda function names.
        simulate (bool): Whether to simulate disabling the Lambda functions.

    Raises:
        Exception: If an error occurs while disabling the Lambda functions.
    """
    assert isinstance(
        lambda_client, BaseClient
    ), f"lambda_client must be a boto3 client instance, not {type(lambda_client)}"
    assert isinstance(
        lambdas_disable_list, list
    ), f"lambdas_disable_list must be a list, not {type(lambdas_disable_list)}"
    assert isinstance(self_name, str) and self_name, f"self_name must be a non-empty string, not {type(self_name)}"
    assert isinstance(
        lambdas_disable_blacklist, list
    ), f"lambdas_disable_blacklist must be a list, not {type(lambdas_disable_blacklist)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    logger.info(f"Disable Lambdas: {lambdas_disable_list}")

    if not lambdas_disable_list:
        logger.warning("Attempting to disable an empty list of Lambdas")
        return

    for function_name in lambdas_disable_list:
        if function_name == self_name or function_name in lambdas_disable_blacklist:
            logger.info(f"Skipping blacklisted or current Lambda function: {function_name}")
            continue

        try:
            # Save current Lambda settings to DynamoDB
            save_lambda_settings(lambda_client, function_name)

            if simulate:
                logger.info(f"Simulation mode: Lambda function {function_name} would be disabled.")
                continue

            # Disable concurrency
            logger.info(f"Disabling Lambda function: {function_name}")
            lambda_client.put_function_concurrency(FunctionName=function_name, ReservedConcurrentExecutions=0)
            logger.info(f"Lambda function {function_name} disabled.")

        except lambda_client.exceptions.ClientError as e:
            logger.error(f"Failed to disable Lambda function {function_name}: {e}")
            if e.response["Error"]["Code"] == "AccessDeniedException":
                logger.error(
                    f"Access denied. Ensure the IAM role has the following permissions: lambda:PutFunctionConcurrency."
                )
            raise


def disable_all_ec2(ec2_client: BaseClient, simulate: bool = True):
    """
    Stop all EC2 instances and save their settings to DynamoDB.

    Args:
        ec2_client (BaseClient): The EC2 client.
        simulate (bool): Whether to simulate stopping the EC2 instances.

    Raises:
        Exception: If an error occurs while stopping the EC2 instances.
    """
    assert isinstance(ec2_client, BaseClient), f"ec2_client must be a boto3 client instance, not {type(ec2_client)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

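    try:
        logger.info("Stopping all EC2 instances in the account.")
        # Paginate so that accounts with many instances are fully covered,
        # and only consider instances that are currently running
        paginator = ec2_client.get_paginator("describe_instances")
        pages = paginator.paginate(Filters=[{"Name": "instance-state-name", "Values": ["running"]}])
        instances = [
            instance["InstanceId"]
            for page in pages
            for reservation in page["Reservations"]
            for instance in reservation["Instances"]
        ]

        disable_specified_ec2(ec2_client, instances, [], simulate)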

    except ec2_client.exceptions.ClientError as e:
        logger.error(f"Failed to stop EC2 instances: {e}")
        if e.response["Error"]["Code"] == "AccessDeniedException":
            logger.error("Access denied. Ensure the IAM role has the following permissions: AmazonEC2FullAccess.")
        raise


def disable_specified_ec2(
    ec2_client: BaseClient, ec2_disable_list: list, ec2_disable_blacklist: list, simulate: bool = True
):
    """
    Stop specified EC2 instances except blacklisted ones, and save their settings to DynamoDB.

    Args:
        ec2_client (BaseClient): The EC2 client.
        ec2_disable_list (list): A list of EC2 instance IDs to stop.
        ec2_disable_blacklist (list): A list of blacklisted EC2 instance IDs.
        simulate (bool): Whether to simulate stopping the EC2 instances.

    Raises:
        Exception: If an error occurs while stopping the EC2 instances.
    """
    assert isinstance(ec2_client, BaseClient), f"ec2_client must be a boto3 client instance, not {type(ec2_client)}"
    assert isinstance(ec2_disable_list, list), f"ec2_disable_list must be a list, not {type(ec2_disable_list)}"
    assert isinstance(
        ec2_disable_blacklist, list
    ), f"ec2_disable_blacklist must be a list, not {type(ec2_disable_blacklist)}"
    assert isinstance(simulate, bool), f"simulate must be a boolean, not {type(simulate)}"

    logger.info(f"Disable EC2 instances: {ec2_disable_list} (exempt: {ec2_disable_blacklist})")

    for instance_id in ec2_disable_list:
        if instance_id in ec2_disable_blacklist:
            logger.info(f"Skipping blacklisted EC2 instance: {instance_id}")
            continue

        try:
            # Save current EC2 settings to DynamoDB
            save_ec2_settings(ec2_client, instance_id)

            if simulate:
                logger.info(f"Simulation mode: EC2 instance {instance_id} would be stopped.")
                continue

            # Stop the EC2 instance
            logger.info(f"Stopping EC2 instance: {instance_id}")
            ec2_client.stop_instances(InstanceIds=[instance_id])
            logger.info(f"Stop requested for EC2 instance {instance_id}.")

        except ec2_client.exceptions.ClientError as e:
            logger.error(f"Failed to stop EC2 instance {instance_id}: {e}")
            if e.response["Error"]["Code"] in ("UnauthorizedOperation", "AccessDeniedException"):
                logger.error("Access denied. Ensure the IAM role has the following permissions: AmazonEC2FullAccess.")
            raise


def save_lambda_settings(lambda_client: BaseClient, function_name: str):
    """
    Save the current settings of a Lambda function to DynamoDB.

    Args:
        lambda_client (BaseClient): The Lambda client.
        function_name (str): The name of the Lambda function.

    Raises:
        Exception: If an error occurs while retrieving or saving the settings.
    """
    logger.info(f"Saving settings for Lambda '{function_name}'")
    assert isinstance(
        lambda_client, BaseClient
    ), f"lambda_client must be a boto3 client instance, not {type(lambda_client)}"
    assert (
        isinstance(function_name, str) and function_name
    ), f"function_name must be a non-empty string, not {type(function_name)}"

    try:
        # Get current concurrency setting
        concurrency_response = lambda_client.get_function_concurrency(FunctionName=function_name)
        concurrency = concurrency_response.get("ReservedConcurrentExecutions", None)

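        # Get current permissions; a function without a resource policy raises
        # ResourceNotFoundException, which should not prevent saving the concurrency
        try:
            permissions_response = lambda_client.get_policy(FunctionName=function_name)
            permissions = json.loads(permissions_response["Policy"])["Statement"]
        except lambda_client.exceptions.ResourceNotFoundException:
            permissions = []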

        # Save settings to DynamoDB
        save_to_dynamodb(function_name, {"Concurrency": concurrency, "Permissions": permissions})
    except lambda_client.exceptions.ResourceNotFoundException:
        logger.info(f"No existing settings for Lambda function: {function_name}")
    except lambda_client.exceptions.ClientError as e:
        logger.error(f"Failed to retrieve settings for Lambda function {function_name}: {e}")
        if e.response["Error"]["Code"] == "AccessDeniedException":
            logger.error("Access denied. Ensure the IAM role has the following permission: AWSLambda_FullAccess.")
        raise


def save_ec2_settings(ec2_client: BaseClient, instance_id: str):
    """
    Save the current settings of an EC2 instance to DynamoDB.

    Args:
        ec2_client (BaseClient): The EC2 client.
        instance_id (str): The ID of the EC2 instance.

    Raises:
        Exception: If an error occurs while retrieving or saving the settings.
    """
    logger.info(f"Saving settings for EC2 instance {instance_id}")
    assert isinstance(ec2_client, BaseClient), f"ec2_client must be a boto3 client instance, not {type(ec2_client)}"
    assert (
        isinstance(instance_id, str) and instance_id
    ), f"instance_id must be a non-empty string, not {type(instance_id)}"

    try:
        # Get current instance details
        response = ec2_client.describe_instances(InstanceIds=[instance_id])
        instance = response["Reservations"][0]["Instances"][0]
        instance_settings = {
            "InstanceType": instance.get("InstanceType"),
            "KeyName": instance.get("KeyName"),
            "State": instance.get("State", {}).get("Name"),
            "PreviousState": instance.get("State", {}).get("Name"),
        }

        # Save settings to DynamoDB
        save_to_dynamodb(instance_id, instance_settings)
    except ec2_client.exceptions.ClientError as e:
        logger.error(f"Failed to retrieve settings for EC2 instance {instance_id}: {e}")
        if e.response["Error"]["Code"] in ("UnauthorizedOperation", "AccessDeniedException"):
            logger.error("Access denied. Ensure the IAM role has the following permissions: AmazonEC2FullAccess.")
        raise


def save_to_dynamodb(resource_id: str, settings: dict):
    """
    Save settings to DynamoDB and verify the saved data.

    Args:
        resource_id (str): The unique identifier for the resource.
        settings (dict): The settings to save.

    Raises:
        Exception: If an error occurs while saving to DynamoDB or the saved data doesn't match.
    """
    logger.info(f"Saving settings for {resource_id} to DynamoDB")
    assert (
        isinstance(resource_id, str) and resource_id
    ), f"resource_id must be a non-empty string, not {type(resource_id)}"
    assert isinstance(settings, dict), f"settings must be a dictionary, not {type(settings)}"

    try:
        table.put_item(Item={"ResourceID": resource_id, "Settings": settings})
        logger.info(f"Settings for {resource_id} saved")

        # Verify the saved data (strongly consistent read, so the item just written is visible)
        response = table.get_item(Key={"ResourceID": resource_id}, ConsistentRead=True)
        if "Item" not in response or response["Item"]["Settings"] != settings:
            raise Exception(f"Verification failed: Data for {resource_id} does not match.")

        logger.info(f'Verified settings for {resource_id}: {response["Item"]}')
    except dynamodb.meta.client.exceptions.ClientError as e:
        logger.error(f"Failed to save settings to DynamoDB for {resource_id}: {e}")
        if e.response["Error"]["Code"] == "AccessDeniedException":
            logger.error("Access denied. Ensure the IAM role has the following permissions: AmazonDynamoDBFullAccess.")
        raise
    except Exception as e:
        logger.error(f"Error verifying saved data for {resource_id}: {e}")
        raise


Environment Variables

  • API_GATEWAY_ID: The ID of the API Gateway whose stages are to be disabled.
  • SIMULATE: When set to true, the lambda will simulate actions without making actual changes.
  • DISABLE_ALL_LAMBDAS: When set to true, all Lambda functions will be disabled, except those in the blacklist.
  • LAMBDAS_TO_DISABLE: A comma-separated list of specific Lambda function names to disable.
  • LAMBDAS_TO_NOT_DISABLE: A comma-separated list of Lambda function names that should not be disabled.
  • DISABLE_ALL_EC2_INSTANCES: When set to true, all EC2 instances will be disabled, except those in the blacklist.
  • EC2_TO_DISABLE: A comma-separated list of specific EC2 instance IDs to disable.
  • EC2_TO_NOT_DISABLE: A comma-separated list of EC2 instance IDs that should not be disabled.
  • BLACKLISTED_LAMBDAS: A comma-separated list of Lambda function names that should not be disabled.
  • BLACKLISTED_EC2_INSTANCES: A comma-separated list of EC2 instance IDs that should not be disabled.
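
As a rough illustration of how such variables could be read inside the lambda, here is a minimal sketch. The helper names `env_flag` and `env_list` are hypothetical, and the set of strings accepted as "true" is an assumption, not something the code above prescribes.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable such as SIMULATE."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as "true"; anything else is false.
    return raw.strip().lower() in ("1", "true", "yes")


def env_list(name: str) -> list:
    """Read a comma-separated environment variable into a list of trimmed, non-empty items."""
    raw = os.environ.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


# Example values, set here only so the sketch runs outside Lambda.
os.environ["SIMULATE"] = "true"
os.environ["LAMBDAS_TO_DISABLE"] = "orders-api, image-resizer,"

simulate = env_flag("SIMULATE", default=True)
lambdas_to_disable = env_list("LAMBDAS_TO_DISABLE")
print(simulate, lambdas_to_disable)
```

Defaulting SIMULATE to true when it is unset mirrors the `simulate: bool = True` default in the functions above, so a misconfigured lambda fails safe instead of stopping things.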

Step 2: Give the Lambda function full access in IAM

See General setup

Step 3: Blacklist the restore Lambda

Don't forget to add the restore lambda's name to BLACKLISTED_LAMBDA_FUNCTIONS; otherwise it will be killed as well.
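
One way to guard against forgetting this is to fail fast before anything gets disabled. The sketch below is only an illustration: the function names and the `restore-lambda` name are hypothetical and would need to match your own setup.

```python
def filter_blacklisted(function_names: list, blacklist: list) -> list:
    """Return the functions that may be disabled, i.e. those not in the blacklist."""
    exempt = set(blacklist)
    return [name for name in function_names if name not in exempt]


def assert_restore_lambda_protected(restore_lambda_name: str, blacklist: list) -> None:
    """Raise if the restore lambda could be disabled along with everything else."""
    if restore_lambda_name not in blacklist:
        raise ValueError(f"'{restore_lambda_name}' is missing from the blacklist and would be killed.")


# Hypothetical names, for illustration only.
blacklist = ["restore-lambda"]
assert_restore_lambda_protected("restore-lambda", blacklist)
targets = filter_blacklisted(["orders-api", "restore-lambda", "image-resizer"], blacklist)
print(targets)
```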

Step 4: Set Up SNS to Trigger the Lambda Function

  1. In the SNS console, find the SNS topic you created in the Billing Alarm stage.

  2. Subscribe the Lambda Function to the SNS Topic:

    • Add a subscription to the SNS topic.
    • Choose the protocol as "AWS Lambda" and select your Lambda function.
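
Once subscribed, the lambda receives the SNS notification as its event, with each notification under `Records[].Sns`. The sketch below shows one way the handler might pull out the subject and message for logging. The helper name and the sample subject and message text are made up for illustration; a real budget alert will carry different content and more fields.

```python
import json


def extract_sns_messages(event: dict) -> list:
    """Return the subject and message of each SNS record in a Lambda event."""
    messages = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        messages.append({"subject": sns.get("Subject"), "message": sns.get("Message", "")})
    return messages


# Trimmed sample event; real events carry more fields (TopicArn, Timestamp, ...).
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {"Subject": "Budget alert (sample)", "Message": "Budget limit reached."},
        }
    ]
}

for item in extract_sns_messages(sample_event):
    print(json.dumps(item))
```

Logging the message before killing anything makes it easier to trace later which alert triggered the shutdown.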

 

Closing Thoughts

Here you go! By following these steps, you can make sure your AWS API Gateway and Lambda functions are disabled once you reach a specific budget threshold, which stops further spending and limits any cost overrun. Your services are only disabled when the budget is reached and are automatically restored at the beginning of the next period, so costs stay in check by killing your product when you have spent too much on it. 💰💤
