Piotr Pabis for AWS Community Builders

Posted on Nov 10, 2024 • Originally published at pabis.eu

CloudFormation template generator with LLMs/GenAI

#bedrock #aws #infrastructureascode

Today, I will be guiding you through implementation of a script that creates a CloudFormation templated based on given instructions to a large language model. With this, I will also demonstrate how to connect ell library to AWS Bedrock. Moreover, we will provide the agent with tools to fetch most recent CloudFormation schema downloaded from AWS, so that all the capabilities and appropriate syntax are available when generating the template.

A completed project with more features such as changing the model and the API, adding example CloudFormation templates to prompt to keep styling conventions is available in my GitHub repository: https://github.com/ppabis/cf-generator-ai-agent

Environment setup

I assume that you have already Python installed along with venv so that you can create a new virtual environment. We will need the following packages. Put them into your requirements.txt file:

requests
pyyaml
fuzzywuzzy
python-Levenshtein # Optional
ell-ai
boto3

Install them inside activated virtual environment:

source venv/bin/activate
pip install -r requirements.txt

Creating a library of schemas

We will start by downloading and transforming CloudFormation schemas into local filesystem. That way we will be able to allow the AI agent to access them when needed and append it to the prompt sent to the LLM. The schemas are available at this address: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resource-type-schemas.html. But we will download the zip directly using Python. In a new file called update.py created the following function:

import requests, io, zipfile

def download_zip(region: str = "us-east-1"):
    """
    Downloads the zip from AWS with CloudFormation schemas for a given region.
    Returns a zip file that you can browse.
    """
    URL = f"https://schema.cloudformation.{region}.amazonaws.com/CloudformationSchema.zip"
    response = requests.get(URL)
    if response.status_code != 200:
        raise Exception(f"Failed to download zip from {URL}")

    content = response.content
    zip_buffer = io.BytesIO(content)
    return zipfile.ZipFile(zip_buffer)

It won't save the file into the filesystem, rather keep it in memory as we need to extract it either way. All the files inside this Zip archive are JSON files. We will clean them up, and save them to a directory called schemas as YAMLs. The first cleaning step is to remove unnecessary keys: handlers, propertyTransforms and sourceUrl. Next we will move the most important values to the top, namely typeName, description and some others (see the code below). Let's define a new function that will do the cleaning for us.

def cleanup_schema(schema: dict) -> dict:
    """
    Sort the schema by the specified keys and leave other keys in whatever order at the end.
    Also get rid of unnecessary keys.
    """
    # Remove some values
    for key in ['handlers', 'propertyTransforms', 'sourceUrl']:
        schema.pop(key, None)

    sorted_content = {}

    # Keep these values in this order
    for key in [
        'typeName', 
        'description', 
        'properties', 
        'definitions', 
        'createOnlyProperties', 
        'additionalProperties', 
        'readOnlyProperties', 
        'writeOnlyProperties'
    ]:
        value = schema.pop(key, None)
        if value:
            sorted_content[key] = value

    # additionalProperties might not be present so get rid of them
    if not sorted_content.get('additionalProperties', False):
        sorted_content.pop('additionalProperties', None)

    # Add everything else that was not covered above
    sorted_content.update(schema)

    return sorted_content

So now as we have this function defined we can use it to read all the schemas and clean them up. Create schemas directory in the same directory as you run the script and use the following function to clean all the schemas from the provided Zip file.

import yaml, json

def clean_all_schemas(zip):
    for name in zip.namelist():
        if name.endswith(".json"):
            with zip.open(name) as file:
                schema = json.load(file)
                cleaned = cleanup_schema(schema)
                with open(f"schemas/{name}.yaml", "w") as f:
                    yaml.dump(cleaned, f, default_flow_style=False, sort_keys=False, width=1000)

However, we can improve the schema even further. Some values properties of a resource in CloudFormation are complex. For example, in S3 the bucket name is simply a string but CORS configuration is a map with some properties.

  BucketName:
    description: "A name for the bucket. If you don't specify a name, AWS CloudFormation generates a unique ID and uses that ID for the bucket name. The bucket name must contain only lowercase letters, numbers, periods (.), and dashes (-) and must follow [Amazon S3 bucket restrictions and limitations](https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html). For more information, see [Rules for naming Amazon S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html#bucketnamingrules) in the *Amazon S3 User Guide*. \n  If you specify a name, you can't perform updates that require replacement of this resource. You can perform updates that require no or some interruption. If you need to replace the resource, specify a new name."
    type: string
  CorsConfiguration:
    $ref: '#/definitions/CorsConfiguration'
    description: Describes the cross-origin access configuration for objects in an Amazon S3 bucket. For more information, see [Enabling Cross-Origin Resource Sharing](https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html) in the *Amazon S3 User Guide*.

All the definitions are available in the tree under the referenced address. We can implement a function that will replace all references with actual objects so that the LLM has it all in one place and keep attention. Find the function for this transformation at the end of the post or check out the repository. It is quite complex and can divert us from the main topic. To get all the schemas, create a schemas directory and connect the two functions from above. You can use this script separately from the AI agent one.

...
zip = download_zip()
clean_all_schemas(zip)

A tool for resolving schemas

First, we will focus on creating a tool that will let our agent ensure that its knowledge about CloudFormation is up-to-date and prevent hallucinations of invalid resource parameters. I will again use ell library to interface with the APIs easily.

Let's define a class that will load the schemas from the filesystem and create a dictionary of typeName to file name. That way the agent can search by the actual CloudFormation resource rather than some other name. In the new file schema.py I will define the class SchemaIndex.

import os, yaml

class SchemaIndex:
    """
    A class to manage and retrieve schema definitions from YAML files.

    Attributes:
        type_name_dict (dict): A dictionary mapping lowercase type names to their corresponding YAML file names.
    """

    def __init__(self, directory='schemas'):
        self.type_name_dict = {}
        self.load_yaml_files(directory)
        self.directory = directory


    def load_yaml_files(self, directory):
        """Loads all the YAML schemas from directory and maps them to a lowercase representation of their typeName."""
        if os.path.exists(directory):
            # Filter only yaml files
            yaml_files = [file for file in os.listdir(directory) if file.endswith('.yaml') or file.endswith('.yml')]
            for yaml_file in yaml_files:
                with open(os.path.join(directory, yaml_file), 'r') as file:
                    try:
                        yaml_content = yaml.safe_load(file)
                        # Check if this YAML contains a typeName so if it is a CloudFormation schema
                        type_name = yaml_content.get('typeName')
                        if type_name:
                            self.type_name_dict[type_name.lower()] = yaml_file
                            # Also add the type name without the AWS:: prefix
                            self.type_name_dict[type_name.lower().replace("aws::", "")] = yaml_file
                        else:
                            print(f"No typeName found in {yaml_file}")
                    except yaml.YAMLError as e:
                        print(f"Error parsing {yaml_file}: {e}")

This class will load all the YAML files from schemas directory. The file for S3 bucket will be stored under aws::s3::bucket as well as s3::bucket. Now we can do two things: give the LLM list of all the keys first and let it choose or implement a fuzzy search. I chose the latter because it saves on tokens and API calls. We will use fuzzywuzzy library for this purpose. In SchemaIndex class I will create another function that will try to find the best match for the thing requested by the AI agent.

from fuzzywuzzy import process

class SchemaIndex:
    ...
    def _closest_key(self, type_name) -> str | None:
        """ Find the closest matching key in the type_name_dict. """
        # Exact match - return immediately
        if type_name.lower() in self.type_name_dict:
            return type_name.lower()
        # Find a match with score > 70 or just return None
        closest_match, score = process.extractOne(type_name.lower(), self.type_name_dict.keys())
        return closest_match if score > 70 else None


    def get(self, type_name) -> str:
        """ Look up a type name in the type_name_dict and load the corresponding YAML file. """
        closest_key = self._closest_key(type_name)

        if not closest_key:
            return f"No schema file found for {type_name}"

        yaml_file = self.type_name_dict[closest_key]
        try:
            with open(os.path.join(self.directory, yaml_file), 'r') as file:
                # Return the entire file as it is
                return file.read()
        except IOError as e:
            return f"Error opening {yaml_file}: {e}. Failed to get definition of {type_name}"

In the example above, the agent can use the get method in order to retrieve a schema from the filesystem. The key does not need to match exactly. For example in order to retrieve AWS::S3::Bucket schema, the agent can use s3 bucket or even bucket. Now we need to define an ell tool that will be formatted appropriately for the API. Define the tool function outside the SchemaIndex class. It will create a singleton of SchemaIndex on the first use. Remember that in this case, the comments and descriptions of the fields are very important as it is not only for the fellow programmer to understand but by the LLM as well.

import ell
from pydantic import Field

schemaIndex = None

@ell.tool()
def get_cloudformation_schema(
    type_name: str = Field(description="The type name to get the schema for, e.g. AWS::S3::Bucket but can also be fuzzy such as 's3 bucket'")
) -> str:
    """
    Get the CloudFormation schema for a given type name.
    The resulting schema is a YAML document.
    It contains all the properties and descriptions for a given CloudFormation resource type.
    """
    global schemaIndex
    schemaIndex = schemaIndex or SchemaIndex() # Initialize the singleton if it is not yet

    return schemaIndex.get(type_name)

You can test the schema index by using some example names you come up with and put them in __name__ == '__main__' block. Next you can run it using python schema.py. After testing, remove this block.

# Just for testing!
if __name__ == '__main__':
    schemaIndex = SchemaIndex()
    for test in ["AWS::S3::Bucket", "s3 bucket", "inexistent"]:
        schemaLines = schemaIndex.get(test).split("\n")
        print( "\n".join(schemaLines[:3]) ) # Print top 3 lines

Generator AI Agent

Now we can create a new AI agent that will generate new CloudFormation templates based on our instructions. What is more, this agent will be able to use the tool we created above to fetch schemas about resources. In the previous post we were using ell library with OpenAI API. This is the default behavior so there wouldn't be much to implement. However, today I will show you how to connect it to Amazon Bedrock - the process is almost as straightforward.

Let's start with a system prompt. What is a good prompt for such agent? Well, we need to tell the model what we are trying to do. What is more we should provide the output we want the model to produce. Here is an example of such prompt:

SYSTEM_PROMPT = """You are an AI assistant that generates CloudFormation templates based on given instructions.
                Moreover, you are provided with a tool that you can call in order to ensure what can be done
                with each resource in CloudFormation. Instead of hallucinating, you can verify that with the tool.
                Everything you generate should be a valid YAML document. Use comments to communicate with the user
                instead of leaving plain text around YAML code. As the last message where you verified everything
                with the tools RESPOND ONLY WITH THE YAML CODE WITHOUT MARKDOWN OR ANY OTHER TEXT. JUST THE YAML
                CODE AND COMMENTS INSIDE."""

I will define a class that will represent our agent. It will have a method for handling the loop needed for tool calls just as previously and an ell decorated method used as the user prompt. However, there's another problem. The decorator from ell is defined before the class is initialized. In order to dynamically change the value in the decorator, we can create a nested function and call it from the class method. In the new file agent.py:

import ell
from typing import List
from ell.types import ToolCall
from schema import get_cloudformation_schema

# From above
SYSTEM_PROMPT = """You are an AI assistant..."""

class GeneratorAgent:
    """
    This agent creates a new template based on provided instructions.
    The agent can reach out to the tool to get the most recent CloudFormation schema
    for validation.
    """

    def __init__(self, model: str):
        self.model = model

    def create_template(self, messages: List[ell.Message]):
        # Nested function with decorator
        @ell.complex( 
            model=self.model,
            tools=[ get_cloudformation_schema ],
            temperature=0.4
        )
        def _create_template(messages: List[ell.Message]) -> List[ell.Message]:
            return [ ell.system(SYSTEM_PROMPT) ] + messages

        return _create_template(messages)

    def generate(self, prompt: str):
        """Controls the loop of messages between the LLM, user and tools."""
        conversation = [ ell.user(prompt) ]
        response: ell.Message = self.create_template(conversation)

        max_iterations = 30
        while max_iterations > 0 and (response is ToolCall or response.tool_calls):
            tool_results = response.call_tools_and_collect_as_message()
            # Include what the user wanted, what the assistant requested to run and what the tool returned
            conversation = conversation + [response, tool_results]
            response = self.create_template(conversation)
            max_iterations -= 1

        if max_iterations <= 0:
            raise Exception("Too many iterations, probably stuck in a loop.")

        return response.text

Connecting to Bedrock

So, in order to use for example Claude on Bedrock we need to install boto3. To authenticate, you can use any valid AWS authentication mechanisms: credentials file, environment variables or even instance profile if you develop on an EC2 machine. The important part is that the following lines run before any ell decorator is seen by Python - so before any imports or definitions of these functions. I also recommend using us-west-2 region (Oregon) as it always has the most recent models as the first one. What is more, if you plan to use Ell Studio, you need a commit model. For this, I recommend Claude 3 Haiku or Llama 3.1 8B as it is cheap enough for this.

import boto3, ell
bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
# Define the commit model if you plan to use Ell Studio
# Otherwise just use: `ell.init(default_client=bedrock)`
ell.init(
    store="log",
    autocommit=True,
    autocommit_model="meta.llama3-1-8b-instruct-v1:0",
    default_client=bedrock
)

Of course, you need to also have access to the models you plan to use. Head to your AWS Console and request model access. You can learn about it more here. Afterwards, we can safely initialize the agent with a chosen model and try it out! I will use latest Claude 3.5 Sonnet v2 to create the templates.

from agent import GeneratorAgent
genAgent = GeneratorAgent("anthropic.claude-3-5-sonnet-20241022-v2:0")
template = genAgent.generate("Create a new S3 bucket named 'example-bucket-with-cors'. The bucket should have a CORS configuration allowing origin example.com")
print(template)

It should print a valid CloudFormation file. I recommend using Claude 3.5 Sonnet v2 for this task. I tried other models and they are not the best at following the instructions. Llama 3.1 doesn't use tools (or Ell does not support this combination yet), Haiku 3 does not follow prompt and adds unnecessary text and Haiku 3.5, surprisingly, totally breaks everything 🥴.

# Creates a basic S3 bucket with CORS configuration allowing example.com origin
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: example-bucket-with-cors
      # Configure CORS rule to allow requests from example.com
      CorsConfiguration:
        CorsRules:
          - AllowedOrigins:
              - "example.com"
            AllowedMethods:
              - GET
              - PUT
              - POST
              - DELETE
              - HEAD
            # Allow standard headers
            AllowedHeaders:
              - "*"
            # Set max age for preflight cache
            MaxAge: 3600

Bonus: Inlining the definitions in CloudFormation schemas

To improve the schemas even more you can recursively replace all the references to definitions so that all the arguments of the CloudFormation resource are close to each other. This will make it easier for the AI agent to follow along.

The referenced object can be of many types: a list, a map or unlikely just a simple type. What's more, each dict or list can contain a reference to another such object. So we will have to run the function recursively. In seen we keep track of the references that we have passed already so that we don't get into an infinite loop. Also we will check the length of this list to not reach absurd levels of nesting resulting in very long files.

def replace_ref(obj, seen: set):
    # Check if the object is a map
    if isinstance(obj, dict):
        # If this dict is simply a reference to a definition, find it in the schema
        # and copy it over. Also keep the $ref key to remember what was this about.
        if '$ref' in obj:
            ref = obj['$ref'].split('/')[-1]

            # If we are too deeply nested, just return the ref value to the type and skip
            if len(seen) > 4:
                print(f"Warning: references got deeper than 4 levels ({seen}) in {schema['typeName']}")
                return {"Ref": "::".join(seen)}

            if ref in schema.get('definitions', {}) and ref not in seen:
                seen.add(ref)
                inlined: dict = {'Ref': ref} # Keep the address in the $ref key
                definition = schema['definitions'][ref].copy()
                definition = replace_ref(definition, seen) # Replace recursively
                inlined.update(definition)
                seen.pop()
                return inlined
        # Otherwise, recursively run this function through the entire definition
        return { k: replace_ref(v, seen) for k, v in obj.items() }
    elif isinstance(obj, list):
        # Recursively run this if the definition is a list of referenced objects
        return [ replace_ref(item, seen) for item in obj ]
    else:
        return obj

We will define this function as a nested part of another function so that we can set schema as an argument and not repeat it every time. It is not ideal as the safeguarding mechanism is not too smart - some refs are still not replaced because the nesting is too deep (for example in AWS::WAFv2::WebACL).

def inline_definitions(schema: dict) -> dict:
    # Nested function
    def replace_ref(obj, seen: set):
        # Check if the object is a map
        if isinstance(obj, dict):
            # If this dict is simply a reference to a definition, find it in the schema
            # and copy it over. Also keep the $ref key to remember what was this about.
            if '$ref' in obj:
                ref = obj['$ref'].split('/')[-1]

                # If we are too deeply nested, just return the ref value to the type and skip
                if len(seen) > 4:
                    print(f"Warning: references got deeper than 4 levels ({seen}) in {schema['typeName']}")
                    return {"Ref": "::".join(seen)}

                if ref in schema.get('definitions', {}) and ref not in seen:
                    seen.add(ref)
                    inlined: dict = {'$ref': ref} # Keep the address in the $ref key
                    definition = schema['definitions'][ref].copy()
                    definition = replace_ref(definition, seen) # Replace recursively
                    inlined.update(definition)
                    seen.pop()
                    return inlined
            # Otherwise, recursively run this function through the entire definition
            return { k: replace_ref(v, seen) for k, v in obj.items() }
        elif isinstance(obj, list):
            # Recursively run this if the definition is a list of referenced objects
            return [ replace_ref(item, seen) for item in obj ]
        else:
            return obj
    # Body of inline_definitions
    inlined_schema = schema.copy()
    inlined_schema['properties'] = replace_ref( inlined_schema.get('properties', {}), set() )
    inlined_schema.pop('definitions', None)
    return inlined_schema

That way we can prepare the schema even further and make it more usable for the AI agent. The CORS configuration example after applying this function will look like this:

  CorsConfiguration:
    $ref: CorsConfiguration
    type: object
    additionalProperties: false
    properties:
      CorsRules:
        type: array
        uniqueItems: true
        insertionOrder: true
        items:
          $ref: CorsRule
          type: object
      ...
      description: Describes the cross-origin access configuration for objects in an Amazon S3 bucket. For more information, see [Enabling Cross-Origin Resource Sharing](https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html) in the *Amazon S3 User Guide*.

There are other transformations that you can still apply, such as removing links and shortening some values but I will give it to you as homework assignment.