TLDR
I don't like using the cdk.context.json file. I use a config/<env>.yaml file that holds environment specific values and omegaconf to parse it.
Say you have dev and prod VPCs that already exist and you want to use the same stack to deploy to both.
Files:
config/dev.yaml
vpc:
  vpc_id: vpc-1234567890
config/prod.yaml
vpc:
  vpc_id: vpc-9876543210
In the stack:
from omegaconf import OmegaConf

deploy_env = "dev"  # <-- This could/should be set by your CICD pipeline
conf = OmegaConf.load("config/{0}.yaml".format(deploy_env))

# load pre-existing vpc into variable
vpc = ec2.Vpc.from_lookup(self, conf.vpc.vpc_id, vpc_id=conf.vpc.vpc_id)
In typical yaml fashion everything is a dictionary or a list, so you can navigate and loop. (details below)
About Me
My name is Jakob and I am a DevOps Engineer. I used to be a lot of other things as well (Dish Washer, Retail Employee, Camp Counselor, Army Medic, Infectious Disease Researcher), but now I am a DevOps Engineer. I received no formal CS education but I'm not self taught, because I had thousands of instructors who taught me through their tutorials and blog posts. The culture of information sharing within the software engineering community is vital to everyone, especially those like me who didn't have other options. So, as I learn new things I will be documenting them through the eyes of someone learning for the first time, because those are the people most in need of a guide. Happy Learning! And don't be a stranger.
Note: I am NOT going to be sourcing and fact checking everything here. This is not an O'Reilly book. The language and descriptions are intended to allow beginners to understand. If you want to be pedantic about details or critique my framing of what something is or how it works feel free to write your own post about it. These posts are intended to be enough to get you started so that you can begin breaking things in new ways on your own!
The Problem
One thing that I found some difficulty with when I started using the AWS-CDK was how to handle deploying into multiple pre-existing environments. Of course, the CDK makes it easy to create a stand-alone stack with a new VPC, subnets, buckets, and certificates, but sometimes we have a pre-existing environment we need to deploy into. Or perhaps we want to use variations on the same stack to deploy into multiple environments, e.g. smaller instances for a development environment.
In Terraform, we might pass in a dev.tfvars or prod.tfvars file. With the CDK you can use the recommended cdk.context.json file and pass context dependent parameters into the stack. But after scanning over the documentation for how to add values to the context file, I decided it was too annoying and I wanted a better way.
My Solution
I have settled on using a yaml parsing library called omegaconf to make my own .tfvars-like parameter file. Let me show you how it works.
The Setup
First, create yourself some configuration files. I made a folder at the root of the project called /config where I created yaml files named for each environment, e.g. dev.yaml, uat.yaml, prod.yaml. Let's also add a couple things to dev.yaml:
aws:
  account: "12345678910"
  region: us-east-1
env: Dev
Now, throw omegaconf into your requirements.txt file and pip install it. (You are using a virtual environment, right?)
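For reference, the requirements file just needs the omegaconf entry added alongside the CDK libraries; something like this (versions left unpinned here, pin them however you normally would):

# requirements.txt
aws-cdk-lib
constructs
omegaconf

Then run pip install -r requirements.txt inside your virtual environment.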
Import it into both the app.py file at the root of your project and any stack files in which you plan on using the parameters.
from omegaconf import OmegaConf
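In the stack file, the simplest approach is to load the same config again at the top of the module (you could also pass conf in from app.py as a constructor argument). A minimal sketch, assuming the UberForCatsStack used below and that the deploy environment is hard-coded the same way:

# uber_for_cats/uber_for_cats_stack.py -- a sketch, not the exact stack from this post
from aws_cdk import Stack, aws_ec2 as ec2
from constructs import Construct
from omegaconf import OmegaConf

deploy_env = "dev"  # keep in sync with app.py (or read it from an environment variable)
conf = OmegaConf.load("config/{0}.yaml".format(deploy_env))


class UberForCatsStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # conf.vpc.vpc_id, conf.aws.region, etc. are now available inside the stack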
Implementation
Depending on how you are going to manage deploying to your environments, you are going to load a different config file.
For example, if you are just going to deploy from your local computer, you can set a variable for the deploy environment at the top of your file (so that it is visible and easy to change) and then use that variable to load your config.
1 import aws_cdk as cdk
2 from uber_for_cats.uber_for_cats_stack import UberForCatsStack
3 from omegaconf import OmegaConf
4
5 deploy_env = "dev"
6
7 conf = OmegaConf.load("config/{0}.yaml".format(deploy_env))
8
9 app = cdk.App()
10 UberForCatsStack(app, "UberForCats{0}".format(conf.env),
11     env=cdk.Environment(account=conf.aws.account, region=conf.aws.region),
12 )
13
14 app.synth()
On lines 5-7 the environment is set to dev and the config/dev.yaml file is loaded into conf.
If you were using a CICD pipeline to automatically deploy, deploy_env could be set based on a pipeline variable and line 5 could look like this.
5 deploy_env = os.getenv("DEPLOY_ENV")
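If you go that route, remember to add import os near the top of the file. A small sketch (the DEPLOY_ENV variable name and the "dev" fallback are assumptions about your pipeline setup):

import os

deploy_env = os.getenv("DEPLOY_ENV", "dev")  # pipeline sets DEPLOY_ENV; fall back to dev for local runs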
But the result will be the same: the values set in the config/dev.yaml file will be used, and the file will be read as if those strings were written in directly.
1 import aws_cdk as cdk
2 from uber_for_cats.uber_for_cats_stack import UberForCatsStack
3 from omegaconf import OmegaConf
4
5 deploy_env = "dev"
6
7 conf = OmegaConf.load("config/dev.yaml")
8
9 app = cdk.App()
10 UberForCatsStack(app, "UberForCatsDev",
11     env=cdk.Environment(account="12345678910", region="us-east-1"),
12 )
13
14 app.synth()
This allows you to set other values in config/uat.yaml such as a different account, different sized instances/EBS volumes, autoscaling rules, etc. depending on what is required in each environment.
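For instance, a config/uat.yaml might look something like this (the account, region, and vpc_id values are made-up placeholders):

aws:
  account: "10987654321"
  region: us-east-1
env: Uat
vpc:
  vpc_id: vpc-00aabbccdd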
Something to Remember
Yaml is basically a nested dictionary that can also contain lists. When you see a - before something it is a list and therefore iterable. Take, for example, the following.
aws:
  account: "12345678910"
  region: us-east-1
vpc:
  vpc_id: vpc-aabbccdd
  subnet:
    private:
      - subnet-65asdf651sadf65
      - subnet-c65as1df65f56sa
      - subnet-afas65df1a6sdf5
Those private subnet IDs are a list and you can iterate over them.
private_subnets = []
for i, subnet in enumerate(conf.vpc.subnet.private):
    private_subnets.append(
        ec2.Subnet.from_subnet_id(self, "pri{0}".format(i), subnet_id=subnet)
    )
This would create a list of subnet objects (ISubnet) that you can use for the placement of an autoscaling group or EKS cluster. Neat!
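To make that concrete, here is a rough sketch of the autoscaling group case (the construct id, instance type, and AMI choice are illustrative, and vpc is assumed to be the Vpc.from_lookup() object from the TLDR; the EKS case shows up further down):

from aws_cdk import aws_autoscaling as autoscaling

asg = autoscaling.AutoScalingGroup(
    self, "DevAsg",
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(subnets=private_subnets),  # <-- place it in those subnets
    instance_type=ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.LARGE),
    machine_image=ec2.AmazonLinuxImage(generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2),
)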
Congrats, those are the basics! You should be able to get started.
Some Additional Tricks
While using a yaml file for configuration settings, I ran into some situations that might be worth sharing.
Sometimes strings require extra steps
I am using the CDK to make some EKS clusters. Part of this process is creating node-groups (think autoscaling groups for Kubernetes). I wanted to leverage some spot instances for a portion of our development cluster, and creating these nodegroups in the CDK means specifying the compute class and size. Because this could be different between environments, I put it in the config file.
node_group:
  spot:
    min: 1
    max: 5
    type:
      - i_class: BURSTABLE3
        i_size: LARGE
      - i_class: BURSTABLE3
        i_size: XLARGE
      - i_class: COMPUTE6_INTEL
        i_size: LARGE
      - i_class: COMPUTE6_INTEL
        i_size: XLARGE
But that won't work for setting instance types:
# not valid -- InstanceClass/InstanceSize have no attribute named after the yaml string
instance_type = ec2.InstanceType.of(
    ec2.InstanceClass.conf.node_group.spot.type[0].i_class,
    ec2.InstanceSize.conf.node_group.spot.type[0].i_size,
)
And, honestly, it isn't very readable either. My workaround was making dictionaries of instance classes and sizes, then using the value in the yaml as the key to the appropriate class/size.
node_group:
  spot:
    min: 1
    max: 5
    type:
      - i_class: t3
        i_size: large
      - i_class: t3
        i_size: xl
      - i_class: c6i
        i_size: large
      - i_class: c6i
        i_size: xl
ec2_class = {
    "t3": ec2.InstanceClass.BURSTABLE3,  # max 2xl
    "c6i": ec2.InstanceClass.COMPUTE6_INTEL,  # min large
}
ec2_size = {
    "large": ec2.InstanceSize.LARGE,
    "xl": ec2.InstanceSize.XLARGE,
}
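To make the mapping concrete, a single lookup turns the yaml strings into the enum members the CDK expects (the literal "t3"/"large" here stand in for values read from the config):

# "t3" + "large" from the yaml become a t3.large InstanceType
instance_type = ec2.InstanceType.of(ec2_class["t3"], ec2_size["large"])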
With the combination of the above, we can now use the string in the yaml as the key to pull in the ec2 objects in the format that the CDK requires. Below I make a list of the specified instance combinations and pass it into the EKS cluster as nodegroup capacity, but you could just as easily specify a list of classes and types and use a python library like itertools to make ALL of the combinations in one line. That might sacrifice readability though. So actually, don't do that. But you could...
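For the curious, that one-liner might look something like this (assuming the yaml held separate classes and sizes lists instead of the explicit pairs used above):

import itertools

# every class/size combination -- compact, but you lose control over the exact pairs
spot_instance_types = [
    ec2.InstanceType.of(ec2_class[c], ec2_size[s])
    for c, s in itertools.product(conf.node_group.spot.classes, conf.node_group.spot.sizes)
]

The explicit pair list below is what I actually use, because you can see exactly which combinations end up in the nodegroup.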
spot_instance_types = []
for instance_type in conf.node_group.spot.type:
    this_type = ec2.InstanceType.of(
        ec2_class[instance_type.i_class],
        ec2_size[instance_type.i_size],
    )
    spot_instance_types.append(this_type)

cluster.add_nodegroup_capacity(
    "{0}-spot-nodegroup".format(conf.env),
    nodegroup_name="{0}-spot-ng".format(conf.env),
    capacity_type=eks.CapacityType.SPOT,
    min_size=conf.node_group.spot.min,
    max_size=conf.node_group.spot.max,
    instance_types=spot_instance_types,  # <-- list of instance types
    disk_size=250,
    subnets=ec2.SubnetSelection(subnets=private_subnets),  # <-- those subnets from before!
)
Booleans are Useful
So we are promoting this project out to production and someone doesn't think spot instances are a good idea even though you have diversified your spot pools. Throw a toggle into the config.
node_group:
  spot:
    enabled: False
    min: 1
    max: 5
Then you can run your node creation based on it!
if conf.node_group.spot.enabled:
    spot_instance_types = []
    for instance_type in conf.node_group.spot.type:
        this_type = ec2.InstanceType.of(
            ec2_class[instance_type.i_class],
            ec2_size[instance_type.i_size],
        )
        spot_instance_types.append(this_type)

    cluster.add_nodegroup_capacity(
        "{0}-spot-nodegroup".format(conf.env),
        nodegroup_name="{0}-spot-ng".format(conf.env),
        capacity_type=eks.CapacityType.SPOT,
        min_size=conf.node_group.spot.min,
        max_size=conf.node_group.spot.max,
        instance_types=spot_instance_types,  # <-- list of instance types
        disk_size=250,
        subnets=ec2.SubnetSelection(subnets=private_subnets),  # <-- those subnets from before!
    )
Beautiful!