Create an AWS ECS Cluster Using Terraform

Tacio Nery
Backend Software Engineer with 10 years of experience and a passion for solving problems with algorithms.

Hey everyone, I'd like to share my experience with Terraform and AWS. In this post I'll describe the resources I used to build the infrastructure on AWS and deploy a NodeJS application to it.

Resources

The application I needed to deploy is a monolithic NodeJS application, so to deploy it and make it scalable I decided to use containers with an autoscaling tool that scales the application based on CPU and memory usage. To build this environment on AWS I used the services listed below:

  1. VPC and Networking (Subnets, Internet Gateway, Route Tables...)
  2. Elastic Container Registry
  3. Elastic Container Service
  4. Application Load Balancer
  5. Auto Scaling
  6. CloudWatch

Terraform Initial Configuration

The Terraform configuration I used was quite simple. The first step is to create a bucket on AWS S3 to store the Terraform state. It's not required, but it will make our lives easier if someone else needs to maintain this infrastructure. This is the main.tf file with this configuration.

# main.tf | Main Configuration

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "2.70.0"
    }
  }

  backend "s3" {
    bucket = "terraform-state-bucket"
    key    = "state/terraform_state.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region     = var.aws_region
  access_key = var.aws_access_key
  secret_key = var.aws_secret_key
}

The provider section uses some variables. We can define their values in a tfvars file, which I'll explain later in this post.

UPDATE: With this initial configuration, just run terraform init.
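
One caveat: the state bucket must exist before terraform init can configure the backend. A minimal sketch with the AWS CLI (the bucket name matches the backend block above, but S3 bucket names are globally unique, so you'll need your own):

# create the state bucket (no LocationConstraint is needed for us-east-1)
aws s3api create-bucket --bucket terraform-state-bucket --region us-east-1

# initialize the working directory and the S3 backend
terraform init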

VPC and Networking

Let's create a VPC and configure some networking resources we'll use further on. The sample code below will create a VPC.

# vpc.tf | VPC Configuration

resource "aws_vpc" "aws-vpc" {
  cidr_block           = "10.10.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name        = "${var.app_name}-vpc"
    Environment = var.app_environment
  }
}

For networking, it is necessary to create public and private subnets within the VPC, as well as an Internet Gateway and a Route Table for the public subnets. The sample below creates these resources.

# networking.tf | Network Configuration

resource "aws_internet_gateway" "aws-igw" {
  vpc_id = aws_vpc.aws-vpc.id
  tags = {
    Name        = "${var.app_name}-igw"
    Environment = var.app_environment
  }

}

resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.aws-vpc.id
  count             = length(var.private_subnets)
  cidr_block        = element(var.private_subnets, count.index)
  availability_zone = element(var.availability_zones, count.index)

  tags = {
    Name        = "${var.app_name}-private-subnet-${count.index + 1}"
    Environment = var.app_environment
  }
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.aws-vpc.id
  cidr_block              = element(var.public_subnets, count.index)
  availability_zone       = element(var.availability_zones, count.index)
  count                   = length(var.public_subnets)
  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.app_name}-public-subnet-${count.index + 1}"
    Environment = var.app_environment
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.aws-vpc.id

  tags = {
    Name        = "${var.app_name}-routing-table-public"
    Environment = var.app_environment
  }
}

resource "aws_route" "public" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.aws-igw.id
}

resource "aws_route_table_association" "public" {
  count          = length(var.public_subnets)
  subnet_id      = element(aws_subnet.public.*.id, count.index)
  route_table_id = aws_route_table.public.id
}

Container Registry and ECS Cluster

Now it's time to create the Container Registry and the ECS Cluster. First, let's create the Container Registry with the code below:

# ecr.tf | Elastic Container Repository

resource "aws_ecr_repository" "aws-ecr" {
  name = "${var.app_name}-${var.app_environment}-ecr"
  tags = {
    Name        = "${var.app_name}-ecr"
    Environment = var.app_environment
  }
}

The ECR is a repository where we'll store the Docker images of the application we want to deploy. It works like Docker Hub, if you're familiar with Docker. You can build the Docker image locally and push it to ECR, or use a CI/CD platform to do it.
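
As a sketch, building and pushing an image by hand could look like this (the account ID is a placeholder, and the repository name follows the aws_ecr_repository resource above with the example variable values from terraform.tfvars):

# authenticate Docker against the ECR registry
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# build, tag, and push the application image
docker build -t node-js-app .
docker tag node-js-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/node-js-app-production-ecr:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/node-js-app-production-ecr:latest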

Now we're going to create the ECS Cluster, Service, and Task Definition.
A service is a configuration that enables us to run and maintain a number of tasks simultaneously in a cluster. The containers are defined by a Task Definition, which is used to run tasks within a service.

Before we create the ECS Cluster, we need to create an IAM role that allows the service to pull the image from ECR.

# iam.tf | IAM Role Policies

resource "aws_iam_role" "ecsTaskExecutionRole" {
  name               = "${var.app_name}-execution-task-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
  tags = {
    Name        = "${var.app_name}-iam-role"
    Environment = var.app_environment
  }
}

data "aws_iam_policy_document" "assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "ecsTaskExecutionRole_policy" {
  role       = aws_iam_role.ecsTaskExecutionRole.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

Now let's create what we need for ECS. First we create the ECS Cluster:

resource "aws_ecs_cluster" "aws-ecs-cluster" {
  name = "${var.app_name}-${var.app_environment}-cluster"
  tags = {
    Name        = "${var.app_name}-ecs"
    Environment = var.app_environment
  }
}

I created a Log Group on CloudWatch to collect the container logs.

resource "aws_cloudwatch_log_group" "log-group" {
  name = "${var.app_name}-${var.app_environment}-logs"

  tags = {
    Application = var.app_name
    Environment = var.app_environment
  }
}

I created a Task Definition compatible with AWS Fargate; I chose Fargate to keep the cost of this infrastructure down.

data "template_file" "env_vars" {
  template = file("env_vars.json")
}

resource "aws_ecs_task_definition" "aws-ecs-task" {
  family = "${var.app_name}-task"

  container_definitions = <<DEFINITION
  [
    {
      "name": "${var.app_name}-${var.app_environment}-container",
      "image": "${aws_ecr_repository.aws-ecr.repository_url}:latest",
      "entryPoint": [],
      "environment": ${data.template_file.env_vars.rendered},
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "${aws_cloudwatch_log_group.log-group.id}",
          "awslogs-region": "${var.aws_region}",
          "awslogs-stream-prefix": "${var.app_name}-${var.app_environment}"
        }
      },
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080
        }
      ],
      "cpu": 256,
      "memory": 512,
      "networkMode": "awsvpc"
    }
  ]
  DEFINITION

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  memory                   = "512"
  cpu                      = "256"
  execution_role_arn       = aws_iam_role.ecsTaskExecutionRole.arn
  task_role_arn            = aws_iam_role.ecsTaskExecutionRole.arn

  tags = {
    Name        = "${var.app_name}-ecs-td"
    Environment = var.app_environment
  }
}

data "aws_ecs_task_definition" "main" {
  task_definition = aws_ecs_task_definition.aws-ecs-task.family
}

An observation about the Task Definition: I'm using the Terraform template_file data source to set some environment variables that I defined in a JSON file (this needs an improvement, such as using AWS Secrets Manager or some other way to store secrets).
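
For reference, the environment key in a container definition expects a JSON list of name/value pairs, so a minimal env_vars.json could look like this (the variable names are just examples):

[
  { "name": "NODE_ENV", "value": "production" },
  { "name": "PORT", "value": "8080" }
]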

Ok, now let's create the ECS Service.

resource "aws_ecs_service" "aws-ecs-service" {
  name                 = "${var.app_name}-${var.app_environment}-ecs-service"
  cluster              = aws_ecs_cluster.aws-ecs-cluster.id
  task_definition      = "${aws_ecs_task_definition.aws-ecs-task.family}:${max(aws_ecs_task_definition.aws-ecs-task.revision, data.aws_ecs_task_definition.main.revision)}"
  launch_type          = "FARGATE"
  scheduling_strategy  = "REPLICA"
  desired_count        = 1
  force_new_deployment = true

  network_configuration {
    subnets          = aws_subnet.private.*.id
    assign_public_ip = false
    security_groups = [
      aws_security_group.service_security_group.id,
      aws_security_group.load_balancer_security_group.id
    ]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.target_group.arn
    container_name   = "${var.app_name}-${var.app_environment}-container"
    container_port   = 8080
  }

  depends_on = [aws_lb_listener.listener]
}

I also defined a Security Group so that the containers only accept connections coming from the Load Balancer.

resource "aws_security_group" "service_security_group" {
  vpc_id = aws_vpc.aws-vpc.id

  ingress {
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    security_groups = [aws_security_group.load_balancer_security_group.id]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = {
    Name        = "${var.app_name}-service-sg"
    Environment = var.app_environment
  }
}

Application Load Balancer

The next step is to set up a Load Balancer. As you may have noticed, the ECS Service configuration above already references a load_balancer block.

resource "aws_alb" "application_load_balancer" {
  name               = "${var.app_name}-${var.app_environment}-alb"
  internal           = false
  load_balancer_type = "application"
  subnets            = aws_subnet.public.*.id
  security_groups    = [aws_security_group.load_balancer_security_group.id]

  tags = {
    Name        = "${var.app_name}-alb"
    Environment = var.app_environment
  }
}

Now let's add a Security Group for the Load Balancer.

resource "aws_security_group" "load_balancer_security_group" {
  vpc_id = aws_vpc.aws-vpc.id

  ingress {
    from_port        = 80
    to_port          = 80
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  tags = {
    Name        = "${var.app_name}-sg"
    Environment = var.app_environment
  }
}

We also need to create a Load Balancer Target Group, which will connect the Load Balancer to the containers.

resource "aws_lb_target_group" "target_group" {
  name        = "${var.app_name}-${var.app_environment}-tg"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.aws-vpc.id

  health_check {
    healthy_threshold   = "3"
    interval            = "300"
    protocol            = "HTTP"
    matcher             = "200"
    timeout             = "3"
    path                = "/v1/status"
    unhealthy_threshold = "2"
  }

  tags = {
    Name        = "${var.app_name}-lb-tg"
    Environment = var.app_environment
  }
}

One very important thing here is the path attribute within health_check. This is a route in the application that the Load Balancer will call to check its health; it must answer with the status code set in matcher (200 here) for the targets to be considered healthy.
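
On the application side, this can be as simple as a route that always answers with a 200. A hypothetical Express handler matching the path configured above:

// server.js | hypothetical health check endpoint for the ALB
const express = require('express');
const app = express();

// the ALB calls this path and expects the matcher status code (200)
app.get('/v1/status', (req, res) => res.status(200).json({ status: 'ok' }));

// listen on the containerPort defined in the Task Definition
app.listen(8080);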

At last, let's create an HTTP listener for our Load Balancer.

resource "aws_lb_listener" "listener" {
  load_balancer_arn = aws_alb.application_load_balancer.id
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.target_group.id
  }
}

Autoscaling

Autoscaling is essential for the application I'm working on. To configure it on AWS, I just needed to create an Autoscaling Target and two simple Autoscaling Policies: one to scale by CPU usage and another for memory usage.

# autoscaling.tf | Auto Scaling Group

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = 2
  min_capacity       = 1
  resource_id        = "service/${aws_ecs_cluster.aws-ecs-cluster.name}/${aws_ecs_service.aws-ecs-service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_policy_memory" {
  name               = "${var.app_name}-${var.app_environment}-memory-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    target_value = 80
  }
}

resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
  name               = "${var.app_name}-${var.app_environment}-cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value = 80
  }
}

So the application will scale up when memory or CPU usage reaches 80%, and scale back down when usage drops below that value.

Variables

I believe you noticed we used a lot of variables throughout the Terraform configuration files. To declare them, I created a file called variables.tf. This file only contains the variable definitions.

# variables.tf | Auth and Application variables

variable "aws_access_key" {
  type        = string
  description = "AWS Access Key"
}

variable "aws_secret_key" {
  type        = string
  description = "AWS Secret Key"
}

variable "aws_region" {
  type        = string
  description = "AWS Region"
}

variable "aws_cloudwatch_retention_in_days" {
  type        = number
  description = "AWS CloudWatch Logs Retention in Days"
  default     = 1
}

variable "app_name" {
  type        = string
  description = "Application Name"
}

variable "app_environment" {
  type        = string
  description = "Application Environment"
}

variable "cidr" {
  description = "The CIDR block for the VPC."
  default     = "10.0.0.0/16"
}

variable "public_subnets" {
  description = "List of public subnets"
}

variable "private_subnets" {
  description = "List of private subnets"
}

variable "availability_zones" {
  description = "List of availability zones"
}

The values for each variable are defined in a file called terraform.tfvars.

aws_region        = "us-east-1"
aws_access_key    = "your aws access key"
aws_secret_key    = "your aws secret key"

# these are zones and subnets examples
availability_zones = ["us-east-1a", "us-east-1b"]
public_subnets     = ["10.10.100.0/24", "10.10.101.0/24"]
private_subnets    = ["10.10.0.0/24", "10.10.1.0/24"]

# these are used for tags
app_name        = "node-js-app"
app_environment = "production"

This file is not committed to my repository. I created it locally and use S3 to manage access to it and to control its versions. It needs some improvements as well, which I'll make later.
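
As a sketch, uploading it to a private bucket could look like this (the bucket and key are just examples, and the bucket needs S3 versioning enabled to keep its version history):

aws s3 cp terraform.tfvars s3://terraform-state-bucket/tfvars/terraform.tfvars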

UPDATE: Now, with all the configuration files properly written, run terraform plan to check which changes will be made, then terraform apply to review and apply them.
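
Both commands run from the directory that holds the .tf files:

# show the execution plan without changing anything
terraform plan

# review the plan and, after confirmation, create the resources
terraform apply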

You may also be wondering about the database. Well, in this project I created a cluster on MongoDB Cloud and put the credentials in the environment variables.

The full code can be found on my [GitHub](https://github.com/thnery/terraform-aws-template).

Thanks

Thank you for reading this post. I hope it's useful. If you have any feedback, please let me know.

Discussion (1)

Marcello de Sales

Any idea on how to simplify your approach by creating the basics for aws.amazon.com/blogs/containers/au...?