In the realm of cloud-native applications, automation plays a vital role in simplifying deployments and optimising infrastructure management. In this article, we will delve into how to automate the provisioning of an Amazon EKS cluster and deploy a Flask application using Terraform and GitHub Actions. Along the way, we'll cover essential security best practices to safeguard your environment, explore monitoring techniques to ensure system health, and discuss building a resilient CI/CD pipeline for continuous integration and deployment. This comprehensive approach will help you achieve efficient, secure, and scalable cloud-native application management.
Table of Contents
Overview
Prerequisites
Infrastructure Automation: EKS Cluster Setup
Application Deployment: Flask App on EKS
Monitoring with Prometheus and Grafana
Security Best Practices
Conclusion
Overview
The goal of this project is to automate the deployment of a containerised Flask application on an EKS (Elastic Kubernetes Service) cluster. Using Terraform to provision AWS resources and GitHub Actions to automate the CI/CD pipeline, this setup allows for seamless infrastructure management and application deployment.
Why Terraform?
Terraform enables you to write declarative code for infrastructure. Instead of manually creating resources like VPCs, subnets, or an EKS cluster, we automate everything via Infrastructure as Code (IaC).
Why GitHub Actions?
GitHub Actions provides a powerful way to integrate CI/CD, testing, static analysis, and security checks into the code deployment process.
Prerequisites
Before diving into the automation, here are the prerequisites you’ll need to get started:
AWS Account: Create an AWS account if you don’t have one.
IAM Access Keys: Set up access keys with permissions for managing EKS, EC2, and S3.
S3 Bucket: Create an S3 bucket to store your Terraform state files securely.
AWS CLI: Install and configure the AWS CLI.
Terraform: Make sure Terraform is installed on your local machine or use GitHub Actions for automation.
GitHub Secrets: Add AWS credentials (access keys, secret keys) and other sensitive data as GitHub secrets to avoid hardcoding them.
Snyk: Create a Snyk account and get your API token.
SonarCloud: Create a SonarCloud account and note your token, organisation key, and project key.
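Before moving on, it helps to confirm the tooling and the Terraform state bucket are in place. A minimal sketch, assuming us-east-1 and the bucket name regtech-iac used by the backend configuration later in this article:
# Verify the CLI tooling is available
aws --version
terraform -version
kubectl version --client
# Create the S3 bucket that will hold the Terraform state (bucket names must be globally unique)
aws s3api create-bucket --bucket regtech-iac --region us-east-1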
Infrastructure Automation: EKS Cluster Setup
Automating infrastructure deployment is key to maintaining scalable, consistent, and reliable environments. In this project, Terraform is utilised to automate the provisioning of an EKS cluster, its node groups, and the supporting AWS infrastructure. This includes VPC creation, IAM roles, S3 bucket setup, and cloud resources like CloudWatch and CloudTrail for logging and monitoring.
Terraform Setup
Let’s start by provisioning the necessary infrastructure. Below is the detailed explanation of the key resources defined in the Terraform files.
EKS Cluster and Node Group (main.tf):
This provisions an EKS cluster and a node group with IAM roles attached.
The cluster encrypts Kubernetes secrets with a KMS key, and the node group is fixed at two worker nodes (desired, minimum, and maximum size are all set to 2). Outputs include the cluster name and endpoint for easy reference.
touch main.tf
terraform {
backend "s3" {
bucket = "regtech-iac"
key = "terraform.tfstate"
region = "us-east-1"
encrypt = true
}
}
# Provides an EKS Cluster
resource "aws_eks_cluster" "eks_cluster" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster_role.arn
version = "1.28"
vpc_config {
subnet_ids = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id, aws_subnet.public_subnet_3.id]
}
encryption_config {
provider {
key_arn = aws_kms_key.eks_encryption_key.arn
}
resources = ["secrets"]
}
# Ensure that IAM Role permissions are created before and deleted after EKS Cluster handling.
# Otherwise, EKS will not be able to properly delete EKS managed EC2 infrastructure such as Security Groups.
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy_attachment,
aws_iam_role_policy_attachment.eks_service_policy_attachment,
]
}
# Provides an EKS Node Group
resource "aws_eks_node_group" "eks_node_group" {
cluster_name = aws_eks_cluster.eks_cluster.name
node_group_name = var.node_group_name
node_role_arn = aws_iam_role.eks_node_group_role.arn
subnet_ids = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id, aws_subnet.public_subnet_3.id]
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
update_config {
max_unavailable = 1
}
# Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
# Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy_attachment,
aws_iam_role_policy_attachment.eks_cni_policy_attachment,
aws_iam_role_policy_attachment.ec2_container_registry_readonly,
]
}
# Extra resources
resource "aws_ebs_volume" "volume_regtech"{
availability_zone = var.az_a
size = 40
encrypted = true
type = "gp2"
kms_key_id = aws_kms_key.ebs_encryption_key.arn
}
resource "aws_s3_bucket" "regtech_iac" {
bucket = var.bucket_name
}
resource "aws_s3_bucket_server_side_encryption_configuration" "regtech_iac_encrypt_config" {
bucket = aws_s3_bucket.regtech_iac.bucket
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.s3_encryption_key.arn
sse_algorithm = "aws:kms"
}
}
}
# OutPut Resources
output "endpoint" {
value = aws_eks_cluster.eks_cluster.endpoint
}
output "eks_cluster_name" {
value = aws_eks_cluster.eks_cluster.name
}
Networking (vpc.tf):
Defines a VPC, public subnets for the EKS cluster, and private subnets for other resources, ensuring flexibility in network architecture.
touch vpc.tf
# Provides a VPC resource
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr_block
instance_tenancy = "default"
tags = {
Name = var.tags_vpc
}
}
# Provides an VPC Public subnet resource
resource "aws_subnet" "public_subnet_1" {
vpc_id = aws_vpc.main.id
cidr_block = var.p_s_1_cidr_block
availability_zone = var.az_a
map_public_ip_on_launch = true
tags = {
Name = var.tags_public_subnet_1
}
}
resource "aws_subnet" "public_subnet_2" {
vpc_id = aws_vpc.main.id
cidr_block = var.p_s_2_cidr_block
availability_zone = var.az_b
map_public_ip_on_launch = true
tags = {
Name = var.tags_public_subnet_2
}
}
resource "aws_subnet" "public_subnet_3" {
vpc_id = aws_vpc.main.id
cidr_block = var.p_s_3_cidr_block
availability_zone = var.az_c
map_public_ip_on_launch = true
tags = {
Name = var.tags_public_subnet_3
}
}
# Provides an VPC Private subnet resource
resource "aws_subnet" "private_subnet_1" {
vpc_id = aws_vpc.main.id
cidr_block = var.private_s_1_cidr_block
availability_zone = var.az_private_a
map_public_ip_on_launch = false
tags = {
Name = var.tags_private_subnet_1
}
}
resource "aws_subnet" "private_subnet_2" {
vpc_id = aws_vpc.main.id
cidr_block = var.private_s_2_cidr_block
availability_zone = var.az_private_b
map_public_ip_on_launch = false
tags = {
Name = var.tags_private_subnet_2
}
}
resource "aws_subnet" "private_subnet_3" {
vpc_id = aws_vpc.main.id
cidr_block = var.private_s_3_cidr_block
availability_zone = var.az_private_c
map_public_ip_on_launch = false
tags = {
Name = var.tags_private_subnet_3
}
}
IAM Roles (iam.tf):
IAM roles and policies for the EKS cluster, node groups, and autoscaler. Includes roles for security services like CloudWatch and CloudTrail, ensuring robust monitoring.
touch iam.tf
# Declare the aws_caller_identity data source
data "aws_caller_identity" "current" {}
# IAM Role for EKS Cluster Plane
resource "aws_iam_role" "eks_cluster_role" {
name = var.eks_cluster_role_name
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster_role.name
}
resource "aws_iam_role_policy_attachment" "eks_service_policy_attachment" {
role = aws_iam_role.eks_cluster_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
}
# IAM Role for Worker node
resource "aws_iam_role" "eks_node_group_role" {
name = var.eks_node_group_role_name
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "eks_worker_node_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.eks_node_group_role.name
}
resource "aws_iam_role_policy_attachment" "eks_cni_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.eks_node_group_role.name
}
resource "aws_iam_role_policy_attachment" "ec2_container_registry_readonly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.eks_node_group_role.name
}
resource "aws_iam_instance_profile" "eks_node_instance_profile" {
name = var.eks_node_group_profile
role = aws_iam_role.eks_node_group_role.name
}
# Policy For volume creation and attachment
resource "aws_iam_role_policy" "eks_node_group_volume_policy" {
name = var.eks_node_group_volume_policy_name
role = aws_iam_role.eks_node_group_role.name
policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVolumeStatus",
"ec2:CreateVolume",
"ec2:AttachVolume"
],
"Resource": "arn:aws:ec2:${var.region}:${data.aws_caller_identity.current.account_id}:volume/*"
}
]
})
}
# IAM Role for CloudWatch
resource "aws_iam_role" "cloudwatch_role" {
name = "cloudwatch_role_log"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "cloudwatch.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
})
}
resource "aws_iam_role_policy_attachment" "cloudwatch_policy_attachment" {
role = aws_iam_role.cloudwatch_role.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchLogsFullAccess"
}
# IAM Role for CloudTrail
resource "aws_iam_role" "cloudtrail_role" {
name = "cloudtrail_role_log"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "cloudtrail.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
})
}
resource "aws_iam_role_policy_attachment" "cloudtrail_policy_attachment" {
role = aws_iam_role.cloudtrail_role.name
policy_arn = "arn:aws:iam::aws:policy/AWSCloudTrail_FullAccess"
}
# KMS Key Policy for Encryption
resource "aws_kms_key" "ebs_encryption_key" {
description = "KMS key for EBS volume encryption"
}
resource "aws_kms_key" "s3_encryption_key" {
description = "KMS key for S3 bucket encryption"
}
resource "aws_kms_key" "eks_encryption_key" {
description = "KMS key for EKS secret encryption"
}
resource "aws_s3_bucket_policy" "regtech_iac_policy" {
bucket = aws_s3_bucket.regtech_iac.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:GetBucketAcl"
Resource = "arn:aws:s3:::${aws_s3_bucket.regtech_iac.bucket}"
},
{
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "arn:aws:s3:::${aws_s3_bucket.regtech_iac.bucket}/AWSLogs/${data.aws_caller_identity.current.account_id}/*"
Condition = {
StringEquals = {
"s3:x-amz-acl" = "bucket-owner-full-control"
}
}
}
]
})
}
CloudWatch and Monitoring (cloudwatch.tf):
This provisions CloudWatch log groups, an SNS topic for alerts, and a CloudWatch alarm to monitor CPU utilisation. CloudTrail logs are configured to monitor S3 and management events.
touch cloudwatch.tf
resource "aws_cloudwatch_log_group" "eks_log_group" {
name = "/aws/eks/cluster-logs-regtech"
retention_in_days = 30
}
resource "aws_cloudtrail" "security_trail" {
name = "security-trail-log"
s3_bucket_name = aws_s3_bucket.regtech_iac.bucket
include_global_service_events = true
is_multi_region_trail = true
enable_log_file_validation = true
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["arn:aws:s3:::${aws_s3_bucket.regtech_iac.bucket}/"]
}
}
}
resource "aws_sns_topic" "alarm_topic" {
name = "high-cpu-alarm-topic"
}
resource "aws_sns_topic_subscription" "alarm_subscription" {
topic_arn = aws_sns_topic.alarm_topic.arn
protocol = "email"
endpoint = "oloruntobiolurombi@gmail.com"
}
resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
alarm_name = "high_cpu_usage"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "70"
alarm_actions = [
aws_sns_topic.alarm_topic.arn
]
}
AutoScaler IAM (iam-autoscaler.tf):
This provisions the IAM role and policy required by the EKS Cluster Autoscaler, which adjusts the number of worker nodes based on resource demands. (Note that the node group defined earlier pins the minimum, maximum, and desired size to 2; widen that range if you want the autoscaler to actually change the node count.)
data "aws_iam_policy_document" "eks_cluster_autoscaler_assume_role_policy" {
statement {
actions = ["sts:AssumeRoleWithWebIdentity"]
effect = "Allow"
condition {
test = "StringEquals"
variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
values = ["system:serviceaccount:kube-system:cluster-autoscaler"]
}
principals {
identifiers = [aws_iam_openid_connect_provider.eks.arn]
type = "Federated"
}
}
}
resource "aws_iam_role" "eks_cluster_autoscaler" {
assume_role_policy = data.aws_iam_policy_document.eks_cluster_autoscaler_assume_role_policy.json
name = "eks-cluster-autoscaler"
}
resource "aws_iam_policy" "eks_cluster_autoscaler" {
name = "eks-cluster-autoscaler"
policy = jsonencode({
Statement = [{
Action = [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeLaunchTemplateVersions"
]
Effect = "Allow"
Resource = "*"
}]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_autoscaler_attach" {
role = aws_iam_role.eks_cluster_autoscaler.name
policy_arn = aws_iam_policy.eks_cluster_autoscaler.arn
}
output "eks_cluster_autoscaler_arn" {
value = aws_iam_role.eks_cluster_autoscaler.arn
}
Security Groups (security_groups.tf):
This defines the security groups required for your infrastructure. Security groups act as virtual firewalls that control the inbound and outbound traffic to your resources.
touch security_groups.tf
# Provides a security group
resource "aws_security_group" "main_sg" {
name = "main_sg"
description = var.main_sg_description
vpc_id = aws_vpc.main.id
ingress {
description = "ssh access"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "Kubernetes API access"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = var.tags_main_sg_eks
}
}
Variables (variables.tf):
This file holds the variable definitions that are utilized throughout your Terraform configurations. These variables serve as configurable parameters, offering default values that can be modified or overridden based on specific requirements. By centralizing variable management, this approach ensures flexibility and reusability, allowing you to tailor infrastructure settings without altering the core configuration files. This is particularly useful when deploying infrastructure across different environments or making adjustments to scale resources.
touch variables.tf
variable "region" {
type = string
default = "us-east-1"
}
variable "bucket_name" {
type = string
default = "regtech-logs"
}
variable "aws_access_key_id" {
type = string
default = ""
}
variable "aws_secret_access_key" {
type = string
default = ""
}
variable "tags_vpc" {
type = string
default = "main-vpc-eks"
}
variable "tags_public_rt" {
type = string
default = "public-route-table"
}
variable "tags_igw" {
type = string
default = "internet-gateway"
}
variable "tags_public_subnet_1" {
type = string
default = "public-subnet-1"
}
variable "tags_public_subnet_2" {
type = string
default = "public-subnet-2"
}
variable "tags_public_subnet_3" {
type = string
default = "public-subnet-3"
}
variable "tags_private_subnet_1" {
type = string
default = "private-subnet-1"
}
variable "tags_private_subnet_2" {
type = string
default = "private-subnet-2"
}
variable "tags_private_subnet_3" {
type = string
default = "private-subnet-3"
}
variable "tags_main_sg_eks" {
type = string
default = "main-sg-eks"
}
variable "instance_type" {
type = string
default = "t2.micro"
}
variable "cluster_name" {
type = string
default = "EKSCluster"
}
variable "node_group_name" {
type = string
default = "SlaveNode"
}
variable "vpc_cidr_block" {
type = string
default = "10.0.0.0/16"
}
variable "p_s_1_cidr_block" {
type = string
default = "10.0.1.0/24"
}
variable "az_a" {
type = string
default = "us-east-1a"
}
variable "p_s_2_cidr_block" {
type = string
default = "10.0.2.0/24"
}
variable "az_b" {
type = string
default = "us-east-1b"
}
variable "p_s_3_cidr_block" {
type = string
default = "10.0.3.0/24"
}
variable "az_c" {
type = string
default = "us-east-1c"
}
variable "private_s_1_cidr_block" {
type = string
default = "10.0.4.0/24"
}
variable "az_private_a" {
type = string
default = "us-east-1c"
}
variable "private_s_2_cidr_block" {
type = string
default = "10.0.5.0/24"
}
variable "az_private_b" {
type = string
default = "us-east-1c"
}
variable "private_s_3_cidr_block" {
type = string
default = "10.0.6.0/24"
}
variable "az_private_c" {
type = string
default = "us-east-1c"
}
variable "main_sg_description" {
type = string
default = "Allow TLS inbound traffic and all outbound traffic"
}
variable "eks_node_group_profile" {
type = string
default = "eks-node-group-instance-profile_log"
}
variable "eks_cluster_role_name" {
type = string
default = "eksclusterrole_log"
}
variable "eks_node_group_role_name" {
type = string
default = "eks-node-group-role_log"
}
variable "eks_node_group_volume_policy_name" {
type = string
default = "eks-node-group-volume-policy"
}
variable "eks_describe_cluster_policy_name" {
type = string
default = "eks-describe-cluster-policy_log"
}
variable "tags_nat" {
type = string
default = "nat-gateway_eip"
}
variable "tags_k8s-nat" {
type = string
default = "k8s-nat"
}
Provider (provider.tf):
This step is essential in any Terraform project, as it defines the provider configuration, specifying which cloud platform or service you'll be interacting with. In this case, the provider is AWS, and this configuration establishes the connection between Terraform and the AWS environment. It ensures that all infrastructure resources are provisioned and managed within the specified cloud platform. Properly setting up the provider is foundational to the entire Terraform workflow, enabling seamless communication with AWS services such as EC2, S3, and EKS. Without it, Terraform wouldn't know where to deploy or manage the infrastructure.
touch provider.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# Configure the AWS Provider
provider "aws" {
region = var.region
access_key = var.aws_access_key_id
secret_key = var.aws_secret_access_key
}
IAM OpenID (oidc.tf):
This component sets up an IAM OpenID Connect (OIDC) provider, which plays a critical role in enabling secure authentication and identity management for your AWS resources. By establishing this OIDC provider, your Kubernetes clusters can seamlessly integrate with AWS IAM, allowing you to manage permissions and roles for applications running within the cluster. This is particularly important for securely granting temporary, limited access to AWS services like S3 or DynamoDB, without the need for hardcoding credentials. The OIDC provider facilitates trust between AWS IAM and external identity providers, enabling scalable, secure access control across your infrastructure.
touch oidc.tf
data "tls_certificate" "eks" {
url = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "eks" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
}
GitHub Actions Workflow setup: eks-setup.yaml
The eks-setup.yaml file is designed to automate the process of deploying an EKS cluster on AWS. This workflow streamlines the entire infrastructure setup, eliminating manual intervention and ensuring consistency across deployments.
Purpose:
This workflow automates the provisioning of AWS infrastructure, focusing on setting up an Amazon EKS cluster. By leveraging Terraform within GitHub Actions, it ensures that your EKS cluster is deployed efficiently and consistently, aligning with infrastructure-as-code best practices.
Steps:
AWS Login: Configures the AWS credentials required for secure authentication, ensuring Terraform has the proper access to interact with AWS services.
Terraform Initialisation: Initialises Terraform by downloading and configuring the necessary provider plugins (such as AWS), setting up the working environment to handle the infrastructure resources.
Terraform Plan: Generates a detailed execution plan, outlining the changes Terraform will make to the infrastructure, without actually applying those changes yet. This step helps verify the proposed updates.
Terraform Apply: Executes the Terraform configuration, applying the planned changes and provisioning the EKS cluster along with any related resources. This fully automates the creation of the Kubernetes control plane, networking, and worker nodes in AWS.
This workflow is essential for ensuring a repeatable, scalable deployment process while maintaining the flexibility to adjust infrastructure configurations based on changing requirements.
name: Set up EKS with Terraform
on: push
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
EKS_CLUSTER_NAME: ${{ secrets.EKS_CLUSTER_NAME }}
jobs:
LogInToAWS:
runs-on: ubuntu-latest
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
TerraformInit:
runs-on: ubuntu-latest
needs: LogInToAWS
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Initialize Terraform
run: terraform init
TerraformPlan:
runs-on: ubuntu-latest
needs: TerraformInit
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Terraform Plan
run: terraform plan
TerraformApply:
runs-on: ubuntu-latest
needs: TerraformPlan
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Apply Terraform configuration
run: terraform apply -auto-approve
Essentially, the workflow carries out several key tasks to automate the deployment process:
First, it checks out the latest version of your code from the repository, ensuring that the most up-to-date changes are available.
Next, it configures the necessary AWS credentials to securely authenticate with your cloud environment, allowing the workflow to interact with AWS services.
Finally, it initialises Terraform, setting up the working environment, and applies the EKS cluster configuration, provisioning the infrastructure based on the defined Terraform scripts. This ensures the EKS cluster is deployed correctly, ready to manage Kubernetes resources efficiently.
Lastly, we need to configure our control machine, which has kubectl installed, to communicate with the EKS cluster. This involves updating the machine with the appropriate cluster name and AWS region. By doing so, kubectl can interact with the cluster, allowing us to manage resources such as pods, services, and deployments. Without this step, the control machine won’t be able to send commands to the cluster, which is crucial for managing and monitoring our Kubernetes environment effectively.
aws eks update-kubeconfig --region region-code --name my-cluster
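For example, assuming you kept the defaults from variables.tf (cluster EKSCluster in us-east-1), the command and a quick connectivity check would look like this:
aws eks update-kubeconfig --region us-east-1 --name EKSCluster
# Confirm the control machine can reach the cluster
kubectl get nodes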
With our infrastructure now fully provisioned and operational, the next step is to set up our Flask application and deploy it onto the newly created environment. This involves configuring the necessary application dependencies, setting up environment variables, and ensuring the app is containerised for deployment. Once everything is configured, we can seamlessly deploy the Flask app to our infrastructure, leveraging the scalability and reliability of the EKS cluster to manage the application. This deployment marks the transition from infrastructure setup to delivering a functional, production-ready application.
Flask App and Docker Setup:
This guide will walk you through setting up a basic Flask application, configuring the necessary files, and preparing it for testing. We’ll also cover how to set up a Python virtual environment to manage dependencies and keep your project isolated.
Step 1: Set Up the Flask Application
- Create a new directory for your application
Start by creating a dedicated directory for your Flask project. This keeps everything organised:
mkdir regtech-docker-app
cd regtech-docker-app
- (Optional) Set Up a Python Virtual Environment
It’s highly recommended to use a Python virtual environment to isolate your project dependencies. This ensures that packages installed for this project won’t affect other Python projects:
python3 -m venv venv
source venv/bin/activate
Once the virtual environment is active, you'll see (venv) before your terminal prompt.
- Install Flask
Next, you need to install Flask. Before doing so, create your main application file:
touch app.py
Then install Flask using pip:
pip install Flask
- Create the Flask Application
Now, let's populate the app.py file with a basic Flask app. This app includes CSRF protection and loads configuration from a separate config.py file:
from flask import Flask
from flask_wtf.csrf import CSRFProtect
from config import Config
app = Flask(__name__)
app.config.from_object(Config)
csrf = CSRFProtect(app)
@app.route('/')
def hello():
return "Hello, Welcome to Zip Reg Tech!"
if __name__ == "__main__":
app.run(host="0.0.0.0", port=5000)
This sets up a simple route (/) that returns a greeting message when accessed.
- Create a requirements.txt File
To ensure all dependencies are properly documented, generate a requirements.txt file. This file will list all installed packages, including Flask:
pip freeze > requirements.txt
Now add the following dependencies:
blinker==1.8.2
click==8.1.7
Flask==3.0.3
itsdangerous==2.2.0
Jinja2==3.1.4
MarkupSafe==2.1.5
Werkzeug==3.0.4
pytest
requests
Flask-WTF
This step ensures that anyone who clones your repository can easily install all the required dependencies using pip install -r requirements.txt.
- Create the Configuration File
You’ll need a configuration file to manage environment-specific settings like the app’s secret key. Let’s create a config.py file:
touch config.py
Populate it with the following code:
# config.py
import os
import secrets
class Config:
SECRET_KEY = os.getenv('SECRET_KEY', secrets.token_urlsafe(32))
This file uses the environment variable SECRET_KEY if available; otherwise, it generates a secure token on the fly. You can set the environment variable in your Docker container or deployment environment.
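As a rough sketch of how you might supply that variable (the value shown is a placeholder, not a real secret):
# Locally, before starting the app
export SECRET_KEY="replace-with-a-long-random-value"
python app.py
# Or later, when running the containerised app
docker run -p 5100:5000 -e SECRET_KEY="replace-with-a-long-random-value" regtech-docker-app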
- Write Unit Tests for the Flask App
To ensure the application works as expected, let's add some unit tests. Create a test_app.py file:
touch test_app.py
Inside the file, add the following code to test the root endpoint (/):
from app import app
def test_hello():
with app.test_client() as client:
response = client.get('/')
assert response.data == b"Hello, Welcome to Zip Reg Tech!"
assert response.status_code == 200
This test simulates an HTTP request to the Flask app and verifies that the correct response is returned.
- Create Integration Tests
To simulate real-world conditions, we'll write an integration test that starts the Flask app, makes a request to it, and verifies the response. Create a test_integration.py file:
touch test_integration.py
Now add the following code for integration testing:
import requests
from app import app
import multiprocessing
import time
# Run Flask app in a separate process for integration testing
def run_app():
app.run(host="0.0.0.0", port=5000)
def test_integration():
# Start the Flask app in a background process
p = multiprocessing.Process(target=run_app)
p.start()
# Give the app a moment to start up
time.sleep(2)
# Make an HTTP request to the running Flask app
response = requests.get('http://localhost:5000/')
# Check that the response is as expected
assert response.status_code == 200
assert response.text == "Hello, Welcome to Zip Reg Tech!"
# Terminate the Flask app process
p.terminate()
This integration test launches the Flask app in a separate process and makes an HTTP request to the / route. It then verifies the status code and response body before terminating the app.
By following these steps, you have a fully functional Flask app with unit and integration tests. The next step will be setting up Docker for containerisation, ensuring your app is ready for deployment in any environment.
Step 2: Set Up Docker and Push the Image to Amazon Elastic Container Registry (ECR)
In this section, we'll create a Docker image for our Flask app, configure the necessary Docker files, and then push the image to Amazon Elastic Container Registry (ECR) for deployment.
- Create the Dockerfile
A Dockerfile is a script that defines how your Docker image is built. In the same directory as your Flask application, create a new file named Dockerfile and add the following content:
# Use an official Python runtime as a base image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Make port 5000 available to the world outside this container
EXPOSE 5000
# Define environment variable to prevent Python from buffering stdout/stderr
ENV PYTHONUNBUFFERED=1
# Run the application
CMD ["python", "app.py"]
Base Image: We are using python:3.9-slim as our base image. This is a lightweight version of Python that reduces the size of the final Docker image.
Working Directory: The WORKDIR /app command sets the working directory for subsequent instructions.
Copying Files: The COPY . /app command copies the current directory's contents (including your Flask app and config files) into the container.
Install Dependencies: The RUN pip install --no-cache-dir -r requirements.txt command installs all the necessary dependencies as specified in the requirements.txt file.
Expose Port: The EXPOSE 5000 command allows the Flask app to communicate over port 5000 from inside the container.
Set Environment Variable: ENV PYTHONUNBUFFERED=1 ensures that Flask logs are output in real time rather than being buffered.
Run the App: The CMD directive specifies that app.py will be executed when the container starts.
- Create the .dockerignore File
To keep your Docker image clean and avoid copying unnecessary files into the container, create a .dockerignore file in your project's root directory. This file works similarly to .gitignore, telling Docker which files to exclude from the build context.
Create the .dockerignore file with the following content:
venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
venv/: Prevents your local Python virtual environment from being included in the Docker image.
__pycache__/: Excludes Python cache files generated during development.
File extensions: Ignores compiled Python files such as .pyc, .pyo, and .pyd.
.Python: Excludes Python build files.
- Build and Test the Docker Image Locally
To make sure everything works, you can build and run your Docker image locally. In your project directory, run the following commands:
docker build -t regtech-docker-app .
docker run -p 5100:5000 regtech-docker-app
docker build: Creates the Docker image and tags it as regtech-docker-app.
docker run: Runs the container, forwarding port 5100 on your machine to port 5000 inside the container.
At this point, your Flask app should be running at http://localhost:5100.
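You can quickly verify the container responds as expected with a request from another terminal:
curl http://localhost:5100/
# Expected response: Hello, Welcome to Zip Reg Tech!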
- Create an Elastic Container Registry (ECR) Repository
Log in to the AWS Console:
Open the AWS Management Console and search for ECR (Elastic Container Registry).
Create a New Repository:
On the ECR landing page, click the Create repository button.
Give your repository a name (e.g., regtech-docker-app), then click Create.
Confirm Repository Creation: Once the repository is created, you’ll be redirected to a confirmation screen showing the repository details.
You'll see the repository listed in the console, confirming it was successfully created.
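If you prefer the command line, the same repository can be created and authenticated against with the AWS CLI; a sketch using the example name and region from this article (replace <ACCOUNT_ID> with your own account ID):
aws ecr create-repository --repository-name regtech-docker-app --region us-east-1
# Authenticate Docker against the registry before pushing images
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com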
Setup Kubernetes Deployment and Service:
In this section, we will configure a Kubernetes Deployment and a NodePort Service for our Dockerized Flask application. This setup will ensure that your Flask app runs smoothly and is accessible from outside the Kubernetes cluster.
- Create a Kubernetes Deployment and Service
A Deployment will manage the Flask application, making sure the desired number of Pods is always running. The NodePort Service will expose the Flask application on a specific port that can be accessed from outside the cluster: it maps a port on the Kubernetes nodes to the port where the Flask app is running inside the Pods.
Create a file named deploy.yml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
name: regtech-app-deployment
labels:
app: regtech-app
spec:
replicas: 1
selector:
matchLabels:
app: regtech-app
template:
metadata:
labels:
app: regtech-app
spec:
containers:
- name: regtech-app
image: REPOSITORY_TAG
ports:
- containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
name: regtech-app-service
spec:
type: NodePort
selector:
app: regtech-app
ports:
- protocol: TCP
port: 5000
targetPort: 5000
nodePort: 30030
Explanation of the Configuration
- Deployment:
Deploys a single replica of the Flask application.
Each Pod runs a container from your Docker image specified in REPOSITORY_TAG.
The container exposes port 5000, which is the port the Flask application listens on.
- Service:
Creates a NodePort Service that makes the Flask application accessible externally.
Maps port 5000 in the cluster to port 5000 on the Flask application container.
Exposes the application on NodePort 30030 (or a random port in the range 30000-32767 if not specified).
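Once the manifest has been applied by the pipeline described next, a few kubectl commands (using the resource names from deploy.yml above) let you sanity-check the result:
kubectl get deployment regtech-app-deployment
kubectl get service regtech-app-service
kubectl get pods -l app=regtech-app -o wide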
Deploying the Flask Application
The deployment of the Flask application is managed through a CI/CD pipeline, which automates the process of building the Docker image and deploying it to the Kubernetes cluster.
Required GitHub Secrets
You'll need to configure the following secrets in your GitHub repository:
AWS_ACCESS_KEY_ID: Your AWS access key.
AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
AWS_REGION: The AWS Region where your cluster resides.
EKS_CLUSTER_NAME: Your EKS Cluster name.
NEW_GITHUB_TOKEN: Your GitHub access token.
ORGANIZATION_KEY: Your SonarCloud organisation key.
PROJECT_KEY: Your SonarCloud project key.
SONAR_TOKEN: Your SonarCloud token.
SONAR_HOST_URL: The URL of your SonarQube server (for SonarCloud, this can be https://sonarcloud.io).
SNYK_TOKEN: The API token from Snyk.
Adding Secrets to GitHub:
Navigate to your GitHub repository.
Go to Settings > Secrets > Actions.
Add each of the secrets listed above (for example SONAR_TOKEN, SONAR_HOST_URL, and SNYK_TOKEN).
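If you use the GitHub CLI, the same secrets can be added from a terminal instead of the web UI; gh prompts for each value (shown here for a few of the secrets, as a sketch):
gh secret set AWS_ACCESS_KEY_ID
gh secret set AWS_SECRET_ACCESS_KEY
gh secret set SONAR_TOKEN
gh secret set SNYK_TOKEN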
CI/CD Pipeline Process (regtech-app.yaml)
Here’s a step-by-step breakdown of the CI/CD workflow:
- Linting and Static Analysis (SonarCloud):
- Ensures code quality and identifies potential issues.
- Unit and Integration Tests:
- Validates the functionality of your code before deployment.
- Security Scan (Snyk):
- Detects vulnerabilities in your code dependencies.
- Build Docker Image:
- Packages the Flask application into a Docker image.
- Push to Amazon ECR:
- Publishes the Docker image to Amazon Elastic Container Registry (ECR).
- Deploy to EKS:
- Deploys the Docker image to your EKS cluster using kubectl.
- Rollback:
- Automatically rolls back the deployment if it fails.
Here's the workflow configuration for regtech-app.yaml:
name: Deploy Flask App to EKS
on:
push:
branches:
- main
pull_request:
branches:
- main
env:
AWS_REGION: ${{ secrets.AWS_REGION }}
EKS_CLUSTER_NAME: ${{ secrets.EKS_CLUSTER_NAME }}
jobs:
# Step 1: Source Code Testing (Linting, Static Analysis, Unit Tests, Snyk Scan)
Lint-and-Static-Analysis:
name: Linting and Static Analysis (SonarQube)
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: SonarCloud Scan
uses: sonarsource/sonarcloud-github-action@master
env:
GITHUB_TOKEN: ${{ secrets.NEW_GITHUB_TOKEN }}
#ORGANIZATION_KEY: ${{ secrets.ORGANIZATION_KEY }}
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
with:
args: >
-Dsonar.organization=${{ secrets.ORGANIZATION_KEY }}
-Dsonar.projectKey=${{ secrets.PROJECT_KEY }}
-Dsonar.exclusions=venv/**
-Dsonar.c.file.suffixes=-
-Dsonar.cpp.file.suffixes=-
-Dsonar.objc.file.suffixes=-
- name: Check SonarCloud Quality Gate
run: |
curl -u ${{ secrets.SONAR_TOKEN }} "https://sonarcloud.io/api/qualitygates/project_status?projectKey=${{ secrets.PROJECT_KEY }}" | grep '"status":"OK"' || exit 1
UnitAndIntegrationTests:
name: Unit and Integration Tests on Source Code
runs-on: ubuntu-latest
needs: Lint-and-Static-Analysis
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3'
- name: Check Python version
run: python --version
- name: Clean up and recreate virtual environment
run: |
rm -rf venv
python3 -m venv venv
- name: Verify venv creation
run: ls -la venv/bin/
- name: Check Python executable path
run: |
which python3
- name: List directory contents
run: |
cd /home/runner/work/regtech_accessment_cicd
ls -la
- name: Install dependencies
run: |
cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
source venv/bin/activate
ls -la venv
pip3 install -r requirements.txt
- name: Run Unit Tests
run: |
cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
source venv/bin/activate
pytest test_app.py
- name: Run Integration tests
run: |
cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
source venv/bin/activate
pytest test_integration.py
SNYK-SCAN:
name: Dependency Scanning (Snyk)
runs-on: ubuntu-latest
needs: UnitAndIntegrationTests
steps:
- name: Checkout repository
uses: actions/checkout@master
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3'
- name: Check Python version
run: python --version
- name: Clean up and recreate virtual environment
run: |
rm -rf venv
python3 -m venv venv
- name: Check Python executable path
run: |
which python3
- name: List directory contents
run: |
cd /home/runner/work/regtech_accessment_cicd
ls -la
- name: Install dependencies
run: |
cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
source venv/bin/activate
ls -la venv
pip3 install -r requirements.txt
- name: Set up Snyk
uses: snyk/actions/python-3.10@master
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
with:
args: --severity-threshold=high
# Step 2: Build Docker Image
BuildImage-and-Publish-To-ECR:
name: Build and Push Docker Image
runs-on: ubuntu-latest
needs: SNYK-SCAN
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Login to ECR
uses: docker/login-action@v3
with:
registry: 611512058022.dkr.ecr.us-east-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_ID }}
password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
region: ${{ secrets.AWS_REGION }}
- name: Build Image
run: |
docker build -t regtech-app .
docker tag regtech-app:latest 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}
docker push 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}
# Step 3: Docker Image Testing (Integration Tests Inside Container)
Integration-Tests:
name: Integration Tests on Docker Image
runs-on: ubuntu-latest
needs: BuildImage-and-Publish-To-ECR
steps:
- name: Login to ECR
uses: docker/login-action@v3
with:
registry: 611512058022.dkr.ecr.us-east-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_ID }}
password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
region: ${{ secrets.AWS_REGION }}
- name: Pull Docker Image from ECR
run: |
docker pull 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}
- name: Run Integration Tests inside Docker Container
run: |
docker run --rm -v $(pwd):/results 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER} pytest --junitxml=/results/integration-test-results.xml
- name: List Files in Current Directory
run: |
ls -l
- name: Upload Integration Test Results
uses: actions/upload-artifact@v3
with:
name: integration-test-results
path: integration-test-results.xml
if-no-files-found: warn
# Step 4: Install Kubectl
Install-kubectl:
name: Install Kubectl on The Github Actions Runner
runs-on: ubuntu-latest
needs: Integration-Tests
steps:
- name: Install kubectl
run: |
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
# Step 5: Deploy To EKS
Deploy-To-Cluster:
runs-on: ubuntu-latest
needs: Install-kubectl
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Download KubeConfig File
env:
KUBECONFIG: ${{ runner.temp }}/kubeconfig
run: |
aws eks update-kubeconfig --region ${{ secrets.AWS_REGION }} --name ${{ secrets.EKS_CLUSTER_NAME }} --kubeconfig $KUBECONFIG
echo "KUBECONFIG=$KUBECONFIG" >> $GITHUB_ENV
echo $KUBECONFIG
- name: Deploy to EKS
run: |
sed -i "s|image: REPOSITORY_TAG|image: 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}|g" ./deploy.yml
kubectl apply -f ./deploy.yml
- name: Check Deployment Status
id: check-status
run: |
kubectl rollout status deployment.apps/regtech-app-deployment || exit 1
- name: Rollback Deployment
if: failure()
run: |
echo "Deployment failed. Rolling back..."
kubectl rollout undo deployment.apps/regtech-app-deployment
To determine which node your deployment is running on, follow these steps:
- First, get the details of your running pods, including the node they are scheduled on, by running the following command:
kubectl get po -o wide
This command will provide detailed information about the pods, including their IP addresses, node assignments, and the container status.
- Next, identify the public IP address of the node where your pod is running. To do this, list all the nodes in your cluster:
kubectl get nodes -o wide
You'll see a list of the nodes in your Kubernetes cluster, along with their corresponding external IPs.
- Once you've identified the public IP of the node, copy it. You'll combine this public IP with the NodePort assigned to your Flask application to access it externally.
- Finally, construct the URL by combining the public IP of the node with the NodePort (e.g., 30030 from the example). Enter this into your browser like so:
http://<PUBLIC_IP>:<NODE_PORT>
Hit enter, and if everything is set up correctly, your Flask app will be live and accessible from the browser.
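The same check can be done from the terminal; a sketch using the NodePort 30030 defined in deploy.yml (make sure the node security group allows inbound traffic on that port):
# List the external IPs of the worker nodes
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
# Hit the Flask app through the NodePort
curl http://<PUBLIC_IP>:30030/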
Monitoring with Prometheus and Grafana
After deploying your application, it's crucial to set up monitoring to track its performance, health, and any potential issues in real-time. In this guide, we’ll walk through setting up Prometheus for monitoring, leveraging Grafana for visualization later.
Prometheus
Prometheus is a powerful monitoring and alerting toolkit that collects and stores metrics from both your application and the Kubernetes cluster itself.
We will use Helm, the Kubernetes package manager, to deploy Prometheus. Follow the steps below:
- Create a namespace for Prometheus:
kubectl create namespace prometheus
- Add the Prometheus Helm chart repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
- Deploy Prometheus using Helm:
helm upgrade -i prometheus prometheus-community/prometheus --namespace prometheus --set alertmanager.persistence.storageClass="gp2" --set server.persistentVolume.storageClass="gp2"
This command installs or upgrades the Prometheus instance in the prometheus namespace, setting the storage class for both Alertmanager's and Prometheus Server's persistent volumes to gp2 (Amazon EBS General Purpose volumes).
- Verify the deployment:
kubectl get pods -n prometheus
At this point, some of your pods might be in a Pending state due to missing Amazon Elastic Block Store (EBS) volumes. This is because the Amazon EBS CSI (Container Storage Interface) driver is required to manage EBS volumes for Kubernetes. Let’s set up the driver.
Amazon EBS CSI Driver Setup
- Create an IAM OIDC identity provider for your cluster:
eksctl utils associate-iam-oidc-provider --cluster $cluster_name --approve
Ensure that eksctl is installed on your control machine for this step.
- Create an IAM role for the Amazon EBS CSI plugin:
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster my-cluster \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--role-only \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve
- Add the Amazon EBS CSI driver as an EKS add-on:
- To check the required platform version:
aws eks describe-addon-versions --addon-name aws-ebs-csi-driver
- To install the EBS CSI driver add-on using eksctl:
eksctl create addon --name aws-ebs-csi-driver --cluster my-cluster --service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole --force
Replace my-cluster with your actual cluster name and 111122223333 with your AWS account ID.
- Verify the installation of the EBS CSI driver:
eksctl get addon --name aws-ebs-csi-driver --cluster my-cluster
- Update the EBS CSI driver if needed.
eksctl update addon --name aws-ebs-csi-driver --version v1.11.4-eksbuild.1 --cluster my-cluster \
--service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole --force
- Re-check the pod status:
kubectl get pods -n prometheus
Your Prometheus pods should now be in the Running state.
- Label the Prometheus server pod:
To connect the Prometheus server pod with a service:
kubectl label pod <pod-name> app=prometheus
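To find the exact pod name to use in the command above, list the pods in the namespace; with this chart the server pod name typically starts with prometheus-server:
kubectl get pods -n prometheus | grep prometheus-server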
- Expose Prometheus via NodePort
Prometheus has a built-in web interface for accessing metrics. To access Prometheus externally, we will expose it via a NodePort service.
- Create a Prometheus service YAML file:
touch prometheus-service.yml
- Define the NodePort service configuration:
apiVersion: v1
kind: Service
metadata:
name: prometheus-nodeport
namespace: prometheus
spec:
selector:
app: prometheus
ports:
- name: web
port: 9090
targetPort: 9090
protocol: TCP
nodePort: 30000 # You can choose any available port on your nodes
type: NodePort
- Apply the service configuration:
kubectl apply -f prometheus-service.yml
- Access Prometheus:
Now you can access the Prometheus web UI by navigating to:
http://<NODE_PUBLIC_IP>:30000
Replace <NODE_PUBLIC_IP> with the public IP address of any node in your cluster.
By following these steps, we now have a fully functioning Prometheus setup monitoring both our Kubernetes cluster and application metrics. Next, we will integrate Grafana to create rich, visual dashboards for real-time analysis of the collected metrics.
Grafana: Visualising and Monitoring with Dashboards
Grafana is a visualisation tool that integrates with Prometheus to create dashboards. It also allows you to set up alerts for events like high CPU usage, memory spikes, or pod failures in your EKS cluster.
Here’s how to deploy Grafana on your EKS cluster using Helm:
- Add the Grafana Helm repository:
First, you'll need to add the Grafana Helm chart repository to Helm:
helm repo add grafana https://grafana.github.io/helm-charts
- Create a Grafana namespace:
Set up a dedicated namespace for Grafana:
kubectl create namespace grafana
- Create a Grafana YAML configuration file:
Grafana requires a configuration file to connect it to Prometheus. Create a file called grafana.yml with the following content:
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus-server.prometheus.svc.cluster.local
access: proxy
isDefault: true
Ensure the Prometheus URL is correctly specified to point to the Prometheus service in your cluster.
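You can confirm the service name and namespace behind that cluster-local URL before wiring Grafana to it:
kubectl get svc -n prometheus prometheus-server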
- Deploy Grafana using Helm:
Use Helm to deploy Grafana into your Kubernetes cluster. You can set up persistent storage for Grafana dashboards, configure the admin password, and provide the path to your grafana.yml file:
helm install grafana grafana/grafana \
--namespace grafana \
--set persistence.storageClassName="gp2" \
--set persistence.enabled=true \
--set adminPassword='EKS!sAWSome' \
--values /home/ec2-user/grafana.yml \
--set service.type=NodePort
Replace the adminPassword with a strong password of your choice, and ensure the path to your grafana.yml file is correct.
- Verify the deployment
Check if the Grafana pods are running successfully in the Grafana namespace:
kubectl get pods -n grafana
- Access Grafana:
Grafana will be exposed using a NodePort. You can access the Grafana UI using the public IP of any node in your cluster and the specified NodePort. For example:
http://<NODE_PUBLIC_IP>:30281
Replace <NODE_PUBLIC_IP> with the public IP address of one of your nodes. In this example the assigned NodePort is 30281, but it can vary based on your configuration.
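To look up the NodePort Kubernetes actually assigned to Grafana, you can query the service directly (assuming the Helm release and service are both named grafana, as in the install command above):
kubectl get svc -n grafana grafana -o jsonpath='{.spec.ports[0].nodePort}'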
- Login to Grafana:
Once you’ve accessed the Grafana web UI, log in using the credentials:
Username: admin
Password: The admin password you set during the Helm installation.
- Create a new dashboard:
After logging in, you can create your first dashboard. To make things easier, you can import a pre-built dashboard tailored for Kubernetes monitoring.
- Click "Create" → "Import" on the Grafana console.
- Import a pre-built Kubernetes dashboard:
On the "Find and Import Dashboards for Common Applications" section, input the dashboard ID 17119 and click "Load".
- Configure the data source:
Select Prometheus as the data source for the dashboard.
Click "Import" to load the dashboard.
- View your dashboard:
After importing the dashboard, you can now visualize the performance of your Kubernetes cluster. The dashboard will display real-time metrics such as CPU and memory usage, pod status, and more.
By following these steps, we have a Grafana instance running on your EKS cluster, integrated with Prometheus to collect and visualise metrics.
Conclusion
By integrating Terraform and GitHub Actions, we fully automate the setup and management of our AWS infrastructure and Kubernetes-based application deployments. This setup ensures:
Scalability: You can easily scale your infrastructure to meet demand.
Efficiency: Automating deployments speeds up the process, reduces errors, and makes your development workflow smoother.
Security: Following security best practices protects your application and data.
☕️ If this article helped you avoid a tech meltdown or gave you a lightbulb moment, feel free to buy me a coffee! It keeps my code clean, my deployments smooth, and my spirit caffeinated. Help fuel the magic here!.
Happy deploying!