DEV Community

vivekpophale
How to do Canary Deployments on EKS

Overview
Testing a new feature or upgrade in production is a challenging task. It is important to roll out changes frequently, but without affecting the end-user experience. A canary deployment lets us test changes on real traffic while keeping the ability to roll them back quickly in the event of any unforeseen issues.
When you add a canary deployment to a Kubernetes cluster, it is managed by a service through selectors and labels: the service routes traffic to the pods carrying a specific label, which makes it easy to add or remove deployments.
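As a sketch of that mechanic, using the labels from the manifests later in this article: a selector that matches only the shared `app` label spreads traffic across both versions, while adding the `version` label pins it to one of them.

```yaml
# Service selector matching both stable and canary pods
# (traffic then splits roughly by replica count):
selector:
  app: my-app

# Service selector pinned to the stable version only:
selector:
  app: my-app
  version: v1.0.0
```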


How Canary Deployments Work

Canary deployments involve running two versions of the application simultaneously: the old version, referred to as "the stable," and the new one, "the canary."

Here's a step-by-step explanation of how canary deployment works:

Initial Deployment

The existing version of the software is currently running in the production environment.
Developers create a new version or release with updates, bug fixes, or new features.

Deployment to a Subset (Canary Group)

Instead of deploying the new version to the entire user base, it is first released to a small subset of users or servers. This subset is often referred to as the "canary group."
The canary group typically represents a small percentage of the overall user base, allowing for a controlled and gradual release.
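With a plain Kubernetes Service there is no explicit traffic-percentage setting; the split simply follows the ratio of ready pods. A back-of-the-envelope check (a sketch, not specific to EKS):

```shell
# One Service selecting both versions splits traffic roughly by
# ready-pod count: 1 canary pod beside 9 stable pods gets ~10% of requests.
canary=1
stable=9
share=$(( canary * 100 / (canary + stable) ))
echo "canary receives ~${share}% of traffic"
```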

Monitoring and Testing

The performance, stability, and functionality of the new version are closely monitored within the canary group.
Automated testing and monitoring tools are often used to detect issues such as errors, crashes, or performance degradation.

Incremental Rollout

If the new version proves to be stable and performs well within the canary group, the deployment is gradually expanded to include a larger percentage of users.
This incremental rollout continues until the new version is deployed to the entire user base.

Rollback or Remediation

If issues are detected during the canary deployment, developers can quickly roll back the changes or implement fixes before the wider rollout.
This provides a safety net to minimize the impact of potential problems on the entire user base.
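In the Kubernetes setup used later in this article, rolling back amounts to removing the canary Deployment, after which the Service selector matches only the stable pods. A sketch with a dry-run guard (the deployment names are taken from this article's manifests):

```shell
# DRY_RUN=1 only prints the commands; unset it to execute them
# against a real cluster.
DRY_RUN=1
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

run kubectl delete deploy my-app-v2               # remove the canary pods
run kubectl scale --replicas=10 deploy my-app-v1  # restore full stable capacity
```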
Completion

Once the new version has been successfully deployed to the entire user base and no significant issues are detected, the canary deployment process is complete.

Canary Deployments in Kubernetes

Essentially, a canary deployment runs a near-identical copy of the production environment, with a load balancer routing user traffic between the available environments based on defined parameters.

The canary deployment is controlled by services using selectors and labels. The service forwards traffic to the labeled Kubernetes pods, making it simple to add or remove deployments.

First, a specific percentage of users is directed to the new application. The idea is to gradually roll out the new version to a subset of users or nodes, monitor its performance and stability, and then progressively deploy it to the entire system if everything looks good. This approach helps catch potential issues early and allows for quick rollbacks if problems arise.

For canary deployments, the selectors and labels used in the config or YAML file differ from those used in the original deployment.

A service is created to expose all created pods or replicas through a single IP or name. An ingress configuration then sets a collection of rules allowing inbound connections to reach the cluster's services.
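A minimal Ingress sketch forwarding to the `my-app` Service defined later in this article; the host and ingress class below are hypothetical placeholders, not part of the original setup.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
spec:
  ingressClassName: alb            # hypothetical: e.g. the AWS Load Balancer Controller
  rules:
  - host: my-app.example.com       # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
```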

Why EKS

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that eliminates the need to install, operate, and maintain your own Kubernetes control plane on Amazon Web Services (AWS).

Features of Amazon EKS

The following are key features of Amazon EKS:

Secure networking and authentication
Amazon EKS integrates your Kubernetes workloads with AWS networking and security services. It also integrates with AWS Identity and Access Management (IAM) to provide authentication for your Kubernetes clusters.

Easy cluster scaling
Amazon EKS enables you to scale your Kubernetes clusters up and down easily based on the demand of your workloads. Amazon EKS supports horizontal Pod autoscaling based on CPU or custom metrics, and cluster autoscaling based on the demand of the entire workload.
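For example, horizontal Pod autoscaling can be expressed as a HorizontalPodAutoscaler targeting the stable Deployment used later in this article (a sketch; the thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-v1
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```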

Managed Kubernetes experience
You can make changes to your Kubernetes clusters using eksctl, the AWS Management Console, the AWS Command Line Interface (AWS CLI), the API, kubectl, and Terraform.

High availability
Amazon EKS provides high availability for your control plane across multiple Availability Zones.

Integration with AWS services
Amazon EKS integrates with other AWS services, providing a comprehensive platform for deploying and managing your containerized applications. You can also more easily troubleshoot your Kubernetes workloads with various observability tools.

Reference: https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html

Prerequisites:

  • Kubernetes cluster set up and configured.
  • kubectl command-line tool installed.

Environment setup
I created an EKS cluster using the eksctl command-line utility with the details below.

  1. Cluster version is 1.27.
  2. Region ap-south-1.
  3. Node type t3.medium.
  4. Number of nodes 3.
eksctl create cluster --name my-demo-cluster --version 1.27 --region ap-south-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --managed
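Once the cluster is up, point kubectl at it. A sketch with a dry-run guard (running it for real requires configured AWS credentials):

```shell
# DRY_RUN=1 only prints the commands; unset it to execute them.
DRY_RUN=1
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

run aws eks update-kubeconfig --region ap-south-1 --name my-demo-cluster
run kubectl get nodes   # should list the 3 t3.medium worker nodes
```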

Architecture


Steps to follow

  1. 10 replicas of version 1 are serving traffic.
  2. Deploy 1 replica of version 2 and scale version 1 down to 9 replicas, so the canary receives roughly 10% of traffic.
  3. Wait to confirm that version 2 is stable and not throwing unexpected errors.
  4. Scale version 2 up to 10 replicas.
  5. Wait until all instances are ready.
  6. Shut down version 1.

Actual implementation

Deploy the first application

kubectl apply -f app-v1.yaml

(https://github.com/vivekpophale/canaryexample/blob/main/appv1.yml)

#app-v1.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v1
  labels:
    app: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
      version: v1.0.0
  template:
    metadata:
      labels:
        app: my-app
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9101"
    spec:
      containers:
      - name: my-app
        image: containersol/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        - name: probe
          containerPort: 8086
        env:
        - name: VERSION
          value: v1.0.0
        livenessProbe:
          httpGet:
            path: /live
            port: probe
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: probe
          periodSeconds: 5

Deploy the service

kubectl apply -f service.yaml

(https://github.com/vivekpophale/canaryexample/blob/main/service.yml)

#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: my-app

Test if the deployment was successful


To see the deployment in action, open a new terminal and run a watch command. It will give you a better view of the progress:

watch kubectl get po


Then deploy version 2 of the application and scale down version 1 to 9 replicas at the same time:

kubectl apply -f app-v2.yaml
kubectl scale --replicas=9 deploy my-app-v1

(https://github.com/vivekpophale/canaryexample/blob/main/appv2.yml)

#app-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v2
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      version: v2.0.0
  template:
    metadata:
      labels:
        app: my-app
        version: v2.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9101"
    spec:
      containers:
      - name: my-app
        image: containersol/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        - name: probe
          containerPort: 8086
        env:
        - name: VERSION
          value: v2.0.0
        livenessProbe:
          httpGet:
            path: /live
            port: probe
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: probe
          periodSeconds: 5


Only one pod with the new version should be running.

You can test whether the second deployment was successful.


If you are happy with it, scale version 2 up to 10 replicas:

kubectl scale --replicas=10 deploy my-app-v2


Then, when all pods are running, you can safely delete the old deployment

kubectl delete deploy my-app-v1

Conclusion

This demo illustrated the benefits of a canary deployment: it lets you capacity-test a new version in the production environment with a safe rollback strategy if issues are found. By slowly ramping up the load, you can monitor and capture metrics on how the new version impacts the production environment. This is an alternative to creating an entirely separate capacity-testing environment, because the canary's environment is as production-like as it can be.

Reference: https://martinfowler.com/bliki/CanaryRelease.html?ref=wellarchitected
