Vedant Shrotria for LitmusChaos

Posted on Mar 14, 2022

Cloud Native Application & Testing Automation

#kubernetes #bash #productivity #devops

Hi Folks!!

In this article, we are going to discuss how to create an automated pipeline with GitHub Actions for a cloud-native application & use Cypress to test it.

If you are not very familiar with Cypress, you can also check my previous blog on End-To-End Testing Using Cypress.

Okay, before starting with automation & pipeline, let's talk about some of the things that should be considered while choosing an environment for testing...

1. How does your application work & what are the resources it needs to be fully functional?

Here, we should think about how our application works and what it does. The reason is that the application can be cloud-native or monolithic, and depending on that we might be able to run it with docker/docker-compose, or we might need a k8s cluster.

If we can't run that application using docker/docker-compose, we need to look into the Kubernetes testing tools available.

In our case, ChaosCenter is a cloud-native application, and it also has a chaos-agent component that needs to run as part of a k8s cluster. Simulating/mocking functionality is good for unit/integration testing, but for end-to-end testing we should keep our testing environment as realistic as possible.

As we are focusing on E2E testing and our application is cloud-native, we chose to deploy it on a Kubernetes cluster.

2. Will the testing environment be transient or always running & which tool should we use to provision it?

Well, having a test environment that is always running helps to debug pipeline issues faster, but because it is always running, it is also more costly. That cost can be reduced by using spot instances on cloud providers like AWS. (If you are using them :-) )

On the other side, if we automate the creation/deletion of the test environment as a part of the pipeline, the cost goes down automatically, but we pay in the time it takes to create/delete the environment on every run. As for debugging, we can create another pipeline that automates the creation of the environment and the testing but doesn't tear down the setup, so that we can debug later and delete the environment manually.

With the above considerations, we decided to use a transient k8s cluster as our test environment. There are many tools like Minikube, KIND, K3s, K3d & others which can be used as per the requirements/use-case. In case you are using cloud providers, tools like eksctl, gcloud, az, Terraform, Ansible & many others can also be used for the same.

After analyzing some of the tools mentioned above, we chose to go with K3d. The main reason is that we can create LoadBalancer services very easily in a K3d cluster without any extra configuration, which also lets us keep the Ingress configuration as close as possible to what cloud providers offer. This helped us when we wanted to test our application over LoadBalancer & Ingress.

Okay okay, no more theory, let's jump into the steps....

Automation of Creation/Deletion of Clusters

As we discussed above, we don't want to have our testing cluster always running. So we need to automate the same as part of our pipeline.

As we are using GitHub Actions workflows, we can use the existing GitHub Action for K3d, AbsaOSS/k3d-action@v2, to create a transient cluster.

We can add this action as a step in our testing job as below.

- uses: AbsaOSS/k3d-action@v2
  name: Create 1st Cluster
  with:
    cluster-name: My-Test-Cluster
    args: >
       --agents 3
       --k3s-arg "--no-deploy=traefik,metrics-server@server:*"

This will create a local cluster on the GitHub runner VM, and the cluster will be destroyed once the pipeline is completed.

Now that we have created the cluster, let's also add one more step for basic checks of the cluster configuration.
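To debug cluster creation outside the pipeline, roughly the same cluster can be created locally with the k3d CLI. The sketch below is a minimal example, assuming k3d is installed; the function names and the lowercase cluster name `my-test-cluster` are illustrative and not part of the original pipeline.

```shell
#!/bin/bash
# Illustrative helpers (not from the original pipeline) for reproducing
# the CI cluster on a local machine with the k3d CLI.

create_test_cluster() {
    local name=${1:-my-test-cluster}   # hypothetical default name
    local agents=${2:-3}               # same agent count as the workflow step
    k3d cluster create "$name" --agents "$agents"
}

delete_test_cluster() {
    local name=${1:-my-test-cluster}
    k3d cluster delete "$name"
}
```

Calling `create_test_cluster` and later `delete_test_cluster` mirrors the create/destroy lifecycle that the pipeline gets for free from the runner.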

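- name: Configuring and Testing the Cluster Installation
  run: |
    # k3d names the kubeconfig context "k3d-<cluster-name>", so SELF_AGENT
    # must hold the cluster name passed to the k3d action above
    kubectl cluster-info --context k3d-${{ env.SELF_AGENT }}
    kubectl get nodes
    kubectl get pods -n kube-system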

This will do basic checks like listing the nodes & the pods in the kube-system namespace. (You don't want to create a cluster locally just to check why it is not getting created in the pipeline, right?)

Application Setup

Well, we have created our test cluster & also added some validation checks. Let's automate the installation of our application in the pipeline.

For installing ChaosCenter, we use a k8s manifest which has all the components declared. We could have simply added 2 commands to the pipeline for this -

curl https://raw.githubusercontent.com/litmuschaos/litmus/master/litmus-portal/cluster-k8s-manifest.yml --output litmus-portal-setup.yml

kubectl apply -f litmus-portal-setup.yml

The above approach would have worked, but it's not flexible or scalable, right? What if more commands are required for the installation? I mean, we don't want to make our pipeline configuration very complex by adding a long list of commands, right?

We can use a bash script for the same. This will help in isolating all instructions related to the installation of the application. Also, future instructions can be added to the same script.

#!/bin/bash
# application-installation.sh

function install() {

    echo -e "\n---------------Installing Litmus-Portal in Cluster Scope----------\n"
    curl https://raw.githubusercontent.com/litmuschaos/litmus/master/litmus-portal/cluster-k8s-manifest.yml --output litmus-portal-setup.yml

    kubectl apply -f litmus-portal-setup.yml
}

install

Now, we can add a step in our pipeline workflow for executing this bash script.

- name: Deploying Litmus-Portal using **k8s-manifest**
  run: |
    chmod 755 ./litmus/application-installation.sh
    ./litmus/application-installation.sh

Now, coming to the issue here: if you have worked with a cloud-native application, you know it takes time for all pods to reach the Running state. So, we need some logic to wait for all pods to be ready before starting the tests, otherwise the tests will fail because the application is not ready.

We will extend our installation script to wait for all pods to be ready and to run some validation checks on the application, like we did for our k8s cluster.

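#!/bin/bash
# application-installation.sh
# wait_for_pods and the verify_* helpers used below are custom functions from
# the utility script linked at the end of this post; source it before calling them.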

function install() {

    echo -e "\n---------------Installing Litmus-Portal in Cluster Scope----------\n"
    curl https://raw.githubusercontent.com/litmuschaos/litmus/master/litmus-portal/cluster-k8s-manifest.yml --output litmus-portal-setup.yml

    kubectl apply -f litmus-portal-setup.yml
}

function wait_for_portal_to_be_ready(){

    echo -e "\n---------------Pods running in litmus Namespace---------------\n"
    kubectl get pods -n litmus

    echo -e "\n---------------Waiting for all pods to be ready---------------\n"
    # Waiting for pods to be ready (timeout - 360s)
    wait_for_pods litmus 360

    echo -e "\n------------- Verifying Namespace, Deployments, pods and Images for Litmus-Portal ------------------\n"
    # Namespace verification
    verify_namespace litmus

    # Deployments verification
    verify_all_components litmusportal-frontend,litmusportal-server litmus

    # Pods verification
    verify_pod litmusportal-frontend litmus
    verify_pod litmusportal-server litmus
    verify_pod mongo litmus
}

install
wait_for_portal_to_be_ready

The above bash script will install our application and make sure that it is ready for testing by validating the status of the pods as well as the existence of resources like pods/deployments/namespaces.

If you are testing a static website/application, you can use
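The helpers called in the script are not shown in this post. As a rough idea of what the waiting step can look like, here is a minimal sketch of a wait_for_pods-style function built on `kubectl wait`. It is a hypothetical stand-in, not the project's actual helper, and it assumes the pods already exist in the namespace when it is called.

```shell
#!/bin/bash
# Hypothetical sketch of a wait_for_pods helper; the real one lives in the
# project's utility script and may work differently.
wait_for_pods() {
    local namespace=$1
    local timeout=${2:-360}   # seconds

    # kubectl wait blocks until every pod reports the Ready condition
    # and exits non-zero if the timeout expires first.
    if kubectl wait --for=condition=Ready pods --all \
        -n "$namespace" --timeout="${timeout}s"; then
        echo "All pods in namespace '$namespace' are ready"
    else
        echo "Pods in namespace '$namespace' not ready after ${timeout}s" >&2
        # Print the pod list to make the failure easier to debug in CI logs
        kubectl get pods -n "$namespace"
        return 1
    fi
}
```

Returning a non-zero status lets the calling script, and therefore the pipeline step, fail early instead of running tests against a half-started application.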

npm start &

And for docker-compose based applications -

docker-compose up &

Accessing the Application

Till now, we have automated the setup of the Kubernetes cluster as well as our application. The next step is to automate the part where we get the access URL for the application & use it for E2E testing.

If you have already worked with k8s, you probably know that we can access an application on k8s using multiple types of services or even using port-forwarding.

Let's start with port-forwarding -

So, for accessing our application using port-forwarding, we can use this command -

kubectl port-forward svc/litmusportal-frontend-service 9091:9091 -n litmus

This will open a port-forwarded pipe from port 9091 on the host machine to port 9091 of the frontend service in the k8s cluster (where our frontend is listening) and allow us to access our frontend on port 9091 on localhost.

We can test the same by opening http://localhost:9091 & checking if we can access our frontend on the browser.

Well, this will work fine, but there is one issue.

If you run the above command, the terminal stays blocked until we stop the port-forwarding. In a pipeline, the step will hang because this command never completes.

That's where the ampersand symbol (&) comes to help. If we append & to the same command, the port-forwarding process starts in the background & the shell continues with the next commands.

Well, we added the same in our pipeline & it started working fine for us.
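A backgrounded port-forward needs a moment before it accepts connections, so it helps to wait for the URL before starting the tests. The sketch below is a minimal example of that idea; `wait_until_reachable`, `start_port_forward` and `PORT_FORWARD_PID` are hypothetical names, and the retry counts are arbitrary defaults.

```shell
#!/bin/bash
# Hypothetical sketch: start port-forwarding in the background and only
# continue once the frontend answers. Not the project's actual helper.

# Poll a URL until it responds successfully or the attempts run out.
wait_until_reachable() {
    local url=$1
    local retries=${2:-30}
    local delay=${3:-2}
    local attempt
    for ((attempt = 1; attempt <= retries; attempt++)); do
        if curl --silent --fail --output /dev/null "$url"; then
            echo "$url is reachable (attempt $attempt)"
            return 0
        fi
        sleep "$delay"
    done
    echo "$url not reachable after $retries attempts" >&2
    return 1
}

# Run the port-forward in the background, remember its PID so it can be
# stopped later, and fail fast if the frontend never comes up.
start_port_forward() {
    kubectl port-forward svc/litmusportal-frontend-service 9091:9091 -n litmus &
    PORT_FORWARD_PID=$!
    if ! wait_until_reachable "http://localhost:9091"; then
        kill "$PORT_FORWARD_PID" 2>/dev/null
        return 1
    fi
}
```

Keeping the PID around also makes it possible to stop the forwarder explicitly with `kill "$PORT_FORWARD_PID"` once the tests are done.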

Running Tests with generated Access Endpoint -

Finally, this part is easy with Cypress. As we are using GitHub Actions, Cypress also provides a GitHub Action, which can be used in the pipeline workflow very easily. Let's take a look at the step for the same.

- name: Running basic tests (Login and Onboarding Tests)
  uses: cypress-io/github-action@v2
  continue-on-error: false
  with:
    spec: cypress/integration/Basic_Setup/**/*.spec.js
    working-directory: Cypress/
    config-file: cypress.prod.json
    env: true
  env:
    CYPRESS_BASE_URL: ${{ env.URL }}

If we closely look at the above snippet, we can see that we are using cypress-io/github-action@v2 provided by Cypress & we are also providing some configurations.

Here,

spec - The test files we want to run; we can give an exact path or a glob pattern.
working-directory - Root directory where Cypress is installed or set up.
config-file - Configuration file we want to use for testing. This is beneficial when we have multiple environments we want to test. In our case, we provide 2 modes of installation of LitmusChaos, so we created 2 different configurations for the same.

Lastly, the environment variables: we are providing the base URL as the env variable CYPRESS_BASE_URL. This way we don't hardcode the access endpoint in our tests but take it from the environment, so the same tests can be used for different environments.
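The `${{ env.URL }}` value read in the step above has to be written by an earlier step of the same job. A minimal sketch of how that can be done from a bash script is shown here; `publish_access_url` is a hypothetical helper name, while appending `NAME=value` lines to the file at `$GITHUB_ENV` is the standard GitHub Actions mechanism.

```shell
#!/bin/bash
# Hypothetical helper: expose the access URL to later workflow steps.
publish_access_url() {
    local url=$1
    if [[ -z "${GITHUB_ENV:-}" ]]; then
        echo "GITHUB_ENV is not set; are we running inside GitHub Actions?" >&2
        return 1
    fi
    # Lines of the form NAME=value appended to the file at $GITHUB_ENV become
    # environment variables for the following steps of the same job.
    echo "URL=$url" >> "$GITHUB_ENV"
}
```

With port-forwarding, for example, the call would be `publish_access_url "http://localhost:9091"`.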

With this setup, we have a minimal but complete automated testing pipeline.

Now, as a bonus: I said at the start that we use a K3d cluster, so let's also look at why we chose to do so.

As we were using port-forwarding for accessing our frontend in the pipeline, we faced one issue: our tests were a little flaky. After looking into it, we found the culprit. The tests were fine, but the port-forwarding pipe was sometimes getting closed in between. That was because of large chunks getting transferred through the pipe while testing through the frontend. The port-forwarding pipe has a timeout limit as well as a limit on the size of the data/chunks getting transferred. Take a look at this issue for more details.

That was when we moved away from port-forwarding & started looking into other approaches like accessing the application through NodePort/LoadBalancer/Ingress from the k8s cluster while testing.

Now, for automating the above, we created custom functions in the bash script itself and used the same.

# Function to get Access point of ChaosCenter based on Service type(mode) deployed in given namespace
function get_access_point(){
    namespace=$1
    accessType=$2

    if [[ "$accessType" == "LoadBalancer" ]];then

        kubectl patch svc litmusportal-frontend-service -p '{"spec": {"type": "LoadBalancer"}}' -n ${namespace}
        export loadBalancer=$(kubectl get services litmusportal-frontend-service -n ${namespace} -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
        wait_for_pods ${namespace} 360
        wait_for_loadbalancer litmusportal-frontend-service ${namespace}
        export loadBalancerIP=$(kubectl get services litmusportal-frontend-service -n ${namespace} -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
        export AccessURL="http://$loadBalancerIP:9091"
        wait_for_url $AccessURL
        echo "URL=$AccessURL" >> $GITHUB_ENV

    elif [[ "$accessType" == "Ingress" ]];then

        setup_ingress ${namespace}
        # Ingress IP for accessing Portal
        export AccessURL=$(kubectl get ing litmus-ingress -n ${namespace} -o=jsonpath='{.status.loadBalancer.ingress[0].ip}' | awk '{print $1}')
        echo "URL=http://$AccessURL" >> $GITHUB_ENV

    else 
        # By default NodePort will be used. 
        export NODE_NAME=$(kubectl -n ${namespace} get pod  -l "component=litmusportal-frontend" -o=jsonpath='{.items[*].spec.nodeName}')
        export NODE_IP=$(kubectl -n ${namespace} get nodes $NODE_NAME -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
        export NODE_PORT=$(kubectl -n ${namespace} get -o jsonpath="{.spec.ports[0].nodePort}" services litmusportal-frontend-service)
        export AccessURL="http://$NODE_IP:$NODE_PORT"
        echo "URL=$AccessURL" >> $GITHUB_ENV

    fi
}
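The LoadBalancer branch relies on a helper that waits until the service has an external IP. A minimal sketch of such a wait loop is shown below; `wait_for_loadbalancer_ip` is a hypothetical name, it is not the project's `wait_for_loadbalancer`, and the retry count and delay are arbitrary defaults.

```shell
#!/bin/bash
# Hypothetical sketch: poll a Service until its LoadBalancer ingress IP is
# assigned, then print the IP. The project's own helper may differ.
wait_for_loadbalancer_ip() {
    local service=$1
    local namespace=$2
    local retries=${3:-30}
    local delay=${4:-5}
    local ip=""
    local attempt
    for ((attempt = 1; attempt <= retries; attempt++)); do
        # The jsonpath output stays empty until an IP has been assigned
        ip=$(kubectl get svc "$service" -n "$namespace" \
            -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
        if [[ -n "$ip" ]]; then
            echo "$ip"
            return 0
        fi
        sleep "$delay"
    done
    echo "No external IP for $service after $retries attempts" >&2
    return 1
}
```

Because the function prints the IP, a caller could capture it directly, for example `loadBalancerIP=$(wait_for_loadbalancer_ip litmusportal-frontend-service litmus)`.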

Now, we call this function in our installation script: the script takes accessType from an env variable & passes it to this function.

If you are interested in learning more about the custom functions used in the above bash script, you can take a look at this utility bash script. The complete pipeline configuration can be found here.
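As a rough illustration of reading the access type from the environment, the sketch below validates the value before it is passed on. The variable name `ACCESS_TYPE` and the function `resolve_access_type` are assumptions made for this example; the post does not name the actual variable.

```shell
#!/bin/bash
# ACCESS_TYPE is a hypothetical variable name; the post only says that
# accessType is taken from the environment.
resolve_access_type() {
    local access_type=${ACCESS_TYPE:-NodePort}   # NodePort is the default mode
    case "$access_type" in
        NodePort|LoadBalancer|Ingress)
            echo "$access_type"
            ;;
        *)
            echo "Unsupported access type: $access_type" >&2
            return 1
            ;;
    esac
}
```

A call such as `get_access_point litmus "$(resolve_access_type)"` would then pass the validated value along. Rejecting unknown values is a design choice of this sketch: the function above silently falls back to NodePort instead, which can hide a typo in the workflow configuration.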

Conclusion

Feel free to check out our ongoing project - Chaos Center and do let us know if you have any suggestions or feedback regarding the same. You can always submit a PR if you find any required changes.

Make sure to reach out to us if you have any feedback or queries. Hope you found the blog informative!

If chaos engineering is something that excites you or if you want to know more about cloud-native chaos engineering, don’t forget to check out our Litmus website, ChaosHub, and the Litmus repo. Do leave a star if you find it insightful. 😊

I would love to invite you to our community to stay connected with us and get your Chaos Engineering doubts cleared.
To join our Slack, please follow these steps!

Step 1: Join the Kubernetes slack using the following link: https://slack.k8s.io/

Step 2: Join the #litmus channel on the Kubernetes slack or use this link after joining the Kubernetes slack: https://slack.litmuschaos.io/

Cheers!
