This is a quick tutorial on how to get started with LitmusChaos 2.0 in Azure Kubernetes Services. We will first create an AKS Cluster, followed by Installing LitmusChaos 2.0 on the cluster and then executing a simple pre-defined chaos workflow using LitmusChaos.
LitmusChaos is a toolset to do cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes to help SREs find weaknesses in their deployments. SREs use Litmus to run chaos experiments initially in the staging environment and eventually in production to find bugs, vulnerabilities. Fixing the weaknesses leads to increased resilience of the system.
Litmus takes a cloud-native approach to create, manage and monitor chaos. Chaos is orchestrated using the following Kubernetes Custom Resource Definitions (CRDs):
ChaosEngine: A resource to link a Kubernetes application or Kubernetes node to a ChaosExperiment. ChaosEngine is watched by Litmus’ Chaos-Operator which then invokes Chaos-Experiments
ChaosExperiment: A resource to group the configuration parameters of a chaos experiment. ChaosExperiment CRs are created by the operator when experiments are invoked by ChaosEngine.
ChaosResult: A resource to hold the results of a chaos experiment. The Chaos-exporter reads the results and exports the metrics into a configured Prometheus server.
If you feel lazy to install them, you can always use the Azure Cloud Shell, it already has the tools installed.
The first step to installing LitmusChaos on an AKS Cluster is to have an AKS Cluster. So let’s do that. Open Azure Portal and then log in with your account. You will be presented with the home screen. Now search for Kubernetes services and open it.
To create a cluster, click on create Create option in the menu and then select Create a Kubernetes cluster
Now you have to fill in details about what kind of cluster you want to create. Since Azure doesn’t charge for cluster management, you will only have to pay for the Node Instance you will be running. Fill in the name of the cluster, it can be anything, also create a new Resource Group if you haven’t. You can keep other settings as it is, or if you know what they do, can change it according to your need. For the Node Pool, select a B2ms size that has 2 vCPUs and 8 GiB of RAM and set the Node Count to 1 as we only want to run LitmusChaos, this will suffice for it. Although you are free to choose your configuration, keeping a minimum of 2 vCPUs and 8 GiB of RAM will help in seamless running. Remember to check that the Scale Method is set to Manual to keep a check on the cost.
You can skip the rest of the configurations for now and directly click on Review + Create which will start the creation of Cluster. It will take around 5–10 minutes, so you can sit back for some time, grab a glass of water, also read about ChaosEngineering and LitmusChaos 2.0
The cluster is ready and you can now install LitmusChaos on it. You can use the Azure Cloud Shell or your local system terminal to connect to the Cluster, the steps are the same for both. I personally prefer using my local system so I will use that for this tutorial.
Open your cluster and click on the Connect button, this will show you two commands to run. Copy the two commands and run them one by one. The first command sets the account as per the subscription id provided, and the second command fetches the credentials for the specific resource.
Now you have the credentials to access the Cluster, you can go ahead and install LitmusChaos 2.0 and do some chaos. For installation, I will be following their docs. There are two ways to install, one is by using helm, other is by applying the manifest file. I will follow the helm repo procedure, you can follow the other one if you want by going through their docs.
Note: You will need to have Helm installed on your system. You can refer from here
First, you will add the LitmusChaos Helm repository and then confirm that litmuschaos is present in the helm repository
helm repo add litmuschaos [https://litmuschaos.github.io/litmus-helm/](https://litmuschaos.github.io/litmus-helm/) helm repo list
Next, you will create the namespace, by default we use *litmus *as the namespace name, you are allowed to use any name of your choice, just remember to change it in the following commands.
kubectl create ns litmus
Now, let’s install LitmusChaos using the helm repository you just added.
helm install chaos litmuschaos/litmus-2–0–0-beta --namespace=litmus --devel --set portalScope=namespace
Note: If you are using helm2, you will have to run this command
helm install --name chaos litmuschaos/litmus-2–0–0-beta --namespace=litmus --devel --set portalScope=namespace
The final step is to install the LitmusChaos CRDs
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/litmus/master/litmus-portal/litmus-portal-crds.yml
Let’s verify that all the services are running, and there has been no issue
The services are running properly but there is one more change that you need to do, since AKS doesn’t provide public-IP to nodes by default, we need to change the litmusportal-frontend-service to a LoadBalancer service. You can do that by editing the service.
kubectl edit svc litmusportal-frontend-service -n litmus
At the very end inside spec there is type: NodePort, you have to change it to type: LoadBalancer
spec: clusterIP: xxxxxxx externalTrafficPolicy: Cluster ports: — name: http nodePort: xxxxx port: 9091 protocol: TCP targetPort: 8080 selector: app.kubernetes.io/component: litmus-2–0–0-beta-frontend sessionAffinity: None # Change the type here from NodePort to LoadBalancer type: LoadBalancer
Then save it, and list the services again. The External-IP might show pending for a minute, run the command again after a minute to get the IP.
Change the with what is showing to you for the litmusportal-frontend-service and then visit the address in your browser.
Ta-da! We are done with the installation of LitmusChaos 2.0 and now you can run a workflow. Login to the portal, the default credentials are
username: admin password: litmus
It will ask you to set a new password, and then log in to the dashboard.
Note: Other than changing the frontend service to LoadBalancer, there is another way to make it work with NodePort by enabling public IP for individual nodes. I have not covered it in this article, but feel free to check it out here.
Let’s see some chaos happening now. LitmusChaos comes with few predefined workflows, which setups a service and then wreak havoc in them. We will be running a podtato-head workflow, which creates a simple deployment and then injects the pod-delete experiment into it.
On the dashboard select Schedule a Workflow. In the Workflows dashboard, select the Self-Agent and then click on Next. In the next screen, select Create a Workflow from Pre-defined Templates and then select podtato-head and then click on Next.
On the next screen, you can define the Experiment name, description, and namespace, leave the default values and click on Next.
On this screen, you can tune the workflow by editing the experiment manifest and adding/removing or arranging the experiments in the workflow. The podtato-head template comes with its own defined workflow so simply click on Next.
The next screen is to adjust the weights of the experiment on the reliability score since you are running only one experiment, you can keep any value, in the case where you have multiple experiments running, you can set the importance of each experiment according to your requirements to get a meaningful reliability score. For now, click on Next and select Schedule now, you can also create a recurring schedule if you want the experiment to keep running at certain intervals. The final screen is to confirm the workflow and schedule it. Click on Finish to run the workflow
Yay! The workflow is created and is running now. Click on Go to Workflow, which will take you to the workflow screen, here you can see all your scheduled workflows. Click on the workflow to see its status.
The workflow will take some minutes to run, you can take a break until then. Meanwhile, you can join the LitmusChaos community on slack to stay updated with new releases and get help from the community.
The workflow run is now complete, you can access the workflow details using the graph view or the table view.
Click on View Logs & Results to check out the logs and chaos results for the experiment
And we are done. You were able to create an AKS Cluster, install LitmusChaos 2.0 on it, log in to the LitmusChaos Portal and then finally schedule a Workflow.
I hope you enjoyed this journey and found the blog interesting. You can leave your queries or suggestions (maybe appreciation as well) in the comments below.
Thank you for reading
Software Engineer at ChaosNative and a final year CSE student