How to run Azure Instance Stop Experiment in LitmusChaos

#kubernetes #litmuschaos #azure #vm

This article is a guide for setting up and running the Azure Instance Stop experiment on LitmusChaos 2.0. The experiment powers off one or more Azure instances for a specified chaos duration and then powers them back on. The broader objective is to extend the principles of cloud-native chaos engineering to non-Kubernetes targets, so that the resiliency of every kind of target can be tested as part of a single chaos workflow across the business.

Currently, the experiment is available only as a technical preview in the ChaosHub, so we have to use the master branch of the ChaosHub to access it.

Pre-Requisites

To run this experiment, we need a few things beforehand:

An Azure account
A Virtual Machine Scale Set (or just a standalone VM instance)
A Kubernetes cluster with LitmusChaos 2.0 installed (to set up LitmusChaos 2.0 on AKS, you can follow the blog "Getting Started with LitmusChaos 2.0 in Azure Kubernetes Service")

Setting up Azure Credentials as Kubernetes Secret

To let LitmusChaos access your Azure instances, you need to set up the Azure credentials as a Kubernetes secret. The process is simple: first, install the Azure CLI (if you haven't already) and log in to it. Then run the following command to save the Azure credentials in an azure.auth file.

az ad sp create-for-rbac --sdk-auth > azure.auth
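Before copying values into the secret, it can help to confirm that the file was actually written and contains the fields the secret expects. The following is a minimal sketch, not part of LitmusChaos or the Azure CLI: check_azure_auth is a hypothetical helper name, and it only looks for four of the field names that appear in the credentials file.

```shell
# Hypothetical helper (not part of az or Litmus): checks that an auth file
# exists, is non-empty, and mentions the core credential fields.
# Usage: check_azure_auth azure.auth
check_azure_auth() {
  file="${1:-azure.auth}"
  if [ ! -s "$file" ]; then
    echo "missing or empty file: $file" >&2
    return 1
  fi
  for key in clientId clientSecret subscriptionId tenantId; do
    if ! grep -q "\"$key\"" "$file"; then
      echo "missing field: $key" >&2
      return 1
    fi
  done
  echo "ok: $file contains the expected fields"
}
```

If the helper reports a missing or empty file, that suggests the az command did not succeed, so it is worth re-checking the login before moving on.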

Next, create a secret.yaml file with the following content, replacing the placeholder values under azure.auth with the contents of your own azure.auth file.

apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }

Now run the following command. Remember to change the namespace if you have installed LitmusChaos in a different namespace.

kubectl apply -f secret.yaml -n litmus
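As an alternative to pasting the JSON into secret.yaml, kubectl can build an equivalent secret directly from the file, and a follow-up get confirms that it exists. This is a sketch under the assumption that the experiment only needs the cloud-secret name and the azure.auth key shown above; create_cloud_secret is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: create the same cloud-secret straight from azure.auth.
# Arguments: namespace (default: litmus), path to the auth file (default: azure.auth).
# Usage: create_cloud_secret litmus azure.auth
create_cloud_secret() {
  ns="${1:-litmus}"
  auth_file="${2:-azure.auth}"
  # --from-file=<key>=<path> stores the file contents under the key "azure.auth".
  kubectl create secret generic cloud-secret \
    --from-file=azure.auth="$auth_file" -n "$ns" || return 1
  # Confirm that the secret now exists in the namespace.
  kubectl get secret cloud-secret -n "$ns"
}
```

Use this instead of the kubectl apply step, not in addition to it, since kubectl create returns an error when the secret already exists.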

Updating ChaosHub

As the experiment is only available as a technical preview right now, we will have to update the ChaosHub to use the technical preview (master) branch.

Open the edit form for the ChaosHub and change the branch to “master”.

Click on Submit Now, and the ChaosHub will show the Azure Instance Stop experiment.

Scheduling the Experiment Workflow

Now move to the Workflows section and click on Schedule a Workflow. Select the Self-Agent (or any other one if you have multiple agents installed) and click on Next.

Select the third option to create a workflow from experiments using ChaosHub. Click on Next.

Click Next again (or edit the workflow name first, if you want to). On the Experiments page, click on Add a new Experiment and select the Azure Instance Stop experiment.

Next, click on Edit YAML. You now have to add the instance name(s) and the resource group name to the ChaosEngine environment variables. Scroll down to the ChaosEngine artefacts, where you will see the environment variables, and set their values accordingly. If you are injecting chaos on a Scale Set, set SCALE_SET to “enable”. Save the changes and schedule your workflow.
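For orientation, the environment block being edited might look like the fragment printed by the sketch below. SCALE_SET comes from the step above; AZURE_INSTANCE_NAME and RESOURCE_GROUP are assumed variable names, and the "disable" default is also an assumption, so go by whatever the Edit YAML view actually shows. print_chaos_env is a hypothetical helper that only prints text for comparison.

```shell
# Hypothetical helper: print an env fragment to compare with the ChaosEngine YAML.
# AZURE_INSTANCE_NAME and RESOURCE_GROUP are assumed names; SCALE_SET is from the article.
# Arguments: instance name(s), resource group, scale set flag (assumed default: disable).
# Usage: print_chaos_env my-instance my-resource-group enable
print_chaos_env() {
  cat <<EOF
env:
  - name: AZURE_INSTANCE_NAME
    value: "$1"
  - name: RESOURCE_GROUP
    value: "$2"
  - name: SCALE_SET
    value: "${3:-disable}"
EOF
}
```

The helper does not touch the cluster; it is only a way to see the three values side by side before typing them into the editor.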

Note: You need to provide the instance name from the Virtual Machine Scale Set section for Azure AKS nodes, not from the AKS node pools section.

Azure Virtual Machine Scale Set Instances section
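The instance names can also be listed from the command line rather than read off the portal. The sketch below assumes an AKS-managed Scale Set: it first looks up the node resource group that AKS created and then lists the instances of one Scale Set in it. list_vmss_instances and its arguments are hypothetical; the az subcommands are standard Azure CLI, but it is worth checking the output against the portal.

```shell
# Hypothetical helper: list instance names of a Scale Set behind an AKS cluster.
# Arguments: cluster resource group, cluster name, scale set name.
# Usage: list_vmss_instances my-rg my-cluster my-scale-set
list_vmss_instances() {
  # AKS keeps node resources in a separate "node resource group".
  node_rg=$(az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup -o tsv) || return 1
  echo "resource group: $node_rg"
  # Print one instance name per line.
  az vmss list-instances --resource-group "$node_rg" --name "$3" \
    --query "[].name" -o tsv
}
```

The printed resource group is the one that contains the Scale Set, so it is presumably the value to use for the resource group setting in the ChaosEngine.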

Observing the Experiment Run

Great, your workflow is now running. To check it out, click on Go to Workflow and then select your workflow.

You can check the status of your instance in the Azure Portal to verify that the experiment is working as expected.
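The same check can be scripted. This is a sketch that assumes a standalone VM (Scale Set instances need the az vmss equivalents) and uses hypothetical names: vm_power_state prints the power state that az vm show reports when asked for details.

```shell
# Hypothetical helper: print the power state of a standalone VM.
# Arguments: resource group, VM name.
# Usage: vm_power_state my-resource-group my-instance
vm_power_state() {
  # -d (--show-details) adds the powerState field to the output.
  az vm show -d --resource-group "$1" --name "$2" \
    --query powerState -o tsv
}
```

During the chaos duration this would be expected to report a stopped or deallocated state, and a running state once the experiment has powered the instance back on.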

You can also click on azure-instance-stop to view the experiment logs. After the given chaos duration, the experiment will automatically power on the instance(s), and it will give a pass/fail verdict. In case the experiment fails, verify through the logs and portal that the instances have started.
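The verdict is also recorded in a ChaosResult resource, so it can be read with kubectl. The sketch below assumes the usual Litmus naming of <engine-name>-<experiment-name> for the ChaosResult and the litmus namespace; chaos_verdict and the engine name passed to it are hypothetical.

```shell
# Hypothetical helper: print the verdict stored in a ChaosResult.
# Assumes the ChaosResult is named <engine-name>-azure-instance-stop.
# Arguments: ChaosEngine name, namespace (default: litmus).
# Usage: chaos_verdict my-engine litmus
chaos_verdict() {
  kubectl get chaosresult "$1-azure-instance-stop" -n "${2:-litmus}" \
    -o jsonpath='{.status.experimentStatus.verdict}'
}
```

Workflows generate the ChaosEngine name, so running kubectl get chaosengine in the same namespace first is a reasonable way to find the name to pass in.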

That's it! You have successfully run the Azure Instance Stop experiment using the LitmusChaos 2.0 Chaos Center.

In this blog, we saw how to perform the Azure Instance Stop experiment using LitmusChaos 2.0. You can learn more about this experiment from the docs. It is one of the many non-Kubernetes experiments in LitmusChaos, which include experiments for AWS, GCP, and VMware and are aimed at making Litmus a complete chaos engineering toolset for every enterprise, regardless of the technology stack used.

You can join the LitmusChaos community on GitHub and Slack. The community is very active and tries to answer queries quickly.

I hope you enjoyed this journey and found the blog interesting. You can leave your queries or suggestions (appreciation as well) in the comments below.

Show your ❤️ with a ⭐ on our GitHub. To learn more about Litmus, check out the Litmus documentation. Thank you! 🙏

Thank you for reading

Akash Shrivastava

Software Engineer at Harness
