In this blog, I will be talking about setting up a quick demo environment for Litmus. Before jumping in, let's do a quick recap on Litmus. Litmus is a framework for practicing chaos engineering in cloud-native environments. Litmus provides a chaos operator, a large set of chaos experiments on its hub, detailed documentation, and a friendly community. Litmus is very easy to use but a quick demo environment to install Litmus, run experiments, and learn chaos engineering on Kubernetes will be of course cooler.
Litmus Demo Environment
The Litmus Demo is a quick way to introduce yourself to the world of Cloud-Native Chaos Engineering. It helps you to familiarize yourself with running LitmusChaos experiments in a realistic application environment running multiple services on a Kubernetes cluster. By following the instructions in this blog, you will be able to create a cluster, install a sample application that will be subjected to chaos, build the chaos infrastructure & run the chaos experiments, all in a matter of minutes.
When, as a community, we were pondering about a quick, lightweight environment to achieve this automated “Demo” setup, we just couldn’t look beyond KinD (cluster infra) & the immensely popular Sock-Shop (sample microservices application). However, the demo script also provides for a more “extensive” platform in GKE, in case you want to explore the larger suite of experiments.
The idea here, as you may have guessed, is to get down to experiencing chaos injection without brooding over documentation & copy-pasting multiple Kubernetes manifests. This demo is designed to help get your hands dirty on injecting failures using a chaos framework & finding out what happens and not so much about inculcating in you, in-depth knowledge of standard practices around chaos engineering. We are hoping this will get you there eventually!
LitmusChaos PreRequisites
Docker, Kubectl & Python3.7+ (with the PyYaml package) are all you need for running the KinD platform based chaos demo. If GKE is your platform choice, you may need to configure gcloud on your workstation (or test-harness machines, if you are old fashioned!).
Know Your Cluster
As described earlier, this demo environment supports different platforms (KinD, GKE).
KinD: A 3 node KinD cluster pre-installed with the sock-shop demo application, litmus chaos CRDs, operator, and a minimal set of prebuilt chaos experiment CRs is setup.
GKE: A 3 node GKE cluster pre-installed with the sock-shop demo application, litmus chaos CRDs, operator, and the full generic Kubernetes chaos experiment suite is setup.
NOTE:The support for other platforms like AWS will be added very soon
Getting started:
To get started with any of the above platforms, we will follow the following steps.
1. Clone Litmus demo repository in your system.
This command will clone the master branch of the litmus demo repository in your system.
git clone https://github.com/litmuschaos/litmus-demo.git
2. Check out the available options on the demo script
cd litmus-demo
./manage.py -h
Output:
usage: manage.py [-h] {start,test,list,stop} ...
Spin up a Demo Environment on Kubernetes.
positional arguments:
{start,test,list,stop}
start Start a Cluster with the demo environment deployed.
test Run Litmus ChaosEngine Experiments inside litmus demo
environment.
list List all available Litmus ChaosEngine Experiments
available to run.
stop Shutdown the Cluster with the demo environment
deployed.
optional arguments:
-h, --help show this help message and exit
So, we can see that the available arguments are start, test, list, and stop with their usage.
3. Check the available experiments on the desired platform
You can also view the experiments supported by the platform, as of today (The folks in the litmus community are busy extending the platform interoperability for the chaos experiment, so stay tuned!).
For KinD platform
./manage.py list --platform kind
NOTE: The default value of --platform
is kind so ./manage.py list
will also give the same output.
OUTPUT:
Available Litmus Chaos Experiments:
1. pod-delete
2. container-kill
3. node-cpu-hog
4. node-memory-hog
For GKE platform
./manage.py list --platform GKE
OUTPUT:
Available Litmus Chaos Experiments:
1. container-kill
2. disk-fill
3. node-cpu-hog
4. node-memory-hog
5. pod-cpu-hog
6. pod-delete
7. pod-memory-hog
8. pod-network-corruption
9. pod-network-latency
10. pod-network-loss
4. Installing Demo Environment
Install the demo environment using one of the platforms with start argument:
- KinD Cluster:
./manage.py start --platform kind
Once done, wait for all the pods to get in a ready state. You can monitor this using.
watch kubectl get pods --all-namespaces
Now when all pods come into Running state, we can access the sock-shop application through web-ui, which will help us to visualize the impact of chaos on the application and whether the application persists after chaos injections.
Follow the given steps to access through web-ui.
- Get the port of frontend deployment:
kubectl get deploy front-end -n sock-shop -o jsonpath='{.spec.template.spec.containers[?(@.name == "front-end")].ports[0].containerPort}'
OUTPUT:
8079
- Perform port forwarding on the port obtained above
kubectl port-forward deploy/front-end -n sock-shop 3000:8079
OUTPUT:
Forwarding from 127.0.0.1:3000 -> 8079
Forwarding from [::1]:3000 -> 8079
Copy the IP to a web browser and get the web-ui of sock-shop using a kind cluster.
- GKE CLuster
./manage.py start --platform GKE
Once done, you will get an output containing Ingress Details:
Ingress Details:
** RUNNING: kubectl get ingress basic-ingress --namespace=sock-shop
NAME HOSTS ADDRESS PORTS AGE
basic-ingress * 34.120.15.86 80 95s
You can access the web application in a few minutes at http://34.120.15.86
After waiting for a few minutes, you can check the given URL (here http://34.120.15.86) it will take you to the sock-shop catalog UI on the localhost browser. This makes sure that the sock-shop application is deployed in the cluster-infra. The sock-shop catalog UI will look like:
To check all pods are ready, run the following command and check the status of all pods which should be in Running state.
watch kubectl get pods --all-namespaces
5. Running Chaos in Demo Environment
We can run all the chaos experiments we get from ./manage.py list
or we can also run selective experiments that only we want to perform using ./manage test
as shown:
For running all experiments
./manage.py test --platform <platform-name>
For running selective experiments
./manage.py test --platform <platform-name> --test <test-name>
Follow step 3 to get the available test-name.
Example: For running the pod-delete experiment.
**./manage.py test --platform kind --test pod-delete **
Take this opportunity to view what happens as the experiment runs its course. Pods may be killed; containers may be restarted, the resource usage may spike up - all impacting the application in its own unique way. In a typical run of the chaos experiment, a hypothesis is built around each of these failures. While we discuss this in detail in upcoming blogs, the endeavor in our case is to find out if the sock-shop app continues to stay available across these failures!!
The experiment results (Pass/Fail) are derived based on the simple criteria of app availability post chaos & are summarized on the console once the execution completes.
Get more details about the flags used to configure and run the chaos tests please refer to the param tables in the test section.
6. Deleting Cluster / Cluster Clean up
To shut down and destroy the cluster when you're finished, run the following commands:
KinD cluster
./manage.py --platform kind stop
GKE cluster
./manage.py --platform GKE stop --project {GC_PROJECT}
Conclusion
Did you get a taste of chaos in a Kubernetes environment? Did you end up killing different microservices to gauge the impact? Are you inclined to understand what happened when the failures were injected & what options the chaos framework provides to control & assess these? Did you make a mental map of how your microservices might behave if the same failures were injected on them? And about how to mitigate the impact of these failures?
If you asked yourself these questions, then the purpose of this litmus-demo is met. It also means you are onwards and upwards in your chaos engineering journey!!
Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you?
Join Our Community On Slack For Detailed Discussion, Feedback & Regular Updates On Chaos Engineering For Kubernetes: https://kubernetes.slack.com/messages/CNXNB0ZTN
(#litmus channel on the Kubernetes workspace)
Check out the Litmus Chaos GitHub repo and do share your feedback: https://github.com/litmuschaos/litmus
Submit a pull request if you identify any necessary changes.
litmuschaos / litmus
Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes is at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
Open Source Chaos Engineering Platform
Read this in other languages.
Overview
LitmusChaos is an open source Chaos Engineering platform that enables teams to identify weaknesses & potential outages in infrastructures by inducing chaos tests in a controlled way. Developers & SREs can practice Chaos Engineering with LitmusChaos as it is easy to use, based on modern Chaos Engineering principles & community collaborated. It is 100% open source & a CNCF project.
LitmusChaos takes a cloud-native approach to create, manage and monitor chaos. The platform itself runs as a set of microservices and uses Kubernetes custom resources (CRs) to define the chaos intent, as well as the steady state hypothesis.
At a high-level, Litmus comprises of:
- Chaos Control Plane: A centralized chaos management tool called chaos-center, which helps construct, schedule and visualize Litmus chaos workflows
- Chaos Execution Plane Services: Made up of a…
Top comments (1)
Link in the article is broken.
Docmentation link docs.litmuschaos.io/docs/getstarted/
Does not work