I am delighted to write this post on the eve of the announcement of LitmusChaos 2.0. We spent a bit more than a year planning additional features and making the user journey better and simpler. While more requirements and improvement requests keep flowing in, we are encouraged by the possibilities and the feedback. The principles and features of 2.0 have been solidified over many months of beta.
Let me share the thinking that went into LitmusChaos 2.0. We wanted a clear distinction between the existing 1.x and the 2.x versions. The items we concentrated on are listed below.
- Provide a single pane for orchestration: ChaosCenter. This piece of LitmusChaos is not just a UI; it is built on top of an API layer, so any user interaction can go through the UI, the API, or LitmusCTL. (LitmusCTL is covered later in this post.)
- Do not deviate from the principles of how a Chaos Engineering platform should be built, and keep the features aligned with these principles.
- Users should be able to share workflows and collaborate on them with others.
- GitOps enabled, i.e. users should be able to keep workflows under source control and execute workflows in response to changes in the environment.
- Open observability: users should be able to observe the impact of chaos injection using their existing observability tools.
- Support non-Kubernetes chaos from Litmus, and allow it to be scheduled the same way as other workflows.
- A central repository, ChaosHub, from which users can consume experiments and to which they can contribute.
- For novices, provide predefined workflows so that they can try Chaos Engineering on standard microservices reference applications such as Sock Shop, Podtato-Head, and Bank of Anthos.
Now let’s briefly explore the new feature additions in LitmusChaos 2.0.
Consider the life cycle of Chaos Engineering as four simple steps, as depicted in the diagram. Now add a bit more complexity to your environment: you want to manage chaos across multiple clusters and clouds, or across various teams. For that you need an interface, either a UI or an API. To answer this need, we devised ChaosCenter.
- A chaos control plane or portal provides centralised management and orchestration of chaos operations on multiple clusters across data centres/clouds. The control plane carries out experiments through agents installed on the registered clusters.
- Comprises documented APIs to invoke chaos programmatically.
- Provides visualisation capabilities and analytics
- Supports an org-project-teams-users structure to enable collaboration within teams for chaos operations.
- Introduces chaos workflows to (a) automate dependency setup, (b) aid the creation of complex chaos scenarios with multiple faults, and (c) support the definition of load/validation jobs alongside chaos injection.
- Provides flexibility in creating and running workflows in different ways: via templates, an integrated hub, and custom uploads.
- Supports setup (control plane and agents) and execution of chaos experiments in both cluster-scoped and namespace-scoped modes, to help operations in shared clusters with a self-service model.
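To make the idea of a chaos workflow more concrete, here is a minimal conceptual sketch in Python of the three phases mentioned above: dependency setup, one or more faults, and validation jobs. It is not how Litmus defines or executes workflows; the `Step` and `ChaosWorkflow` names and the step callables are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of work in the workflow (hypothetical type, not a Litmus API)."""
    name: str
    action: Callable[[], bool]  # returns True on success


@dataclass
class ChaosWorkflow:
    """Ordered phases: dependency setup, fault injection, then validation."""
    setup: List[Step] = field(default_factory=list)
    faults: List[Step] = field(default_factory=list)
    validation: List[Step] = field(default_factory=list)

    def run(self) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        phases = (("setup", self.setup),
                  ("faults", self.faults),
                  ("validation", self.validation))
        for phase_name, steps in phases:
            for step in steps:
                ok = step.action()
                results[f"{phase_name}/{step.name}"] = ok
                # Stop early if setup fails: injecting faults into a
                # half-prepared environment would give misleading results.
                if phase_name == "setup" and not ok:
                    return results
        return results


# Example: two faults in one scenario, followed by a validation job.
# The lambdas stand in for real work and simply report success.
workflow = ChaosWorkflow(
    setup=[Step("install-experiments", lambda: True)],
    faults=[Step("pod-delete", lambda: True),
            Step("network-latency", lambda: True)],
    validation=[Step("load-test", lambda: True)],
)
print(workflow.run())
```

The sketch runs faults one after another for simplicity; a real scenario might run some of them in parallel.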
LitmusCTL is a command-line tool for advanced users who prefer to perform operations from the command line. At present, it supports registering chaos agents with ChaosCenter.
*Open Observability & Steady State Hypothesis Validation*
- Provides an increased set of Prometheus metrics with additional filters - that can be used for instrumenting application dashboards to observe chaos impact
- Provides a diverse set of probes to automate validation of steady-state hypothesis - thereby improving the efficiency of running automated chaos experiments
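As a rough illustration of what automating steady-state hypothesis validation means, the sketch below checks a health condition before and after a fault and derives a verdict. It is a conceptual example only and does not use or mirror the Litmus probe API; `probe_passes`, `run_experiment`, the retry parameters, and the verdict strings are all assumptions made for this example.

```python
import time
from typing import Callable


def probe_passes(check: Callable[[], bool], retries: int = 3,
                 interval_s: float = 0.0) -> bool:
    """Run `check` up to `retries` times; pass if any attempt succeeds."""
    for attempt in range(retries):
        try:
            if check():
                return True
        except Exception:
            pass  # an exception in the check counts as a failed attempt
        if attempt < retries - 1:
            time.sleep(interval_s)
    return False


def run_experiment(inject_fault: Callable[[], None],
                   check: Callable[[], bool]) -> str:
    """Validate the hypothesis before and after the fault; return a verdict."""
    if not probe_passes(check):
        # Without a healthy baseline, the result of the experiment
        # would say nothing about the fault itself.
        return "Aborted: system not in steady state before chaos"
    inject_fault()
    return "Pass" if probe_passes(check) else "Fail"


# Example with a simulated service that does not recover from the fault.
service = {"healthy": True}


def kill_service() -> None:
    service["healthy"] = False


verdict = run_experiment(kill_service, lambda: service["healthy"])
print(verdict)
```

In practice the check would be something like an HTTP request or a metrics query against the application under test, and it could also run continuously while the fault is active.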
GitOps for Chaos
- Integrates with Git-based SCM to provide a single source of truth for chaos artifacts (workflows). Changes are synchronised bi-directionally between the Git source and ChaosCenter, so that the latest artifact is pulled for execution.
- Provides an event-tracker microservice to automatically launch “subscribed” chaos workflows upon app upgrades effected by GitOps tools such as Argo CD and Flux.
Non-Kubernetes Chaos
- Adds experiments to inject chaos on infrastructure (cloud) resources such as VMs/instances and disks (AWS, GCP, Azure, VMware), irrespective of whether they host Kubernetes clusters or not.
- Introduces chaos experiments to bring down bare metal nodes that provide IPMI-based out-of-band access.
We invite you to join the discussion with us on GitHub or Slack and to share your requirements, so that we can improve Litmus as a community project.
That's all, folks. Thank you for reading till the end. I hope you had a productive time learning about Litmus, and that you are as excited as we are about the upcoming features and additions to Litmus.
Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you?
Join our community on Slack for detailed discussion, feedback and regular updates on Chaos Engineering for Kubernetes: https://slack.litmuschaos.io (#litmus channel on the Kubernetes workspace)