Chaos Engineering as a Service
re:Invent is usually a one-week extravaganza in Vegas the first week of December, and it is amazing to attend! So many events, sessions, information, swag, and PEOPLE. It's overwhelming and exciting and educational...
My favorite part of re:Invent, though, has always been the new service reveals at the keynote speeches. And even though re:Invent went virtual this year, those still did not disappoint.
As a fan of Chaos Engineering who runs chaos experiments as part of my system test process, I've largely been home-rolling the code that implements those experiments.
There are definite risks in creating my own experiments. Did I build in guardrails that will sufficiently protect my system from unanticipated ripple effects? Did I write a proper experiment kill-switch? Did I write an experiment that will leave my system vulnerable to real failure because of something I didn't consider when crafting it?
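To make the kill-switch and guardrail questions concrete, here is a minimal Python sketch of the kind of plumbing a home-rolled experiment needs. Everything in it is hypothetical and for illustration only: the `injected_fault` and `run_experiment` helpers, the in-memory `state` dictionary standing in for a real system, and the simulated latency numbers. It shows two ideas: the fault is always rolled back, even if the experiment crashes, and a kill switch is checked before every observation.

```python
import threading
from contextlib import contextmanager


@contextmanager
def injected_fault(inject, rollback):
    """Apply a fault and guarantee rollback, even if the experiment body raises."""
    inject()
    try:
        yield
    finally:
        rollback()


def run_experiment(inject, rollback, observe, kill_switch, max_checks=10):
    """Run observation checks while the fault is active.

    Stops early once the kill switch is set and returns the number of
    checks that completed.
    """
    completed = 0
    with injected_fault(inject, rollback):
        for _ in range(max_checks):
            if kill_switch.is_set():
                break
            observe()
            completed += 1
    return completed


# Hypothetical in-memory "system" standing in for real infrastructure.
state = {"latency_ms": 20, "fault_active": False}
kill_switch = threading.Event()
observations = []


def inject():
    # Simulated fault: latency jumps while the experiment is active.
    state["fault_active"] = True
    state["latency_ms"] = 500


def rollback():
    # Restore the simulated system to its healthy baseline.
    state["fault_active"] = False
    state["latency_ms"] = 20


def observe():
    observations.append(state["latency_ms"])
    if len(observations) == 3:
        kill_switch.set()  # simulate an operator hitting the kill switch


checks = run_experiment(inject, rollback, observe, kill_switch)
print(checks, state["fault_active"])
```

The `try`/`finally` inside the context manager is the important part: if the experiment code itself fails, the fault still gets removed. A real experiment would also need blast-radius limits and monitoring, which this sketch leaves out.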
There are product offerings out there that are fantastic for providing the instrumentation and the protection you want to have in place for chaos experiments, of course. But if your company is anything like mine, they are a bit reluctant to invest a lot of money in application licensing, so those products are largely not an option.
AWS has a reputation for paying attention to the needs of their customers. And they paid attention in this case too! The Fault Injection Simulator is a new service that AWS revealed during re:Invent 2020, and I am SO EXCITED about it!
So what IS Fault Injection Simulator? Simply put, it is Chaos Engineering as a Service (CEaaS). The website says that it is coming in early 2021, which means any day now. What does CEaaS look like? According to the documentation, "AWS Fault Injection Simulator supports creating disruptive events across a range of AWS services, such as Amazon EC2, Amazon EKS, Amazon ECS, and Amazon RDS". The listed services are ones that are particularly vulnerable to failure, and that are often not designed for full resiliency to that failure.
You'll be able to inject faults into the system and see how it responds. You'll be able to define safety rails and stop experiment criteria that AWS will automatically apply when the triggers are met, which will help you prevent widespread ripple effects from your experiments.
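The service's API is not public as I write this, so the following is not Fault Injection Simulator code. It is only a rough Python sketch of the general idea behind stop conditions, assuming a made-up `StopCondition` class, made-up metric names (`error_rate`, `p99_latency_ms`), and made-up thresholds: the experiment checks its conditions against each metrics sample and halts at the first one that trips.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StopCondition:
    """Hypothetical stop condition: trips when a metric exceeds a threshold."""
    name: str
    metric: str
    threshold: float

    def triggered(self, metrics: Dict[str, float]) -> bool:
        # A metric missing from the sample is treated as 0.0 (not triggered).
        return metrics.get(self.metric, 0.0) > self.threshold


def first_triggered(conditions: List[StopCondition],
                    metrics: Dict[str, float]) -> Optional[str]:
    """Return the name of the first condition that trips, or None."""
    for condition in conditions:
        if condition.triggered(metrics):
            return condition.name
    return None


def run_with_stop_conditions(
    samples: List[Dict[str, float]],
    conditions: List[StopCondition],
) -> Tuple[str, int, Optional[str]]:
    """Walk through metric samples and stop at the first one that trips a condition.

    Returns (status, number of healthy samples seen, reason for stopping).
    """
    for step, metrics in enumerate(samples):
        reason = first_triggered(conditions, metrics)
        if reason is not None:
            return ("stopped", step, reason)
    return ("completed", len(samples), None)


# Illustrative values only; real thresholds depend on your system.
conditions = [
    StopCondition("error-rate-too-high", "error_rate", 0.05),
    StopCondition("latency-too-high", "p99_latency_ms", 1000),
]
samples = [
    {"error_rate": 0.01, "p99_latency_ms": 300},
    {"error_rate": 0.03, "p99_latency_ms": 700},
    {"error_rate": 0.09, "p99_latency_ms": 900},   # trips the error-rate condition
    {"error_rate": 0.02, "p99_latency_ms": 400},   # never evaluated
]

result = run_with_stop_conditions(samples, conditions)
print(result)
```

In a managed service, the appeal is that this check runs outside my own experiment code, so a bug in the experiment can't also disable the safety rail.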
Another benefit of the Fault Injection Simulator that I am looking forward to exploring is the set of prebuilt templates you can use to jumpstart your system's chaos experiments. These templates will represent common chaos experiments and will make adding Chaos Engineering to your system even easier.
I look forward to playing with this new service AWS is offering. Chaos Engineering is a powerful tool, and I am really excited about being able to incorporate it more easily and safely into our CI/CD pipelines, run the experiments consistently, and have full documentation of the results once the experiments conclude!
Are you using Chaos Engineering experiments in your systems today? Will you be giving the Fault Injection Simulator a try? I'm watching the documentation page and will definitely be diving in as soon as it goes live! Once I start running it, I will post more about my experience with the service here.