DEV Community

Kennedy for AWS Community Builders


What COVID-19 Taught the Cyber Security Industry: Security Chaos Engineering


The COVID-19 pandemic has significantly changed our lifestyles; its impact is far-reaching and unforgettable. Our view of core human activities, e.g. business, education and healthcare, has been redefined. Rapid adaptation has become the norm, paving the way for emerging and innovative solutions to gain unprecedented adoption. The most prominent of these innovations is the ecosystem surrounding the accelerated development, testing, distribution and administration of COVID-19 vaccines.


Fig. 1. The impact of vaccines on measles.

Talking about vaccines, the underlying concepts of vaccinology date back to the 17th century, when Buddhist monks drank snake venom to confer immunity against snake bites. However, scientific methods were first applied only in the 18th century, by Edward Jenner. Since then, vaccinology has matured dramatically, driven largely by the need to eradicate diseases. Clearly, vaccines are fundamental to human resilience against disease. For example, newborn babies receive a hepatitis B vaccine shortly after birth, ideally within 24 hours, to protect them from infection (resilience) so that they can lead normal, healthy lives. Through this and similar approaches, smallpox has been eradicated and diseases such as measles have been dramatically reduced (Fig. 1).

Chaos Engineering

The most important lesson drawn from vaccines is that true resilience is achieved by deliberately introducing controlled chaos (fake chaos) into a stable system to identify and mitigate hidden problems (real chaos) before they cause harm. This is the theory behind vaccines: by introducing harmless substances, such as weakened or inactivated pathogens, into a biological system, the natural immune system is trained and hardened to overcome the real disease. This simple idea, which has enabled human resilience to disease over several centuries, is also the fundamental theory behind chaos engineering.
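To make the analogy concrete, here is the usual shape of a chaos experiment: confirm the system is in a steady state, inject a controlled fault, check whether the steady-state hypothesis still holds, and always roll the fault back. This is a minimal Python illustration, not a real tool. The in-memory `Service`, the replica-killing fault and the 0.99 success threshold are all hypothetical assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical in-memory service backed by a set of replicas."""
    replicas: list[str]
    healthy: set[str] = field(init=False)

    def __post_init__(self) -> None:
        # All replicas start out healthy.
        self.healthy = set(self.replicas)

    def handle_request(self) -> bool:
        # A request succeeds as long as at least one replica is healthy.
        return bool(self.healthy)


def steady_state(service: Service, requests: int = 100) -> float:
    """Steady-state metric: fraction of requests served successfully."""
    return sum(service.handle_request() for _ in range(requests)) / requests


def kill_one_replica(service: Service, rng: random.Random) -> None:
    """The 'fake chaos': take one randomly chosen healthy replica offline."""
    victim = rng.choice(sorted(service.healthy))
    service.healthy.discard(victim)


def restore_all(service: Service) -> None:
    """Roll back the injected fault."""
    service.healthy = set(service.replicas)


def run_experiment(service: Service, threshold: float = 0.99, seed: int = 0) -> dict:
    baseline = steady_state(service)
    if baseline < threshold:
        # Only experiment on a system that is stable to begin with.
        raise RuntimeError("system is not in steady state; aborting")
    kill_one_replica(service, random.Random(seed))
    try:
        observed = steady_state(service)
    finally:
        restore_all(service)  # always undo the fault, even if observation fails
    return {"baseline": baseline, "observed": observed,
            "hypothesis_held": observed >= threshold}


if __name__ == "__main__":
    print(run_experiment(Service(["a", "b", "c"])))  # redundancy absorbs the fault
    print(run_experiment(Service(["a"])))            # exposes a single point of failure
```

With three replicas the injected failure is absorbed and the hypothesis holds. With a single replica the same experiment exposes a single point of failure, which is exactly the kind of hidden problem (real chaos) that the fake chaos is meant to uncover.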

Netflix pioneered chaos engineering to overcome the digital pandemic that threatened the survival of its streaming business after its migration to AWS. The decision to adopt chaos engineering, though it sounded crazy at the time, helped Netflix survive several AWS outages, much as humanity has survived epidemics through vaccination. In recent years, chaos engineering has gained traction as enterprises face different kinds of digital pandemics on the path to digital transformation. According to Werner Vogels (CTO of Amazon), “Failures are a given, and everything will eventually fail”. In other words, failures are inevitable; the real question is when they will occur and how we prepare to prevent them or minimize their impact. Chaos engineering addresses these concerns by proactively experimenting on systems. Consequently, several chaos engineering products and services have emerged, including offerings from cloud service providers, e.g. the AWS Fault Injection Simulator (now AWS Fault Injection Service).

Security Chaos Engineering

Failures often have security implications, affecting confidentiality, integrity and availability. Such security failures are mostly man-made, either deliberate (cyber-attacks) or accidental (human errors and misconfigurations). With the rapid adoption of digital systems, security failures are increasing at an unprecedented pace, and the COVID-19 pandemic is further exacerbating these cybersecurity challenges as enterprises strive to become technology-driven. Meanwhile, traditional cybersecurity mechanisms are struggling to cope with the emerging challenges. On the one hand, human operators cannot completely and continuously maintain clear mental models of modern systems, even though such models are a prerequisite for designing effective security systems. On the other hand, modern systems follow operating models that differ from those of traditional systems and therefore require new security mechanisms. Most emerging technologies are cloud-native, increasingly complex and dynamic, and thus offer little or no opportunity for gatekeeping-style security mechanisms. Therefore, innovative cybersecurity solutions are imperative to overcome these evolving security challenges.

Fig. 2. Attackers view cloud-native systems as a single attack objective.

Security chaos engineering is an emerging sub-discipline, pioneered by Aaron Rinehart, that could address the aforementioned challenges. It leverages fault injection techniques to detect and mitigate security vulnerabilities, especially in complex systems such as cloud-native infrastructure. It employs the same techniques as chaos engineering but focuses on security attributes. State-of-the-art cloud-native security systems address multiple abstraction layers, commonly referred to as the 4Cs of cloud-native security: code, containers, cluster and cloud. Most cloud-native security systems are designed to tackle security issues emerging from at least two of these layers. In effect, because of logical barriers and existing complexity, these layers are treated as silos. Unfortunately, this viewpoint does not capture the attacker's view. Attackers see a single system (a unified attack objective) and plan to move laterally across abstraction layers regardless of the logical demarcations (Fig. 2). Attacks that orchestrate this kind of multi-layered lateral movement have become increasingly common.
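A security chaos experiment follows the same pattern as any chaos experiment, but the injected fault is a security-relevant condition and the hypothesis concerns a security control. The Python sketch below assumes a hypothetical, simplified firewall model (`SecurityGroup`) and a deliberately narrow detective control; neither corresponds to a real cloud provider API. It injects an internet-exposed port, checks whether the control notices, and rolls the change back.

```python
from dataclasses import dataclass, field

INTERNET = "0.0.0.0/0"


@dataclass
class SecurityGroup:
    """Hypothetical, simplified firewall: a set of (port, source CIDR) ingress rules."""
    name: str
    ingress: set[tuple[int, str]] = field(default_factory=set)


def detective_control(group: SecurityGroup) -> list[tuple[int, str]]:
    """Hypothetical detective control under test.

    It only flags SSH (port 22) exposed to the internet; this narrow rule is
    the kind of blind spot a security chaos experiment is meant to surface.
    """
    return [(port, cidr) for port, cidr in group.ingress
            if port == 22 and cidr == INTERNET]


def security_experiment(group: SecurityGroup, port: int) -> bool:
    """Hypothesis: exposing `port` to the internet is detected by the control.

    Returns True if the injected misconfiguration was detected.
    """
    injected = (port, INTERNET)
    was_present = injected in group.ingress
    group.ingress.add(injected)                # inject the misconfiguration
    try:
        findings = detective_control(group)
    finally:
        if not was_present:
            group.ingress.discard(injected)    # roll back the injected fault
    return injected in findings


if __name__ == "__main__":
    web = SecurityGroup("web", {(443, INTERNET)})
    print(security_experiment(web, 22))    # SSH exposure
    print(security_experiment(web, 3389))  # RDP exposure
    print(web.ingress)                     # original rules are restored
```

Here the experiment confirms that exposing SSH (port 22) is detected but shows that exposing RDP (port 3389) goes unnoticed, a gap that would otherwise only surface during a real attack. Against real infrastructure, the injection and rollback would go through the provider's own tooling and be tightly scoped.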


Fig. 3. Cloud-native security systems need to adopt an attacker-centric lens to have clear visibility across the 4Cs of cloud-native security.

Overcoming these security challenges requires adopting the attacker's viewpoint. In essence, emerging cloud-native security systems ought to enable unified visibility and operations across the entire cloud-native stack, as illustrated in Fig. 3. Security chaos engineering offers a way to achieve this objective: by injecting failures across the entire system, it reveals how the system actually behaves and thus provides clear visibility. Fault injection campaigns produce security insights suitable for detecting vulnerabilities, especially those with causal relationships across abstraction layers. Furthermore, unlike the output of traditional security systems, nearly all of the derived insights are actionable, because vulnerabilities are detected before they are exploited in attacks. Most cloud-native security systems work the other way round: being reactive, they act only after an event has occurred, which gives attackers a window of opportunity (attacker dwell time can exceed 200 days). Notably, the security insights gained from security chaos engineering campaigns can be fed into other security systems to enrich their intelligence, improving visibility and making security proactive (Fig. 4).
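Running many experiments as a campaign across the 4Cs and exporting the results is one way to produce the cross-layer insights described above. The Python sketch below is a hypothetical illustration: the `Experiment` and `Insight` types, the example faults, their hard-coded outcomes (stand-ins for real fault-injection experiments) and the JSON export format are assumptions, not part of any specific product.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

# The 4Cs of cloud-native security.
LAYERS = ("code", "container", "cluster", "cloud")


@dataclass
class Experiment:
    layer: str               # one of LAYERS
    fault: str               # description of the injected fault
    run: Callable[[], bool]  # returns True if a control detected or blocked it


@dataclass
class Insight:
    layer: str
    fault: str
    detected: bool


def run_campaign(experiments: list[Experiment]) -> list[Insight]:
    """Run every experiment and record whether the injected fault was caught."""
    insights = []
    for exp in experiments:
        if exp.layer not in LAYERS:
            raise ValueError(f"unknown layer: {exp.layer}")
        insights.append(Insight(exp.layer, exp.fault, exp.run()))
    return insights


def coverage_by_layer(insights: list[Insight]) -> dict[str, float]:
    """Fraction of injected faults that were detected, per layer."""
    coverage = {}
    for layer in LAYERS:
        results = [i.detected for i in insights if i.layer == layer]
        if results:
            coverage[layer] = sum(results) / len(results)
    return coverage


def export_gaps(insights: list[Insight]) -> str:
    """Serialize undetected faults as JSON for consumption by other security systems."""
    return json.dumps([asdict(i) for i in insights if not i.detected], indent=2)


# Hypothetical campaign; the lambdas stand in for real fault-injection experiments.
example_campaign = [
    Experiment("code", "hard-coded secret committed", lambda: True),
    Experiment("container", "container started as root", lambda: False),
    Experiment("cluster", "pod reaches a service in another namespace", lambda: False),
    Experiment("cloud", "storage bucket made public", lambda: True),
]

if __name__ == "__main__":
    results = run_campaign(example_campaign)
    print(coverage_by_layer(results))
    print(export_gaps(results))
```

The per-layer detection rate shows where controls are weakest, and the exported list of undetected faults is the kind of knowledge that could be fed into other security systems, as in Fig. 4.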


Fig. 4. Integration of knowledge gained from security chaos engineering into a security architecture.

Final thoughts

Just as vaccines effectively tackle diseases, security chaos engineering offers a way to overcome cyber-attacks across abstraction layers. This is one of the fundamental lessons the cybersecurity industry can draw from the COVID-19 pandemic: true resilience comes from constructive security fault injection! Security in the cloud-native era is more about adopting a resilient posture than just being secure. Applying the concepts of security chaos engineering to cloud-native systems helps ensure that security attributes remain intact by verifying that security controls (predictive, detective, protective and corrective) function as expected. This approach leads to a proactive security methodology that keeps security ahead of the game.
