DEV Community

Moulinath Chakrabarty

Chaos Engineering to fortify P&C Insurance business on Guidewire Cloud

Introduction
The Property and Casualty (P&C) insurance industry is undergoing a major transformation as it shifts to cloud computing. Guidewire, a leading provider of software for P&C insurers, has moved its platform to the cloud, and insurers running on Guidewire Cloud must now prioritize performance, resiliency, and security. Geopolitical events and climate risks pose growing threats to cloud stability, and cyberattacks are a constant danger. P&C insurers therefore need a forward-thinking approach to keeping their cloud-based operations reliable. One promising method is Chaos Engineering: the practice of deliberately injecting controlled failures to verify that a system can withstand them. It can help insurers address potential disruptions proactively and strengthen the resilience and security of their operations on Guidewire Cloud.

Chaos Engineering - From Netflix to Insurance
Chaos Engineering was pioneered by Netflix as a way to test the fault tolerance of its systems by deliberately introducing failures and observing the results. Its best-known tool is Chaos Monkey, which randomly terminates production instances. Financial services and insurance differ greatly from video streaming. Even so, the shift to public cloud environments makes Chaos Engineering equally valuable for keeping cloud-based systems stable. Outages are costly, especially in high-stakes industries like insurance: according to Gartner, IT downtime can cost large businesses up to $540,000 per hour. As P&C insurers move to the cloud, they must ensure the resilience and security of their systems to avoid these costly disruptions.
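To put the Gartner figure in context, the back-of-the-envelope sketch below does two things. It converts an availability level into the maximum downtime that level allows per year, then multiplies that downtime by the upper-bound hourly cost quoted above. It is only an illustration. The $540,000 figure is an upper bound for large businesses, and the availability levels are example values, not Guidewire Cloud SLAs.

```python
# Back-of-the-envelope sketch: worst-case yearly downtime cost for an availability level,
# using the upper-bound hourly figure quoted in the text (illustrative, not an SLA).

HOURS_PER_YEAR = 365 * 24    # 8,760 hours; ignores leap years
COST_PER_HOUR_USD = 540_000  # upper-bound figure attributed to Gartner above


def allowed_downtime_hours(availability: float) -> float:
    """Maximum downtime per year permitted by an availability level such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return HOURS_PER_YEAR * (1.0 - availability)


def annual_downtime_cost(availability: float, cost_per_hour: float = COST_PER_HOUR_USD) -> float:
    """Worst-case yearly cost if every allowed hour of downtime were actually used."""
    return allowed_downtime_hours(availability) * cost_per_hour


if __name__ == "__main__":
    for availability in (0.999, 0.9995, 0.9999):
        hours = allowed_downtime_hours(availability)
        cost = annual_downtime_cost(availability)
        print(f"{availability:.2%} availability: {hours:.2f} h/year, up to ${cost:,.0f}")
```

At 99.9% availability, the allowed downtime is 8.76 hours per year. At the quoted rate, that is roughly $4.73 million. Each additional "nine" cuts the exposure by a factor of ten.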

Chaos Engineering for P&C Insurance on Guidewire Cloud
A Chaos Engineering program for P&C insurers on Guidewire Cloud should focus on five key areas:

  1. Resiliency: Building resilience into applications to mitigate system failures and ensure business continuity.
  2. Data Security: Continuously monitoring for potential security threats and ensuring the protection of sensitive financial data.
  3. Cost Efficiency: Designing policies that curb cloud sprawl (the uncontrolled proliferation of an organization's cloud resources) and optimizing the use of cloud resources.
  4. Conformance: Ensuring that all teams follow consistent operating principles on cloud platforms, especially in an Agile/DevOps environment.
  5. Observability: Leveraging insights from system behavior to improve monitoring and fault detection.
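The Conformance and Cost Efficiency areas echo Netflix's Simian Army. In that suite, Conformity Monkey flagged instances that did not follow best practices, and Janitor Monkey cleaned up unused resources. A minimal, report-only sketch of that idea appears below. It assumes a hypothetical inventory of cloud resources, a hypothetical tagging policy (`owner`, `cost-center`, `environment`) and an arbitrary idle-CPU threshold. In practice, the inventory would come from your cloud provider's APIs or from whatever reporting Guidewire Cloud exposes to you.

```python
# Minimal conformance check in the spirit of Netflix's Conformity and Janitor Monkeys.
# The tagging policy, idle threshold and resource records are hypothetical examples.
from dataclasses import dataclass, field

REQUIRED_TAGS = {"owner", "cost-center", "environment"}  # hypothetical tagging policy
IDLE_CPU_THRESHOLD = 5.0  # average CPU % below which a resource may be idle sprawl


@dataclass
class Resource:
    resource_id: str
    avg_cpu_percent: float
    tags: dict[str, str] = field(default_factory=dict)


def find_violations(resources: list[Resource]) -> dict[str, list[str]]:
    """Map each non-conforming resource id to a list of readable policy violations."""
    violations: dict[str, list[str]] = {}
    for res in resources:
        problems = []
        missing = REQUIRED_TAGS - set(res.tags)  # conformance: untagged resources
        if missing:
            problems.append(f"missing tags: {', '.join(sorted(missing))}")
        if res.avg_cpu_percent < IDLE_CPU_THRESHOLD:  # cost: possibly forgotten resources
            problems.append(f"possibly idle ({res.avg_cpu_percent:.1f}% avg CPU)")
        if problems:
            violations[res.resource_id] = problems
    return violations


if __name__ == "__main__":
    inventory = [
        Resource("i-claims-01", 42.0, {"owner": "claims-team", "cost-center": "cc-101", "environment": "prod"}),
        Resource("i-test-07", 1.5, {"owner": "qa-team"}),
    ]
    for resource_id, problems in find_violations(inventory).items():
        print(resource_id, "->", "; ".join(problems))
```

Keeping the check report-only is a deliberate choice. Teams can review what would be flagged before any automation is allowed to stop or delete resources.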

Strategy for Implementing Chaos Engineering
The implementation of Chaos Engineering in the P&C insurance industry should start with a well-defined strategy that aligns with the unique needs of the industry and the specific architecture of the Guidewire Cloud.
A. Chaos Engineering Strategy: The strategy should focus less on tools and more on the overall objective of building a culture of resilience. The goal is to address potential failures proactively. Teams introduce controlled faults into the system, observe whether it keeps behaving as expected, and fix the weaknesses that surface. This approach encourages the development of more fault-tolerant applications.
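Introducing controlled faults and observing the results is usually structured as a loop. First, verify a steady state. Next, inject a fault and check whether the steady state still holds. Finally, always roll the fault back. The sketch below shows that structure against a hypothetical in-memory service with three replicas. `ReplicatedService`, the replica names and the stop-one-replica fault are illustrative stand-ins, not Guidewire or AWS APIs.

```python
# Minimal sketch of a controlled chaos experiment:
# check steady state -> inject fault -> observe -> always roll back -> verdict.
# ReplicatedService is a hypothetical stand-in for a real load-balanced service.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReplicatedService:
    replicas: dict[str, bool] = field(
        default_factory=lambda: {"node-a": True, "node-b": True, "node-c": True}
    )

    def handle_request(self) -> bool:
        """A request succeeds if at least one replica is healthy."""
        return any(self.replicas.values())


Probe = Callable[[ReplicatedService], bool]
Fault = Callable[[ReplicatedService], None]


def run_experiment(service: ReplicatedService, steady_state: Probe,
                   inject: Fault, rollback: Fault) -> str:
    # 1. Only experiment on a system that is healthy to begin with.
    if not steady_state(service):
        return "aborted: system not in steady state before injection"
    try:
        inject(service)               # 2. Introduce the controlled fault.
        held = steady_state(service)  # 3. Does the hypothesis hold during the fault?
    finally:
        rollback(service)             # 4. Always undo the fault, even on errors.
    return "hypothesis held" if held else "weakness found"


def serves_requests(svc: ReplicatedService) -> bool:
    """Steady-state hypothesis: the service keeps answering requests."""
    return svc.handle_request()


def stop_replica(name: str) -> Fault:
    def inject(svc: ReplicatedService) -> None:
        svc.replicas[name] = False
    return inject


def start_replica(name: str) -> Fault:
    def rollback(svc: ReplicatedService) -> None:
        svc.replicas[name] = True
    return rollback


if __name__ == "__main__":
    service = ReplicatedService()
    print(run_experiment(service, serves_requests, stop_replica("node-a"), start_replica("node-a")))
```

A real experiment would replace the in-memory flag with an actual fault and the probe with real health checks. The two guarantees would stay the same: no injection without a healthy baseline, and rollback inside a `finally` block.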
B. Compatibility with Guidewire Cloud: Guidewire Cloud runs on AWS, so the Chaos Engineering strategy must align with the architecture of Guidewire's platform-as-a-service (PaaS) layer on top of the AWS infrastructure. Each insurer's strategy should account for the specific components and services that Guidewire Cloud provides, so that experiments are both compatible and effective.
C. Steps to Implement Chaos Engineering: The implementation process should follow these steps:

  1. Design Hypotheses:
    • Focus on scenarios that could impact critical business processes and customer experiences.
    • Conduct experiments that simulate failures in key systems like ClaimCenter, PolicyCenter, and the quote/bind/rating processes.
  2. Technical Approach:
    • Establish a framework that combines the best of Guidewire and open-source tools, focusing on resilience, data security, and observability.
    • Use AWS Fault Injection Service (FIS, formerly AWS Fault Injection Simulator) to induce real-world infrastructure faults, and monitor the results with Amazon CloudWatch.
    • Complement the AWS tools with open-source tools such as Chaos Toolkit and Chaos Mesh, or commercial platforms such as Gremlin, to enhance the resilience of the Guidewire Cloud Platform (GWCP).
  3. Define a Metrics Monitoring Framework:
    • Monitor both business and technical metrics to assess the impact of induced faults.
    • Use the Response Time Analysis Tool for InsuranceSuite (available on the Guidewire Marketplace) to analyze response times and identify the parameters that trigger faults.
    • Leverage AWS and open-source tools to supplement Guidewire-specific metrics and ensure comprehensive monitoring.
  4. Measure Outcomes and Refine the Framework:
    • Start with simple experiments and gradually increase the complexity of the scenarios tested.
    • Analyze the results, refine the framework, and collaborate across teams to ensure continuous improvement.
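For the technical approach in step 2, an AWS FIS experiment is described by an experiment template with three parts: the targets to act on, the actions to run, and the stop conditions that halt the experiment if it starts hurting the system. Stop conditions are typically Amazon CloudWatch alarms. The sketch below builds such a template as a Python dict and checks a few guardrails before submission. The role ARN, alarm ARN, `app` tag and instance counts are placeholders. The sketch also assumes you are working in an AWS account where you are allowed to run FIS. On Guidewire Cloud, Guidewire operates the underlying AWS infrastructure, so that kind of access would likely have to be agreed with Guidewire. The dict follows the FIS experiment template format, so with boto3 it could in principle be passed as keyword arguments to the `fis` client's `create_experiment_template` call, along with a `clientToken`. Verify this against the current FIS documentation first.

```python
# Sketch: build an AWS FIS experiment template as a plain dict and check guardrails.
# All ARNs, tag values and names below are placeholders, not real resources.
import json
import re

ROLE_ARN = "arn:aws:iam::123456789012:role/fis-experiment-role"  # placeholder
ALARM_ARN = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:quote-latency-high"  # placeholder


def build_stop_instance_template(app_tag: str, count: int = 1, duration: str = "PT5M") -> dict:
    """Stop `count` running EC2 instances tagged app=<app_tag>, restarting them after `duration`."""
    return {
        "description": f"Stop {count} {app_tag} instance(s) and watch quote latency",
        "roleArn": ROLE_ARN,
        "targets": {
            "appInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"app": app_tag},  # hypothetical tagging scheme
                "filters": [{"path": "State.Name", "values": ["running"]}],
                "selectionMode": f"COUNT({count})",  # keep the blast radius small
            }
        },
        "actions": {
            "stopInstances": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": duration},
                "targets": {"Instances": "appInstances"},
            }
        },
        # FIS halts the experiment if this CloudWatch alarm goes into ALARM state.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": ALARM_ARN}],
        "tags": {"purpose": "chaos-engineering"},
    }


def check_guardrails(template: dict, max_count: int = 2) -> list[str]:
    """Return guardrail problems; an empty list means the template passes."""
    problems = []
    if not any(c.get("source") == "aws:cloudwatch:alarm" for c in template.get("stopConditions", [])):
        problems.append("no CloudWatch alarm stop condition")
    targets = template.get("targets", {})
    for name, target in targets.items():
        # This sketch only accepts COUNT(n) with small n; ALL and PERCENT(n) are rejected on purpose.
        match = re.fullmatch(r"COUNT\((\d+)\)", target.get("selectionMode", ""))
        if match is None or int(match.group(1)) > max_count:
            problems.append(f"target {name!r} selects too many resources")
    for name, action in template.get("actions", {}).items():
        for ref in action.get("targets", {}).values():
            if ref not in targets:
                problems.append(f"action {name!r} references unknown target {ref!r}")
    return problems


if __name__ == "__main__":
    template = build_stop_instance_template("policycenter")
    print(check_guardrails(template) or "guardrails passed")
    print(json.dumps(template, indent=2))
```

Checking the template locally before submitting it keeps two safety properties explicit and reviewable: every experiment has an automatic stop condition, and the blast radius stays small.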
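Steps 1, 3 and 4 come together when a hypothesis is written down as measurable thresholds on both business and technical metrics, then checked against what was observed during the fault window. The sketch below assumes hypothetical metrics for a quote/bind flow: p95 response time and bind success rate. The thresholds are chosen purely for illustration. The latency samples would in practice come from CloudWatch, Guidewire response-time tooling or another monitoring source. The p95 is computed with the nearest-rank method.

```python
# Sketch: check a steady-state hypothesis against metrics observed during a fault,
# then decide whether to widen the next experiment. Names and thresholds are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    description: str
    max_p95_latency_ms: float     # technical metric
    min_bind_success_rate: float  # business metric, between 0.0 and 1.0


def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def evaluate(h: Hypothesis, latencies_ms: list[float],
             binds_attempted: int, binds_succeeded: int) -> dict:
    """Compare observed metrics with the hypothesis and report each check."""
    observed_p95 = p95(latencies_ms)
    success_rate = binds_succeeded / binds_attempted if binds_attempted else 0.0
    checks = {
        "latency": observed_p95 <= h.max_p95_latency_ms,
        "bind_success": success_rate >= h.min_bind_success_rate,
    }
    return {"p95_ms": observed_p95, "bind_success_rate": success_rate,
            "checks": checks, "hypothesis_held": all(checks.values())}


def next_blast_radius(current: int, held: bool, cap: int = 4) -> Optional[int]:
    """Widen gradually only after a passing run; None means stop and investigate."""
    if not held:
        return None
    return min(current * 2, cap)


if __name__ == "__main__":
    hypothesis = Hypothesis("Quote/bind stays healthy with one instance stopped", 2000.0, 0.99)
    latencies = [420.0, 510.0, 480.0, 1900.0, 650.0, 700.0, 530.0, 610.0, 590.0, 2300.0]
    result = evaluate(hypothesis, latencies, binds_attempted=400, binds_succeeded=397)
    print(result)
    print("next instance count:", next_blast_radius(1, result["hypothesis_held"]))
```

With these ten sample values, the nearest-rank p95 is the slowest sample (2,300 ms), so the latency check fails even though the bind success rate (99.25%) passes. The sketch therefore recommends investigating before widening the experiment. Doubling the instance count up to a cap is just one illustrative way to "start simple and gradually increase complexity".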

Conclusion
P&C insurance is at a critical juncture, with cloud computing offering new opportunities for growth and innovation. However, with these opportunities come new risks, particularly related to system resilience and security. Chaos Engineering offers a proactive approach to addressing these risks, enabling insurers to build more robust and fault-tolerant systems on the Guidewire Cloud. By adopting a thoughtful and well-planned Chaos Engineering strategy, P&C insurers can ensure the stability of their cloud-based operations and protect their business from unforeseen disruptions.

Articles/posts referred to:
·      https://www.geektime.com/how-much-does-it-downtime-cost/
·      https://www.pingdom.com/outages/average-cost-of-downtime-per-industry/
·      https://www.itnews.com.au/news/nab-deploys-chaos-monkey-to-kill-servers-24-7-382285
·      https://www.guidewire.com/sites/default/files/media/pdfs/Guidewire_Cloud_data_sheet_en.pdf
·      https://medium.com/guidewire-engineering-blog/guidewire-cloud-why-hybrid-tenancy-is-the-right-choice-56a0ff176032
·      https://www.techtarget.com/searchcloudcomputing/definition/cloud-sprawl#:~:text=Cloud%20sprawl%20is%20the%20uncontrolled,over%20its%20cloud%20computing%20resources
·      https://medium.com/guidewire-engineering-blog/log-management-and-guidewire-cloud-platform-observability-73a033a34e9a
·      https://medium.com/guidewire-engineering-blog/guidewire-cloud-why-hybrid-tenancy-is-the-right-choice-part-2-of-2-ba22c9888bb8
·      https://marketplace.guidewire.com/s/product/response-time-analysis-tool-for-insurancesuite-100x/01t3n00000GfL6AAAV?language=en_US
·      https://www.guidewire.com/blog/technology/expanding-your-companys-cloud-capabilities-with-garmisch/
·      https://documentation.solarwinds.com/en/success_center/observability/content/configure/services/java/guidewire-support.htm
·      https://aws.amazon.com/blogs/architecture/chaos-engineering-in-the-cloud/
·      https://aws.amazon.com/blogs/mt/chaos-engineering-leveraging-aws-fault-injection-simulator-in-a-multi-account-aws-environment/
·      https://aws.amazon.com/blogs/apn/improving-system-resilience-and-observability-chaos-engineering-with-aws-fis-and-aws-dlt/#:~:text=You%20can%20monitor%20performance%20metrics,be%20affected%20during%20chaos%20testing
·      https://aws.amazon.com/blogs/architecture/chaos-engineering-in-the-cloud/#:~:text=As%20Chaos%20Engineering%20should%20provide,be%20injected%20to%20your%20workload
·      https://aws.amazon.com/about-aws/whats-new/2023/11/aws-fault-injection-service-two-requested-scenarios/
·      https://docs.aws.amazon.com/fis/latest/userguide/what-is.html
·      https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/resiliency-and-the-components-of-reliability.html
·      https://aws.amazon.com/blogs/devops/chaos-experiments-on-amazon-rds-using-aws-fault-injection-simulator/
·      https://www.gremlin.com/community/tutorials/chaos-engineering-tools-comparison/
·      https://www.gremlin.com/aws/
·      https://chaos-mesh.org/
·      https://chaostoolkit.org/

