Impression before studying
Is it about designing with failure in mind?
Study
What is Design for Failure?
Design for Failure refers to the concept of designing a system based on the assumption that failures will occur.
Servers can fail, and so can an entire AZ or region.
The idea is to improve the availability of the service by preparing countermeasures for those failures in advance.
How to realize Design for Failure
- Elimination of SPOFs
To improve availability, it is important not to create a single point of failure (SPOF).
SPOFs can be avoided by running the system in an HA configuration or, in the case of AWS, in a multi-AZ or multi-region configuration.
This is because services can continue to operate even if an entire AZ or region fails.
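As a rough illustration of why redundancy across AZs keeps a service running, here is a minimal Python sketch of client-side failover. The replica names, `call_with_failover`, and `fake_send` are all made up for this example; in a real AWS setup a load balancer or DNS failover would normally do this job.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when every replica in the list has failed."""


def call_with_failover(replicas: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each replica in order and return the first successful response."""
    errors: List[str] = []
    for replica in replicas:
        try:
            return send(replica)
        except ConnectionError as exc:
            # This replica (or its whole AZ) is down; move on to the next one.
            errors.append(f"{replica}: {exc}")
    raise AllReplicasFailed("; ".join(errors))


def fake_send(replica: str) -> str:
    """Stand-in for a real network call; pretends everything in AZ 'a' is down."""
    if replica.startswith("az-a"):
        raise ConnectionError("zone unavailable")
    return f"ok from {replica}"


# Hypothetical replica names, one per AZ.
result = call_with_failover(["az-a/server-1", "az-c/server-1"], fake_send)
```

With only `az-a/server-1` in the list, the same call raises `AllReplicasFailed`, which is exactly what a SPOF looks like.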
- Monitoring
Monitoring includes service monitoring, which detects service failures early, and resource monitoring, which detects performance degradation.
For early detection and early recovery, it is necessary to set up constant monitoring of the service's logs and metrics.
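The two kinds of monitoring can be sketched in a few lines of Python. This is only a toy under assumed settings: the class name `HealthMonitor`, the alert names, and the thresholds (3 consecutive failures, 500 ms average over 5 samples) are invented for illustration, and a real system would use a monitoring service such as Amazon CloudWatch instead.

```python
from collections import deque
from typing import Deque, List


class HealthMonitor:
    """Toy monitor: tracks failed health checks and recent latencies."""

    def __init__(self, failure_threshold: int = 3,
                 latency_limit_ms: float = 500.0, window: int = 5):
        self.failure_threshold = failure_threshold
        self.latency_limit_ms = latency_limit_ms
        self.consecutive_failures = 0
        self.latencies: Deque[float] = deque(maxlen=window)

    def record(self, ok: bool, latency_ms: float) -> List[str]:
        """Record one health check result and return any alerts it triggers."""
        alerts: List[str] = []
        # Service monitoring: count health checks that failed in a row.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        if self.consecutive_failures >= self.failure_threshold:
            alerts.append("service_down")
        # Resource monitoring: average latency over a full window of successes.
        if ok:
            self.latencies.append(latency_ms)
        if len(self.latencies) == self.latencies.maxlen:
            average = sum(self.latencies) / len(self.latencies)
            if average > self.latency_limit_ms:
                alerts.append("performance_degraded")
        return alerts
```

Requiring several consecutive failures before alerting is a design choice to avoid paging someone over a single dropped health check.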
- Recovery Methods
When a failure occurs and clients are affected, how do you recover?
Early recovery becomes possible if the organization decides in advance who does what and which recovery methods to use.
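One way to picture "deciding in advance" is a runbook table that maps each kind of failure to an owner and a list of steps. The sketch below is hypothetical throughout: the failure names, roles, and steps are placeholders, not a recommended procedure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Runbook:
    owner: str               # who responds first (hypothetical role name)
    steps: Tuple[str, ...]   # recovery steps agreed on before any failure


# Hypothetical entries; a real team would write its own.
RUNBOOKS: Dict[str, Runbook] = {
    "server_down": Runbook(
        "on-call engineer",
        ("replace the failed server", "confirm health checks pass"),
    ),
    "az_down": Runbook(
        "infrastructure lead",
        ("shift traffic to the healthy AZ", "notify affected clients"),
    ),
}

# Fallback so that even an unexpected failure has an owner.
DEFAULT_RUNBOOK = Runbook("incident manager", ("escalate and assess impact",))


def plan_for(failure: str) -> Runbook:
    """Look up the pre-agreed recovery plan for a failure type."""
    return RUNBOOKS.get(failure, DEFAULT_RUNBOOK)
```

For example, `plan_for("az_down").steps[0]` gives the first step for an AZ outage, and any failure type not in the table falls back to `DEFAULT_RUNBOOK`.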
Impression after studying
SPOF is indeed ・・・・.
Basically, it means you have to think of a server as something that will die sooner or later.