What is Design for Failure?

#failure #beginners

Impression before studying

Is it literally about designing for failure?

Study

What is Design for Failure?

Design for Failure refers to the concept of designing a system based on the assumption that failures will occur.
Servers can fail, and so can an entire AZ or region.
The idea is to improve the availability of the service by preparing countermeasures for those failures in advance.

How to realize Design for Failure

Elimination of SPOFs

To improve availability, it is important not to create a single point of failure (SPOF).
SPOFs can be avoided by running the system in an HA configuration or, in the case of AWS, in a multi-AZ or multi-region configuration.
This is because services can continue to operate even if an entire AZ or region fails.
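
As a rough illustration of why redundancy removes a SPOF, here is a minimal client-side failover sketch. The endpoint names, `call_with_failover`, and `fake_request` are all made up for this example; in practice a load balancer or DNS failover usually does this job rather than application code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every redundant endpoint has failed."""


def call_with_failover(endpoints: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each redundant endpoint in order and return the first successful result."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


# Hypothetical endpoints, one per AZ. The names are placeholders.
ENDPOINTS = ["app.az-a.example.internal", "app.az-b.example.internal"]


def fake_request(endpoint: str) -> str:
    """Stand-in for a network call: pretend that the first AZ is down."""
    if "az-a" in endpoint:
        raise ConnectionError("AZ unavailable")
    return f"200 OK from {endpoint}"


# The first AZ fails, so the call is served by the second one.
result = call_with_failover(ENDPOINTS, fake_request)
print(result)
```

With only one endpoint in the list, the same outage would raise `AllEndpointsFailed`, which is exactly the SPOF situation.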

Monitoring

Monitoring includes service monitoring, for early detection of service failures, and resource monitoring, for detecting performance degradation.
Logs and metrics need to be monitored constantly so that failures are detected early and recovery can start early.
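
To make "constant monitoring of metrics" a bit more concrete, here is a small sketch of an error-rate check over a sliding window of recent requests. `ErrorRateMonitor`, the window size, and the threshold are assumptions chosen for illustration, not values from any particular monitoring service.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the last `window_size` request results and flags a high error rate."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05) -> None:
        # deque with maxlen drops the oldest result automatically
        self.results = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, success: bool) -> None:
        """Record whether one request succeeded."""
        self.results.append(success)

    def error_rate(self) -> float:
        """Fraction of failed requests in the current window (0.0 if empty)."""
        if not self.results:
            return 0.0
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results)

    def should_alert(self) -> bool:
        """True when the error rate in the window exceeds the threshold."""
        return self.error_rate() > self.threshold


# Illustrative values: alert when more than 20% of the last 10 requests failed.
monitor = ErrorRateMonitor(window_size=10, threshold=0.2)
for ok in [True] * 7 + [False] * 3:
    monitor.record(ok)

if monitor.should_alert():
    print(f"ALERT: error rate is {monitor.error_rate():.0%}")
```

A real setup would send the alert to an on-call channel instead of printing it, and would likely run a similar check on resource metrics such as CPU or latency.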

Recovery Methods

When a failure occurs and clients are affected, how do you recover?
Recovery is faster when the organization has decided in advance who does what and which recovery methods to use.
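
One way to picture "deciding in advance" is a runbook written down as data before anything breaks. The sketch below is only an assumption about what such a runbook might contain: the failure names, owners, and steps are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RecoveryPlan:
    """Who acts and what they do for one kind of failure."""

    owner: str
    steps: Tuple[str, ...]


# Hypothetical runbook, agreed on before any failure happens.
RUNBOOK: Dict[str, RecoveryPlan] = {
    "server_down": RecoveryPlan(
        owner="on-call engineer",
        steps=(
            "Remove the server from rotation",
            "Launch a replacement server",
            "Confirm health checks pass",
        ),
    ),
    "az_down": RecoveryPlan(
        owner="infrastructure team",
        steps=(
            "Shift traffic to the healthy AZ",
            "Notify affected clients",
        ),
    ),
}

# Fallback for failures nobody planned for: escalate instead of improvising.
DEFAULT_PLAN = RecoveryPlan(
    owner="incident commander",
    steps=("Escalate and open an incident channel",),
)


def plan_for(failure: str) -> RecoveryPlan:
    """Look up the agreed plan, falling back to escalation for unknown failures."""
    return RUNBOOK.get(failure, DEFAULT_PLAN)


plan = plan_for("server_down")
print(f"Owner: {plan.owner}")
for number, step in enumerate(plan.steps, start=1):
    print(f"{number}. {step}")
```

The point is less the code than the habit: the lookup is instant because the decisions were made before the failure, not during it.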

Impression after studying

SPOF is indeed ・・・・.
Basically, it means you have to think of the server as something that dies.
