Although the advantages of a microservices architecture are well known (and not covered here), resiliency is often overlooked in system design. Software systems make remote calls to software running in different processes, usually on different machines across a network. For example, suppose service A calls service B over the network.
What happens when service B becomes unavailable, responds with high latency, or returns the same business exception repeatedly? Left unhandled, these cases can lead to cascading failures that affect many of a company's services.
The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors it for failures. Applying this pattern keeps a failing dependency from dragging down the application that calls it. The pattern follows the same concept as the electrical safety device of the same name.
Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, an alternative service, or a default message, without the protected call being made at all. This ensures that the system stays responsive and that threads are not left waiting on an unresponsive call, which protects the system from catastrophic failures.
If service B goes down, service A should still try to recover and do the following:
Custom fallback: Try to get the same data from some other source. If that is not possible, use a cached value or return a custom error response to the client.
Fail fast: If service A knows that service B is down, there is no point in waiting for the timeout and consuming its own resources.
Don’t crash: A failure in service B should not crash service A.
Heal automatically: Periodically check whether service B is working again.
Other APIs should work: All APIs that do not depend on service B should continue to work.
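The custom fallback idea above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `FallbackChain` and `firstAvailable` are hypothetical names, and each data source (service B, a secondary source, a local cache) is modelled as a `Supplier` that either returns a value, returns an empty `Optional`, or throws.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: tries each source in order and returns the first value found. */
class FallbackChain {

    /**
     * @param sources         data sources in order of preference (e.g. service B, then a cache)
     * @param defaultResponse custom response returned when every source fails or is empty
     */
    static <T> T firstAvailable(List<Supplier<Optional<T>>> sources, T defaultResponse) {
        for (Supplier<Optional<T>> source : sources) {
            try {
                Optional<T> value = source.get();
                if (value.isPresent()) {
                    return value.get();
                }
            } catch (RuntimeException e) {
                // This source failed; try the next one instead of crashing the caller.
            }
        }
        // Nothing worked: answer with the custom client response.
        return defaultResponse;
    }
}
```

With a first source that throws (service B is down) and a second one that reads from a cache, the caller gets the cached value; if the cache is also empty, it gets the default response rather than an exception.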
Closed: When everything is normal, the Circuit Breaker remains CLOSED and all calls to service B occur normally. If the number of failures exceeds a predetermined limit, the status changes to OPEN.
Open: In this state, the Circuit Breaker does not execute the call to service B and returns a handled error instead.
Half-Open: After a timeout period, the circuit switches to a half-open state to test whether the underlying problem still exists. If a single call fails in this HALF-OPEN state, the breaker trips again and returns to OPEN. If the call succeeds, the breaker resets to the normal CLOSED state.
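The three states above can be sketched as a small state machine in Java. This is a minimal illustration, not a production implementation. The class name `SimpleCircuitBreaker`, the parameters `failureThreshold` and `openTimeoutMillis`, and the injectable `clock` are assumptions made for the example. It also assumes that a success in the CLOSED state clears the failure count, which the description above does not specify.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Sketch of the CLOSED / OPEN / HALF_OPEN state machine (hypothetical names throughout). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // failures tolerated before the breaker opens
    private final long openTimeoutMillis; // time to stay OPEN before allowing a trial call
    private final LongSupplier clock;     // time source, e.g. System::currentTimeMillis

    private State state = State.CLOSED;
    private int failureCount = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.clock = clock;
    }

    /** Calls service B unless the breaker is OPEN; returns the fallback on failure. */
    synchronized <T> T call(Supplier<T> serviceB, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt >= openTimeoutMillis) {
                state = State.HALF_OPEN; // timeout elapsed: let one trial call through
            } else {
                return fallback.get();   // fail fast: service B is not called at all
            }
        }
        try {
            T result = serviceB.get();
            // Success in CLOSED or HALF_OPEN: back to normal operation.
            state = State.CLOSED;
            failureCount = 0;
            return result;
        } catch (RuntimeException e) {
            failureCount++;
            // A failed trial call, or too many failures, (re)opens the breaker.
            if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    /** Returns the stored state; OPEN only moves to HALF_OPEN on the next call. */
    synchronized State getState() {
        return state;
    }
}
```

Passing the clock in as a `LongSupplier` is a design choice for the sketch: it lets the timeout be exercised in tests without sleeping. In normal use `System::currentTimeMillis` would be supplied.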
The circuit breaker helps you prevent integration problems between your microservices. For best results, pair it with monitoring tools and metrics, such as Prometheus and Grafana.
In the next post I will talk about Resilience4j, the main resilience framework for Java applications.