Making systems more resilient (1) - Circuit Breaker

#devops

Things break. Usually in production. In this article series we will present various mechanisms for making your systems more resilient, so that your users do not experience a real outage even when something goes wrong under the hood.

Circuit Breaker has an interesting name that comes from an analogy with the physical, electrical world. Every decent electrician recognizes that playing with electricity is dangerous and that she must install safety mechanisms to protect the rest of the electrical network if one device breaks down. What do electricians do? They install fuses. If a device (say, a toaster!) breaks down and causes a short circuit, a strong electric current will flow through the network, resulting in damage, overheating and possibly fire. Unless there is a fuse in the network! If a strong electric current is flowing, the fuse will melt or otherwise break the electrical circuit, stop the current and save the rest of the network.

Circuit Breaker mechanisms in software are very much like electrical fuses in the real world. Suppose one of your services (Service A) is calling another (Service B).

If Service B stops responding, Service A will patiently wait until a timeout occurs and then will either throw an exception or return questionable results. Neither of these is acceptable. We need a better solution that prevents the whole system from going down. That’s where Circuit Breakers come into play.

A Circuit Breaker is positioned between the two services, and every service call passes through it. If everything is fine, the Circuit Breaker just passes calls through. What happens when Service B doesn’t respond?

The Circuit Breaker will detect that and return an error or an alternative response. In electrical terms, the Circuit Breaker will "open the circuit". What’s more, the Circuit Breaker will remember that Service B is down, and whoever calls Service B will immediately get an error without the Circuit Breaker even passing the request on. The Circuit Breaker will periodically let one request through to Service B, just to check whether Service B is still down. If it is, the Circuit Breaker will keep the circuit open and continue to return an error without passing requests to Service B.

If, on the other hand, Service B recovers and returns a valid result, the Circuit Breaker will detect that, return the result to the caller (Service A) and "close the circuit" again, that is, it will resume passing all requests through to Service B.
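The behaviour described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation promised for Part 2: the class name `CircuitBreaker`, the `CircuitOpenError` exception and the parameters `failure_threshold`, `retry_after` and `clock` are all assumptions made for this example, and it treats any exception from the wrapped call as "Service B is down".

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, call, failure_threshold=3, retry_after=30.0,
                 clock=time.monotonic):
        self._call = call                    # function that actually calls Service B
        self._failure_threshold = failure_threshold
        self._retry_after = retry_after      # seconds to wait before a trial request
        self._clock = clock                  # injectable so the breaker can be tested
        self._failures = 0
        self._opened_at = None               # None means the circuit is closed

    @property
    def is_open(self):
        return self._opened_at is not None

    def request(self, *args, **kwargs):
        if self.is_open and self._clock() - self._opened_at < self._retry_after:
            # Circuit is open: fail fast, do not touch Service B at all.
            raise CircuitOpenError("Service B is marked as down")
        try:
            # Either the circuit is closed, or this is the periodic trial request.
            result = self._call(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.is_open or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and forget earlier failures.
        self._failures = 0
        self._opened_at = None
        return result
```

Service A would wrap its client function once, for example `breaker = CircuitBreaker(call_service_b)` where `call_service_b` is a hypothetical function of your own, and then call `breaker.request(...)` instead of calling Service B directly. A real breaker would also need to be thread-safe and to decide which errors count as failures; both are left out here to keep the sketch short.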

When the Circuit Breaker detects that the called service is down, it can do various things. It could return a meaningful error to the caller, but it could also return a “good enough” result that still provides some value for the user.

Let’s take the example of a banking application that shows an account balance fetched from a separate service. If that service goes down, the Circuit Breaker could return the last known balance from a cache. Yesterday’s balance might not be valid today, but accompanied by a proper message it’s better than nothing.

"Dear customer, we’re experiencing some trouble getting your balance at the moment, but your balance 3 hours ago was €234.55" beats "Your balance is €null" any time. Of course, what the Circuit Breaker returns needs to be carefully thought through from a business perspective.
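As a rough illustration of the "good enough" response, here is a small Python sketch of the balance example. Everything in it is hypothetical: the `BalanceView` class, the `fetch_balance` callable and the in-memory cache are assumptions made for this example, and `fetch_balance` is expected to raise an exception when the balance service is down or when a Circuit Breaker in front of it fails fast.

```python
import time


class BalanceView:
    """Hypothetical helper: shows a balance, falling back to the last known value."""

    def __init__(self, fetch_balance, clock=time.time):
        self._fetch_balance = fetch_balance  # calls the balance service; may raise
        self._clock = clock                  # injectable so the fallback can be tested
        self._last = {}                      # account_id -> (balance, fetched_at)

    def message(self, account_id):
        try:
            balance = self._fetch_balance(account_id)
        except Exception:
            cached = self._last.get(account_id)
            if cached is None:
                # Nothing cached yet: an honest error is the only option.
                return "We cannot show your balance right now. Please try again later."
            balance, fetched_at = cached
            hours = int((self._clock() - fetched_at) // 3600)
            unit = "hour" if hours == 1 else "hours"
            return (
                "We're experiencing some trouble getting your balance at the moment, "
                f"but your balance {hours} {unit} ago was €{balance:.2f}"
            )
        # Normal path: remember the fresh value for a future fallback.
        self._last[account_id] = (balance, self._clock())
        return f"Your balance is €{balance:.2f}"
```

Whether a stale balance may be shown at all, and how old it is allowed to be, remains a business decision; the sketch only shows where such a rule would live.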

Circuit Breakers are really cool. In Part 2 of this series, we will lift the hood and show a concrete Circuit Breaker implementation.

— Photo by rawpixel.com from Pexels.
