There are many infrastructure design patterns. In this post, I want to dig into the circuit breaker pattern. The pattern takes its name from the circuit breakers used in electrical engineering.
The power company delivers electricity to your home at a constant voltage (which varies by country). Voltage is the measure of the force that pushes electrons through the circuit. Current is the rate of flow of those electrons. The wires that carry the electrons, along with light bulbs, computers, and other appliances, place a load on the circuit, so resistance varies throughout the house. Current flowing through resistance produces heat.
Ohm's Law defines the relationship between these quantities:
I = V/R
- I - current in units of amperes
- V - voltage in units of volts
- R - resistance in units of ohms
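To make the formula concrete, here is a small sketch of the arithmetic. The 120 V supply, the 15 A breaker rating, and the two load resistances are values I picked for illustration, not figures for any particular home.

```python
def current_amps(voltage_volts: float, resistance_ohms: float) -> float:
    """Ohm's Law: I = V / R."""
    if resistance_ohms <= 0:
        raise ValueError("resistance must be positive")
    return voltage_volts / resistance_ohms


# Assumed, illustrative values: a 120 V supply and a 15 A breaker.
SUPPLY_VOLTS = 120.0
BREAKER_LIMIT_AMPS = 15.0

for load_ohms in (24.0, 6.0):
    amps = current_amps(SUPPLY_VOLTS, load_ohms)
    status = "trips breaker" if amps > BREAKER_LIMIT_AMPS else "ok"
    print(f"{load_ohms:.0f} ohm load draws {amps:.0f} A: {status}")
```

With the voltage held constant, lower resistance means more current: the 24 ohm load draws 5 A, while the 6 ohm load draws 20 A and exceeds the assumed 15 A limit.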
An electrical circuit is a path for electrical current. A home circuit breaker is a safety mechanism to prevent fires by breaking a circuit or path. A properly functioning circuit breaker opens the circuit before the current jumps to unsafe levels.
Additionally, homes are designed with parallel circuits so that a failure in a single room or large appliance is isolated from the rest of the house. Too much current, and the heat it produces, might overload the house wiring or appliance wiring and break other devices or cause a fire.
Opening a circuit breaks it, so current can't flow. A closed circuit is working as designed.
If we think about the flow of requests as being like electricity, then we implement a circuit breaker pattern by making a mechanism that breaks the flow of requests.
When we talk about circuit breaker patterns in software, an "open" circuit indicates a problem. The goal of implementing a circuit breaker in our design is to prevent individual services from causing damage to other services or bringing down the whole system. If a request keeps failing, retrying it continuously may exacerbate the underlying issue and delay recovery.
What is a fault that would trip a circuit breaker in a service?
In distributed systems, resource consumption, error density, or run time may all indicate limitations in the architecture that have system-wide impacts.
The actual implementation of a circuit breaker in software is very different from the one in electrical engineering. We can automatically reset a tripped circuit breaker once the failure is no longer occurring. We can also implement a third state, "half-open," allowing for degraded performance. During a recovery period, limiting requests through a half-open circuit gives us time to assess the state of the failed service and gives the failed service time to recover.
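Here is a minimal sketch of the three states in Python. Everything in it is an assumption for illustration: the class name `CircuitBreaker`, the `failure_threshold` and `recovery_timeout` parameters, and the rule that consecutive failures trip the breaker. It is single-threaded and not a production implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.recovery_timeout = recovery_timeout    # seconds to stay open before a trial
        self._clock = clock                         # injectable so tests can fake time
        self._failures = 0
        self._opened_at: Optional[float] = None
        self.state = "closed"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Recovery period elapsed: let a trial request through.
            self.state = "half-open"
        try:
            result = func()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial in half-open reopens the circuit immediately.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()

    def _record_success(self) -> None:
        self._failures = 0
        self.state = "closed"
```

A caller wraps each request, for example `breaker.call(lambda: fetch_profile(user_id))`, where `fetch_profile` is a hypothetical function that talks to the protected service. While the circuit is open, the call raises `CircuitOpenError` without touching that service at all.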
Is there an Ohm's Law that we can define for systems?
There isn't an Ohm's Law for systems, but it is something to think about in terms of the maximum request rates or limits of dependent and third-party services. What is the total load being placed on these services?
Operations engineers need to know whether anyone has implemented circuit breakers and where. Depending on the circuit breaker implementation, clients may completely fail a particular operation or queue up the request to be processed later. Tripped circuit breakers can provide direction toward improving operability and design. It's critical to expose and monitor changes to state in circuit breakers.
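One way to expose state changes is to notify listeners on every transition. The sketch below assumes hypothetical names (`ObservableState`, `log_transition`, and a `payments-api` circuit) and uses Python's standard `logging` module plus an in-memory counter as a stand-in for a real metrics system.

```python
import logging
from collections import Counter
from typing import Callable, List

logger = logging.getLogger("circuit_breaker")

# Listener signature: (circuit name, old state, new state).
Listener = Callable[[str, str, str], None]


class ObservableState:
    """Holds a breaker's state and notifies listeners on every change."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._state = "closed"
        self._listeners: List[Listener] = []

    @property
    def state(self) -> str:
        return self._state

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def transition(self, new_state: str) -> None:
        if new_state == self._state:
            return  # not a change, so nothing to report
        old_state, self._state = self._state, new_state
        for listener in self._listeners:
            listener(self.name, old_state, new_state)


# Stand-in for a metrics counter keyed by (circuit name, new state).
transition_counts: Counter = Counter()


def log_transition(name: str, old_state: str, new_state: str) -> None:
    logger.warning("circuit %s changed state: %s -> %s", name, old_state, new_state)
    transition_counts[(name, new_state)] += 1


payments = ObservableState("payments-api")
payments.subscribe(log_transition)
payments.transition("open")       # tripped
payments.transition("half-open")  # trial period
payments.transition("closed")     # recovered
```

Counting transitions per circuit makes it easy to spot the breakers that trip most often, which is where I would look first for design improvements.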
In some ways, the circuit breaker pattern isn't ideal. We generally don't have a single metric that can tell us whether something is in a bad state. When designing our application, improving the value we provide to our customers, minimizing the impact on external services, and reducing wasted spend all have benefits. Implementing a circuit breaker based on a single metric that is monitored and assessed for incremental improvement or customization may be better than nothing.
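As an example of a single-metric trip condition, the sketch below tracks the error rate over a sliding window of recent calls. The class name `ErrorRateTrip` and the `window_size`, `max_error_rate`, and `min_calls` parameters are assumptions, and their defaults are starting points to tune, not recommendations.

```python
from collections import deque
from typing import Deque


class ErrorRateTrip:
    """Decides whether to trip based on one metric: recent error rate."""

    def __init__(self, window_size: int = 20, max_error_rate: float = 0.5,
                 min_calls: int = 10) -> None:
        # Only the most recent `window_size` outcomes are kept.
        self._outcomes: Deque[bool] = deque(maxlen=window_size)
        self.max_error_rate = max_error_rate
        self.min_calls = min_calls  # avoid tripping on too little data

    def record(self, success: bool) -> None:
        self._outcomes.append(success)

    @property
    def error_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)

    def should_trip(self) -> bool:
        enough_data = len(self._outcomes) >= self.min_calls
        return enough_data and self.error_rate > self.max_error_rate
```

Because the threshold and window are plain parameters, they can be adjusted as monitoring shows whether the breaker trips too eagerly or too late.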
Often engineers implement the circuit breaker pattern in conjunction with the retry and bulkhead design patterns. I'll visit those patterns in separate posts.