When we work with microservices, one of the main objectives is resilience. There are many ways to achieve it: retry, bulkhead, and circuit breaker all improve the resilience of microservices. In this post I will write about the circuit breaker and how it works, using the resilience4j library to implement it.
Resilience4j
Resilience4j is a lightweight fault tolerance library designed for functional programming. Resilience4j provides higher-order functions (decorators) to enhance any functional interface, lambda expression or method reference with a Circuit Breaker, Rate Limiter, Retry or Bulkhead. You can stack more than one decorator on any functional interface, lambda expression or method reference. The advantage is that you have the choice to select the decorators you need and nothing else.
Circuit Breaker
The CircuitBreaker is implemented via a finite state machine with three normal states: CLOSED, OPEN and HALF_OPEN and two special states DISABLED and FORCED_OPEN.
The state of the CircuitBreaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls have failed.
The CircuitBreaker also changes from CLOSED to OPEN when the percentage of slow calls is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls took longer than 5 seconds. This helps to reduce the load on an external system before it actually becomes unresponsive.
The failure rate and slow call rate can only be calculated if a minimum number of calls has been recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been evaluated, the CircuitBreaker will not trip open even if all 9 calls have failed.
The CircuitBreaker rejects calls with a CallNotPermittedException when it is OPEN. After a wait time duration has elapsed, the CircuitBreaker state changes from OPEN to HALF_OPEN and permits a configurable number of calls to see if the backend is still unavailable or has become available again.
If the failure rate or slow call rate is then equal to or greater than the configured threshold, the state changes back to OPEN. If the failure rate and slow call rate are both below the threshold, the state changes back to CLOSED.
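To make the transitions above concrete, here is a minimal sketch of a count-based circuit breaker in plain Kotlin. It is not how resilience4j is implemented: SimpleCircuitBreaker, BreakerState and CallRejected are hypothetical names, the slow call rate is left out, the minimum number of calls is simply the window size, and the class is not thread-safe.

```kotlin
enum class BreakerState { CLOSED, OPEN, HALF_OPEN }

// Hypothetical stand-in for resilience4j's CallNotPermittedException
class CallRejected(message: String) : RuntimeException(message)

class SimpleCircuitBreaker(
    private val windowSize: Int = 10,
    private val failureRateThreshold: Double = 50.0,
    private val halfOpenCalls: Int = 3,
    private val waitMillis: Long = 5_000,
    private val clock: () -> Long = { System.currentTimeMillis() }
) {
    var state = BreakerState.CLOSED
        private set

    // Outcomes of the most recent calls; true means the call failed
    private val outcomes = ArrayDeque<Boolean>()
    private var openedAt = 0L

    fun <T> call(block: () -> T): T {
        if (state == BreakerState.OPEN) {
            // Reject until the wait duration has elapsed, then allow trial calls
            if (clock() - openedAt < waitMillis) throw CallRejected("circuit is OPEN")
            transitionTo(BreakerState.HALF_OPEN)
        }
        val result = try {
            block()
        } catch (e: Exception) {
            record(failed = true)
            throw e
        }
        record(failed = false)
        return result
    }

    private fun record(failed: Boolean) {
        val required = if (state == BreakerState.HALF_OPEN) halfOpenCalls else windowSize
        outcomes.addLast(failed)
        if (outcomes.size > required) outcomes.removeFirst()
        // Not enough recorded calls yet, so no rate can be calculated
        if (outcomes.size < required) return

        val failureRate = 100.0 * outcomes.count { it } / outcomes.size
        when {
            failureRate >= failureRateThreshold -> transitionTo(BreakerState.OPEN)
            state == BreakerState.HALF_OPEN -> transitionTo(BreakerState.CLOSED)
        }
    }

    private fun transitionTo(next: BreakerState) {
        state = next
        outcomes.clear()
        if (next == BreakerState.OPEN) openedAt = clock()
    }
}
```

With a 70% threshold and a window of 10, nine failed calls leave this sketch CLOSED and the tenth one opens it, which mirrors the minimum-number-of-calls rule described above.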
More details in: https://resilience4j.readme.io/docs/circuitbreaker
Let's code
The first thing is to add the resilience4j-circuitbreaker dependency to pom.xml:
<!-- https://mvnrepository.com/artifact/io.github.resilience4j/resilience4j-circuitbreaker -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-circuitbreaker</artifactId>
    <version>1.7.1</version>
</dependency>
Now we need to configure our circuit breaker. My idea is that the circuit opens when 70% of the requests fail or 70% of the requests take longer than 3 nanoseconds (a threshold this low is unrealistic, but it forces the circuit to open for the demonstration). My reference will be a window of 10 requests: if 70% of those 10 requests fail or are slow, the state of the circuit changes. To do that in code, let's create a Kotlin class called CircuitBreakerConfiguration.
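import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry
import org.springframework.context.annotation.Configuration
import java.time.Duration

@Configuration
class CircuitBreakerConfiguration {
    fun getConfiguration() = CircuitBreakerConfig.custom()
        // Evaluate the outcome of the last 10 calls
        .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
        .slidingWindowSize(10)
        // Open when 70% of the calls are slow or 70% of them fail
        .slowCallRateThreshold(70.0f)
        .failureRateThreshold(70.0f)
        // Stay OPEN for 5 seconds before allowing trial calls
        .waitDurationInOpenState(Duration.ofSeconds(5))
        // Unrealistically low on purpose, so that every call counts as slow
        .slowCallDurationThreshold(Duration.ofNanos(3))
        // Number of trial calls allowed in HALF_OPEN
        .permittedNumberOfCallsInHalfOpenState(3)
        .build()

    // Note: each invocation builds a new registry and therefore a new circuit breaker
    fun getCircuitBreaker() = CircuitBreakerRegistry.of(getConfiguration())
        .circuitBreaker("circuit-breaker-car-service")
}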
As we can see, we called the custom method of the CircuitBreakerConfig class and applied the configuration. We defined the type and the size of the sliding window: the type is COUNT_BASED and the size is 10.
It is worth understanding how the sliding window works; the documentation linked above explains it.
To define the percentage of slow calls and errors that will open the circuit, we used slowCallRateThreshold and failureRateThreshold, both set to 70%. Two other settings need a closer look: waitDurationInOpenState and permittedNumberOfCallsInHalfOpenState. The wait duration in open state defines how long the circuit stays OPEN before it moves to HALF_OPEN; in our case, 5 seconds. When the circuit is in the HALF_OPEN state, it permits a few requests to the service (API, microservice, etc.) to check whether the error persists or the system has recovered. The number of requests used for this check is set by permittedNumberOfCallsInHalfOpenState; in our case, 3. If at least 70% of these 3 requests fail or take longer than 3 nanoseconds (with only 3 calls that means all of them, since 2 of 3 is about 67%), the circuit changes from HALF_OPEN back to OPEN and waits another 5 seconds before moving to HALF_OPEN again. While it is OPEN, the circuit doesn't permit requests to the service and throws CallNotPermittedException. But if the requests succeed, the circuit changes from HALF_OPEN to CLOSED and the application works normally. Now we need to test.
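The HALF_OPEN arithmetic can be checked against the library itself. The sketch below reuses the thresholds from the configuration above and drives a circuit breaker by hand. The helper names (demoConfig, probe, stateAfterHalfOpen) are hypothetical, minimumNumberOfCalls is set explicitly as an addition so the example does not rely on defaults, and the slow call settings are left at their defaults so that only failures count.

```kotlin
import io.github.resilience4j.circuitbreaker.CircuitBreaker
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig
import java.io.IOException

// Same failure thresholds as in the post; minimumNumberOfCalls is an explicit addition
fun demoConfig(): CircuitBreakerConfig = CircuitBreakerConfig.custom()
    .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
    .slidingWindowSize(10)
    .minimumNumberOfCalls(10)
    .failureRateThreshold(70.0f)
    .permittedNumberOfCallsInHalfOpenState(3)
    .build()

// Hypothetical helper: sends one call through the breaker and ignores its outcome
fun probe(breaker: CircuitBreaker, fail: Boolean) {
    runCatching {
        breaker.executeRunnable { if (fail) throw IOException("simulated failure") }
    }
}

// Puts a fresh breaker into HALF_OPEN, sends the 3 permitted calls,
// and returns the state the breaker ends up in
fun stateAfterHalfOpen(failures: Int): CircuitBreaker.State {
    val breaker = CircuitBreaker.of("half-open-demo", demoConfig())
    breaker.transitionToOpenState()
    breaker.transitionToHalfOpenState()
    repeat(3) { i -> probe(breaker, fail = i < failures) }
    return breaker.state
}

fun main() {
    println(stateAfterHalfOpen(failures = 3)) // 3 of 3 failed: 100% >= 70%
    println(stateAfterHalfOpen(failures = 2)) // 2 of 3 failed: about 67% < 70%
}
```

Under these assumptions the first call should report OPEN and the second CLOSED, because two failures out of three stay below the 70% threshold.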
Let's test
The application where we implemented the circuit breaker is simple. As we use Spring Boot, we have a RestController that calls a service. This service has several methods, and one of them calls an external service via HTTP. To make this request we use Retrofit, an HTTP client that integrates well with resilience4j.
I won't go into much detail about Retrofit; for that, I recommend reading the documentation: https://square.github.io/retrofit/
The method that we need to protect with the circuit breaker is called listCarsByModel:
override fun listCarsByModel(model: String) =
    carHttpService.getByModel(model)
        .execute()
        .body()
        ?.let(CarHttpToModelConverter::toModel)
There are some details about Retrofit worth noting. CarHttpService is an interface that has a getByModel method. We never write an implementation of it ourselves: Retrofit generates one at runtime when Retrofit.create is called, and the execute() method then performs the HTTP request synchronously. This is possible because we have a configuration class for that.
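import okhttp3.OkHttpClient
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory

@Configuration
class CarHttpConfiguration(
    private val circuitBreakerConfiguration: CircuitBreakerConfiguration
) {
    private companion object {
        // Retrofit requires the base URL to end with "/"
        const val BASE_URL = "http://localhost:8081/cars/"
    }

    private fun buildClient() = OkHttpClient.Builder().build()

    private fun buildRetrofit() = Retrofit.Builder()
        .baseUrl(BASE_URL)
        .addConverterFactory(GsonConverterFactory.create())
        .client(buildClient())
        .build()

    // Retrofit generates the implementation of CarHttpService here
    @Bean
    fun carHttpService(): CarHttpService = buildRetrofit().create(CarHttpService::class.java)
}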
In this class we defined a buildRetrofit() method, which is responsible for the Retrofit configuration. We pass the base URL that will be called (you can change it to another one), a converter factory that is responsible for converting the API response into our objects, and an HTTP client. After that, we defined a function called carHttpService, which is a bean. Spring calls this function at startup, Retrofit creates the implementation of the interface, and Spring injects it into the service.
So far we have only talked about Retrofit. What we need to know is how to integrate the resilience4j circuit breaker with Retrofit.
We need to add a new dependency to our pom.xml:
<!-- https://mvnrepository.com/artifact/io.github.resilience4j/resilience4j-retrofit -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-retrofit</artifactId>
    <version>1.7.1</version>
</dependency>
Now we just need to add one line to the buildRetrofit method of our Retrofit configuration.
private fun buildRetrofit() = Retrofit.Builder()
    // new line: routes every call through the circuit breaker
    // (import io.github.resilience4j.retrofit.CircuitBreakerCallAdapter)
    .addCallAdapterFactory(CircuitBreakerCallAdapter.of(circuitBreakerConfiguration.getCircuitBreaker()))
    .baseUrl(BASE_URL)
    .addConverterFactory(GsonConverterFactory.create())
    .client(buildClient())
    .build()
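In this line we registered a call adapter factory created by CircuitBreakerCallAdapter.of, to which we pass our circuit breaker (the instance, not just its configuration). From now on, the calls made through the Retrofit interface go through the circuit breaker. By default, the adapter records exceptions and unsuccessful HTTP responses as failures.
Without the Retrofit integration, we would have more manual work to do, because each call would have to be decorated by hand. This is the advantage of integrations.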
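For comparison, this is roughly what the manual work could look like: a sketch that wraps any call with the circuit breaker's executeSupplier method and returns a fallback when the circuit is open. The withCircuitBreaker helper and the fallback behaviour are assumptions for illustration and are not part of the application described here.

```kotlin
import io.github.resilience4j.circuitbreaker.CallNotPermittedException
import io.github.resilience4j.circuitbreaker.CircuitBreaker
import java.util.function.Supplier

// Hypothetical helper: runs `call` through the breaker and returns `fallback`
// when the breaker is OPEN and rejects the call
fun <T> withCircuitBreaker(breaker: CircuitBreaker, fallback: T, call: () -> T): T =
    try {
        // executeSupplier records the outcome (success, failure, duration) in the breaker
        breaker.executeSupplier(Supplier { call() })
    } catch (e: CallNotPermittedException) {
        fallback
    }

fun main() {
    val breaker = CircuitBreaker.ofDefaults("manual-demo")

    // CLOSED: the call is executed normally
    println(withCircuitBreaker(breaker, emptyList<String>()) { listOf("Civic") })

    // OPEN: the call is rejected and the fallback is returned instead
    breaker.transitionToOpenState()
    println(withCircuitBreaker(breaker, emptyList<String>()) { listOf("Civic") })
}
```

Every method that talks to the external service would need a wrapper like this, which is exactly the repetition the call adapter removes.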
The next step is to start the application and call the endpoint that makes the HTTP request more than 10 times. As we configured slowCallDurationThreshold to 3 nanoseconds, every request takes longer than that, so the circuit should open. Once it is open, the calls are rejected with this exception:
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'circuit-breaker-car-service' is OPEN and does not permit further calls
Conclusion
As we can see, the circuit breaker improves the resilience of our services. It also helps the external system to recover, because our system stops sending requests to it while the circuit is open. I hope you liked the post. If you have questions, suggestions, or criticism, leave a comment or reach out to me on social media.