Kshitij (kd)

Posted on Oct 9, 2023

Resilient Systems using Go: Circuit Breaker

#go #distributedsystems #resilientsystems #systemdesign

Introduction

In the previous post, we talked about the retry mechanism and the options that can be bundled together in a package. In this post, we look at the circuit breaker, another important mechanism for keeping the whole system from going down when an external service goes down.

Let's take the example of a social media application like Twitter (or X) that loads its website synchronously, with all the important features such as recommended tweets, user recommendations, and trending hashtags.

It's the football World Cup, and England is playing Portugal. People stuck in the office are checking the hashtag #EngVsPor to see live reactions from others, and this overloads the hashtag component of our system. The hashtag service now takes 7 seconds to return data instead of the usual 20 ms. Because the call is synchronous, every page reload takes at least 7 seconds.
On top of that, a large number of concurrent requests are now stuck on our server waiting for the hashtag service to respond, which ultimately leads to an outage.

In this case, it would be better to fail all these requests early. One way to do that is to reduce the timeout for these requests, but choosing a value is difficult: set it too low and requests fail even when the service is healthy, and while the system is under load, all the requests will still time out, each one only after waiting for the full timeout.
So a timeout alone is not an efficient way to handle this problem, and this is where the circuit breaker comes in.

Circuit Breaker

Instead, we can route requests to the hashtag service through a circuit breaker. If the number of errors reaches a specified threshold, the circuit breaker stops sending requests to the hashtag service. At that point, the circuit is open.

But how do we know when the service can start taking requests again? We add another state to the circuit breaker: the half-open state. After a specified duration, the circuit breaker moves to the half-open state and lets a few requests through to the service. If even a single one of them returns an error, the circuit opens again, and the cycle repeats.

If, in the half-open state, a sufficient number of requests succeed without errors, we can close the circuit again and resume the normal flow.
However, the flow of requests is not controlled by the circuit breaker package itself, so we also need a way to inform the system of the current state of the circuit breaker.
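The transitions described above can be sketched as a small state machine. The `State` constants mirror the names used in the implementation later in this post; the `Event` type, its constants, and the `next` function are hypothetical and only show which moves are allowed.

```go
package main

import "fmt"

// State mirrors the three circuit breaker states.
type State int

const (
	Closed State = iota
	Open
	Half
)

func (s State) String() string {
	switch s {
	case Closed:
		return "closed"
	case Open:
		return "open"
	case Half:
		return "half-open"
	}
	return "unknown"
}

// Event is a hypothetical name for what can happen to the breaker.
type Event int

const (
	ErrorThresholdReached Event = iota // too many errors while closed
	CooldownElapsed                    // the open duration has passed
	RequestFailed                      // a trial request failed while half-open
	EnoughSuccesses                    // enough trial requests succeeded while half-open
)

// next returns the state the breaker moves to when ev happens in state s.
// Events that do not apply to the current state leave it unchanged.
func next(s State, ev Event) State {
	switch {
	case s == Closed && ev == ErrorThresholdReached:
		return Open
	case s == Open && ev == CooldownElapsed:
		return Half
	case s == Half && ev == RequestFailed:
		return Open
	case s == Half && ev == EnoughSuccesses:
		return Closed
	}
	return s
}

func main() {
	s := Closed
	for _, ev := range []Event{ErrorThresholdReached, CooldownElapsed, RequestFailed, CooldownElapsed, EnoughSuccesses} {
		s = next(s, ev)
		fmt.Println(s) // prints open, half-open, open, half-open, closed (one per line)
	}
}
```

In the package itself, these transitions are driven by error counters and a timer rather than by explicit events.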

Design

So our CircuitBreaker structure must include:

state: the current state (open, half-open, or closed).
threshold: when the number of errors reaches this threshold, the state changes to open.
error count: the number of errors seen while the circuit is closed.
duration: the time after which the state changes from open to half-open.
good requests: the total number of successful requests in the half-open state.
halfStateThreshold: once the number of good requests reaches this threshold, the state changes to closed, and the full flow of requests resumes. A good idea is to express it as a percentage of the threshold above.
NotifyFunc: a function that is called whenever the state changes.
StateMutex: the state and counters are read and written by concurrent requests, so we guard them with a mutex to avoid data races.

We will also have a separate structure that is used as the input for creating our circuit breaker. The image below shows what the structures look like.

Implementation

Execution is similar to what we did in the previous post about the retry mechanism: the caller wraps the call in a closure and passes it to Execute.

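// cb is the circuit breaker object; m.Func() stands for the caller's own
// operation, such as the request to the hashtag service.
res, err := cb.Execute(context.Background(), func() (interface{}, error) {
    l := m.Func()
    if l == "" {
        // A returned error is recorded by the circuit breaker.
        return "", errors.New("empty response")
    }
    return "ok", nil
})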

Execute runs the closure when the circuit breaker is in the closed or half-open state, and returns ErrCircuitOpen without running it when the circuit is open.

// Execute runs the user-defined function through the circuit breaker.
// It returns ErrCircuitOpen without calling fn while the circuit is open.
// ctx is not used by this implementation.
func (cb *CircuitBreaker) Execute(ctx context.Context, fn Action) (interface{}, error) {
    // Read the current state under the lock.
    cb.sLock.Lock()
    state := cb.state
    cb.sLock.Unlock()

    switch state {
    case Open:
        return nil, ErrCircuitOpen
    case Half:
        return cb.runInHalfState(fn)
    default: // Closed
        return cb.run(fn)
    }
}

The switch from a closed to an open circuit happens when the number of errors reaches the threshold. Once the state is set to Open, we wait for the specified duration and then change the state to Half.

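// run executes fn in the closed state and opens the circuit once the
// number of errors reaches the threshold.
func (cb *CircuitBreaker) run(fn Action) (interface{}, error) {
    res, err := fn()
    if err == nil {
        return res, nil
    }

    // The counter is shared by concurrent requests, so update it under the
    // lock. Only the request that moves the breaker out of Closed opens it.
    cb.sLock.Lock()
    cb.count++
    shouldOpen := cb.state == Closed && cb.count >= cb.threshold
    if shouldOpen {
        cb.state = Open // claim the transition before releasing the lock
    }
    cb.sLock.Unlock()

    if shouldOpen {
        cb.openCircuit()
    }
    return res, err
}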

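// openCircuit moves the breaker to Open and starts the cool-down.
func (cb *CircuitBreaker) openCircuit() {
    cb.setState(Open)
    go cb.halfCircuit()
}

// halfCircuit waits for the configured duration, then moves the breaker to Half.
func (cb *CircuitBreaker) halfCircuit() {
    // Sleep for the specified duration.
    time.Sleep(cb.duration)

    cb.setState(Half)
}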

Now, for execution in the half state, we keep counting the good requests, and once the count reaches a certain percentage of the threshold, we close the circuit.
If even a single error comes up, we open the circuit again.

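// runInHalfState executes fn while the circuit is half-open. A single error
// reopens the circuit; enough successes close it.
func (cb *CircuitBreaker) runInHalfState(fn Action) (interface{}, error) {
    res, err := fn()

    cb.sLock.Lock()
    if cb.state != Half {
        // Another request already moved the breaker out of Half.
        cb.sLock.Unlock()
        return res, err
    }
    if err != nil {
        cb.state = Open // claim the transition before releasing the lock
        cb.sLock.Unlock()
        cb.openCircuit()
        return res, err
    }
    cb.goodReqs++
    // hsThreshold is a percentage of threshold.
    shouldClose := cb.goodReqs >= (cb.hsThreshold*cb.threshold)/100
    if shouldClose {
        cb.state = Closed
    }
    cb.sLock.Unlock()

    if shouldClose {
        cb.closeCircuit()
    }
    return res, err
}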
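The closing condition `(cb.hsThreshold*cb.threshold)/100` uses integer division, so it is worth checking what it yields for a few settings. This is a small worked example; `requiredGoodRequests` mirrors the expression in the code, while `requiredGoodRequestsRoundedUp` is a hypothetical variant (valid for non-negative inputs) that rounds up instead.

```go
package main

import "fmt"

// requiredGoodRequests mirrors the check in runInHalfState: the circuit
// closes once goodReqs >= (hsThreshold*threshold)/100. Go's integer division
// truncates, so the percentage is effectively rounded down.
func requiredGoodRequests(threshold, hsThreshold int) int {
	return (hsThreshold * threshold) / 100
}

// requiredGoodRequestsRoundedUp is a hypothetical variant that rounds up,
// so a fractional requirement is never weakened.
func requiredGoodRequestsRoundedUp(threshold, hsThreshold int) int {
	return (hsThreshold*threshold + 99) / 100
}

func main() {
	fmt.Println(requiredGoodRequests(10, 50), requiredGoodRequestsRoundedUp(10, 50)) // 5 5
	fmt.Println(requiredGoodRequests(5, 50), requiredGoodRequestsRoundedUp(5, 50))   // 2 3
	fmt.Println(requiredGoodRequests(3, 30), requiredGoodRequestsRoundedUp(3, 30))   // 0 1
}
```

Because `runInHalfState` increments `goodReqs` before comparing, a computed requirement of 0 still closes the circuit after the first successful trial request.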

Whenever we set the state, we also need to notify the system about the change.

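// setState switches the breaker to st, resets both counters, and notifies
// the user-defined function.
func (cb *CircuitBreaker) setState(st State) {
    cb.sLock.Lock()
    cb.goodReqs = 0
    cb.count = 0
    cb.state = st
    cb.sLock.Unlock()

    // Notify the user-defined function without blocking the caller.
    if cb.notifyFunc != nil {
        go cb.notifyFunc(stateMapping[st])
    }
}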
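On the receiving side, the notify function is how the rest of the system learns the breaker's state. This sketch assumes the callback receives the state name produced by `stateMapping` as a `string`; `stateWatcher` and its methods are hypothetical. Because `setState` calls the callback in a new goroutine, the receiver must be safe for concurrent use, which `atomic.Value` provides here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stateWatcher is a hypothetical consumer of the notify callback. It keeps
// the most recently reported state name, for example so a page handler can
// skip the trending hashtags widget while the circuit is open.
type stateWatcher struct {
	latest atomic.Value // always holds a string
}

// Notify has the assumed shape of the callback: func(string).
func (w *stateWatcher) Notify(state string) {
	w.latest.Store(state)
}

// Current returns the last reported state, or "unknown" before any call.
func (w *stateWatcher) Current() string {
	if s, ok := w.latest.Load().(string); ok {
		return s
	}
	return "unknown"
}

func main() {
	w := &stateWatcher{}
	fmt.Println(w.Current()) // unknown
	w.Notify("open")
	fmt.Println(w.Current()) // open
}
```

The method value `w.Notify` can be passed wherever the breaker expects its notify function. One caveat of calling it with `go`: two state changes in quick succession start two goroutines, and nothing guarantees they run in order, so a watcher like this can briefly report a stale state.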

And that's it! The circuit breaker package is ready to use.
The code, along with test cases, can be found here.
