DEV Community

Juan Vega

A golang library to retry operations with exponential backoff

I have created go-again, a small library to retry operations using different timing algorithms. By default, it uses exponential backoff with jitter to vary the delay before each retry.

jdvr / go-again

A set of utility algorithms to retry operations, again and again.

Go Again


A simple and configurable retry library for Go, with exponential backoff and constant delay support out of the box. Inspired by backoff.

Features

  • Configurable delay calculation algorithm
  • Support for exponential backoff and constant delay out of the box
  • Support for generics
  • Simple and clean interface

There are two main concepts:

  • Retry: given an operation and a ticks calculator, keeps retrying until either a permanent error or a timeout occurs
  • TicksCalculator: provides the delay the retryer waits between retries
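
To make the two concepts concrete, here is a minimal, self-contained sketch of how such a retry loop could work. It is not go-again's implementation: `retry`, `delayFunc`, `permanentError`, and `flakyOperation` are hypothetical names invented for this illustration, and the 10ms constant delay and one-second timeout are arbitrary values.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// permanentError marks an error that must not be retried.
// Hypothetical type for this sketch, not the library's own.
type permanentError struct{ err error }

func (p permanentError) Error() string { return p.err.Error() }
func (p permanentError) Unwrap() error { return p.err }

// delayFunc returns how long to wait before the given retry (0-based).
type delayFunc func(retry int) time.Duration

// retry runs op until it succeeds, returns a permanentError, or ctx ends.
func retry[T any](ctx context.Context, op func(context.Context) (T, error), delay delayFunc) (T, error) {
	var zero T
	for attempt := 0; ; attempt++ {
		value, err := op(ctx)
		if err == nil {
			return value, nil
		}

		// A permanent error stops the loop immediately.
		var perm permanentError
		if errors.As(err, &perm) {
			return zero, perm.err
		}

		// Wait for the next tick, or give up if the context ends first.
		select {
		case <-ctx.Done():
			return zero, ctx.Err()
		case <-time.After(delay(attempt)):
		}
	}
}

// flakyOperation returns an operation that fails `failures` times, then succeeds.
func flakyOperation(failures int) func(context.Context) (string, error) {
	calls := 0
	return func(context.Context) (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("temporary failure")
		}
		return fmt.Sprintf("succeeded on call %d", calls), nil
	}
}

// runDemo retries an operation that fails twice, using a constant delay.
func runDemo() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	constant := func(int) time.Duration { return 10 * time.Millisecond }
	return retry(ctx, flakyOperation(2), constant)
}

func main() {
	fmt.Println(runDemo())
}
```

The delay calculation is passed in as a function, so the loop itself does not care whether the delays are constant, exponential, or something else.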

Examples:

Call an API using exponential backoff


Retry any function using exponential backoff:



package main

import (
    "context"
    "errors"
    "fmt"
    "net/http"

    "github.com/jdvr/go-again"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    apiResponse, err := again.Retry[*http.Response](ctx, func(ctx context.Context) (*http.Response, error) {
        fmt.Println("Running Operation")

        resp, err := http.DefaultClient.Get("https://sameflaky.api/path")
        if err != nil {
            // operation will be retried
            return nil, err
        }

        if resp.StatusCode == http.StatusForbidden {
            // no more retries; close the body because the response is discarded
            resp.Body.Close()
            return nil, again.NewPermanentError(errors.New("no retry, permanent error"))
        }

        if resp.StatusCode > 400 {
            // operation will be retried; close the body because the response is discarded
            resp.Body.Close()
            return nil, errors.New("this will be retried")
        }

        // do whatever you need with a valid response ...

        return resp, nil // no retry
    })
    if err != nil {
        panic(err)
    }

    fmt.Printf("Finished with response %v\n", apiResponse)
}




Exponential backoff and jitter

Quoting Wikipedia about Exponential Backoff:

Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process to gradually find an acceptable rate.

The idea is to wait longer before each successive retry, on the assumption that the failure is temporary and the system will recover given enough time.
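
As a rough illustration of how the waits grow, the sketch below multiplies the delay by a fixed factor on every retry and caps it at a maximum. The starting delay, the multiplier, and the cap are assumptions chosen for the example, not go-again's defaults, and `exponentialDelay` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

const (
	initialDelay = 500 * time.Millisecond // assumed starting delay
	multiplier   = 2                      // assumed growth factor
	maxDelay     = 10 * time.Second       // assumed upper bound
)

// exponentialDelay returns the wait before the given retry (0-based):
// initialDelay * multiplier^retry, capped at maxDelay.
func exponentialDelay(retry int) time.Duration {
	delay := initialDelay
	for i := 0; i < retry; i++ {
		delay *= multiplier
		if delay >= maxDelay {
			// Returning early also prevents overflow for large retry counts.
			return maxDelay
		}
	}
	return delay
}

func main() {
	for retry := 0; retry < 7; retry++ {
		fmt.Printf("retry %d: wait %v\n", retry, exponentialDelay(retry))
	}
}
```

With these values the waits are 500ms, 1s, 2s, 4s, and 8s, and every retry after that waits the 10s cap.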

Suppose you have an API that returns a user profile and sometimes fails. You can wrap the request in a retry function that waits longer before each new attempt. Now imagine that the API crashed because it was overloaded.

Backoff alone won't help. Each request is delayed, but every client follows the same algorithm with the same delays, so the retries arrive in synchronized bursts and the overload gets worse.

Jitter solves the problem of identical delays by adding a small random offset to each client's wait. Instead of every client waiting exactly 500ms, each one picks a random delay between 450ms and 550ms, which spreads the load on the server more evenly.
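
The 450 to 550 range can be reproduced with a few lines of Go. This is a sketch of one common way to apply jitter, not necessarily the way go-again does it; `withJitter`, `jitterBase`, and `jitterPercent` are hypothetical names, and the 10% spread is simply what turns 500ms into that range.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	jitterBase    = 500 * time.Millisecond // delay every client would otherwise use
	jitterPercent = 10                     // assumed spread: plus or minus 10%
)

// withJitter returns a random delay in [base - spread, base + spread],
// where spread is percent% of base.
func withJitter(base time.Duration, percent int64) time.Duration {
	spread := int64(base) * percent / 100
	if spread <= 0 {
		return base
	}
	// rand.Int63n(n) returns a value in [0, n), so the +1 makes the
	// upper bound inclusive. Subtracting spread centers the offset on zero.
	offset := rand.Int63n(2*spread+1) - spread
	return base + time.Duration(offset)
}

func main() {
	// Each simulated client ends up with a slightly different delay.
	for client := 1; client <= 5; client++ {
		fmt.Printf("client %d waits %v\n", client, withJitter(jitterBase, jitterPercent))
	}
}
```

Since Go 1.20 the top-level `math/rand` functions are seeded randomly at startup, so each run prints different delays.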

Go Again

I created this library for fun, but it is production ready, and of course, I will use it as soon as I have the opportunity. It is inspired by backoff, the library I am using now.

go-again goes beyond exponential backoff and offers another algorithm for a constant delay in case you want to keep it simple. Furthermore, you can define your own algorithm easily by implementing an interface.
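
The library's actual interface is not shown in this post, so the sketch below uses a hypothetical `delayCalculator` interface with a single `Next` method to show the general shape of the idea. `constantDelay`, `linearDelay`, `firstDelays`, and `demoStep` are likewise invented for the example; check the repository for the real method names before implementing your own algorithm.

```go
package main

import (
	"fmt"
	"time"
)

// delayCalculator is a hypothetical stand-in for the library's calculator
// interface; the real method set may differ.
type delayCalculator interface {
	Next() time.Duration
}

// constantDelay always waits the same amount of time.
type constantDelay struct{ delay time.Duration }

func (c constantDelay) Next() time.Duration { return c.delay }

// linearDelay is a custom algorithm: it grows by a fixed step on every call.
type linearDelay struct {
	step  time.Duration
	calls int
}

func (l *linearDelay) Next() time.Duration {
	l.calls++
	return time.Duration(l.calls) * l.step
}

// demoStep is an arbitrary step size used by the example.
const demoStep = 100 * time.Millisecond

// firstDelays collects the first n delays a calculator produces.
func firstDelays(calc delayCalculator, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		delays = append(delays, calc.Next())
	}
	return delays
}

func main() {
	fmt.Println(firstDelays(constantDelay{delay: demoStep}, 3))
	fmt.Println(firstDelays(&linearDelay{step: demoStep}, 3))
}
```

Because the retry loop only depends on the interface, swapping one algorithm for another does not require touching the code that performs the operation.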

Top comments (1)

Manuel Artero Anguita 🟨

Sounds very cool Juan!