DEV Community

Caio Campos Borges Rosa for Woovi

How to effectively use retry policies with BullJs/BullMQ

Jobs play a pivotal role in most distributed systems today. They let us gain scalability at the cost of time: it may take longer to work through every task in the queue, but each one will be processed eventually. Unless it fails.

In distributed systems, it's generally a good practice to treat events as disposable and immutable. Unless you have a specific use case that necessitates storing events, this should be your default approach.

How to deal with failed jobs

When a job fails, there are several ways to address the situation. Reprocessing is often the first option that comes to mind, but it typically involves recreating the event, requiring user input, or asking the user to repeat an action. This can be detrimental to the system and user experience since unnecessary repetition is generally undesirable.

What is a retry policy?

Another approach to handling failing jobs that require reprocessing is through the use of retry policies. A retry policy comprises a set of rules that automate job reprocessing. Typically, these rules define the time intervals between each retry and the maximum number of retry attempts.
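To make the idea concrete, here is a minimal sketch of what a retry policy automates, written in plain TypeScript with no queue library involved. The names `RetryPolicy`, `sleep` and `runWithRetry` are hypothetical and exist only for illustration; BullMQ applies the same idea for you through job options.

```typescript
// Hypothetical types and helper for illustration only; not part of BullMQ.
interface RetryPolicy {
  // Total number of tries, including the first one.
  maxAttempts: number;
  // Milliseconds to wait after the given failed attempt (1-based).
  delayFor: (attempt: number) => number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWithRetry<T>(
  task: () => Promise<T>,
  policy: RetryPolicy,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is still allowed.
      if (attempt < policy.maxAttempts) {
        await sleep(policy.delayFor(attempt));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

The two rules mentioned above map directly onto the two fields: `maxAttempts` caps the number of tries, and `delayFor` decides how long to wait between them.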

How to configure it in BullMQ/BullJS?

In BullMQ/BullJS, you configure the retry policy through the job options object, passed as the last argument when adding a job to a queue.

await queue.add(
  'test-retry',
  { foo: 'bar' },
  {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000,
    },
  },
);

The 'attempts' option sets the total number of times a job will be tried, counting the first run. With 'attempts: 3', the job runs once and is retried up to two more times if it keeps failing. The 'backoff' option controls how long to wait between those attempts and is described using 'type' and 'delay'.

The 'type' selects the strategy used to space out the retries, while 'delay' is the base wait time in milliseconds.

Fixed

When the backoff type is set to 'fixed', the retries are evenly spaced by the specified delay. For instance, with the options shown below, Bull will try the job up to 7 times, waiting 10 seconds before each retry.

{
  attempts: 7,
  backoff: {
    type: 'fixed',
    delay: 10000,
  },
},

Exponential

When the backoff type is set to 'exponential', the wait time grows exponentially from the specified delay. The BullMQ documentation gives the wait as 2^(attempts - 1) × delay. For example, with the options shown below, BullMQ will try the job up to 7 times, waiting 10 seconds after the first failure, 20 seconds after the second, 40 seconds after the third, and so on, doubling each time.

{
  attempts: 7,
  backoff: {
    type: 'exponential',
    delay: 10000,
  },
},
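The two built-in strategies are easier to compare once the wait times are written out. The sketch below is plain TypeScript and does not call BullMQ; `backoffDelay` and `retrySchedule` are hypothetical helpers, and the exponential branch assumes the formula from the BullMQ documentation, 2^(attempts - 1) × delay. Classic Bull may compute the value slightly differently, so treat the numbers as an approximation there.

```typescript
type BackoffType = 'fixed' | 'exponential';

// Delay in ms before the retry that follows failed attempt `attemptsMade` (1-based).
// Hypothetical helper; the exponential branch mirrors the documented BullMQ formula.
function backoffDelay(
  type: BackoffType,
  delay: number,
  attemptsMade: number,
): number {
  if (type === 'fixed') {
    return delay;
  }
  return Math.round(Math.pow(2, attemptsMade - 1) * delay);
}

// With `attempts` total tries there are at most `attempts - 1` waits.
function retrySchedule(
  type: BackoffType,
  delay: number,
  attempts: number,
): number[] {
  const waits: number[] = [];
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    waits.push(backoffDelay(type, delay, attemptsMade));
  }
  return waits;
}

console.log(retrySchedule('fixed', 10000, 7));
// [10000, 10000, 10000, 10000, 10000, 10000]
console.log(retrySchedule('exponential', 10000, 7));
// [10000, 20000, 40000, 80000, 160000, 320000]
```

Under these assumptions, a job that fails all 7 attempts spends 60 seconds waiting with the fixed strategy and 630 seconds with the exponential one, which is worth keeping in mind when choosing 'attempts' and 'delay'.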

If all you need is automatic error handling and retries, this is enough. But what if you want to control when a job fails?

How to explicitly fail a job?

The retry policy only applies to jobs that are marked as failed. A job is considered failed when it reaches its maximum number of stalls (refer to the documentation) or when an error is thrown while a worker is processing it.

You can create custom error classes and explicitly instantiate them based on logic your code controls.

For example, if I make a request and the status code is anything other than 200:

My error class:

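export class WorkerRetryError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'WorkerRetryError';
  }
}

// Type guard: narrows `err` to WorkerRetryError when it returns true.
export const isWorkerRetryError = (err: unknown): err is WorkerRetryError =>
  err instanceof WorkerRetryError;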

Using an error class is important here: it lets you narrow with type guards at runtime and tell the custom errors you throw apart from unexpected ones.
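As a sketch of that narrowing, the snippet below separates expected, retryable failures from unexpected ones inside a `catch` block. The class is repeated so the example stands alone, and `describeFailure` is a hypothetical helper, not something BullMQ provides.

```typescript
class WorkerRetryError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'WorkerRetryError';
    // Keeps `instanceof` working when compiling to ES5 targets.
    Object.setPrototypeOf(this, WorkerRetryError.prototype);
  }
}

// Type guard: inside a true branch, `err` is typed as WorkerRetryError.
const isWorkerRetryError = (err: unknown): err is WorkerRetryError =>
  err instanceof WorkerRetryError;

// Hypothetical helper that reacts differently to each kind of error.
function describeFailure(err: unknown): string {
  if (isWorkerRetryError(err)) {
    return `expected failure, will be retried: ${err.message}`;
  }
  return 'unexpected failure, needs investigation';
}

try {
  throw new WorkerRetryError('Webhook failed with status code 503');
} catch (err) {
  // In TypeScript a caught value is untyped, so the guard does the narrowing.
  console.log(describeFailure(err));
}
```

Checking against a class rather than matching on message strings keeps the distinction stable even when the wording of the error changes.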

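  // `result` holds the outcome of the request made earlier in the job handler.
  const getHasError = () => {
    if (result?.error) {
      return true;
    }
    // Any status code other than 200 counts as a failure.
    if (result?.response?.status !== 200) {
      return true;
    }

    return !result?.response?.ok;
  };

  const hasError = getHasError();

  if (hasError) {
    // Throwing marks the job as failed, so the retry policy applies.
    throw new WorkerRetryError(
      `Webhook failed with status code ${result?.response?.status}`,
    );
  }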
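Putting the pieces together, a job handler might look like the sketch below. It is written as a plain async function so it runs without Redis or a network; `WebhookJobData`, `SendWebhook` and `processWebhook` are hypothetical names, and the HTTP call is passed in as a parameter rather than hard-coded. In a real setup, a function of this shape would be called from the processor given to a BullMQ `Worker` with the job's data, where a thrown error marks the job as failed and lets the retry policy take over.

```typescript
class WorkerRetryError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'WorkerRetryError';
  }
}

// Hypothetical job payload for illustration.
interface WebhookJobData {
  url: string;
  payload: unknown;
}

// The HTTP call is injected so the handler can be exercised without a network.
type SendWebhook = (data: WebhookJobData) => Promise<{ status: number }>;

async function processWebhook(
  data: WebhookJobData,
  send: SendWebhook,
): Promise<void> {
  const { status } = await send(data);
  if (status !== 200) {
    // Throwing is what signals the failure to the queue.
    throw new WorkerRetryError(`Webhook failed with status code ${status}`);
  }
}
```

Keeping the failure decision in one small function makes it easy to check which responses trigger a retry before wiring it to a queue.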

Photo by Mark König on Unsplash