
Rate limiting API calls - sometimes a Bottleneck is a good thing

Ross Coundon ・ 3 min read

What is Bottleneck and why do I need it in my coding life?

If you've spent any time working with 3rd party APIs, you'll have come up against an issue where you make a ton of calls to an API and it doesn't finish giving you what you want. You might get a helpful error like 429 - Too Many Requests, or something less helpful like ECONNRESET.

Either way, what's happening is that, as a consumer of that API, you're only allowed to make so many requests in a certain period of time, or the number of concurrent requests you can make is restricted.

In JavaScript your code might look something like this:


const axios = require('axios');

async function getMyData(data){
  const axiosConfig = {
    url: 'https://really.important/api',
    method: 'post',
    data
  }
  return axios(axiosConfig)
}


async function getAllResults(){

  const sourceIds = []

  // Just some code to let us create a big dataset
  const count = 1000000;
  for(let i = 0; i < count; i++){
    sourceIds.push({
      id: i
    });
  }

  // Map over all the results and call our pretend API, stashing the promises in a new array
  const allThePromises = sourceIds.map(item => {
    return getMyData(item);
  })

  try{
    const results = await Promise.all(allThePromises);
    console.log(results);
  }
  catch(err){
    console.log(err);
  }

}

What's going to happen here is the code will call the API 1,000,000 times, as fast as possible, and all the requests will take place in a very short space of time (on my MacBook Pro it's < 700ms).

Understandably, some API owners might be a little upset by this as it's creating a heavy load.

What do we need to do?

We need to be able to limit the number of requests we're making, potentially both in terms of the number of API calls in a space of time and in terms of the number of concurrent requests.

I'd encourage you to attempt to roll your own solution as a learning exercise. For example, there is a reasonably simple solution that can get you out of a hole using setInterval. What I think you'll find is that building a reliable solution that limits rate and concurrency is actually trickier than it looks and requires you to build and manage queues. It's even more complicated if you're clustering.
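To give a taste of why rolling your own gets fiddly, here's a minimal setInterval-based sketch (the names here are illustrative, not part of any library): it queues pending calls and drains one per tick. Notice how much it doesn't handle - concurrency limits, work that arrives after the queue drains, clustering:

```javascript
// A minimal hand-rolled rate limiter: queue tasks, drain one per tick.
// This is a sketch, not a production replacement for Bottleneck.
function createThrottler(intervalMs) {
  const queue = [];
  const timer = setInterval(() => {
    const job = queue.shift();
    if (job) {
      job.fn().then(job.resolve, job.reject);
    } else {
      // Stop when empty - a real implementation would restart the
      // timer when new work arrives, rather than going dead here.
      clearInterval(timer);
    }
  }, intervalMs);

  // Wrap an async function so calls are spaced at least intervalMs apart
  return fn => (...args) =>
    new Promise((resolve, reject) => {
      queue.push({ fn: () => fn(...args), resolve, reject });
    });
}
```

Even this toy version needs a queue and careful promise plumbing, and it still drops work queued after the timer stops.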

We can instead turn to a gem of a package on NPM - Bottleneck
https://www.npmjs.com/package/bottleneck

The author describes this as:

Bottleneck is a lightweight and zero-dependency Task Scheduler and Rate Limiter for Node.js and the browser.

What you do is create a 'limiter' and use it to wrap the function you want to rate limit. You then simply call the limited version instead.

Our code from earlier becomes:


const axios = require('axios');
const Bottleneck = require('bottleneck');

const limiter = new Bottleneck({
  minTime: 200
});

async function getMyData(data){
  const axiosConfig = {
    url: 'https://really.important/api',
    method: 'post',
    data
  }
  return axios(axiosConfig)
}

const throttledGetMyData = limiter.wrap(getMyData);

async function getAllResults(){

  const sourceIds = []

  // Just some code to let us create a big dataset
  const count = 1000000;
  for(let i = 0; i < count; i++){
    sourceIds.push({
      id: i
    });
  }

  // Map over all the results and call our pretend API, stashing the promises in a new array
  const allThePromises = sourceIds.map(item => {
    return throttledGetMyData(item);
  })


  try{
    const results = await Promise.all(allThePromises);
    console.log(results);
  }
  catch(err){
    console.log(err);
  }

}

getAllResults()

As you can see, we've created a limiter with a minTime property. This defines the minimum number of milliseconds that must elapse between requests. We have 200, so we'll make at most 5 requests per second.

We then wrap our function using the limiter and call the wrapped version instead:


const throttledGetMyData = limiter.wrap(getMyData);
...
  const allThePromises = sourceIds.map(item => {
    return throttledGetMyData(item);
  })

If there's a chance your requests will take longer than the minTime, you're also easily able to limit the number of concurrent requests by setting up the limiter like this:

const limiter = new Bottleneck({
  minTime: 200,
  maxConcurrent: 1,
});

Here we'll ensure that there is only one request submitted at a time.

What else can it do?

There are many options for setting up Bottleneck'ed functions. You can rate limit over a period of time using the reservoir options - e.g. send a maximum of 100 requests every 60 seconds. Or, send an initial batch of requests and then subsequent batches every x seconds.
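The 100-requests-per-60-seconds case might be set up roughly like this (option names as documented on the Bottleneck NPM page; the values here are just for illustration):

```javascript
const Bottleneck = require('bottleneck');

// Allow at most 100 requests per 60-second window
const limiter = new Bottleneck({
  reservoir: 100,                      // initial number of jobs allowed
  reservoirRefreshAmount: 100,         // reset the count back to 100...
  reservoirRefreshInterval: 60 * 1000, // ...every 60s (must be a multiple of 250ms)
});
```

When the reservoir hits zero, further jobs queue up until the next refresh tops it back up.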

The documentation over at NPM is excellent so I advise you to read it to get a full appreciation of the power of this package, and also the gotchas for when things don't behave as you expect.

Wrapping up

If you're ever in need of a highly flexible package for rate limiting your calls to an API, Bottleneck is your friend.

Discussion

I tried the above suggestion to make multiple requests to a remote third-party API, to solve the "socket hang up" / ECONNRESET issue in Express.js (Node.js) with multiple requests, but I'm still getting the error. I would appreciate your help. Thanks

 

That means the other end of the connection closed it for some reason. Can you share your code?

 

This is the code I use to simulate the bulk operation

export async function bulkParserSimulator(arrayOfIds) {
  // Fetch CV urls
  // logg.info('Fetching uploaded resume arrays')
  let cvUrl = await getUploadedCVUrl(arrayOfIds);
  let parsedResult = [];
  let save;
  let start = +new Date();
  let parserError;

  // sequential operation
  for (let index = 0, len = cvUrl.length; index < len; index++) {
    const file = cvUrl[index].fileurl;
    const filename = cvUrl[index].filename;
    const jobpositionIdFromArray = cvUrl[index].jobPositionId;
    // Check if cvUrl exist else go to next loop
    if (!file && !filename) continue;

    let singleResult = await throttleApiCall(filename, file);
    console.log({ singleResult });
    const { data } = singleResult;
    if (data.Error) {
      parserError = true;
    }
    if (data.Results) {
      let newparsed = new ParsersCollection(data);
      newparsed.jobPositionId = jobpositionIdFromArray;
      save = await newparsed.save();
      parsedResult.push(save);

      let end = +new Date();
      console.log({ TimeToFinish: end - start + 'ms' });
    }
  }
  // You can return this result then
  let final = { parsedResult, parserError };
  logger.info('Returning the parsed results');
  return final;
}

Then this is the one for bottleneck

import Bottleneck from 'bottleneck';
import { initializeParsingProcess } from './hire';
// Never more than 1 request running at a time.
// Wait at least 333ms between each request (~3 requests per second).
const limiter = new Bottleneck({
  maxConcurrent: 1,
  minTime: 333,
});

export const throttleApiCall = limiter.wrap(initializeParsingProcess);

How you're wrapping the function looks fine. If you call initializeParsingProcess() directly, what happens?

I get the error below when I call it directly. I used Bottleneck to see whether that would solve it after reading your amazing post here, but I'm still getting the same error below

Error: socket hang up
          at createHangUpError (_http_client.js:323:15)
          at TLSSocket.socketOnEnd (_http_client.js:426:23)
          at TLSSocket.emit (events.js:194:15)
          at TLSSocket.EventEmitter.emit (domain.js:441:20)
          at endReadableNT (_stream_readable.js:1125:12)
          at process._tickCallback (internal/process/next_tick.js:63:19)
        code: 'ECONNRESET',

Ah, I see, I thought you were suggesting the problem was with your usage of Bottleneck. Can you share the code that you use to call the API?

I used Axios to make post request to a remote server. This is the code below

import axios from 'axios';
import Agent from 'agentkeepalive';
import http from 'http';
import https from 'https';

// keepAlive pools and reuses TCP connections, so it's faster
const keepAliveAgent = new Agent({
  maxSockets: 100, // Maximum number of sockets to allow per host. Defaults to Infinity.
  maxFreeSockets: 10,
  timeout: 60000, // Active socket keepalive for 60 seconds
  freeSocketTimeout: 60000, // Maximum time to leave free sockets open. Only relevant if keepAlive is set to true.
  socketActiveTTL: 1000 * 60 * 10,
});

export const axiosInstance = axios.create({ httpAgent: keepAliveAgent });

After that, I import into the file below to make the call

import { axiosInstance} from './axiosInstance';

 let responseBody = await axiosInstance.post(ROOT_URI, prepareFormData(filename, file), {
      headers: form.getHeaders(),
    });

Can you try with a raw axios instance, i.e. without the KeepAliveAgent?

I have done that. It was after some research that I added the KeepAliveAgent to see whether it would solve the problem, but it didn't.

How about making a single, non-bottlenecked call to the API? Does that work?

Making a single call even with bottleneck works perfectly

Then I'm guessing there's something weird in the way you're building the URLs or making the requests when there are multiple API calls.

To clean things up, if I were you I'd change the code to use map() on the cvUrl array, returning a promise for each call.
Then await Promise.all() on the result of that map, then do your parsing.

Put console.logs in each iteration to determine exactly what you're sending and wrap in try/catch to see if you can find any more information about what's actually going wrong with the connection.
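The suggested pattern might look like this sketch, where fetchOne is a stand-in for the throttled API call from your code, and the try/catch per call surfaces which request actually fails:

```javascript
// map() over the items, one promise per API call, with a try/catch
// inside each so a single failed request doesn't hide the others.
async function fetchAll(items, fetchOne) {
  return Promise.all(
    items.map(async item => {
      try {
        const data = await fetchOne(item);
        return { item, data };
      } catch (err) {
        console.log('request failed for', item, err.message);
        return { item, error: err };
      }
    }),
  );
}
```

You'd then inspect the returned array for entries with an error property instead of having the whole Promise.all reject.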

This is where I tried it with Promise.all but got the same error. Does the code below look like what you suggested above?

const toReadInParalel = async arrayOfIds => {
  let cvUrl = await getUploadedCVUrl(arrayOfIds);
  let parsedResult = [];
  let save;
  let start = +new Date();
  let parserError;
  await Promise.all(
    cvUrl.map(async url => {
      const file = url.fileurl;
      const filename = url.filename;
      const jobpositionIdFromArray = url.jobPositionId;
      console.log({ file, filename, jobpositionIdFromArray });
      if (!file && !filename) return;

      let singleResult = await throttleApiCall(filename, file);
      console.log({ singleResult });
      const { data } = singleResult;
      if (data.Error) {
        parserError = true;
      }
      if (data) {
        let newparsed = new ParsersCollection(data);
        newparsed.jobPositionId = jobpositionIdFromArray;
        save = await newparsed.save();
        parsedResult.push(save);
      }
    }),
  );

  return { parsedResult, parserError };
}
I need your assistance to get this resolved. Thanks

@ross Coundon, I would appreciate your assistance, from the wealth of your experience interacting with several third-party APIs, on how I can make concurrent requests to the server without experiencing the "socket hang up" issue. In the first iteration of the loop shown above I get results from the third-party API, but on the second iteration there's a delay in the response from the third party, and hence the error message below

Trace: { Error: socket hang up
    at createHangUpError (_http_client.js:323:15)
    at TLSSocket.socketOnEnd (_http_client.js:426:23)
    at TLSSocket.emit (events.js:194:15)
    at TLSSocket.EventEmitter.emit (domain.js:441:20)
    at endReadableNT (_stream_readable.js:1125:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)
  code: 'ECONNRESET',

Hi - I'm not sure what to suggest, are you able to share what the 3rd party API is? Do they provide any documentation/information on acceptable usage, time between requests, number of concurrent requests etc?

They don't have that spelled out in their API documentation. I have sent them a mail to inquire about the acceptable usage, the time between requests, and the number of concurrent requests.

 

Great stuff, thanks for bringing bottleneck to the broader audience.