Aaron Goldsmith

Posted on Mar 29, 2021 • Updated on May 9, 2021

Twitch.tv API - Get live stream data from paginated results

#javascript #programming #webdev

Recently, I wanted to work with the Twitch API to try to recreate the website twitchroulette.net, where you would be able to view a completely random live stream from all of the streams currently happening on the site. According to analytics from twitchtracker.com, there are currently an average of over 100,000 Twitch live streams at any given time.

When I went through the Twitch API documentation, I discovered that the endpoint for getting live streams, https://api.twitch.tv/helix/streams, limits the response to a maximum of 100 streams per API call. However, the response includes a pagination field containing a cursor value (a string), which is used in a subsequent request to specify the starting point of the next set of results.

The response body for the GET request at https://api.twitch.tv/helix/streams?first=100 would include the top 100 most active live streams, and the data looks like this:

{
  "data": [
    {
      "id": "41375541868",
      "user_id": "459331509",
      "user_login": "auronplay",
      "user_name": "auronplay",
      "game_id": "494131",
      "game_name": "Little Nightmares",
      "type": "live",
      "title": "hablamos y le damos a Little Nightmares 1",
      "viewer_count": 78365,
      "started_at": "2021-03-10T15:04:21Z",
      "language": "es",
      "thumbnail_url": "https://static-cdn.jtvnw.net/previews-ttv/live_user_auronplay-{width}x{height}.jpg",
      "tag_ids": [
        "d4bb9c58-2141-4881-bcdc-3fe0505457d1"
      ]
    },
    ...
  ],
  "pagination": {
    "cursor": "eyJiIjp7IkN1cnNvciI6ImV5SnpJam8zT0RNMk5TNDBORFF4TlRjMU1UY3hOU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqb3hOVGs0TkM0MU56RXhNekExTVRZNU1ESXNJbVFpT21aaGJITmxMQ0owSWpwMGNuVmxmUT09In19"
  }
}
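As a small illustration of working with this payload, here is a sketch that pulls a few fields out of one entry of data and fills in the {width} and {height} placeholders in thumbnail_url. The helper name summarizeStream and the default 320x180 size are illustrative assumptions, not part of the Twitch API.

```javascript
// Hypothetical helper: reduce one stream object to the fields a UI might need.
function summarizeStream (stream, width = 320, height = 180) {
  return {
    login: stream.user_login,
    game: stream.game_name,
    viewers: stream.viewer_count,
    // thumbnail_url ships with literal {width} and {height} placeholders.
    thumbnail: stream.thumbnail_url
      .replace('{width}', String(width))
      .replace('{height}', String(height))
  };
}

// Sample values copied from the response shown above.
const example = summarizeStream({
  user_login: 'auronplay',
  game_name: 'Little Nightmares',
  viewer_count: 78365,
  thumbnail_url: 'https://static-cdn.jtvnw.net/previews-ttv/live_user_auronplay-{width}x{height}.jpg'
});
console.log(example.thumbnail);
```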

If you wanted to retrieve the next 100 most active live streams, the subsequent API request URL would need to be:

https://api.twitch.tv/helix/streams?first=100&after=eyJiIjp7IkN1cnNvciI6ImV5SnpJam8zT0RNMk5TNDBORFF4TlRjMU1UY3hOU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqb3hOVGs0TkM0MU56RXhNekExTVRZNU1ESXNJbVFpT21aaGJITmxMQ0owSWpwMGNuVmxmUT09In19

Here, the after parameter is set to the cursor value returned in the prior response.
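One way to build that URL without concatenating strings by hand is the standard URL class. This is only a sketch: buildStreamsUrl is a hypothetical helper name, and the page size of 100 simply mirrors the maximum mentioned earlier. Note that searchParams percent-encodes characters such as the = padding that can appear at the end of a cursor.

```javascript
// Hypothetical helper: build the request URL for one page of results.
function buildStreamsUrl (cursor, pageSize = 100) {
  const url = new URL('https://api.twitch.tv/helix/streams');
  url.searchParams.set('first', String(pageSize));
  // Only send `after` once a previous response has supplied a cursor.
  if (cursor) url.searchParams.set('after', cursor);
  return url.toString();
}

console.log(buildStreamsUrl());      // first page, no cursor
console.log(buildStreamsUrl('abc')); // next page, using a placeholder cursor
```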

It's not possible to sort the responses by least active, so to reach streams with very few or no viewers, you first have to page through the results for the more active streams.

It is also important to note that the Twitch API is rate-limited to 800 requests per minute, so at 100 streams per request the maximum number of live streams we could retrieve in that time is 80,000, which is substantially lower than the current weekly average. Trying to get a truly complete list of live streams would therefore risk an HTTP 429 error (Too Many Requests).
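To make the arithmetic concrete, and to show one possible way of reacting to a 429, here is a rough sketch. It assumes the response carries a Ratelimit-Reset header holding a Unix timestamp in seconds, as described in the Twitch rate-limit documentation; msUntilReset and the one-second fallback are hypothetical choices of mine.

```javascript
const REQUESTS_PER_MINUTE = 800; // rate limit quoted above
const STREAMS_PER_REQUEST = 100; // maximum value of `first`
const maxStreamsPerMinute = REQUESTS_PER_MINUTE * STREAMS_PER_REQUEST; // 80000

// Hypothetical helper: how long to wait before retrying after a 429.
// `headers` is anything with a get(name) method, such as fetch's Headers.
function msUntilReset (headers, nowMs = Date.now()) {
  const resetSeconds = Number(headers.get('Ratelimit-Reset'));
  // Fall back to a one-second pause if the header is missing or malformed.
  if (!Number.isFinite(resetSeconds) || resetSeconds <= 0) return 1000;
  // Never return a negative delay if the reset time has already passed.
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```

A caller could check response.status === 429 and then wait msUntilReset(response.headers) milliseconds before requesting the same page again.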

In order to try to retrieve as many live streams as possible, while keeping in mind the constraints of the rate-limit and a potentially impatient user, I approached this problem using recursion:

// clientId and access_token are assumed to be defined elsewhere.
function getAllStreams (cursor, data = [], counter = 15) {
  // Stop once the request budget is used up.
  if (counter === 0) return Promise.resolve(data);

  const request = new Request('https://api.twitch.tv/helix/streams?first=100' + (cursor ? '&after=' + encodeURIComponent(cursor) : ''), {
    method: 'GET',
    headers: {
      'Client-ID': clientId,
      'Authorization': `Bearer ${access_token}`
    }
  });

  return fetch(request).then((response) => response.json()).then((responseJson) => {
    // Keep every page that was fetched, including the last one.
    data.push(...responseJson.data);
    const nextCursor = responseJson.pagination && responseJson.pagination.cursor;
    // No cursor means there are no further pages.
    if (!nextCursor) return data;
    return getAllStreams(nextCursor, data, counter - 1);
  });
}

I found that each request took about half a second to complete, so I also needed to limit the number of requests in order to keep the user engaged. I specify that limit with the counter parameter, which defaults to 15. While 1,500 streams might not seem like a big number, it is enough to recreate the experience of viewing a single random stream.
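The same idea can also be written as a loop with async/await instead of recursion, followed by the random pick itself. This is only a sketch of an alternative, not the approach used above: collectStreams, pickRandomStream, and the fetchPage callback are all hypothetical names, and fetchPage is assumed to take a cursor and resolve to a parsed response of the shape { data, pagination }.

```javascript
// `fetchPage` is a hypothetical function that takes a cursor (or undefined)
// and resolves to a parsed response of the shape { data, pagination }.
async function collectStreams (fetchPage, maxPages = 15) {
  const streams = [];
  let cursor;
  for (let page = 0; page < maxPages; page++) {
    const body = await fetchPage(cursor);
    streams.push(...body.data);
    cursor = body.pagination && body.pagination.cursor;
    if (!cursor) break; // no cursor: the last page has been reached
  }
  return streams;
}

// Pick one entry uniformly at random; `random` is injectable for testing.
function pickRandomStream (streams, random = Math.random) {
  if (streams.length === 0) return undefined;
  return streams[Math.floor(random() * streams.length)];
}
```

Passing the page fetcher in as an argument keeps the loop independent of fetch and of the credentials, which makes it easy to try out against canned responses before pointing it at the real endpoint.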

I would appreciate any suggestions or critiques of my approach, as this was the first time I had worked with, and tried to 'crawl', a paginated API. I just wanted to share the way I went about using this endpoint in order to help other developers who attempt to do the same.

Thanks for reading!

Top comments (1)

Eric_Eberhart • Feb 9

To retrieve live stream data from Twitch.tv API with paginated results, you can follow these steps:

Authentication: Obtain an OAuth token to authenticate your requests. You can generate a token by registering your application with Twitch and following their authentication process.

API Endpoint: Use the Twitch API endpoint to retrieve live streams. The endpoint for getting live streams is https://api.twitch.tv/helix/streams.

Paginated Results: Twitch API provides pagination through query parameters like first (to specify the number of results per page) and after (to specify the cursor for the next page). By default, Twitch returns 20 streams per page.

Send Request: Make an HTTP GET request to the Twitch API endpoint, including your OAuth token in the request headers for authentication. You can also include query parameters to specify the number of results per page (first) and the cursor for the next page (after).

Handle Response: Parse the JSON response returned by the Twitch API. Extract the live stream data from the response body, including information like streamer name, game being played, viewer count, etc.

Paginate through Results: If the API response includes a pagination object with a cursor for the next page, use this cursor to make subsequent requests for additional pages of live streams data.

Process Data: Process the live stream data as needed for your application, such as displaying it in a list, filtering by specific criteria, or performing analytics.

Here's a simplified example using Python with the requests library:

import requests

# Set up authentication
oauth_token = "YOUR_OAUTH_TOKEN"
headers = {
    "Authorization": f"Bearer {oauth_token}",
    "Client-ID": "YOUR_CLIENT_ID",
}

# API endpoint for getting live streams
endpoint = "https://api.twitch.tv/helix/streams"

# Parameters for pagination
params = {
    "first": 20,  # Number of results per page
}


def process_live_streams(streams):
    # Process live stream data (e.g., display streamer name, game, viewer count)
    for stream in streams:
        print(stream["user_name"], stream["game_name"], stream["viewer_count"])


# Make initial request
response = requests.get(endpoint, headers=headers, params=params)
data = response.json()

# Process first page of live streams
process_live_streams(data["data"])

# Check if there are more pages to fetch
while data.get("pagination", {}).get("cursor"):
    params["after"] = data["pagination"]["cursor"]

    # Make request for next page
    response = requests.get(endpoint, headers=headers, params=params)
    data = response.json()

    # Process next page of live streams
    process_live_streams(data["data"])

Remember to replace "YOUR_OAUTH_TOKEN" and "YOUR_CLIENT_ID" with your actual OAuth token and client ID obtained from Twitch. Additionally, adjust the process_live_streams function to suit your application's requirements for handling live stream data.
