Aaron Goldsmith

Twitch.tv API - Get live stream data from paginated results

Recently, I wanted to work with the Twitch API to try to recreate the website twitchroulette.net, where you would be able to view a completely random live stream from all of the streams currently happening on the site. According to analytics from twitchtracker.com, there are currently an average of over 100,000 Twitch live streams at any given time.

When I went through the Twitch API documentation, I discovered that the endpoint for getting live streams, https://api.twitch.tv/helix/streams, returns a maximum of 100 streams per API call. However, the response includes a pagination field containing a cursor value (a string), which is used in a subsequent request to specify the starting point of the next set of results.

The response body for the GET request at https://api.twitch.tv/helix/streams?first=100 would include the top 100 most active live streams, and the data looks like this:

{
  "data": [
    {
      "id": "41375541868",
      "user_id": "459331509",
      "user_login": "auronplay",
      "user_name": "auronplay",
      "game_id": "494131",
      "game_name": "Little Nightmares",
      "type": "live",
      "title": "hablamos y le damos a Little Nightmares 1",
      "viewer_count": 78365,
      "started_at": "2021-03-10T15:04:21Z",
      "language": "es",
      "thumbnail_url": "https://static-cdn.jtvnw.net/previews-ttv/live_user_auronplay-{width}x{height}.jpg",
      "tag_ids": [
        "d4bb9c58-2141-4881-bcdc-3fe0505457d1"
      ]
    },
    ...
  ],
  "pagination": {
    "cursor": "eyJiIjp7IkN1cnNvciI6ImV5SnpJam8zT0RNMk5TNDBORFF4TlRjMU1UY3hOU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqb3hOVGs0TkM0MU56RXhNekExTVRZNU1ESXNJbVFpT21aaGJITmxMQ0owSWpwMGNuVmxmUT09In19"
  }
}
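
One detail in this payload is easy to miss: thumbnail_url is a template, and its literal {width} and {height} placeholders have to be replaced before the URL can be used. Here is a minimal sketch of that substitution; the helper name thumbnailUrl and the dimensions are my own assumptions, not something Twitch provides:

```javascript
// Hypothetical helper (not part of the Twitch API): fills in the
// {width} and {height} placeholders found in a stream's thumbnail_url.
function thumbnailUrl (stream, width, height) {
  return stream.thumbnail_url
    .replace('{width}', String(width))
    .replace('{height}', String(height));
}

// Example using the thumbnail_url from the response shown above.
const stream = {
  user_login: 'auronplay',
  thumbnail_url: 'https://static-cdn.jtvnw.net/previews-ttv/live_user_auronplay-{width}x{height}.jpg'
};

console.log(thumbnailUrl(stream, 440, 248));
```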

If you wanted to retrieve the next 100 most active live streams, the subsequent API request URL would need to be:

https://api.twitch.tv/helix/streams?first=100&after=eyJiIjp7IkN1cnNvciI6ImV5SnpJam8zT0RNMk5TNDBORFF4TlRjMU1UY3hOU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqb3hOVGs0TkM0MU56RXhNekExTVRZNU1ESXNJbVFpT21aaGJITmxMQ0owSWpwMGNuVmxmUT09In19

This includes as its after value the cursor value returned in the prior response.
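
Rather than concatenating the query string by hand, the URL for any page can be built with the standard URL and URLSearchParams classes, which also take care of percent-encoding the cursor. This is only a sketch: the helper name streamsUrl is an assumption of mine, and the cursor in the example is a shortened stand-in rather than a real value.

```javascript
// Hypothetical helper: builds the request URL for one page of results.
// `cursor` is the pagination.cursor value from the previous response (omit it for the first page).
// `first` is the page size; the endpoint allows at most 100.
function streamsUrl (cursor, first = 100) {
  const url = new URL('https://api.twitch.tv/helix/streams');
  url.searchParams.set('first', String(first));
  if (cursor) url.searchParams.set('after', cursor);
  return url.toString();
}

console.log(streamsUrl());           // URL for the first page
console.log(streamsUrl('eyJiIjp7')); // URL for the next page (stand-in cursor)
```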

It's not possible to sort the results by least active, so to reach streams with very few or no viewers, you first have to page through the results for the more active streams.

It is also important to note that the Twitch API is rate-limited to 800 requests per minute, so the maximum number of live streams we could retrieve in that time is 80,000 (800 requests × 100 streams), which is substantially lower than the average of over 100,000 concurrent streams mentioned above. It's therefore plausible that trying to get a truly complete list of live streams would run the risk of causing an HTTP 429 error (too many requests).
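
To make that arithmetic concrete, here is a small sketch using only the figures quoted in this post (800 requests per minute, 100 streams per request, roughly 100,000 live streams). The constant and function names are made up for illustration.

```javascript
// Figures taken from this post, not queried from the API.
const REQUESTS_PER_MINUTE = 800;
const STREAMS_PER_REQUEST = 100;

// Upper bound on the number of streams retrievable in one minute.
const maxStreamsPerMinute = REQUESTS_PER_MINUTE * STREAMS_PER_REQUEST;

// Number of requests needed to page through a given number of streams.
function requestsNeeded (totalStreams, pageSize = STREAMS_PER_REQUEST) {
  return Math.ceil(totalStreams / pageSize);
}

console.log(maxStreamsPerMinute);    // 80000
console.log(requestsNeeded(100000)); // 1000, which exceeds the 800 allowed per minute
```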

In order to try to retrieve as many live streams as possible, while keeping in mind the constraints of the rate-limit and a potentially impatient user, I approached this problem using recursion:

// clientId and access_token are assumed to be defined elsewhere.
function getAllStreams (cursor, data = [], counter = 15) {
  // Stop once the request budget is used up.
  if (counter <= 0) return Promise.resolve(data);

  const url = 'https://api.twitch.tv/helix/streams?first=100' +
    (cursor ? '&after=' + encodeURIComponent(cursor) : '');
  const request = new Request(url, {
    method: 'GET',
    headers: {
      'Client-ID': clientId,
      'Authorization': `Bearer ${access_token}`
    }
  });

  return fetch(request).then((response) => response.json()).then((responseJson) => {
    // Keep every page that was fetched, including the last one.
    data.push(...responseJson.data);
    const next = responseJson.pagination && responseJson.pagination.cursor;
    // A missing cursor means there are no further pages.
    if (!next) return data;
    return getAllStreams(next, data, counter - 1);
  });
}

I found that each request took about half a second to complete, so I also needed to limit the number of requests in order to keep the user engaged. I specify that limit as the default argument counter: 15 requests of up to 100 streams each, which takes roughly 7.5 seconds. While 1,500 streams might not seem like a big number, it does make it possible to recreate the experience of viewing a single random stream.
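
Once the streams have been collected, choosing one at random is a one-liner. The sketch below is one way to do it; the helper name pickRandomStream and the injectable rng parameter are assumptions of mine, added so the choice can be made deterministic when testing. Keep in mind that the pick is only uniform over the streams that were collected, which are the most-viewed ones, not over every stream on Twitch.

```javascript
// Hypothetical helper: returns one element of `streams` chosen uniformly at random.
// `rng` defaults to Math.random and must return a number in [0, 1).
function pickRandomStream (streams, rng = Math.random) {
  if (streams.length === 0) return undefined;
  const index = Math.floor(rng() * streams.length);
  return streams[index];
}

// Possible usage with getAllStreams from above:
// getAllStreams().then((streams) => console.log(pickRandomStream(streams)));
```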

I would appreciate any suggestions or critiques of my approach, as this was the first time I've worked with and tried to 'crawl' a paginated API. I just wanted to share the way I went about using this endpoint in order to try to help other developers who attempt to do the same.

Thanks for reading!

Top comments (1)

Eric_Eberhart

To retrieve live stream data from Twitch.tv API with paginated results, you can follow these steps:

Authentication: Obtain an OAuth token to authenticate your requests. You can generate a token by registering your application with Twitch and following their authentication process.

API Endpoint: Use the Twitch API endpoint to retrieve live streams. The endpoint for getting live streams is https://api.twitch.tv/helix/streams.

Paginated Results: Twitch API provides pagination through query parameters like first (to specify the number of results per page) and after (to specify the cursor for the next page). By default, Twitch returns 20 streams per page.

Send Request: Make an HTTP GET request to the Twitch API endpoint, including your OAuth token in the request headers for authentication. You can also include query parameters to specify the number of results per page (first) and the cursor for the next page (after).

Handle Response: Parse the JSON response returned by the Twitch API. Extract the live stream data from the response body, including information like streamer name, game being played, viewer count, etc.

Paginate through Results: If the API response includes a pagination object with a cursor for the next page, use this cursor to make subsequent requests for additional pages of live stream data.

Process Data: Process the live stream data as needed for your application, such as displaying it in a list, filtering by specific criteria, or performing analytics.

Here's a simplified example using Python with the requests library:

import requests

# Set up authentication
oauth_token = "YOUR_OAUTH_TOKEN"
headers = {
    "Authorization": f"Bearer {oauth_token}",
    "Client-ID": "YOUR_CLIENT_ID",
}

# API endpoint for getting live streams
endpoint = "https://api.twitch.tv/helix/streams"

# Parameters for pagination
params = {
    "first": 20,  # Number of results per page
}


def process_live_streams(streams):
    # Process live stream data (e.g., display streamer name, game, viewer count)
    for stream in streams:
        print(stream["user_name"], stream["game_name"], stream["viewer_count"])


# Make initial request
response = requests.get(endpoint, headers=headers, params=params)
data = response.json()

# Process first page of live streams
process_live_streams(data["data"])

# Check if there are more pages to fetch
while "pagination" in data and "cursor" in data["pagination"]:
    params["after"] = data["pagination"]["cursor"]

    # Make request for next page
    response = requests.get(endpoint, headers=headers, params=params)
    data = response.json()

    # Process next page of live streams
    process_live_streams(data["data"])

Remember to replace "YOUR_OAUTH_TOKEN" and "YOUR_CLIENT_ID" with your actual OAuth token and client ID obtained from Twitch. Additionally, adjust the process_live_streams function to suit your application's requirements for handling live stream data.