Let's dive deeper into making HTTP requests to external APIs, using the GitHub REST API and a simple PHP script as our example.
In past posts, we learned how to make many requests in a loop and ran into limits on the GitHub API side:
"message": "API rate limit exceeded for *.*.*.*.
(But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)", "documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"
The API tells us the cause of the problem and explicitly points us to the rate-limit documentation. We are lucky here: APIs are not always this helpful, and often we have to figure out the cause ourselves.
After reading the documentation, I realize that:
- We made requests as an unauthenticated user: no login/password pair and no token.
- The API has rate limits, and they can differ between endpoints and authentication methods.
- The current limits can be seen in the HTTP headers.
GitHub limits unauthenticated users to 60 requests per hour, and we made more than ten in just a few seconds. As an authenticated user, though, we can make 5,000 or even 15,000 requests per hour.
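By the way, you can check your current quota at any time without spending it by querying the /rate_limit endpoint. Here is a minimal Guzzle sketch (assuming Composer's autoloader is in the usual place, as in the previous posts):

require __DIR__ . '/vendor/autoload.php';

// Requests to /rate_limit do not count against the quota itself.
$client = new \GuzzleHttp\Client();
$response = $client->request('GET', 'https://api.github.com/rate_limit');
$limits = json_decode($response->getBody()->getContents(), true);
print_r($limits['rate']); // limit, remaining, reset, used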
So we have the following plan:
- Add authentication to API requests.
- Add a function that reads the headers containing limit information, so we never exceed the limits.
Let's look at the documentation and the available authentication methods
I explored the available authentication methods on the Authenticating to the REST API page of the documentation and found that we need to create a special token and send it in the request headers.
So we've looked at the limits and we have a plan:
- Create a token.
- Add the token to the headers.
- Have fun!
Let's add authentication with a personal access token
Find the personal access token creation page in your GitHub profile settings (Settings → Developer settings → Personal access tokens) and create a token there.
Save the token! :)
Stop our application if it is running:
docker-compose down
We need to pass the token to the application; we will use environment variables in the docker-compose.yml file to do this.
Let's add a new environment variable, GITHUB_TOKEN, to the php-fpm service, since we will need it in the PHP code. Remember that the token is private information and must not be shared with anyone.
docker-compose.yml
php-fpm:
image: php8.2-fpm-mongo
volumes:
- ./app:/var/www/html
environment:
DB_USERNAME: root
DB_PASSWORD: secret
DB_HOST: mongodb
GITHUB_TOKEN: my_secret_token
Now we can start the application again with the command
docker-compose up -d
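To double-check that the variable actually reached the container, you can print it from inside the service (assuming the standard printenv utility is available in the image):

docker-compose exec php-fpm printenv GITHUB_TOKEN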
Now our token is available in PHP, and we can use it anywhere in the code. Let's replace our simple Guzzle initialization with a new one that uses the token, right after the database initialization in app/init.php.
app/init.php
if (isset($local_conf['GITHUB_TOKEN'])) {
    define('GITHUB_TOKEN', $local_conf['GITHUB_TOKEN']);
    // Authenticated client: every request will carry the token.
    $app['http'] = new \GuzzleHttp\Client([
        'headers' => [
            'Authorization' => 'Bearer ' . GITHUB_TOKEN
        ]
    ]);
} else {
    // No token in the environment: fall back to unauthenticated requests.
    $app['http'] = new \GuzzleHttp\Client();
}
Notice that there are two modes: if the token is present in the environment variables, we authenticate; if it is not, we fall back to unauthenticated requests.
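The snippet reads the token from $local_conf. How that array gets populated depends on the project's init code; a minimal sketch (the variable name and key list here are assumptions, check the repository for the actual code) could collect it from the container's environment like this:

// Hypothetical sketch: build $local_conf from the container's environment.
$local_conf = [];
foreach (['DB_USERNAME', 'DB_PASSWORD', 'DB_HOST', 'GITHUB_TOKEN'] as $key) {
    $value = getenv($key);
    if ($value !== false) {
        $local_conf[$key] = $value;
    }
}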
Let's make a test request.
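For a quick sanity check, we can ask the API who we are. This is just a sketch using the /user endpoint, not code from the project; if the token is valid, the response contains our GitHub login:

// Verify the token by requesting the authenticated user.
$response = $app['http']->request('GET', 'https://api.github.com/user');
$user = json_decode($response->getBody()->getContents(), true);
echo 'Authenticated as: ' . $user['login'] . PHP_EOL;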
Great! Last time we managed only 9 requests before hitting the limit. This time we made 30 requests, and we now have 770 repositories in the database.
HTTP headers and rate-limit handling logic
The Rate limit headers section in the documentation tells us about the headers that contain information about limits.
If we can read these headers, we can design logic that keeps us within the limits.
Open the function fn_github_api_request, which executes the HTTP requests, and add a few lines that read the response headers and print them.
app/func/github.php
$response = $app['http']->request($method, $url, [
    'query' => $params
]);
// Dump all response headers and stop execution.
$headers = $response->getHeaders();
dd($headers);
As a result, we will see the headers and their contents, including how many requests we have left before the limit resets.
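The interesting entries look roughly like this (the values below are placeholders for illustration; yours will differ):

[X-RateLimit-Limit] => Array ( [0] => 5000 )
[X-RateLimit-Remaining] => Array ( [0] => 4997 )
[X-RateLimit-Reset] => Array ( [0] => 1700000000 )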
Next, let's work out the logic:
- After each request, we save the values of the X-RateLimit-Remaining and X-RateLimit-Reset headers.
- Before each request, we check whether we have limit information from the previous request and whether the remaining count is 0; if it is, we wait the number of seconds left until the reset.
I added two new functions: one saves the limit information, the other checks it.
app/func/github.php
function fn_github_api_request_limits_set(&$app, $response)
{
    // Remember the rate-limit state reported by the last response.
    $headers = $response->getHeaders();
    $app['github_http']['limits']['remaining'] = (int) $headers['X-RateLimit-Remaining'][0];
    $app['github_http']['limits']['reset'] = (int) $headers['X-RateLimit-Reset'][0];
}

function fn_github_api_request_limits_check($app)
{
    if (isset($app['github_http']['limits'])) {
        $remaining = $app['github_http']['limits']['remaining'];
        if ($remaining == 0) {
            // X-RateLimit-Reset is a Unix timestamp; guard against a
            // negative sleep if the reset moment has already passed.
            $reset = max(0, $app['github_http']['limits']['reset'] - time());
            fn_print_progress($app, 'Github API X-RateLimits will be reset in ' . $reset . ' sec.', true);
            sleep($reset + 1);
        }
    }
}
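For the pair to do their job, they need to be called around the request in fn_github_api_request. The exact function body lives in the repository; a sketch of the wiring (the signature is an assumption based on the snippet above) might look like this:

function fn_github_api_request(&$app, $method, $url, $params = [])
{
    // Wait first if the previous response said the quota is exhausted.
    fn_github_api_request_limits_check($app);

    $response = $app['http']->request($method, $url, [
        'query' => $params
    ]);

    // Remember the limits reported by this response for the next call.
    fn_github_api_request_limits_set($app, $response);

    return $response;
}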
I also added the useful helper fn_print_progress to app/func/common.php, the file for general-purpose functions. It prints the application's progress in a convenient form.
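The real implementation is in the repository; a minimal sketch of such a helper, assuming the app stores its start time as microtime(true) in $app['start_time'] and that the third parameter is an "important" flag (both are assumptions for this illustration), could look like this:

function fn_print_progress(&$app, $message, $important = false)
{
    // Elapsed time since startup, with microsecond precision (assumes
    // $app['start_time'] was captured via microtime(true) at launch).
    $elapsed = microtime(true) - ($app['start_time'] ?? microtime(true));
    $prefix = $important ? '!!! ' : '';
    echo sprintf('[%10.4f s] %s%s', $elapsed, $prefix, $message) . PHP_EOL;
}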
Now I can see the limits, along with the elapsed time in seconds and microseconds.
After playing around a bit with the code, with and without the token, and temporarily changing the check to if ($remaining < 2) so the wait triggers sooner, I verified that everything works for the current stage and the sleep logic behaves as expected.
Bonus functions
In the course of getting to this point, I also added a few more useful functions. I won't publish them here, as that would take a lot of code; you can find them in the repository.
They are simply helpers that display information while the script is running and when it finishes.
You may notice that a script that has made dozens of requests and saved hundreds of repositories to the database consumes only 2-3 MB of RAM.
This matters for long-running scripts that can make thousands of requests: if I accumulated the data in an array, it would keep growing until the script crashed for exceeding its allocated memory.
We now have a function that reports the amount of memory consumed at any point.
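PHP exposes this directly through memory_get_usage() and memory_get_peak_usage(); a sketch of such a reporter (the function name is hypothetical, the repository's version may differ):

function fn_print_memory_usage()
{
    // Passing true reports memory actually allocated from the system.
    echo 'Memory: ' . round(memory_get_usage(true) / 1024 / 1024, 2) . ' MB'
        . ' (peak: ' . round(memory_get_peak_usage(true) / 1024 / 1024, 2) . ' MB)' . PHP_EOL;
}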
Conclusion
We have learned how to authenticate and to make as many requests as we need while respecting the rate limits.
We've added some useful features and more logic.
Now we can get more data from the API and even do it in several threads, by running the script multiple times as separate processes from the console. We'll learn how to do that next time.
The code for the current state of the project is available in the repository.
Thank you!