Ching Cheng Kang

Improve Speed Of Importing Data from Third Party with HTTP Request

Intro

Every hour, my COVID-19 Dashboard fetches the open data published by the Ministry of Health Malaysia and the COVID-19 Immunisation Task Force Malaysia on GitHub and imports it into my database.
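
For context, the import runs as a scheduled job. A minimal sketch of how that wiring might look in Laravel's scheduler, assuming a hypothetical import:covid-data artisan command that wraps the import service (the real command name differs):

// app/Console/Kernel.php
use Illuminate\Console\Scheduling\Schedule;

protected function schedule(Schedule $schedule): void
{
    // "import:covid-data" is a hypothetical name for the import command.
    $schedule->command('import:covid-data')->hourly();
}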

It used to take ~1 minute to:

  1. make an HTTP request to GitHub
  2. parse and transform the CSV into a Laravel collection
  3. check whether the total record count matches the table count
  4. insert the records if the counts differ (see the sketch after this list)
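
Roughly, the original flow looked like the following. This is only a minimal sketch: the $url value, the cases_state table, and the CSV column names are hypothetical stand-ins, not the actual code.

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

// 1. Synchronous HTTP request to GitHub ($url: the raw CSV link, hypothetical).
$csv = Http::get($url)->body();

// 2. Parse the CSV into a Laravel collection, dropping the header row
//    and the trailing empty line.
$records = collect(explode(PHP_EOL, $csv))
    ->splice(1, -1)
    ->map(function (string $line) {
        [$date, $state, $casesNew] = str_getcsv($line); // hypothetical columns
        return ['date' => $date, 'state' => $state, 'cases_new' => $casesNew];
    });

// 3. & 4. Insert only when the row count differs from the table count.
if ($records->count() !== DB::table('cases_state')->count()) {
    DB::table('cases_state')->insert($records->toArray());
}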

Although it works, there was one minute of downtime on my website each day, so I decided to work on it.

First Step

Instead of checking against the database every time, it might be better to generate a hash of the CSV and save it to the cache.
If the hash is unchanged, we simply skip processing and return null.

use Illuminate\Support\Collection;
use Illuminate\Support\Facades\Cache;

private function getRecord(string $key): array
{
    $content = $this->recordHolder[$key];
    $hash = sha1($content);
    $exists = true;

    // Only when the cached hash is missing or different do we mark the
    // record as new and refresh the cache entry.
    if (!(Cache::has($key) && Cache::get($key) === $hash)) {
        Cache::put($key, $hash, now()->addDay());
        $exists = false;
    }

    return [$content, $exists];
}

private function getCasesState(): ?Collection
{
    [$content, $exists] = $this->getRecord('CASES_STATE');
    if ($exists) {
        // Hash unchanged since the last run: nothing to import.
        return null;
    }
    //...
}

Sure, this helps reduce some computing time, but it only shaves off ~10% of the runtime! So... why stop here?

Second Step

After taking a bath (my way of debugging), I realised that the reason it's slow might be the HTTP requests themselves.

So I converted them into async promises, resolved by Guzzle HTTP when the service class is initialised.

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

public function __construct()
{
    $client = new Client();
    $promises = [];

    // Fire off all requests concurrently instead of one by one.
    foreach (self::url as $key => $url) {
        $promises[$key] = $client->getAsync($url);
    }

    // Wait until every promise has settled, then split each CSV body into
    // lines, dropping the header row and the trailing empty line.
    $responses = Utils::settle($promises)->wait();
    foreach ($responses as $key => $response) {
        $this->recordHolder[$key] = collect(explode(PHP_EOL, $response['value']->getBody()))->splice(1, -1);
    }
}
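One caveat: Utils::settle() never rejects; each entry in $responses is an array with a 'state' key ('fulfilled' or 'rejected') plus a 'value' (or 'reason') key. The loop above assumes every request succeeds. A slightly more defensive sketch of that last loop:

foreach ($responses as $key => $response) {
    if ($response['state'] !== 'fulfilled') {
        // 'reason' carries the exception for rejected promises.
        logger()->warning("Failed to fetch {$key}", ['reason' => (string) $response['reason']]);
        continue;
    }

    $this->recordHolder[$key] = collect(explode(PHP_EOL, $response['value']->getBody()))->splice(1, -1);
}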

Oh boi, only at this point did I realise how dumb I was not to do this before.

From ~1 min down to within 10 seconds.
Not too shabby, considering ~70k records are inserted in that small amount of time.

Of course, before inserting, the data is chunked into 500 items per operation for faster insert speed.

use Illuminate\Support\Facades\DB;
//...
// Insert in batches of 500 rows so each query stays small and fast.
$chunks = $records->chunk(500);
foreach ($chunks as $chunk) {
    DB::table($tableName)->insert($chunk->toArray());
}
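
One possible refinement, not in the original code: wrapping the loop in a transaction so a failure partway through doesn't leave a half-imported table. A hedged sketch, assuming all-or-nothing semantics are desired per import run:

use Illuminate\Support\Facades\DB;

DB::transaction(function () use ($records, $tableName) {
    foreach ($records->chunk(500) as $chunk) {
        DB::table($tableName)->insert($chunk->toArray());
    }
});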

Here is the final result from a test run in my local environment:

[Final Result — screenshot]


Hey, it's me a few months later! It turns out you can do async requests with Laravel's built-in Http::pool() method.
The code will look something like this:

use Illuminate\Http\Client\Pool;
use Illuminate\Http\Client\Response;
use Illuminate\Support\Facades\Http;

collect(Http::pool(function (Pool $pool) {
    return collect(self::url)
        ->map(fn ($url, $key) => $pool->as($key)->get($url))
        ->toArray();
}))->each(function (Response $res, $key) {
    $this->recordHolder[$key] = collect(explode(PHP_EOL, $res->body()))->splice(1, -1);
});

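Worth knowing: as far as I can tell from the Laravel docs, a pooled request that fails at the connection level comes back in the results array as an exception instance rather than a Response object, so a more defensive version might filter first:

collect(Http::pool(fn (Pool $pool) => collect(self::url)
        ->map(fn ($url, $key) => $pool->as($key)->get($url))
        ->toArray()))
    // Keep only entries that are successful HTTP responses.
    ->filter(fn ($res) => $res instanceof Response && $res->successful())
    ->each(function (Response $res, $key) {
        $this->recordHolder[$key] = collect(explode(PHP_EOL, $res->body()))->splice(1, -1);
    });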

Cover Photo by Marc-Olivier Jodoin on Unsplash
