Web Scraping with PHP: COVID-19 Outbreak Data based on Channel News Asia

Sony AK (sonyarianto) ・ 4 min read

Today I will share how to scrape COVID-19 outbreak data (country, confirmed cases, and reported deaths) from the Singapore-based news outlet Channel News Asia (https://www.channelnewsasia.com). After analyzing their website, I found that the data is actually stored in a Google Spreadsheet, which can be accessed in JSON format.

Here is the endpoint of the data:

https://spreadsheets.google.com/feeds/list/1lwnfa-GlNRykWBL5y7tWpLxDoCfs8BvzWxFjeOZ1YJk/1/public/values?alt=json
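To see what the scraper reads from this endpoint, here is a trimmed, hypothetical payload that keeps only the keys the script accesses: `feed.updated.$t`, and for each row in `feed.entry`, `title.$t`, `gsx$confirmedcases.$t` and `gsx$reporteddeaths.$t`. The values are copied from the sample output further down; the real feed contains many more keys, and this sketch only illustrates how PHP's `json_decode()` exposes them as object properties.

```php
<?php
// Hypothetical, trimmed payload mirroring only the keys the scraper reads.
// Values are copied from the sample output; the real feed has many more keys.
$sample = '{"feed":{"updated":{"$t":"2020-03-01T06:45:15.274Z"},"entry":['
    . '{"title":{"$t":"China"},"gsx$confirmedcases":{"$t":"79824"},"gsx$reporteddeaths":{"$t":"2,870"}}'
    . ']}}';

// Without the second argument, json_decode() returns stdClass objects,
// so keys like "$t" must be read with the ->{'...'} syntax.
$data = json_decode($sample);
$row = $data->{'feed'}->{'entry'}[0];   // each spreadsheet row is one element of "entry"

echo $data->{'feed'}->{'updated'}->{'$t'} . "\n";   // 2020-03-01T06:45:15.274Z
echo $row->{'title'}->{'$t'} . "\n";                // China
echo $row->{'gsx$confirmedcases'}->{'$t'} . "\n";   // 79824
echo $row->{'gsx$reporteddeaths'}->{'$t'} . "\n";   // 2,870
```

Note that every value arrives as a string, and the sheet does not format numbers consistently ("2,870" next to "79824").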

I will use PHP with the cURL extension to fetch it. Save the following code as covid_19_outbreak.php.

<?php
    // JSON feed of the Google Spreadsheet behind Channel News Asia's outbreak table
    $url = 'https://spreadsheets.google.com/feeds/list/1lwnfa-GlNRykWBL5y7tWpLxDoCfs8BvzWxFjeOZ1YJk/1/public/values?alt=json';

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);          // return the body as a string instead of printing it
    curl_setopt($curl, CURLOPT_ENCODING, '');                  // accept every encoding cURL supports (gzip, deflate, ...)
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);            // max seconds to establish the connection
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);                   // max seconds for the whole transfer
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);          // follow HTTP redirects
    curl_setopt($curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);  // resolve host names to IPv4 only
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36');
    $curlData = curl_exec($curl);

    // cURL error 28 means the request timed out
    $isTimeout = curl_errno($curl) == 28;

    curl_close($curl);

    if($isTimeout) {
        echo 'Timeout!' . "\n";
        exit(1);
    }

    // curl_exec() returns false on any other failure
    if($curlData === false || trim($curlData) == '') {
        echo 'Curl data empty!' . "\n";
        exit(1);
    }

    // The feed is UTF-8 JSON, so it can be decoded directly
    // (mb_convert_encoding() with 'HTML-ENTITIES' is deprecated as of PHP 8.2)
    $jsonData = json_decode($curlData);

    // Each spreadsheet row is one element of feed->entry
    if(!isset($jsonData->{'feed'}->{'entry'})) {
        echo 'No data!' . "\n";
        exit(1);
    }

    // feed->updated->$t holds the last update time as an ISO 8601 UTC string
    $updatedDatetimeUtc = $jsonData->{'feed'}->{'updated'}->{'$t'};
    echo 'Updated Time UTC: ' . $updatedDatetimeUtc . "\n";

    $iCounter = 1;
    foreach($jsonData->{'feed'}->{'entry'} as $eachData) {
        // Column values sit under "gsx$" + the lower-cased column header, with the text in $t
        $country = trim($eachData->{'title'}->{'$t'});
        $confirmedCases = trim($eachData->{'gsx$confirmedcases'}->{'$t'});
        $reportedDeaths = trim($eachData->{'gsx$reporteddeaths'}->{'$t'});
        if($reportedDeaths == '') {
            $reportedDeaths = 0;  // an empty cell means no reported deaths
        }

        echo $iCounter . '. ' . $country . ' - Confirmed Cases: ' . $confirmedCases . ' - Reported Deaths: ' . $reportedDeaths . "\n";

        $iCounter++;
    }

Run it from the command line:

php covid_19_outbreak.php

Here is sample output:

sony@ubuntu:/playground (master)$ php covid_19_outbreak.php
Updated Time UTC: 2020-03-01T06:45:15.274Z
1. China - Confirmed Cases: 79824 - Reported Deaths: 2,870
2. Singapore - Confirmed Cases: 102 - Reported Deaths: 0
3. South Korea - Confirmed Cases: 3526 - Reported Deaths: 16
4. Cruise ship (Diamond Princess) - Confirmed Cases: 705 - Reported Deaths: 6
5. Japan - Confirmed Cases: 235 - Reported Deaths: 5
6. Hong Kong - Confirmed Cases: 94 - Reported Deaths: 2
7. Thailand - Confirmed Cases: 42 - Reported Deaths: 0
8. Taiwan - Confirmed Cases: 39 - Reported Deaths: 1
9. Malaysia - Confirmed Cases: 25 - Reported Deaths: 0
10. Germany - Confirmed Cases: 66 - Reported Deaths: 0
11. Vietnam - Confirmed Cases: 16 - Reported Deaths: 0
12. Australia - Confirmed Cases: 26 - Reported Deaths: 1
13. United States - Confirmed Cases: 67 - Reported Deaths: 1
14. France - Confirmed Cases: 73 - Reported Deaths: 2
15. Macau - Confirmed Cases: 10 - Reported Deaths: 0
16. United Arab Emirates - Confirmed Cases: 21 - Reported Deaths: 0
17. United Kingdom - Confirmed Cases: 23 - Reported Deaths: 0
18. Canada - Confirmed Cases: 16 - Reported Deaths: 0
19. India - Confirmed Cases: 3 - Reported Deaths: 0
20. Philippines - Confirmed Cases: 3 - Reported Deaths: 1
21. Italy - Confirmed Cases: 1128 - Reported Deaths: 29
22. Russia - Confirmed Cases: 5 - Reported Deaths: 0
23. Spain - Confirmed Cases: 45 - Reported Deaths: 0
24. Iran - Confirmed Cases: 593 - Reported Deaths: 43
25. Pakistan - Confirmed Cases: 4 - Reported Deaths: 0
26. Nepal - Confirmed Cases: 1 - Reported Deaths: 0
27. Cambodia - Confirmed Cases: 1 - Reported Deaths: 0
28. Sri Lanka - Confirmed Cases: 1 - Reported Deaths: 0
29. Finland - Confirmed Cases: 2 - Reported Deaths: 0
30. Sweden - Confirmed Cases: 12 - Reported Deaths: 0
31. Belgium - Confirmed Cases: 1 - Reported Deaths: 0
32. Egypt - Confirmed Cases: 1 - Reported Deaths: 0
33. Lebanon - Confirmed Cases: 7 - Reported Deaths: 0
34. Israel - Confirmed Cases: 7 - Reported Deaths: 0
35. Bahrain - Confirmed Cases: 41 - Reported Deaths: 0
36. Kuwait - Confirmed Cases: 45 - Reported Deaths: 0
37. Afghanistan - Confirmed Cases: 1 - Reported Deaths: 0
38. Iraq - Confirmed Cases: 8 - Reported Deaths: 0
39. Oman - Confirmed Cases: 6 - Reported Deaths: 0
40. Croatia - Confirmed Cases: 5 - Reported Deaths: 0
41. Austria - Confirmed Cases: 10 - Reported Deaths: 0
42. Switzerland - Confirmed Cases: 15 - Reported Deaths: 0
43. Algeria - Confirmed Cases: 1 - Reported Deaths: 0
44. Brazil - Confirmed Cases: 1 - Reported Deaths: 0
45. Greece - Confirmed Cases: 4 - Reported Deaths: 0
46. Romania - Confirmed Cases: 3 - Reported Deaths: 0
47. Georgia - Confirmed Cases: 3 - Reported Deaths: 0
48. Norway - Confirmed Cases: 7 - Reported Deaths: 0
49. North Macedonia - Confirmed Cases: 1 - Reported Deaths: 0
50. Denmark - Confirmed Cases: 3 - Reported Deaths: 0
51. Estonia - Confirmed Cases: 1 - Reported Deaths: 0
52. Netherlands - Confirmed Cases: 4 - Reported Deaths: 0
53. Nigeria - Confirmed Cases: 1 - Reported Deaths: 0
54. Lithuania - Confirmed Cases: 1 - Reported Deaths: 0
55. New Zealand - Confirmed Cases: 1 - Reported Deaths: 0
56. Belarus - Confirmed Cases: 1 - Reported Deaths: 0
57. Azerbaijian - Confirmed Cases: 3 - Reported Deaths: 0
58. Mexico - Confirmed Cases: 3 - Reported Deaths: 0
59. Iceland - Confirmed Cases: 1 - Reported Deaths: 0
60. Qatar - Confirmed Cases: 1 - Reported Deaths: 0
61. San Marino - Confirmed Cases: 1 - Reported Deaths: 0
62. Monaco - Confirmed Cases: 1 - Reported Deaths: 0
63. Luxembourg - Confirmed Cases: 1 - Reported Deaths: 0
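The first line of the output is in UTC (the trailing `Z` in the ISO 8601 timestamp). If you want it in a local time zone instead, PHP's `DateTime` can convert it. This is a minimal sketch; `Asia/Singapore` is only an example choice and not something the feed specifies.

```php
<?php
// Timestamp copied from the sample output above.
$updatedDatetimeUtc = '2020-03-01T06:45:15.274Z';

// The trailing "Z" makes DateTime parse the value as UTC.
$updated = new DateTime($updatedDatetimeUtc);

// Example target zone (an assumption, pick your own); Singapore is UTC+8.
$updated->setTimezone(new DateTimeZone('Asia/Singapore'));

echo $updated->format('Y-m-d H:i:s') . "\n";  // 2020-03-01 14:45:15
```

Using an explicit `DateTimeZone` avoids depending on the server's default time zone, which is what a plain `date()` call would use.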

As a TODO, you could still sort the results by the Country, Confirmed Cases, or Reported Deaths column.
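A minimal sketch of that TODO, assuming the rows are first collected into an array instead of being echoed directly. The `$rows` shape and the `toNumber()` helper are hypothetical names for illustration; the values are copied from the sample output. Because the sheet mixes formats ("2,870" vs "79824"), thousands separators are stripped before comparing, otherwise the values would be compared as strings.

```php
<?php
// Hypothetical rows in the shape the scraper could collect; values from the sample output.
$rows = [
    ['country' => 'Singapore', 'confirmed' => '102',   'deaths' => '0'],
    ['country' => 'China',     'confirmed' => '79824', 'deaths' => '2,870'],
    ['country' => 'Italy',     'confirmed' => '1128',  'deaths' => '29'],
];

// Hypothetical helper: turn "2,870" or "79824" into an integer.
function toNumber(string $value): int
{
    return (int) str_replace(',', '', trim($value));
}

// Sort by confirmed cases, highest first; break ties by country name.
usort($rows, function (array $a, array $b): int {
    $byCases = toNumber($b['confirmed']) <=> toNumber($a['confirmed']);
    return $byCases !== 0 ? $byCases : strcmp($a['country'], $b['country']);
});

// usort() re-indexes the array, so $i runs 0, 1, 2, ...
foreach ($rows as $i => $row) {
    echo ($i + 1) . '. ' . $row['country']
        . ' - Confirmed Cases: ' . toNumber($row['confirmed'])
        . ' - Reported Deaths: ' . toNumber($row['deaths']) . "\n";
}
```

This lists China, then Italy, then Singapore. Sorting by reported deaths only changes the key passed to `toNumber()`, and sorting by country can use `strcmp()` on its own.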

Finally, I wish strength to all victims of this disease, and I urge everyone who is still healthy to stay alert.

The source code is also available on GitHub: https://github.com/sonyarianto/covid-19-outbreak-based-on-channelnewsasia-web

Discussion


What about using the public ArcGIS database for that?

The database is hosted on GitHub, and the website is here.

 

Hi Diane,
Thanks for the link, I will check it out :)

 

Hi, I would like to use this reference in my web project. Can I do that? I don't know whether it is open source. I'm a high school student looking forward to university, and I would really like to contribute to sharing COVID-19 data!