Benoît Arnoult


Using Puppeteer to retrieve Google business reviews

The code in this article is available in this GitHub repository:
https://github.com/PiiXelx64/node-google-reviews-web-scraper, and there is an npm package too: https://www.npmjs.com/package/google-reviews-web-scraper

If there is one feature missing from the Google Maps API that would be really useful, it is the ability to retrieve the reviews of a place. However, we can always scrape the data from the website, and that's exactly what I did to solve this problem.


Our stack

To create our review fetcher, we are going to use NodeJS and plain old JS. We could've used TypeScript, but it wouldn't bring much benefit here. Using NodeJS gives us access to Puppeteer, a library that drives a headless Chrome browser from code.

As Google relies heavily on JavaScript, we can't use something like axios to fetch data from the page, as it wouldn't run the JavaScript code necessary to show what interests us here: the reviews.

Setting up the project

We are going to create an npm project and a git repo. For this, you'll need git and npm installed on your system, and to run these simple commands, the last of which installs Puppeteer:

npm init
git init
npm install puppeteer

After that, we're ready to start working.

Setting up puppeteer

To get Puppeteer up and running, we first need to import it:

const puppeteer = require('puppeteer');

Once it is imported, we need to create an async function. Here I'll call it getReviews and give it a url parameter.

const getReviews = async (url) => { /* code */ }

Then in this method, we need to create a browser, create a page, go to the page that interests us, and wait for the components that we want to manipulate to be loaded.

Finding the components we need to wait for

To find the class or the id of the components you want to find, you can use the inspector of your browser.
(Screenshot: inspecting the page to find the content we need to wait for.)
Here, we can see that the class .section-review-text contains the review text, so we just need to wait for it.

So now, our getReviews method contains this :

const getReviews = async (url) => {
    // no sandbox for the headless browser
    const browser = await puppeteer.launch({args: ['--disable-setuid-sandbox', '--no-sandbox']});
    const page = await browser.newPage();
    await page.goto(url);
    console.log('waiting for selector');
    await page.waitForSelector('.section-review-text');
}
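waitForSelector rejects if the element never shows up, for instance when a place has no reviews or when the class name found with the inspector has changed. One possible way to handle this is a small wrapper, sketched here under the assumption that a missing selector should simply mean "no reviews". The name waitForReviews and its 10-second default are hypothetical choices, not part of the original code.

```javascript
// Hypothetical helper, not part of the original scraper.
// `page` is expected to be a Puppeteer Page (anything exposing waitForSelector works).
async function waitForReviews(page, timeoutMs = 10000) {
    try {
        // `timeout` is given in milliseconds
        await page.waitForSelector('.section-review-text', { timeout: timeoutMs });
        return true;
    } catch (err) {
        // Any rejection (typically a timeout) is treated as "no reviews on this page".
        console.warn(`Reviews not found: ${err.message}`);
        return false;
    }
}
```

Inside getReviews, the result could then be used to return early with empty arrays instead of letting the timeout error propagate.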

Now that we have loaded the page, we need to somehow get the data. For this, we can use the page.evaluate() method, which runs a function inside the page. Here, we want to get the review authors, the publish dates, the ratings, and the text of the review.

const data = await page.evaluate(() => {
    let reviewAuthorNamesClasses = document.getElementsByClassName('section-review-title');
    let reviewAuthorNames = [];
    for (let elements of reviewAuthorNamesClasses) {
        reviewAuthorNames.push(elements.innerText);
    }
    let datesClasses = document.getElementsByClassName('section-review-publish-date');
    let dates = [];
    for(let elements of datesClasses) {
        dates.push(elements.innerText);
    }

    let ratingsClasses = document.getElementsByClassName('section-review-stars');
    let ratings = [];
    for (let elements of ratingsClasses) {
        ratings.push(elements.children.length);
    }

    let reviewsContentClasses = document.getElementsByClassName('section-review-text');
    let reviewsContent = []
    for(let elements of reviewsContentClasses) {
        reviewsContent.push(elements.innerText);
    }
    return {
        reviewAuthorNames,
        dates,
        ratings,
        reviewsContent
    }
})
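The loops that read innerText all follow the same pattern, so they could be folded into a small helper. The following is a sketch rather than the article's code: textsByClass is a hypothetical name, and it takes the document as a parameter only so that it can be tried outside a browser.

```javascript
// Hypothetical helper: collect the innerText of every element with a given class.
// `doc` is a DOM document (or any object exposing getElementsByClassName).
function textsByClass(doc, className) {
    // Array.from accepts the array-like HTMLCollection plus a mapping function
    return Array.from(doc.getElementsByClassName(className), (el) => el.innerText);
}

// Possible use inside the browser context:
// const dates = textsByClass(document, 'section-review-publish-date');
```

Inside page.evaluate() the helper would have to be declared within the callback, since Puppeteer serializes that function and runs it in the page, where variables from the Node script are not available. The ratings would still need their own loop, as they count child elements instead of reading text.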

Now, our data constant will contain 4 arrays, each array containing one of the data points that compose a review.
Once we're done with the headless browser, we'll need to close it. For this, we can use browser.close().
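Four parallel arrays can be awkward to consume, so one option is to zip them into one object per review. This is a minimal sketch: toReviewList and the output field names are hypothetical, and it assumes the four arrays line up index by index, which may not hold if, for example, a review has no text.

```javascript
// Hypothetical helper: turn the four parallel arrays into a list of review objects.
function toReviewList({ reviewAuthorNames, dates, ratings, reviewsContent }) {
    return reviewAuthorNames.map((author, i) => ({
        author: author.trim(), // scraped names can carry surrounding whitespace
        date: dates[i],
        rating: ratings[i],
        text: reviewsContent[i],
    }));
}

const sample = {
    reviewAuthorNames: [' Sylvain Justine '],
    dates: ['il y a 2 jours'],
    ratings: [5],
    reviewsContent: ["Lieu sécurisé, pas de file d'attente. C top"],
};
console.log(toReviewList(sample));
```

Pairing by index keeps the scraping code unchanged; a sturdier variant would read all four fields from each review's container element instead.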

Now that we have the data we need, we can return it. Because getReviews is an async function, the value it returns is automatically wrapped in a promise, and any error thrown while scraping rejects that promise. Our getReviews method now looks like this:

const puppeteer = require('puppeteer');

const getReviews = async (url) => {
    // no sandbox for the headless browser
    const browser = await puppeteer.launch({args: ['--disable-setuid-sandbox', '--no-sandbox']});
    try {
        const page = await browser.newPage();
        await page.goto(url);
        console.log(page.url());
        await page.waitForSelector('.section-review-text');
        // runs inside the page and collects each data point into its own array
        const data = await page.evaluate(() => {
            let reviewAuthorNamesClasses = document.getElementsByClassName('section-review-title');
            let reviewAuthorNames = [];
            for (let elements of reviewAuthorNamesClasses) {
                reviewAuthorNames.push(elements.innerText);
            }
            let datesClasses = document.getElementsByClassName('section-review-publish-date');
            let dates = [];
            for (let elements of datesClasses) {
                dates.push(elements.innerText);
            }

            // the number of child elements is used as the star rating
            let ratingsClasses = document.getElementsByClassName('section-review-stars');
            let ratings = [];
            for (let elements of ratingsClasses) {
                ratings.push(elements.children.length);
            }

            let reviewsContentClasses = document.getElementsByClassName('section-review-text');
            let reviewsContent = [];
            for (let elements of reviewsContentClasses) {
                reviewsContent.push(elements.innerText);
            }
            return {
                reviewAuthorNames,
                dates,
                ratings,
                reviewsContent
            };
        });
        return data;
    } finally {
        // always close the browser, even if scraping failed
        await browser.close();
    }
};

We can now export our method as a module:

module.exports = getReviews;

Testing our method

Now that our method is done, we can test it by:

  1. importing the module
  2. using our module to get the reviews for a place. For the place, I'm going to use the Eiffel Tower. Its place URL is the following: https://www.google.com/maps/place/Tour+Eiffel/@48.8583736,2.292298,17z/data=!4m5!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d48.8583701!4d2.2944813.

I'm just going to log the data as JSON in my console for this example, but I could also use an express server and serve it over the internet.

const getReviews = require('./getReviews');

async function main() {
    try {
        const data = await getReviews("https://www.google.com/maps/place/Tour+Eiffel/@48.8583736,2.292298,17z/data=!4m5!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d48.8583701!4d2.2944813");
        console.log(JSON.stringify(data));
    } catch(e) {
        console.log(e);
    }

}

main();

And my terminal output is the following:

{ reviewAuthorNames:
   [ ' Romain VILCOQ ', ' Sylvain Justine ', ' Alexandre MASSON ' ],
  dates: [ 'il y a 3 semaines', 'il y a 2 jours', 'il y a 5 jours' ],
  ratings: [ 5, 5, 5 ],
  reviewsContent:
   [ 'La dame de fer est l\'emblème de notre capitale, le monument à visiter en priorité. \nLa vue depuis le sommet est incontournable !\nL\'ascension par les escaliers est une belle expérience et permet de profiter au mieux de la structure, cependant elle est réservée aux plus sportifs. La descente est possible également 😉',
     'Lieu sécurisé, pas de file d\'attente. C top',
     'Magnifique et incontournable monument de la capitale française. A absolument faire lors de votre visite parisienne ! Haute de 321 mètres, cette tour de fer surplombe la région parisienne. Véritable prouesse architecturale et scientifique, …' ] }
{"reviewAuthorNames":[" Romain VILCOQ "," Sylvain Justine "," Alexandre MASSON "],"dates":["il y a 3 semaines","il y a 2 jours","il y a 5 jours"],"ratings":[5,5,5],"reviewsContent":["La dame de fer est l'emblème de notre capitale, le monument à visiter en priorité. \nLa vue depuis le sommet est incontournable !\nL'ascension par les escaliers est une belle expérience et permet de profiter au mieux de la structure, cependant elle est réservée aux plus sportifs. La descente est possible également 😉","Lieu sécurisé, pas de file d'attente. C top","Magnifique et incontournable monument de la capitale française. A absolument faire lors de votre visite parisienne ! Haute de 321 mètres, cette tour de fer surplombe la région parisienne. Véritable prouesse architecturale et scientifique, …"]}

And there we go!

What we learnt to do in this project

  • using promises
  • web scraping
  • using a headless browser to get data from a JS-only website

How could this project be improved upon?

  • create an API based on this code
  • use worker threads
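As a starting point for the API idea, here is a rough sketch built on Node's built-in http module rather than a framework. Everything in it is an assumption made for illustration: the names createReviewsServer and placeUrlFromRequest, the url query parameter, and the port in the usage comment. The getReviews function is passed in as an argument so the server does not depend on how scraping is implemented.

```javascript
const http = require('node:http');

// Hypothetical helper: read the `url` query parameter from a request path.
function placeUrlFromRequest(requestUrl) {
    // the base is only needed because req.url is a relative path
    const parsed = new URL(requestUrl, 'http://localhost');
    return parsed.searchParams.get('url');
}

// Hypothetical factory: build a server that answers with the reviews as JSON.
function createReviewsServer(getReviews) {
    return http.createServer(async (req, res) => {
        const placeUrl = placeUrlFromRequest(req.url);
        if (!placeUrl) {
            res.writeHead(400, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify({ error: 'missing url query parameter' }));
            return;
        }
        try {
            const data = await getReviews(placeUrl);
            res.writeHead(200, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify(data));
        } catch (e) {
            res.writeHead(500, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify({ error: 'error while scraping data.' }));
        }
    });
}

// Possible usage (not executed here):
// createReviewsServer(require('./getReviews')).listen(3000);
```

Each request launches a full browser in this sketch, so a real service would likely want caching or a limit on concurrent scrapes.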
