Serpdog

Posted on Oct 22, 2022 • Edited on Oct 23, 2022 • Originally published at serpdog.io

How to scrape Google Maps Places?

#beginners #tutorial #javascript #programming

Introduction

In this tutorial, we will learn how to scrape Google Maps Places results. At the end, we will see how Serpdog's Google Maps Reviews API can help you scrape Google Maps reviews without the extra effort that scraping Google usually requires.

Requirements

Web Parsing with CSS selectors

Searching for the right tags in HTML files is not only difficult but also time-consuming. It is better to use the CSS Selector Gadget to pick the right selectors and make your web scraping journey easier.

This gadget can help you come up with the right CSS selector for your needs. Here is the link to the tutorial, which will teach you how to use this gadget to select the best CSS selectors.

User Agents

The User-Agent header identifies the application, operating system, vendor, and version of the requesting client. Sending a realistic User-Agent makes our requests to Google look like visits from a real user.

You can also rotate User Agents. Read more about this in the article How to fake and rotate User Agents using Python 3.
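The article above uses Python. As a rough JavaScript counterpart, here is a minimal sketch of rotation: keep a small list of User-Agent strings and pick one at random for each browser session. The strings are illustrative examples, not a maintained list, and pickUserAgent() is a hypothetical helper, not part of Puppeteer.

```javascript
// Example desktop User-Agent strings (illustrative only; refresh them periodically)
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
];

// Pick one User-Agent at random; `random` is injectable so the choice can be tested
function pickUserAgent(agents = USER_AGENTS, random = Math.random) {
  if (agents.length === 0) {
    throw new Error("The User-Agent list is empty");
  }
  // Math.random() is in [0, 1), so the index is always in range
  const index = Math.floor(random() * agents.length);
  return agents[index];
}
```

With Puppeteer, the chosen string could then be applied with page.setUserAgent() before calling page.goto().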

If you want to further safeguard your IP from being blocked by Google, you can try these 10 Tips to avoid getting Blocked while Scraping Google.

Install Libraries

Before we begin, install the following library so we can move forward and prepare our scraper.

Puppeteer JS

Or you can type the following command in your project terminal to install it:

    npm i puppeteer

Target:

Process:

Now that we have installed the library required for this project, we can prepare our scraper. Open the URL below in your browser, and you will see the results shown in the image above.

https://www.google.com/maps/place/Blacklist+Coffee+Roasters/@-31.9473,115.8073705,14z/data=!4m13!1m7!3m6!1s0x0:0xf79bec80595c6aa8!2sBlacklist+Coffee+Roasters!8m2!3d-31.9472988!4d115.8248801!10e1!3m4!1s0x0:0xf79bec80595c6aa8!8m2!3d-31.9472988!4d115.8248801
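The place URL packs several useful values into its path. The hypothetical helper below pulls out the place name, the pin coordinates (the !3d/!4d pair), and the 0x...:0x... identifier that follows !1s. This layout is undocumented and inferred only from the URL above, so treat the regular expressions as assumptions that may break. The identifier has the same shape as the data_id parameter used in the Serpdog request later in this article, but that correspondence is also our assumption.

```javascript
// Hypothetical helper: extract the place name, pin coordinates, and the
// "0x...:0x..." identifier from a Google Maps place URL.
// The URL layout is undocumented and may change at any time.
function parsePlaceUrl(url) {
  const nameMatch = url.match(/\/maps\/place\/([^/]+)/);
  const coordMatch = url.match(/!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)/);
  const idMatch = url.match(/!1s(0x[0-9a-f]+:0x[0-9a-f]+)/i);

  return {
    // Google Maps writes spaces in the place name as "+"
    name: nameMatch ? decodeURIComponent(nameMatch[1].replace(/\+/g, " ")) : null,
    latitude: coordMatch ? Number(coordMatch[1]) : null,
    longitude: coordMatch ? Number(coordMatch[2]) : null,
    dataId: idMatch ? idMatch[1] : null,
  };
}
```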

First, we will create the driver function, which will launch the browser and navigate to the target URL.

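    const puppeteer = require("puppeteer");

    const getMapsPlacesData = async () => {
        const url = "https://www.google.com/maps/place/Blacklist+Coffee+Roasters/@-31.9473,115.8073705,14z/data=!4m13!1m7!3m6!1s0x0:0xf79bec80595c6aa8!2sBlacklist+Coffee+Roasters!8m2!3d-31.9472988!4d115.8248801!10e1!3m4!1s0x0:0xf79bec80595c6aa8!8m2!3d-31.9472988!4d115.8248801";
        let browser;
        try {
            // Launch a visible (non-headless) Chromium window
            browser = await puppeteer.launch({
                headless: false,
                args: ["--disable-setuid-sandbox", "--no-sandbox"],
            });
            // Reuse the tab that opens together with the browser
            const [page] = await browser.pages();

            await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
            // Give the dynamic content 3 seconds to render
            await new Promise((resolve) => setTimeout(resolve, 3000));

            const data = await extractData(page);
            console.log(data);
        } catch (e) {
            console.log(e);
        } finally {
            // Close the browser even if something above failed
            if (browser) await browser.close();
        }
    };

Step-by-step explanation:

puppeteer.launch() - This launches the Chromium browser with the options we have set in our code. In our case, we are launching the browser in non-headless mode.
browser.pages() - This returns the tabs that are already open in the browser. We reuse the first one instead of opening a new tab with browser.newPage().
page.setExtraHTTPHeaders() - It is used to pass HTTP headers with every request the page initiates. We do not call it here, but it is useful when you need to send custom headers.
page.goto() - This navigates the page to the specified target URL.
setTimeout() wrapped in a Promise - This makes the script wait 3 seconds for the page content to render before we extract anything. Older Puppeteer versions offered page.waitForTimeout() for this, but recent versions have removed it.
extractData() - Finally, we call our parser function to extract the data we need from the page.
browser.close() - This closes the browser. Because it sits in the finally block, it runs even if an error occurs.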
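The fixed 3-second wait above can be swapped for a randomized pause, which some scrapers use so that their timing looks less mechanical. This is a minimal sketch of our own, not part of the original scraper; sleep(), randomDelayMs(), and humanPause() are hypothetical helpers, and the 2 to 4 second range is an arbitrary assumption.

```javascript
// Resolve after `ms` milliseconds (works on any Puppeteer version,
// unlike page.waitForTimeout(), which recent releases no longer provide)
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pick a whole number of milliseconds in [minMs, maxMs]; `random` is injectable for testing
function randomDelayMs(minMs, maxMs, random = Math.random) {
  if (minMs > maxMs) {
    throw new RangeError("minMs must not exceed maxMs");
  }
  return minMs + Math.floor(random() * (maxMs - minMs + 1));
}

// Wait a random 2-4 seconds by default and report how long we waited
async function humanPause(minMs = 2000, maxMs = 4000) {
  const ms = randomDelayMs(minMs, maxMs);
  await sleep(ms);
  return ms;
}
```

Inside getMapsPlacesData(), a call such as await humanPause(); would take the place of the fixed 3-second wait.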

Now, let us prepare our parser and extract the required data.

    const extractData = async (page) => {
        // The callback passed to page.evaluate() runs inside the browser page, not in Node.js
        const items = await page.evaluate(() => {
            // The popular-times graph has one block per day, starting with Sunday
            const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

            return {
                title: document.querySelector(".fontHeadlineLarge")?.textContent,
                rating: document.querySelector(".F7nice")?.textContent,
                reviews: document.querySelector(".mmu3tf .DkEaL")?.textContent,
                type: document.querySelector(".u6ijk")?.textContent,
                service_options: document.querySelector(".E0DTEd")?.textContent.replaceAll("·", ""),
                address: document.querySelector("button[data-tooltip='Copy address']")?.textContent.trim(),
                website: document.querySelector("a[data-tooltip='Open website']")?.textContent.trim(),
                pluscode: document.querySelector("button[data-tooltip='Copy plus code']")?.textContent.trim(),
                // One { day: hours } object per row of the opening-hours table
                timings: Array.from(document.querySelectorAll(".OqCZI tr")).map((el) => ({
                    [el.querySelector("td:first-child")?.textContent.trim()]: el.querySelector("td:nth-child(2) li.G8aQO")?.textContent,
                })),
                popularTimes: {
                    graphResults: Array.from(document.querySelectorAll(".C7xf8b > div")).map((dayEl, i) => ({
                        [days[i]]: Array.from(dayEl.querySelectorAll(".dpoVLd")).map((bar) => {
                            // Assumes labels of the form "<N>% busy at <time>"
                            const label = bar.getAttribute("aria-label") || "";
                            return {
                                time: label.split(" at ")[1]?.trim(),
                                busy_percentage: label.split("busy")[0].trim(),
                            };
                        }),
                    })),
                },
                photos: Array.from(document.querySelectorAll(".dryRY .ofKBgf")).map((el) => ({
                    title: el.getAttribute("aria-label"),
                    thumbnail: el.querySelector("img")?.getAttribute("src"),
                })),
                question_and_answers: {
                    question: document.querySelector(".Py6Qke")?.textContent,
                    answer: document.querySelector(".l79Qmc")?.textContent,
                },
                // An aria-label such as "5 stars, 103 reviews" becomes { "5 stars": "103 reviews" }
                user_ratings: Array.from(document.querySelectorAll(".ExlQHd tr")).map((el) => {
                    const [stars, count] = (el.getAttribute("aria-label") || "").split(",");
                    return { [stars.trim()]: count?.trim() };
                }),
                user_reviews: Array.from(document.querySelectorAll(".tBizfc")).map((el) => ({
                    description: el.textContent.replaceAll('"', "").trim(),
                    user_link: el.querySelector("a")?.getAttribute("href"),
                })),
                mentions: Array.from(document.querySelectorAll(".KNfEk+ div .L6Bbsd")).map((el) => ({
                    query: el.querySelector(".uEubGf").textContent,
                    mentioned: el.querySelector(".fontBodySmall").textContent + "times",
                })),
                most_relevant: Array.from(document.querySelectorAll(".jJc9Ad")).map((el) => {
                    const imageButtons = Array.from(el.querySelectorAll(".KtCyie button"));
                    return {
                        user: {
                            name: el.querySelector(".d4r55")?.textContent,
                            thumbnail: el.querySelector(".NBa7we")?.getAttribute("src"),
                            local_guide: Boolean(el.querySelector(".RfnDt span:nth-child(1)")?.textContent.length),
                            reviews: el.querySelector(".RfnDt span:nth-child(2)")?.textContent.replace(".", "").trim(),
                            link: el.querySelector(".WEBjve")?.getAttribute("href"),
                        },
                        rating: el.querySelector(".kvMYJc")?.getAttribute("aria-label"),
                        date: el.querySelector(".rsqaWe")?.textContent,
                        review: el.querySelector(".MyEned .wiI7pd")?.textContent,
                        // Review photos are CSS background images: url("...")
                        images: imageButtons.length
                            ? imageButtons.map((btn) => ({
                                thumbnail: getComputedStyle(btn).backgroundImage.split('")')[0].replace('url("', ""),
                            }))
                            : "",
                    };
                }),
            };
        });
        return items;
    };

Step-by-step explanation:

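page.evaluate() - It runs the given function inside the browser page and returns its result to Node.js.
document.querySelector() / document.querySelectorAll() - querySelector() returns the first element that matches the specified CSS selector, and querySelectorAll() returns all of them. For example, .OqCZI tr selects every row of the opening-hours table.
getAttribute() - This returns the value of the specified attribute of an element.
textContent - It returns the text content inside the selected HTML element.
split() - Used to split a string into substrings with the help of a specified separator and return them as an array.
trim() - Removes whitespace from the start and end of the string.
replaceAll() - Replaces every occurrence of the specified pattern in the string.
?. (optional chaining) - Returns undefined instead of throwing an error when an element is missing, so one absent section does not break the whole scraper.
map() - It calls a callback function on each element of an array and returns an array that contains the results.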
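To check the popular-times string handling outside the browser, here is a sketch of the same idea as pure functions that run in plain Node.js. It assumes each bar's aria-label has the form "<N>% busy at <time>." (inferred from how the parser splits it), names the n-th day block starting from Sunday as the parser does, and returns the percentage as a number rather than the "30%" string. parseBusyLabel() and groupByDay() are hypothetical names.

```javascript
const DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Parse an aria-label assumed to look like "30% busy at 6 am."
// Returns null when the label does not follow that pattern
// (for example a live "Currently ... busy" label).
function parseBusyLabel(label) {
  const match = /^\s*(\d+)% busy at (.+?)\.?\s*$/.exec(label ?? "");
  if (!match) {
    return null;
  }
  return { time: match[2], busy_percentage: Number(match[1]) };
}

// labelsPerDay[n] holds the labels of the n-th day block; unparseable labels are dropped
function groupByDay(labelsPerDay) {
  return labelsPerDay.map((labels, i) => ({
    [DAYS[i]]: labels.map((label) => parseBusyLabel(label)).filter(Boolean),
  }));
}
```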

Here is the full code:

    const puppeteer = require("puppeteer");

    const extractData = async (page) => {
        // The callback passed to page.evaluate() runs inside the browser page, not in Node.js
        const items = await page.evaluate(() => {
            // The popular-times graph has one block per day, starting with Sunday
            const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

            return {
                title: document.querySelector(".fontHeadlineLarge")?.textContent,
                rating: document.querySelector(".F7nice")?.textContent,
                reviews: document.querySelector(".mmu3tf .DkEaL")?.textContent,
                type: document.querySelector(".u6ijk")?.textContent,
                service_options: document.querySelector(".E0DTEd")?.textContent.replaceAll("·", ""),
                address: document.querySelector("button[data-tooltip='Copy address']")?.textContent.trim(),
                website: document.querySelector("a[data-tooltip='Open website']")?.textContent.trim(),
                pluscode: document.querySelector("button[data-tooltip='Copy plus code']")?.textContent.trim(),
                // One { day: hours } object per row of the opening-hours table
                timings: Array.from(document.querySelectorAll(".OqCZI tr")).map((el) => ({
                    [el.querySelector("td:first-child")?.textContent.trim()]: el.querySelector("td:nth-child(2) li.G8aQO")?.textContent,
                })),
                popularTimes: {
                    graphResults: Array.from(document.querySelectorAll(".C7xf8b > div")).map((dayEl, i) => ({
                        [days[i]]: Array.from(dayEl.querySelectorAll(".dpoVLd")).map((bar) => {
                            // Assumes labels of the form "<N>% busy at <time>"
                            const label = bar.getAttribute("aria-label") || "";
                            return {
                                time: label.split(" at ")[1]?.trim(),
                                busy_percentage: label.split("busy")[0].trim(),
                            };
                        }),
                    })),
                },
                photos: Array.from(document.querySelectorAll(".dryRY .ofKBgf")).map((el) => ({
                    title: el.getAttribute("aria-label"),
                    thumbnail: el.querySelector("img")?.getAttribute("src"),
                })),
                question_and_answers: {
                    question: document.querySelector(".Py6Qke")?.textContent,
                    answer: document.querySelector(".l79Qmc")?.textContent,
                },
                // An aria-label such as "5 stars, 103 reviews" becomes { "5 stars": "103 reviews" }
                user_ratings: Array.from(document.querySelectorAll(".ExlQHd tr")).map((el) => {
                    const [stars, count] = (el.getAttribute("aria-label") || "").split(",");
                    return { [stars.trim()]: count?.trim() };
                }),
                user_reviews: Array.from(document.querySelectorAll(".tBizfc")).map((el) => ({
                    description: el.textContent.replaceAll('"', "").trim(),
                    user_link: el.querySelector("a")?.getAttribute("href"),
                })),
                mentions: Array.from(document.querySelectorAll(".KNfEk+ div .L6Bbsd")).map((el) => ({
                    query: el.querySelector(".uEubGf").textContent,
                    mentioned: el.querySelector(".fontBodySmall").textContent + "times",
                })),
                most_relevant: Array.from(document.querySelectorAll(".jJc9Ad")).map((el) => {
                    const imageButtons = Array.from(el.querySelectorAll(".KtCyie button"));
                    return {
                        user: {
                            name: el.querySelector(".d4r55")?.textContent,
                            thumbnail: el.querySelector(".NBa7we")?.getAttribute("src"),
                            local_guide: Boolean(el.querySelector(".RfnDt span:nth-child(1)")?.textContent.length),
                            reviews: el.querySelector(".RfnDt span:nth-child(2)")?.textContent.replace(".", "").trim(),
                            link: el.querySelector(".WEBjve")?.getAttribute("href"),
                        },
                        rating: el.querySelector(".kvMYJc")?.getAttribute("aria-label"),
                        date: el.querySelector(".rsqaWe")?.textContent,
                        review: el.querySelector(".MyEned .wiI7pd")?.textContent,
                        // Review photos are CSS background images: url("...")
                        images: imageButtons.length
                            ? imageButtons.map((btn) => ({
                                thumbnail: getComputedStyle(btn).backgroundImage.split('")')[0].replace('url("', ""),
                            }))
                            : "",
                    };
                }),
            };
        });
        return items;
    };

    const getMapsPlacesData = async () => {
        const url = "https://www.google.com/maps/place/Blacklist+Coffee+Roasters/@-31.9473,115.8073705,14z/data=!4m13!1m7!3m6!1s0x0:0xf79bec80595c6aa8!2sBlacklist+Coffee+Roasters!8m2!3d-31.9472988!4d115.8248801!10e1!3m4!1s0x0:0xf79bec80595c6aa8!8m2!3d-31.9472988!4d115.8248801";
        let browser;
        try {
            // Launch a visible (non-headless) Chromium window
            browser = await puppeteer.launch({
                headless: false,
                args: ["--disable-setuid-sandbox", "--no-sandbox"],
            });
            // Reuse the tab that opens together with the browser
            const [page] = await browser.pages();

            await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
            // Give the dynamic content 3 seconds to render
            // (page.waitForTimeout() is no longer available in recent Puppeteer versions)
            await new Promise((resolve) => setTimeout(resolve, 3000));

            const data = await extractData(page);
            console.log(data);
        } catch (e) {
            console.log(e);
        } finally {
            // Close the browser even if something above failed
            if (browser) await browser.close();
        }
    };

    getMapsPlacesData();

Results:

Our result should look like this 👇🏻:

   {
    title: ' Blacklist Coffee Roasters  ',
    rating: '4.8116 reviews',
    reviews: '116 reviews',
    type: 'Coffee shop',
    service_options: '    Dine-in    Takeaway    Delivery  ',
    address: '439D Hay St, Subiaco WA 6008, Australia',
    website: 'blacklistcoffee.com.au',
    pluscode: '3R3F+3X Subiaco, Western Australia, Australia',
    timings: [
        { Saturday: '7am-2pm' },
        { Sunday: '8am-2pm' },
        { Monday: '7am-2pm' },
        { Tuesday: '7am-2pm' },
        { Wednesday: '7am-2pm' },
        { Thursday: '7am-2pm' },
        { Friday: '7am-2pm' }
    ],
    popularTimes: {
        graphResults: [
        [Object], [Object],
        [Object], [Object],
        [Object], [Object],
        [Object]
        ]
    },
    photos: [
        {
        title: 'All',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipOQrrdy6N2Z7Xp9zbS-BE0LqVqJPXyHAYPW76zD=w224-h298-k-no'
        },
        {
        title: 'Latest · 10 days ago',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipPAGd8tNSNaBLdx7XGTtL4o48xOK4kLgMjFGHh-=w448-h298-k-no'
        },
        {
        title: 'Food & drink',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipOIfWpjgc7syqDrvU72Cg_ey4JhsDWU-v1kcmpS=w447-h298-k-no'
        },
        {
        title: 'Vibe',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipMDFZ_xthQMPS9nQcrbCLGYcawrzmnYQE9dDDjN=w224-h298-k-no'
        },
        {
        title: 'Latte',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipPlwUXR7bfyPYQz1CjtoUJljds1na3T-POExbZK=w397-h298-k-no'
        },
        {
        title: 'Coffee',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipOOfsk6V1Dc7Ew8NbGHQpUYU2XN8Ua_58nJHuPN=w224-h298-k-no'
        },
        {
        title: 'By owner',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipMqKXEiGXy-YjmB_mTurbqgi31mdn8EWRRsYwAI=w446-h298-k-no'
        },
        {
        title: 'Street View & 360°',
        thumbnail: 'https://streetviewpixels-pa.googleapis.com/v1/thumbnail?panoid=OkXbXBk_L_BvCTTYcRC2Cw&cb_client=maps_sv.tactile.gps&w=224&h=298&yaw=156.95456&pitch=0&thumbfov=100'
        },
        {
        title: 'Videos',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipP4Jq3MimzxDXc9oh_hGQAQkZDpxDOh5m9FRpEd=w224-h298-k-no'
        }
    ],
    question_and_answers: {
        question: 'Will they grind purchased beans in store?',
        answer: 'Hi Alex, we can grind any beans you purchase from us :)'
    },
    user_ratings: [
        { '5 stars': '103 reviews' },
        { '4 stars': '8 reviews' },
        { '3 stars': '2 reviews' },
        { '2 stars': '0 reviews' },
        { '1 stars': '3 reviews' }
    ],
    user_reviews: [
        {
        description: "They also sell coffee equipment at standard, non-inflated prices (I've checked).",
        user_link: 'https://www.google.com/maps/contrib/101811065862344097095?hl=en-IN'
        },
        {
        description: 'Very serious about their coffee 👌 Deserves more attention this place!',
        user_link: 'https://www.google.com/maps/contrib/109201574522158862622?hl=en-IN'
        },
        {
        description: 'Ordered a distilled coffee and a mocha plus a cookie.',
        user_link: 'https://www.google.com/maps/contrib/111095785064544767742?hl=en-IN'
        }
    ],
    mentions: [
        { query: 'beans', mentioned: '10times' },
        { query: 'coffee tasting', mentioned: '9times' },
        { query: 'barista', mentioned: '7times' },
        { query: 'milk', mentioned: '3times' }
    ],
    most_relevant: [
        {
        user: [Object],
        rating: ' 5 stars ',
        date: '2 weeks ago',
        review: "A regular coffee stop. If you're into brews and love a tasting, here's one to go to. Definitely recommend this place. Love the vibe of the cafe; the interior. Ideal to chill here in the morning. Parking is easy to find which is great especially in Subiaco.",
        images: [Array]
        },
        {
        user: [Object],
        rating: ' 5 stars ',
        date: 'a year ago',
        review: 'Excellent coffee, lactose-free milk available. We came for the $14 coffee tasting and it was really good! You can taste coffees in filter style, espresso, or with milk (latte etc). The staff are lovely, Bree was so nice! Apparently they rotate the coffees for tasting every 2-3 weeks, so will definitely be back for another tasting.',
        images: [Array]
        },
        {
        user: [Object],
        rating: ' 5 stars ',
        date: '5 months ago',
        review: 'Come here not only for the great black coffee but ALL the staff here are super welcoming and lovely to speak to. ...',
        images: [Array]
        }
    ]
   }
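The output above keeps every value as a string, and rating even has the review count glued on ('4.8116 reviews' is a 4.8 rating followed by '116 reviews'). If you need numbers, a few hypothetical post-processing helpers could clean the result in Node.js after extractData() returns. They rely only on the sample values shown above, so other locales or layouts may need different rules.

```javascript
// "116 reviews" -> 116, "83,109 reviews" -> 83109; null if no digits are present
function parseCount(text) {
  const digits = String(text ?? "").replace(/[^\d]/g, "");
  return digits ? Number(digits) : null;
}

// The rating selector also captures the review count ("4.8116 reviews"),
// so strip the separately scraped reviews text from the end before parsing.
function parseRating(ratingText, reviewsText) {
  let text = String(ratingText ?? "").trim();
  const suffix = String(reviewsText ?? "").trim();
  if (suffix && text.endsWith(suffix)) {
    text = text.slice(0, -suffix.length);
  }
  const value = Number.parseFloat(text);
  return Number.isNaN(value) ? null : value;
}

// [{ "5 stars": "103 reviews" }, ...] -> { 5: 103, ... }
function parseUserRatings(userRatings) {
  const result = {};
  for (const entry of userRatings) {
    for (const [stars, reviews] of Object.entries(entry)) {
      result[parseCount(stars)] = parseCount(reviews);
    }
  }
  return result;
}
```

For example, parseRating(data.rating, data.reviews) would turn the sample values above into 4.8.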

Serpdog's Google Maps Reviews API

Scraping can become time-consuming in the long run, because you have to keep maintaining the scraper as the CSS selectors change. To solve this problem, Serpdog's Google Search API also offers a Google Maps Reviews API that returns both the HTML and ready-made structured JSON data.
We are currently working on a Google Maps Places API, which we plan to launch later.

Scraping Google also requires solving CAPTCHAs and maintaining a large pool of User Agents and proxies, but Serpdog handles all of this on your behalf for a smooth scraping experience.

Our users also get 100 free requests when they first sign up.

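    // Requires axios: npm i axios
    const axios = require('axios');

    // Replace APIKEY with your own Serpdog API key.
    // data_id identifies the place whose reviews we want (here, the Statue of Liberty).
    axios.get('https://api.serpdog.io/reviews?api_key=APIKEY&data_id=0x89c25090129c363d:0x40c6a5770d25022b')
        .then(response => {
            // Structured JSON containing location_info and reviews
            console.log(response.data);
        })
        .catch(error => {
            console.log(error);
        });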
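Hard-coding the key in the URL is fine for a quick test. A minimal sketch of an alternative builds the same request URL with URLSearchParams, which percent-encodes the values (the colon in data_id becomes %3A), and reads the key from an environment variable. The SERPDOG_API_KEY variable name and the buildReviewsUrl() helper are our own hypothetical choices, not part of Serpdog's documentation.

```javascript
// Build the reviews endpoint URL used above; URLSearchParams handles the encoding
function buildReviewsUrl(apiKey, dataId) {
  if (!apiKey) {
    throw new Error("Missing API key (for example, set SERPDOG_API_KEY)");
  }
  const params = new URLSearchParams({ api_key: apiKey, data_id: dataId });
  return `https://api.serpdog.io/reviews?${params}`;
}

// Hypothetical usage with the request above:
// axios.get(buildReviewsUrl(process.env.SERPDOG_API_KEY, "0x89c25090129c363d:0x40c6a5770d25022b"))
```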

Results:

    {
    "location_info": {
    "title": "Statue of Liberty",
    "address": "New York, NY",
    "avgRating": "4.7",
    "totalReviews": "83,109 reviews"
    },
    "reviews": [
    {
        "user": {
        "name": "Vo Kien Thanh",
        "link": "https://www.google.com/maps/contrib/106465550436934504158?hl=en-US&sa=X&ved=2ahUKEwj7zY_J4cv4AhUID0QIHZCtC0cQvvQBegQIARAZ",
        "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJxv5_uPnmyIeoARlf7gMWCduHV1cNI20UnwPicE=s40-c-c0x00000000-cc-rp-mo-ba4-br100",
        "localGuide": true,
        "reviews": "111",
        "photos": "329"
        },
        "rating": "Rated 5.0 out of 5,",
        "duration": "5 months ago",
        "snippet": "The icon of the U.S. 🗽🇺🇸. This is a must-see for everyone who visits New York City, you would never want to miss it.There’s only one cruise line that is allowed to enter the Liberty Island and Ellis Island, which is Statue Cruises. You can purchase tickets at the Battery Park but I’d recommend you purchase it in advance. For $23/adult it’s actually very reasonably priced. Make sure you go early because you will have to go through security at the port. Also take a look at the departure schedule available on the website to plan your trip accordingly.As for the Statue of Liberty, it was my first time seeing it in person so what I could say was a wow. It was absolutely amazing to see this monument. I also purchased the pedestal access so it was pretty cool to see the inside of the statue. They’re not doing the Crown Access due to Covid-19 concerns, but I hope it will be resumed soon.There are a gift shop, a cafeteria and a museum on the island. I would say it takes around 2-3 hours to do everything here because you would want to take as many photos as possible.I absolutely loved it here and I can’t wait to come back.The icon of the U.S. 🗽🇺🇸. This is a must-see for everyone who visits New York City, you would never want to miss it. …More",
        "visited": "",
        "likes": "91",
        "images": [
        "https://lh5.googleusercontent.com/p/AF1QipPOBhJtq17DAc9_ZTBnN2X4Nn-EwIEet61Y9JQo=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipPZ2ut1I7LnECqEB2vzrBk-PSXzBxaHEE4S54lk=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipM8nIogBhwcL-dUrd7KaIxZcc_SA6YnJpp50R0C=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipPQ-YP7uw_gHTNb1gGZSGRGRrzLMzOrvh98AmSN=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipOTqBzK30vQZi9lfuhpk5329bnx-twzgIVjwcI1=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipN0TWUE35ajoTdSKelspuUpK-ZTXlRRR9SfPbTa=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipPQH_4HtdXmSdkCiDTv2jO30LksCxpe9KQI4YKw=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipN_OfX2TgXVNry5fli5v-yExbyTAfV4K7SEy3T0=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipNWKl0TeBmnzMaR_W4-7skitDwHjjJxPePbiSyd=w100-h100-p-n-k-no"
        ]
    },
        ........
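The API returns several numbers as strings (for example "likes": "91" and "rating": "Rated 5.0 out of 5,"). A small, hypothetical summarizeReview() helper could normalize one review object. The field names come from the sample response above, while the output shape is our own choice.

```javascript
// Hypothetical helper: turn one review object from the response above into plain values
function summarizeReview(review) {
  // Assumes ratings look like "Rated 5.0 out of 5,"
  const ratingMatch = /Rated (\d+(?:\.\d+)?) out of 5/.exec(review.rating ?? "");
  return {
    name: review.user?.name ?? null,
    rating: ratingMatch ? Number(ratingMatch[1]) : null,
    likes: Number.parseInt(review.likes, 10) || 0,
    localGuide: Boolean(review.user?.localGuide),
    imageCount: Array.isArray(review.images) ? review.images.length : 0,
  };
}
```

Applied to the whole response, response.data.reviews.map(summarizeReview) would give one compact object per review.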

Conclusion:

In this tutorial, we learned how to scrape Google Maps Places results using Node.js and Puppeteer. Feel free to message me if I missed something. Follow me on Twitter. Thanks for reading!
