Serpdog

Posted on Feb 28, 2023 • Edited on Mar 2, 2023 • Originally published at serpdog.io

Scrape Google Maps Reviews With Python

#beginners #programming #tutorial #python

Introduction

Python is a popular high-level, general-purpose programming language. It is widely used to build desktop applications, web applications, artificial intelligence systems, and more. It is also well suited to web scraping.

In this blog, we will scrape Google Maps Reviews using Python and its libraries — Beautiful Soup and Requests.

Why scrape Google Maps Reviews?

Scraping Google Maps Reviews comes with various benefits:

Valuable Insights: Reviews capture your customers' opinions and feedback, which can help you improve your product and grow revenue.
Competitive Intelligence: Reviews from Google Maps can help you identify your competitors' strengths and weaknesses, and you can leverage this data to stay ahead of them.
Data Analysis: The review data can be used for research purposes such as sentiment analysis and the study of consumer behavior.
Reputation Management: Monitoring and analyzing the negative reviews left by your customers helps you identify weaknesses in your product and solve the problems your customers face.

Scraping Google Maps Reviews

In this blog, we will write a Python script to scrape the top 10 Google reviews, including location information, user details, and much more. Finally, I will also show you a method to scrape the reviews beyond the top 10.

The Google Maps Reviews scraping is divided into two parts:

Getting the raw HTML from the target URL.
Extracting the required information from the raw HTML data.

Set-Up

If you have not installed Python on your device yet, you can download it directly from the official Python website.

Requirements

To scrape Google Maps Reviews, we will be using these two Python libraries:

Beautiful Soup — Used for parsing the raw HTML data.
Requests — Used for making HTTP requests.

To install these libraries, you can run the below commands in your terminal:

pip install requests
pip install beautifulsoup4

Process

Once the setup is complete, open the project file in your code editor and import the libraries we installed above.

import requests 
from bs4 import BeautifulSoup

Then, we will create our function to scrape the reviews of Burj Khalifa from Google Maps.

def get_reviews_data():

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"
    }

    response = requests.get("https://www.google.com/async/reviewDialog?hl=en_us&async=feature_id:0x3e5f43348a67e24b:0xff45e502e1ceb7e2,next_page_token:,sort_by:qualityScore,start_index:,associated_topic:,_fmt:pc", headers=headers)

    soup = BeautifulSoup(response.content, 'html.parser')

    user = []
    location_info = {}
    data_id = ''
    token = ''

In the above code, we first set the User-Agent header so that our request looks like it comes from a regular browser. Then, we made an HTTP GET request to our target URL.

Let us decode this URL first:

https://www.google.com/async/reviewDialog?hl=en_us&async=feature_id:0x3e5f43348a67e24b:0xff45e502e1ceb7e2,next_page_token:,sort_by:qualityScore,start_index:,associated_topic:,_fmt:pc

feature_id: Also known as the data ID, it is a unique ID for a particular location on Google Maps.

next_page_token: It is used to get the next page of results.

sort_by: It sets the order in which the reviews are returned (qualityScore in this URL).
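The parameters described above can also be assembled in code instead of being typed by hand. The following is a minimal sketch: the function name build_reviews_url and its default values are illustrative and not part of any official API, and since this endpoint is undocumented, its format may change.

```python
def build_reviews_url(data_id, next_page_token="", sort_by="qualityScore", hl="en_us"):
    """Assemble the reviewDialog URL from its individual parameters.

    Illustrative helper: the parameter names mirror the URL shown in this post.
    """
    async_params = ",".join([
        f"feature_id:{data_id}",
        f"next_page_token:{next_page_token}",
        f"sort_by:{sort_by}",
        "start_index:",        # left empty, as in the original URL
        "associated_topic:",   # left empty, as in the original URL
        "_fmt:pc",
    ])
    return f"https://www.google.com/async/reviewDialog?hl={hl}&async={async_params}"


# Data ID of Burj Khalifa, taken from this tutorial
print(build_reviews_url("0x3e5f43348a67e24b:0xff45e502e1ceb7e2"))
```

Keeping the token as a parameter means the same helper can be reused later for the next page of reviews.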

You can get the data ID of any place by searching it on Google Maps.

Let us search for Burj Khalifa on Google Maps.

If you take a look at the URL in the address bar, you will find the data ID between !4m7!3m6!1s and !8m2!, which in this case is 0x3e5f43348a67e24b:0xff45e502e1ceb7e2.
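If you need to do this for many places, the data ID can be pulled out of the Maps URL with a regular expression. This is a rough sketch: the helper name extract_data_id is hypothetical, the sample URL is only an illustration of the pattern described above, and the sketch assumes the ID always follows a !1s marker in the form 0x...:0x....

```python
import re

# Matches the "0x...:0x..." pair that follows the "!1s" marker in a Maps URL
DATA_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_data_id(maps_url):
    """Return the data ID found in a Google Maps place URL, or None if absent."""
    match = DATA_ID_PATTERN.search(maps_url)
    return match.group(1) if match else None


# Illustrative URL only; a real one copied from the browser may carry extra segments
sample_url = (
    "https://www.google.com/maps/place/Burj+Khalifa/data="
    "!4m7!3m6!1s0x3e5f43348a67e24b:0xff45e502e1ceb7e2!8m2!3d25.197197!4d55.2743764"
)
print(extract_data_id(sample_url))
```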

Now, open the target URL in your browser, and a text file will be downloaded to your computer. Open this file in your code editor and save it with an .html extension so that you can inspect its markup.

We will now search for the tags of the elements we want in our response.

Let us extract the information about the location from the HTML.

In the downloaded HTML, you will find that the title has the class P5Bobd, the address has the class T6pBCe, the average rating is in span.Aq14fc, and the total number of reviews is in span.z5jxId.

    for el in soup.select('.lcorif'):
        location_info = {
            'title': soup.select_one('.P5Bobd').text.strip(),
            'address': soup.select_one('.T6pBCe').text.strip(),
            'avgRating': soup.select_one('span.Aq14fc').text.strip(),
            'totalReviews': soup.select_one('span.z5jxId').text.strip()
        }

Now, we will extract the data ID and the next page token.

Search for the class loris in the HTML. You will find the data ID in its data-fid attribute. Then search for the class gws-localreviews__general-reviews-block, and you will find the next page token in its data-next-page-token attribute.

    for el in soup.select('.lcorif'):
        data_id = soup.select_one('.loris')['data-fid']
        token = soup.select_one('.gws-localreviews__general-reviews-block')['data-next-page-token']
        location_info = {
            'title': soup.select_one('.P5Bobd').text.strip(),
            'address': soup.select_one('.T6pBCe').text.strip(),
            'avgRating': soup.select_one('span.Aq14fc').text.strip(),
            'totalReviews': soup.select_one('span.z5jxId').text.strip()
        }
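One caveat with the lookups above: select_one returns None when a class is not present, so chaining .text or ['data-fid'] on it raises an exception if Google changes its markup. The helpers below are a small defensive sketch; the names text_of and attr_of are hypothetical, and they only assume an object that offers select_one, such as a BeautifulSoup document or tag.

```python
def text_of(parent, selector):
    """Return the stripped text of the first match, or None when nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def attr_of(parent, selector, attr):
    """Return an attribute of the first match, or None when the element or attribute is missing."""
    node = parent.select_one(selector)
    return node.get(attr) if node is not None else None
```

With these in place, a call such as text_of(soup, '.P5Bobd') or attr_of(soup, '.loris', 'data-fid') yields None instead of crashing when the element is absent.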

Similarly, we can extract the user’s details and other information, such as the images posted by the user, their rating, their number of reviews, and the feedback they wrote about the location.

This makes our code look like this:

import requests
from bs4 import BeautifulSoup

def get_reviews_data():

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"
    }

    response = requests.get("https://www.google.com/async/reviewDialog?hl=en_us&async=feature_id:0x3e5f43348a67e24b:0xff45e502e1ceb7e2,next_page_token:,sort_by:qualityScore,start_index:,associated_topic:,_fmt:pc", headers=headers)

    soup = BeautifulSoup(response.content, 'html.parser')

    user = []
    location_info = {}
    data_id = ''
    token = ''

    for el in soup.select('.lcorif'):
        data_id = soup.select_one('.loris')['data-fid']
        token = soup.select_one('.gws-localreviews__general-reviews-block')['data-next-page-token']
        location_info = {
            'title': soup.select_one('.P5Bobd').text.strip(),
            'address': soup.select_one('.T6pBCe').text.strip(),
            'avgRating': soup.select_one('span.Aq14fc').text.strip(),
            'totalReviews': soup.select_one('span.z5jxId').text.strip()
        }

    for el in soup.select('.gws-localreviews__google-review'):
        user.append({
            'name': el.select_one('.TSUbDb').text.strip(),
            'link': el.select_one('.TSUbDb a')['href'],
            'thumbnail': el.select_one('.lDY1rd')['src'],
            'numOfreviews': el.select_one('.Msppse').text.strip(),
            'rating': el.select_one('.EBe2gf')['aria-label'],
            'review': el.select_one('.Jtu6Td').text.strip(),
            'images': [d['style'][21:d['style'].rindex(')')] for d in el.select('.EDblX .JrO5Xe')]
        })

    print("LOCATION INFO: ")
    print(location_info)
    print("DATA ID:")
    print(data_id)
    print("TOKEN:")
    print(token)
    print("USER:")

    for user_data in user:
        print(user_data)
        print("--------------")

get_reviews_data()
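A note on the 'images' entry in the script above: the slice starting at index 21 appears to skip a "background-image:url(" prefix in the style attribute, which is exactly 21 characters long. That is an assumption about the markup, and it breaks silently if the style string ever changes. A regular expression is a slightly more forgiving alternative; the helper name image_url_from_style below is hypothetical.

```python
import re

# Captures whatever sits inside url(...), with or without surrounding quotes
URL_IN_STYLE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def image_url_from_style(style):
    """Return the URL embedded in an inline style string, or None if there is none."""
    match = URL_IN_STYLE.search(style)
    return match.group(1) if match else None


# Made-up style value, used only to illustrate the expected shape
print(image_url_from_style("background-image:url(https://example.com/photo.jpg)"))
```

Unlike the rindex-based slice, this sketch stops at the first closing parenthesis, so it assumes the image URL itself contains none.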

Run the get_reviews_data() script in your terminal, and it will print the location information, the data ID, the token, and the details of each review.

The tutorial is not over yet. I will also teach you about the extraction of the next-page reviews.

In the output of the above code, we got the token CAESBkVnSUlDZw==.

Let us embed this in our URL:

https://www.google.com/async/reviewDialog?hl=en_us&async=feature_id:0x3e5f43348a67e24b:0xff45e502e1ceb7e2,next_page_token:CAESBkVnSUlDZw==,sort_by:qualityScore,start_index:,associated_topic:,_fmt:pc

Make an HTTP request with this URL in your code, and the response will contain the next page of reviews.
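Repeating this step by hand gets tedious, so the token hand-off can be wrapped in a loop. The sketch below shows only the pagination logic: fetch_page is a hypothetical callable that you would write yourself (for example, by wrapping the request and parsing code from this post) so that it takes a token and returns the reviews of that page together with the next token. The stub used here is a stand-in with made-up data so that the example runs without network access.

```python
def collect_reviews(fetch_page, max_pages=3):
    """Follow next-page tokens until they run out or max_pages is reached."""
    reviews = []
    token = ""  # an empty token requests the first page
    for _ in range(max_pages):
        page_reviews, token = fetch_page(token)
        reviews.extend(page_reviews)
        if not token:  # no further pages
            break
    return reviews


# Stand-in for real requests: maps a token to (reviews, next_token)
FAKE_PAGES = {
    "": (["review 1", "review 2"], "TOKEN_A"),
    "TOKEN_A": (["review 3"], ""),
}


def fake_fetch_page(token):
    return FAKE_PAGES[token]


print(collect_reviews(fake_fetch_page))
```

The max_pages cap is a deliberate choice: it keeps a scraper from looping indefinitely and limits how many requests are sent in one run.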

Using Google Maps Reviews API

Scraping Google is difficult. Many developers struggle with the frequent proxy bans and CAPTCHAs. Our Google Maps Reviews API, a user-friendly and streamlined solution, can help you scrape reviews from Google Maps.

To use our API, you first have to sign up on our website. It only takes a moment.

Once you are registered, you will be redirected to a dashboard where you will find your API key.

Use this API key in the code below to scrape reviews from Google Maps:

import requests
payload = {'api_key': 'APIKEY', 'data_id': '0x89c25090129c363d:0x40c6a5770d25022b'}
resp = requests.get('https://api.serpdog.io/reviews', params=payload)
print(resp.text)

With this short script, you can scrape Google Maps Reviews quickly, without handling proxies or CAPTCHAs yourself.

Conclusion

In this tutorial, we learned to scrape Google Maps Reviews with Python. Please do not hesitate to message me if I missed something. If you think we can complete your custom scraping projects, feel free to contact us.

Follow me on Twitter. Thanks for reading!
