Serpdog

Posted on • Originally published at serpdog.io

Scraping Google Search Results Using R

R, a programming language for statistical computing, was developed by two professors from the University of Auckland, Ross Ihaka and Robert Gentleman, in 1993. It was heavily inspired by the S programming language. Its ease of use, backing by various libraries, and continuous improvement over the years have made it a great tool for web scraping.

R is a powerful language capable of handling a wide variety of tasks reliably, which makes it both robust and effective.

Web scraping, or data mining, has become a thriving industry over the past few years. Scraping websites like Google in particular can open various doors for you to start earning, with benefits such as SERP monitoring, price monitoring, and SEO.

Scrape Google Search Results Using R

In this tutorial, we will teach you how to scrape Google Search Results with R. We will also explore some advantages and disadvantages of using the R programming language.

This tutorial is aimed at teaching you how to fetch and handle the complex HTML structure of Google Search Results, which will help you build your own web scraping projects on your data extraction journey.

Let’s start the tutorial!

Let’s start scraping Google Search Results With R

The first step towards scraping Google Search Results with R is fetching the HTML data from the Google webpage by passing appropriate headers, and then parsing that HTML to get the desired data.

Set-Up

If you have not already installed R, you can watch these videos for the installation.

  1. How to set up R on Windows?

  2. How to set up R on MacOS?

Requirements

For scraping Google search results with R, we will install the following libraries:

  1. Rvest — This library will assist us in parsing the HTML data from the target website.

  2. httr — This library will assist us in making the GET request to the target website with custom headers.

You can install these libraries in your project folder by running the below command.

install.packages(c("rvest", "httr"))

Process

Now that we have all the ingredients on the table, it’s time to cook our food! As mentioned in the above section, our first step would be making a GET request on the target URL. Let us now implement this!

Import the libraries we have installed above.

library(rvest)
library(httr)

After that, create a function and initialize the target URL and the respective headers to pass with the GET request.

getData <- function() {
  # Browser-like User-Agent header to make the request look like an organic visitor
  headers <- c("User-Agent" = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36 Unique/99.7.2776.77")

  url <- "https://www.google.com/search?q=cakes+in+boston&gl=us"

The User-Agent is a request header that identifies the client software and can be used to make our scraping bot mimic an organic user.
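
If you plan to make repeated requests, it can also help to rotate among a few browser-like User-Agent strings instead of always sending the same one. Here is a minimal sketch of that idea; the strings below are only illustrative examples, not values used elsewhere in this tutorial.

# Optional: a small pool of browser-like User-Agent strings (illustrative examples)
user_agents <- c(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
)

# Pick one at random for each request
headers <- c("User-Agent" = sample(user_agents, 1))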

Since read_html() cannot attach custom request headers on its own, we will send the GET request with httr and then parse the returned HTML with rvest's read_html() function.

  # Fetch the page with httr (passing our headers), then parse the HTML with rvest
  response <- GET(url, add_headers(.headers = headers))
  page <- read_html(content(response, as = "text", encoding = "UTF-8"))

After fetching the HTML, we will locate the required tags within it.

For this, you have to inspect the search results of the target webpage.

Inspecting the Google Search Organic Results

From the above image, you can conclude that every organic result is contained in a div tag with the class g.

This allows us to iterate over every div tag with the class g to extract the required information.

  results <- html_nodes(page, "div.g")
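
Before parsing individual fields, you can run a quick sanity check to confirm the selector actually matched something; a standard results page typically contains around ten organic results.

# Optional sanity check: how many results did the "div.g" selector match?
length(results)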

Next, we will locate the tags for the title, link, and description.

Inspecting the title, description and link

Inspect the search results again. You will observe that the link is present inside an anchor under the div with the class yuRUbf, the h3 tag holds the title of the respective organic result, and the description sits under the div with the class VwiC3b.

  c <- 0

  for (result in results) {
    title <- html_text(html_nodes(result, "h3"))
    link <- html_attr(html_nodes(result, ".yuRUbf > a"), "href")
    description <- html_text(html_nodes(result, ".VwiC3b"))
    position <- c + 1

    cat("Title: ", title, "\n")
    cat("Link: ", link, "\n")
    cat("Description: ", description, "\n")
    cat("Position: ", position, "\n\n")

    c <- c + 1
  }
}

getData()

In the above code, we extracted the required data step by step by identifying each field with the help of its respective tag. We also print the position, or rank, of every organic result.

Run this code in your project terminal. You should get the following results.

Title: Where to Order the 10 Best Cakes in Boston · The Food Lens
Link: https://www.thefoodlens.com/boston/guides/best-cakes/
Description: Where to Order the 10 Best Cakes in Boston ; Weesh Bake Shop. Roslindale · Bakery · Dessert · $$$$ ; La Saison Bakery · Cambridge · $$ ; Manoa Poke ...
Position: 1

Title: Top 10 Best Birthday Cake in Boston, MA - June 2023
Link: https://www.yelp.com/search?find_desc=Birthday+Cake&find_loc=Boston%2C+MA
Description: Best Birthday Cake near me in Boston, Massachusetts ; Soul Cake. 7.5 mi. 16 reviews ; Jonquils Cafe & Bakery. 2.9 mi. 538 reviews ; Sweet Teez Bakery. 0.9 mi. 15 ...
Position: 2

Title: Here's Where to Find the Best Bakeries in Boston Right Now
Link: https://www.bostonmagazine.com/restaurants/best-bakeries-boston/
Description: Devoted foodies and restaurant newbies love The Feed. Sign-up now for our twice weekly newsletter. · 7ate9 Bakery · Elm Street Sweets · Haley House Bakery Cafe.
Position: 3

Here is the complete code:

library(rvest)
library(httr)

getData <- function() {
  # Browser-like User-Agent header to make the request look like an organic visitor
  headers <- c("User-Agent" = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36 Unique/99.7.2776.77")

  url <- "https://www.google.com/search?q=cakes+in+boston&gl=us"

  # Fetch the page with httr (read_html() alone cannot send custom headers),
  # then parse the returned HTML with rvest
  response <- GET(url, add_headers(.headers = headers))
  page <- read_html(content(response, as = "text", encoding = "UTF-8"))

  # Every organic result sits inside a div with the class "g"
  results <- html_nodes(page, "div.g")
  c <- 0

  for (result in results) {
    title <- html_text(html_nodes(result, "h3"))
    link <- html_attr(html_nodes(result, ".yuRUbf > a"), "href")
    description <- html_text(html_nodes(result, ".VwiC3b"))
    position <- c + 1

    cat("Title: ", title, "\n")
    cat("Link: ", link, "\n")
    cat("Description: ", description, "\n")
    cat("Position: ", position, "\n\n")

    c <- c + 1
  }
}

getData()

I believe you now understand how we can scrape Google Search Results by writing a basic piece of code. You can customize the above code yourself if you need more data — for example, by collecting the results into a data frame, as shown below.
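
Here is a minimal sketch of that customization: instead of printing each result with cat(), it collects the fields into a data frame and writes them to a CSV file. It assumes the same page object and selectors used above.

# Collect the organic results into a data frame (assumes the "page" object
# parsed earlier and the same selectors used above)
results <- html_nodes(page, "div.g")

data <- data.frame(
  position    = seq_along(results),
  title       = sapply(results, function(r) html_text(html_node(r, "h3"))),
  link        = sapply(results, function(r) html_attr(html_node(r, ".yuRUbf > a"), "href")),
  description = sapply(results, function(r) html_text(html_node(r, ".VwiC3b"))),
  stringsAsFactors = FALSE
)

# Persist the scraped results for later analysis
write.csv(data, "google_results.csv", row.names = FALSE)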

Pros and cons of Using R

Every language has its own pros and cons. Let us first discuss some of the benefits associated with R.

Pros:

  1. R has a variety of packages, such as rvest and httr, specifically designed for custom web scraping tasks.

  2. R has excellent community support from developers. If you get stuck on a problem, various online subreddits and Discord servers can provide you with assistance.

  3. R has excellent data manipulation capabilities and can easily parse extracted raw HTML data.

Cons:

  1. Learning R can be difficult for developers who are just beginning with the language.

  2. R may not deliver the same level of performance as other languages like Python and Node.js.

  3. R is not well suited for scraping dynamically rendered content, as its core scraping libraries are designed for static HTML pages.

Using Serpdog’s Google Search API

Are you tired of frequently getting blocked by Google?

No worries!!!

Serpdog’s Google Search API allows you to scrape Google at scale without the fear of any blockage. It utilizes a massive pool of 10M+ residential proxies, enabling our scraper to bypass any onsite protection and deliver results at rapid speed.

Serpdog | Google Search API

Our API is one of the most affordable in the market, as we have decreased our profits to a great extent to provide a quality service to our customers.

And one more thing: Serpdog also offers 100 free credits on first-time registration to get you started with our Google Search API. These credits are renewed every month.

So, after registering on our website, embed your API key in the below code to avail yourself of the benefits of our robust SERP API.

library(httr)

url <- "https://api.serpdog.io/search?api_key=APIKEY&q=cakes+in+boston&gl=us"

# config() disables SSL certificate verification for this request
response <- GET(url, config(ssl_verifypeer = 0L, ssl_verifyhost = 0L))
content <- content(response, "text")
cat(content)

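The API returns JSON, so rather than just printing the raw text you can parse it into an R object. Below is a minimal sketch, assuming the jsonlite package is installed; check Serpdog's documentation for the exact fields present in the response.

# Parse the JSON text returned by the API into an R object
library(jsonlite)

parsed <- fromJSON(content)  # "content" is the JSON string from the block above
str(parsed, max.level = 1)   # inspect the top-level fields of the response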

Conclusion

In this tutorial, we learned how to scrape Google Search Results using the R language. Feel free to message me about anything you need clarification on. Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Web Scraping Walmart Data

  2. Web Scraping Amazon Product Data

  3. Scrape Bing Using Python

  4. Scrape Zillow Using Python

  5. Scrape LinkedIn Jobs

  6. Scraping Google News Results
