Web Scraping Google Search Results Using PHP
PHP, first developed by Rasmus Lerdorf in 1994, is one of the most popular server-side scripting languages. It is widely used for the development of websites, web applications, and other dynamic web content.
It is also a capable scripting language for automating web scraping tasks.
Web scraping can be defined as the extraction of data from websites in the form of text, videos, images, and more, which can then be stored in a database or a local file. This technique can be used for various purposes, including SEO, lead generation, market research, and more.
In this tutorial, we’ll be scraping Google Search Results using PHP. We will also discuss why PHP can be a suitable alternative for Google scraping tasks.
By the end of the article, you will have a basic understanding of how to deal with scraping Google Search results, which will also help you to leverage this knowledge for other web scraping tasks.
Why PHP for scraping Google Search Results?
PHP can be an ideal choice for web scraping tasks because of its wide availability and ease of use. It also provides various libraries to work with HTTP requests and HTML parsing. Additionally, it is easy to learn and is well supported by the developer community.
PHP supports cURL, one of the most popular HTTP client libraries, which can be used to fetch data from web servers. Combined with Simple HTML DOM Parser for HTML parsing, it makes a powerful tool, similar to the combination of Axios and Cheerio in Node.js or Requests and Beautiful Soup in Python.
Overall, PHP is a robust language that is well suited to scraping Google and to other web scraping applications.
Scraping Google Search Results With PHP
In this tutorial, we’ll be focusing on creating a basic PHP script to scrape the first 10 Google Search Results, including their title, link, and description.
Set-Up:
If you have not already installed PHP, you can watch these videos for the installation.
Requirements:
For scraping Google search results with PHP, we will install a PHP library:
- Simple HTML DOM Parser — It allows you to extract or filter out the required data from the raw HTML.
Process:
So, I assume that you have set up your PHP project. We will start the scraping by making an HTTP request with the help of cURL to extract the raw HTML data. Here is our URL:
https://www.google.com/search?q=php+tutorial&gl=us&hl=en
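The query string in this URL can also be assembled in code rather than typed by hand. The following is a minimal sketch, assuming a helper named buildSearchUrl (a hypothetical name, not part of the original script). It relies on PHP's built-in http_build_query to encode the q, gl, and hl parameters shown in the URL above.

```php
<?php
// Hypothetical helper (not from the original script): builds the Google
// search URL from a query, a country code (gl), and a language code (hl).
function buildSearchUrl(string $query, string $country = 'us', string $language = 'en'): string
{
    $params = [
        'q'  => $query,    // search terms; spaces are encoded as "+"
        'gl' => $country,  // country code
        'hl' => $language, // language code
    ];

    // http_build_query URL-encodes each value and joins the pairs with "&"
    return 'https://www.google.com/search?' . http_build_query($params, '', '&');
}

echo buildSearchUrl('php tutorial') . PHP_EOL;
```

Called with 'php tutorial' and the default arguments, this produces the same URL used in this tutorial, and it keeps other search terms correctly encoded.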
Let us first import the required libraries:
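<?php
// Composer autoloader (only needed if you installed the parser with Composer)
require_once 'vendor/autoload.php';
// Simple HTML DOM Parser (only needed if you downloaded simple_html_dom.php manually)
require_once 'simple_html_dom.php';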
And now, we will make a function to extract the data.
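function getData() {
    // Start a new cURL session
    $ch = curl_init();
    // Target URL: the Google search page for "php tutorial"
    curl_setopt($ch, CURLOPT_URL, 'https://www.google.com/search?q=php+tutorial&gl=us&hl=en');
    // Send a browser-like User-Agent header
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.4951.54 Safari/537.36');
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Skip TLS certificate verification (insecure; for local testing only)
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // Run the request and keep the raw HTML
    $html = curl_exec($ch);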
Step-by-step explanation:
First, we initiate a cURL session with the curl_init() function.
Next, we set the URL from which we want to extract the data.
In the following line, we set the User-Agent header sent with the request, which helps us mimic an organic user.
After that, we set the CURLOPT_RETURNTRANSFER option to true so that curl_exec() returns the response as a string instead of printing it.
Then we set CURLOPT_SSL_VERIFYPEER to false to skip certificate verification. This is convenient for quick local testing but insecure, so leave verification enabled in production.
Finally, the session is executed with the curl_exec($ch) function, and the returned HTML is stored in the $html variable.
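The same request steps can be wrapped in a reusable function that also reports failures. This is a standalone sketch rather than part of the script in this tutorial: the name fetchHtml, the 15-second timeout, and the choice to treat any non-200 status as a failure are all assumptions. It leaves certificate verification at cURL's default (enabled).

```php
<?php
// Hypothetical helper (not from the original script): fetches a URL with
// cURL and returns the HTML, or null when the request fails.
function fetchHtml(string $url, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed limit, in seconds

    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($html === false) {
        // curl_error() describes transport failures such as DNS or TLS errors
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }
    if ($status !== 200) {
        // Assumption: anything other than 200 (for example 429) is unusable
        error_log('Unexpected HTTP status: ' . $status);
        return null;
    }

    return $html;
}
```

A call such as fetchHtml($url, $userAgent) then returns either the page HTML or null, so the parsing step can be skipped when the request did not succeed.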
Now, we will declare an instance of the Simple HTML DOM Parser to load the extracted HTML into it.
    $dom = new simple_html_dom();
    $dom->load($html);
Okay 😃, we are done with the scraping part. Now, we will focus on searching for the tags that contain our required elements to parse the HTML.
Open the URL in your browser and inspect the HTML code. You will find that every organic result is contained inside a div with the class g.
So, we will loop over all the divs with the class g to get the information they hold.
    $results = $dom->find("div.g");
This will find all the div elements with the class name g.
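    $c = 0;
    foreach ($results as $result) {
        // Look up the title, link, and snippet nodes inside this result
        $titleNode = $result->find("h3", 0);
        $linkNode = $result->find(".yuRUbf > a", 0);
        $snippetNode = $result->find(".VwiC3b", 0);

        // Skip blocks that are missing a title or a link
        if (!$titleNode || !$linkNode) {
            continue;
        }

        // PHP_EOL prints a line break in the terminal
        echo "Title: " . $titleNode->plaintext . PHP_EOL;
        echo "Link: " . $linkNode->href . PHP_EOL;
        echo "Snippet: " . ($snippetNode ? $snippetNode->plaintext : "") . PHP_EOL;
        echo "Position: " . ($c + 1) . PHP_EOL;
        echo PHP_EOL;
        $c++;
    }
}

// Call the function to run the scraper
getData();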
Now, let’s look at the selectors used for the title, link, and snippet.
Inspecting a result in the browser shows that the title is inside an h3 tag, the link is matched by the selector .yuRUbf > a, and the snippet has the class VwiC3b.
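To see how these selectors pick out each field without sending a request to Google, here is a small self-contained sketch. It swaps in PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM Parser so that it runs without extra libraries, and it parses made-up markup that only mimics the structure described above. The hasClass helper and the sample HTML are assumptions for illustration.

```php
<?php
// Simplified, made-up markup that mimics the structure described above.
$html = <<<HTML
<div class="g">
  <h3>Example PHP Tutorial</h3>
  <div class="yuRUbf"><a href="https://example.com/php">example.com</a></div>
  <div class="VwiC3b">A short example snippet.</div>
</div>
<div class="g">
  <h3>Second Example</h3>
  <div class="yuRUbf"><a href="https://example.org/">example.org</a></div>
</div>
HTML;

// Hypothetical helper: XPath condition for "element has this CSS class"
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//div[' . hasClass('g') . ']') as $result) {
    // Each query is evaluated relative to the current result block
    $title   = $xpath->query('.//h3', $result)->item(0);
    $link    = $xpath->query('.//*[' . hasClass('yuRUbf') . ']/a', $result)->item(0);
    $snippet = $xpath->query('.//*[' . hasClass('VwiC3b') . ']', $result)->item(0);

    if ($title === null || $link === null) {
        continue; // skip blocks without a title or a link
    }

    $rows[] = [
        'title'   => trim($title->textContent),
        'link'    => $link->getAttribute('href'),
        'snippet' => $snippet ? trim($snippet->textContent) : null,
    ];
}

print_r($rows);
```

The second sample block has no snippet, which shows why a missing node should be checked before reading its text.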
Now, run this code in your terminal. Your results should look like this:
Title: PHP Tutorial - W3Schools
Link: https://www.w3schools.com/php/
Snippet: Learn PHP. PHP is a server scripting language, and a powerful tool for making dynamic and interactive Web pages. PHP is a widely-used, free, and efficient ...
Position: 1
Title: PHP Tutorial - Tutorialspoint
Link: https://www.tutorialspoint.com/php/index.html
Snippet: The PHP Hypertext Preprocessor (PHP) is a programming language that allows web developers to create dynamic content that interacts with databases.
Position: 2
Title: A simple tutorial - Manual - PHP
Link: https://www.php.net/manual/en/tutorial.php
Snippet: Here we would like to show the very basics of PHP in a short, simple tutorial. This text only deals with dynamic web page creation with PHP, though PHP is ...
Position: 3
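If you would rather keep the results than print them, one option is to collect them into an array and save them as JSON. The sketch below is only an illustration: it uses two rows copied from the output above in place of a live scrape, and the file name results.json is an assumption.

```php
<?php
// Sample rows copied from the output above; in the real script these would
// be collected inside the foreach loop instead of being echoed.
$results = [
    ['position' => 1, 'title' => 'PHP Tutorial - W3Schools', 'link' => 'https://www.w3schools.com/php/'],
    ['position' => 2, 'title' => 'PHP Tutorial - Tutorialspoint', 'link' => 'https://www.tutorialspoint.com/php/index.html'],
];

// Encode as readable JSON and keep URLs free of escaped slashes
$json = json_encode($results, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);

// results.json is an assumed file name
file_put_contents('results.json', $json);
echo $json . PHP_EOL;
```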
Great work🎉🎉!!! So, we have successfully created our scraper to scrape Google Search Results in PHP.
However, this approach does not scale to large volumes of data, because Google may block your IP address. As an alternative, you can use one of the Google Search APIs available on the market, which handle anti-bot mechanisms for you with their large pools of data center and residential proxies.
With Google Search API
Serpdog provides an easy and streamlined solution to scrape Google Search Results with its robust SERP APIs, and it also handles proxies and CAPTCHAs for a smooth scraping journey. It provides plenty of extra data beyond organic results at the most affordable pricing in the industry.
You will also receive 100 free requests upon signing up.
You will get an API Key after registering on our website. Embed the API Key in the code below, and you will be able to scrape Google Search Results at high speed.
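<?php
// Replace APIKEY with the API key from your Serpdog account
$url = "https://api.serpdog.io/search?api_key=APIKEY&q=php+tutorial&gl=us";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_HEADER, false);        // leave response headers out of the body
// Certificate verification stays enabled so the API key is not sent to an unverified host
$response = curl_exec($ch);

// Report transport errors instead of printing an empty response
if ($response === false) {
    echo "cURL error: " . curl_error($ch);
}
curl_close($ch);

print_r($response);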
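The raw response is a string, so it usually helps to decode it before use. The sketch below works on a made-up sample rather than a real API response: the organic_results key and the title, link, and snippet fields are assumptions for illustration, so check the API documentation for the actual response shape.

```php
<?php
// Made-up sample response. The "organic_results" key and its fields are
// assumptions for illustration, not the documented response format.
$response = '{"organic_results":[{"title":"Example PHP Tutorial","link":"https://example.com/php","snippet":"A short example snippet."}]}';

// Decode into associative arrays; json_decode returns null on invalid JSON
$data = json_decode($response, true);
if (!is_array($data)) {
    exit('Could not decode response: ' . json_last_error_msg() . PHP_EOL);
}

// Fall back to an empty list if the assumed key is missing
$organic = $data['organic_results'] ?? [];

foreach ($organic as $i => $item) {
    echo 'Position: ' . ($i + 1) . PHP_EOL;
    echo 'Title: ' . ($item['title'] ?? '') . PHP_EOL;
    echo 'Link: ' . ($item['link'] ?? '') . PHP_EOL;
    echo 'Snippet: ' . ($item['snippet'] ?? '') . PHP_EOL . PHP_EOL;
}
```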
Conclusion:
In this tutorial, we learned to scrape Google Search Results using PHP. Feel free to message me if you need clarification on anything. Follow me on Twitter. Thanks for reading!