Bing is one of the major search engines in use today. With a 15% market share in the US, it is the second-largest search engine after Google. Bing holds a large amount of valuable data that can be used for market trend analysis, SERP monitoring, media monitoring, and more.
In this tutorial, we will learn to scrape Bing search results using Python and its libraries. We will also explore the benefits of scraping Bing and why the official Bing Search API may not be the best choice for extracting data from Bing.
Why Scrape Bing?
Scraping Bing can provide you with various benefits:
SERP Monitoring — Scraping Bing Search helps you track where your website ranks on the search engine, and you can use that data to adjust your pages and improve those rankings.
News Monitoring — Monitor the news around the globe by scraping Bing and use this data to analyze various trends and sentiments.
Lead Generation — Scrape Bing data to enrich a database of company employees, which can then be sold at a profit.
Let’s start scraping Bing!
In this section, we will focus on scraping the first 10 Bing search results, including their titles, links, and snippets (descriptions).
Setup
If Python is not already installed on your device, you can install it directly from the official Python website.
Requirements
To scrape Bing, we will be using these two Python libraries:
Beautiful Soup — Used for parsing the raw HTML data.
Requests — Used for making HTTP requests.
You can run the below commands in your project terminal to install the libraries.
pip install requests
pip install beautifulsoup4
Process:
Ok, so we are done with the setup. Next, we will scrape the data from this page.
We will start the program by importing the required libraries.
import requests
from bs4 import BeautifulSoup
Next, we will send an HTTP request to the target URL to fetch the raw HTML data.
url = 'https://www.bing.com/search?q=api&count=10'
response = requests.get(url)
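Bing may respond differently, or refuse the request, when it sees the default client identifier that Requests sends. The sketch below shows one way to make the request more defensive. It assumes a browser-like User-Agent string is acceptable for your use case. The header value, the timeout, and the helper names build_request and fetch_html are illustrative choices, not something taken from Bing's documentation.

```python
import requests

SEARCH_URL = 'https://www.bing.com/search'

# Assumption: a browser-like User-Agent; substitute the string your own browser sends.
HEADERS = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    )
}


def build_request(query, count=10):
    """Prepare (but do not send) a GET request for a Bing search."""
    request = requests.Request(
        'GET',
        SEARCH_URL,
        params={'q': query, 'count': count},  # Requests URL-encodes these
        headers=HEADERS,
    )
    return request.prepare()


def fetch_html(query, count=10, timeout=10):
    """Send the prepared request and return the response body as text."""
    with requests.Session() as session:
        response = session.send(build_request(query, count), timeout=timeout)
    # Raise an exception on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.text


# Inspect the final URL without touching the network.
prepared = build_request('api')
print(prepared.url)
```

Calling fetch_html('api') would then return the HTML that the rest of this tutorial parses. Building the URL from a params dictionary also means queries containing spaces or special characters are encoded for you.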
To parse this extracted data, we will use the BeautifulSoup library to easily navigate inside the DOM and search for the required HTML elements.
soup = BeautifulSoup(response.text, 'html.parser')
Next, we will locate the tags that hold the title, link, and snippet of each search result.
If you inspect the HTML, you will see that every organic result sits inside an li tag with the class b_algo.
So, we will loop over all these list tags and extract the required information from them.
search_results = []
for result in soup.find_all('li', class_='b_algo'):
Let us now locate the title. It sits inside the h2 element, and the link is on the a tag nested under that h2, so let us scrape both.
for result in soup.find_all('li', class_='b_algo'):
    title = result.find('h2').text
    # find() takes a tag name, not a CSS selector, so use select_one() for 'h2 a'
    url = result.select_one('h2 a')['href']
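One detail worth checking at this step: in Beautiful Soup, find takes a tag name plus optional attribute filters, while CSS selectors such as h2 a belong to select_one and select. The short sketch below illustrates the difference. The HTML fragment is a hand-written stand-in shaped like a single result, not real Bing output.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like one organic result (not real Bing output).
html = '''
<li class="b_algo">
  <h2><a href="https://example.com/">Example Domain</a></h2>
</li>
'''

result = BeautifulSoup(html, 'html.parser').find('li', class_='b_algo')

# find() looks for a tag literally named "h2 a", which does not exist.
by_find = result.find('h2 a')

# select_one() interprets its argument as a CSS selector.
by_selector = result.select_one('h2 a')

# Chaining two find() calls reaches the same link.
by_chain = result.find('h2').find('a')

print(by_find)              # None
print(by_selector['href'])  # https://example.com/
print(by_chain['href'])     # https://example.com/
```

Either select_one or the chained find calls will work here. The selector form is shorter, while the chained form makes it easier to check each step for None.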
And finally, we will locate the tag for the snippet.
The paragraph with class b_algoSlug contains our snippet.
for result in soup.find_all('li', class_='b_algo'):
    title = result.find('h2').text
    url = result.select_one('h2 a')['href']
    # select_one() accepts CSS selectors such as 'p.b_algoSlug'
    snippet = result.select_one('p.b_algoSlug').text
Finally, still inside the loop, we will append the parsed data to our search_results list.
    search_results.append({
        'title': title,
        'url': url,
        'snippet': snippet
    })
print(search_results)
Run this program in your terminal, and you will get results similar to the following.
[
{
'title': 'What is an Application Programming Interface (API)? | IBM',
'url': 'https://www.ibm.com/topics/api',
'snippet': 'WebAn API, or application programming interface, is a set of defined rules that enable different applications to communicate with each other. It acts as an intermediary layer that processes data transfers between systems, letting companies open their application data and functionality to external third-party developers, business partners, and internal …'
},
{
'title': 'What is an API? - Application Programming Interfaces ...',
'url': 'https://aws.amazon.com/what-is/api/',
'snippet': 'WebA Web API or Web Service API is an application processing interface between a web server and web browser. All web services are APIs but not all APIs are web services. REST API is a special type of Web API that uses the standard architectural style explained above. The different terms around APIs, like Java API or service APIs, exist because ...'
},
.....
]
So, this is how you build a basic scraper to extract Bing search results.
Here is the complete code:
import requests
from bs4 import BeautifulSoup

url = 'https://www.bing.com/search?q=api&count=10'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

search_results = []
# Every organic result is an <li> element with the class b_algo
for result in soup.find_all('li', class_='b_algo'):
    # The title and the link both live under the h2 element
    title = result.find('h2').text
    url = result.select_one('h2 a')['href']
    snippet = result.find('p').text
    search_results.append({
        'title': title,
        'url': url,
        'snippet': snippet
    })

print(search_results)
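The loop above raises an exception as soon as one result lacks a heading link or a paragraph, which can happen if other card types are mixed into the result list. The sketch below shows a more tolerant parser, assuming the same li.b_algo structure described earlier. The function name parse_results and the sample HTML are illustrative, and the sample is hand-written rather than captured from Bing.

```python
from bs4 import BeautifulSoup


def parse_results(html):
    """Return a list of dicts with title, url, and snippet for each result."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('li', class_='b_algo'):
        link = item.select_one('h2 a')
        if link is None or not link.get('href'):
            # Skip entries without a heading link instead of crashing.
            continue
        paragraph = item.find('p')
        results.append({
            'title': link.get_text(strip=True),
            'url': link['href'],
            # A result without a paragraph gets an empty snippet.
            'snippet': paragraph.get_text(strip=True) if paragraph is not None else '',
        })
    return results


# Hand-written sample: one complete result and one without a link.
SAMPLE_HTML = '''
<ol>
  <li class="b_algo">
    <h2><a href="https://example.com/">Example Domain</a></h2>
    <p>An example snippet.</p>
  </li>
  <li class="b_algo"><h2>No link here</h2></li>
</ol>
'''

print(parse_results(SAMPLE_HTML))
```

Keeping the parsing in a function that takes an HTML string also makes it easy to test against saved pages without sending a request each time.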
Problems with the Official Bing Search API
There are various reasons why the Bing Search API is not preferred by developers:
Expensive — The official Bing Search API is not budget-friendly for developers, which is why many of them consider a web scraping API instead.
Limited Features — The Bing Search API offers different plans and features at different price points, and no single plan includes every feature. That is why many businesses prefer web scrapers that give them full access to the results.
Complex Setup — It can be difficult for users with nontechnical backgrounds to set up the API.
Conclusion
In this tutorial, we learned to scrape Bing Search Results using Python. Please do not hesitate to message me if I missed something.
If you think we can complete your custom scraping projects, feel free to contact us. Follow me on Twitter. Thanks for reading!