Sudhanshu Mukherjee

Web Scraping Multiple Pages using Python & BeautifulSoup

Web Scraping

Web scraping examines websites for unstructured data and stores it in a structured form for later use. It lets us iterate over several web pages, extract the required information, and save it in a format that suits the user.

In this project, we will learn how to use Python and Beautiful Soup to scrape the names and prices of Mi-brand mobile phones from the Flipkart e-commerce website.


Inspecting the Element

The objective of this project is to scrape Flipkart's website and extract the names and prices of Mi mobile phones.
First, get the URL that will be needed to make a request.

The URL which we will be using: https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy%2C4io&otracker=nmenu_sub_Electronics_0_Mi&page=

Next, we need to identify the class of each element we want to scrape. To inspect the elements, follow these steps.
Step 1 - Visit the URL.
Step 2 - Right-click on the page and select Inspect, or press Ctrl + Shift + I.
Step 3 - Hover over the name of a phone and click it. Note the class shown in the panel that appears on the right.
Step 4 - Repeat the same process for the price.
Step 5 - Copy these classes somewhere; we will need them later in our code.
The blue highlighted text on the right shows the class of the phone name.

Import Libraries, modules, and Packages

Let's import the required libraries first. We will use Beautiful Soup to parse HTML documents so that data can be extracted from them. We will also import 'requests', which provides methods for making HTTP requests.
Lastly, we will need pandas to store our scraped data in an organized way.

from bs4 import BeautifulSoup
import requests
import pandas as pd

Write the program

  • Create empty lists for the phone names and phone prices.
    phone_name = []
    phone_price = []

  • Declare a variable 'page_num' that holds the number of pages the user wants to scrape.
    page_num = int(input("Enter number of pages:"))

  • Now we will use a for loop to iterate over the pages and run the scraping steps on each one.

  • Store the URL in the variable url and make a request to it. Don't forget to append str(i) to the URL; this is what moves the program to the next page on each iteration of the loop.

  • After making the request, use Beautiful Soup to parse the HTML and store the result in a variable called content.
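
The loop described in these steps might look roughly like the sketch below. The variable names url, content, and i follow the text; the helper names build_page_url, fetch_page, and scrape_pages are illustrative, and the "html.parser" backend, the timeout, and numbering pages from 1 are assumptions rather than details taken from the original code.

```python
from bs4 import BeautifulSoup
import requests

# URL from the article; it ends in "page=" so a page number can be appended.
BASE_URL = ("https://www.flipkart.com/mobiles/mi~brand/pr"
            "?sid=tyy%2C4io&otracker=nmenu_sub_Electronics_0_Mi&page=")


def build_page_url(base_url, i):
    """Append the page number to the base URL (the '+ str(i)' step)."""
    return base_url + str(i)


def fetch_page(url):
    """Request one page and parse its HTML (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error status
    content = BeautifulSoup(response.text, "html.parser")
    return content


def scrape_pages(page_num):
    """Yield the parsed content of each page, assuming pages start at 1."""
    for i in range(1, page_num + 1):
        url = build_page_url(BASE_URL, i)
        yield fetch_page(url)
```

Nothing is downloaded until scrape_pages is iterated, for example with `for content in scrape_pages(page_num):`.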


  • Declare a variable called name that finds and stores every div with the phone-name class (the one we copied while inspecting the element).
  • Declare a variable called price that finds and stores every div with the phone-price class (copied the same way).
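
As a minimal sketch of this step, the snippet below runs find_all on a small made-up HTML string so it works offline. The class names "phone-name" and "phone-price" are placeholders, not Flipkart's real classes; replace them with the classes copied from the Inspect panel.

```python
from bs4 import BeautifulSoup

# Placeholder class names (assumptions): use the ones copied from Inspect.
NAME_CLASS = "phone-name"
PRICE_CLASS = "phone-price"

# Made-up stand-in for a downloaded page, so the example runs offline.
sample_html = """
<div class="phone-name">Phone A</div><div class="phone-price">9999</div>
<div class="phone-name">Phone B</div><div class="phone-price">12499</div>
"""

content = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like result holding every matching <div> tag.
name = content.find_all("div", class_=NAME_CLASS)
price = content.find_all("div", class_=PRICE_CLASS)
```

Each item in name and price is a tag object; its visible text is available through the .text attribute.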


  • Append the extracted data to the empty lists we created at the start.
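
One way to do the appending is sketched below, again on made-up HTML with placeholder class names. Pairing the tags with zip is a choice made for this sketch, not necessarily what the original code did; it stops at the shorter of the two results, so the lists stay the same length even if a price is missing on the page.

```python
from bs4 import BeautifulSoup

phone_name = []
phone_price = []

# Made-up HTML and placeholder class names, used only for illustration.
sample_html = """
<div class="phone-name"> Phone A </div><div class="phone-price">9999</div>
<div class="phone-name">Phone B</div><div class="phone-price">12499</div>
"""
content = BeautifulSoup(sample_html, "html.parser")
name = content.find_all("div", class_="phone-name")
price = content.find_all("div", class_="phone-price")

# Take the text of each tag, trim surrounding whitespace, and append it.
for n, p in zip(name, price):
    phone_name.append(n.text.strip())
    phone_price.append(p.text.strip())
```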


  • Now, let's create a DataFrame to store our data in a structured form and export the DataFrame to a CSV file.
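
The export step could be sketched as follows. The function name export_phones, the column labels, and the file name in the usage note are assumptions chosen for illustration; both lists must have the same length, otherwise pandas raises a ValueError.

```python
import pandas as pd


def export_phones(phone_name, phone_price, path):
    """Build a DataFrame from the two lists and write it to a CSV file."""
    # Column labels are illustrative; pick whatever suits your data.
    df = pd.DataFrame({"Phone Name": phone_name, "Phone Price": phone_price})
    # index=False leaves the row numbers out of the file.
    df.to_csv(path, index=False)
    return df
```

After scraping, a call such as `export_phones(phone_name, phone_price, "mi_phones.csv")` would write the file to the current directory.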


This is our aha moment, folks. We have successfully built our web scraper using Python and Beautiful Soup. To understand it better, pick a website that you personally like and try scraping it.
