DEV Community

Rahul Kumar

Selenium Scraping in Python with Installation/Setup Guide

Selenium lets you automate browser tasks such as fetching data from a website (web scraping), filling in forms, and much more.

Selenium performs these tasks by driving a real browser, and it can do so in headless mode. A headless browser is simply a browser without a visible GUI: it still makes HTTP requests, runs JavaScript, and keeps session information.

My main focus is on performing some basic operations on a website and fetching some information.

Prerequisites

  1. You should have basic HTML knowledge to understand how Selenium works.
  2. An understanding of the DOM will be beneficial.

Installation

Regardless of your platform, you need three things to get started.

  1. Selenium: install it using pip install selenium
  2. Browser driver: for this tutorial I am using Chrome's driver, chromedriver. Alternatively, you can use Firefox's driver, geckodriver. Pick the driver version that matches your installed browser. Install Chromedriver from this link. Install Geckodriver from this link.
  3. A web browser with a GUI: install Chrome using the following commands (Debian/Ubuntu)
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
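Before moving on, it can help to confirm that pip installed the package into the interpreter you are actually running. The following is a minimal sketch using only the standard library; the helper name installed_version is my own and not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# installed_version is a hypothetical helper name used only for this check.
found = installed_version("selenium")
if found:
    print(f"selenium {found} is installed")
else:
    print("selenium is not installed; run: pip install selenium")
```

If the second message appears even though pip reported success, pip and python probably point at different environments.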

Now, without further ado, let's create our first demo.

Import Selenium packages to your active project

import selenium
from selenium import webdriver
from selenium.webdriver import chrome
from selenium.webdriver.chrome.service import Service

Now let's open a website.

s = Service("chromedriver.exe")
driver = webdriver.Chrome(service=s)
driver.get("https://rugsforyou.in/")

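The Service class takes the path to the driver executable. I have chromedriver.exe in the same folder as my Python file; on Linux or macOS the file is named chromedriver, without the .exe suffix.
For Firefox, use geckodriver together with webdriver.Firefox and the Service class from selenium.webdriver.firefox.service.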
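Since the driver file name differs between Windows and other platforms, here is a small sketch of how the path could be worked out instead of hard-coded. The names DRIVER_NAMES, driver_filename and driver_path are my own helpers, and the sketch assumes the driver sits in a folder you pass in.

```python
import platform
from pathlib import Path

# Base names of the driver executables for each browser.
DRIVER_NAMES = {"chrome": "chromedriver", "firefox": "geckodriver"}


def driver_filename(browser: str, system: str = "") -> str:
    """Return the driver executable name for `browser` on `system`.

    `system` defaults to the current OS as reported by platform.system().
    """
    system = system or platform.system()
    name = DRIVER_NAMES[browser]
    # Only Windows executables carry the .exe suffix.
    return f"{name}.exe" if system == "Windows" else name


def driver_path(browser: str, folder: Path) -> Path:
    """Build the driver path, assuming the executable sits in `folder`."""
    return folder / driver_filename(browser)


# Example: look for chromedriver in the current working directory.
print(driver_path("chrome", Path.cwd()))
```

The result can then be handed to Service as a string, for example Service(str(driver_path("chrome", Path.cwd()))).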

webdriver.Chrome creates a new instance of the Chrome driver and launches the browser.

Now it's time to explain a bit about webdriver.

A webdriver is the component of Selenium that accepts commands, sends them to the browser, and returns the result. webdriver.Chrome needs the chromedriver executable, which I provide through the Service object.

The .get() method loads a web page in the current browser session. In short, it navigates the browser to the supplied URL and waits for the page to finish loading.
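One thing the snippet above leaves out is closing the browser again. Here is a minimal sketch of a load-and-close cycle, assuming the same chromedriver.exe path as before; fetch_title and main are names I made up for illustration. The Selenium imports live inside main so the helper can be tried with any object that has get, title and quit.

```python
def fetch_title(driver, url: str) -> str:
    """Load `url`, return the page title, and always close the browser."""
    try:
        driver.get(url)      # navigate to the page
        return driver.title  # title of the page that is now loaded
    finally:
        driver.quit()        # end the session, even if loading failed


def main() -> None:
    # Imported here so fetch_title stays usable without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Assumes chromedriver.exe is in the current folder, as in this tutorial.
    driver = webdriver.Chrome(service=Service("chromedriver.exe"))
    print(fetch_title(driver, "https://rugsforyou.in/"))
```

Call main() to run it against the real site. Without quit(), every run can leave a browser and driver process behind.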

Now let's create a simple automation using Selenium. It will open this wonderful eCommerce site, enter a value into the search bar, and then show the results.
For this we need to import another class, By.

from selenium.webdriver.common.by import By

# in continuation to the above code
send_data = driver.find_element(By.CLASS_NAME, value="ms-search-field")
send_data.send_keys("flower")
send_data.submit()

driver.find_element finds the first web element with the class name "ms-search-field".

Using By you can choose the locator strategy, e.g. CLASS_NAME or ID.

.send_keys("value") types the given value into an input field; in our case "ms-search-field" is an input field.

.submit() submits the form that contains the element.

To access the URL of the resulting page, use the .current_url property, e.g. print(driver.current_url)
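Put together, the search steps can be wrapped in one reusable function. This is only a sketch: search and main are hypothetical names, the class name "ms-search-field" is the one used in this tutorial, and the locator is passed in so the function is not tied to one site.

```python
def search(driver, by: str, value: str, query: str) -> str:
    """Type `query` into the element located by (by, value), submit, return the URL."""
    field = driver.find_element(by, value)
    field.clear()           # drop any text already in the field
    field.send_keys(query)  # type the search term
    field.submit()          # submit the form that contains the field
    return driver.current_url


def main() -> None:
    # Imported here so search can be exercised without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Assumes chromedriver.exe is in the current folder, as in this tutorial.
    driver = webdriver.Chrome(service=Service("chromedriver.exe"))
    try:
        driver.get("https://rugsforyou.in/")
        print(search(driver, By.CLASS_NAME, "ms-search-field", "flower"))
    finally:
        driver.quit()
```

Call main() to try it. On a slow connection the printed URL may still be the old one if the results page has not finished navigating, so treat the return value as a best effort.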

Using a headless browser

Just add the following lines to run this program in headless mode.

from selenium.webdriver.chrome.options import Options

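# create the options before the driver; this replaces the earlier webdriver.Chrome(service=s) call
opts = Options()
# newer Selenium releases no longer accept opts.headless = True
opts.add_argument("--headless")
# pass the options when creating the driver
driver = webdriver.Chrome(options=opts, service=s)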

This will not open a Chrome window, but it will still load the page and print the result in the terminal.

To revert to the GUI, remove options=opts from Chrome().
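Instead of editing the Chrome() call each time, the switch between headless and GUI can be a single flag. Below is a rough sketch; chrome_arguments and make_driver are names I invented, and it passes the --headless switch through add_argument, which is the form current Selenium releases expect.

```python
def chrome_arguments(headless: bool) -> list:
    """Return the command-line switches to pass to Chrome."""
    args = []
    if headless:
        args.append("--headless")  # run Chrome without a visible window
    return args


def make_driver(headless: bool = True, driver_path: str = "chromedriver.exe"):
    """Create a Chrome driver, headless or with a GUI, from one flag."""
    # Imported here so chrome_arguments can be checked without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    opts = Options()
    for arg in chrome_arguments(headless):
        opts.add_argument(arg)
    return webdriver.Chrome(options=opts, service=Service(driver_path))
```

With this, make_driver(headless=False) brings the browser window back, and make_driver() runs silently.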
I know that at this point you must be wondering why we have imported so many packages. So let's recap.

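  1. webdriver creates and controls the browser instance (webdriver.Chrome).
  2. Service holds the path to the driver executable (chromedriver).
  3. By defines the locator strategy passed to find_element, e.g. CLASS_NAME or ID.
  4. Options configures the browser, e.g. to run it in headless mode.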
