Selenium lets you automate web-related tasks such as fetching data from a website (web scraping), filling in forms, and much more.
These tasks are performed by a real browser that Selenium controls, and that browser can run headless. A headless browser is simply a browser without a visible GUI; it still loads pages, makes HTTP requests, and keeps session information.
My main focus is on performing some basic operations on a website and fetching some information.
Prerequisites
- You should have basic HTML knowledge to understand how Selenium works.
- An understanding of the DOM will be beneficial.
Installation
Regardless of your platform, you need three things to get started.
- Selenium: install Selenium using
pip install selenium
- A browser driver: for this tutorial I am using Chrome's driver, chromedriver. Alternatively, you can use Firefox's driver, geckodriver. Install Chromedriver from this link. Install Geckodriver from this link.
- A web browser with a GUI: on Debian or Ubuntu, install Chrome using the following commands
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
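Before moving on, it can help to confirm that the three pieces are actually in place. The following is a minimal sketch that uses only the Python standard library; the function name check_setup and the executable names it looks for are my own assumptions, and a driver kept next to your script rather than on PATH will not be reported.

```python
import importlib.util
import shutil


def check_setup(driver_names=("chromedriver", "geckodriver"),
                browser_names=("google-chrome", "firefox")):
    """Report which of the three prerequisites can be found on this machine."""
    return {
        # True if the selenium package is importable in this interpreter
        "selenium": importlib.util.find_spec("selenium") is not None,
        # Driver executables found on PATH (hypothetical default names)
        "drivers": [name for name in driver_names if shutil.which(name)],
        # Browser executables found on PATH (hypothetical default names)
        "browsers": [name for name in browser_names if shutil.which(name)],
    }


report = check_setup()
for item, found in report.items():
    print(f"{item}: {found}")
```

An empty list for drivers or browsers only means nothing was found on PATH, so treat the output as a hint rather than a verdict.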
Now, without further ado, let's create our first demo.
Import the Selenium packages into your project
import selenium
from selenium import webdriver
from selenium.webdriver import chrome
from selenium.webdriver.chrome.service import Service
Now let's open a website:
s = Service("chromedriver.exe")
driver = webdriver.Chrome(service=s)
driver.get("https://rugsforyou.in/")
The Service class takes the path to the driver executable. I have chromedriver.exe in the same folder as my Python file, so the file name alone is enough.
You can also use geckodriver.exe together with webdriver.Firefox for Firefox.
webdriver.Chrome creates a new Chrome session that is controlled through chromedriver.
Now it's time to explain a bit about WebDriver.
A WebDriver is the component of Selenium that accepts commands, sends them to the browser, and returns the result. webdriver.Chrome needs the chromedriver executable, which I provide through the Service object.
The .get() method loads a web page in the current browser session. In short, it tells the browser to navigate to the supplied URL and, by default, waits for the page to finish loading.
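The open-a-page flow described above can be wrapped so that the browser is always closed afterwards. This is only a sketch: the helper names fetch_title and make_chrome_driver are my own, and the default driver path assumes chromedriver.exe sits next to the script as in this tutorial.

```python
def fetch_title(driver, url):
    """Load url in the given driver, return the page title, and always close the browser."""
    try:
        driver.get(url)        # navigate the browser to the URL
        return driver.title    # text of the page's <title> element
    finally:
        driver.quit()          # end the session and close the browser, even on errors


def make_chrome_driver(driver_path="chromedriver.exe"):
    """Start Chrome through the chromedriver executable at driver_path (assumed location)."""
    # Imported here so fetch_title can be reused and tested without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(driver_path))
```

For example, print(fetch_title(make_chrome_driver(), "https://rugsforyou.in/")) should open the site, print its title, and close Chrome again. Calling driver.quit() matters because otherwise browser and driver processes can be left running in the background.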
Now let's create a simple automation using Selenium. It will open this wonderful eCommerce site, enter a value into the search bar, and then show the results.
For this we need to import another class, By.
from selenium.webdriver.common.by import By
# in continuation to the above code
send_data = driver.find_element(By.CLASS_NAME, value="ms-search-field")
send_data.send_keys("flower")
send_data.submit()
driver.find_element finds the web element with the class name "ms-search-field".
With By you define the locator strategy, e.g. CLASS_NAME or ID.
The .send_keys("value") method types the given value into an input field; in our case "ms-search-field" is an input field.
The .submit() method submits the form that contains the element.
To get the URL of the resulting page, use the .current_url attribute, e.g. print(driver.current_url)
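The find, type, submit, and read-URL steps can be collected into one small function. This is a sketch under a few assumptions: the name search_site is mine, the default locator is the class name used in this tutorial, and the returned URL is only the results page if the form submission has already navigated the browser.

```python
def search_site(driver, query, locator="ms-search-field", by="class name"):
    """Type query into the search field, submit its form, and return the current URL.

    by="class name" is the plain string that By.CLASS_NAME stands for; passing
    By.CLASS_NAME from selenium.webdriver.common.by works the same way.
    """
    field = driver.find_element(by, locator)  # look up the input element
    field.send_keys(query)                    # simulate typing the query
    field.submit()                            # submit the form that contains the field
    return driver.current_url                 # URL the browser is on after submitting
```

With the driver from the earlier code, print(search_site(driver, "flower")) repeats the example above in a single call. Passing the locator as a parameter makes it easy to reuse the function on a site whose search box has a different class name or an ID.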
Using a headless browser
Just add the following code to run this program in headless mode.
from selenium.webdriver.chrome.options import Options
# continuation to above code
opts = Options()
opts.add_argument("--headless")  # the opts.headless = True setter is deprecated in newer Selenium 4 releases
# change the arguments of Chrome class
driver = webdriver.Chrome(options=opts, service=s)
This will not open a Chrome window, but the page is still loaded and the result is printed in the terminal.
To go back to the GUI, remove options=opts from Chrome().
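Instead of editing the Chrome() call each time, the choice between headless and GUI mode can be made a parameter. The sketch below is one possible way to do that; the names chrome_arguments and make_driver are my own, and it assumes the --headless command-line switch of Chrome together with the chromedriver.exe path used earlier.

```python
def chrome_arguments(headless):
    """Return the Chrome command-line switches for the requested mode."""
    return ["--headless"] if headless else []


def make_driver(headless=True, driver_path="chromedriver.exe"):
    """Create a Chrome driver, with or without a visible window."""
    # Imported here so chrome_arguments can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    opts = Options()
    for arg in chrome_arguments(headless):
        opts.add_argument(arg)  # pass each switch on to Chrome
    return webdriver.Chrome(options=opts, service=Service(driver_path))
```

Calling make_driver(headless=False) should give you the normal window back, while make_driver() runs without one.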
I know that at this point you must be wondering why we have imported so many packages. So let's recap.