DEV Community

csdj92


Web crawling with python and selenium

Intro

For a recent job interview I was tasked with scraping a web table and converting the data into a CSV file. To get started, I first searched around for the proper tools for the job. I knew I would need to install Selenium with pip install -U selenium, and the docs state that 'Selenium requires a driver to interface with the chosen browser.' For this I chose the Chrome webdriver (chromedriver). Once it's downloaded, make sure it's in your PATH, e.g., place it in /usr/bin or /usr/local/bin. With older Selenium releases, Chrome will fail to open if you skip this step. Selenium 4.6 and later include Selenium Manager, which can download a matching driver automatically.
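Before launching a browser, you can check from Python whether chromedriver is reachable on your PATH. The sketch below uses the standard library's shutil.which. The function name find_driver is my own and is not part of Selenium.

```python
import shutil
from typing import Optional


def find_driver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of a driver executable found on PATH, or None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable and returns the first matching executable it finds.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print(f"Found chromedriver at {path}")
```

If this prints "not found", move the driver into one of the directories listed in your PATH (or add its folder to PATH) and run it again.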

Learning to crawl

To get started, import webdriver from selenium, create a browser instance with webdriver.Chrome(), and then call that instance's get() method with the URL you want to load, e.g. browser.get('url here'):

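```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Launch Chrome through chromedriver
browser = webdriver.Chrome()

try:
    # Load the page and check that we landed where we expected
    browser.get('http://www.yahoo.com')
    assert 'Yahoo' in browser.title

    # Find the search box by its name attribute, type a query and press Enter
    elem = browser.find_element(By.NAME, 'p')
    elem.send_keys('seleniumhq' + Keys.RETURN)
finally:
    # Always close the browser, even if a step above fails
    browser.quit()
```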

Congrats, you just learned to crawl! In my next blog post I will go over using pandas to turn a web table into a DataFrame and then into a CSV file.
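Until then, here is a minimal sketch of the table-to-CSV step that uses Python's built-in csv module instead of pandas. Selenium reads the table cells and the csv module writes them out. It assumes the page contains an HTML table reachable with a CSS selector. The names scrape_table and rows_to_csv, and the default "table" selector, are hypothetical and chosen for illustration.

```python
import csv
import io
from typing import List


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialise table rows to CSV text; csv handles quoting of commas and quotes."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def scrape_table(url: str, css_selector: str = "table") -> List[List[str]]:
    """Open url in Chrome and return the visible text of each table cell, row by row."""
    # Selenium is imported here so rows_to_csv can be used without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        table = browser.find_element(By.CSS_SELECTOR, css_selector)
        rows = []
        for tr in table.find_elements(By.TAG_NAME, "tr"):
            # Header cells (th) and data cells (td) are both collected, in page order.
            cells = tr.find_elements(By.CSS_SELECTOR, "th, td")
            rows.append([cell.text for cell in cells])
        return rows
    finally:
        # Close the browser whether or not the table was found.
        browser.quit()
```

To use it, pass your page's URL to scrape_table. Then open a file such as table.csv with newline='' and write the result of rows_to_csv(rows) to it. Keeping the CSV conversion separate from the browser code also makes that part easy to test without opening Chrome.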

Make sure to check out the Selenium docs for more information!
