Intro
For a recent job interview I was tasked with scraping a web table and converting the data into a csv file. To get started I first searched around for the proper tools for the job. I knew I would need to install Selenium with pip install -U selenium, and from reading the docs: 'Selenium requires a driver to interface with the chosen browser.' For this I chose the Chrome webdriver. Once it's downloaded, make sure it's in your PATH, e.g., place it in /usr/bin or /usr/local/bin; if you do not do this, Chrome will fail to open.
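If you'd rather not touch your PATH, newer versions of Selenium also let you hand it the driver location directly. Here's a minimal sketch, assuming a Selenium 4 install (which pip install -U selenium gives you) and that you saved the driver as /usr/local/bin/chromedriver (adjust the path for your system):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# point Selenium at the downloaded chromedriver binary
# (the path below is an assumption; use wherever you saved it)
service = Service('/usr/local/bin/chromedriver')
browser = webdriver.Chrome(service=service)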
Learning to crawl
To get started, import webdriver from selenium, set a variable equal to webdriver.Chrome(), and then call variable.get('url here'):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()  # launches Chrome via the chromedriver on your PATH
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title  # sanity check that the page actually loaded

elem = browser.find_element(By.NAME, 'p')  # find the search box by its name attribute
elem.send_keys('seleniumhq' + Keys.RETURN)  # type a query and press Enter

browser.quit()  # close the browser and end the session
Congrats, you just learned to crawl! In my next blog post I will go over using pandas to turn a web table into a dataframe and then a csv.
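As a quick preview (a minimal sketch, not the full walkthrough), pandas can already do most of the heavy lifting here: pd.read_html parses every table in a chunk of HTML into a list of DataFrames. This assumes the table you want is the first one on the page, that browser is the webdriver instance from the example above, and that you have a parser like lxml installed:

import pandas as pd

# parse every <table> in the rendered page into a list of DataFrames
tables = pd.read_html(browser.page_source)

# assume the table we want is the first one, and write it out as csv
tables[0].to_csv('table.csv', index=False)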
Make sure to check out the Selenium docs for more information!