Hello, noobs,
So this is my second post in this forum, and in this series we are going to learn how to scrape data from a website using Python, beautifulsoup4 and requests.
I will presume that you are using a Unix-based OS for the purposes of this tutorial.
Now on to the yummy part. We will start by creating a new directory.
cd Desktop && mkdir webscraping && cd webscraping
While still in your terminal, create a virtual environment and activate it.
python3 -m virtualenv venv && source venv/bin/activate
Now we install the libraries we need.
pip install requests beautifulsoup4 pandas
Now we are all set. In the webscraping folder that we created earlier, create a file named app.py.
touch app.py
In the file, import the modules that we need.
from bs4 import BeautifulSoup
import requests
import pandas as pd
We will use century21 as our data source for this tutorial. We will now request the URL.
url = "https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?ty=0"
response = requests.get(url)
c = response.content
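If you want the request to fail loudly instead of silently handing back an error page, a slightly more defensive version could look like the sketch below. The fetch_html name, the 10-second timeout and the User-Agent string are my own assumptions for illustration; nothing here is required by the site.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw page body, raising an exception on HTTP errors."""
    # Hypothetical User-Agent value, chosen only for this example.
    headers = {"User-Agent": "Mozilla/5.0 (tutorial scraper)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.content
```

You would then write c = fetch_html(url) in place of the two lines above.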
We can now start our webscraping part.
soup = BeautifulSoup(c,'html.parser')
listings = soup.find_all('div', {'class': 'infinite-item'})
l = []
for i in listings:
    # Start a fresh dict for each listing so rows do not overwrite each other
    d = {}
    price = i.find('a', {'class': 'listing-price'}, text=True)
    if price:
        d['price'] = price.get_text().strip()
    beds = i.find('div', {'class': 'property-beds'})
    if beds:
        d['beds'] = beds.get_text().strip()
    bath = i.find('div', {'class': 'property-baths'})
    if bath:
        d['bath'] = bath.get_text().strip()
    address = i.find('div', {'class': 'property-address'})
    if address:
        d['address'] = address.get_text().strip()
    address_city = i.find('div', {'class': 'property-address-city'})
    if address_city:
        d['address_city'] = address_city.get_text().strip()
    l.append(d)
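The find / if / get_text pattern repeats for every field, so it can be folded into a small helper. Below is a minimal sketch of that idea, run against a made-up HTML snippet so you can try it without hitting the site. The text_or_none and parse_listings names, the FIELDS table and the sample HTML are all my own assumptions, and the real page markup may differ. Unlike the loop above, a missing field is stored as None instead of being left out.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the class names used above; the real page may differ.
SAMPLE_HTML = """
<div class="infinite-item">
  <a class="listing-price">$100,000</a>
  <div class="property-beds">3 beds</div>
  <div class="property-address">1 Example St</div>
</div>
<div class="infinite-item">
  <a class="listing-price">$250,000</a>
  <div class="property-baths">2 baths</div>
  <div class="property-address">2 Sample Ave</div>
</div>
"""

# Output key -> (tag name, CSS class), matching the fields scraped above.
FIELDS = {
    "price": ("a", "listing-price"),
    "beds": ("div", "property-beds"),
    "bath": ("div", "property-baths"),
    "address": ("div", "property-address"),
    "address_city": ("div", "property-address-city"),
}


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first matching child, or None."""
    node = parent.find(tag, {"class": css_class})
    return node.get_text().strip() if node else None


def parse_listings(html):
    """Build one dict per listing found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", {"class": "infinite-item"}):
        # A new dict per listing, so rows never share state.
        row = {key: text_or_none(item, tag, cls)
               for key, (tag, cls) in FIELDS.items()}
        rows.append(row)
    return rows


rows = parse_listings(SAMPLE_HTML)
```

Building a new dict on every pass matters: if one dict is created once outside the loop and appended repeatedly, every entry in the list points to the same object and ends up holding the last listing's values.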
We can now save the data in a pandas DataFrame and a CSV file.
df = pd.DataFrame(l)
df.to_csv('Output.csv', mode='a', header=False)
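One thing to keep in mind with mode='a': every run adds to the same file, so the header row is either repeated on each run or never written at all, depending on the header argument. A minimal sketch of one way around that is to write the header only when the file does not exist yet. The append_rows name and the sample rows are my own assumptions, and I also pass index=False to drop the row numbers, which the original call keeps.

```python
import os
import tempfile

import pandas as pd


def append_rows(rows, path):
    """Append rows to a CSV, writing the header only when the file is new."""
    file_exists = os.path.exists(path)
    df = pd.DataFrame(rows)
    df.to_csv(path, mode="a", header=not file_exists, index=False)


# Demo in a temporary directory; the rows are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "Output.csv")
    append_rows([{"price": "$100,000", "beds": "3 beds"}], out_path)
    append_rows([{"price": "$250,000", "beds": "4 beds"}], out_path)
    with open(out_path) as f:
        lines = f.read().splitlines()
```

After the two calls, lines holds a single header row followed by one row per listing.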
Hooray... You now have a working webscraping script.
You can follow me on GitHub https://github.com/manulangat1
Corrections and criticism are welcome and appreciated.
See you next time...