DEV Community

manulangat1

Web scraping using Python, requests and Beautiful Soup

Hello, noobs,
This is my second post on this forum. In this series we are going to learn how to scrape data from a website using Python, beautifulsoup4 and requests.

I will presume that you are using a Unix-based OS for the purposes of this tutorial.
Now on to the yummy part. We will start by creating a new directory.

   cd Desktop && mkdir webscraping && cd webscraping

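While still in your terminal, create a virtual environment and activate it. The command below needs the virtualenv package to be installed; if you do not have it, Python's built-in venv module (python3 -m venv venv) does the same job.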

   python3 -m virtualenv venv && source venv/bin/activate

Now we install the libraries we need.

   pip install requests beautifulsoup4 pandas

Now we are all set. In the webscraping folder that we created earlier, create a file named app.py.

   touch app.py

In the file, import the modules that we need.

from bs4 import BeautifulSoup
import requests
import pandas as pd

We will use Century 21 as our data source for this tutorial. We will now request the URL and keep the body of the response.

url = "https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?ty=0"

response = requests.get(url)
c = response.content
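The request above assumes the server answers quickly and with a successful status. A slightly more defensive version could look like the sketch below. The fetch_html helper, the 10-second timeout and the User-Agent string are my own assumptions for illustration, not something the site is known to require.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the raw page body, raising if the request fails.

    fetch_html is a hypothetical helper; the timeout and User-Agent
    values are illustrative assumptions.
    """
    http = session or requests  # anything exposing .get() works here
    headers = {"User-Agent": "webscraping-tutorial/0.1"}
    response = http.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

With this in place, c = fetch_html(url) would replace the two lines above, and a failed request stops the script with an error instead of producing an empty result.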

We can now start the actual web scraping.

soup = BeautifulSoup(c, 'html.parser')
# Each listing on the page sits in a div with the class "infinite-item".
listings = soup.find_all('div', {'class': 'infinite-item'})

l = []
for i in listings:
    d = {}  # new dict per listing; reusing one dict would make every row identical
    # string=True keeps only tags that directly contain text
    price = i.find('a', {'class': 'listing-price'}, string=True)
    if price:
        d['price'] = price.get_text().strip()
    beds = i.find('div', {'class': 'property-beds'})
    if beds:
        d['beds'] = beds.get_text().strip()
    bath = i.find('div', {'class': 'property-baths'})
    if bath:
        d['bath'] = bath.get_text().strip()

    address = i.find('div', {'class': 'property-address'})
    if address:
        d['address'] = address.get_text().strip()

    address_city = i.find('div', {'class': 'property-address-city'})
    if address_city:
        d['address_city'] = address_city.get_text().strip()
    l.append(d)
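To try the parsing logic without hitting the live site, the same kind of calls can be run against a small inline HTML string. The markup below is made up for illustration and only imitates the class names used above; the real page may be structured differently. The parse_listings helper and the FIELDS table are hypothetical names. Note that the sketch builds a new dictionary for each listing, since appending one shared dictionary would leave the list full of identical rows.

```python
from bs4 import BeautifulSoup

# Made-up markup that only imitates the class names used in this post.
SAMPLE_HTML = """
<div class="infinite-item">
  <a class="listing-price">$100,000</a>
  <div class="property-beds">3 beds</div>
</div>
<div class="infinite-item">
  <a class="listing-price">$250,000</a>
  <div class="property-address">12 Example St</div>
</div>
"""

# Output key -> (tag name, CSS class) to look for inside each listing.
FIELDS = {
    "price": ("a", "listing-price"),
    "beds": ("div", "property-beds"),
    "address": ("div", "property-address"),
}


def parse_listings(html):
    """Return one dict per listing, holding only the fields that were found."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", {"class": "infinite-item"}):
        row = {}  # fresh dict for every listing
        for key, (tag, css_class) in FIELDS.items():
            node = item.find(tag, {"class": css_class})
            if node:
                row[key] = node.get_text().strip()
        rows.append(row)
    return rows


rows = parse_listings(SAMPLE_HTML)
print(rows)
```

A listing that lacks a field simply has no such key in its dictionary, which pandas later turns into an empty cell.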

We can now load the data into a pandas DataFrame and save it to a CSV file.

df = pd.DataFrame(l)
# mode='a' appends to Output.csv on every run; header=False leaves out the column names
df.to_csv('Output.csv', mode='a', header=False)
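Because mode='a' appends on every run, writing the header each time would repeat the column names in the middle of the file, while never writing it leaves the file without column names at all. One way around this, assuming you want the header exactly once, is to write it only when the file does not exist yet. The append_rows helper is a hypothetical name, and index=False is my own choice to leave out the row numbers.

```python
import os

import pandas as pd


def append_rows(rows, path):
    """Append rows to a CSV file, writing the header only for a new file."""
    df = pd.DataFrame(rows)
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", header=write_header, index=False)
```

This assumes every run produces the same columns in the same order; otherwise the appended rows will not line up with the header.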

Hooray... You now have a working web scraping script.
You can follow me on GitHub: https://github.com/manulangat1

Corrections and criticism are welcome and appreciated.
See you next time...
