Wulfi

Extracting data from a website using BeautifulSoup

There are mainly two ways to extract data from a website:

  • Use an API (if one is available) to retrieve the data.

  • Access the HTML of the webpage and extract useful information/data from it.

In this article, we will extract Billboard magazine's Year-End Hot 100 singles of 1970 from the Wikipedia page "Billboard Year-End Hot 100 singles of 1970".

Task:

  • Scrape the web page and extract all 100 songs with their artists.
  • Create a Python dictionary whose keys are the titles of the singles and whose values are lists of artists.

Installation
We need to install requests and bs4. The requests module lets you send HTTP requests from Python. Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files.

pip install requests
pip install bs4

Import the libraries

import requests
from bs4 import BeautifulSoup
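
Before touching the real page, it can help to try Beautiful Soup on a tiny inline snippet. The sketch below is only an illustration: the HTML string, the link target and the title in it are made up. It also shows why `.string` works on a tag that holds a single piece of text but returns `None` on a tag with several children.

```python
from bs4 import BeautifulSoup

# Made-up HTML snippet, used only for illustration
snippet = '<p>Top song: <a href="/wiki/Example">Example Title</a></p>'
soup = BeautifulSoup(snippet, "html.parser")  # Python's built-in HTML parser

link = soup.find("a")      # first <a> tag in the document
print(link.string)         # Example Title
print(link["href"])        # /wiki/Example

# <p> contains both plain text and an <a> tag, so .string is None here;
# get_text() joins all the text inside the tag instead.
print(soup.p.string)       # None
print(soup.p.get_text())   # Top song: Example Title
```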

Sending request

url = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_1970"
response = requests.get(url)
print(response.url)  # the URL that was fetched
print(response.status_code)  # HTTP status code; 200 means the request succeeded
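
The request above assumes the download succeeds. As a minimal sketch, the fetch could be wrapped in a small helper that fails loudly on an HTTP error. The helper name `fetch_html` and the 10-second timeout are assumptions made for this example, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML text of a page (hypothetical helper for this article)."""
    # timeout stops the call from hanging forever if the server does not answer
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the `url` defined earlier would return the same text as `response.text`, or raise an exception instead of silently passing an error page on to the parser.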
songSoup = BeautifulSoup(response.text, "html.parser")  # parse the HTML with Python's built-in parser

data_dictionary = {}

# find_all('tr') returns every table row on the page. Index 0 is the table header,
# so the slice [1:101] covers indices 1 to 100, i.e. the 100 song rows.
for song in songSoup.find_all('tr')[1:101]:
  # Uncomment to print each raw table row
  # print(song)

  links = song.find_all('a')
  title = links[0].string   # first link in the row: the song title
  artist = links[1].string  # second link in the row: the first listed artist

  # Print the title and artist
  print(title, ',', artist)

  # Store the artist in a list, keyed by the song title
  data_dictionary[title] = [artist]

print(data_dictionary)
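
The loop above stores only the second link of each row, so a song credited to several artists ends up with a single name. The sketch below shows one possible way to collect every artist link into the list. It runs on a small made-up table (`SAMPLE_HTML`, with placeholder song and artist names) rather than the live page, and the helper name `rows_to_dictionary` is an assumption for this example. The real page's markup may differ, so treat it as a starting point.

```python
from bs4 import BeautifulSoup

# Made-up, simplified stand-in for the chart table (placeholder names only)
SAMPLE_HTML = """
<table>
  <tr><th>No.</th><th>Title</th><th>Artist(s)</th></tr>
  <tr><td>1</td><td><a href="/wiki/A">Song A</a></td>
      <td><a href="/wiki/X">Artist X</a></td></tr>
  <tr><td>2</td><td><a href="/wiki/B">Song B</a></td>
      <td><a href="/wiki/Y">Artist Y</a> and <a href="/wiki/Z">Artist Z</a></td></tr>
</table>
"""


def rows_to_dictionary(html):
    """Map each song title to a list of all artists linked in its row."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for row in soup.find_all("tr")[1:]:  # skip the header row
        links = row.find_all("a")
        if len(links) < 2:
            continue  # not enough links for a title and an artist; skip the row
        title = links[0].get_text(strip=True)
        # every link after the title is treated as an artist
        result[title] = [link.get_text(strip=True) for link in links[1:]]
    return result


songs = rows_to_dictionary(SAMPLE_HTML)
print(songs)
# {'Song A': ['Artist X'], 'Song B': ['Artist Y', 'Artist Z']}
```

Skipping rows with fewer than two links avoids an IndexError on rows that do not follow the expected layout, at the cost of silently dropping them.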
