First of all, make sure to import these libraries at the beginning of your Python script:
import requests
from bs4 import BeautifulSoup
For this tutorial, I'll be using BBC News as my news source. Use these two lines of code to request its page:
url = 'https://www.bbc.com/news'
response = requests.get(url)
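If you want the request step to fail loudly instead of silently handing back an error page, here is a minimal sketch. The helper name `fetch_html` and the 10-second timeout are my own assumptions, not part of the original two lines; `timeout` and `raise_for_status()` are standard parts of the requests library.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string.

    fetch_html is a hypothetical helper; the 10-second timeout is an
    arbitrary choice for illustration.
    """
    # Without a timeout, requests can wait indefinitely for a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://www.bbc.com/news')` should then give you the same text as `response.text` above, or raise an exception if the request fails.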
Now we're ready to scrape using BeautifulSoup!
Head over to BBC News and inspect a news headline by right-clicking it and choosing Inspect.
As you'll see, each news headline is contained within an "h3" tag.
Now add these 4 lines of code to scrape and display all the h3 tags from BBC news:
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
for x in headlines:
    print(x.text.strip())
- First, we define "soup" as the parsed HTML of the BBC News webpage, built from the response text with Python's built-in "html.parser".
- Next, we define "headlines" as a list of all h3 tags found within the body of the webpage.
- Finally, we loop through the "headlines" list and print each headline. The "text" attribute returns the text of the tag without its HTML markup, and "strip()" removes any leading and trailing whitespace.
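To try these three steps without touching the network, here is a small sketch that runs them on a hard-coded HTML snippet. The snippet and the function name `extract_headlines` are made up for illustration, and the real BBC markup is more complex. The sketch also calls `find_all` on the soup directly instead of going through `find('body')`, because "html.parser" does not add a missing body tag and `find('body')` would then return None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; not real BBC markup.
SAMPLE_HTML = """
<html><body>
  <h3>  First headline </h3>
  <p>Not a headline</p>
  <h3><a href="/news/1">Second <b>headline</b></a></h3>
</body></html>
"""


def extract_headlines(html):
    """Return the stripped text of every h3 tag in the given HTML."""
    # Step 1: parse the raw HTML into a searchable tree.
    soup = BeautifulSoup(html, "html.parser")
    # Step 2: collect every h3 tag in the document.
    headlines = soup.find_all("h3")
    # Step 3: keep only the text of each tag, minus surrounding whitespace.
    return [tag.text.strip() for tag in headlines]


print(extract_headlines(SAMPLE_HTML))
# prints ['First headline', 'Second headline']
```

Notice that the nested "a" and "b" tags inside the second h3 disappear from the output: `text` joins all the text inside a tag and drops the markup.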
If you're a beginner who likes discovering new things about Python, try my weekly Python newsletter.