Scrape news headlines with Python in <10 lines of code!

Today I'll show you how to scrape news headlines with Python in under 10 lines of code!

Let's get started...

First of all, import these libraries at the beginning of your Python script:

import requests
from bs4 import BeautifulSoup

For this tutorial, I'll be using BBC News as my news source. Use these 2 lines of code to fetch its page:

url = 'https://www.bbc.com/news'
response = requests.get(url)
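The request above assumes the page loads without problems. Here is a slightly more defensive sketch, in case you want the script to fail loudly instead of parsing an error page. The helper name fetch_html and the 10-second timeout are my own illustrative choices, not part of the original script:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of `url` as a string.

    `fetch_html` and the 10-second default timeout are hypothetical,
    illustrative choices; adjust them to your needs.
    """
    # Give up if the server takes longer than `timeout` seconds to respond
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status code
    response.raise_for_status()
    return response.text
```

You could then pass the string returned by fetch_html(url) to BeautifulSoup in place of response.text.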

Now we're ready to scrape using BeautifulSoup!

Head over to BBC News and inspect a news headline by right-clicking it and choosing "Inspect".
As you'll see, all news headlines are contained within an "h3" tag:
[Image: h3 tags]

Now add these 4 lines of code to scrape and display all the h3 tags from BBC news:

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
for x in headlines:
    print(x.text.strip())
  • First, we define "soup" as the parsed HTML of the BBC News webpage.
  • Next, we define "headlines" as a list of all h3 tags found within the page's body.
  • Finally, we loop through the "headlines" list and print each headline, using ".text.strip()" to drop the surrounding HTML tags and any leading or trailing whitespace.
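To see what these steps do without depending on the live site, here is a small sketch that runs the same calls on a made-up HTML snippet. The sample_html string is invented for illustration and is not real BBC markup:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a news page (not real BBC markup)
sample_html = """
<html><body>
  <h3>  First headline </h3>
  <p>Not a headline</p>
  <h3><a href="/story"><span>Second headline</span></a></h3>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# .text collects the text inside each h3 (including nested tags);
# .strip() removes leading and trailing whitespace
headlines = [h3.text.strip() for h3 in soup.find("body").find_all("h3")]

for headline in headlines:
    print(headline)
# prints:
# First headline
# Second headline
```

Note that the paragraph is skipped because only h3 tags are selected, and the nested link and span tags in the second headline are dropped while their text is kept.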

Now if you run your script, your output should look something like this:
[Image: h3 results]

If you're a beginner who likes discovering new things about Python, try my weekly Python newsletter:

Python Explore

