If you haven't read this tutorial explaining how to scrape news headlines in Python, make sure you do.
In summary, here's the code for scraping news headlines in Python:
import requests
from bs4 import BeautifulSoup

url = 'https://www.bbc.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
for x in headlines:
    print(x.text.strip())
To create a wordcloud out of these news headlines, first import these two libraries in addition to the ones needed to scrape our news source:
import requests
from bs4 import BeautifulSoup
from wordcloud import WordCloud  # add wordcloud
import matplotlib.pyplot as plt  # add pyplot from matplotlib
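Next, replace the loop that prints each headline:
for x in headlines:
    print(x.text.strip())
with one that collects the headlines into a single string:
h3text = ''
for x in headlines:
    h3text = h3text + ' ' + x.text.strip()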
- This first defines the "h3text" string, then adds every news headline to the string, separated by spaces.
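As an aside, the same string can be built in one step with str.join. This is only a minimal sketch: the sample headlines below are made-up stand-ins for the scraped h3 text, so it runs without fetching anything. Unlike the loop above, it leaves no leading space at the start of the string.

```python
# Made-up sample headlines standing in for the scraped <h3> text (assumption).
sample_headlines = [
    "  Markets rally after rate cut ",
    "Storm hits coast",
    "Markets close higher",
]

# Strip each headline and join them all with single spaces.
h3text = " ".join(h.strip() for h in sample_headlines)

print(h3text)
```

With the real page, you would loop over x.text for each x in headlines instead of the sample list.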
Before we make the wordcloud, you can check the news headlines by using print(h3text). Then create and display the wordcloud:
# Build the wordcloud from the headline string collected above
wordcloud = WordCloud(width=500, height=500, margin=0).generate(h3text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Let me explain...
- First, create a wordcloud (well, more like a box in this case) sized 500 by 500 pixels.
- Next, our wordcloud is drawn using "plt.imshow()" (interpolation='bilinear' just makes the words in the wordcloud easier to read).
- plt.axis("off") and plt.margins(x=0, y=0) make sure our wordcloud isn't displayed as a graph.
- Finally, our wordcloud is displayed using "plt.show()".
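A wordcloud draws more frequent words larger, so a quick word count gives a rough preview of what will stand out. The sketch below uses only the standard library. The top_words helper and the sample text are hypothetical, and the count is an approximation: by default WordCloud also filters common stopwords and can group word pairs, which this does not do.

```python
from collections import Counter
import re


def top_words(text, n=5):
    """Return the n most frequent lowercase words in text with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Made-up sample text standing in for the h3text string (assumption).
sample_text = "Markets rally after rate cut Storm hits coast Markets close higher"
print(top_words(sample_text, 3))
```

Passing your own h3text to top_words shows which words are likely to be largest in the image.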