
Code_Jedi


Sentiment Analysis With Python. Making Your First Sentiment Analysis Script.

Do you want to perform sentiment analysis with Python but don't know how to get started? Not to worry. In this article, I'll demonstrate and explain how you can make your own sentiment analysis app, even if you are new to Python.

What Exactly Is Sentiment Analysis?

If you've been following programming and data science, you're probably familiar with sentiment analysis. If you're not, here's the definition:

The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.

Sentiment analysis programs have become increasingly popular in the tech world. It's time you make one for yourself!

Making A Simple Sentiment Analysis Script

Let's make a simple sentiment analysis script with Python. What will it do?
It will:

  1. Scrape news headlines from BBC news.
  2. Get rid of unwanted scraped elements and duplicates.
  3. Scan every headline for words that may indicate its sentiment.
  4. Based on the found words, determine each headline's sentiment.
  5. Sort the headlines into different lists based on their sentiment.
  6. Print the number of scraped headlines and number of headlines with a positive, negative and neutral sentiment.

Setup

Create a new Python file with your favorite text editor. You can name it whatever you want, but I'll name the file main.py for this tutorial.
Before writing the main code, make sure to install (if not already installed) and import the following libraries. Note that BeautifulSoup is published on PyPI as beautifulsoup4.

import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np

The Dataset

This sentiment analysis script needs a dataset of words to match against. Nothing is trained here: the script simply looks up each word in the headlines.
Here's the dataset that I made for this script. I've tested it and found it to work well.
To work with this tutorial, make sure to download this dataset, move it into your Python file's directory and add the following code to your Python file.

df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']

If you take a look at this dataset, you'll notice that it's just over 100 lines long. Each line contains an index number, a 1 or 0, and a word.
The number gives the Python file a way to loop through the words, the word is what is going to indicate a headline's sentiment, and the 1 or 0 indicates whether the word has negative (0) or positive (1) sentiment.
This isn't a lot, but it is enough to get a reasonable sentiment reading on news headlines, which are typically only about 6-10 words long.
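To make that layout concrete, here is a minimal sketch that parses a three-row sample with the same shape. The column names n, sentiment and word are inferred from how the script reads the file (df['n'], df['sentiment'], df['word']). The sample rows are invented for illustration and are not taken from the real dataset. It uses the standard library's csv module instead of pandas so it runs without any file on disk.

```python
import csv
import io

# Hypothetical sample rows; the real sentiment.csv is not reproduced here.
SAMPLE_CSV = """n,sentiment,word
0,0,crisis
1,1,win
2,0,attack
"""

def load_lexicon(text):
    """Return a list of (word, sentiment) pairs, where sentiment is 0 or 1."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["word"].strip().lower(), int(row["sentiment"])) for row in reader]

lexicon = load_lexicon(SAMPLE_CSV)
print(lexicon)  # [('crisis', 0), ('win', 1), ('attack', 0)]
```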

Scraping The News Headlines

Here's the code that is going to scrape the news headlines:

url='https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')

As this is not a web scraping tutorial, you don't have to understand what's happening here. In case you are interested in how this works, here's a tutorial on how to scrape news headlines with Python in <10 lines of code.

Before performing sentiment analysis on the scraped headlines, extend the scraping code with the two new lines at the bottom of the following block (unwanted and news).

url='https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []

The unwanted list contains text that gets scraped from BBC News but isn't a news headline. The news list will hold the cleaned headlines.

Full Code:

import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np

df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']

url='https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []

Performing Sentiment Analysis

It's time to write the code which will perform sentiment analysis on the scraped headlines.
Add the following code to your Python file.

neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())

Here's what this code does:

  1. First, it defines the neutral, bad and good lists.
  2. While looping over every scraped headline element, it checks that the headline's text is in neither the unwanted list nor the news list.
  3. If that check passes, it appends the headline to the news list.

It checks the unwanted list to exclude non-headline elements, and it checks the news list to prevent duplicate headlines from being analyzed more than once.
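The same filtering step can be tried offline on plain strings. This is only a sketch: clean_headlines is a hypothetical helper name, and the sample strings are invented rather than scraped.

```python
def clean_headlines(raw_headlines, unwanted):
    """Strip whitespace, drop unwanted entries, and keep the first copy of each headline."""
    news = []
    for text in raw_headlines:
        text = text.strip()
        # Same two checks as the loop in the tutorial
        if text not in unwanted and text not in news:
            news.append(text)
    return news

# Invented sample input standing in for the scraped h3 text
raw = ["  Markets rally after deal ", "Mobile app", "Markets rally after deal", "Storm hits coast"]
unwanted = ["Mobile app", "Get in touch"]

print(clean_headlines(raw, unwanted))  # ['Markets rally after deal', 'Storm hits coast']
```

Because news is a list, each "not in" check scans it from the start. That is fine for a few dozen headlines.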

Full Code:

import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np

df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']

url='https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())

Now, let's perform sentiment analysis on the news headlines by adding the following code inside the if x.text.strip() not in unwanted and x.text.strip() not in news: condition, directly below the news.append(...) line and at the same indentation level.

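    for i in range(len(df['n'])):
        # Look for the dataset word inside the lowercased headline
        if sen[i] in x.text.strip().lower():
            if cat[i] == 0:
                bad.append(x.text.strip().lower())
            else:
                good.append(x.text.strip().lower())
            # Stop at the first match so each headline is counted only once
            break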


Here's what this code does:

  1. First, the for i in range(len(df['n'])): loop checks the headline against every word in the sentiment.csv dataset.
  2. If a word from the dataset is found in the headline by the if sen[i] in x.text.strip().lower(): condition, the if cat[i] == 0: condition checks whether that word has negative or positive sentiment and adds the headline to the bad or good list accordingly.

The lower() method converts all the letters inside the headlines to lowercase. This is done because the word search is case sensitive.
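One detail of this search is that Python's in operator on strings matches substrings, so a dataset word can also be found inside a longer word. The sketch below contrasts that behavior with a whole-word check. The two lexicon entries and both function names are hypothetical and chosen only for illustration; they are not taken from sentiment.csv.

```python
import re

# Hypothetical lexicon entries: (word, sentiment), 0 = negative, 1 = positive
LEXICON = [("war", 0), ("win", 1)]

def classify_substring(headline, lexicon):
    """Mirror the tutorial's check: first dataset word found anywhere in the text wins."""
    text = headline.strip().lower()
    for word, sentiment in lexicon:
        if word in text:
            return "good" if sentiment == 1 else "bad"
    return "neutral"

def classify_whole_word(headline, lexicon):
    """Alternative: only match dataset words that appear as complete tokens."""
    tokens = set(re.findall(r"[a-z']+", headline.lower()))
    for word, sentiment in lexicon:
        if word in tokens:
            return "good" if sentiment == 1 else "bad"
    return "neutral"

headline = "Actor receives award"
print(classify_substring(headline, LEXICON))   # bad ("war" is inside "award")
print(classify_whole_word(headline, LEXICON))  # neutral
```

Neither variant is strictly better: the whole-word check avoids the "award" false match, but it also misses inflected forms such as "wins", which the substring check catches.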

Full Code:

import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np

df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']

url='https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())
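        for i in range(len(df['n'])):
            # Look for the dataset word inside the lowercased headline
            if sen[i] in x.text.strip().lower():
                if cat[i] == 0:
                    bad.append(x.text.strip().lower())
                else:
                    good.append(x.text.strip().lower())
                # Stop at the first match so each headline is counted only once
                break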

There's one thing left to do.

Add the following code to the end of your Python file.

badp = len(bad)
goodp = len(good)
nep = len(news) - (badp + goodp)
print('Scraped headlines: '+ str(len(news)))
print('Headlines with negative sentiment: ' + str(badp) + '\nHeadlines with positive sentiment: ' + str(goodp) + '\nHeadlines with neutral sentiment: ' + str(nep))

This will print the number of scraped headlines and the number of headlines with a bad, good and neutral sentiment.
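The neutral count is computed by subtraction, which only works if each headline ends up in at most one of the bad and good lists. As a rough sketch of the same tally under that assumption, the snippet below counts one label per headline. The tally helper and the sample labels are hypothetical.

```python
from collections import Counter

def tally(labels):
    """Summarize a list holding exactly one label per headline."""
    counts = Counter(labels)
    return {
        "scraped": len(labels),
        "negative": counts["bad"],
        "positive": counts["good"],
        "neutral": counts["neutral"],
    }

# Invented labels, one per headline
labels = ["bad", "good", "neutral", "bad", "neutral"]
summary = tally(labels)
print(summary)  # {'scraped': 5, 'negative': 2, 'positive': 1, 'neutral': 2}
```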

The End Result

Here's the full sentiment analysis code:

import requests
import pandas
from bs4 import BeautifulSoup
import numpy as np

df = pandas.read_csv('sentiment.csv')
sen = df['word']
cat = df['sentiment']

url='https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']
news = []
neutral = []
bad = []
good = []
for x in headlines:
    if x.text.strip() not in unwanted and x.text.strip() not in news:
        news.append(x.text.strip())
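        for i in range(len(df['n'])):
            # Look for the dataset word inside the lowercased headline
            if sen[i] in x.text.strip().lower():
                if cat[i] == 0:
                    bad.append(x.text.strip().lower())
                else:
                    good.append(x.text.strip().lower())
                # Stop at the first match so each headline is counted only once
                break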

badp = len(bad)
goodp = len(good)
nep = len(news) - (badp + goodp)
print('Scraped headlines: '+ str(len(news)))
print('Headlines with negative sentiment: ' + str(badp) + '\nHeadlines with positive sentiment: ' + str(goodp) + '\nHeadlines with neutral sentiment: ' + str(nep))

Now, if you run your Python file containing the above code, you will see output similar to the one below.

Output


Conclusion

I hope that this tutorial has successfully demonstrated how you can perform sentiment analysis with Python.

Byeeee👋

Discussion (1)

GOKAY BURUC

Any NLP Library could be useful instead of empty lists. Especially spacy or textblob libraries have sentiment analysis functions.