DEV Community

Brendah Achieng


Getting Started With Sentiment Analysis

Sentiment analysis (opinion mining) is a natural language processing (NLP) technique that identifies the intent or emotion behind a given text or speech.
Any written or spoken language carries a sentiment, which can be negative, positive or neutral.

Sentiment analysis helps automate the processing of large amounts of data in real time. It can be used for customer feedback, survey responses, social media monitoring, reputation management, customer experience and product reviews. Businesses can then make decisions based on how people react to a given commodity.

Sentiment analysis is fast becoming an essential tool for understanding the sentiment behind all types of data. Being able to automatically understand the responses of over 5000 customers to a survey is a great gain for a business.

Importance of sentiment analysis

  • Sorting large amounts of data: Manually sorting through thousands of tweets or customer survey responses is very tedious. Sentiment analysis helps analyse large amounts of unstructured data within a short period of time.

  • Real-time analysis: Sentiment analysis models can detect urgent or critical issues in real time. For example, an angry customer who needs immediate attention can be identified right away and the situation dealt with.

  • Consistent criteria: Using a centralized sentiment analysis model applies the same standard to every piece of data. Manual interpretations can be biased, because people are influenced by their own experiences, beliefs and thoughts.

How Does Sentiment Analysis Work?

Using machine learning and natural language processing, sentiment analysis can determine whether a text is neutral, positive or negative.

The main approaches to sentiment analysis are:

1. Rule-based Sentiment Analysis

A set of manually created rules is used for the analysis, together with NLP techniques such as lexicons (lists of words), stemming, tokenization and parsing.

Lexicons - Lists of negative and positive words that are created up front and later used to describe the sentiment.
Tokenization - Breaking a text or a sentence into smaller pieces called tokens.

Basic example of how a rule-based system works:

Two lists of polarized words are defined: negative words such as bad and ugly, and positive words such as best and beautiful.

The text is then prepared, processed and formatted so that the machine can analyze it easily. Tokenization and lemmatization occur here.

The computer then counts the number of words in the text that are classified as negative and the number classified as positive.

The overall sentiment score of the text is then calculated on a given scale, such as -100 to 100. If the text contains more positive words than negative words, the system returns a positive sentiment, and vice versa. If the counts are equal, the system returns a neutral sentiment.
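The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production system: the word lists, the function names and the exact scoring formula are assumptions made for the example, and lemmatization is left out to keep it short.

```python
import re

# Hypothetical toy lexicons; a real system would use much larger curated lists.
POSITIVE_WORDS = {"good", "best", "beautiful", "great", "love"}
NEGATIVE_WORDS = {"bad", "ugly", "worst", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return a score from -100 (only negative words) to 100 (only positive words)."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = positives + negatives
    if matched == 0:
        return 0  # no polarized words were found
    return round(100 * (positives - negatives) / matched)


def classify(text):
    """Map the numeric score to a positive, negative or neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The best and most beautiful phone"))          # positive
print(classify("Bad screen, ugly design, but good battery"))  # negative
print(classify("It arrived on Tuesday"))                      # neutral
```

The second sentence contains two negative words and one positive word, so its score is -33 and the label is negative.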

Disadvantages of Rule-based sentiment analysis

It is limited because it looks at individual words rather than whole sentences. Human language is complicated, so the real emotion can sometimes be missed. For example, a word-counting system would score "not good" as positive.

2. Automated or Machine Learning Sentiment Analysis

This approach uses machine learning techniques. A model is trained on a given dataset to classify sentiment based on the words in a text and their order. The quality of the results depends on the quality of the training dataset.

Step 1: Feature Extraction

The text data is prepared here. Techniques such as tokenization, lemmatization, stopword removal and vectorization are used to make the text ready for classification by the model. Vectorization turns the text into numbers; it can be done with simple word counts or with embeddings learned by deep learning models.
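As a rough sketch of this step, the snippet below tokenizes a text, removes stopwords and builds a bag-of-words count vector. Bag-of-words is used here as a simpler stand-in for learned embeddings, lemmatization is skipped, and the stopword list and function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "to", "of"}


def preprocess(text):
    """Tokenize, lowercase, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index."""
    tokens = sorted({token for doc in documents for token in preprocess(doc)})
    return {token: index for index, token in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn a text into a list of word counts, one entry per vocabulary word."""
    counts = Counter(preprocess(text))
    return [counts.get(token, 0) for token in vocabulary]


documents = ["The food was great", "The service was bad and slow"]
vocabulary = build_vocabulary(documents)
print(list(vocabulary))  # ['bad', 'food', 'great', 'service', 'slow']
print(vectorize("Great food, great service", vocabulary))  # [0, 1, 2, 1, 0]
```

Each position in the vector corresponds to one vocabulary word, so every text ends up with the same length regardless of how many words it contains.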

Step 2: Training

A sentiment-labelled training dataset is used to train the algorithm. The dataset is either labelled manually or generated from reviews.

Step 3: Predictions

New text is fed into the trained model, which predicts a label for it: positive, negative or neutral. This eliminates the need for the pre-defined lexicon used in rule-based sentiment analysis.
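To make training and prediction concrete, here is a minimal sketch that uses a Naive Bayes classifier written in plain Python. Naive Bayes is only one possible algorithm, chosen here because it is short; it ignores word order, unlike the deep learning models mentioned later. The training sentences and function names are made up for the example, and the toy data has no neutral class.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labelled training data; real datasets are far larger.
TRAINING_DATA = [
    ("I love this phone, it is great", "positive"),
    ("Great service and friendly staff", "positive"),
    ("I hate the battery, it is terrible", "negative"),
    ("Terrible service and rude staff", "negative"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(predict("I love the friendly staff", model))  # positive
print(predict("terrible battery", model))           # negative
```

No word list was written by hand here: the model learned which words signal each sentiment from the labelled examples.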

N.B. A hybrid of the rule-based and automated approaches is sometimes used. Although hybrid systems are more complex, they often give the best results.

Building a Sentiment Analysis Model

Pre-trained models are publicly available on the Hub, which makes them a good place to get started. The available models use deep learning architectures such as transformers. For more accurate results, it is advisable to fine-tune the chosen model on your own data so that it better fits the case at hand.
