The series “Anne with an E” is getting more and more media attention for addressing important issues such as racism, feminism and self-acceptance in a sensitive and subtle way. It also makes us reflect on how magnificent the simple things of the world can be, and on how imagination and kindness can carry us through every situation in life.
What is NLP?
NLP, or Natural Language Processing, is a branch of artificial intelligence dedicated to understanding the relationship between human language (spoken or written) and machine language.
So I decided to generate a word cloud from users' tweets about this series.
P.S.: I made the code used in this article available in my GitHub repository.
What is a wordcloud?
A word cloud (or wordcloud) is used in text exploration to see which words appear most frequently in a set of sentences. With that we can decide how to treat those words or better understand, for example, what your users or customers are saying about a certain product or company.
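As a rough illustration of the idea behind a word cloud, the sketch below counts word frequencies using only Python's standard library. The sample sentences and the function name are made up for this example; they are not taken from the tweets analyzed in this post.

```python
from collections import Counter
import re

def word_frequencies(sentences):
    """Count how often each lowercase word appears across all sentences."""
    counts = Counter()
    for sentence in sentences:
        # \w+ keeps runs of letters, digits and underscores, dropping punctuation
        counts.update(re.findall(r"\w+", sentence.lower()))
    return counts

# Hypothetical sample data, not real tweets
sample = [
    "Anne is full of imagination",
    "Imagination makes Anne special",
    "I love Anne",
]
frequencies = word_frequencies(sample)
print(frequencies.most_common(2))  # [('anne', 3), ('imagination', 2)]
```

A word cloud draws the same information visually: the higher the count, the bigger the word.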
Advantages of using wordcloud in analysis
We can analyze thousands of pieces of text feedback and surface the most cited terms, which helps us understand:
- points of your brand that delight customers;
- items that do not meet the needs of your audience;
- problematic points along the customer's shopping journey.
With all this we get the topics most relevant to the business and can decide which ones to prioritize, that is, we make better-informed decisions.
Extraction of tweets
To extract tweets from Twitter you need an account on Twitter Developer and the TwitterSearch library installed. With these we can connect to Twitter's API and extract our data.
Who knows, maybe in the next post I can show you how to generate the API keys and tokens in Twitter Developer! ;)
To filter the tweets we use the ".set_keywords()" method, and we can filter by the tweet's language with the ".set_language()" method.
In this step I create a file called "tweets.json" to store all those tweets, then load it in my Jupyter notebook and turn it into a DataFrame to make it easier to manipulate.
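The extraction and storage steps could be sketched roughly as below. This is not the exact code from my repository: the function names, the credentials dictionary and the "limit" parameter are assumptions made for illustration, and "fetch_tweets" only works with network access, valid API keys and a Twitter API plan that still serves this kind of search.

```python
import json

def fetch_tweets(credentials, keywords, language="pt", limit=100):
    """Search tweets with TwitterSearch (needs network access and valid keys)."""
    # Imported here so the rest of the sketch works without the library installed
    from TwitterSearch import TwitterSearch, TwitterSearchOrder

    order = TwitterSearchOrder()
    order.set_keywords(keywords)   # e.g. ["Anne with an E"]
    order.set_language(language)   # ISO 639-1 code, "pt" for Portuguese

    client = TwitterSearch(
        consumer_key=credentials["consumer_key"],
        consumer_secret=credentials["consumer_secret"],
        access_token=credentials["access_token"],
        access_token_secret=credentials["access_token_secret"],
    )
    tweets = []
    for tweet in client.search_tweets_iterable(order):
        tweets.append({"user": tweet["user"]["screen_name"], "text": tweet["text"]})
        if len(tweets) >= limit:  # hypothetical cap to keep the sample small
            break
    return tweets

def save_tweets(tweets, path="tweets.json"):
    """Write the collected tweets to a JSON file, keeping accented characters."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(tweets, handle, ensure_ascii=False, indent=2)

def load_tweets(path="tweets.json"):
    """Read the tweets back as a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With pandas installed, passing the result of "load_tweets()" to "pandas.DataFrame" gives a table with one row per tweet.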
Pre-processing
It's very important to pre-process the tweets, because most of them will come with "stop words", that is, words considered irrelevant for our processing (prepositions, articles and pronouns). We can also remove accents, extra spaces and so on. In this pre-processing we transform unstructured data into structured, multi-dimensional representations.
For example:
Batman is better than Superman
After pre-processing:
Batman better Superman
In other words, our NLP model or our analyses will be able to focus on the words that really matter to the business.
For this I created a function to remove accents, numbers and extra spaces.
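A minimal sketch of such a cleaning step, using only the standard library, might look like this. The function name "clean_text" and the sample sentence are assumptions for illustration; the version in my repository may differ.

```python
import re
import unicodedata

def clean_text(text):
    """Strip accents, digits and extra whitespace from a string."""
    # NFKD splits accented letters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    without_digits = re.sub(r"\d+", "", without_accents)
    # Collapse any run of whitespace into one space and trim the ends
    return re.sub(r"\s+", " ", without_digits).strip()

print(clean_text("Já vi 3 vezes,   não canso"))  # Ja vi vezes, nao canso
```

Removing accents makes "não" and "nao" count as the same word, which matters because people type tweets both ways.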
Stop words can affect our wordcloud because this type of visualization picks out the most frequent words, and words like “and, with, that, the” are certainly very frequent. In this function I tokenize the sentences so that I can more easily remove the stop words and some punctuation (‘,’, ‘.’, ‘#’, etc.).
We also need to import the stop words, so we will use the NLTK library. In my example I imported the Portuguese stop words!
import nltk  # run nltk.download('stopwords') once if the corpus is missing
stopwords = nltk.corpus.stopwords.words('portuguese')
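To show the filtering itself without downloading anything, here is a small sketch that uses a hand-written stop word set in place of the NLTK list and a regular expression in place of a full tokenizer. Both are simplifications assumed for this example, and the function name is hypothetical.

```python
import re

# Tiny stand-in list for illustration; the article uses NLTK's Portuguese list
EXAMPLE_STOP_WORDS = {"is", "than", "and", "with", "that", "the"}

def remove_stop_words(sentence, stop_words=EXAMPLE_STOP_WORDS):
    """Tokenize a sentence and drop stop words and punctuation."""
    # \w+ matches word characters only, so ',', '.', '#' never become tokens
    tokens = re.findall(r"\w+", sentence)
    kept = [token for token in tokens if token.lower() not in stop_words]
    return " ".join(kept)

print(remove_stop_words("Batman is better than Superman."))  # Batman better Superman
```

To use the real list, pass "set(stopwords)" as the second argument.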
Generating wordcloud
The ".generate()" method of WordCloud takes the text as a single string, so I joined our pre-processed phrases into one string called "todas_palavras".
To generate the wordcloud, we instantiate the "WordCloud()" object and then call ".generate()" on it. So our code would look like this:
from wordcloud import WordCloud
word_cloud = WordCloud().generate(todas_palavras)
To display this word cloud we use the following commands:
import matplotlib.pyplot as plt
plt.imshow(word_cloud, interpolation="bilinear")
plt.show()
I wrapped this in a function so that we can reuse our wordcloud more easily, but you don't have to.
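One way such a reusable function could look is sketched below. The names "build_corpus" and "show_word_cloud", the image size and the white background are choices made for this example, not necessarily what is in my repository. The plotting function needs the wordcloud and matplotlib packages installed.

```python
def build_corpus(phrases):
    """Join pre-processed phrases into the single string WordCloud expects."""
    return " ".join(phrase.strip() for phrase in phrases if phrase.strip())

def show_word_cloud(phrases, width=800, height=500):
    """Render a word cloud for the given phrases (needs wordcloud and matplotlib)."""
    # Imported lazily so build_corpus can be used without the plotting libraries
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=width, height=height, background_color="white")
    cloud.generate(build_corpus(phrases))
    plt.figure(figsize=(10, 7))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")  # hide axis ticks around the image
    plt.show()
    return cloud

todas_palavras = build_corpus(["anne linda serie", "  amo anne  "])
print(todas_palavras)  # anne linda serie amo anne
```

Calling "show_word_cloud" with the column of cleaned tweets then draws the cloud in one line.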
And your result should look like this:
Before concluding, it is worth noting that a wordcloud is an excellent visualization tool for text analysis: it shows the frequency distribution of words, which helps us improve our pre-processing or our understanding of our users.
That's all folks!!
My LinkedIn