DEV Community

lou

Word Frequency Counter using NLTK

NLTK (Natural Language Toolkit) is an open-source Python library for natural language processing (NLP).

We want to count the frequency of words for the following text using NLTK.

text = "Morocco, officially the Kingdom of Morocco, is the westernmost country in the Maghreb region of North Africa. It overlooks the Mediterranean Sea to the north and the Atlantic Ocean to the west, and has land borders with Algeria to the east, and the disputed territory of Western Sahara to the south."

To install NLTK:

pip install nltk  

If you don't have Jupyter installed, run the following commands in your terminal.

pip install jupyterlab
pip install notebook
pip install voila

Run Jupyter with:

jupyter notebook 

Import the following libraries.
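The screenshot listing the imports did not survive, so the set below is only a plausible guess, assuming the tokenizer comes from NLTK and the chart from Matplotlib. `wordpunct_tokenize` is chosen here because it works without downloading extra data; `word_tokenize` is a common alternative, but it needs NLTK's Punkt tokenizer data to be downloaded first.

```python
# Assumed imports for the steps that follow (the original screenshot is unavailable).
from nltk.tokenize import wordpunct_tokenize  # regex-based tokenizer, no data download needed
import matplotlib.pyplot as plt               # used for the bar chart at the end
```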


Assign the text to a variable.


The following function splits the text into tokens: words and punctuation marks.
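As a minimal sketch of this step, the snippet below tokenizes only the first sentence of the sample text. It assumes `wordpunct_tokenize`, which may not be the function the original screenshot showed; any NLTK tokenizer that returns a list of strings would fit the rest of the walkthrough.

```python
from nltk.tokenize import wordpunct_tokenize

# First sentence of the sample text, shortened to keep the example small.
text = ("Morocco, officially the Kingdom of Morocco, is the westernmost "
        "country in the Maghreb region of North Africa.")

# Words become one token each; runs of punctuation become their own tokens.
tokens = wordpunct_tokenize(text)

print(tokens[:6])  # ['Morocco', ',', 'officially', 'the', 'Kingdom', 'of']
```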


You can see the resulting tokens in the output.


The following code loops over the tokens and counts how many times each one occurs.
Using lower(), we convert every word to lowercase first, so the same word written with different capitalization is not counted as a different word.
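The counting loop could look roughly like this. The token list is a hand-written sample so the snippet runs on its own; in the notebook, `tokens` would be the tokenizer's output, and the dictionary name `freq` is an assumption.

```python
# Hand-written sample tokens (hypothetical stand-in for the tokenizer output).
tokens = ["Morocco", ",", "officially", "the", "Kingdom", "of",
          "Morocco", ",", "is", "the", "westernmost", "country"]

freq = {}
for token in tokens:
    word = token.lower()                # "The" and "the" count as one word
    freq[word] = freq.get(word, 0) + 1  # start at 0 the first time a word is seen

print(freq["morocco"], freq["the"])  # 2 2
```

Punctuation tokens such as "," are counted too. If only words are wanted, one option is to skip tokens for which `token.isalpha()` is false.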


Top 10 most frequent words:
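One way to get the ten most frequent entries is to sort the frequency dictionary by count, as sketched below. The `freq` values are an illustrative, partial dictionary rather than the author's actual output. NLTK's `FreqDist`, a `Counter` subclass, offers `most_common(10)` for the same purpose.

```python
# Illustrative, partial frequency dictionary (hypothetical values for the sketch).
freq = {"morocco": 2, ",": 4, "officially": 1, "the": 10, "kingdom": 1,
        "of": 3, "is": 1, "to": 4, "and": 3, "north": 2, ".": 2, "africa": 1}

# Sort (word, count) pairs by count, highest first, and keep the first ten.
top10 = sorted(freq.items(), key=lambda pair: pair[1], reverse=True)[:10]

for word, count in top10:
    print(word, count)
```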


Now let's visualize it using Matplotlib.
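A bar chart is one reasonable way to show the counts; the original plot is not recoverable, so this is only a sketch. The helper name `plot_top_words` and the sample pairs are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical (word, count) pairs standing in for the top-10 list.
top_words = [("the", 10), ("to", 4), ("of", 3), ("and", 3), ("morocco", 2)]

def plot_top_words(pairs):
    """Draw one bar per word and return the Axes so it can be customized."""
    words = [word for word, _ in pairs]
    counts = [count for _, count in pairs]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(words, counts)
    ax.set_xlabel("Word")
    ax.set_ylabel("Frequency")
    ax.set_title("Most frequent words")
    return ax

ax = plot_top_words(top_words)
plt.show()
```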
