Getting Started in NLP

#machinelearning #beginners #tutorial #nlp

When I first started learning about NLP (Natural Language Processing, i.e. processing text data), I wanted a beginner's guide that gave me a framework for understanding the topics and terminology I would need to search for to find learning resources. I struggled to find one. However, just this week, @omarsar0 ("elvis") on Twitter posted some mind maps that look really useful, so I'm sharing them here in case other beginners find them useful too.

Text Mining Mind Map:

NLP Mind Map:

Also, here are some beginner's tutorials and code examples (in python) that I've found really helpful for getting started:

...and a useful book is "Applied Text Analysis with Python"

There are many more complicated "state of the art" (SOTA) methods not covered in the resources above (e.g. Word2Vec, GloVe, ELMo, BERT, and the models that have followed BERT), but I recommend staying away from those until you understand text mining with traditional methods.
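To give a flavour of what "traditional methods" can look like, here is a minimal sketch of two of them, bag of words and TF-IDF, using only the Python standard library. The three example sentences are made up for illustration, the tokenisation is deliberately naive, and the TF-IDF formula is just one common variant (libraries such as scikit-learn use slightly different weighting), so treat it as a learning aid rather than a reference implementation.

```python
import math
from collections import Counter

# Three toy "documents" (made up for illustration)
docs = [
    "the food was great",
    "the service was slow",
    "the food and the service were great",
]

# Naive tokenisation: lower-case and split on whitespace
tokenised = [doc.lower().split() for doc in docs]

# Bag of words: one word-count table per document, ignoring word order
bags = [Counter(tokens) for tokens in tokenised]

# Document frequency: the number of documents each word appears in
doc_freq = Counter(word for bag in bags for word in bag)


def tf_idf(word, bag):
    """Term frequency times inverse document frequency (one common variant)."""
    tf = bag[word] / sum(bag.values())
    idf = math.log(len(docs) / doc_freq[word])
    return tf * idf


for doc, bag in zip(docs, bags):
    scores = {word: round(tf_idf(word, bag), 3) for word in bag}
    print(doc, "->", scores)
```

With this variant, a word such as "the" that appears in every document scores zero, while a word that appears in only one document (such as "slow") scores highest within that document. That is the intuition behind TF-IDF: words that are common everywhere tell you little about any one document.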

There are also many different tasks that can be performed using NLP techniques (e.g. translating between languages, summarising text, question answering, and more). I recommend starting out with "text classification" or "sentiment analysis" (which is a type of text classification). There are lots of free tutorials and examples online for sentiment analysis, e.g. trying to classify whether a Yelp review is positive or negative. Perhaps even before that, I'd recommend importing some text data and creating a word cloud (this tutorial will help). If you don't know what a word cloud is, below is an example. It's a way to visualise the frequency of each word in some text: the more often a word appears, the larger it is drawn.

I created the wordcloud above using this code:

# import matplotlib so that the wordcloud can be displayed
import matplotlib.pyplot as plt
# Jupyter Notebook only: display plots inline (remove this line in a plain .py script)
%matplotlib inline

# import wordcloud so that the wordcloud can be created
from wordcloud import WordCloud

# create a string of text
text_string = "NLP, NLP, NLP, NLP, NLP, NLP, NLP, NLP, NLP, \
                text, text, text, spacy, spacy,\
                sentiment analysis, translation, stopwords,\
                tokenisation, tokenisation, tokenisation,\
                part-of-speech tagging, bag of words, TF-IDF,\
                embedding, summarisation, language modelling,\
                question answering, text classification,\
                text classification, RNN, LSTM"

# create a wordcloud from the string of text
my_wordcloud = WordCloud(background_color="white", 
                         max_words=50, 
                         ).generate(text_string)

# display the wordcloud
plt.imshow(my_wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

Note: you may need to install wordcloud first (e.g. with !pip install wordcloud if you're writing Python code in a Jupyter Notebook, or pip install wordcloud from a terminal).
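If you'd like to see the numbers behind the picture, the sketch below counts word frequencies with Python's standard library. It uses a shortened stand-in for text_string and a naive tokeniser of my own choosing, so it is only an approximation: the wordcloud library does its own text processing (for example, removing stopwords and grouping common word pairs by default), and its counts may differ.

```python
import re
from collections import Counter

# A shortened stand-in for the text_string used in the word cloud code
text_string = "NLP, NLP, NLP, text, text, spacy, tokenisation, tokenisation, TF-IDF"

# Naive tokenisation (an assumption, not what wordcloud does internally):
# lower-case the text and keep runs of letters, digits and hyphens
words = re.findall(r"[a-z0-9-]+", text_string.lower())

# Count how many times each word appears
counts = Counter(words)

# Print the words from most to least frequent
for word, count in counts.most_common():
    print(f"{word:<15}{count}")
```

The words at the top of this list are the ones you would expect to be drawn largest in the word cloud.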

NLP is a massive field and it can be daunting and confusing to get started. It's a really interesting field though and well worth the effort. I'm planning on writing more about NLP in the future, as I'm learning a lot about it as part of my Data Science MSc project. In the meantime, I hope the resources I've mentioned here can help to make the journey a bit easier for total beginners.

Top comments (7)

kamilliano • Jul 1 '20

Thanks for that. I checked the Kaggle tutorial for a bit more. I am interested in establishing when a textual intent can evoke anxiety in a person.

Nic Fox • Jul 2 '20

That sounds really interesting! Will you share your project when you are finished?

kamilliano • Jul 2 '20

I think so, though it might take a while :) as I am just at the initial stage. I have simple plans for what to do, but I'm still not sure how I will model it once I obtain the data. I guess I will tackle that once I get to that phase. There are a few papers I need to go through:
journals.sagepub.com/doi/full/10.1...
journals.sagepub.com/doi/abs/10.11...
microsoft.com/en-us/research/wp-co...
aaai.org/ocs/index.php/ICWSM/ICWSM...
Would you recommend any other readings that you can think of? I would appreciate any suggestions.

Nic Fox • Jul 2 '20

I bet! :) I don't know of anything specific to anxiety detection but here are some links to some resources I've found useful/interesting around emotion detection and NLP in general:

DeepMoji: github.com/bfelbo/deepmoji
NLP Progress: github.com/sebastianruder/NLP-prog...
Intro to BERT with links to other resources: skok.ai/2020/05/11/Top-Down-Introd...

Good luck with your project.