Part-of-speech (POS) tagging is a useful foundation for many deeper NLP concepts. It involves analysing a text and returning its tokens, each labelled with the appropriate part-of-speech tag.
- ADJ: adjective -> super cat
- ADP: adposition -> in the house
- ADV: adverb -> The boy ran quickly
- AUX: auxiliary -> He will make
- CCONJ: coordinating conjunction -> this and that
- DET: determiner -> the squirrel
- INTJ: interjection -> Oi!
- NOUN: noun -> the dog
- NUM: numeral -> one fork
- PART: particle -> not Sam ’s
- PRON: pronoun -> he shouts
- PROPN: proper noun -> Helsinki is beautiful
- PUNCT: punctuation -> Hello , there !
- SCONJ: subordinating conjunction -> I do this while they do that
- SYM: symbol -> 5 + 10
- VERB: verb -> They think that
- X: other -> this asd;lfkjasd;flkj
Source: Universal POS tags
Above is a list of the standardised POS tags. You will probably recognise many of them from English class.
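To make the idea of "tokens with tags" concrete, here is a minimal sketch of how a tagged sentence might be represented in Python. The names `UPOS_TAGS` and `describe` are my own, made up for illustration; they are not part of any library.

```python
# Universal POS tags mapped to their names (the same 17 tags as the list above).
UPOS_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe(tagged_tokens):
    """Swap each tag for its full name; reject tags outside the tag set."""
    described = []
    for token, tag in tagged_tokens:
        if tag not in UPOS_TAGS:
            raise ValueError(f"{tag!r} is not a Universal POS tag")
        described.append((token, UPOS_TAGS[tag]))
    return described


# A tagged sentence is simply a list of (token, tag) pairs.
tagged = [("the", "DET"), ("squirrel", "NOUN")]
print(describe(tagged))  # [('the', 'determiner'), ('squirrel', 'noun')]
```

A plain list of pairs keeps the tokens in sentence order, which matters because the same word can take different tags depending on its neighbours.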
Courtesy of spaCy’s visualiser (as always!), below is an example in which the sentence “This is my house while I live here” has been analysed and POS tags have been assigned. The tags line up with what you would expect from the list above.
There are many different ways to assign POS tags. A classic approach is to model the sentence with a hidden Markov model (HMM), where the tags are the hidden states and the words are the observations, and then use a dynamic programming algorithm (such as the popular Viterbi algorithm) to find the most likely sequence of tags. As with many modern NLP elements, machine learning is now very popular for POS tagging. In this case, a model analyses corpora of text that have already been appropriately tagged and tries to learn how to tag tokens itself.
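The sketch below shows roughly how these pieces can fit together: it counts tag transitions and word emissions from a tagged corpus, then uses the Viterbi algorithm to pick the most likely tag sequence for a new sentence. Treat it as a toy under stated assumptions, not how any real tagger is built. The four-sentence corpus is invented for illustration, the add-one smoothing is my own simplification, and the names `train`, `log_prob` and `viterbi` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented purely for illustration.
TAGGED_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"), ("quickly", "ADV")],
    [("he", "PRON"), ("runs", "VERB"), ("quickly", "ADV")],
]
START = "<s>"  # pseudo-tag marking the start of a sentence


def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words
    vocab = set()
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption, to avoid log(0))."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most likely (word, tag) sequence under the HMM."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log prob of the best path ending in tag at i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return list(zip(words, path))


transitions, emissions, vocab = train(TAGGED_CORPUS)
print(viterbi(["the", "cat", "runs"], transitions, emissions, vocab))
# [('the', 'DET'), ('cat', 'NOUN'), ('runs', 'VERB')]
```

Note that "the cat runs" never appears in the toy corpus as a whole sentence. The tagger gets it right by combining what it has learnt about which tags tend to follow each other with which words each tag tends to produce. Working in log probabilities avoids the numbers shrinking towards zero on longer sentences.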