Part-of-Speech Tagging Simplified

Adam Hawley (adamjhawley) ・2 min read

Introduction

Part-of-Speech tagging is a foundation for many NLP tasks and for deeper NLP concepts. It involves analysing text and returning each token of the text together with its appropriate Part-of-Speech (POS) tag.

POS Tags

Read this blog post on my website for a better experience (dev.to please implement markdown tables)!

  1. ADJ: adjective -> super cat
  2. ADP: adposition -> in the house
  3. ADV: adverb -> The boy ran quickly
  4. AUX: auxiliary -> He will make
  5. CCONJ: coordinating conjunction -> this and that
  6. DET: determiner -> the squirrel
  7. INTJ: interjection -> Oi!
  8. NOUN: noun -> the dog
  9. NUM: numeral -> one fork
  10. PART: particle -> not Sam ’s
  11. PRON: pronoun -> he shouts
  12. PROPN: proper noun -> Helsinki is beautiful
  13. PUNCT: punctuation -> Hello , there !
  14. SCONJ: subordinating conjunction -> I do this while they do that
  15. SYM: symbol -> 5 + 10
  16. VERB: verb -> They think that
  17. X: other -> this asd;lfkjasd;flkj

Source: Universal POS tags

Above is the list of standardised Universal POS tags. You will probably recognise many of them from English class.
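To make the idea of "tokens plus tags" concrete, here is a minimal sketch in Python. It stores the Universal POS tags in a dictionary and tags text by looking each token up in a tiny lexicon. The names `UPOS_TAGS`, `TOY_LEXICON` and `tag` are made up for this illustration, and the lexicon is hand-written. Real taggers do not work from a fixed lookup table like this; it only shows the shape of the input and output.

```python
# The 17 Universal POS tags and their names.
UPOS_TAGS = {
    "ADJ": "adjective", "ADP": "adposition", "ADV": "adverb",
    "AUX": "auxiliary", "CCONJ": "coordinating conjunction",
    "DET": "determiner", "INTJ": "interjection", "NOUN": "noun",
    "NUM": "numeral", "PART": "particle", "PRON": "pronoun",
    "PROPN": "proper noun", "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction", "SYM": "symbol",
    "VERB": "verb", "X": "other",
}

# Hypothetical, hand-written lexicon used only for this illustration.
TOY_LEXICON = {
    "the": "DET", "squirrel": "NOUN", "dog": "NOUN", "ran": "VERB",
    "quickly": "ADV", "and": "CCONJ", "he": "PRON", "one": "NUM",
}


def tag(text):
    """Split text on whitespace and pair each token with a tag.

    Tokens that are not in the toy lexicon fall back to "X" (other).
    """
    return [(token, TOY_LEXICON.get(token.lower(), "X")) for token in text.split()]


if __name__ == "__main__":
    for token, pos in tag("The squirrel ran quickly"):
        print(f"{token}\t{pos}\t{UPOS_TAGS[pos]}")
```

A lookup like this cannot handle words with more than one possible tag (for example "run" as a noun or a verb), which is why practical taggers also use the surrounding context.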

Example

Courtesy of spaCy’s visualiser (as always!), below is an example in which the phrase “This is my house while I live here” has been analysed and assigned POS tags. You can see that the tags line up with what would be expected from the list above.
[Image: spaCy visualisation of the POS tags assigned to “This is my house while I live here”]

How Are POS Tags Assigned?

There are many different ways to assign POS tags. A classic approach models the sequence of tags with a hidden Markov model (HMM) and uses a dynamic programming algorithm, such as the popular Viterbi algorithm, to find the most probable tag sequence for a sentence. As with many modern NLP elements, machine learning is now very popular for POS-tagging. In this case, a computer analyses corpora of text that have already been appropriately tagged and tries to learn how to tag tokens itself.
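As a rough sketch of how these pieces fit together, the Python below estimates HMM probabilities by counting over a tiny hand-tagged corpus, then uses the Viterbi algorithm to pick the most probable tag sequence for a new sentence. Everything here is an assumption made for illustration: the four-sentence corpus is invented, the names `train`, `prob` and `viterbi` are hypothetical, and the add-alpha smoothing is just one simple choice. Real taggers are trained on far larger corpora and are usually more sophisticated.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for this illustration only.
TAGGED_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"), ("quickly", "ADV")],
    [("he", "PRON"), ("runs", "VERB"), ("quickly", "ADV")],
]
START = "<START>"  # pseudo-tag marking the beginning of a sentence


def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words
    for sentence in corpus:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            previous = tag
    return transitions, emissions


def prob(counter, key, n_outcomes, alpha=0.1):
    """Add-alpha smoothed relative frequency, so unseen events are not zero."""
    return (counter[key] + alpha) / (sum(counter.values()) + alpha * n_outcomes)


def viterbi(words, transitions, emissions):
    """Return (word, tag) pairs for the most probable tag sequence."""
    if not words:
        return []
    tags = sorted(emissions)
    # Vocabulary size, plus one slot reserved for unknown words.
    n_words = len({w for counts in emissions.values() for w in counts}) + 1
    # best[i][tag] = (probability of the best path ending in tag, previous tag)
    best = [{
        tag: (prob(transitions[START], tag, len(tags))
              * prob(emissions[tag], words[0], n_words), None)
        for tag in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for tag in tags:
            emit = prob(emissions[tag], words[i], n_words)
            column[tag] = max(
                (best[i - 1][prev][0] * prob(transitions[prev], tag, len(tags)) * emit, prev)
                for prev in tags
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    path.reverse()
    return list(zip(words, path))


if __name__ == "__main__":
    transitions, emissions = train(TAGGED_CORPUS)
    print(viterbi(["the", "dog", "sleeps"], transitions, emissions))
```

Because the transition counts capture which tags tend to follow which, a sketch like this can make a reasonable guess for a word it has never seen by using the tags around it. It multiplies raw probabilities, which is fine for short sentences; longer inputs would normally use log-probabilities to avoid underflow.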
