- Defining Named Entity Recognition
- What makes NER Difficult
- Current Approaches
- Experimenting With NER
This is the second post in a series that aims to make Natural Language Processing topics more accessible by setting aside the mathematics and implementation details.
Named Entity Recognition (NER) is the detection and labelling of named entities (organisations, people, locations etc.) within a text.
Here is an example of what an NER analysis returns on a given sentence (using spaCy's NER and visualiser):
Tags (labels such as organisation) can vary between different systems and implementations but they will usually include:
- (GPE) Geo-Political Entities
Other categories you may encounter include location, money and time.
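If you prefer to see it as data, the result of an NER analysis can be thought of as a list of labelled spans of the text. The snippet below is only a sketch of that idea in plain Python: the `Entity` class, the `make_entity` helper and the example sentence are all made up for illustration and are not spaCy's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical container for one detected entity."""
    text: str   # the entity exactly as it appears in the sentence
    start: int  # character offset where the entity begins
    end: int    # character offset just past the end of the entity
    label: str  # tag such as "ORG", "GPE" or "PERSON"


def make_entity(sentence: str, phrase: str, label: str) -> Entity:
    """Find `phrase` in `sentence` and wrap it as a labelled span."""
    start = sentence.index(phrase)
    return Entity(phrase, start, start + len(phrase), label)


# Made-up sentence, labelled by hand to show the shape of the output.
sentence = "Acme Corp opened an office in France."
entities = [
    make_entity(sentence, "Acme Corp", "ORG"),
    make_entity(sentence, "France", "GPE"),
]

for ent in entities:
    print(f"{ent.text!r} [{ent.start}:{ent.end}] -> {ent.label}")
```

Here the labels are filled in by hand. The whole job of an NER system is to produce a list like `entities` automatically.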
At first glance, the task can seem very easy. After considering a few examples, however, it quickly becomes clear why it is troublesome for a machine.
“First Lady Makes an Appearance at John Smith’s”
Are we discussing the first lady to ever exist or are we talking about the president’s wife here? Is John Smith's a pub, a person's house or something completely different? And, more importantly, how can a computer know?
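To see why a computer struggles here, consider the simplest thing it could do: look names up in a list. The sketch below assumes a tiny hand-written lookup table (`GAZETTEER` and `gazetteer_tag` are hypothetical names of my own, not part of any library) and shows that a lookup alone has no way of telling a person from a pub.

```python
import re

# Hypothetical lookup table mapping known phrases to a label.
GAZETTEER = {
    "John Smith": "PERSON",
}


def gazetteer_tag(sentence):
    """Return (phrase, label) for every known phrase found in the sentence."""
    found = []
    for phrase, label in GAZETTEER.items():
        # \b keeps us from matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", sentence):
            found.append((phrase, label))
    return found


headline = "First Lady Makes an Appearance at John Smith's"
pub_visit = "We had a pint at John Smith's"

# Both sentences get the same label, whatever "John Smith's" really is.
print(gazetteer_tag(headline))
print(gazetteer_tag(pub_visit))
```

The lookup ignores the surrounding words entirely, which is exactly the information a human reader uses to decide what "John Smith's" refers to.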
There are several different approaches that people use to meet their NER requirements. Generally, these can be broken down into techniques using statistics (including machine learning) and techniques based on grammar. In practice, many implementations combine the two.
- Grammar-based techniques involve a higher labour cost, since the rules are written by hand, but tend to give more accurate results.
- Statistical approaches produce slightly less accurate results but can analyse sample sets much more quickly. A common bottleneck for statistical approaches is the amount of annotated data they need. Annotated data means sentences, like the one in the picture above, that have been labelled correctly so that the computer can learn how to label too.
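To give a feel for the statistical side and for why annotated data is the bottleneck, here is a deliberately tiny sketch. It "learns" by counting which label each word carried in some hand-labelled sentences, then labels new text with the most frequent one. The training sentences and the `train` and `predict` helpers are invented for illustration. Real systems look at the surrounding context rather than single words, so treat this as a toy baseline and not as how spaCy or any other library works.

```python
from collections import Counter, defaultdict

# Hypothetical annotated data: every token is paired with a label,
# where "O" means "not part of a named entity".
TRAINING_DATA = [
    [("Alice", "PERSON"), ("works", "O"), ("at", "O"), ("Acme", "ORG")],
    [("Acme", "ORG"), ("hired", "O"), ("Bob", "PERSON")],
    [("Bob", "PERSON"), ("visited", "O"), ("France", "GPE")],
]


def train(data):
    """Count how often each token was given each label."""
    counts = defaultdict(Counter)
    for sentence in data:
        for token, label in sentence:
            counts[token][label] += 1
    return counts


def predict(counts, tokens):
    """Give each token its most frequent training label, or "O" if unseen."""
    return [
        (tok, counts[tok].most_common(1)[0][0] if tok in counts else "O")
        for tok in tokens
    ]


model = train(TRAINING_DATA)

# Words seen during training are labelled correctly...
print(predict(model, ["Alice", "visited", "Acme"]))
# ...but names that never appeared in the annotated data are missed.
print(predict(model, ["Carol", "visited", "Spain"]))
```

The second call labels everything as "O" because "Carol" and "Spain" never appeared in the annotated sentences. The more labelled examples a statistical approach has, the fewer gaps like this it is left with.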
Now that you hopefully have a baseline understanding of what NER is, I could not end this post without recommending the NER feature and visualiser of the NLP package spaCy (which I used when making this post)!
You can find an interactive code playground here where you can change anything from the NLP pipeline to just the sentence being analysed.