DEV Community


Active Learning for Natural Language Processing

crowintelligence ・2 min read

More than 90% of machine learning applications improve with human feedback. For example, a model that classifies news articles into pre-defined topics has been trained on thousands of examples in which humans have manually annotated the topics. However, if there are tens of millions of news articles, it might not be feasible to manually annotate even 1% of them. If we only sample randomly, we will mostly get popular topics like “politics” that the machine learning model can already identify accurately. So, we need to be smarter about how we sample. This talk is about “Active Learning”, the process of deciding which raw data is most valuable for human review. It covers Uncertainty Sampling, Diversity Sampling, and some advanced methods like Active Transfer Learning.
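To make the idea of Uncertainty Sampling more concrete, here is a minimal sketch of one common variant, least-confidence sampling. It is not taken from the talk or the slides: the function names, the article ids, and the probability values are all hypothetical, and it assumes the model already produces a probability distribution over topics for each unlabeled article.

```python
def least_confidence(probs):
    """Score in [0, 1]: 0 when one label has probability 1, 1 when all labels are equally likely."""
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)


def select_for_review(predictions, k):
    """Return the ids of the k items whose top prediction is least confident."""
    ranked = sorted(
        predictions,
        key=lambda item_id: least_confidence(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs: probability per topic for each unlabeled article.
predictions = {
    "article_1": [0.95, 0.03, 0.02],  # confident, low value for human review
    "article_2": [0.40, 0.35, 0.25],
    "article_3": [0.70, 0.20, 0.10],
    "article_4": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
}

to_review = select_for_review(predictions, k=2)
print(to_review)
```

With these made-up numbers, the two articles with the flattest distributions are sent to annotators first, while the confidently classified one is skipped. Diversity Sampling would need a different selection rule, since it looks for items unlike those the model has already seen rather than items the model is unsure about.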

Robert Munro has worked as a leader at several Silicon Valley machine learning companies and also led AWS’s first Natural Language Processing and Machine Translation solutions. Robert is the author of Human-in-the-Loop Machine Learning, covering practical methods for Active Learning, Transfer Learning, and Annotation. Robert organizes Bay Area NLP, the world’s largest community of Language Technology professionals. Robert is also a disaster responder and is currently helping with the response to COVID-19.

The slides are available on this link.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
