Towards Realistic Single-Task Continuous Learning Research for NER

by Justin Payan, et al.
University of Massachusetts Amherst

Interest in continuous learning (CL) is increasing as data privacy becomes a priority for real-world machine learning applications. Yet academic NLP still lacks benchmarks that apply to realistic CL settings, which is a major obstacle to progress in the field. In this paper, we discuss some of the unrealistic data characteristics of public datasets and study the challenges of realistic single-task continuous learning, as well as the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it, along with the code, to the research community.


