Crowdsourcing Learning as Domain Adaptation: A Case Study on Named Entity Recognition

05/31/2021
by Xin Zhang, et al.

Crowdsourcing is regarded as a promising solution for effective supervised learning, aiming to build large-scale annotated training data with crowd workers. Previous studies focus on reducing the influence of noise in crowdsourced annotations on supervised models. We take a different view in this work, regarding all crowdsourced annotations as gold-standard with respect to the individual annotators. Viewed this way, crowdsourcing is highly similar to domain adaptation, and recent advances in cross-domain methods can be applied to crowdsourcing almost directly. Here we take named entity recognition (NER) as a case study, proposing an annotator-aware representation learning model inspired by domain adaptation methods that attempt to capture effective domain-aware features. We investigate both unsupervised and supervised crowdsourcing learning, assuming that no or only small-scale expert annotations are available, respectively. Experimental results on a benchmark crowdsourced NER dataset show that our method is highly effective, leading to new state-of-the-art performance. In addition, under the supervised setting, we achieve impressive performance gains with only a very small amount of expert annotations.
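The framing in the abstract, where each annotator's labels are gold-standard for that annotator and each annotator therefore acts like a domain, can be illustrated with a small sketch. This is not the paper's model: the toy sentences, the worker ids, the two-dimensional annotator vectors, and the helper names `group_by_annotator` and `mean_vector` are all hypothetical. In particular, standing in for an unseen "expert" by averaging the annotator vectors is an assumption made here for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data: (tokens, annotator id, BIO tags given by that annotator).
crowd_data = [
    (["John", "lives", "in", "Paris"], "worker_1", ["B-PER", "O", "O", "B-LOC"]),
    (["John", "lives", "in", "Paris"], "worker_2", ["B-PER", "O", "O", "O"]),
    (["Acme", "hired", "Mary"], "worker_1", ["B-ORG", "O", "B-PER"]),
]


def group_by_annotator(examples):
    """Treat each annotator as a 'domain' whose own labels are gold for it."""
    domains = defaultdict(list)
    for tokens, annotator, tags in examples:
        domains[annotator].append((tokens, tags))
    return dict(domains)


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical annotator vectors; in a real model these would be learned.
annotator_embeddings = {
    "worker_1": [0.25, 0.5],
    "worker_2": [0.75, 0.0],
}

domains = group_by_annotator(crowd_data)

# Assumption for illustration: with no expert annotations available,
# approximate the expert by the average of the annotator vectors.
expert_embedding = mean_vector(list(annotator_embeddings.values()))

if __name__ == "__main__":
    for annotator, examples in domains.items():
        print(annotator, len(examples), "sentences")
    print("expert (assumed average):", expert_embedding)
```

Note that the two workers disagree on "Paris" in the first sentence. Under this framing neither label is discarded as noise; each is kept as correct for its own annotator "domain".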


research
02/14/2020

Zero-Resource Cross-Domain Named Entity Recognition

Existing models for cross-domain named entity recognition (NER) rely on ...
research
04/22/2022

Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations

Recent works of opinion expression identification (OEI) rely heavily on ...
research
06/10/2015

On-the-Job Learning with Bayesian Decision Theory

Our goal is to deploy a high-accuracy system starting with zero training...
research
10/15/2018

Neural Adaptation Layers for Cross-domain Named Entity Recognition

Recent research efforts have shown that neural architectures can be effe...
research
08/31/2017

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

One weakness of machine-learned NLP models is that they typically perfor...
research
09/09/2021

Truth Discovery in Sequence Labels from Crowds

Annotations quality and quantity positively affect the performance of se...
research
06/07/2017

Early Experiences with Crowdsourcing Airway Annotations in Chest CT

Measuring airways in chest computed tomography (CT) images is important ...
