
Crowdsourcing Learning as Domain Adaptation: A Case Study on Named Entity Recognition

by Xin Zhang, et al.

Crowdsourcing is regarded as a promising solution for effective supervised learning, aiming to build large-scale annotated training data with crowd workers. Previous studies focus on reducing the influence of noise in crowdsourced annotations on supervised models. We take a different view in this work, regarding all crowdsourced annotations as gold-standard with respect to the individual annotators. In this way, we find that crowdsourcing can be highly similar to domain adaptation, so recent advances in cross-domain methods can be applied to crowdsourcing almost directly. Here we take named entity recognition (NER) as a case study and propose an annotator-aware representation learning model inspired by domain adaptation methods that attempt to capture effective domain-aware features. We investigate both unsupervised and supervised crowdsourcing learning, assuming that either no expert annotations or only a small number of them are available. Experimental results on a benchmark crowdsourced NER dataset show that our method is highly effective, achieving new state-of-the-art performance. In addition, under the supervised setting, we achieve impressive performance gains with only a very small number of expert annotations.
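The abstract's central reframing, in which each annotator is treated as a separate "domain" whose labels are gold for that domain, can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' model: the data layout, the function names (`group_by_annotator`, `annotator_aware_features`, `mean_embedding`), the toy embeddings, the choice to concatenate an annotator embedding onto each token vector, and the use of the averaged annotator embedding at test time when no annotator is known are all hypothetical choices made for this example. This page does not describe the actual architecture.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: (tokens, annotator_id, BIO labels given by that annotator).
CrowdExample = Tuple[List[str], str, List[str]]


def group_by_annotator(
    data: List[CrowdExample],
) -> Dict[str, List[Tuple[List[str], List[str]]]]:
    """Treat each annotator as a separate 'domain' whose labels are gold for that domain."""
    domains: Dict[str, List[Tuple[List[str], List[str]]]] = defaultdict(list)
    for tokens, annotator, labels in data:
        domains[annotator].append((tokens, labels))
    return dict(domains)


def annotator_aware_features(
    token_vecs: List[List[float]], annotator_vec: List[float]
) -> List[List[float]]:
    """Hypothetical conditioning: append the annotator embedding to every token vector."""
    return [tv + annotator_vec for tv in token_vecs]


def mean_embedding(embeddings: Dict[str, List[float]]) -> List[float]:
    """Assumed test-time stand-in when no annotator is given: average all annotator embeddings."""
    n = len(embeddings)
    dim = len(next(iter(embeddings.values())))
    return [sum(vec[i] for vec in embeddings.values()) / n for i in range(dim)]


# Toy crowdsourced data: two annotators disagree on "Beijing" for the same sentence.
data: List[CrowdExample] = [
    (["Xin", "Zhang", "visited", "Beijing"], "a1", ["B-PER", "I-PER", "O", "B-LOC"]),
    (["Xin", "Zhang", "visited", "Beijing"], "a2", ["B-PER", "I-PER", "O", "O"]),
    (["Alibaba", "hired", "him"], "a1", ["B-ORG", "O", "O"]),
]

# Toy (hypothetical) annotator embeddings; in a real model these would be learned.
annotator_embeddings: Dict[str, List[float]] = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}

if __name__ == "__main__":
    domains = group_by_annotator(data)
    for annotator, examples in domains.items():
        print(annotator, len(examples), "sentence(s)")
    print(annotator_aware_features([[0.1], [0.2]], annotator_embeddings["a1"]))
    print(mean_embedding(annotator_embeddings))
```

Neither annotator's labels are discarded as "noise": the disagreement on "Beijing" is kept and attributed to the annotator's domain. The same grouping would also accommodate the supervised setting, for example by adding a small set of expert annotations as one more domain, though that too is an assumption of this sketch.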




Zero-Resource Cross-Domain Named Entity Recognition

Existing models for cross-domain named entity recognition (NER) rely on ...

Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations

Recent work on opinion expression identification (OEI) relies heavily on ...

On-the-Job Learning with Bayesian Decision Theory

Our goal is to deploy a high-accuracy system starting with zero training...

Neural Adaptation Layers for Cross-domain Named Entity Recognition

Recent research efforts have shown that neural architectures can be effe...

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

One weakness of machine-learned NLP models is that they typically perfor...

Truth Discovery in Sequence Labels from Crowds

Annotation quality and quantity positively affect the performance of se...

Early Experiences with Crowdsourcing Airway Annotations in Chest CT

Measuring airways in chest computed tomography (CT) images is important ...

Code Repositories


Crowdsourcing Learning as Domain Adaptation.
