
Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations

by Xin Zhang et al.

Recent work on opinion expression identification (OEI) relies heavily on the quality and scale of a manually constructed training corpus, which can be extremely difficult to obtain. Crowdsourcing is one practical solution to this problem, producing a large-scale corpus whose quality is not guaranteed. In this work, we investigate Chinese OEI with extremely noisy crowdsourcing annotations, constructing a dataset at very low cost. Following Zhang et al. (2021), we train an annotator-adapter model by treating all annotations as gold-standard with respect to their crowd annotators, and test the model with a synthetic expert, which is a mixture of all annotators. As this annotator mixture used for testing is never modeled explicitly during training, we propose to generate synthetic training samples by a pertinent mixup strategy, making training and testing highly consistent. Simulation experiments on our constructed dataset show that crowdsourcing is highly promising for OEI, and the proposed annotator mixup can further enhance the crowdsourcing modeling.
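The core idea can be sketched in a few lines: each annotator has a learned conditioning vector (the annotator adapter), training pairs mix two annotators' vectors with a Beta-distributed weight, and the test-time "synthetic expert" is the mixture of all annotators. The names, dimensions, and the use of a mean as the expert mixture below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each crowd annotator is represented by a learned
# embedding that conditions the tagger (the "annotator adapter").
n_annotators, dim = 5, 8
annotator_emb = rng.normal(size=(n_annotators, dim))

def mixup_annotators(emb, alpha=1.0, rng=rng):
    """Mix the embeddings of two randomly chosen annotators with a
    Beta(alpha, alpha)-distributed weight, yielding a synthetic
    annotator for training."""
    i, j = rng.choice(len(emb), size=2, replace=False)
    lam = rng.beta(alpha, alpha)
    return lam * emb[i] + (1.0 - lam) * emb[j]

def expert_embedding(emb):
    """At test time, the synthetic expert is a mixture of all
    annotators (here: the simple mean of their embeddings)."""
    return emb.mean(axis=0)

mixed = mixup_annotators(annotator_emb)   # used during training
expert = expert_embedding(annotator_emb)  # used during testing
```

Because training now sees convex combinations of annotator embeddings, the all-annotator mixture used at test time is no longer out of distribution for the model.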

