Modeling sequential annotations for sequence labeling with crowds

09/20/2022
by   Xiaolei Lu, et al.

Crowd sequential annotation can be an efficient and cost-effective way to build large datasets for sequence labeling. Unlike the tagging of independent instances, the quality of a crowd-annotated label sequence depends on how well each annotator captures the dependencies among the tokens in the sequence. In this paper, we propose Modeling Sequential Annotation for Sequence Labeling with Crowds (SA-SLC). First, a conditional probabilistic model is developed to jointly model the sequential data and the annotators' expertise, in which a categorical distribution is introduced to estimate the reliability of each annotator in capturing local and non-local label dependencies in sequential annotation. Second, to accelerate marginalization in the proposed model, a valid label sequence inference (VLSE) method is proposed to derive valid ground-truth label sequences from crowd sequential annotations. VLSE derives possible ground-truth labels at the token level and then prunes sub-paths during forward inference for label sequence decoding. This reduces the number of candidate label sequences and improves the quality of the possible ground-truth label sequences. Experimental results on several sequence labeling tasks in natural language processing show the effectiveness of the proposed model.
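
The abstract's two-step idea (token-level candidates, then sub-path pruning during a forward pass) can be illustrated with a minimal Python sketch. It is not the authors' VLSE implementation. It assumes BIO-style tags, takes the candidate labels at each token to be whatever labels the annotators assigned there, and treats a sub-path as invalid when an I-X tag does not follow B-X or I-X. The function names `token_candidates`, `is_valid_transition`, and `valid_sequences` are hypothetical.

```python
from typing import List, Set


def token_candidates(annotations: List[List[str]]) -> List[Set[str]]:
    """Per-token candidates: every label some annotator assigned at that position."""
    return [set(labels) for labels in zip(*annotations)]


def is_valid_transition(prev: str, cur: str) -> bool:
    """Assumed BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        entity = cur[2:]
        return prev in ("B-" + entity, "I-" + entity)
    return True


def valid_sequences(annotations: List[List[str]]) -> List[List[str]]:
    """Enumerate candidate label sequences, pruning invalid sub-paths left to right."""
    candidates = token_candidates(annotations)
    if not candidates:
        return []
    # A sequence cannot start inside an entity, so I-* is pruned at position 0.
    paths = [[lab] for lab in sorted(candidates[0]) if not lab.startswith("I-")]
    for cands in candidates[1:]:
        # Extend only those sub-paths whose last transition is valid.
        paths = [
            path + [lab]
            for path in paths
            for lab in sorted(cands)
            if is_valid_transition(path[-1], lab)
        ]
    return paths


# Three hypothetical annotators labeling the same three-token sentence.
crowd = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O", "O"],
    ["O", "I-PER", "O"],
]
for seq in valid_sequences(crowd):
    print(seq)
```

In this toy input the token-level candidates allow four combinations, and pruning drops the one in which `I-PER` follows `O`. The exhaustive enumeration is for illustration only; how the paper scores the surviving sequences and weights annotator reliability is not covered here.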

research
01/04/2023

Learning Ambiguity from Crowd Sequential Annotations

Most crowdsourcing learning methods treat disagreement between annotator...
research
09/09/2021

Truth Discovery in Sequence Labels from Crowds

Annotations quality and quantity positively affect the performance of se...
research
10/09/2019

Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling

Sequence labeling is a fundamental framework for various natural languag...
research
11/04/2022

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

Human variation in labeling is often considered noise. Annotation projec...
research
09/20/2022

Partial sequence labeling with structured Gaussian Processes

Existing partial sequence labeling models mainly focus on max-margin fra...
research
04/26/2018

Weak Labeling for Crowd Learning

Crowdsourcing has become very popular among the machine learning communi...
research
10/12/2021

On Releasing Annotator-Level Labels and Information in Datasets

A common practice in building NLP datasets, especially using crowd-sourc...
