A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

11/19/2022
by Yi Yang, et al.

Because crowdsourced labels are noisy, label aggregation (LA) has emerged as a standard procedure for post-processing them. LA methods estimate true labels from crowdsourced labels by modeling worker qualities. Most existing LA methods are iterative: they traverse all the crowdsourced labels multiple times, jointly updating the true labels and the worker qualities until convergence. Consequently, these methods have high space and time complexities. In this paper, we treat LA as a dynamic system and model it as a dynamic Bayesian network. From this dynamic model we derive two light-weight algorithms, LAonepass and LAtwopass, which estimate worker qualities and true labels effectively and efficiently by traversing all the labels at most twice. Because of their dynamic nature, the proposed algorithms can also estimate true labels online without revisiting historical data. We theoretically prove the convergence of the proposed algorithms and bound the error of the estimated worker qualities. We also analyze the space and time complexities of the proposed algorithms and show that they are equivalent to those of majority voting. Experiments on 20 real-world datasets demonstrate that the proposed algorithms aggregate labels effectively and efficiently in both offline and online settings, even though they traverse all the labels at most twice.
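To make the single-traversal idea concrete, here is a minimal Python sketch of a generic one-pass aggregator. It is not the paper's LAonepass or LAtwopass, whose update rules are not given in the abstract. It assumes a simple stand-in: each worker has a single accuracy estimate kept as pseudo-counts, votes are weighted by the log-odds of that accuracy, and errors are spread uniformly over the other classes. The function name `one_pass_aggregate` and the parameters `prior_correct` and `prior_wrong` are hypothetical.

```python
from collections import defaultdict
from math import log


def one_pass_aggregate(tasks, num_classes, prior_correct=2.0, prior_wrong=1.0):
    """Aggregate labels in a single traversal (illustrative stand-in only).

    tasks: iterable of (task_id, [(worker_id, label), ...]); each task is
    visited exactly once, so the input can also be a stream.
    Labels are integers in range(num_classes).
    """
    # Pseudo-counts of agreement/disagreement with the aggregated label.
    correct = defaultdict(lambda: prior_correct)
    wrong = defaultdict(lambda: prior_wrong)
    estimates = {}

    for task_id, votes in tasks:
        # Step 1: weighted vote using the current worker accuracy estimates.
        scores = [0.0] * num_classes
        for worker, label in votes:
            acc = correct[worker] / (correct[worker] + wrong[worker])
            # Log-odds weight, assuming errors are uniform over other classes.
            scores[label] += log(acc * (num_classes - 1) / (1.0 - acc))
        best = max(range(num_classes), key=scores.__getitem__)
        estimates[task_id] = best

        # Step 2: update each worker's counts against the aggregated label.
        for worker, label in votes:
            if label == best:
                correct[worker] += 1.0
            else:
                wrong[worker] += 1.0

    quality = {w: correct[w] / (correct[w] + wrong[w]) for w in correct}
    return estimates, quality
```

Like majority voting, this sketch touches each label once and stores only a constant amount of state per worker, which is the kind of cost profile the abstract describes. It carries none of the convergence or error guarantees claimed for the proposed algorithms.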

