A Streaming Algorithm for Crowdsourced Data Classification

02/23/2016
by Thomas Bonald, et al.

We propose a streaming algorithm for the binary classification of data based on crowdsourcing. The algorithm learns the competence of each labeller by comparing her labels to those of other labellers on the same tasks and uses this information to minimize the prediction error rate on each task. We provide performance guarantees for our algorithm for a fixed population of independent labellers. In particular, we show that our algorithm is optimal in the sense that its cumulative regret, compared to the optimal decision with known labeller error probabilities, is finite, independently of the number of tasks to label. The complexity of the algorithm is linear in the number of labellers and in the number of tasks, up to logarithmic factors. Numerical experiments illustrate the performance of our algorithm compared to existing algorithms, including simple majority voting and expectation-maximization algorithms, on both synthetic and real datasets.
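To make the benchmark in the abstract concrete, the sketch below contrasts simple majority voting with a decision rule that uses known labeller error probabilities. It is not the streaming algorithm of the paper, which has to estimate those probabilities from the labels themselves. It assumes labels encoded as -1 or +1, independent labellers, a uniform prior over the two classes, and error probabilities strictly between 0 and 1; under those assumptions the most likely label is given by a vote weighted by each labeller's log-odds of being correct. The function names and the example numbers are illustrative only.

```python
import math


def majority_vote(labels):
    """Simple majority vote over labels in {-1, +1}; ties go to +1."""
    return 1 if sum(labels) >= 0 else -1


def weighted_vote(labels, error_probs):
    """Log-odds weighted vote with known error probabilities.

    Assumptions (not taken from the paper's text): labellers are
    independent, both classes are equally likely a priori, and every
    error probability lies strictly between 0 and 1.
    """
    score = sum(
        math.log((1 - p) / p) * x  # reliable labellers get larger weights
        for x, p in zip(labels, error_probs)
    )
    return 1 if score >= 0 else -1


# Hypothetical task: one reliable labeller disagrees with two noisy ones.
labels = [+1, -1, -1]
error_probs = [0.05, 0.4, 0.4]

print(majority_vote(labels))               # follows the two noisy labellers
print(weighted_vote(labels, error_probs))  # follows the reliable labeller
```

In this example the two rules disagree: the single labeller with a 5% error rate outweighs two labellers who are wrong 40% of the time. When all error probabilities are equal, the weighted rule reduces to simple majority voting.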
