Universal Clustering via Crowdsourcing

10/05/2016
by Ravi Kiran Raman, et al.

Consider unsupervised clustering of objects drawn from a discrete set, through the use of human intelligence available in crowdsourcing platforms. This paper defines and studies the problem of universal clustering using responses of crowd workers, without knowledge of worker reliability or task difficulty. We model stochastic worker response distributions by incorporating traits of memory for similar objects and traits of distance among differing objects. We are particularly interested in two limiting worker types: temporary workers who retain no memory of responses and long-term workers with memory. We first define clustering algorithms for these limiting cases and then integrate them into an algorithm for the unified worker model. We prove asymptotic consistency of the algorithms and establish sufficient conditions on the sample complexity of the algorithm. Converse arguments establish necessary conditions on sample complexity, proving that the defined algorithms are asymptotically order-optimal in cost.

research
07/21/2017

Autocompletion interfaces make crowd workers slower, but their use promotes response diversity

Creative tasks such as ideation or question proposal are powerful applic...
research
09/04/2018

Parity Crowdsourcing for Cooperative Labeling

Consider a database of k objects, e.g., a set of videos, where each obje...
research
09/15/2016

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Microtask crowdsourcing is increasingly critical to the creation of extr...
research
06/24/2019

Measuring the Expertise of Workers for Crowdsourcing Applications

Crowdsourcing platforms enable companies to propose tasks to a large cro...
research
09/21/2022

Clustering Without Knowing How To: Application and Evaluation

Crowdsourcing allows running simple human intelligence tasks on a large ...
research
07/05/2022

Unsupervised Crowdsourcing with Accuracy and Cost Guarantees

We consider the problem of cost-optimal utilization of a crowdsourcing p...
research
05/24/2022

The Data-Production Dispositif

Machine learning (ML) depends on data to train and verify models. Very o...
