Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds

05/31/2019
by Peng Cao, et al.

Eliciting labels from crowds is a potential way to obtain large amounts of labeled data. Despite the variety of methods developed for learning from crowds, a key challenge remains unsolved: learning from crowds without knowing the information structure among the crowds a priori, when some members of the crowd make highly correlated mistakes and some label without effort (e.g., randomly). We propose an information-theoretic approach, Max-MIG, for joint learning from crowds under a common assumption: the crowdsourced labels and the data are independent conditioned on the ground truth. Max-MIG simultaneously aggregates the crowdsourced labels and learns an accurate data classifier. Furthermore, we devise an accurate data-crowds forecaster that uses both the data and the crowdsourced labels to forecast the ground truth. To the best of our knowledge, this is the first algorithm that solves the aforementioned challenge of learning from crowds. In addition to theoretical validation, we show empirically that our algorithm achieves new state-of-the-art results in most settings, including on real-world data, and is the first algorithm that is robust to various information structures. Code is available at https://github.com/Newbeeer/Max-MIG
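
To make the challenge of highly correlated mistakes concrete, the following is a minimal illustrative sketch, not the Max-MIG algorithm. It assumes a hypothetical binary-label crowd with one independent, fairly accurate labeler and several "copycat" labelers who repeat a weaker labeler's answers. The accuracy values, function names, and crowd sizes are all assumptions chosen for illustration. Under these assumptions, plain majority voting simply reproduces the copied labels, so it is no better than the weak labeler.

```python
import random

def simulate(n_items=2000, n_copycats=4, seed=0):
    """Simulate a hypothetical crowd with correlated mistakes (binary labels)."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_items)]
    # One independent labeler, assumed correct about 90% of the time.
    expert = [y if rng.random() < 0.9 else 1 - y for y in truth]
    # One weak "leader", assumed correct about 60% of the time.
    leader = [y if rng.random() < 0.6 else 1 - y for y in truth]
    # Copycats repeat the leader exactly, so their mistakes are fully correlated.
    crowd = [expert] + [list(leader) for _ in range(n_copycats)]
    return truth, crowd

def majority_vote(crowd):
    """Aggregate binary labels per item by simple majority."""
    n = len(crowd)
    return [1 if 2 * sum(labels) > n else 0 for labels in zip(*crowd)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth, crowd = simulate()
mv_acc = accuracy(majority_vote(crowd), truth)
expert_acc = accuracy(crowd[0], truth)
print(f"majority vote accuracy: {mv_acc:.3f}")
print(f"independent labeler accuracy: {expert_acc:.3f}")
```

In this toy setup the four copycats always outvote the independent labeler, which is why an aggregation rule that ignores the information structure among the crowd can be misled.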

research
02/24/2018

Water from Two Rocks: Maximizing the Mutual Information

Our goal is to forecast ground truth Y using two sources of information ...
research
09/08/2019

L_DMI: An Information-theoretic Noise-robust Loss Function

Accurately annotating large scale dataset is notoriously expensive both ...
research
05/17/2023

Complementary Classifier Induced Partial Label Learning

In partial label learning (PLL), each training sample is associated with...
research
02/25/2023

Partial Label Learning for Emotion Recognition from EEG

Fully supervised learning has recently achieved promising performance in...
research
07/27/2020

openXDATA: A Tool for Multi-Target Data Generation and Missing Label Completion

A common problem in machine learning is to deal with datasets with disjo...
research
10/13/2022

Caption supervision enables robust learners

Vision language models like CLIP are robust to natural distribution shif...
research
03/21/2022

An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels

Pre-trained language models derive substantial linguistic and factual kn...
