Leveraging Structure for Improved Classification of Grouped Biased Data

12/07/2022
by Daniel Zeiberg, et al.

We consider semi-supervised binary classification for applications in which data points are naturally grouped (e.g., survey responses grouped by state) and the labeled data is biased (e.g., survey respondents are not representative of the population). The groups overlap in the feature space, and consequently the input-output patterns are related across groups. To model the inherent structure in such data, we assume partition-projected class-conditional invariance across groups, defined in terms of the group-agnostic feature space. We demonstrate that under this assumption the group carries information about the class beyond that contained in the group-agnostic features, so that using it provably improves the area under the ROC curve. Further assuming invariance of the partition-projected class-conditional distributions across both labeled and unlabeled data, we derive a semi-supervised algorithm that explicitly leverages this structure to learn an optimal, group-aware, probability-calibrated classifier despite the bias in the labeled data. Experiments on synthetic and real data demonstrate the efficacy of our algorithm relative to suitable baselines and ablated models, spanning standard supervised and semi-supervised learning approaches, with and without the group included directly as a feature.

research
03/15/2012

Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

Learning algorithms normally assume that there is at most one annotation...
research
06/29/2022

On Non-Random Missing Labels in Semi-Supervised Learning

Semi-Supervised Learning (SSL) is fundamentally a missing label problem,...
research
08/19/2017

Semi-supervised Conditional GANs

We introduce a new model for building conditional generative models in a...
research
04/20/2020

Local Clustering with Mean Teacher for Semi-supervised Learning

The Mean Teacher (MT) model of Tarvainen and Valpola has shown favorable...
research
12/09/2022

A soft nearest-neighbor framework for continual semi-supervised learning

Despite significant advances, the performance of state-of-the-art contin...
research
10/26/2020

Learning from Label Proportions by Optimizing Cluster Model Selection

In a supervised learning scenario, we learn a mapping from input to outp...
research
05/30/2017

Semi-Supervised Learning for Detecting Human Trafficking

Human trafficking is one of the most atrocious crimes and among the chal...