Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification

by Shida Lei, et al.

To cope with high annotation costs, training a classifier only from weakly supervised data has attracted considerable attention in recent years. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction; it typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded and highly flexible, they can learn from only two U sets. In this paper, we propose a new approach for binary classification from m U sets, for any m ≥ 2. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which aims to predict which U set each observed data point was drawn from. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We build our method into a flexible and efficient end-to-end deep learning framework and prove that it is classifier-consistent. Experiments demonstrate that the proposed method outperforms state-of-the-art methods.
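The link between the SSC task and the binary task can be illustrated with a small sketch. It is not the authors' code, and the exact parameterization in the paper may differ. It assumes that U set j contains n_j points drawn from the mixture pi_j * p_pos(x) + (1 - pi_j) * p_neg(x), with known class prior pi_j, and that eta(x) = p(y = +1 | x) is defined under a test distribution whose class prior `test_prior` is an assumption chosen by the user. Writing rho_j = n_j / sum_k n_k, the surrogate posterior p(set = j | x) then has the linear-fractional form (a_j * eta + b_j) / (c * eta + d), with c = sum_k a_k and d = sum_k b_k. If all priors pi_j are equal, every surrogate posterior collapses to the constant rho_j, so at least two sets must have different priors. The helper names (`ssc_coefficients`, `surrogate_posteriors`, `recover_eta`, `ssc_loss`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ssc_coefficients(priors: Sequence[float], sizes: Sequence[int],
                     test_prior: float) -> Tuple[List[float], List[float]]:
    """Hypothetical helper returning (a_j, b_j) such that
    p(set = j | x) = (a_j * eta + b_j) / (sum(a) * eta + sum(b)).

    Assumes set j has marginal pi_j * p_pos(x) + (1 - pi_j) * p_neg(x),
    and eta(x) = p(y = +1 | x) under a test distribution with prior test_prior.
    """
    total = sum(sizes)
    a, b = [], []
    for pi_j, n_j in zip(priors, sizes):
        rho_j = n_j / total  # share of all unlabeled points that come from set j
        a.append(rho_j * (pi_j / test_prior - (1 - pi_j) / (1 - test_prior)))
        b.append(rho_j * (1 - pi_j) / (1 - test_prior))
    return a, b


def surrogate_posteriors(eta: float, a: Sequence[float],
                         b: Sequence[float]) -> List[float]:
    """Forward map: binary posterior eta -> p(set = j | x) for every set j."""
    denom = sum(a) * eta + sum(b)
    return [(a_j * eta + b_j) / denom for a_j, b_j in zip(a, b)]


def recover_eta(eta_bar_j: float, j: int, a: Sequence[float],
                b: Sequence[float]) -> float:
    """Inverse map: solve eta_bar_j * (c * eta + d) = a_j * eta + b_j for eta."""
    c, d = sum(a), sum(b)
    denom = eta_bar_j * c - a[j]
    if abs(denom) < 1e-12:
        raise ValueError("this surrogate posterior does not determine eta")
    return (b[j] - eta_bar_j * d) / denom


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ssc_loss(score: float, set_index: int, a: Sequence[float],
             b: Sequence[float]) -> float:
    """Cross-entropy on the observed set label, after mapping the binary
    model's score through the forward transformation."""
    eta = sigmoid(score)
    return -math.log(surrogate_posteriors(eta, a, b)[set_index])


if __name__ == "__main__":
    # Two U sets with assumed priors 0.8 and 0.3, equal sizes, test prior 0.5.
    a, b = ssc_coefficients([0.8, 0.3], [100, 100], test_prior=0.5)
    print(surrogate_posteriors(0.9, a, b))
    print(ssc_loss(2.0, 0, a, b), ssc_loss(-2.0, 0, a, b))
```

In this sketch, training minimizes `ssc_loss` over pairs (x, index of the U set x came from), so only set indices and class priors act as supervision. The binary prediction is then +1 when sigmoid(score) exceeds 0.5. `recover_eta` shows the other direction: an estimate of p(set = j | x) from any multi-class SSC model can be mapped back to eta by inverting the same transformation.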







On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

Empirical risk minimization (ERM), with proper loss function and regular...

Binary Classification from Positive-Confidence Data

Reducing labeling costs in supervised learning is a critical issue in ma...

Binary classification with ambiguous training data

In supervised learning, we often face with ambiguous (A) samples that ar...

Continuum centroid classifier for functional data

Aiming at the binary classification of functional data, we propose the c...

Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification

We study the problem of fair binary classification using the notion of E...

Classification from Positive and Biased Negative Data with Skewed Labeled Posterior Probability

The binary classification problem has a situation where only biased data...

The Dutch Draw: Constructing a Universal Baseline for Binary Prediction Models

Novel prediction methods should always be compared to a baseline to know...