Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach

10/20/2019
by Nan Lu, et al.

From two unlabeled (U) datasets with different class priors, we can train a binary classifier by empirical risk minimization; this is called UU classification. It is promising because UU methods are compatible with any neural network (NN) architecture and optimizer, just as in standard supervised classification. In this paper, however, we find that UU methods may suffer severe overfitting, and that this overfitting co-occurs strongly with the empirical risk going negative, regardless of the dataset, NN architecture, and optimizer. Hence, to mitigate the overfitting problem of UU methods, we propose to keep two parts of the empirical risk (i.e., the false-positive and false-negative parts) non-negative by wrapping them in a family of correction functions. We show theoretically that the corrected risk estimator is still asymptotically unbiased and consistent; furthermore, we establish an estimation error bound for the corrected risk minimizer. Experiments with feedforward/residual NNs on standard benchmarks demonstrate that the proposed correction successfully mitigates the overfitting of UU methods and significantly improves classification accuracy.
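The idea of the correction can be sketched numerically. The snippet below is an illustrative NumPy sketch, not the authors' reference code: it assumes the two unlabeled sets have known class priors theta > theta2, uses the logistic loss, rewrites the positive- and negative-class risks as differences of expectations over the two unlabeled sets, and wraps each part in a correction function (ReLU by default) so the empirical estimate cannot go negative. All variable names here are our own.

```python
import numpy as np

def logistic_loss(z):
    """Logistic loss ell(z) = log(1 + exp(-z)), computed stably."""
    return np.logaddexp(0.0, -z)

def corrected_uu_risk(g_tr, g_tr2, pi, theta, theta2,
                      correction=lambda r: np.maximum(r, 0.0)):
    """Corrected UU risk estimator (illustrative sketch).

    g_tr, g_tr2 : arrays of classifier outputs g(x) on the two unlabeled
                  sets, whose class priors satisfy theta > theta2.
    pi          : class prior of the test distribution.
    correction  : correction function f, e.g. ReLU (default) or np.abs.
    """
    a = pi / (theta - theta2)
    b = (1.0 - pi) / (theta - theta2)
    # Risk on the positive class, rewritten via the two U sets
    # (follows from p_+ = ((1-theta2) p_tr - (1-theta) p_tr2) / (theta - theta2)).
    risk_pos = a * ((1.0 - theta2) * logistic_loss(g_tr).mean()
                    - (1.0 - theta) * logistic_loss(g_tr2).mean())
    # Risk on the negative class
    # (follows from p_- = (theta p_tr2 - theta2 p_tr) / (theta - theta2)).
    risk_neg = b * (theta * logistic_loss(-g_tr2).mean()
                    - theta2 * logistic_loss(-g_tr).mean())
    # Each part estimates a non-negative quantity, so the empirical
    # estimate is kept non-negative to prevent the risk from diverging below 0.
    return correction(risk_pos) + correction(risk_neg)
```

Without the wrapping (i.e., `correction=lambda r: r`), each bracketed difference is an unbiased estimate but may drift negative as a flexible NN overfits; clipping it at zero is what distinguishes the corrected estimator from the plain unbiased one.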


Related research

03/02/2017 · Positive-Unlabeled Learning with Non-Negative Risk Estimator
From only positive (P) and unlabeled (U) data, a binary classifier could...

07/04/2022 · Learning from Multiple Unlabeled Datasets with Partial Risk Regularization
Recent years have witnessed a great success of supervised deep learning,...

06/12/2020 · Non-Negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation
The estimation of the ratio of two probability densities has garnered at...

08/31/2018 · On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data
Empirical risk minimization (ERM), with proper loss function and regular...

07/05/2020 · Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels
In weakly supervised learning, unbiased risk estimator (URE) is a powerfu...

04/26/2019 · Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization
Pairwise similarities and dissimilarities between data points might be e...

06/14/2020 · A Neural Network Approach for Online Nonlinear Neyman-Pearson Classification
We propose a novel Neyman-Pearson (NP) classifier that is both online an...
