Semi-Supervised Empirical Risk Minimization: When can unlabeled data improve prediction

09/01/2020
by Oren Yuval, et al.

We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process. Focusing on generalized linear regression, we provide a careful treatment of the effectiveness of SSL in improving prediction performance. The key ideas are to treat the null model as a competitor, and to use the unlabeled data to determine the signal-noise combinations in which SSL outperforms both ERM learning and the null model. In the special case of linear regression with Gaussian covariates, we show that the previously suggested semi-supervised estimator is in fact not capable of improving on both the supervised estimator and the null model simultaneously. However, the new estimator presented in this work can achieve an improvement of order O(1/n) over both competitors simultaneously. On the other hand, we show that in other scenarios, such as non-Gaussian covariates, misspecified linear regression, or generalized linear regression with non-linear link functions, unlabeled data can yield a substantial improvement in prediction when our suggested SSL approach is applied. Moreover, the dedicated formulas we establish throughout this work make it possible to identify when SSL is useful. This is shown empirically through extensive simulations.
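The three-way comparison described in the abstract (null model, supervised ERM, and an SSL variant) can be illustrated with a small simulation for linear regression with Gaussian covariates. This is only a minimal sketch under stated assumptions: `fit_ssl_plugin` is a hypothetical plug-in variant that replaces the labeled-sample covariance with one pooled over labeled and unlabeled covariates, and it is not claimed to be the estimator proposed in the paper. The function names, the parameters of `compare`, and the data-generating settings are all illustrative choices.

```python
import numpy as np


def fit_ols(X, y):
    """Supervised ERM for squared loss: least squares with an intercept."""
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc = X - x_bar
    beta = np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y_bar))
    return y_bar - x_bar @ beta, beta


def fit_ssl_plugin(X, y, Z):
    """Illustrative semi-supervised variant (an assumption, not the paper's method).

    The covariance of the covariates is estimated from the labeled rows X
    pooled with the unlabeled rows Z; the covariate-response cross-moment
    uses the labeled rows only.
    """
    n = X.shape[0]
    x_bar, y_bar = X.mean(axis=0), y.mean()
    pooled = np.vstack([X, Z])
    pooled_c = pooled - pooled.mean(axis=0)
    sigma = pooled_c.T @ pooled_c / pooled.shape[0]
    cross = (X - x_bar).T @ (y - y_bar) / n
    beta = np.linalg.solve(sigma, cross)
    # Intercept choice: keep the fit passing through the labeled means.
    return y_bar - x_bar @ beta, beta


def mse(intercept, beta, X, y):
    """Mean squared prediction error on (X, y)."""
    return float(np.mean((y - intercept - X @ beta) ** 2))


def compare(n=30, m=3000, p=5, signal=0.5, noise=1.0, seed=0):
    """Out-of-sample error of the null model, OLS, and the plug-in SSL variant."""
    rng = np.random.default_rng(seed)
    beta_true = np.full(p, signal / np.sqrt(p))
    X = rng.normal(size=(n, p))           # labeled covariates
    Z = rng.normal(size=(m, p))           # unlabeled covariates
    X_test = rng.normal(size=(20000, p))  # fresh test covariates
    y = X @ beta_true + noise * rng.normal(size=n)
    y_test = X_test @ beta_true + noise * rng.normal(size=X_test.shape[0])
    return {
        # Null model: predict the labeled response mean, ignoring covariates.
        "null": mse(y.mean(), np.zeros(p), X_test, y_test),
        "ols": mse(*fit_ols(X, y), X_test, y_test),
        "ssl": mse(*fit_ssl_plugin(X, y, Z), X_test, y_test),
    }


if __name__ == "__main__":
    for name, err in compare().items():
        print(f"{name}: {err:.4f}")
```

Varying `signal` and `noise` in `compare` gives a rough sense of how the ranking of the three predictors can depend on the signal-noise combination. The sketch makes no claim about which predictor wins in any particular setting.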

research
02/19/2023

Mixed Semi-Supervised Generalized-Linear-Regression with applications to Deep learning

We present a methodology for using unlabeled data to design semi supervi...
research
07/13/2023

A zero-estimator approach for estimating the signal level in a high-dimensional regression setting

Analysis of high-dimensional data, where the number of covariates is lar...
research
11/28/2020

Optimal Semi-supervised Estimation and Inference for High-dimensional Linear Regression

There are many scenarios such as the electronic health records where the...
research
06/27/2012

A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound

In this work, we develop a simple algorithm for semi-supervised regressi...
research
02/25/2016

Learning to Abstain from Binary Prediction

A binary classifier capable of abstaining from making a label prediction...
research
04/26/2019

Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

Pairwise similarities and dissimilarities between data points might be e...
research
01/31/2019

Semi-Supervised Ordinal Regression Based on Empirical Risk Minimization

We consider the semi-supervised ordinal regression problem, where unlabe...
