Asymptotic Bayes risk for Gaussian mixture in a semi-supervised setting

07/08/2019
by Marc Lelarge, et al.

Semi-supervised learning (SSL) uses unlabeled data during training and has been shown to greatly improve performance compared with a supervised approach that uses only the available labeled data. This improvement depends both on the amount of labeled data available and on the algorithm used. In this paper, we compute analytically the gap between the best fully supervised approach on labeled data and the best semi-supervised approach using both labeled and unlabeled data. This quantifies the best possible increase in performance obtained thanks to the unlabeled data, i.e., the accuracy increase due to the information contained in the unlabeled data. Our work deals with a simple high-dimensional Gaussian mixture model for the data in a Bayesian setting. Our rigorous analysis builds on recent theoretical breakthroughs in high-dimensional inference and a large body of mathematical tools from statistical physics initially developed for spin glasses.
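To make the setting concrete, here is a small numerical sketch of the kind of comparison the abstract describes. It is an illustration under assumptions, not the paper's method. It assumes a symmetric two-cluster mixture with identity covariance, where each point is `x = v * mu + noise` and `v` is a label in {-1, +1}. It compares a plug-in supervised estimator, the labeled class mean, against a plain EM refinement that also uses the unlabeled points. Neither estimator is claimed to be Bayes-optimal, and the function names, dimensions, and labeled fraction are all hypothetical choices.

```python
import numpy as np


def sample_mixture(n, d, mu, rng):
    """Draw n points x_i = v_i * mu + z_i with v_i uniform in {-1, +1}, z_i ~ N(0, I)."""
    v = rng.choice([-1.0, 1.0], size=n)
    x = v[:, None] * mu[None, :] + rng.standard_normal((n, d))
    return x, v


def supervised_mean(x_lab, v_lab):
    """Estimate mu from labeled points only (average of v_i * x_i)."""
    return (v_lab[:, None] * x_lab).mean(axis=0)


def semi_supervised_mean(x_lab, v_lab, x_unl, n_iter=50):
    """EM for the symmetric mixture, initialized at the supervised estimate.

    For unlabeled points the E-step gives E[v | x] = tanh(<mu, x>);
    labeled points keep their known label. This is a heuristic sketch,
    not the Bayes-optimal estimator analyzed in the paper.
    """
    mu = supervised_mean(x_lab, v_lab)
    n_total = len(x_lab) + len(x_unl)
    labeled_sum = (v_lab[:, None] * x_lab).sum(axis=0)
    for _ in range(n_iter):
        soft_labels = np.tanh(x_unl @ mu)                  # E-step
        mu = (labeled_sum + soft_labels @ x_unl) / n_total  # M-step
    return mu


def accuracy(mu_hat, x, v):
    """Fraction of points whose label matches sign(<mu_hat, x>)."""
    return float(np.mean(np.sign(x @ mu_hat) == v))


# Hypothetical experiment parameters, chosen only for illustration.
rng = np.random.default_rng(0)
d, n, labeled_fraction = 50, 2000, 0.1
mu = np.full(d, 2.0 / np.sqrt(d))  # cluster center with norm 2

x, v = sample_mixture(n, d, mu, rng)
x_test, v_test = sample_mixture(5000, d, mu, rng)
n_lab = int(labeled_fraction * n)

mu_sup = supervised_mean(x[:n_lab], v[:n_lab])
mu_ssl = semi_supervised_mean(x[:n_lab], v[:n_lab], x[n_lab:])

acc_sup = accuracy(mu_sup, x_test, v_test)
acc_ssl = accuracy(mu_ssl, x_test, v_test)
print(f"supervised accuracy:      {acc_sup:.3f}")
print(f"semi-supervised accuracy: {acc_ssl:.3f}")
```

Varying `labeled_fraction` and the norm of `mu` in this sketch gives an empirical feel for how much the unlabeled points help. The paper's contribution is the exact asymptotic value of that gap for the optimal estimators, which a simulation like this can only approximate.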

research
04/18/2012

Semi-Supervised learning with Density-Ratio Estimation

In this paper, we study statistical properties of semi-supervised learni...
research
03/03/2023

Asymptotic Bayes risk of semi-supervised multitask learning on Gaussian mixture

The article considers semi-supervised multitask learning on a Gaussian m...
research
04/28/2022

On tuning a mean-field model for semi-supervised classification

Semi-supervised learning (SSL) has become an interesting research area d...
research
01/13/2019

Gradient Regularized Budgeted Boosting

As machine learning transitions increasingly towards real world applicat...
research
09/14/2020

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the...
research
09/17/2015

Sparse Fisher's Linear Discriminant Analysis for Partially Labeled Data

Classification is an important tool with many useful applications. Among...
research
07/12/2012

A Hierarchical Graphical Model for Record Linkage

The task of matching co-referent records is known among other names as r...
