A random matrix analysis and improvement of semi-supervised learning for large dimensional data

11/09/2017
by Xiaoyi Mai, et al.

This article provides a new understanding of the behavior of a class of graph-based semi-supervised learning algorithms in the regime where the data are both large-dimensional and numerous. It shows that the intuition at the root of these methods collapses in this limit and that, as a result, most of them become inconsistent. Corrective measures and a new data-driven parametrization scheme are proposed, together with a theoretical analysis of the asymptotic performance of the resulting approach. Throughout the article, the theoretical performance predicted for Gaussian mixture models is shown to match the performance observed on real datasets surprisingly closely, which suggests that the analysis is relevant to practical data. Consistent with this, the proposed parametrization yields significant performance gains when classifying real data.
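
The abstract does not spell out the algorithms, so what follows is only a minimal sketch under stated assumptions, not the paper's corrected, data-driven method. It uses one standard member of the graph-based family, the harmonic-function (Laplacian-regularization) solution f_u = (D_uu - W_uu)^{-1} W_ul y_l, with a Gaussian kernel whose bandwidth scales with the dimension p. The data come from a hypothetical two-class Gaussian mixture. The helper names (gaussian_mixture, gaussian_kernel, harmonic_scores) and all parameter values are illustrative assumptions. The sketch highlights a large-dimensional effect consistent with the collapse described above: normalized pairwise distances concentrate around a single value (here 2, that is 2 tr(I_p)/p). As a result, the kernel weights that these methods use to encode "nearby points share labels" become nearly uniform.

```python
import numpy as np


def gaussian_mixture(n_per_class, p, mean_norm, rng):
    """Hypothetical data: two balanced classes N(-mu, I_p) and N(+mu, I_p), mu = mean_norm * e_1."""
    mu = np.zeros(p)
    mu[0] = mean_norm
    X = np.vstack([rng.standard_normal((n_per_class, p)) - mu,
                   rng.standard_normal((n_per_class, p)) + mu])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y


def gaussian_kernel(X):
    """Return W_ij = exp(-||x_i - x_j||^2 / (2p)) with zero diagonal, and ||x_i - x_j||^2 / p."""
    p = X.shape[1]
    sq = np.sum(X ** 2, axis=1)
    # Clip tiny negative values caused by floating-point cancellation.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / p
    W = np.exp(-d2 / 2.0)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W, d2


def harmonic_scores(W, labeled, y_labeled):
    """Harmonic-function scores of the unlabeled points: f_u = (D_uu - W_uu)^{-1} W_ul y_l,
    where D is the diagonal matrix of node degrees (an assumed, standard formulation)."""
    unlabeled = np.setdiff1d(np.arange(W.shape[0]), labeled)
    degrees = W.sum(axis=1)
    L_uu = np.diag(degrees[unlabeled]) - W[np.ix_(unlabeled, unlabeled)]
    f_u = np.linalg.solve(L_uu, W[np.ix_(unlabeled, labeled)] @ y_labeled)
    return unlabeled, f_u


# Hypothetical setting: n = 400 points in dimension p = 400, 10 labels per class.
rng = np.random.default_rng(0)
X, y = gaussian_mixture(n_per_class=200, p=400, mean_norm=2.0, rng=rng)
W, d2 = gaussian_kernel(X)
off_diag = d2[~np.eye(len(y), dtype=bool)]
labeled = np.r_[0:10, 200:210]
unlabeled, f_u = harmonic_scores(W, labeled, y[labeled])

# Normalized distances cluster tightly around 2, so the kernel weights barely vary.
print(f"||x_i - x_j||^2 / p: mean {off_diag.mean():.3f}, std {off_diag.std():.3f}")
print(f"unlabeled scores: mean {f_u.mean():.2e}, std {f_u.std():.2e}")
for c in (-1, 1):
    print(f"class {c:+d} mean unlabeled score: {f_u[y[unlabeled] == c].mean():.2e}")
```

Because every off-diagonal weight is positive, each unlabeled score is a weighted average of its neighbors' scores and the labels. All scores therefore stay within [-1, 1]. Comparing the printed spread of the scores with their class-wise means is one simple way to probe how much class information survives in this regime.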



03/03/2023

Asymptotic Bayes risk of semi-supervised multitask learning on Gaussian mixture

The article considers semi-supervised multitask learning on a Gaussian m...
11/17/2020

Neural Semi-supervised Learning for Text Classification Under Large-Scale Pretraining

The goal of semi-supervised learning is to utilize the unlabeled, in-dom...
09/13/2020

Semi-supervised dictionary learning with graph regularization and active points

Supervised Dictionary Learning has gained much interest in the recent de...
06/13/2020

Consistent Semi-Supervised Graph Regularization for High Dimensional Data

Semi-supervised Laplacian regularization, a standard graph-based approac...
02/13/2018

Clustering and Semi-Supervised Classification for Clickstream Data via Mixture Models

Finite mixture models have been used for unsupervised learning for over ...
02/26/2023

Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R

Semi-supervised learning is being extensively applied to estimate classi...
10/28/2020

Data-driven prediction of multistable systems from sparse measurements

We develop a data-driven method, based on semi-supervised classification...
