Self-training Converts Weak Learners to Strong Learners in Mixture Models

06/25/2021
by   Spencer Frei, et al.

We consider a binary classification problem where the data come from a mixture of two isotropic distributions satisfying concentration and anti-concentration properties enjoyed by, among others, log-concave distributions. We show that there exists a universal constant C_err > 0 such that if a pseudolabeler β_pl achieves classification error at most C_err, then for any ε > 0, an iterative self-training algorithm initialized at β_0 := β_pl, using pseudolabels ŷ = sgn(⟨β_t, 𝐱⟩) and at most Õ(d/ε^2) unlabeled examples, suffices to learn the Bayes-optimal classifier up to ε error, where d is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler β_pl with classification error C_err using only O(d) labeled examples (i.e., independent of ε). Together, our results imply that mixture models can be learned to within ε of the Bayes-optimal accuracy using at most O(d) labeled examples and Õ(d/ε^2) unlabeled examples via a semi-supervised self-training algorithm.
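The two-stage procedure described above (a weak pseudolabeler from a few labeled examples, then iterative self-training on unlabeled data) can be sketched empirically. The following is a minimal illustration, not the paper's exact algorithm or constants: the mixture mean `mu`, the sample sizes, step sizes, and iteration counts are all illustrative choices, and the data model is a simple isotropic Gaussian mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy isotropic Gaussian mixture: y ~ Uniform{-1, +1}, x = y * mu + noise.
# The Bayes-optimal classifier is sgn(<mu, x>). (mu is an illustrative choice.)
d = 20
mu = np.zeros(d)
mu[0] = 2.0

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.standard_normal((n, d))
    return x, y

def logistic_grad(beta, x, y):
    # Gradient of the mean logistic loss log(1 + exp(-y <beta, x>)).
    margins = y * (x @ beta)
    return -(y / (1.0 + np.exp(margins))) @ x / len(y)

# Stage 1: weak pseudolabeler from O(d) labeled examples,
# via gradient descent on the logistic loss.
x_lab, y_lab = sample(5 * d)
beta = np.zeros(d)
for _ in range(50):
    beta -= 0.5 * logistic_grad(beta, x_lab, y_lab)

# Stage 2: iterative self-training on unlabeled examples,
# using pseudolabels y_hat = sgn(<beta_t, x>).
x_unl, _ = sample(20000)
for _ in range(20):
    y_pseudo = np.sign(x_unl @ beta)
    for _ in range(10):
        beta -= 0.5 * logistic_grad(beta, x_unl, y_pseudo)

# Compare the learned classifier with the Bayes-optimal one on fresh data.
x_test, y_test = sample(50000)
err = np.mean(np.sign(x_test @ beta) != y_test)
bayes_err = np.mean(np.sign(x_test @ mu) != y_test)
print(err, bayes_err)
```

In this toy setting the self-trained classifier's test error ends up close to the Bayes error, matching the qualitative claim that self-training refines a weak pseudolabeler toward the Bayes-optimal classifier using unlabeled data only.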


