Self-Training of Halfspaces with Generalization Guarantees under Massart Mislabeling Noise Model

11/29/2021
by   Lies Hadjadj, et al.

We investigate the generalization properties of a self-training algorithm with halfspaces. The approach iteratively learns a list of halfspaces from labeled and unlabeled training data, where each iteration consists of two phases: exploration and pruning. In the exploration phase, a halfspace is found by maximizing the unsigned margin over the unlabeled examples, and pseudo-labels are assigned to those whose distance to it exceeds the current threshold. The pseudo-labeled examples are then added to the training set, and a new classifier is learned. This process is repeated until no unlabeled examples remain for pseudo-labeling. In the pruning phase, pseudo-labeled samples whose distance to the last halfspace exceeds its associated unsigned margin are discarded. We prove that the misclassification error of the resulting sequence of classifiers is bounded, and show that the semi-supervised approach never degrades performance relative to a classifier learned from the initial labeled training set alone. Experiments carried out on a variety of benchmarks demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
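To make the exploration/pruning loop concrete, here is a minimal sketch of the procedure described in the abstract. It is an illustrative reading, not the authors' implementation: a scikit-learn LinearSVC stands in for the paper's margin-maximizing halfspace learner, and a single fixed threshold `theta` stands in for both the pseudo-labeling threshold and the per-halfspace unsigned margin used in pruning.

```python
# Sketch of margin-based self-training with halfspaces (illustrative only).
# Assumptions: y_lab holds +/-1 labels; `theta` is a hypothetical fixed
# threshold; LinearSVC approximates the paper's halfspace learner.
import numpy as np
from sklearn.svm import LinearSVC

def self_train_halfspaces(X_lab, y_lab, X_unl, theta=1.0):
    X, y = X_lab.copy(), y_lab.copy()
    U = X_unl.copy()
    halfspaces = []
    n_lab = len(X_lab)  # pseudo-labeled rows are appended after this index

    while len(U) > 0:
        # Exploration: fit a halfspace on the current (pseudo-)labeled set.
        clf = LinearSVC(C=1.0).fit(X, y)
        w, b = clf.coef_.ravel(), clf.intercept_[0]
        # Unsigned margins (distances) of the remaining unlabeled points.
        dist = np.abs(U @ w + b) / np.linalg.norm(w)
        mask = dist >= theta
        if not mask.any():
            break  # no unlabeled point clears the current threshold
        # Pseudo-label confident points and move them into the training set.
        pseudo = np.sign(U[mask] @ w + b)
        X = np.vstack([X, U[mask]])
        y = np.concatenate([y, pseudo])
        U = U[~mask]
        halfspaces.append((w, b))

    # Pruning: discard pseudo-labeled points whose distance to the last
    # halfspace exceeds the margin (simplified: theta stands in for the
    # paper's per-halfspace unsigned margin).
    if halfspaces:
        w, b = halfspaces[-1]
        d = np.abs(X[n_lab:] @ w + b) / np.linalg.norm(w)
        keep = d <= theta
        X = np.vstack([X[:n_lab], X[n_lab:][keep]])
        y = np.concatenate([y[:n_lab], y[n_lab:][keep]])

    return halfspaces, X, y
```

The loop terminates because every iteration either pseudo-labels at least one point (shrinking the unlabeled pool) or breaks when no point clears the threshold, mirroring the stopping condition stated in the abstract.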


Related research

02/24/2022  Self-Training: A Survey
In recent years, semi-supervised algorithms have received a lot of inter...

09/29/2021  Multi-class Probabilistic Bounds for Self-learning
Self-learning is a classical approach for learning with both labeled and...

07/02/2016  Rademacher Complexity Bounds for a Penalized Multiclass Semi-Supervised Algorithm
We propose Rademacher complexity bounds for multiclass classifiers train...

11/12/2019  Semi-supervised Wrapper Feature Selection with Imperfect Labels
In this paper, we propose a new wrapper approach for semi-supervised fea...

06/25/2021  Self-training Converts Weak Learners to Strong Learners in Mixture Models
We consider a binary classification problem when the data comes from a m...

07/15/2020  How to trust unlabeled data? Instance Credibility Inference for Few-Shot Learning
Deep learning based models have excelled in many computer vision task an...

11/15/2018  Exploiting Class Learnability in Noisy Data
In many domains, collecting sufficient labeled training data for supervi...
