Multi-class Probabilistic Bounds for Self-learning

09/29/2021
by Vasilii Feofanov, et al.

Self-learning is a classical approach to learning from both labeled and unlabeled observations. It consists of assigning pseudo-labels to unlabeled training instances whose confidence score exceeds a predetermined threshold. However, pseudo-labeling is prone to error and risks introducing noisy labels into the training data. In this paper, we present a probabilistic framework for analyzing self-learning in the multi-class classification scenario with partially labeled data. First, we derive a transductive bound on the risk of the multi-class majority vote classifier. Based on this result, we propose to automatically choose the pseudo-labeling threshold that minimizes the transductive bound. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier on pseudo-labeled data. We derive a probabilistic C-bound on the majority vote error when imperfect labels are given. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised approaches.
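The following Python sketch illustrates, under simplifying assumptions, two ingredients mentioned above: confidence-thresholded pseudo-labeling driven by the votes of a majority vote classifier, and the classic empirical C-bound for the perfect-label case. It does not reproduce the paper's transductive bound, its automatic threshold selection, or its mislabeling-aware C-bound. The input format (per-instance vote shares that sum to 1), the use of the predicted class's vote share as the confidence score, and all function names are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: votes[i][c] is the share of the (weighted) base
# classifiers voting for class c on instance i; each row is assumed to sum to 1.
Votes = Sequence[Sequence[float]]


def majority_vote(v: Sequence[float]) -> int:
    """Class with the largest vote share (the first one wins ties)."""
    return max(range(len(v)), key=lambda c: v[c])


def pseudo_label(votes: Votes, theta: float) -> List[Tuple[int, int]]:
    """Return (index, pseudo-label) pairs for unlabeled instances whose
    confidence is at least theta. Assumption: confidence is taken to be
    the vote share of the predicted class."""
    selected = []
    for i, v in enumerate(votes):
        y_hat = majority_vote(v)
        if v[y_hat] >= theta:
            selected.append((i, y_hat))
    return selected


def margin(v: Sequence[float], y: int) -> float:
    """Multi-class margin: vote for class y minus the best competing vote.
    The majority vote can err on (x, y) only if this value is <= 0."""
    return v[y] - max(v[c] for c in range(len(v)) if c != y)


def empirical_c_bound(votes: Votes, labels: Sequence[int]) -> float:
    """Classic C-bound 1 - E[M]^2 / E[M^2] computed on a labeled sample
    (perfect labels assumed). It follows from Cantelli's inequality applied
    to P(M <= 0), so it needs E[M] > 0; otherwise the trivial bound 1.0 is
    returned."""
    ms = [margin(v, y) for v, y in zip(votes, labels)]
    mu1 = sum(ms) / len(ms)
    mu2 = sum(m * m for m in ms) / len(ms)
    if mu1 <= 0:
        return 1.0
    return 1.0 - mu1 * mu1 / mu2


if __name__ == "__main__":
    votes = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.1, 0.8]]
    print(pseudo_label(votes, theta=0.6))       # [(0, 0), (2, 2)]
    print(empirical_c_bound(votes, [0, 1, 2]))  # approximately 0.2489
```

In this toy run, the second instance stays unlabeled because its highest vote share (0.5) falls below the threshold. A fixed `theta` is used here only for simplicity; the abstract instead describes choosing the threshold by minimizing a transductive bound, which this sketch does not attempt.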

research
11/12/2019

Semi-supervised Wrapper Feature Selection with Imperfect Labels

In this paper, we propose a new wrapper approach for semi-supervised fea...
research
05/11/2022

Multi-Class 3D Object Detection with Single-Class Supervision

While multi-class 3D detectors are needed in many robotics applications,...
research
01/31/2022

Positive-Unlabeled Learning with Uncertainty-aware Pseudo-label Selection

Pseudo-labeling solutions for positive-unlabeled (PU) learning have the ...
research
11/29/2021

Self-Training of Halfspaces with Generalization Guarantees under Massart Mislabeling Noise Model

We investigate the generalization properties of a self-training algorith...
research
02/24/2022

Self-Training: A Survey

In recent years, semi-supervised algorithms have received a lot of inter...
research
01/26/2020

An interpretable semi-supervised classifier using two different strategies for amended self-labeling

In the context of some machine learning applications, obtaining data ins...
research
01/25/2022

DebtFree: Minimizing Labeling Cost in Self-Admitted Technical Debt Identification using Semi-Supervised Learning

Keeping track of and managing Self-Admitted Technical Debts (SATDs) is i...