In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning

03/02/2023
by Julian Rodemann, et al.

Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on how these pseudo-labeled data are selected, a step known as pseudo-label selection (PLS). In this paper, we aim to make PLS more robust to the modeling assumptions involved. To this end, we propose selecting pseudo-labeled data that maximize a multi-objective utility function. This function is constructed to account for different sources of uncertainty, three of which we discuss in more detail: model selection, accumulation of errors, and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian alpha-cut updating rule for credal sets. As a practical proof of concept, we apply three of our robust extensions to simulated and real-world data. The results suggest that robustness with respect to model choice, in particular, can lead to substantial accuracy gains.
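
To make the self-training loop described above concrete, here is a minimal sketch in plain Python. It is not the selection criterion proposed in the paper: the nearest-centroid model, the distance-margin score used as the selection utility, and all function names (`fit_centroids`, `predict_with_score`, `self_train`) are assumptions chosen for illustration. The paper's multi-objective utility would take the place of the margin score.

```python
import math


def fit_centroids(X, y):
    """Toy model (an assumption): one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_score(centroids, x):
    """Return (predicted label, margin).

    The margin is the gap between the distances to the two nearest
    centroids. It serves as a stand-in selection utility here.
    """
    dists = sorted((math.dist(x, mu), c) for c, mu in centroids.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return label, margin


def self_train(X_lab, y_lab, X_unlab, n_rounds=None):
    """Iteratively move the highest-utility unlabeled point into the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    rounds = len(pool) if n_rounds is None else min(n_rounds, len(pool))
    for _ in range(rounds):
        # Refit on labeled plus pseudo-labeled data collected so far.
        centroids = fit_centroids(X_lab, y_lab)
        # Score every remaining unlabeled point with the current model.
        scored = [(predict_with_score(centroids, x), i) for i, x in enumerate(pool)]
        # Pseudo-label selection: pick the point with the largest utility.
        (label, _), best = max(scored, key=lambda t: t[0][1])
        X_lab.append(pool.pop(best))
        y_lab.append(label)
    return fit_centroids(X_lab, y_lab), X_lab, y_lab


# Two labeled points and four unlabeled points (made-up illustrative data).
model, X_out, y_out = self_train(
    [(0.0, 0.0), (4.0, 4.0)],
    [0, 1],
    [(0.5, 0.2), (3.6, 4.1), (1.0, 1.2), (3.0, 2.9)],
)
```

Because each round refits the model on its own earlier pseudo-labels, an early wrong selection can propagate through later rounds. This is the accumulation of errors that the abstract lists as one source of uncertainty, and it is why the choice of selection utility matters.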

research
06/01/2023

Doubly Robust Self-Training

Self-training is an important technique for solving semi-supervised lear...
research
02/17/2023

Approximate Bayes Optimal Pseudo-Label Selection

Semi-supervised learning by self-training heavily relies on pseudo-label...
research
05/19/2020

Self-Updating Models with Error Remediation

Many environments currently employ machine learning models for data proc...
research
03/15/2021

Semi-supervised learning by selective training with pseudo labels via confidence estimation

We propose a novel semi-supervised learning (SSL) method that adopts sel...
research
01/03/2022

An analysis of over-sampling labeled data in semi-supervised learning with FixMatch

Most semi-supervised learning methods over-sample labeled data when cons...
research
03/31/2021

The GIST and RIST of Iterative Self-Training for Semi-Supervised Segmentation

We consider the task of semi-supervised semantic segmentation, where we ...
research
10/15/2022

How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?

This paper provides an exact characterization of the expected generaliza...
