Approximate Bayes Optimal Pseudo-Label Selection

02/17/2023
by   Julian Rodemann, et al.

Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS). The selection often depends on the initial model fit on labeled data. Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions, often referred to as confirmation bias. This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue. At its core lies a criterion for selecting instances to label: an analytical approximation of the posterior predictive of pseudo-samples. We derive this selection criterion by proving Bayes optimality of the posterior predictive of pseudo-samples. We further overcome computational hurdles by approximating the criterion analytically. Its relation to the marginal likelihood allows us to derive an approximation based on Laplace's method and the Gaussian integral. We empirically assess BPLS for parametric generalized linear and non-parametric generalized additive models on simulated and real-world data. When faced with high-dimensional data prone to overfitting, BPLS outperforms traditional PLS methods.
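To make the role of Laplace's method concrete, here is a minimal sketch, not the paper's exact criterion. It assumes a Bayesian logistic regression with an isotropic Gaussian prior (the model, the prior variance `prior_var`, and the function names `fit_map`, `laplace_log_evidence`, and `select_pseudo_label` are all illustrative assumptions). Each unlabeled point receives the label predicted by the current fit, and the Laplace-approximated log marginal likelihood of the labeled data augmented by that pseudo-sample is used as its score. Because the evidence of the labeled data alone is the same for every candidate, ranking by this score matches ranking by the approximated posterior predictive of the pseudo-sample.

```python
import numpy as np

def log_joint(theta, X, y, prior_var):
    """Log likelihood of a logistic model plus log density of a N(0, prior_var * I) prior."""
    z = X @ theta
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))
    d = theta.size
    log_prior = -0.5 * d * np.log(2.0 * np.pi * prior_var) - 0.5 * (theta @ theta) / prior_var
    return log_lik + log_prior

def neg_hessian(theta, X, prior_var):
    """Negative Hessian of the log joint (positive definite thanks to the prior)."""
    p = np.exp(-np.logaddexp(0.0, -(X @ theta)))  # numerically stable sigmoid
    return X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(X.shape[1]) / prior_var

def fit_map(X, y, prior_var=1.0, n_iter=100, tol=1e-10):
    """Find the MAP estimate by Newton's method with step halving."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = np.exp(-np.logaddexp(0.0, -(X @ theta)))
        grad = X.T @ (y - p) - theta / prior_var
        step = np.linalg.solve(neg_hessian(theta, X, prior_var), grad)
        t, current = 1.0, log_joint(theta, X, y, prior_var)
        # Halve the step until the log joint does not decrease.
        while log_joint(theta + t * step, X, y, prior_var) < current and t > 1e-8:
            t *= 0.5
        theta = theta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta

def laplace_log_evidence(X, y, prior_var=1.0):
    """Laplace approximation: log p(D) ~ log joint at MAP + (d/2) log(2 pi) - (1/2) log det H."""
    theta = fit_map(X, y, prior_var)
    _, logdet = np.linalg.slogdet(neg_hessian(theta, X, prior_var))
    d = theta.size
    return log_joint(theta, X, y, prior_var) + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet

def select_pseudo_label(X_lab, y_lab, X_unl, prior_var=1.0):
    """Score each pseudo-sample by the approximate evidence of the augmented data set."""
    theta = fit_map(X_lab, y_lab, prior_var)
    y_hat = (X_unl @ theta > 0).astype(float)  # predicted pseudo-labels
    scores = np.array([
        laplace_log_evidence(np.vstack([X_lab, X_unl[i]]), np.append(y_lab, y_hat[i]), prior_var)
        for i in range(X_unl.shape[0])
    ])
    best = int(np.argmax(scores))
    return best, y_hat[best], scores

# Toy usage on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 3))
y_lab = (X_lab[:, 0] + 0.5 * rng.normal(size=20) > 0).astype(float)
X_unl = rng.normal(size=(10, 3))
best, label, scores = select_pseudo_label(X_lab, y_lab, X_unl)
```

In a self-training loop, the selected point and its pseudo-label would be moved to the labeled set and the procedure repeated. Refitting the model once per candidate is the simplest way to write the idea down but is expensive; it is used here only for clarity.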


research
03/02/2023

In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning

Self-training is a simple yet effective method within semi-supervised le...
research
08/31/2022

Seq-UPS: Sequential Uncertainty-aware Pseudo-label Selection for Semi-Supervised Text Recognition

This paper looks at semi-supervised learning (SSL) for image-based text ...
research
12/22/2021

Barely-Supervised Learning: Semi-Supervised Learning with very few labeled images

This paper tackles the problem of semi-supervised learning when the set ...
research
10/21/2019

Safe-Bayesian Generalized Linear Regression

We study generalized Bayesian inference under misspecification, i.e. whe...
research
09/06/2021

Bayesian data selection

Insights into complex, high-dimensional data can be obtained by discover...
research
01/01/2021

The Bayesian Method of Tensor Networks

Bayesian learning is a powerful learning framework which combines the ex...
research
02/11/2019

Domain Constraint Approximation based Semi Supervision

Deep learning for supervised learning has achieved astonishing performan...
