Debiased Pseudo Labeling in Self-Training

02/15/2022
by Baixu Chen, et al.

Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets. However, large-scale annotations are time-consuming and labor-intensive to obtain for realistic tasks. To reduce the reliance on labeled data, self-training is widely used in both academia and industry: it assigns pseudo labels to readily available unlabeled data. Despite its popularity, pseudo labeling is widely believed to be unreliable and often leads to training instability. Our experimental studies further reveal that the performance of self-training is biased by data sampling, pre-trained models, and training strategies, and especially by the inappropriate utilization of pseudo labels. To this end, we propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads. To further improve the quality of pseudo labels, we introduce a worst-case estimation of pseudo labeling and seamlessly optimize the representations to avoid this worst case. Extensive experiments show that the proposed Debiased not only yields an average improvement of 14.4% over state-of-the-art algorithms on 11 tasks (covering generic object recognition, fine-grained object recognition, texture classification, and scene classification) but also helps stabilize training and balance performance across classes.
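To make the decoupling idea concrete, here is a minimal PyTorch sketch of a two-head setup in the spirit of the abstract: one head generates pseudo labels while a separate head is trained on them. All names (DebiasedSketch, head_main, head_pseudo, the 0.95 confidence threshold) are illustrative assumptions, not the authors' implementation, and the worst-case representation term mentioned in the abstract is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DebiasedSketch(nn.Module):
    """Shared backbone with two independent classifier heads."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        # head_main is trained on labeled + pseudo-labeled data.
        self.head_main = nn.Linear(feat_dim, num_classes)
        # head_pseudo generates pseudo labels; it is supervised by labeled data only.
        self.head_pseudo = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        f = self.backbone(x)
        return self.head_main(f), self.head_pseudo(f)

def training_step(model, x_l, y_l, x_u, threshold: float = 0.95):
    # Labeled batch: both heads receive ground-truth supervision.
    logits_main_l, logits_pseudo_l = model(x_l)
    loss = F.cross_entropy(logits_main_l, y_l) + F.cross_entropy(logits_pseudo_l, y_l)

    # Unlabeled batch: the pseudo head only *generates* labels (detached,
    # so no gradient flows back from its own predictions), and only the
    # main head is *trained* on them, decoupling generation from utilization.
    logits_main_u, logits_pseudo_u = model(x_u)
    probs_u = F.softmax(logits_pseudo_u.detach(), dim=-1)
    conf, pseudo_y = probs_u.max(dim=-1)
    mask = conf.ge(threshold).float()  # keep only confident pseudo labels
    loss_u = (F.cross_entropy(logits_main_u, pseudo_y, reduction="none") * mask).mean()
    return loss + loss_u
```

Because head_pseudo is never trained on its own predictions, errors in the pseudo labels do not feed back into the labeler, which is one plausible way to reduce the bias the abstract attributes to the inappropriate utilization of pseudo labels.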


Related research

02/27/2023 · Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data
Recent progress in semi- and self-supervised learning has caused a rift ...

02/25/2021 · Self-Tuning for Data-Efficient Deep Learning
Deep learning has made revolutionary advances to diverse applications in...

10/09/2021 · X-model: Improving Data Efficiency in Deep Learning with A Minimax Model
To mitigate the burden of data labeling, we aim at improving data effici...

02/05/2022 · LST: Lexicon-Guided Self-Training for Few-Shot Text Classification
Self-training provides an effective means of using an extremely small am...

10/07/2020 · Adaptive Self-training for Few-shot Neural Sequence Labeling
Neural sequence labeling is an important technique employed for many Nat...

07/13/2023 · Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues
Answer selection in open-domain dialogues aims to select an accurate ans...

01/27/2022 · Confidence May Cheat: Self-Training on Graph Neural Networks under Distribution Shift
Graph Convolutional Networks (GCNs) have recently attracted vast interes...
