On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning

01/15/2023
by Lu Han, et al.

When the unlabeled data contain Out-Of-Distribution (OOD) samples from other classes, Semi-Supervised Learning (SSL) methods suffer severe performance degradation and can even perform worse than training on the labeled data alone. In this paper, we empirically analyze Pseudo-Labeling (PL) in class-mismatched SSL. PL is a simple and representative SSL method that transforms an SSL problem into a supervised one by creating pseudo-labels for unlabeled data according to the model's predictions. We aim to answer two main questions: (1) How do OOD data influence PL? (2) What is the proper usage of OOD data with PL? First, we show that the major problem of PL is imbalanced pseudo-labels on OOD data. Second, we find that OOD data can help classify In-Distribution (ID) data when their OOD ground-truth labels are given. Based on these findings, we propose to improve PL in class-mismatched SSL with two components: Re-balanced Pseudo-Labeling (RPL) and Semantic Exploration Clustering (SEC). RPL re-balances the pseudo-labels of high-confidence data, which simultaneously filters out OOD data and addresses the imbalance problem. SEC uses balanced clustering on low-confidence data to create pseudo-labels on extra classes, simulating the process of training with ground truth. Experiments show that our method achieves steady improvement over the supervised baseline and state-of-the-art performance under all class mismatch ratios on different benchmarks.
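To make the two ideas of confidence-based pseudo-labeling and re-balancing concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names `pseudo_label` and `rebalance`, the threshold of 0.9, and the rule of keeping the same number of most-confident samples per class (the size of the smallest class) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def pseudo_label(probs, threshold=0.9):
    """Return (index, label, confidence) for every sample whose highest
    predicted class probability is at least `threshold`."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf), conf))
    return selected


def rebalance(selected, num_classes):
    """Keep the same number of pseudo-labeled samples for every class.

    Assumed rule (hypothetical): k is the size of the smallest class among
    the selected samples, and the k most confident samples per class are kept.
    """
    by_class = defaultdict(list)
    for item in selected:
        by_class[item[1]].append(item)
    # If some class has no confident sample, k is 0 and nothing is kept.
    k = min(len(by_class[c]) for c in range(num_classes))
    kept = []
    for c in range(num_classes):
        ranked = sorted(by_class[c], key=lambda t: t[2], reverse=True)
        kept.extend(ranked[:k])
    return sorted(kept)


# Toy predicted probabilities for five unlabeled samples and two classes.
probs = [
    [0.95, 0.05],
    [0.92, 0.08],
    [0.97, 0.03],
    [0.10, 0.90],
    [0.60, 0.40],  # low confidence, never pseudo-labeled
]

confident = pseudo_label(probs, threshold=0.9)
balanced = rebalance(confident, num_classes=2)
print([(i, label) for i, label, _ in balanced])
```

In this toy input, three confident samples are assigned to class 0 and only one to class 1, so plain pseudo-labeling would be imbalanced; the re-balancing step keeps one sample per class. Low-confidence samples such as the last row are the kind of data the abstract assigns to SEC, which is not sketched here.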


research
06/13/2022

EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning

Recent state-of-the-art methods in semi-supervised learning (SSL) combin...
research
07/11/2021

Learning from Crowds with Sparse and Imbalanced Annotations

Traditional supervised learning requires ground truth labels for the tra...
research
03/13/2023

InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning

Recent state-of-the-art methods in imbalanced semi-supervised learning (...
research
08/12/2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

When labeled data is insufficient, semi-supervised learning with the pse...
research
01/26/2023

SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning

The critical challenge of Semi-Supervised Learning (SSL) is how to effec...
research
01/19/2021

On The Consistency Training for Open-Set Semi-Supervised Learning

Conventional semi-supervised learning (SSL) methods, e.g., MixMatch, ach...
research
08/23/2023

Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch

Semi-Supervised Learning (SSL) under class distribution mismatch aims to...
