An analysis of over-sampling labeled data in semi-supervised learning with FixMatch

01/03/2022
by   Miquel Martí i Rabadán, et al.

Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether, and how, this common practice improves learning. We compare it to an alternative setting in which each mini-batch is sampled uniformly from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general, and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop under uniform sampling, and this drop diminishes as the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform sampling. Our main finding is that over-sampling is especially beneficial early in training but becomes less important in the later stages, when more pseudo-labels become correct. Nevertheless, we also find that keeping some true labels remains important to avoid the accumulation of confirmation errors from incorrect pseudo-labels.
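
The two mini-batch constructions compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the disjoint labeled and unlabeled index lists, and the numbers (250 labels out of 50,000 images, 64 labeled examples per batch, an unlabeled-to-labeled ratio of 7) are assumptions chosen for illustration only.

```python
import random

def oversampled_batch(labeled_idx, unlabeled_idx, n_labeled, mu, rng):
    """Fixed-ratio batch: n_labeled labeled indices plus mu * n_labeled unlabeled ones.

    Hypothetical helper. The labeled part is drawn with replacement, so a small
    labeled set is revisited far more often than its share of the data suggests.
    """
    lab = rng.choices(labeled_idx, k=n_labeled)
    unl = rng.sample(unlabeled_idx, k=mu * n_labeled)
    return lab, unl

def uniform_batch(labeled_idx, unlabeled_idx, batch_size, rng):
    """Uniform batch: draw from all training data, then split by label availability.

    Hypothetical helper. The number of labeled examples varies from batch to
    batch and can be zero when labels are scarce.
    """
    labeled_set = set(labeled_idx)
    picks = rng.sample(labeled_idx + unlabeled_idx, k=batch_size)
    lab = [i for i in picks if i in labeled_set]
    unl = [i for i in picks if i not in labeled_set]
    return lab, unl

def expected_labeled_per_batch(n_labeled_total, n_total, batch_size):
    """Expected count of labeled examples in one uniformly sampled batch."""
    return batch_size * n_labeled_total / n_total

if __name__ == "__main__":
    rng = random.Random(0)
    labeled = list(range(250))           # assumed: 250 labeled images
    unlabeled = list(range(250, 50000))  # assumed: the rest are unlabeled

    lab, unl = oversampled_batch(labeled, unlabeled, n_labeled=64, mu=7, rng=rng)
    print(len(lab), len(unl))  # 64 labeled, 448 unlabeled in every batch

    lab_u, unl_u = uniform_batch(labeled, unlabeled, batch_size=512, rng=rng)
    print(len(lab_u), len(unl_u))  # labeled count varies per batch
    print(expected_labeled_per_batch(250, 50000, 512))
```

Under these assumed numbers, a uniformly sampled batch of 512 contains about 2.56 labeled examples on average, against 64 in the over-sampled batch. This gap is the reduction in direct supervision that the abstract refers to.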

