
An analysis of over-sampling labeled data in semi-supervised learning with FixMatch

by Miquel Martí i Rabadán, et al.
KTH Royal Institute of Technology

Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning, and how. We compare it to an alternative setting in which each mini-batch is sampled uniformly from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general, and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop under uniform sampling, which diminishes as the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform sampling. Our main finding is that over-sampling is especially beneficial early in training but becomes less important in the later stages, when more pseudo-labels become correct. Nevertheless, we also find that keeping some true labels remains important to avoid the accumulation of confirmation errors from incorrect pseudo-labels.
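
To make the two mini-batch constructions concrete, here is a minimal Python sketch. It is not the authors' code: the function names, the dataset split (50,000 examples with 250 labeled), the labeled batch size of 64, and the unlabeled ratio `mu = 7` are illustrative assumptions rather than values stated in the abstract, and for simplicity the unlabeled pool here excludes the labeled examples.

```python
import random


def oversampled_batch(labeled_idx, unlabeled_idx, n_labeled, mu, rng):
    """Over-sampling: a fixed number of labeled examples in every batch.

    Labeled indices are drawn with replacement, so a small labeled set is
    revisited far more often than any single unlabeled example.
    """
    labeled = rng.choices(labeled_idx, k=n_labeled)
    unlabeled = rng.sample(unlabeled_idx, k=mu * n_labeled)
    return labeled, unlabeled


def uniform_batch(labeled_idx, unlabeled_idx, batch_size, rng):
    """Uniform sampling: draw the whole batch from the pooled training data.

    The number of labeled examples in a batch is then random and, in a
    low-label regime, usually very small.
    """
    labeled_set = set(labeled_idx)
    batch = rng.sample(labeled_idx + unlabeled_idx, k=batch_size)
    labeled = [i for i in batch if i in labeled_set]
    unlabeled = [i for i in batch if i not in labeled_set]
    return labeled, unlabeled


# Hypothetical low-label setup (assumed numbers, not taken from the paper).
rng = random.Random(0)
n_total, n_with_labels = 50_000, 250
labeled_idx = list(range(n_with_labels))
unlabeled_idx = list(range(n_with_labels, n_total))

n_labeled, mu = 64, 7
batch_size = n_labeled * (1 + mu)  # 512 examples per batch in both settings

over_l, over_u = oversampled_batch(labeled_idx, unlabeled_idx, n_labeled, mu, rng)
unif_l, unif_u = uniform_batch(labeled_idx, unlabeled_idx, batch_size, rng)

# Expected number of labeled examples per uniformly sampled batch.
expected_labeled_uniform = batch_size * n_with_labels / n_total
```

Under these assumed numbers, an over-sampled batch always contains 64 labeled examples, whereas a uniformly sampled batch of the same total size contains about 2.56 on average. This illustrates how strongly uniform sampling reduces direct supervision from true labels when few labels are available.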




Semi-supervised learning by selective training with pseudo labels via confidence estimation

We propose a novel semi-supervised learning (SSL) method that adopts sel...

Why pseudo label based algorithm is effective? –from the perspective of pseudo labeled data

Recently, pseudo label based semi-supervised learning has achieved great...

Learning Graph Embedding with Limited Labeled Data: An Efficient Sampling Approach

Semi-supervised graph embedding methods represented by graph convolution...

Labels, Information, and Computation: Efficient, Privacy-Preserving Learning Using Sufficient Labels

In supervised learning, obtaining a large set of fully-labeled training ...

In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning

Self-training is a simple yet effective method within semi-supervised le...

One-bit Supervision for Image Classification

This paper presents one-bit supervision, a novel setting of learning fro...

Learning with Inadequate and Incorrect Supervision

Practically, we are often in the dilemma that the labeled data at hand a...