Impact of Labelled Set Selection and Supervision Policies on Semi-supervised Learning

11/27/2022
by Shuvendu Roy, et al.

In semi-supervised representation learning frameworks, when labelled data are very scarce, the quality and representativeness of these samples become increasingly important. The existing literature on semi-supervised learning randomly samples a limited number of data points for labelling; all of these labelled samples are then used along with the unlabelled data throughout the training process. In this work, we ask two important questions in this context: (1) does it matter which samples are selected for labelling? (2) does it matter how the labelled samples are used throughout the training process along with the unlabelled data? To answer the first question, we explore a number of unsupervised methods for selecting specific subsets of data to label (without prior knowledge of their labels), with the goal of maximizing their representativeness w.r.t. the unlabelled set. Then, for our second line of inquiry, we define a variety of label injection strategies for the training process. Extensive experiments on four popular datasets, CIFAR-10, CIFAR-100, SVHN, and STL-10, show that unsupervised selection of samples that are more representative of the entire dataset improves performance by up to ~2% over existing semi-supervised frameworks such as MixMatch, ReMixMatch, FixMatch, and others with random sample labelling. We show that this boost could even increase to 7.5%, and that gradually injecting the labels throughout the training procedure does not considerably impact performance compared to using all the available labels throughout the entire training.
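For intuition, the two sketches below illustrate the ideas described in the abstract. Both are hypothetical: the function names, the choice of k-means over precomputed self-supervised embeddings, and the linear injection schedule are illustrative assumptions, not the paper's exact methods.

```python
# A minimal sketch of one clustering-based way to pick a representative
# subset for labelling. The paper explores several unsupervised selection
# methods; k-means over unsupervised embeddings is an illustrative
# stand-in, not necessarily the authors' exact procedure.
import numpy as np
from sklearn.cluster import KMeans

def select_representative_indices(features: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of samples closest to k-means centroids.

    features: (n_samples, dim) embeddings from an unsupervised encoder
              (assumed precomputed; no labels are used anywhere).
    budget:   number of samples the labelling budget allows.
    """
    kmeans = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(features)
    chosen = []
    for centre in kmeans.cluster_centers_:
        # The sample nearest each centroid represents its cluster.
        distances = np.linalg.norm(features - centre, axis=1)
        chosen.append(int(np.argmin(distances)))
    # Two centroids can occasionally share a nearest sample; a fuller
    # implementation would re-draw to hit the budget exactly.
    return np.unique(np.asarray(chosen))
```

A label injection strategy then decides how much of the labelled pool the semi-supervised learner sees at each training step. A linear schedule is one plausible policy among the variety the paper compares against using all labels from the start:

```python
def labelled_pool_at(step: int, total_steps: int, labelled_idx: np.ndarray) -> np.ndarray:
    """Linearly grow the usable labelled pool over training.

    One hypothetical injection schedule for illustration; the paper
    defines and evaluates several such policies.
    """
    frac = min(1.0, (step + 1) / total_steps)
    k = max(1, int(frac * len(labelled_idx)))
    return labelled_idx[:k]
```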


