DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

10/26/2021
by   Yi Xu, et al.

The scarcity of labeled data is a critical obstacle to deep learning. Semi-supervised learning (SSL) provides a promising way to leverage unlabeled data through pseudo labels. However, when the labeled set is very small (say, a few labeled samples per class), SSL performance is poor and unstable, possibly due to the low quality of the learned pseudo labels. In this paper, we propose a new SSL method called DP-SSL that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data. Unlike existing DP methods, which rely on human experts to provide initial labeling functions (LFs), we develop a multiple-choice learning (MCL) based approach that automatically generates LFs from scratch in SSL style. With the noisy labels produced by the LFs, we design a label model to resolve the conflict and overlap among the noisy labels, and finally infer probabilistic labels for unlabeled samples. Extensive experiments on four standard SSL benchmarks show that DP-SSL can provide reliable labels for unlabeled data and achieve better classification performance on test sets than existing SSL methods, especially when only a small number of labeled samples are available. Concretely, for CIFAR-10 with only 40 labeled samples, DP-SSL achieves 93.82% annotation accuracy on unlabeled data and 93.46% classification accuracy on test data, both higher than the state-of-the-art (SOTA) results.
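To make the label-aggregation step more concrete, here is a minimal sketch of how noisy votes from several LFs might be combined into probabilistic labels. It is not the label model from the paper: it assumes a simple naive-Bayes-style weighting in which each LF's accuracy is estimated on the few labeled samples, and the names `ABSTAIN`, `estimate_lf_accuracy`, and `probabilistic_labels` are hypothetical.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker for "this LF gives no vote on this sample"


def estimate_lf_accuracy(votes_labeled, y_labeled, smoothing=1.0):
    """Estimate each LF's accuracy on the labeled samples (Laplace-smoothed).

    votes_labeled: (n_labeled, n_lfs) int array of class votes or ABSTAIN.
    y_labeled:     (n_labeled,) int array of true classes.
    """
    voted = votes_labeled != ABSTAIN
    correct = (votes_labeled == y_labeled[:, None]) & voted
    # Smoothing keeps every accuracy strictly inside (0, 1), so logs stay finite.
    return (correct.sum(axis=0) + smoothing) / (voted.sum(axis=0) + 2.0 * smoothing)


def probabilistic_labels(votes, accuracy, num_classes):
    """Combine LF votes into per-sample class probabilities.

    Assumption: LFs are conditionally independent given the true class, and an
    incorrect LF spreads its error uniformly over the other classes.
    """
    n_samples, n_lfs = votes.shape
    log_scores = np.zeros((n_samples, num_classes))
    log_hit = np.log(accuracy)
    log_miss = np.log((1.0 - accuracy) / (num_classes - 1))
    for j in range(n_lfs):
        rows = np.flatnonzero(votes[:, j] != ABSTAIN)
        log_scores[rows] += log_miss[j]  # every class starts as a "miss"
        log_scores[rows, votes[rows, j]] += log_hit[j] - log_miss[j]  # voted class is a "hit"
    # Softmax over classes; samples with no votes end up uniform.
    log_scores -= log_scores.max(axis=1, keepdims=True)
    probs = np.exp(log_scores)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy data: 3 LFs, 2 classes, 4 labeled and 3 unlabeled samples.
votes_labeled = np.array([[0, 0, 1], [1, 1, 1], [0, ABSTAIN, 0], [1, 1, 0]])
y_labeled = np.array([0, 1, 0, 1])
votes_unlabeled = np.array([[0, 0, 1], [1, ABSTAIN, 1], [ABSTAIN, ABSTAIN, ABSTAIN]])

accuracy = estimate_lf_accuracy(votes_labeled, y_labeled)
probs = probabilistic_labels(votes_unlabeled, accuracy, num_classes=2)
print(probs)
```

In this toy setup, the LF that is right only half the time on the labeled samples contributes no evidence either way, so conflicts are settled by the more reliable LFs, and a sample on which every LF abstains receives a uniform distribution.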


research
07/14/2022

Pseudo-Labeling Based Practical Semi-Supervised Meta-Training for Few-Shot Learning

Most existing few-shot learning (FSL) methods require a large amount of ...
research
03/27/2023

ScarceNet: Animal Pose Estimation with Scarce Annotations

Animal pose estimation is an important but under-explored task due to th...
research
12/04/2019

Large-Scale Semi-Supervised Learning via Graph Structure Learning over High-Dense Points

We focus on developing a novel scalable graph-based semi-supervised lear...
research
06/24/2020

Labeled Optimal Partitioning

In data sequences measured over space or time, an important problem is a...
research
07/29/2016

A Non-Parametric Learning Approach to Identify Online Human Trafficking

Human trafficking is among the most challenging law enforcement problems...
research
06/27/2023

Biclustering random matrix partitions with an application to classification of forensic body fluids

Classification of unlabeled data is usually achieved by supervised learn...
research
10/12/2020

Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning

Unlabeled data learning has attracted considerable attention recently. H...
