Data-Centric Semi-Supervised Learning

10/06/2021
by Xudong Wang, et al.

We study unsupervised data selection for semi-supervised learning (SSL), where a large-scale unlabeled dataset is available and a small subset of the data is budgeted for label acquisition. Existing SSL methods focus on learning a model that effectively integrates information from the given small labeled set and the large unlabeled set, whereas we focus on selecting the right data for SSL without any label or task information, in stark contrast to supervised data selection for active learning. Intuitively, the instances to be labeled should collectively have maximum diversity and coverage for downstream tasks, and individually have maximum information propagation utility for SSL. We formalize these concepts in a three-step data-centric SSL method that improves FixMatch in stability and accuracy by 8 [...] ImageNet-1K (0.2 [...] careful labeled data selection brings big annotation efficiency and model performance gains without changing the learning pipeline. Our completely unsupervised data selection can be easily extended to other weakly supervised learning settings.
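To make the diversity-and-coverage intuition concrete, here is a minimal sketch of label-free selection under a fixed annotation budget. It is not the three-step method of the paper, which the abstract does not spell out. It substitutes greedy k-center (farthest-point) selection, a generic coverage heuristic, and it does not model information propagation utility. The function name `select_for_labeling`, its parameters, and the assumption that each unlabeled instance is already represented by a feature vector are all hypothetical.

```python
import math
import random


def select_for_labeling(features, budget, seed=0):
    """Greedy k-center (farthest-point) selection over unlabeled feature vectors.

    Hypothetical helper, not the paper's method: it uses no labels, only
    pairwise Euclidean distances, and assumes the feature vectors are distinct.
    Returns the indices of the `budget` instances chosen for annotation.
    """
    if not 0 < budget <= len(features):
        raise ValueError("budget must be between 1 and len(features)")
    rng = random.Random(seed)
    first = rng.randrange(len(features))
    selected = [first]
    # Distance from every point to its nearest selected point so far.
    nearest = [math.dist(f, features[first]) for f in features]
    while len(selected) < budget:
        # Pick the point that the current selection covers worst.
        idx = max(range(len(features)), key=nearest.__getitem__)
        selected.append(idx)
        for i, f in enumerate(features):
            nearest[i] = min(nearest[i], math.dist(f, features[idx]))
    return selected


# Toy data: three well-separated groups of two points each.
toy_features = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0),
                (10.1, 10.0), (-10.0, 10.0), (-10.0, 10.1)]
toy_selection = select_for_labeling(toy_features, budget=3)
```

On the toy data, a budget of three yields one index from each group, whichever point is drawn first: once a group holds a selected point, its other member is at most 0.1 away, so the worst-covered point always lies in a group not yet represented.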


