They are Not Completely Useless: Towards Recycling Transferable Unlabeled Data for Class-Mismatched Semi-Supervised Learning

11/27/2020
by Huang Zhuo, et al.

Semi-Supervised Learning (SSL) with mismatched classes deals with the problem that the classes of interest in the limited labeled data are only a subset of the classes in the massive unlabeled data. As a result, the classes that appear only in the unlabeled data may mislead classifier training and thus hinder the practical deployment of various SSL methods. To solve this problem, existing methods usually divide unlabeled data into in-distribution (ID) data and out-of-distribution (OOD) data, and directly discard or down-weight the OOD data to avoid their adverse impact. In other words, they treat OOD data as completely useless, so the potentially valuable information for classification that these data contain is ignored entirely. To remedy this defect, this paper proposes a "Transferable OOD data Recycling" (TOOR) method, which utilizes ID data as well as the "recyclable" OOD data to enrich the information available for class-mismatched SSL. Specifically, TOOR first assigns every unlabeled datum to either the ID data or the OOD data, and the ID data are used directly for training. It then treats the OOD data that are closely related to the ID data and labeled data as recyclable, and employs adversarial domain adaptation to project them into the space of the ID data and labeled data. In other words, the recyclability of an OOD datum is evaluated by its transferability, and the recyclable OOD data are transferred so that they are compatible with the distribution of the known classes of interest. Consequently, TOOR extracts more information from unlabeled data than existing approaches, and experiments on typical benchmark datasets demonstrate the resulting performance improvement.
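
The three-way split described in the abstract (ID data, recyclable OOD data, and remaining OOD data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the ID/OOD decision or the transferability measure is computed, so the per-sample scores, both thresholds, and the names `partition_unlabeled` and `Partition` are hypothetical. The adversarial domain adaptation step that would then be applied to the recyclable subset is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Partition:
    """Indices of unlabeled samples in each of the three groups."""
    id_idx: List[int] = field(default_factory=list)
    recyclable_idx: List[int] = field(default_factory=list)
    discarded_idx: List[int] = field(default_factory=list)


def partition_unlabeled(
    id_scores: Sequence[float],
    transfer_scores: Sequence[float],
    id_threshold: float = 0.5,        # hypothetical cut-off, not from the paper
    transfer_threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Partition:
    """Split unlabeled samples into ID, recyclable OOD, and discarded OOD.

    id_scores[i]: assumed score in [0, 1]; higher means more likely ID.
    transfer_scores[i]: assumed score in [0, 1]; higher means the sample
    is more closely related to the ID and labeled data.
    """
    if len(id_scores) != len(transfer_scores):
        raise ValueError("score sequences must have the same length")

    result = Partition()
    for i, (s, t) in enumerate(zip(id_scores, transfer_scores)):
        if s >= id_threshold:
            result.id_idx.append(i)          # used directly for training
        elif t >= transfer_threshold:
            result.recyclable_idx.append(i)  # candidate for domain adaptation
        else:
            result.discarded_idx.append(i)   # OOD and not transferable
    return result


# Toy example with four unlabeled samples.
example = partition_unlabeled([0.9, 0.2, 0.1, 0.6], [0.1, 0.8, 0.3, 0.9])
print(example)
```

In this toy run, samples 0 and 3 are treated as ID, sample 1 as recyclable OOD, and sample 2 is left out. Hard thresholds are used here only for simplicity; a soft weighting of samples would be an equally plausible reading of the abstract.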


