Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data

by Bing Yu, et al.

The scarcity of class-labeled data is a ubiquitous bottleneck in a wide range of machine learning problems. Abundant unlabeled data are usually available and offer a potential solution, but exploiting them is extremely challenging. In this paper, we address this problem by simultaneously leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data, both of which aim to make full use of agnostic unlabeled data to improve classification and generation performance. In particular, we present a novel training framework that jointly targets PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between the two tasks: 1) enhancing the performance of PU classifiers with the assistance of a novel Conditional Generative Adversarial Network (CGAN) that is robust to noisy labels, and 2) leveraging extra data with labels predicted by a PU classifier to help the generation. Our key contribution is a Classifier-Noise-Invariant Conditional GAN (CNI-CGAN) that can learn the clean data distribution from noisy labels predicted by a PU classifier. Theoretically, we prove the optimal condition of CNI-CGAN; experimentally, we conduct extensive evaluations on diverse datasets, verifying simultaneous improvements in both classification and generation.
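To make the PU classification half of the framework concrete, the sketch below computes a non-negative PU risk with a sigmoid surrogate loss, one standard way of training a binary classifier from positive and unlabeled scores. It is an illustration only: the abstract does not say which PU estimator the authors use, and the function names (`sigmoid_loss`, `nn_pu_risk`) and the `class_prior` parameter are assumptions introduced here, not part of the paper.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; near 0 when sign(score) agrees with label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, class_prior):
    """Non-negative PU risk (hypothetical helper, not the authors' code).

    pos_scores:  classifier outputs on labeled positive samples
    unl_scores:  classifier outputs on unlabeled samples
    class_prior: assumed fraction of positives among the unlabeled data
    """
    # Risk of positives when treated as positive.
    risk_p_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of positives when treated as negative (used to correct the unlabeled term).
    risk_p_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of unlabeled samples when treated as negative.
    risk_u_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])

    # Estimated negative-class risk; clamped at zero so that it cannot go
    # negative when the classifier overfits the positive samples.
    neg_risk = risk_u_neg - class_prior * risk_p_neg
    return class_prior * risk_p_pos + max(0.0, neg_risk)


if __name__ == "__main__":
    # Toy scores: one unlabeled sample is a hidden positive (score 2.0).
    pos = [3.0, 2.5]
    unl = [2.0, -3.0, -2.5, -2.0]
    print(nn_pu_risk(pos, unl, class_prior=0.25))
```

A classifier that separates the toy scores well yields a lower risk than one with all signs flipped. In a framework like the one described, the labels predicted by such a classifier would then be passed to the conditional generator as noisy conditioning labels.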






