Instance-Dependent PU Learning by Bayesian Optimal Relabeling

08/07/2018
by Fengxiang He, et al.

When learning from positive and unlabelled data, it is a strong assumption that the positive observations are randomly sampled from the distribution of X conditional on Y = 1, where X stands for the feature and Y for the label. Most existing algorithms are optimally designed under this assumption. However, in many real-world applications, the observed positive examples depend on the conditional probability P(Y = 1|X) and are therefore sampled in a biased manner. In this paper, we assume that a positive example with a higher P(Y = 1|X) is more likely to be labelled, and we propose a probabilistic-gap based PU learning algorithm. Specifically, by treating the unlabelled data as noisy negative examples, we can automatically label a group of positive and negative examples whose labels are identical to those assigned by a Bayesian optimal classifier, with a consistency guarantee. The relabelled examples have a biased domain, which is remedied by the kernel mean matching technique. The proposed algorithm is model-free and thus does not have any parameters to tune. Experimental results demonstrate that our method works well on both generated and real-world datasets.
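The abstract describes a two-step procedure: relabel confident examples from the unlabelled set using the probabilistic gap, then correct the resulting sampling bias with kernel mean matching. Below is a minimal, hypothetical Python sketch of the first step only. It is not the authors' implementation: the choice of LogisticRegression as the non-traditional classifier, the thresholds high and low, and the function name relabel_confident_examples are illustrative assumptions.

    # Hypothetical sketch: probabilistic-gap style relabeling for PU data.
    # Not the paper's exact algorithm; model and thresholds are assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def relabel_confident_examples(X_pos, X_unlabelled, high=0.9, low=0.1):
        """Treat unlabelled data as noisy negatives, fit a probabilistic model,
        and keep only the unlabelled points scored confidently high or low."""
        X = np.vstack([X_pos, X_unlabelled])
        s = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_unlabelled))])

        # Non-traditional classifier: estimates P(labelled | x), which under the
        # instance-dependent labelling assumption increases with P(Y = 1 | x).
        clf = LogisticRegression(max_iter=1000).fit(X, s)
        scores = clf.predict_proba(X_unlabelled)[:, 1]

        # Keep only unlabelled examples with a large probabilistic gap from the
        # decision boundary; this is a heuristic stand-in for the paper's rule
        # that relabelled points agree with the Bayesian optimal classifier.
        relabelled_pos = X_unlabelled[scores >= high]
        relabelled_neg = X_unlabelled[scores <= low]
        return relabelled_pos, relabelled_neg

    # Illustrative usage on synthetic Gaussian data.
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=1.0, size=(50, 2))
    X_unl = rng.normal(loc=0.0, size=(200, 2))
    confident_pos, confident_neg = relabel_confident_examples(X_pos, X_unl)

The second step, kernel mean matching, would reweight the relabelled sample so that its kernel mean matches that of the full unlabelled domain; it is a separate correction and is omitted from this sketch.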

research
09/12/2017

Learning with Bounded Instance- and Label-dependent Label Noise

Instance- and label-dependent label noise (ILN) is widely existed in rea...
research
02/24/2020

Learning from Positive and Unlabeled Data with Arbitrary Positive Shift

Positive-unlabeled (PU) learning trains a binary classifier using only p...
research
10/25/2021

Instance-Dependent Partial Label Learning

Partial label learning (PLL) is a typical weakly supervised learning pro...
research
12/13/2018

Local Probabilistic Model for Bayesian Classification: a Generalized Local Classification Model

In Bayesian classification, it is important to establish a probabilistic...
research
09/28/2020

Learning Classifiers under Delayed Feedback with a Time Window Assumption

We consider training a binary classifier under delayed feedback (DF Lear...
research
05/18/2022

Dependent Latent Class Models

Latent Class Models (LCMs) are used to cluster multivariate categorical ...
research
10/26/2020

Learning from Label Proportions by Optimizing Cluster Model Selection

In a supervised learning scenario, we learn a mapping from input to outp...
