Recovering True Classifier Performance in Positive-Unlabeled Learning

02/02/2017
by Shantanu Jain, et al.

A common approach in positive-unlabeled learning is to train a classification model to discriminate labeled from unlabeled data. This strategy is known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier's performance. In this work, we show that commonly used performance measures obtained on such data, such as the receiver operating characteristic (ROC) curve or the precision-recall curve, can be corrected with knowledge of the class priors, i.e., the proportions of positive and negative examples in the unlabeled data. We extend the results to a noisy setting where some of the examples labeled positive are in fact negative, and show that the correction then also requires knowledge of the proportion of noisy examples among the labeled positives. Using state-of-the-art algorithms to estimate the positive class prior and the proportion of noise, we experimentally evaluate two correction approaches and demonstrate their efficacy on real-life data.
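To make the idea of correcting performance estimates with class priors concrete, here is a minimal Python sketch. It rests on an assumed two-component mixture model, not necessarily the paper's exact formulation or either of the two correction approaches it evaluates. The unlabeled data are taken to contain positives in proportion alpha (the positive class prior). The labeled set contains true positives in proportion beta, with the remaining 1 - beta being negatives drawn from the negative class. Setting beta = 1 gives the noise-free case. Under this model, the "apparent" true-positive rate measured on labeled data and the "apparent" false-positive rate measured on unlabeled data are linear mixtures of the true rates, so a 2x2 linear system can be inverted to recover them. All function names (`empirical_rates`, `correct_rates`, `corrected_precision`, `corrected_roc`) are hypothetical, and alpha and beta are assumed to be known or estimated separately.

```python
from typing import List, Sequence, Tuple


def empirical_rates(labeled: Sequence[float], unlabeled: Sequence[float],
                    threshold: float) -> Tuple[float, float]:
    """Apparent (PU) rates at a threshold: the fraction of labeled and of
    unlabeled examples whose classifier score is >= threshold."""
    tpr_pu = sum(s >= threshold for s in labeled) / len(labeled)
    fpr_pu = sum(s >= threshold for s in unlabeled) / len(unlabeled)
    return tpr_pu, fpr_pu


def _clip(x: float) -> float:
    # Finite-sample estimates can land slightly outside [0, 1].
    return min(1.0, max(0.0, x))


def correct_rates(tpr_pu: float, fpr_pu: float, alpha: float,
                  beta: float = 1.0) -> Tuple[float, float]:
    """Invert the assumed mixture model
        tpr_pu = beta  * tpr + (1 - beta)  * fpr   (labeled data)
        fpr_pu = alpha * tpr + (1 - alpha) * fpr   (unlabeled data)
    alpha: proportion of positives in the unlabeled data (class prior).
    beta:  proportion of true positives among labeled examples
           (beta = 1.0 means no label noise).
    """
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("need 0 <= alpha < beta <= 1")
    det = beta - alpha  # determinant of [[beta, 1-beta], [alpha, 1-alpha]]
    tpr = ((1 - alpha) * tpr_pu - (1 - beta) * fpr_pu) / det
    fpr = (beta * fpr_pu - alpha * tpr_pu) / det
    return _clip(tpr), _clip(fpr)


def corrected_precision(tpr: float, fpr: float, alpha: float) -> float:
    """Precision on the unlabeled population, from true rates and the prior.
    Returns NaN when no examples are predicted positive."""
    predicted_pos = alpha * tpr + (1 - alpha) * fpr
    return alpha * tpr / predicted_pos if predicted_pos > 0 else float("nan")


def corrected_roc(labeled: Sequence[float], unlabeled: Sequence[float],
                  alpha: float, beta: float = 1.0) -> List[Tuple[float, float]]:
    """Corrected (fpr, tpr) points, one per distinct score threshold,
    ordered from the highest threshold to the lowest."""
    thresholds = sorted(set(labeled) | set(unlabeled), reverse=True)
    points = []
    for t in thresholds:
        tpr_pu, fpr_pu = empirical_rates(labeled, unlabeled, t)
        tpr, fpr = correct_rates(tpr_pu, fpr_pu, alpha, beta)
        points.append((fpr, tpr))
    return points
```

As a check, suppose the true rates are tpr = 0.8 and fpr = 0.1, with alpha = 0.3 and no noise. Then the apparent rates are 0.8 and 0.3 * 0.8 + 0.7 * 0.1 = 0.31, and `correct_rates(0.8, 0.31, alpha=0.3)` returns approximately `(0.8, 0.1)`. Note the requirement alpha < beta: if the labeled set were no richer in positives than the unlabeled set, the two mixtures would be indistinguishable and the system could not be inverted.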

research
03/08/2021

A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels

Positive-unlabeled learning refers to the process of training a binary c...
research
11/01/2021

Mixture Proportion Estimation and PU Learning: A Modern Approach

Given only positive examples and unlabeled examples (from both positive ...
research
04/26/2015

Assessing binary classifiers using only positive and unlabeled data

Assessing the performance of a learned model is a crucial part of machin...
research
01/08/2016

Nonparametric semi-supervised learning of class proportions

The problem of developing binary classifiers from positive and unlabeled...
research
10/27/2022

Learning One-Class Hyperspectral Classifier from Positive and Unlabeled Data for Low Proportion Target

Hyperspectral imagery (HSI) one-class classification is aimed at identif...
research
02/18/2020

Hierarchical Classification of Enzyme Promiscuity Using Positive, Unlabeled, and Hard Negative Examples

Despite significant progress in sequencing technology, there are many ce...
research
03/02/2021

Botcha: Detecting Malicious Non-Human Traffic in the Wild

Malicious bots make up about a quarter of all traffic on the web, and de...
