A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models

02/13/2014
by Marc Claesen, et al.

We present a novel approach to learn binary classifiers when only positive and unlabeled instances are available (PU learning). This problem is routinely cast as a supervised task with label noise in the negative set. We use an ensemble of SVM models trained on bootstrap resamples of the training data for increased robustness against label noise. The approach can be considered in a bagging framework which provides an intuitive explanation for its mechanics in a semi-supervised setting. We compared our method to state-of-the-art approaches in simulations using multiple public benchmark data sets. The included benchmark comprises three settings with increasing label noise: (i) fully supervised, (ii) PU learning and (iii) PU learning with false positives. Our approach shows a marginal improvement over existing methods in the second setting and a significant improvement in the third.
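
The bagging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes linear base models trained with a Pegasos-style subgradient method in place of a full SVM solver, treats every unlabeled instance as a negative, and aggregates by the fraction of base models voting positive. All function names, parameters, and default values here (`train_linear_svm`, `train_pu_ensemble`, `positive_vote_fraction`, `make_toy_pu`, `lam`, `epochs`, `n_models`) are hypothetical and chosen only for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.05, epochs=30, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss. A constant 1.0 is appended to each input so the bias is
    learned as an ordinary (regularized) weight."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss is active for this instance
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision_value(w, x):
    """Signed score of one instance under one base model."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_pu_ensemble(positives, unlabeled, n_models=15, seed=0):
    """Train each base model on a bootstrap resample of the positives
    (label +1) and of the unlabeled set (label -1, i.e. noisy negatives)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        pos_sample = rng.choices(positives, k=len(positives))
        unl_sample = rng.choices(unlabeled, k=len(unlabeled))
        X = pos_sample + unl_sample
        y = [1] * len(pos_sample) + [-1] * len(unl_sample)
        models.append(train_linear_svm(X, y, seed=rng.randrange(10**6)))
    return models


def positive_vote_fraction(models, x):
    """Ensemble score: fraction of base models that predict positive."""
    votes = sum(1 for w in models if decision_value(w, x) > 0.0)
    return votes / len(models)


def make_toy_pu(seed=1):
    """Synthetic 2-D data: labeled positives plus an unlabeled set that
    mixes hidden positives with true negatives."""
    rng = random.Random(seed)

    def blob(cx, cy, n):
        return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]

    positives = blob(2.0, 2.0, 40)
    unlabeled = blob(2.0, 2.0, 15) + blob(-2.0, -2.0, 45)
    return positives, unlabeled


if __name__ == "__main__":
    positives, unlabeled = make_toy_pu()
    models = train_pu_ensemble(positives, unlabeled)
    print(positive_vote_fraction(models, (2.0, 2.0)))
    print(positive_vote_fraction(models, (-2.0, -2.0)))
```

Resampling both sets means each base model sees a different draw of the mislabeled instances hidden in the unlabeled set, so no single contaminated sample dominates the aggregated vote. In practice a kernel SVM from an established library would likely replace the hand-written linear trainer used here.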


research
12/06/2022

Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective

Positive-Unlabeled (PU) learning tries to learn binary classifiers from ...
research
08/01/2023

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Learning from positive and unlabeled data is known as positive-unlabeled...
research
12/04/2018

A Deep Learning Framework for Semi-Supervised Cross-Modal Retrieval with Label Prediction

Due to abundance of data from multiple modalities, cross-modal retrieval...
research
03/26/2019

A method on selecting reliable samples based on fuzziness in positive and unlabeled learning

Traditional semi-supervised learning uses only labeled instances to trai...
research
10/05/2010

A bagging SVM to learn from positive and unlabeled examples

We consider the problem of learning a binary classifier from a training ...
research
09/23/2020

Using Under-trained Deep Ensembles to Learn Under Extreme Label Noise

Improper or erroneous labelling can pose a hindrance to reliable general...
research
03/02/2021

Botcha: Detecting Malicious Non-Human Traffic in the Wild

Malicious bots make up about a quarter of all traffic on the web, and de...
