Robust and Efficient Imbalanced Positive-Unlabeled Learning with Self-supervision

09/06/2022
by   Emilio Dorigatti, et al.

Learning from positive and unlabeled (PU) data is a setting in which the learner has access only to positive and unlabeled samples, with no information on negative examples. The PU setting is of great importance in tasks such as medical diagnosis, social network analysis, financial market analysis, and knowledge base completion, which also tend to be intrinsically imbalanced, i.e., most examples are actually negative. Most existing approaches to PU learning, however, consider only artificially balanced datasets, and it is unclear how well they perform in the realistic scenario of imbalanced, long-tailed data distributions. This paper proposes to tackle this challenge via robust and efficient self-supervised pretraining. However, conventional self-supervised learning methods need to be reformulated when applied to a highly imbalanced PU distribution. In this paper, we present ImPULSeS, a unified representation learning framework for Imbalanced Positive-Unlabeled Learning leveraging Self-Supervised debiased pretraining. ImPULSeS uses a generic combination of large-scale unsupervised learning with a debiased contrastive loss and an additional reweighted PU loss. We performed experiments across multiple datasets to show that ImPULSeS is able to halve the error rate of the previous state of the art, even compared with previous methods that are given the true class prior. Moreover, our method showed increased robustness to prior misspecification and superior performance even when pretraining was performed on an unrelated dataset. We anticipate that such robustness and efficiency will make it much easier for practitioners to obtain excellent results on other PU datasets of interest. The source code is available at <https://github.com/JSchweisthal/ImPULSeS>
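As a rough illustration of the two loss components named in the abstract, here is a minimal PyTorch sketch, assuming a debiased contrastive term in the style of Chuang et al. (2020) and the non-negative PU risk estimator of Kiryo et al. (2017) as the reweighted PU term. The function names and default hyperparameters (`tau_plus`, `prior`) are illustrative assumptions; the exact formulation used by ImPULSeS may differ from this sketch.

```python
import math
import torch

def debiased_contrastive_loss(pos_sim, neg_sim, tau_plus=0.1, temperature=0.5):
    # pos_sim: (B,) similarity of each anchor to its augmented positive view.
    # neg_sim: (B, N) similarities of each anchor to N "negatives" drawn from
    # the unlabeled pool, some of which are secretly positive.
    pos = torch.exp(pos_sim / temperature)
    neg = torch.exp(neg_sim / temperature)
    n = neg.size(1)
    # Debias the negative term for positives hidden among the unlabeled
    # samples, clamping at the theoretical minimum e^{-1/t}.
    ng = (neg.mean(dim=1) - tau_plus * pos) / (1.0 - tau_plus)
    ng = torch.clamp(ng, min=math.exp(-1.0 / temperature))
    return -torch.log(pos / (pos + n * ng)).mean()

def nn_pu_loss(scores_pos, scores_unl, prior=0.05):
    # scores_pos: (P,) classifier logits on labeled positives.
    # scores_unl: (U,) classifier logits on unlabeled samples.
    # prior: the (possibly misspecified) positive-class prior pi.
    surrogate = lambda z, y: torch.sigmoid(-y * z)  # sigmoid surrogate loss
    r_p_pos = surrogate(scores_pos, 1.0).mean()   # risk of positives labeled +1
    r_p_neg = surrogate(scores_pos, -1.0).mean()  # risk of positives labeled -1
    r_u_neg = surrogate(scores_unl, -1.0).mean()  # risk of unlabeled labeled -1
    # Clamp the estimated negative risk at zero ("non-negative" correction)
    # to avoid the overfitting caused by a negative risk estimate.
    return prior * r_p_pos + torch.clamp(r_u_neg - prior * r_p_neg, min=0.0)
```

In a pipeline of this shape, the contrastive loss would drive the pretraining stage on all samples, and the PU loss would train the downstream classifier; the clamp in the PU term is what gives robustness when `prior` is misspecified.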

Related research

06/06/2021
Self-Damaging Contrastive Learning
The recent breakthrough achieved by contrastive learning accelerates the...

03/25/2021
Rethinking Self-Supervised Learning: Small is Beautiful
Self-supervised learning (SSL), in particular contrastive learning, has ...

10/13/2022
The Hidden Uniform Cluster Prior in Self-Supervised Learning
A successful paradigm in representation learning is to perform self-supe...

09/17/2021
Self-Supervised Neural Architecture Search for Imbalanced Datasets
Neural Architecture Search (NAS) provides state-of-the-art results when ...

07/31/2023
Visual Geo-localization with Self-supervised Representation Learning
Visual Geo-localization (VG) has emerged as a significant research area,...

07/27/2023
Federated Model Aggregation via Self-Supervised Priors for Highly Imbalanced Medical Image Classification
In the medical field, federated learning commonly deals with highly imba...

01/20/2022
Self-supervised Video Representation Learning with Cascade Positive Retrieval
Self-supervised video representation learning has been shown to effectiv...
