Prediction in the presence of response-dependent missing labels

03/25/2021
by Hyebin Song, et al.

In a variety of settings, limitations of sensing technologies or other sampling mechanisms result in missing labels, where the likelihood of a missing label in the training set is an unknown function of the data. For example, satellites used to detect forest fires cannot sense fires below a certain size threshold. In such cases, training datasets consist of positive and pseudo-negative observations, where a pseudo-negative observation is either a true negative or an undetected positive of small magnitude. We develop a new methodology and non-convex algorithm, PU-OMM (Positive-Unlabeled Occurrence-Magnitude Mixture), which jointly estimates the occurrence and detection likelihood of positive samples by utilizing prior knowledge of the detection mechanism. Our approach uses ideas from positive-unlabeled (PU) learning and from zero-inflated models, which jointly estimate the magnitude and occurrence of events. We provide conditions under which our model is identifiable and prove that, even though our approach leads to a non-convex objective, any local minimizer has optimal statistical error (up to a log term) and projected gradient descent converges at a geometric rate. We demonstrate on both synthetic data and a California wildfire dataset that our method outperforms existing state-of-the-art approaches.
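The abstract does not spell out the model, so the following is only a minimal sketch of the occurrence/magnitude/detection idea under simplifying assumptions of our own. It is not the authors' PU-OMM implementation. We assume: occurrence is logistic in the features (coefficients `beta`); given an occurrence, the log-magnitude is Gaussian with mean `X @ gamma` and known standard deviation `s`; and detection is a deterministic cut at a known threshold `log_tau`, which stands in for "prior knowledge of the detection mechanism." Detected positives contribute the occurrence probability times the magnitude density. Pseudo-negatives contribute a mixture: a true negative, or an occurrence that fell below the threshold. The parameters are fit by projected gradient descent onto an l2 ball, a simple stand-in for whatever constraint set the paper uses. All function names (`simulate`, `nll_and_grad`, `fit_pgd`, `project_l2`) are hypothetical.

```python
import math
import numpy as np

# Vectorized standard normal CDF/PDF built on math.erf (no SciPy dependency).
_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def simulate(n, beta, gamma, log_tau, s, rng):
    """Hypothetical generator: latent occurrence, then log-magnitude, then threshold detection."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    occur = rng.random(n) < sigmoid(X @ beta)        # true (latent) positives
    log_mag = X @ gamma + s * rng.normal(size=n)     # log-magnitude if an event occurs
    y = occur & (log_mag > log_tau)                  # only events above the threshold are seen
    return X, y, np.where(y, log_mag, np.nan)        # magnitudes observed for detected positives only


def nll_and_grad(beta, gamma, X, y, log_mag, log_tau, s):
    """Average negative log-likelihood of the assumed mixture and its gradients."""
    n = len(y)
    pos, neg = y, ~y
    p = sigmoid(X @ beta)              # occurrence probability
    mu = X @ gamma                     # mean log-magnitude given occurrence
    z_tau = (log_tau - mu) / s
    q = norm_cdf(z_tau)                # P(undetected | occurrence) under the known threshold
    # Detected positives: occurrence times Gaussian density of the observed log-magnitude.
    r = (log_mag[pos] - mu[pos]) / s
    ll_pos = np.log(p[pos]) - 0.5 * r**2 - math.log(s * math.sqrt(2.0 * math.pi))
    # Pseudo-negatives: a true negative OR an occurrence that fell below the threshold.
    mix = 1.0 - p[neg] + p[neg] * q[neg]
    nll = -(ll_pos.sum() + np.log(mix).sum()) / n
    g_beta = -(X[pos].T @ (1.0 - p[pos])
               + X[neg].T @ (p[neg] * (1.0 - p[neg]) * (q[neg] - 1.0) / mix)) / n
    g_gamma = -(X[pos].T @ (r / s)
                - X[neg].T @ (p[neg] * norm_pdf(z_tau[neg]) / (s * mix))) / n
    return nll, g_beta, g_gamma


def project_l2(v, radius):
    """Euclidean projection onto the l2 ball of the given radius."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)


def fit_pgd(X, y, log_mag, log_tau, s, radius=5.0, step=0.2, iters=500):
    """Projected gradient descent from a zero initialization; also returns the loss history."""
    d = X.shape[1]
    beta, gamma = np.zeros(d), np.zeros(d)
    history = []
    for _ in range(iters):
        nll, g_beta, g_gamma = nll_and_grad(beta, gamma, X, y, log_mag, log_tau, s)
        history.append(nll)
        beta = project_l2(beta - step * g_beta, radius)
        gamma = project_l2(gamma - step * g_gamma, radius)
    return beta, gamma, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_beta, true_gamma = np.array([0.5, 1.0, -1.0]), np.array([0.0, 0.8, 0.5])
    X, y, log_mag = simulate(3000, true_beta, true_gamma, log_tau=0.5, s=1.0, rng=rng)
    beta_hat, gamma_hat, history = fit_pgd(X, y, log_mag, log_tau=0.5, s=1.0)
    print("detected fraction:", y.mean())
    print("beta_hat:", np.round(beta_hat, 2), "gamma_hat:", np.round(gamma_hat, 2))
```

Knowing the threshold is what separates "did not occur" from "occurred but too small to detect" in this toy version. Without it, the pseudo-negative term cannot distinguish the two explanations. In a high-dimensional setting, one would more likely project onto an l1 ball to encourage sparsity; the l2 ball here is just the simplest convex constraint with a closed-form projection.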


research
08/26/2022

Learning From Positive and Unlabeled Data Using Observer-GAN

The problem of learning from positive and unlabeled data (A.K.A. PU lear...
research
04/18/2023

High-dimensional Multi-class Classification with Presence-only Data

Classification with positive and unlabeled (PU) data frequently arises i...
research
06/07/2016

Regret Bounds for Non-decomposable Metrics with Missing Labels

We consider the problem of recommending relevant labels (items) for a gi...
research
04/14/2021

Fast quantum state reconstruction via accelerated non-convex programming

We propose a new quantum state reconstruction method that combines ideas...
research
03/19/2022

Font Generation with Missing Impression Labels

Our goal is to generate fonts with specific impressions, by training a g...
research
01/31/2022

Positive-Unlabeled Learning with Uncertainty-aware Pseudo-label Selection

Pseudo-labeling solutions for positive-unlabeled (PU) learning have the ...
research
06/24/2020

Labeled Optimal Partitioning

In data sequences measured over space or time, an important problem is a...