Predicting Rare Events by Shrinking Towards Proportional Odds

05/30/2023
by Gregory Faletto, et al.

Training classifiers is difficult with severe class imbalance, but many rare events are the culmination of a sequence with much more common intermediate outcomes. For example, in online marketing a user first sees an ad, then may click on it, and finally may make a purchase; estimating the probability of purchases is difficult because of their rarity. We show both theoretically and through data experiments that the more abundant data from earlier steps can be leveraged to improve estimation of the probabilities of rare events. We present PRESTO, a relaxation of the proportional odds model for ordinal regression. Instead of estimating weights for one separating hyperplane that is shifted by separate intercepts for each of the estimated Bayes decision boundaries between adjacent pairs of categorical responses, we estimate separate weights for each of these transitions. We impose an L1 penalty on the differences between weights for the same feature in adjacent weight vectors in order to shrink towards the proportional odds model. We prove that PRESTO consistently estimates the decision boundary weights under a sparsity assumption. Synthetic and real data experiments show that our method can estimate the probabilities of rare events in this setting better than both logistic regression on the rare category, which fails to borrow strength from more abundant categories, and the proportional odds model, which is too inflexible.
