Efficient posterior sampling for high-dimensional imbalanced logistic regression

05/27/2019
by Deborshee Sen, et al.

High-dimensional data are routinely collected in many application areas. In this article, we are particularly interested in classification models in which one or more variables are imbalanced, which creates difficulties in estimation. To improve performance, one can apply a Bayesian approach, with Markov chain Monte Carlo (MCMC) algorithms used for posterior computation. However, current algorithms can be inefficient as the sample size n and/or the number of predictors p increase, because both the time per step and the mixing rate worsen. One promising strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce the per-step computational complexity. However, the usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize recent piece-wise deterministic Markov chain Monte Carlo algorithms to include stratified and importance-weighted sub-sampling. We also propose a new sub-sampling algorithm based on sorting data-points. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.
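
To make the idea of importance-weighted sub-sampling concrete, the following is a minimal Python sketch of an unbiased sub-sampled estimate of the logistic log-likelihood gradient. It is not the authors' sampler: the function names, the choice of sampling probabilities proportional to row norms, and sampling with replacement are all assumptions made for illustration. It only shows why dividing each sampled term by its sampling probability keeps the estimate unbiased, however small the sub-sample.

```python
import numpy as np


def log_lik_gradient(beta, X, y):
    """Full-data gradient of the logistic log-likelihood, with y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - p)


def importance_weights(X):
    """Sampling probabilities proportional to row norms.

    Assumption for illustration only: rows where a rare feature is
    non-zero have larger norms and are therefore sampled more often.
    """
    norms = np.linalg.norm(X, axis=1) + 1e-12
    return norms / norms.sum()


def subsampled_gradient(beta, X, y, probs, m, rng):
    """Unbiased gradient estimate from m points drawn with replacement.

    Each sampled term is divided by its sampling probability, so the
    expectation of one reweighted term equals the full-data sum.
    """
    idx = rng.choice(len(y), size=m, p=probs)
    p = 1.0 / (1.0 + np.exp(-X[idx] @ beta))
    terms = X[idx] * (y[idx] - p)[:, None]
    return (terms / probs[idx, None]).mean(axis=0)
```

With uniform probabilities this reduces to ordinary sub-sampling. Non-uniform probabilities change only the variance of the estimate, not its expectation, which is why the choice of weights matters for imbalanced data.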


research
08/08/2022

SwISS: A Scalable Markov chain Monte Carlo Divide-and-Conquer Strategy

Divide-and-conquer strategies for Monte Carlo algorithms are an increasi...
research
04/08/2020

Posterior computation with the Gibbs zig-zag sampler

Markov chain Monte Carlo (MCMC) sampling algorithms have dominated the l...
research
12/08/2020

Robust Sparse Bayesian Infinite Factor Models

Most of previous works and applications of Bayesian factor model have as...
research
12/17/2020

A fresh take on 'Barker dynamics' for MCMC

We study a recently introduced gradient-based Markov chain Monte Carlo m...
research
01/01/2021

A Two Stage Adaptive Metropolis Algorithm

We propose a new sampling algorithm combining two quite powerful ideas i...
research
12/06/2017

Targeted Random Projection for Prediction from High-Dimensional Features

We consider the problem of computationally-efficient prediction from hig...
research
01/04/2022

A Statistical Approach to Estimating Adsorption-Isotherm Parameters in Gradient-Elution Preparative Liquid Chromatography

Determining the adsorption isotherms is an issue of significant importan...
