Log In Sign Up

LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

by   Brian L Trippe, et al.

Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousand parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational-statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.


Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach

Bayesian inference typically requires the computation of an approximatio...

Scaling Up Bayesian Uncertainty Quantification for Inverse Problems using Deep Neural Networks

Due to the importance of uncertainty quantification (UQ), Bayesian appro...

Sparse Approximate Cross-Validation for High-Dimensional GLMs

Leave-one-out cross validation (LOOCV) can be particularly accurate amon...

Bayesian Inference using the Proximal Mapping: Uncertainty Quantification under Varying Dimensionality

In statistical applications, it is common to encounter parameters suppor...

Data-dependent compression of random features for large-scale kernel approximation

Kernel methods offer the flexibility to learn complex relationships in m...

Bayesian Cox Regression for Population-scale Inference in Electronic Health Records

The Cox model is an indispensable tool for time-to-event analysis, parti...

Low-rank statistical finite elements for scalable model-data synthesis

Statistical learning additions to physically derived mathematical models...