Adaptive Randomized Dimension Reduction on Massive Data

by Gregory Darnell, et al.
Duke University
Princeton University
Stanford University

The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low-dimensional latent space using dimension reduction methods. In this paper we develop an approach to dimension reduction that exploits the assumption of low-rank structure in high-dimensional data to gain both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide an efficient solution to principal component analysis (PCA), and we use this efficient solver to improve parameter estimation in large-scale linear mixed models (LMMs) for association mapping in statistical and quantitative genomics. A key observation in this paper is that randomization serves a dual role: it improves both computational and statistical performance by implicitly regularizing the covariance matrix estimate of the random effect in an LMM. These statistical and computational advantages are highlighted in our experiments on simulated data and large-scale genomic studies.
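To make the idea of a randomized low-rank solver for PCA concrete, here is a minimal sketch of the generic random-projection approach (random test matrix, a few power iterations, then a small exact SVD). It is not the adaptive algorithm of the paper. The function name `randomized_pca` and the parameters `oversample` and `n_power_iter` are assumptions chosen for illustration only.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k principal components of X (n samples x p features).

    Illustrative sketch of a generic randomized range finder followed by a
    small SVD; all parameter names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center each feature
    n, p = Xc.shape
    ell = min(k + oversample, n, p)          # sketch size with oversampling

    # Project onto a random ell-dimensional subspace to capture the range of Xc.
    Omega = rng.standard_normal((p, ell))
    Q, _ = np.linalg.qr(Xc @ Omega)

    # Power iterations sharpen the spectrum; re-orthonormalize for stability.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small (ell x p) projected matrix.
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)

    components = Vt[:k]                      # rows are principal directions
    explained_variance = s[:k] ** 2 / (n - 1)
    return components, explained_variance


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data with exact rank-5 structure.
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    comps, var = randomized_pca(A, k=5)
    print(comps.shape, var.shape)
```

The cost is dominated by a handful of products between the data matrix and thin matrices of width `k + oversample`, which is why this family of methods scales to large data. Truncating to `k` components discards the trailing spectrum, which is one way to picture the implicit regularization the abstract refers to.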




Randomized Dimension Reduction on Massive Data

Scalability of statistical estimators is of increasing importance in mod...

Theory of Dual-sparse Regularized Randomized Reduction

In this paper, we study randomized reduction methods, which reduce high-...

Poisson PCA for matrix count data

We develop a dimension reduction framework for data consisting of matric...

A Supervised Tensor Dimension Reduction-Based Prognostics Model for Applications with Incomplete Imaging Data

This paper proposes a supervised dimension reduction methodology for ten...

Spectral estimation from simulations via sketching

Sketching is a stochastic dimension reduction method that preserves geom...

Scalable Algorithms for Learning High-Dimensional Linear Mixed Models

Linear mixed models (LMMs) are used extensively to model dependencies of ...

Block subsampled randomized Hadamard transform for low-rank approximation on distributed architectures

This article introduces a novel structured random matrix composed blockw...
