Adaptive Randomized Dimension Reduction on Massive Data

04/13/2015
by Gregory Darnell, et al.

The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low-dimensional latent space using dimension reduction methods. In this paper we develop an approach to dimension reduction that exploits the assumption of low-rank structure in high-dimensional data to gain both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide an efficient solution to principal component analysis (PCA), and we use this efficient solver to improve parameter estimation in large-scale linear mixed models (LMMs) for association mapping in statistical and quantitative genomics. A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance by implicitly regularizing the covariance matrix estimate of the random effect in an LMM. These statistical and computational advantages are highlighted in our experiments on simulated data and large-scale genomic studies.
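The abstract's core computational tool, randomized low-rank approximation applied to PCA, can be illustrated with a minimal sketch of the standard randomized range-finder scheme (in the style of Halko, Martinsson, and Tropp). This is not the paper's adaptive algorithm; the function name, the fixed target rank `k`, and the oversampling parameter are illustrative choices for this sketch.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, seed=0):
    """Approximate the top-k principal components of an n x p matrix X
    using a Gaussian sketch of its column space."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Center the columns, as in standard PCA.
    Xc = X - X.mean(axis=0)
    # Sketch the range of Xc with a random Gaussian test matrix.
    Omega = rng.standard_normal((p, k + oversample))
    Y = Xc @ Omega                        # n x (k + oversample)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the sketch
    # Project onto the small subspace and take an exact SVD there.
    B = Q.T @ Xc                          # (k + oversample) x p
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Return loadings and the corresponding explained variances.
    return Vt[:k].T, (s[:k] ** 2) / (n - 1)

# On data with genuine low-rank structure, the sketch recovers the
# dominant subspace at the cost of a few tall-skinny matrix products.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 200))
V, var = randomized_pca(X, k=5)
```

The cost is dominated by the products with the n x (k + oversample) sketch rather than a full eigendecomposition of the p x p covariance, which is what makes the approach attractive for the massive genomic matrices the paper targets.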

