Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

03/17/2020
by   Kun Chen, et al.

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been used in multivariate regression, factor analysis, biclustering, and vector time series modeling, among other areas. The appeal of this factorization lies in its power to discover a highly interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc, with no general performance guarantee, or computationally intensive, which makes them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and we establish the statistical underpinnings of these commonly adopted yet poorly understood deflation methods. In the second stage of division, we develop a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea may be applicable to many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.
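
To make the first stage of division concrete, the following is a minimal sketch of sequential deflation: a matrix is factorized one sparse unit-rank layer at a time, and each fitted layer is subtracted before the next is estimated. It is an illustration under stated assumptions, not the authors' method. The unit-rank solver here is a plain soft-thresholded power iteration standing in for CURE and for the stagewise path algorithm, and the names `soft_threshold`, `sparse_rank_one`, `sequential_deflation`, `lam_u`, and `lam_v` are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank_one(M, lam_u=0.0, lam_v=0.0, n_iter=100, tol=1e-8):
    """Estimate one sparse layer d * outer(u, v) of M.

    Assumption: alternating soft-thresholded power iterations are used as a
    simple stand-in for a unit-rank solver; this is NOT the paper's CURE
    estimator or its stagewise learning procedure.
    """
    # Start from the leading singular pair of M.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u_old = u
        # Update u with v fixed, then threshold and renormalize.
        u = soft_threshold(M @ v, lam_u)
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            return 0.0, np.zeros(M.shape[0]), np.zeros(M.shape[1])
        u = u / norm_u
        # Update v with u fixed, then threshold and renormalize.
        v = soft_threshold(M.T @ u, lam_v)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return 0.0, np.zeros(M.shape[0]), np.zeros(M.shape[1])
        v = v / norm_v
        if np.linalg.norm(u - u_old) < tol:
            break
    d = float(u @ M @ v)
    return d, u, v


def sequential_deflation(M, rank, lam_u=0.0, lam_v=0.0):
    """Fit `rank` unit-rank layers one after another, deflating the residual."""
    residual = np.asarray(M, dtype=float).copy()
    layers = []
    for _ in range(rank):
        d, u, v = sparse_rank_one(residual, lam_u, lam_v)
        layers.append((d, u, v))
        # Deflation step: remove the fitted layer before estimating the next.
        residual = residual - d * np.outer(u, v)
    return layers, residual
```

Calling `sequential_deflation(M, rank=2)` returns a list of `(d, u, v)` triples and the final residual. The sketch covers only the sequential variant and operates on a matrix directly rather than on a regression with predictors and responses.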

research
10/16/2020

Generalized Co-sparse Factor Regression

Multivariate regression techniques are commonly applied to explore the a...
research
04/26/2017

SOFAR: large-scale association network learning

Many modern big data applications feature large scale in both numbers of...
research
02/25/2015

Sparse Multivariate Factor Regression

We consider the problem of multivariate regression in a setting where th...
research
11/03/2018

Boosted Sparse and Low-Rank Tensor Regression

We propose a sparse and low-rank tensor regression model to relate a uni...
research
07/05/2011

Distributed Matrix Completion and Robust Factorization

If learning methods are to scale to the massive sizes of modern datasets...
research
05/11/2020

Scalable Interpretable Learning for Multi-Response Error-in-Variables Regression

Corrupted data sets containing noisy or missing observations are prevale...
research
04/02/2018

A Fast Divide-and-Conquer Sparse Cox Regression

We propose a computationally and statistically efficient divide-and-conq...
