Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

by   Kun Chen, et al.

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, vector time series modeling, among others. The appeal of this factorization is owing to its power in discovering a highly-interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly-adopted and yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea can be applicable in many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.



There are no comments yet.


page 1

page 2

page 3

page 4


Generalized Co-sparse Factor Regression

Multivariate regression techniques are commonly applied to explore the a...

SOFAR: large-scale association network learning

Many modern big data applications feature large scale in both numbers of...

Sparse Multivariate Factor Regression

We consider the problem of multivariate regression in a setting where th...

Boosted Sparse and Low-Rank Tensor Regression

We propose a sparse and low-rank tensor regression model to relate a uni...

Distributed Matrix Completion and Robust Factorization

If learning methods are to scale to the massive sizes of modern datasets...

Scalable Interpretable Learning for Multi-Response Error-in-Variables Regression

Corrupted data sets containing noisy or missing observations are prevale...

Generalized infinite factorization models

Factorization models express a statistical object of interest in terms o...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.