Generalized Matrix Factorization

by Łukasz Kidziński, et al.

Unmeasured or latent variables are often the cause of correlations between multivariate measurements and are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis and principal component analysis, with well-established theory and fast algorithms. Generalized linear latent variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs are computationally intensive and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to such high-volume, high-dimensional datasets. We approximate the likelihood using penalized quasi-likelihood and use a Newton method and Fisher scoring to learn the model parameters. Our method greatly reduces the computation time and can be easily parallelized, enabling factorization at unprecedented scale using commodity hardware. We illustrate the application of our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit, finding that most of the variability can be explained with a handful of factors.
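
To make the fitting idea concrete, the following is a minimal sketch of alternating Fisher-scoring updates for a low-rank Poisson model with a log link. It is not the authors' implementation. The Poisson family, the ridge penalty standing in for the penalty term of the quasi-likelihood, the small random initialization, and all names (`fisher_scoring_rows`, `fit_poisson_factors`, `ridge`) are assumptions made for illustration. For the canonical log link, Fisher scoring and Newton's method coincide.

```python
import numpy as np

def poisson_loglik(Y, A, B):
    """Poisson log-likelihood of counts Y under eta = A @ B.T (dropping the log(y!) term)."""
    eta = A @ B.T
    return float(np.sum(Y * eta - np.exp(eta)))

def fisher_scoring_rows(Y, A, B, ridge=1.0):
    """One Fisher-scoring step for every row of A with B held fixed.

    Each row is its own ridge-penalized Poisson regression, so the loop
    iterations are independent and could be run in parallel.
    """
    n, d = A.shape
    # Clip the linear predictor to avoid overflow in exp (a numerical safeguard).
    mu = np.exp(np.clip(A @ B.T, -20.0, 20.0))
    A_new = np.empty_like(A)
    for i in range(n):
        score = B.T @ (Y[i] - mu[i]) - ridge * A[i]             # penalized score
        info = B.T @ (B * mu[i][:, None]) + ridge * np.eye(d)   # penalized Fisher information
        A_new[i] = A[i] + np.linalg.solve(info, score)
    return A_new

def fit_poisson_factors(Y, d=2, iters=30, ridge=1.0, seed=0):
    """Alternate row-wise updates of the scores A (n x d) and loadings B (m x d)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A = 0.1 * rng.standard_normal((n, d))
    B = 0.1 * rng.standard_normal((m, d))
    for _ in range(iters):
        A = fisher_scoring_rows(Y, A, B, ridge)
        B = fisher_scoring_rows(Y.T, B, A, ridge)  # same update with roles swapped
    return A, B

# Usage on simulated counts (synthetic data, not the dataset from the article).
rng = np.random.default_rng(1)
A_true = 0.5 * rng.standard_normal((60, 2))
B_true = 0.5 * rng.standard_normal((40, 2))
Y = rng.poisson(np.exp(A_true @ B_true.T))
A_hat, B_hat = fit_poisson_factors(Y, d=2)
```

Because each row update only needs the fixed factor matrix and one row of the data, the per-row solves involve small d x d systems, which is the property that makes this style of update easy to distribute across cores.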

