Generalized Matrix Factorization

10/06/2020
by Łukasz Kidziński, et al.

Unmeasured or latent variables are often the cause of correlations between multivariate measurements and are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis and principal component analysis, with well-established theory and fast algorithms. Generalized linear latent variable models (GLLVMs) extend such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs are computationally intensive and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to such high-volume, high-dimensional datasets. We approximate the likelihood using penalized quasi-likelihood and use a Newton method and Fisher scoring to learn the model parameters. Our method greatly reduces the computation time and can be easily parallelized, enabling factorization at unprecedented scale using commodity hardware. We illustrate the method on a dataset of 48,000 observational units with over 2,000 observed species in each unit, finding that most of the variability can be explained with a handful of factors.
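
To make the fitting idea in the abstract more concrete, here is a minimal NumPy sketch of a penalized, Fisher-scoring-style factorization for count data. It is not the authors' implementation. It assumes a Poisson response with a log link, a ridge penalty `lam` on both factor matrices (standing in for the penalized quasi-likelihood), alternating row-wise updates, and a simple backtracking step. All function and parameter names are hypothetical.

```python
import numpy as np


def neg_penalized_loglik(Y, U, V, lam):
    """Poisson negative log-likelihood (up to a constant) plus a ridge penalty."""
    eta = U @ V.T  # linear predictor, log link
    penalty = 0.5 * lam * (np.sum(U**2) + np.sum(V**2))
    return np.sum(np.exp(eta) - Y * eta) + penalty


def scoring_direction(Y, A, B, lam):
    """Fisher-scoring direction for every row of A with B held fixed (eta = A @ B.T)."""
    mu = np.exp(A @ B.T)
    score = (Y - mu) @ B - lam * A  # gradient of the penalized log-likelihood
    # Expected information for row i: B^T diag(mu_i) B + lam * I, shape (rows, d, d)
    info = np.einsum("ij,jk,jl->ikl", mu, B, B) + lam * np.eye(B.shape[1])
    return np.linalg.solve(info, score[..., None])[..., 0]


def fit_poisson_factorization(Y, d=2, lam=1.0, n_iter=30, seed=0):
    """Alternate scoring updates of U (rows) and V (columns) with backtracking."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    history = [neg_penalized_loglik(Y, U, V, lam)]
    for _ in range(n_iter):
        for side in ("U", "V"):
            if side == "U":
                direction = scoring_direction(Y, U, V, lam)
            else:
                direction = scoring_direction(Y.T, V, U, lam)
            step = 1.0
            for _attempt in range(20):  # halve the step until the objective drops
                U_new = U + step * direction if side == "U" else U
                V_new = V + step * direction if side == "V" else V
                with np.errstate(over="ignore", invalid="ignore"):
                    obj = neg_penalized_loglik(Y, U_new, V_new, lam)
                if obj < history[-1]:
                    U, V = U_new, V_new
                    history.append(obj)
                    break
                step *= 0.5
    return U, V, history
```

Calling `fit_poisson_factorization(Y, d=2)` on a non-negative integer matrix `Y` returns the two factor matrices and the sequence of accepted objective values. Because each row's update depends only on its own data and the other factor matrix, the row-wise solves are independent, which is one way such a scheme could be parallelized.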

research
08/31/2018

Generalized probabilistic principal component analysis of correlated data

Principal component analysis (PCA) is a well-established tool in machine...
research
04/29/2014

High Dimensional Semiparametric Latent Graphical Model for Mixed Data

Graphical models are commonly used tools for modeling multivariate rando...
research
05/14/2023

Efficient Computation of High-Dimensional Penalized Generalized Linear Mixed Models by Latent Factor Modeling of the Random Effects

Modern biomedical datasets are increasingly high dimensional and exhibit...
research
06/28/2019

Cross-product Penalized Component Analysis (XCAN)

Matrix factorization methods are extensively employed to understand comp...
research
07/27/2019

A Matrix–free Likelihood Method for Exploratory Factor Analysis of High-dimensional Gaussian Data

This paper proposes a novel profile likelihood method for estimating the...
research
06/03/2021

Bayesian inference on high-dimensional multivariate binary data

It has become increasingly common to collect high-dimensional binary dat...
