Integrative Factorization of Bidimensionally Linked Matrices

06/09/2019
by   Jun Young Park, et al.

Advances in molecular "omics" technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose BIDIFAC (Bidimensional Integrative Factorization) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes the data into (i) globally shared, (ii) row-shared, (iii) column-shared, and (iv) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation, we use a penalized objective function that extends nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose the tuning parameters. We apply our method to integrate two genomics platforms (mRNA and miRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from TCGA. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.
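To make the four-part decomposition concrete, the following is a minimal Python sketch, not the authors' R implementation. It rests on several assumptions that the abstract does not state: the blocks are pre-scaled to unit noise variance, each component is updated by singular-value soft-thresholding (the closed-form solution of nuclear norm penalization for a single matrix), the updates are cycled as a simple block coordinate descent, and each penalty is set to sqrt(rows) + sqrt(columns) of the submatrix it acts on, a standard random matrix theory bound on the largest singular value of a Gaussian noise matrix. The names `soft_threshold_svd` and `bidifac_sketch` are hypothetical.

```python
import numpy as np


def soft_threshold_svd(X, lam):
    """Solve argmin_S 0.5 * ||X - S||_F^2 + lam * ||S||_* by shrinking singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - lam, 0.0)  # soft-threshold each singular value
    return (U * s) @ Vt


def bidifac_sketch(blocks, n_iter=50):
    """Illustrative block coordinate descent for a grid of linked matrices.

    blocks[i][j] has shape (m_i, n_j). Returns the global (G), row-shared (R),
    column-shared (C), and individual (I) components, each with the shape of
    the full concatenated matrix. Assumes unit noise variance (an assumption
    of this sketch, not a statement from the abstract).
    """
    X = np.block(blocks)
    m, n = X.shape
    row_edges = np.concatenate([[0], np.cumsum([row[0].shape[0] for row in blocks])])
    col_edges = np.concatenate([[0], np.cumsum([b.shape[1] for b in blocks[0]])])
    rows = [slice(row_edges[i], row_edges[i + 1]) for i in range(len(row_edges) - 1)]
    cols = [slice(col_edges[j], col_edges[j + 1]) for j in range(len(col_edges) - 1)]

    G, R, C, I = (np.zeros_like(X, dtype=float) for _ in range(4))
    for _ in range(n_iter):
        # Globally shared structure: low rank across the whole grid.
        G = soft_threshold_svd(X - R - C - I, np.sqrt(m) + np.sqrt(n))
        # Row-shared structure: low rank within each horizontal strip of blocks.
        for rs in rows:
            m_i = rs.stop - rs.start
            R[rs, :] = soft_threshold_svd((X - G - C - I)[rs, :], np.sqrt(m_i) + np.sqrt(n))
        # Column-shared structure: low rank within each vertical strip of blocks.
        for cs in cols:
            n_j = cs.stop - cs.start
            C[:, cs] = soft_threshold_svd((X - G - R - I)[:, cs], np.sqrt(m) + np.sqrt(n_j))
        # Single-matrix structure: low rank within each individual block.
        for rs in rows:
            for cs in cols:
                m_i, n_j = rs.stop - rs.start, cs.stop - cs.start
                I[rs, cs] = soft_threshold_svd(
                    (X - G - R - C)[rs, cs], np.sqrt(m_i) + np.sqrt(n_j)
                )
    return G, R, C, I
```

In this sketch the penalties are fixed by the block dimensions alone, so no rank needs to be selected: the rank of each component is whatever survives the thresholding. A call such as `G, R, C, I = bidifac_sketch([[X11, X12], [X21, X22]])` would correspond to two platforms measured on two cohorts, with `X11` through `X22` standing in for the user's own matrices.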


