RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data
With increasing availability of high dimensional, multi-source data, the identification of joint and data specific patterns of variability has become a subject of interest in many research areas. Several matrix decomposition methods have been formulated for this purpose, for example JIVE (Joint and Individual Variation Explained), and its angle based variation, aJIVE. Although the effect of data contamination on the estimated joint and individual components has not been considered in the literature, gross errors and outliers in the data can cause instability in such methods, and lead to incorrect estimation of joint and individual variance components. We focus on the aJIVE factorization method and provide a thorough analysis of the effect outliers on the resulting variation decomposition. After showing that such effect is not negligible when all data-sources are contaminated, we propose a robust extension of aJIVE (RaJIVE) that integrates a robust formulation of the singular value decomposition into the aJIVE approach. The proposed RaJIVE is shown to provide correct decompositions even in the presence of outliers and improves the performance of aJIVE. We use extensive simulation studies with different levels of data contamination to compare the two methods. Finally, we describe an application of RaJIVE to a multi-omics breast cancer dataset from The Cancer Genome Atlas. We provide the R package RaJIVE with a ready-to-use implementation of the methods and documentation of code and examples.
READ FULL TEXT