Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

04/20/2020
by Yong He, et al.

Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data typically have three characteristics: they are high-dimensional, compositional (lying in a simplex), and possibly leptokurtic. Together these make conventional correlation analysis infeasible for studying the co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known as the centered log-ratio covariance matrix, which is approximately indistinguishable from the true basis covariance matrix as the dimensionality tends to infinity. The procedure can be viewed as adaptively thresholding the Median-of-Means estimator of the centered log-ratio covariance matrix. We derive the rates of convergence under the spectral norm and the element-wise ℓ_∞ norm, and we also provide theoretical guarantees on support recovery. Thorough simulation studies illustrate the advantages of the proposed procedure over several state-of-the-art methods. Finally, we apply the proposed method to analyze a microbiome dataset.
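The three ingredients named in the abstract (the centered log-ratio transform, a Median-of-Means covariance estimate, and adaptive entrywise thresholding) can be illustrated with a short pure-Python sketch. It is a simplified illustration under stated assumptions, not the authors' procedure. The helper names are hypothetical. Each block's covariance is an ordinary sample covariance whose entrywise median is taken. The threshold level tau * sqrt(s_ii * s_jj * log p / n) with hard thresholding is an assumed, commonly used form, and the paper's exact Median-of-Means construction and tuning may differ. The clr step relies on the fact that clr(x)_j = log x_j - (1/p) * sum_k log x_k is unchanged when a composition is rescaled, so the unknown total abundance cancels.

```python
import math
import random
import statistics


def clr_transform(data):
    """Centered log-ratio transform: log of each part minus the row's mean log.

    Assumes every entry is strictly positive (zeros must be replaced first).
    """
    out = []
    for row in data:
        logs = [math.log(x) for x in row]
        center = sum(logs) / len(logs)
        out.append([v - center for v in logs])
    return out


def block_covariance(block):
    """Sample covariance (denominator m - 1) of the rows in one block."""
    m, p = len(block), len(block[0])
    means = [sum(r[j] for r in block) / m for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in block)
            cov[i][j] = cov[j][i] = s / (m - 1)
    return cov


def mom_covariance(rows, n_blocks):
    """Entrywise median of the covariances of n_blocks disjoint blocks.

    Assumption: rows are split in their given order and the leftover
    rows (len(rows) mod n_blocks) are dropped.
    """
    m = len(rows) // n_blocks
    covs = [block_covariance(rows[k * m:(k + 1) * m]) for k in range(n_blocks)]
    p = len(rows[0])
    return [[statistics.median(c[i][j] for c in covs) for j in range(p)]
            for i in range(p)]


def adaptive_threshold(cov, n, tau):
    """Hard-threshold off-diagonal entries with an entry-specific level.

    Assumed level: tau * sqrt(cov[i][i] * cov[j][j] * log(p) / n).
    Diagonal entries are always kept.
    """
    p = len(cov)
    out = [row[:] for row in cov]
    for i in range(p):
        for j in range(p):
            if i != j:
                lam = tau * math.sqrt(cov[i][i] * cov[j][j] * math.log(p) / n)
                if abs(cov[i][j]) < lam:
                    out[i][j] = 0.0
    return out


if __name__ == "__main__":
    random.seed(0)
    n, p = 200, 6
    # Hypothetical basis: positive abundances with independent normal logs.
    basis = [[math.exp(random.gauss(0.0, 1.0)) for _ in range(p)]
             for _ in range(n)]
    # Observed compositions: each basis row rescaled to sum to one.
    comps = [[w / sum(row) for w in row] for row in basis]
    sigma = adaptive_threshold(mom_covariance(clr_transform(comps), 10),
                               n, tau=2.0)
    for row in sigma:
        print(" ".join(f"{v:6.3f}" for v in row))
```

Dropping leftover rows keeps the blocks equal in size. In practice the number of blocks and tau would be set by theory or by cross-validation, which this sketch does not attempt.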

Related research

Ensemble Estimation of Large Sparse Covariance Matrix Based on Modified Cholesky Decomposition (01/01/2018)
Estimation of large sparse covariance matrices is of great importance fo...

Principal component analysis for high-dimensional compositional data (09/10/2021)
Dimension reduction for high-dimensional compositional data plays an imp...

CARE: Large Precision Matrix Estimation for Compositional Data (09/13/2023)
High-dimensional compositional data are prevalent in many applications. ...

Multi-sample estimation of centered log-ratio matrix in microbiome studies (06/15/2021)
In microbiome studies, one of the ways of studying bacterial abundances ...

Cross-study analyses of microbial abundance using generalized common factor methods (03/27/2023)
By creating networks of biochemical pathways, communities of micro-organ...

Estimation and Inference with Proxy Data and its Genetic Applications (01/11/2022)
Existing high-dimensional statistical methods are largely established fo...

Optimal covariance matrix estimation for high-dimensional noise in high-frequency data (12/19/2018)
In this paper, we consider efficiently learning the structural informati...