Quantitative Comparison of Statistical Methods for Analyzing Human Metabolomics Data

10/10/2017
by Brian L. Claggett, et al.

Background. Emerging technologies now allow mass spectrometry-based profiling of up to thousands of small-molecule metabolites (metabolomics) in an increasing number of biosamples. Although metabolomics offers great promise for revealing insight into the pathogenesis of human disease, standard approaches have yet to be established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes, including disease outcomes. To determine optimal statistical approaches for metabolomics analysis, we sought to formally compare traditional statistical methods and newer statistical learning methods across a range of metabolomics dataset types.

Results. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observed that as the number of study subjects increased, univariate methods yielded a higher false discovery rate than multivariate methods, owing to substantial correlations among metabolites. When the number of assayed metabolites increased, for example with nontargeted rather than targeted metabolomics measures, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships.

Conclusion. When the number of metabolites was similar to or exceeded the number of study subjects, as is common in nontargeted metabolomics analysis of relatively small cohorts, sparse multivariate models exhibited the most robust statistical power and more consistent results. These findings have important implications for the analysis of metabolomics studies of human disease.
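The mechanism behind the univariate-versus-multivariate contrast in the Results can be illustrated with a small simulation. This is a minimal sketch under a hypothetical design that is not the authors' own: 10 blocks of 5 metabolites with within-block correlation 0.8, and an outcome that depends only on metabolite 0. Univariate screening is a per-metabolite correlation test with a Bonferroni cutoff; the sparse multivariate model is a LASSO fitted by coordinate descent, used here only as one common example of a sparse model. The function names `simulate`, `univariate_hits`, and `lasso_cd`, the fixed penalty `lam=0.25`, and all parameter values are illustrative assumptions.

```python
import math

import numpy as np


def simulate(n, n_blocks=10, block_size=5, rho=0.8, beta=0.5, seed=2017):
    """Hypothetical design (not the paper's): correlated metabolite blocks,
    and only metabolite 0 truly affects the outcome y."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, n_blocks))
    # Columns 0..4 share factor 0, columns 5..9 share factor 1, and so on,
    # so any two metabolites in the same block have correlation rho.
    shared = np.repeat(factors, block_size, axis=1)
    private = rng.standard_normal((n, n_blocks * block_size))
    X = math.sqrt(rho) * shared + math.sqrt(1 - rho) * private
    y = beta * X[:, 0] + rng.standard_normal(n)
    return X, y


def univariate_hits(X, y, alpha=0.05):
    """One marginal correlation test per metabolite, Bonferroni-corrected."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = Xc.T @ yc / (np.sqrt((Xc ** 2).sum(axis=0)) * math.sqrt((yc ** 2).sum()))
    t = r * math.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    # Two-sided p-values from a normal approximation to t (reasonable for large n).
    pvals = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return set(np.flatnonzero(pvals < alpha / p).tolist())


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """LASSO, minimizing (1/2n)||y - Zb||^2 + lam*||b||_1, by coordinate descent."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized: z_j'z_j / n == 1
    yc = y - y.mean()
    b = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of z_j with the partial residual that excludes z_j.
            partial = Z[:, j] @ resid / n + old
            new = math.copysign(max(abs(partial) - lam, 0.0), partial)  # soft threshold
            if new != old:
                resid -= Z[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


X, y = simulate(n=2000)
uni = univariate_hits(X, y)
sparse = set(np.flatnonzero(np.abs(lasso_cd(X, y, lam=0.25)) > 1e-8).tolist())
print("univariate hits:", sorted(uni))
print("lasso hits:", sorted(sparse))
```

Because metabolites 1 to 4 are correlated with metabolite 0, each is marginally associated with the outcome, so the univariate screen tends to flag the whole first block, and a larger sample makes these proxy hits more certain rather than less. In this design the LASSO tends to retain only metabolite 0. In practice the penalty would usually be chosen by cross-validation rather than fixed in advance.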
