A Guideline for the Statistical Analysis of Compositional Data in Immunology
The study of immune cellular composition is of great scientific interest in immunology and multiple large-scale data have also been generated recently to support this investigation. From the statistical point of view, such immune cellular composition data corresponds to compositional data that conveys relative information. In compositional data, each element is positive and all the elements together sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations among elements in the compositional data. As this type of data has become more widely available, investigation of optimal statistical strategies considering compositional features in data became more in great need. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio and Dirichlet approaches, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
READ FULL TEXT