Regression and Classification of Compositional Data via a novel Supervised Log Ratio Method

03/31/2023
by Jing Ma, et al.

Compositional data, in which only the relative abundances of variables are measured, are ubiquitous. In health and medical compositional data, an important class of biomarkers is the log ratios between groups of variables. However, selecting log ratios that are predictive of a response variable is a combinatorial problem. Existing greedy-search-based methods are time-consuming, which hinders their application to high-dimensional data sets. We propose a novel selection approach, called the supervised log ratio method, that can efficiently select predictive log ratios in high-dimensional settings. The proposed method is motivated by a latent variable model, and we show that the log ratio biomarker can be selected via simple clustering after supervised feature screening. The supervised log ratio method is implemented in an R package, which is publicly available at <https://github.com/drjingma/slr>. We illustrate the merits of our approach through simulation studies and analysis of a microbiome data set on HIV infection.
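
To make the notion of a log ratio between groups of variables concrete, here is a minimal Python sketch. It is not the slr package's API and does not implement the screening or clustering steps. It only computes one common form of such a biomarker, the log of the ratio of the geometric means of two groups. The function name `group_log_ratio`, the toy counts, and the choice of groups are hypothetical, and all parts are assumed to be strictly positive (zeros would need a pseudo-count or similar treatment first).

```python
import math


def group_log_ratio(sample, numerator, denominator):
    """Log ratio between the geometric means of two groups of parts.

    sample:      sequence of strictly positive abundances (counts or proportions)
    numerator:   indices of the parts in the first group
    denominator: indices of the parts in the second group
    """
    # The log of a geometric mean is the arithmetic mean of the logs.
    mean_log_num = sum(math.log(sample[i]) for i in numerator) / len(numerator)
    mean_log_den = sum(math.log(sample[i]) for i in denominator) / len(denominator)
    return mean_log_num - mean_log_den


# Hypothetical toy sample: raw counts for five taxa.
counts = [10.0, 40.0, 5.0, 20.0, 25.0]

# The same sample expressed as relative abundances (sums to 1).
total = sum(counts)
proportions = [c / total for c in counts]

# Groups are chosen by hand here; selecting them is the hard combinatorial part.
lr_counts = group_log_ratio(counts, [0, 1], [2, 3])
lr_props = group_log_ratio(proportions, [0, 1], [2, 3])

print(lr_counts, lr_props)
```

Both calls return the same value, log(2), because rescaling the whole sample cancels in the ratio. This scale invariance is why log ratios are well suited to data in which only relative abundances are measured.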

research
07/15/2020

A likelihood-based approach for multivariate categorical response regression in high dimensions

We propose a penalized likelihood method to fit the bivariate categorica...
research
11/28/2018

High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

In microbiome and genomic study, the regression of compositional data ha...
research
11/03/2022

Principal Balances of Compositional Data for Regression and Classification using Partial Least Squares

High-dimensional compositional data are commonplace in the modern omics ...
research
05/15/2022

Supervised Learning and Model Analysis with Compositional Data

The compositionality and sparsity of high-throughput sequencing data pos...
research
03/04/2019

Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications

Compositional data sets are ubiquitous in science, including geology, ec...
research
09/11/2020

TCA and TLRA: A comparison on contingency tables and compositional data

There are two popular general approaches for the analysis and visualizat...
research
12/16/2022

Penalised regression with multiple sources of prior effects

In many high-dimensional prediction or classification tasks, complementa...