Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

05/20/2022
by Elliott Gordon-Rodriguez, et al.

Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human microbiome. Drawing on key principles from compositional data analysis, such as the Aitchison geometry of the simplex and subcompositions, we define novel augmentation strategies for this data modality. Incorporating our data augmentations into standard supervised learning pipelines results in consistent performance gains across a wide range of standard benchmark datasets. In particular, we set a new state-of-the-art for key disease prediction tasks including colorectal cancer, type 2 diabetes, and Crohn's disease. In addition, our data augmentations enable us to define a novel contrastive learning model, which improves on previous representation learning approaches for microbiome compositional data. Our code is available at https://github.com/cunningham-lab/AugCoDa.
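The two compositional-data ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; the AugCoDa repository linked above is the reference for that.

The function names `closure`, `aitchison_interpolate` and `random_subcomposition`, and the `keep_prob` parameter, are hypothetical. The sketch also assumes strictly positive compositions, for example after adding a pseudocount to zero counts.

In the Aitchison geometry, perturbation (⊕) and powering (⊙) make the simplex a vector space. A convex combination in that geometry is therefore the closed weighted geometric mean, which is linear in centred log-ratio coordinates. A subcomposition keeps a subset of parts and re-closes them so they sum to one.

```python
import numpy as np


def closure(x):
    """Rescale a positive vector (or each row of a matrix) so its entries sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def aitchison_interpolate(x, y, lam):
    """Interpolate two compositions along the Aitchison geodesic.

    Computes (lam ⊙ x) ⊕ ((1 - lam) ⊙ y) = C(x**lam * y**(1 - lam)),
    i.e. a closed weighted geometric mean. Assumes strictly positive inputs.
    """
    x, y = closure(x), closure(y)
    return closure(x ** lam * y ** (1.0 - lam))


def random_subcomposition(x, keep_prob, rng):
    """Keep a random subset of parts (at least one) and re-close them.

    Dropped parts are set to 0 so the output keeps the input's length.
    Hypothetical helper for illustration; operates on a single composition.
    """
    x = closure(x)
    n_parts = x.shape[-1]
    mask = rng.random(n_parts) < keep_prob
    if not mask.any():
        # Guarantee a non-empty subcomposition.
        mask[rng.integers(n_parts)] = True
    return closure(np.where(mask, x, 0.0))
```

Keeping dropped parts as zeros, rather than deleting them, is a design choice in this sketch. It lets the augmented sample keep the fixed input dimension a model expects. Any downstream log-ratio transform would then need to handle those zeros.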
