Deep Multiple Instance Learning for Taxonomic Classification of Metagenomic Read Sets

09/28/2019
by Andreas Georgiou, et al.

Metagenomic studies increasingly use sequencing technologies to analyze DNA fragments found in environmental samples. Such analyses can provide useful insights into host-microbe interactions, infectious disease proliferation, and novel species discovery. One important step is the taxonomic classification of those DNA fragments; of particular interest is determining the distribution of microbial taxa in a metagenomic sample. Recent deep learning approaches focus on architectures that classify single DNA reads independently of each other. In this work, we instead attempt to predict the distribution over the taxa of a whole metagenomic read set directly. We formulate this task as a Multiple Instance Learning (MIL) problem and extend architectures used in single-read taxonomic classification with two types of permutation-invariant MIL pooling layers: a) DeepSets and b) attention-based pooling. We show that our architecture can exploit the co-occurrence of species in metagenomic read sets and outperforms the single-read architectures in predicting the distribution over the taxa at higher taxonomic ranks.
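
The two pooling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the read embeddings stand in for the output of some single-read encoder, all weights are random, and every name below (`deepsets_pool`, `attention_pool`, `taxa_distribution`, the dimension constants) is hypothetical. The DeepSets variant applies a per-read transform and averages over the set; the attention variant weights each read by a softmax over learned scores of the form w^T tanh(V h_i). Both yield one vector per read set, which a final softmax layer turns into a distribution over taxa.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def deepsets_pool(bag, Phi):
    """DeepSets-style pooling: per-read transform (linear + ReLU), then mean over reads."""
    transformed = [[max(0.0, u) for u in matvec(Phi, h)] for h in bag]
    n = len(transformed)
    return [sum(col) / n for col in zip(*transformed)]


def attention_pool(bag, V, w):
    """Attention-based MIL pooling: weights a_i = softmax_i(w^T tanh(V h_i))."""
    scores = [
        sum(w_k * math.tanh(u_k) for w_k, u_k in zip(w, matvec(V, h)))
        for h in bag
    ]
    weights = softmax(scores)
    dim = len(bag[0])
    return [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]


def taxa_distribution(pooled, W_out):
    """Map the pooled read-set vector to a distribution over taxa."""
    return softmax(matvec(W_out, pooled))


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes; random weights stand in for trained parameters.
rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_TAXA, NUM_READS = 4, 3, 5, 6

bag = random_matrix(rng, NUM_READS, EMBED_DIM)   # one embedding per read
Phi = random_matrix(rng, HIDDEN_DIM, EMBED_DIM)  # DeepSets per-read transform
V = random_matrix(rng, HIDDEN_DIM, EMBED_DIM)    # attention projection
w = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
W_deepsets = random_matrix(rng, NUM_TAXA, HIDDEN_DIM)
W_attention = random_matrix(rng, NUM_TAXA, EMBED_DIM)

dist_deepsets = taxa_distribution(deepsets_pool(bag, Phi), W_deepsets)
dist_attention = taxa_distribution(attention_pool(bag, V, w), W_attention)

# Reordering the reads should leave both predictions unchanged (up to rounding).
shuffled = bag[:]
rng.shuffle(shuffled)
dist_deepsets_shuffled = taxa_distribution(deepsets_pool(shuffled, Phi), W_deepsets)
dist_attention_shuffled = taxa_distribution(attention_pool(shuffled, V, w), W_attention)
```

Because both pooling functions aggregate with a sum over reads, the prediction depends only on which reads are in the set, not on their order, which is the permutation-invariance property the abstract refers to.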
