Inferring taxonomic placement from DNA barcoding allowing discovery of new taxa

01/24/2022
by Alessandro Zito, et al.

In ecology, it has become common to apply DNA barcoding to biological samples, producing datasets that contain a large number of nucleotide sequences. The focus is then on inferring the taxonomic placement of each of these sequences by leveraging existing databases of reference sequences with known taxa. This is highly challenging because i) sequencing is typically available only for a relatively small region of the genome due to cost considerations; and ii) many of the sequences come from organisms that are either unknown to science or have no reference sequences available. These issues can lead to substantial classification uncertainty, particularly when inferring new taxa. To address these challenges, we propose a new class of Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow new taxa to be discovered at each taxonomic rank. Using a simple product multinomial likelihood with conjugate Dirichlet priors at the lowest rank, we develop a highly efficient algorithm that provides a probabilistic prediction of the taxonomic placement of each sequence at each rank. BayesANT is shown to perform very well on real data, including when many sequences in the test set belong to taxa unobserved in training.
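To make the two ingredients named in the abstract concrete, the following is a minimal single-rank sketch, not the BayesANT implementation. It assumes aligned sequences of equal length over the alphabet ACGT, a symmetric Dirichlet prior with hypothetical concentration `alpha` at every position, and a Pitman-Yor process as one possible species sampling prior with hypothetical parameters `theta` and `sigma`. All function names and parameters here are illustrative assumptions; the paper's actual prior, hierarchy across ranks, and hyperparameter choices may differ.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: aligned sequences with no gaps or ambiguity codes


def position_counts(seqs):
    """Count nucleotides at each aligned position for one taxon."""
    length = len(seqs[0])
    return [Counter(s[i] for s in seqs) for i in range(length)]


def log_predictive(query, counts, n_seqs, alpha=1.0):
    """Log posterior predictive under a product multinomial likelihood
    with a symmetric Dirichlet(alpha) prior at each position."""
    total = n_seqs + alpha * len(ALPHABET)
    return sum(
        math.log((c[base] + alpha) / total) for base, c in zip(query, counts)
    )


def classify(query, reference, alpha=1.0, theta=1.0, sigma=0.0):
    """Posterior probabilities over observed taxa plus one 'new taxon' slot.

    Assumption: the species sampling prior is a Pitman-Yor process, so an
    observed taxon with n_k sequences has prior weight (n_k - sigma) / (theta + n)
    and a new taxon has prior weight (theta + sigma * K) / (theta + n).
    """
    n = sum(len(seqs) for seqs in reference.values())
    k = len(reference)
    log_post = {}
    for taxon, seqs in reference.items():
        prior = (len(seqs) - sigma) / (theta + n)
        log_post[taxon] = math.log(prior) + log_predictive(
            query, position_counts(seqs), len(seqs), alpha
        )
    # A new taxon has no observed counts, so its predictive is the prior predictive.
    new_prior = (theta + sigma * k) / (theta + n)
    empty = [Counter()] * len(query)
    log_post["<new taxon>"] = math.log(new_prior) + log_predictive(
        query, empty, 0, alpha
    )
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    weights = {t: math.exp(v - m) for t, v in log_post.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}


# Toy reference library (made-up sequences, for illustration only).
reference = {
    "sp_a": ["ACGT", "ACGT", "ACGA"],
    "sp_b": ["TTGA", "TTGA"],
}
probs_known = classify("ACGT", reference)
probs_novel = classify("GGCC", reference)
```

In this toy example, a query that matches the reference sequences of `sp_a` receives most of its posterior mass there, while a query unlike any reference sequence shifts mass to the new-taxon slot. Because the Dirichlet prior is conjugate to the multinomial, each predictive term is a closed-form ratio of counts, which is what makes this style of classifier cheap to evaluate.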

research
01/13/2018

Scalable De Novo Genome Assembly Using Pregel

De novo genome assembly is the process of stitching short DNA sequences ...
research
05/11/2021

Constrained Consensus Sequence Algorithm for DNA Archiving

The paper describes an algorithm to compute a consensus sequence from a ...
research
10/20/2022

Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

DNA has immense potential as an emerging data storage medium. The princi...
research
04/27/2017

DNA Steganalysis Using Deep Recurrent Neural Networks

The technique of hiding messages in digital data is called a steganograp...
research
12/08/2020

AI to Identify Mosquitos

Researchers have resorted to artificial neural network (ANN) to identify...
research
12/04/2018

Bridging trees for posterior inference on Ancestral Recombination Graphs

We present a new Markov chain Monte Carlo algorithm, implemented in soft...
