Optimal Sequence Length Requirements for Phylogenetic Tree Reconstruction with Indels

11/02/2018
by Arun Ganesh, et al.

We consider the phylogenetic tree reconstruction problem with insertions and deletions (indels). Phylogenetic algorithms proceed under a model where sequences evolve down the model tree, and given sequences at the leaves, the problem is to reconstruct the model tree with high probability. Traditionally, sequences are assumed to mutate by substitution-only processes, although some recent work considers evolutionary processes with insertions and deletions. In this paper, we improve on previous work by giving a reconstruction algorithm that simultaneously has O(poly log n) sequence length and tolerates constant indel probabilities on each edge. Our recursively-reconstructed distance-based technique provably outputs the model tree when the model tree has O(poly log n) diameter and discretized branch lengths, allowing for the probability of insertion and deletion to be non-uniform and asymmetric on each edge. Our polylogarithmic sequence length bounds improve significantly over previous polynomial sequence length bounds and match sequence length bounds in the substitution-only models of phylogenetic evolution, thereby challenging the idea that many global misalignments caused by insertions and deletions when p_indel is large are a fundamental obstruction to reconstruction with short sequences.
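To make the generative setting concrete, here is a minimal Python sketch of sequences evolving down a model tree with substitutions, insertions, and deletions. The abstract does not specify the exact process, so everything here is an assumption for illustration: binary sequences, independent per-site events, and the hypothetical names `evolve_edge`, `simulate_leaves`, and `edge_params`. Per-edge parameters are kept separate so that insertion and deletion probabilities can be non-uniform and asymmetric, as the abstract allows.

```python
import random

def evolve_edge(seq, p_sub, p_del, p_ins, rng):
    """Evolve a binary sequence along one edge.

    Assumed (simplified) model: each site is independently deleted with
    probability p_del; a surviving site is flipped with probability p_sub
    and is followed by a freshly inserted random bit with probability p_ins.
    """
    child = []
    for bit in seq:
        if rng.random() < p_del:
            continue  # deletion: the site is absent from the child
        if rng.random() < p_sub:
            bit = 1 - bit  # substitution: flip the bit
        child.append(bit)
        if rng.random() < p_ins:
            child.append(rng.randint(0, 1))  # insertion after this site
    return child

def simulate_leaves(tree, root, root_seq, edge_params, rng):
    """Evolve root_seq down the tree and return {leaf: sequence}.

    tree maps a node to its list of children; edge_params maps
    (parent, child) to a (p_sub, p_del, p_ins) tuple for that edge.
    """
    leaves = {}
    stack = [(root, list(root_seq))]
    while stack:
        node, seq = stack.pop()
        kids = tree.get(node, [])
        if not kids:
            leaves[node] = seq  # no children: this node is a leaf
        for kid in kids:
            p_sub, p_del, p_ins = edge_params[(node, kid)]
            stack.append((kid, evolve_edge(seq, p_sub, p_del, p_ins, rng)))
    return leaves

if __name__ == "__main__":
    rng = random.Random(0)
    tree = {"r": ["u", "c"], "u": ["a", "b"]}
    # Asymmetric, non-uniform indel rates per edge (illustrative values).
    edge_params = {
        ("r", "u"): (0.05, 0.10, 0.02),
        ("r", "c"): (0.05, 0.01, 0.08),
        ("u", "a"): (0.02, 0.05, 0.05),
        ("u", "b"): (0.02, 0.00, 0.10),
    }
    root_seq = [rng.randint(0, 1) for _ in range(30)]
    for leaf, seq in sorted(simulate_leaves(tree, "r", root_seq, edge_params, rng).items()):
        print(leaf, len(seq), "".join(map(str, seq)))
```

The reconstruction problem is the inverse of this simulation: given only the leaf sequences, recover `tree`. Because indels shift positions, leaf sequences generally differ in length and site `i` in one leaf need not correspond to site `i` in another, which is the misalignment issue the abstract refers to.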


