Efficient and consistent inference of ancestral sequences in an evolutionary model with insertions and deletions under dense taxon sampling

07/18/2017
by Wai-Tong Fan, et al.

In evolutionary biology, the speciation history of living organisms is represented graphically by a phylogeny, that is, a rooted tree whose leaves correspond to current species and whose branchings indicate past speciation events. Phylogenies are commonly estimated from molecular sequences, such as DNA sequences, collected from the species of interest. At a high level, the idea behind this inference is simple: the further apart two species are in the Tree of Life, the more mutations have accumulated in their genomes since their most recent common ancestor. To obtain accurate estimates in phylogenetic analyses, it is standard practice to employ statistical approaches based on stochastic models of sequence evolution on a tree. For tractability, such models necessarily make simplifying assumptions about the evolutionary mechanisms involved. In particular, insertions and deletions of nucleotides, also known as indels, are commonly omitted. Properly accounting for indels in statistical phylogenetic analyses remains a major challenge in computational evolutionary biology. Here we consider the problem of reconstructing ancestral sequences on a known phylogeny in a model of sequence evolution incorporating nucleotide substitutions, insertions and deletions, specifically the classical TKF91 process. We focus on the case of dense phylogenies of bounded height, which we refer to as the taxon-rich setting, where statistical consistency is achievable. We give the first polynomial-time ancestral reconstruction algorithm with provable guarantees under constant rates of mutation. Our algorithm succeeds when the phylogeny satisfies the "big bang" condition, a necessary and sufficient condition for statistical consistency in this context.
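To make the TKF91 process more concrete, here is a minimal simulation sketch of how a single sequence might evolve along one branch. It assumes the standard textbook description of TKF91: each nucleotide is deleted at rate `mu` and spawns an inserted nucleotide immediately to its right at rate `lam`, an "immortal link" at the left end also spawns insertions at rate `lam`, and substitutions follow an F81-style rule in which a site is redrawn from the stationary base distribution at rate `nu`. The function name, its parameters, and the default uniform base frequencies are illustrative assumptions. The sketch only generates data and does not implement the paper's reconstruction algorithm.

```python
import random
from typing import Optional, Sequence

ALPHABET = "ACGT"


def simulate_tkf91_branch(
    seq: str,
    t: float,
    lam: float,
    mu: float,
    nu: float,
    rng: Optional[random.Random] = None,
    pi: Optional[Sequence[float]] = None,
) -> str:
    """Evolve `seq` for time `t` under a TKF91-style indel process (sketch).

    Hypothetical helper: lam = insertion rate, mu = deletion rate,
    nu = substitution rate, pi = base frequencies (assumed uniform if omitted).
    """
    rng = rng if rng is not None else random.Random()
    weights = list(pi) if pi is not None else [0.25] * len(ALPHABET)

    def draw_base() -> str:
        # Inserted or substituted bases are drawn from the stationary distribution.
        return rng.choices(ALPHABET, weights=weights)[0]

    sites = list(seq)
    per_site = lam + mu + nu  # total event rate attached to each nucleotide
    clock = 0.0
    while True:
        # Every nucleotide carries rate per_site; the immortal link adds lam.
        total = len(sites) * per_site + lam
        if total <= 0.0:
            break  # no event can ever occur
        clock += rng.expovariate(total)  # Gillespie waiting time
        if clock > t:
            break  # next event falls beyond the end of the branch
        u = rng.random() * total
        if u < lam:
            # Insertion by the immortal link: new base at the left end.
            sites.insert(0, draw_base())
            continue
        u -= lam
        i = min(int(u // per_site), len(sites) - 1)  # index of the site that fires
        u -= i * per_site
        if u < lam:
            sites.insert(i + 1, draw_base())  # insertion to the right of site i
        elif u < lam + mu:
            del sites[i]  # deletion of site i
        else:
            sites[i] = draw_base()  # F81-style substitution (may redraw the same base)
    return "".join(sites)


rng = random.Random(0)
print(simulate_tkf91_branch("ACGTACGT", t=0.5, lam=0.9, mu=1.0, nu=1.0, rng=rng))
```

Applying such a routine recursively from a root sequence down each edge of a tree would produce leaf sequences of the kind from which ancestral sequences are to be reconstructed. A Gillespie-style loop is used here because it simulates the continuous-time process exactly without discretizing the branch length.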

