Long-branch attraction in species tree estimation: inconsistency of partitioned likelihood and topology-based summary methods

03/07/2018
by Sebastien Roch, et al.

With advances in sequencing technologies, there are now massive amounts of genomic data from across all of life, raising the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", in which different genomic regions have different evolutionary histories, is common in multi-locus datasets and reduces the accuracy of standard species tree estimation methods that do not take it into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and these have been proven to converge to the true species tree as both the number of loci and the number of sites per locus increase (i.e., the methods are "statistically consistent"). Yet little is known about the biologically realistic condition in which the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood, and newer approaches, called summary methods, that estimate the species tree by combining gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main obstacle is the presence of conditions such as long-branch attraction, which bias tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.
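To make the bounded-length argument concrete, here is a minimal Python sketch of how a summary method can converge to the wrong tree. It is not the paper's construction. Every modeling choice is our own assumption: a four-taxon species tree AB|CD with no gene tree heterogeneity at all, binary characters evolving under the CFN (two-state symmetric) model, hypothetical flip probabilities that put long branches on A and C, a fixed locus length `k`, gene trees estimated by parsimony, and a plurality vote over quartets as the summary step. The names `simulate_site`, `site_split`, `parsimony_gene_tree`, and `quartet_votes` are illustrative only. Parsimony stands in for maximum likelihood only because it is short to code. Unlike ML, it is biased here even for long loci, so the sketch shows only the summary-level mechanism: with a fixed number of sites per locus, every estimated gene tree is a draw from the same biased distribution, and adding loci drives the vote toward that distribution's mode rather than toward the true tree.

```python
import random
from collections import Counter

# Hypothetical CFN flip probabilities (an assumption, not taken from the paper)
# on the true unrooted quartet AB|CD; the pendant edges to A and C are long.
LONG, SHORT = 0.4, 0.05
FLIP = {"A": LONG, "B": SHORT, "C": LONG, "D": SHORT, "internal": SHORT}

# The three unrooted topologies on four taxa, written as splits.
SPLITS = ("AB|CD", "AC|BD", "AD|BC")


def simulate_site(rng):
    """Draw one binary character on the true tree AB|CD under the CFN model."""
    def flip(state, edge):
        # The state changes along an edge with that edge's flip probability.
        return 1 - state if rng.random() < FLIP[edge] else state

    u = rng.randint(0, 1)     # internal node joining A and B (uniform root state)
    v = flip(u, "internal")   # internal node joining C and D
    return {"A": flip(u, "A"), "B": flip(u, "B"),
            "C": flip(v, "C"), "D": flip(v, "D")}


def site_split(site):
    """Return the split supported by a 2-2 (informative) site, else None."""
    for split in SPLITS:
        (a, b), (c, d) = split.split("|")
        if site[a] == site[b] != site[c] == site[d]:
            return split
    return None


def parsimony_gene_tree(k, rng):
    """Estimate one locus's quartet by maximum parsimony from k sites.

    With four taxa and binary characters, the most parsimonious topology is
    the one supported by the most informative sites; ties are broken at random.
    """
    counts = Counter()
    for _ in range(k):
        split = site_split(simulate_site(rng))
        if split is not None:
            counts[split] += 1
    best = max(counts[s] for s in SPLITS)
    return rng.choice([s for s in SPLITS if counts[s] == best])


def quartet_votes(num_loci, k, seed=0):
    """Summary step: tally estimated gene trees; the plurality is the estimate."""
    rng = random.Random(seed)
    return Counter(parsimony_gene_tree(k, rng) for _ in range(num_loci))


if __name__ == "__main__":
    # Locus length k stays fixed while the number of loci grows.
    for loci in (100, 1000, 5000):
        votes = quartet_votes(loci, k=20)
        print(loci, votes.most_common(1)[0][0], dict(votes))
```

With these parameters, a single site is far more likely to show the AC|BD pattern (grouping the two long branches) than the true AB|CD pattern. The vote therefore favors AC|BD, and adding loci only makes that wrong answer more certain.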
