Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models

05/13/2019
by Cedric Chauve, et al.

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes that have evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation, gene duplication, gene loss, and horizontal gene transfer. The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). We introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfer induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology.
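
To make the counting and sampling idea concrete, here is a minimal sketch on a toy grammar. It does not use the DLT grammars of the paper. The grammar T -> leaf | node(T, T) for ordered rooted binary trees, the choice of leaf count as the size, and the names count_trees, sample_tree and leaves are all assumptions made for illustration. The sketch shows how a grammar rule turns into a dynamic-programming recurrence for counting, and how the same counts drive uniform random generation by the recursive method.

```python
import random
from functools import lru_cache

# Toy grammar (an illustrative assumption, NOT the DLT grammar of the paper):
#   T -> leaf | node(T, T)
# The size of a tree is its number of leaves.


@lru_cache(maxsize=None)
def count_trees(n: int) -> int:
    """Count ordered rooted binary trees with n leaves by dynamic programming."""
    if n < 1:
        return 0
    if n == 1:
        return 1
    # A tree with n leaves is a root whose left subtree has k leaves
    # and whose right subtree has n - k leaves.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))


def sample_tree(n: int, rng: random.Random):
    """Draw a tree with n leaves uniformly at random (recursive method)."""
    if n == 1:
        return "leaf"
    # Choose the split k with probability count(k) * count(n - k) / count(n).
    r = rng.randrange(count_trees(n))
    for k in range(1, n):
        weight = count_trees(k) * count_trees(n - k)
        if r < weight:
            return (sample_tree(k, rng), sample_tree(n - k, rng))
        r -= weight
    raise AssertionError("unreachable: the weights sum to count_trees(n)")


def leaves(tree) -> int:
    """Number of leaves of a tree produced by sample_tree."""
    return 1 if tree == "leaf" else leaves(tree[0]) + leaves(tree[1])
```

For this toy grammar the counts are the Catalan numbers, so the ratio count_trees(n + 1) / count_trees(n) approaches 4 as n grows. That limit plays the role of the exponential growth factor mentioned above, here for the toy grammar only.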

