Phylogenomics with Paralogs

12/18/2017
by Marc Hellmuth, et al.

Phylogenomics relies heavily on well-curated sequence data sets that consist, for each gene, exclusively of 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can reduce the noise sufficiently to find correct event-annotated gene trees. The information in the gene trees can then be translated directly into constraints on the species trees. While the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer. We demonstrate that the distribution of paralogs in large gene families contains in itself sufficient phylogenetic signal to infer fully resolved species phylogenies. This source of phylogenetic information is independent of the information contained in orthologous sequences and is resilient against horizontal gene transfer. An important consequence is that phylogenomics data sets need not be restricted to 1:1 orthologs.
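
The cograph step can be illustrated with a small sketch. It assumes the standard characterization of cographs as graphs that contain no induced path on four vertices (a P4); the abstract itself does not spell this out. The function names and the gene labels `a` to `d` are hypothetical, and the brute-force search below only detects a violation in an estimated orthology graph. It is not the cograph editing procedure used by the authors.

```python
from itertools import combinations, permutations


def find_induced_p4(vertices, edges):
    """Return one induced path on four vertices as a tuple, or None.

    Brute force over all 4-subsets; only suitable for small graphs.
    """
    edge_set = {frozenset(e) for e in edges}

    def linked(u, v):
        return frozenset((u, v)) in edge_set

    for quad in combinations(sorted(vertices), 4):
        for a, b, c, d in permutations(quad):
            # Path a-b-c-d with none of the three possible chords present.
            if (linked(a, b) and linked(b, c) and linked(c, d)
                    and not linked(a, c)
                    and not linked(a, d)
                    and not linked(b, d)):
                return (a, b, c, d)
    return None


def is_cograph(vertices, edges):
    """A graph is a cograph exactly when it has no induced P4."""
    return find_induced_p4(vertices, edges) is None


# Hypothetical estimated orthology graph: genes a, b, c, d,
# with an edge for each pair predicted to be orthologous.
genes = {"a", "b", "c", "d"}
noisy = [("a", "b"), ("b", "c"), ("c", "d")]   # forms a P4
edited = noisy + [("a", "c")]                   # one inserted edge

print(is_cograph(genes, noisy))
print(is_cograph(genes, edited))
```

In this toy case a single edge insertion removes the forbidden path. Cograph editing in general asks for a smallest set of edge insertions and deletions that achieves this, which the sketch does not attempt.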
