Evolutionary distances in the twilight zone -- a rational kernel approach

11/23/2010
by Roland F. Schwarz, et al.

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
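A positive semi-definite similarity score can be turned into a distance with the standard kernel-induced metric, d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)). The sketch below illustrates only that general step. The `spectrum_kernel` function is a toy k-mer kernel used as a stand-in; it is not the transducer-based rational kernel proposed here, and the abstract does not state which transformation the authors apply to obtain their distances. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum_kernel(a: str, b: str, k: int = 2) -> float:
    """Toy PSD kernel: inner product of k-mer count vectors.

    Assumption: a stand-in for illustration only, not the rational
    (transducer-based) kernel described in the abstract.
    """
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(counts_a[m] * counts_b[m] for m in counts_a))


def kernel_distance(a: str, b: str, kernel=spectrum_kernel) -> float:
    """Kernel-induced distance: sqrt(k(a,a) + k(b,b) - 2*k(a,b))."""
    squared = kernel(a, a) + kernel(b, b) - 2.0 * kernel(a, b)
    # Clamp tiny negative values caused by floating-point rounding.
    return sqrt(max(squared, 0.0))


def distance_matrix(seqs, kernel=spectrum_kernel):
    """Pairwise distances, suitable as input to a distance-based tree method."""
    return [[kernel_distance(s, t, kernel) for t in seqs] for s in seqs]


if __name__ == "__main__":
    sequences = ["ACGT", "ACGA", "TTTT"]
    for row in distance_matrix(sequences):
        print(["%.3f" % d for d in row])
```

Because the kernel is positive semi-definite, the quantity under the square root is non-negative up to rounding, so the result is symmetric and satisfies the triangle inequality. A matrix built this way could be passed to a distance-based tree-building method such as neighbor joining.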


