An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

09/27/2011
by S. Schroedl, et al.

Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, the importance of the problem keeps researchers pushing the limits of exact algorithms. Since MSA can be cast as a classical path-finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. Best-first search, on the other hand, has the advantage of using a heuristic to bound the search space that has to be explored. However, it must maintain all explored nodes up to the final solution in order to prevent the search from re-expanding them at higher cost. Earlier approaches to reducing the Closed list are either incompatible with pruning methods for the Open list, or must retain at least the boundary of the Closed list. In this article, we present an algorithm that attempts to combine the respective advantages of both: like A*, it uses a heuristic for pruning the search space, but it reduces both the maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The underlying idea is to conduct a series of searches with successively increasing upper bounds, but using the DP ordering as the key for the Open priority queue. With a suitable choice of thresholds, a running time below four times that of A* can be expected in practice. In our experiments we show that our algorithm outperforms one of the currently most successful algorithms for optimal multiple sequence alignment, Partial Expansion A*, in both time and memory.
Moreover, we apply a refined heuristic based on optimal alignments not only of pairs of sequences, but of larger subsets. This idea is not new; however, to make it practically relevant we show that it is equally important to bound the heuristic computation appropriately, or the overhead can obliterate any possible gain. Furthermore, we discuss a number of improvements in time and space efficiency with regard to practical implementations. Our algorithm, used in conjunction with higher-dimensional heuristics, is able to calculate for the first time the optimal alignment for almost all of the problems in Reference 1 of the benchmark database BAliBASE.
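
The core idea in the abstract (repeated searches under an increasing upper bound, with the Open queue keyed by the DP ordering instead of by f-value) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes only two sequences, unit mismatch and gap costs, a simple length-difference heuristic, and a fixed bound increment, and all function and parameter names are hypothetical.

```python
import heapq

def bounded_dp_order_search(a, b, bound, mismatch=1, gap=1):
    """One pass over the alignment lattice of a and b.

    Nodes are popped in DP order (by antidiagonal i + j), so every
    predecessor of a node is expanded before the node itself and no
    Closed list is kept. Nodes with g + h > bound are pruned.
    Returns the optimal cost if it is <= bound, else None.
    """
    n, m = len(a), len(b)

    def h(i, j):
        # Assumed admissible heuristic: the remaining length difference
        # must be covered by at least that many gap columns.
        return abs((n - i) - (m - j)) * gap

    if h(0, 0) > bound:
        return None
    open_g = {(0, 0): 0}      # best known g for nodes currently on Open
    heap = [(0, 0, 0)]        # entries are (i + j, i, j): the DP-order key
    while heap:
        _, i, j = heapq.heappop(heap)
        g = open_g.pop((i, j))    # node leaves Open and is then forgotten
        if (i, j) == (n, m):
            return g
        successors = []
        if i < n and j < m:
            successors.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < n:
            successors.append((i + 1, j, gap))
        if j < m:
            successors.append((i, j + 1, gap))
        for ni, nj, cost in successors:
            ng = g + cost
            if ng + h(ni, nj) > bound:
                continue          # pruned by the current upper bound
            old = open_g.get((ni, nj))
            if old is None:
                open_g[(ni, nj)] = ng
                heapq.heappush(heap, (ni + nj, ni, nj))
            elif ng < old:
                open_g[(ni, nj)] = ng   # key is unchanged, so no re-push
    return None

def optimal_alignment_cost(a, b, step=2):
    """Repeat the bounded search with successively larger upper bounds."""
    bound = abs(len(a) - len(b))  # heuristic value of the start node (gap=1)
    while True:
        result = bounded_dp_order_search(a, b, bound)
        if result is not None:
            return result
        bound += step             # assumed fixed increment

print(optimal_alignment_cost("kitten", "sitting"))
```

Because the queue key depends only on a node's position, a node's g-value is final when it is popped, which is why the sketch can discard expanded nodes. The sketch returns only the cost; recovering the alignment itself would need additional bookkeeping that is omitted here.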

research
10/12/2011

Anytime Heuristic Search

We describe how to convert the heuristic search algorithm A* into an any...
research
12/22/2019

LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search

Motivation: Predicting the secondary structure of an RNA sequence is use...
research
04/29/2023

Maximum Match Subsequence Alignment Algorithm Finely Grained (MMSAA FG)

Sequence alignment is common nowadays as it is used in many fields to de...
research
11/22/2022

Branch-and-Bound with Barrier: Dominance and Suboptimality Detection for DD-Based Branch-and-Bound

The branch-and-bound algorithm based on decision diagrams introduced by ...
research
03/16/2018

Heuristics for vehicle routing problems: Sequence or set optimization?

We investigate a structural decomposition for the capacitated vehicle ro...
research
11/24/2014

A Greedy, Flexible Algorithm to Learn an Optimal Bayesian Network Structure

In this report paper we first present a report of the Advanced Machine L...
research
08/30/2019

Comparative study of performance of parallel Alpha Beta Pruning for different architectures

Optimization of searching the best possible action depending on various ...
