LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search

12/22/2019
by Liang Huang, et al.

Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications.

Results: We present a novel alternative O(n^3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models.

Availability: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000 nt).
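
The idea of a 5'-to-3' scan with beam pruning can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a Nussinov-style score (count of base pairs) instead of a thermodynamic or learned model, and the function name `fold_left_to_right`, the `min_loop` parameter, and the rule of always keeping the full-prefix state are illustrative assumptions. With `beam=None` it behaves as an exact cubic-time dynamic program; with a fixed beam, each position keeps only a bounded number of states, so the work per position no longer grows with the sequence length.

```python
from typing import Dict, List, Optional

# Canonical and wobble pairs (assumed scoring: +1 per pair, Nussinov-style).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def fold_left_to_right(seq: str, beam: Optional[int] = None, min_loop: int = 3) -> int:
    """Toy 5'-to-3' folding: return the best pair count found for seq.

    beam=None disables pruning (exact search for this toy model).
    min_loop is the minimum number of nucleotides enclosed by any pair.
    """
    n = len(seq)
    # best[j][i] = best score found for the span seq[i:j];
    # best[j][j] = 0 represents the empty span starting at j.
    best: List[Dict[int, int]] = [{j: 0} for j in range(n + 1)]

    for j in range(1, n + 1):  # consume nucleotide seq[j-1]
        cand: Dict[int, int] = {}
        for i, inner in best[j - 1].items():
            # SKIP: leave seq[j-1] unpaired, extending seq[i:j-1] to seq[i:j].
            if inner > cand.get(i, -1):
                cand[i] = inner
            # POP: pair seq[i-1] with seq[j-1] around seq[i:j-1],
            # then attach every surviving span seq[k:i-1] on the left.
            if i >= 1 and (j - 1 - i) >= min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                for k, left in best[i - 1].items():
                    score = left + inner + 1
                    if score > cand.get(k, -1):
                        cand[k] = score

        if beam is not None and len(cand) > beam:
            # Rank each span seq[i:j] by (best prefix score up to i) + (span score).
            ranked = sorted(cand, key=lambda i: best[i][0] + cand[i], reverse=True)
            # Assumption of this sketch: the full-prefix state (i = 0) is never pruned.
            keep = set(ranked[:beam]) | {0}
            cand = {i: s for i, s in cand.items() if i in keep}

        cand[j] = 0  # re-add the empty span so later pairs can start here
        best[j] = cand

    return best[n][0]


if __name__ == "__main__":
    print(fold_left_to_right("GGGAAACCC"))          # → 3
    print(fold_left_to_right("GGGAAACCC", beam=2))  # pruned search, at most 3
```

In this simplified form each position does O(b^2) work for beam size b, so the total is O(n b^2) time; the pruned search can miss the optimum of the toy model, which mirrors the inexactness the abstract acknowledges.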

