LinearAlifold: Linear-Time Consensus Structure Prediction for RNA Alignments

06/29/2022
by Liang Zhang, et al.

Predicting the consensus structure of a set of aligned RNA homologs is a convenient way to find conserved structures in an RNA genome, with applications in SARS-CoV-2 diagnostics and therapeutics. However, the state-of-the-art algorithm for this task, RNAalifold, is prohibitively slow for long sequences because its runtime scales cubically with sequence length, and it is slower still when analyzing many such sequences because its runtime scales superlinearly with the number of homologs; it takes 4 days on 200 SARS-CoV variants. We present LinearAlifold, an efficient algorithm for folding aligned RNA homologs that scales linearly with both the sequence length and the number of sequences. It builds on our recent work LinearFold, which folds a single RNA in linear time. LinearAlifold is orders of magnitude faster than RNAalifold (e.g., 0.5 hours on the above 200 sequences, a 316-fold speedup) and achieves comparable accuracy when evaluated against a database of known structures. More interestingly, LinearAlifold's predictions on SARS-CoV-2 correlate well with experimentally determined structures, outperforming RNAalifold. Finally, LinearAlifold supports three modes: minimum free energy (MFE), partition function, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants, whereas only the MFE mode of RNAalifold works on them, taking days or weeks.
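
To make "consensus structure of aligned homologs" concrete, the following minimal sketch scores how many sequences in an alignment could form a base pair at a given pair of columns. It is an illustration only and is not the LinearAlifold or RNAalifold scoring function: the function name `pairable_fraction`, the toy alignment, and the choice of allowed pairs (Watson-Crick plus G-U wobble) are all assumptions made for this example. It does show why the work per column pair grows with the number of sequences.

```python
# Illustrative only: NOT the LinearAlifold algorithm or its scoring model.
# Assumed set of allowed pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairable_fraction(alignment, i, j):
    """Return the fraction of aligned sequences whose bases at columns i and j can pair.

    `alignment` is a list of equal-length strings; '-' marks a gap and never pairs.
    The loop visits every sequence once, so the cost per column pair is linear
    in the number of sequences.
    """
    count = sum((seq[i], seq[j]) in PAIRS for seq in alignment)
    return count / len(alignment)


# Hypothetical toy alignment of four homologs (0-based columns 0..8).
alignment = [
    "GGGAAACCC",
    "GGCAAAGCC",
    "GAGAAACUC",
    "GG-AAACCC",
]

for i, j in [(0, 8), (1, 7), (2, 6), (3, 5)]:
    print(i, j, pairable_fraction(alignment, i, j))
```

In this toy alignment, columns 1 and 7 pair in every sequence even though the third sequence uses A-U where the others use G-C; a compensating change of this kind is what supports a conserved pair in a consensus structure.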

