Designing RNA Secondary Structures is Hard

10/31/2017
by Édouard Bonnet et al.

An RNA sequence is a word over a four-letter alphabet {A,C,G,U} whose letters are called bases. RNA sequences fold into secondary structures in which some bases pair with one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where matching pairs of parentheses symbolize paired bases and dots symbolize unpaired bases. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some energy model and to design sequences of bases that will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known to be solvable in cubic time since the 1980s, and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016). In stark contrast, it is unknown whether designing a given RNA secondary structure is a tractable task; this was raised as a challenging open question by Anne Condon (ICALP 2003). Because of the problem's crucial importance in fields such as pharmaceutical research and biochemistry, dozens of heuristics and software libraries are dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has remained unsettled for decades. In this paper we show that, in the simplest energy model, the Watson-Crick model, the design of secondary structures is NP-complete if one adds natural constraints of the form "index i of the sequence has to be labeled by base b". This negative result suggests that the same lower bound holds for more realistic energy models. It is noteworthy that the additional constraints are by no means artificial: they are supported by all RNA design software packages and they correspond to actual practice.
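The abstract combines three ingredients: dot-bracket notation, cubic-time folding, and design under per-position base constraints. The Python sketch below ties them together on toy inputs. It rests on simplifying assumptions that are not taken from the paper: only the Watson-Crick pairs A-U and C-G are allowed, a structure's quality is its number of base pairs (to be maximized), there is no minimum hairpin length, and a sequence is taken to "design" a structure when that structure is its unique maximum-pairing structure. All function names are hypothetical. Folding uses a Nussinov-style O(n^3) dynamic program, not the subcubic algorithm of Bringmann et al., and design is plain exhaustive search over 4^n sequences, feasible only for very short targets.

```python
from itertools import product

# Assumption of this sketch: Watson-Crick pairs only (no G-U wobble pairs).
WC_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return sorted(pairs)


def optimal_pairings(seq):
    """Nussinov-style O(n^3) DP returning (max WC pairs, number of optimal structures).

    best[i][j] and count[i][j] describe the half-open interval seq[i:j]; the empty
    interval has value 0 and exactly one structure. Each pseudoknot-free structure
    decomposes uniquely (last base unpaired, or paired with a single k), so the
    counts are exact.
    """
    n = len(seq)
    best = [[0] * (n + 1) for _ in range(n + 1)]
    count = [[1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length  # last base of the interval is seq[j - 1]
            b, c = best[i][j - 1], count[i][j - 1]  # case: last base unpaired
            for k in range(i, j - 1):  # case: last base paired with seq[k]
                if (seq[k], seq[j - 1]) in WC_PAIRS:
                    v = best[i][k] + 1 + best[k + 1][j - 1]
                    w = count[i][k] * count[k + 1][j - 1]
                    if v > b:
                        b, c = v, w
                    elif v == b:
                        c += w
            best[i][j], count[i][j] = b, c
    return best[0][n], count[0][n]


def designs(seq, structure):
    """True if `structure` is the unique maximum-pairing structure of `seq`."""
    if len(seq) != len(structure):
        return False
    pairs = parse_dot_bracket(structure)
    if any((seq[i], seq[j]) not in WC_PAIRS for i, j in pairs):
        return False
    max_pairs, n_optimal = optimal_pairings(seq)
    return len(pairs) == max_pairs and n_optimal == 1


def brute_force_design(structure, constraints=None):
    """Exhaustive search; `constraints` maps index i -> base b forced at position i."""
    constraints = constraints or {}
    for letters in product("ACGU", repeat=len(structure)):
        if all(letters[i] == b for i, b in constraints.items()):
            seq = "".join(letters)
            if designs(seq, structure):
                return seq
    return None  # no sequence satisfying the constraints designs the target
```

For example, `designs("GCGC", "(())")` is false because `()()` also achieves two pairs, whereas `designs("GGCC", "(())")` is true. With position 0 forced to G, `brute_force_design("(())", {0: "G"})` returns `"GAUC"`, while forcing `{0: "A", 1: "C"}` on the target `()` leaves no valid design. The exhaustive search only illustrates what the constrained design problem asks; it says nothing about how hard the problem is in general.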
