Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning

08/10/2022
by Siba Moussa, et al.

Inverse design of short single-stranded RNA and DNA sequences (aptamers) is the task of finding sequences that satisfy a set of desired criteria, such as the presence of specific folding motifs, binding to molecular ligands, or sensing properties. Most practical approaches to aptamer design identify a small set of promising candidate sequences using high-throughput experiments (e.g. SELEX), and then optimize performance by introducing only minor modifications to the empirically found candidates. Sequences that possess the desired properties but differ drastically in chemical composition would add diversity to the search space and facilitate the discovery of useful nucleic acid aptamers, so systematic diversification protocols are needed. Here we propose to use an unsupervised machine learning model known as the Potts model to discover new, useful sequences with controllable sequence diversity. We start by training a Potts model using the maximum entropy principle on a small set of empirically identified sequences unified by a common feature. To generate new candidate sequences with a controllable degree of diversity, we take advantage of a spectral feature of the model: an energy bandgap separating sequences that are similar to the training set from those that are distinct. By controlling the Potts energy range that is sampled, we generate sequences that are distinct from the training set yet still likely to have the encoded features. To demonstrate performance, we apply our approach to design diverse pools of sequences with specified secondary structure motifs in 30-mer RNA and DNA aptamers.
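
The workflow described above (fit a Potts model to a small training set, then sample sequences whose Potts energy falls in a chosen range) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `fit_crude_potts`, `energy`, and `sample_in_window` are hypothetical, the toy 8-mer training set is invented, and the couplings are estimated with a crude pairwise log-odds formula rather than the maximum entropy training the abstract refers to. The energy convention used is E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

```python
import math
import random
from itertools import combinations

ALPHABET = "ACGU"  # RNA alphabet; swap U for T to model DNA


def fit_crude_potts(seqs, pseudo=1.0):
    """Estimate fields h and couplings J from aligned, equal-length sequences.

    ASSUMPTION: this log-odds estimate is a crude stand-in, not a
    maximum-entropy fit and not the authors' training procedure.
    """
    n, length, q = len(seqs), len(seqs[0]), len(ALPHABET)
    # Single-site frequencies with pseudocounts, so no log(0) occurs.
    f1 = [{a: (sum(s[i] == a for s in seqs) + pseudo) / (n + q * pseudo)
           for a in ALPHABET} for i in range(length)]
    h = [{a: math.log(f1[i][a]) for a in ALPHABET} for i in range(length)]
    J = {}
    for i, j in combinations(range(length), 2):
        J[i, j] = {}
        for a in ALPHABET:
            for b in ALPHABET:
                pair = sum(s[i] == a and s[j] == b for s in seqs)
                f2 = (pair + pseudo / q) / (n + q * pseudo)
                # Positive when a and b co-occur more often than by chance.
                J[i, j][a, b] = math.log(f2 / (f1[i][a] * f1[j][b]))
    return h, J


def energy(seq, h, J):
    """Potts energy: lower values mean closer to the training statistics."""
    e = -sum(h[i][c] for i, c in enumerate(seq))
    e -= sum(J[i, j][seq[i], seq[j]]
             for i, j in combinations(range(len(seq)), 2))
    return e


def sample_in_window(h, J, start, e_lo, e_hi, steps=2000, temp=1.0, seed=0):
    """Metropolis walk with single-site mutations.

    Returns the set of visited sequences whose energy lies in [e_lo, e_hi].
    """
    rng = random.Random(seed)
    seq, e = list(start), energy(start, h, J)
    hits = set()
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)
        e_new = energy(seq, h, J)
        if e_new <= e or rng.random() < math.exp((e - e_new) / temp):
            e = e_new          # accept the mutation
        else:
            seq[i] = old       # reject and restore the previous base
        if e_lo <= e <= e_hi:
            hits.add("".join(seq))
    return hits


# Toy training set (invented for illustration only).
train = ["GGGAAACC", "GGCAAAGC", "GCGAAACG", "GGGAUACC"]
h, J = fit_crude_potts(train)
e_train = [energy(s, h, J) for s in train]
# Sample an energy band just above the training energies (bounds are arbitrary).
band = sample_in_window(h, J, train[0], max(e_train), max(e_train) + 5.0)
print(len(band), "candidate sequences in the chosen energy band")
```

Shifting the window upward in energy yields candidates that are progressively less similar to the training set; how far the window can be moved before the encoded feature is lost is the question the bandgap argument in the abstract addresses, and this sketch does not attempt to answer it.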


