Efficient approximation of DNA hybridisation using deep learning

02/19/2021
by David Buterez, et al.

Deoxyribonucleic acid (DNA) has shown great promise in enabling computational applications, most notably in the fields of DNA data storage and DNA computing. The former exploits the natural properties of DNA, such as high storage density and longevity, for the archival of digital information, while the latter aims to use the interactivity of DNA to encode computations. Recently, the two paradigms were jointly used to formulate the near-data processing concept for DNA databases, where computations are performed directly on the stored data. The fundamental, low-level operation that DNA naturally possesses is hybridisation, also called annealing, of complementary sequences. Information is encoded as DNA strands, which naturally bind in solution, thus enabling search and pattern-matching capabilities. Being able to control and predict the process of hybridisation is crucial for the ambitious future of so-called Hybrid Molecular-Electronic Computing. Current tools are, however, limited in terms of throughput and applicability to large-scale problems. In this work, we present the first comprehensive study of machine learning methods applied to the task of predicting DNA hybridisation. For this purpose, we introduce a synthetic hybridisation dataset of over 2.5 million data points, enabling the use of a wide range of machine learning algorithms, including the latest in deep learning. Depending on the hardware, the proposed models reduce inference time by one to over two orders of magnitude compared to the state of the art, while retaining high fidelity. We then discuss the integration of our methods in modern, scalable workflows. The implementation is available at: https://github.com/davidbuterez/dna-hyb-deep-learning
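
To make the notion of complementary sequences concrete, here is a minimal Python sketch of Watson-Crick base pairing (A with T, C with G) between two antiparallel strands. It is a toy illustration only and is not the method of the paper, which predicts hybridisation with machine learning models. The function names `reverse_complement` and `complementary_fraction` are hypothetical, and the fraction of paired positions is a crude stand-in that ignores thermodynamics, temperature, and partial or shifted alignments.

```python
# Toy illustration of sequence complementarity (assumed helper names,
# not taken from the paper or its repository).

# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the strand that would pair perfectly with `strand`,
    written 5'->3' (complement each base, then reverse)."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def complementary_fraction(a: str, b: str) -> float:
    """Fraction of positions at which `a` pairs with `b` when the two
    equal-length strands are aligned antiparallel, end to end.

    This is a naive proxy for hybridisation, not a physical model.
    """
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    # `a` pairs fully with `b` exactly when `a` equals reverse_complement(b).
    target = reverse_complement(b)
    matches = sum(x == y for x, y in zip(a.upper(), target))
    return matches / len(a)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(complementary_fraction("ATGC", "GCAT"))
    print(complementary_fraction("ATGC", "GCAA"))
```

A perfectly complementary pair scores 1.0 under this proxy, and each mismatched position lowers the score proportionally. Real hybridisation also depends on strand concentration, temperature, and secondary structure, which is why a learned or thermodynamic predictor is needed in practice.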


