Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

by Daniella Bar-Lev, et al.

The concept of DNA storage was first suggested in 1959 by Richard Feynman, who shared his vision of nanotechnology in the talk "There's Plenty of Room at the Bottom". Later, towards the end of the 20th century, interest in storage solutions based on DNA molecules increased as a result of the Human Genome Project, which in turn led to significant progress in sequencing and assembly methods. DNA storage enjoys major advantages over the well-established magnetic and optical storage solutions. Unlike magnetic solutions, DNA storage does not require an electrical supply to maintain data integrity, and it is superior to other storage solutions in both density and durability. Given the falling costs of DNA synthesis and sequencing, it is now acknowledged that within the next 10-15 years DNA storage may become a highly competitive archiving technology, and probably later the main such technology. That said, current implementations of DNA-based storage systems are very limited and are not fully optimized to address the unique error patterns that characterize the synthesis and sequencing processes. In this work, we propose a robust, efficient and scalable solution for implementing DNA-based storage systems. Our method deploys Deep Neural Networks (DNNs) that reconstruct a sequence of letters from an imperfect cluster of copies generated by the synthesis and sequencing processes. A tailor-made Error-Correcting Code (ECC) is used to combat the error patterns that occur during this process. Because our reconstruction method is adapted to imperfect clusters, it overcomes the time bottleneck of clustering the noisy DNA copies by allowing the use of a rapid and scalable pseudo-clustering instead. Our architecture combines convolution and transformer blocks and is trained on synthetic data modelled after real data statistics.
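To make the notion of a "cluster of noisy copies" concrete, the following is a minimal sketch of how synthetic training clusters might be generated. It is not the authors' simulator: the function names `noisy_copy` and `make_cluster`, the independent per-base error model, and the default error rates are all illustrative assumptions, whereas the abstract only says that the synthetic data is modelled after real data statistics.

```python
import random

ALPHABET = "ACGT"


def noisy_copy(strand, p_sub, p_del, p_ins, rng):
    """Return one noisy read of `strand` under a simple per-base
    insertion/deletion/substitution model (an assumed model, for illustration)."""
    out = []
    for base in strand:
        # Possibly insert a random base before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            # Deletion: the base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace with a different base.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_cluster(strand, n_copies, p_sub=0.01, p_del=0.01, p_ins=0.01, seed=0):
    """Generate `n_copies` noisy reads of one designed strand.
    The default error rates are placeholders, not measured values."""
    rng = random.Random(seed)
    return [noisy_copy(strand, p_sub, p_del, p_ins, rng) for _ in range(n_copies)]


if __name__ == "__main__":
    for read in make_cluster("ACGTTGCAACGT", n_copies=4, p_sub=0.05, p_del=0.05, p_ins=0.05):
        print(read)
```

A reconstruction model would take such a cluster as input and be trained to output the original strand. An imperfect cluster, as described above, could be imitated by mixing in reads generated from a different strand.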




Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

DNA has immense potential as an emerging data storage medium. The princi...

Single-Read Reconstruction for DNA Data Storage Using Transformers

As the global need for large-scale data storage is rising exponentially,...

MQ-Coder inspired arithmetic coder for synthetic DNA data storage

Over the past years, the ever-growing trend on data storage demand, more...

Deletion Correcting Codes for Efficient DNA Synthesis

The synthesis of DNA strands remains the most costly part of the DNA sto...

DNA based Network Model and Blockchain

Biological cells can transmit, process and receive chemically encoded da...

Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage

Storing information in DNA molecules is of great interest because of its...

Data-Driven Bee Identification for DNA Strands

We study a data-driven approach to the bee identification problem for DN...
