Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

10/20/2022
by Yun Qin, et al.

DNA has immense potential as an emerging data storage medium. DNA storage works by converting digital information between a binary code stream, quaternary bases, and physical DNA fragments. This process inevitably introduces errors, which makes accurate data recovery challenging. Sequence reconstruction consists of inferring the DNA reference from a cluster of erroneous copies. A common assumption in existing methods is that all strands within a cluster are noisy copies of the same reference and therefore contribute equally to the reconstruction. However, this assumption does not always hold, because clusters can contain contaminated sequences caused, for example, by DNA fragmentation and rearrangement during the DNA storage process. This paper proposes a robust multi-read reconstruction model using a deep neural network (DNN) that is resilient both to contaminated clusters with outlier sequences and to noisy reads with insertion, deletion, and substitution (IDS) errors. The effectiveness and robustness of the method are validated on three next-generation sequencing datasets through a series of comparative experiments that simulate the varying contamination levels occurring during DNA storage.
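
To make the problem setting concrete, the following is a minimal Python sketch of a contaminated cluster and a naive distance-based outlier filter. It is not the paper's DNN model. The function names (`ids_noise`, `make_contaminated_cluster`, `filter_outliers`), the error rates, the use of uniformly random strands as contaminants, and the medoid-based filter are all assumptions chosen for illustration only.

```python
import random

BASES = "ACGT"

def ids_noise(strand, p_ins, p_del, p_sub, rng):
    """Return a noisy copy of `strand` with insertion, deletion, and substitution errors."""
    out = []
    for base in strand:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))          # insert a random base before this one
        r = rng.random()
        if r < p_del:
            continue                               # delete this base
        if r < p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))  # substitute
        else:
            out.append(base)                       # copy unchanged
    return "".join(out)

def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # match or substitute
        prev = cur
    return prev[-1]

def make_contaminated_cluster(reference, n_reads, contamination, rng,
                              p_ins=0.01, p_del=0.01, p_sub=0.01):
    """Build a cluster of noisy copies plus a fraction of unrelated outlier strands.

    Assumption: contaminants are modeled as uniformly random strands, which is
    a simplification of fragmentation and rearrangement.
    """
    n_outliers = round(n_reads * contamination)
    reads = [ids_noise(reference, p_ins, p_del, p_sub, rng)
             for _ in range(n_reads - n_outliers)]
    outliers = ["".join(rng.choice(BASES) for _ in range(len(reference)))
                for _ in range(n_outliers)]
    cluster = reads + outliers
    rng.shuffle(cluster)
    return cluster

def filter_outliers(cluster, max_dist):
    """Keep reads within `max_dist` edits of the cluster medoid (a naive baseline)."""
    medoid = min(cluster, key=lambda s: sum(levenshtein(s, t) for t in cluster))
    return [s for s in cluster if levenshtein(s, medoid) <= max_dist]

if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choice(BASES) for _ in range(60))
    cluster = make_contaminated_cluster(reference, n_reads=10, contamination=0.3, rng=rng)
    kept = filter_outliers(cluster, max_dist=10)
    print(len(cluster), "reads in cluster,", len(kept), "kept after filtering")
```

A hard threshold like `max_dist` is brittle when error rates vary across reads, which is one motivation for learned approaches that weigh reads instead of treating them as equal contributors.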


research
09/12/2021

Single-Read Reconstruction for DNA Data Storage Using Transformers

As the global need for large-scale data storage is rising exponentially,...
research
08/31/2021

Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

The concept of DNA storage was first suggested in 1959 by Richard Feynma...
research
01/18/2018

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

DNA as a data storage medium has several advantages, including far great...
research
07/11/2022

Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage

Storing information in DNA molecules is of great interest because of its...
research
04/27/2017

DNA Steganalysis Using Deep Recurrent Neural Networks

The technique of hiding messages in digital data is called a steganograp...
research
01/24/2022

Inferring taxonomic placement from DNA barcoding allowing discovery of new taxa

In ecology it has become common to apply DNA barcoding to biological sam...
research
10/12/2020

Trace Reconstruction Problems in Computational Biology

The problem of reconstructing a string from its error-prone copies, the ...
