Information-Theoretic Foundations of DNA Data Storage

11/10/2022
by Ilan Shomorony, et al.

Due to its longevity and enormous information density, DNA is an attractive medium for archival data storage. Thanks to rapid technological advances, DNA storage is becoming practically feasible, as demonstrated by a number of experimental storage systems, making it a promising solution for our society's increasing need for data storage. While DNA molecules in living organisms can consist of millions of nucleotides, technological constraints mean that, in practice, data is stored on many short DNA molecules, which are preserved in a DNA pool and cannot be spatially ordered. Moreover, imperfections in sequencing, synthesis, and handling, as well as DNA decay during storage, introduce random noise into the system, making it challenging to reliably store and retrieve information in DNA. This unique setup raises a natural information-theoretic question: how much information can be reliably stored on and reconstructed from millions of short noisy sequences? The goal of this monograph is to address this question by discussing the fundamental limits of storing information on DNA. Motivated by current technological constraints on DNA synthesis and sequencing, we propose a probabilistic channel model that captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules that are stored in an unordered fashion; (2) the molecules are corrupted by noise; and (3) the data is read by randomly sampling from the DNA pool. Our goal is to investigate the impact of each of these key aspects on the capacity of the DNA storage system. Rather than focusing on coding-theoretic considerations and computationally efficient encoding and decoding, we aim to build an information-theoretic foundation for the analysis of these channels, developing tools for achievability and converse arguments.
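To make the three aspects of the channel concrete, the following is a minimal simulation sketch, not the model as formalized in the monograph. It assumes i.i.d. substitution noise with probability `sub_prob` per base and reads drawn uniformly with replacement; the function names (`write_pool`, `corrupt`, `read_pool`) and parameters are hypothetical and chosen only for illustration.

```python
import random

ALPHABET = "ACGT"


def write_pool(data, molecule_len):
    """Aspect (1): split the data into short molecules.

    The returned list stands in for an unordered pool; its order carries
    no meaning and is discarded when the pool is read.
    """
    return [data[i:i + molecule_len] for i in range(0, len(data), molecule_len)]


def corrupt(molecule, sub_prob, rng):
    """Aspect (2): substitute each base independently with probability sub_prob.

    Substitution-only noise is an assumption of this sketch; real systems
    also see insertions and deletions.
    """
    out = []
    for base in molecule:
        if rng.random() < sub_prob:
            # Replace with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def read_pool(pool, num_reads, sub_prob, rng):
    """Aspect (3): draw num_reads molecules uniformly with replacement.

    Each sampled copy is corrupted independently. Some molecules may be
    read several times and others not at all.
    """
    reads = [corrupt(rng.choice(pool), sub_prob, rng) for _ in range(num_reads)]
    rng.shuffle(reads)  # the decoder sees reads in arbitrary order
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    pool = write_pool("ACGTTGCAGGCCAATT", molecule_len=4)
    print(read_pool(pool, num_reads=6, sub_prob=0.1, rng=rng))
```

Under these assumptions, the decoder receives only a multiset of noisy reads: it must cope with missing molecules, duplicate reads, lost ordering, and base errors at the same time, which is the combination the capacity question above is about.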


