Embracing Errors is More Efficient than Avoiding Them through Constrained Coding for DNA Data Storage

08/11/2023
by Franziska Weindel, et al.

DNA is an attractive medium for digital data storage. When data is stored on DNA, errors occur, which makes error-correcting coding techniques critical for reliable DNA data storage. To reduce the number of errors, a common technique is to impose constraints that avoid homopolymers (runs of consecutive identical nucleotides) and balance the GC content, as sequences with homopolymers and unbalanced GC content are often associated with higher error rates. However, constrained coding comes at the cost of increased redundancy. An alternative (unconstrained coding) is to control the errors by randomizing the sequences, embracing errors, and paying for them with additional coding redundancy. In this paper, we determine the error regimes in which embracing errors is more efficient than constrained coding. We find that constrained coding is inefficient in most common error regimes for DNA data storage. Specifically, the error probabilities for nucleotides in homopolymers and in sequences with unbalanced GC content must be very large for constrained coding to achieve a higher code rate than unconstrained coding.
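To make the two approaches in the abstract concrete, the following is a minimal sketch in Python, not the authors' method. It checks a sequence against a homopolymer and GC-content constraint, and shows one simple way to randomize data before mapping it to nucleotides. The thresholds (`max_run=3`, GC fraction between 0.4 and 0.6), the 2-bits-per-nucleotide mapping, the seeded XOR mask, and all function names are illustrative assumptions that do not come from the paper.

```python
import random
from itertools import groupby

NUCLEOTIDES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of nucleotides in the sequence that are G or C."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Constrained-coding check; the thresholds are hypothetical defaults."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)


def randomize(data: bytes, seed: int) -> bytes:
    """XOR the data with a seeded pseudorandom mask.

    Applying the function again with the same seed restores the data.
    This is only one possible randomization, chosen for illustration.
    """
    rng = random.Random(seed)
    return bytes(byte ^ rng.randrange(256) for byte in data)


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per nucleotide."""
    return "".join(NUCLEOTIDES[(byte >> shift) & 3]
                   for byte in data for shift in (6, 4, 2, 0))


if __name__ == "__main__":
    payload = bytes(8)                   # all-zero bytes map to a long run of A
    plain = bytes_to_dna(payload)
    mixed = bytes_to_dna(randomize(payload, seed=1))
    print(plain, max_homopolymer_run(plain), gc_content(plain))
    print(mixed, max_homopolymer_run(mixed), gc_content(mixed))
```

Randomization does not guarantee that a sequence meets the constraints; it only makes long homopolymers and strongly unbalanced GC content unlikely. In the unconstrained approach, the remaining errors are handled by the error-correcting code.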


