Error-correcting Codes for Noisy Duplication Channels

08/18/2020
by   Yuanyuan Tang, et al.

Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors, which arise from the various processes in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. An exact duplication inserts a copy of a length-k substring of the sequence immediately after that substring, e.g., ACGT to ACGACGT, where k = 3, while a noisy duplication inserts a copy that has suffered substitution noise, e.g., ACGT to ACGATGT. Specifically, we design codes that can correct any number of exact duplications and one noisy duplication, where the noisy copy is at Hamming distance 1 from the original substring. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.
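The two error types in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the function names, the 0-based indexing, and the greedy root computation are all assumptions made for illustration. In particular, the abstract does not define the duplication root; here it is assumed to be the sequence obtained by deleting tandem repeats of length k until none remain.

```python
def exact_duplication(seq: str, i: int, k: int) -> str:
    """Insert an exact copy of seq[i:i+k] immediately after that substring."""
    if k <= 0 or i < 0 or i + k > len(seq):
        raise ValueError("substring out of range")
    return seq[:i + k] + seq[i:i + k] + seq[i + k:]


def noisy_duplication(seq: str, i: int, k: int, pos: int, symbol: str) -> str:
    """Insert a copy of seq[i:i+k] with one substitution at offset pos,
    so the copy is at Hamming distance 1 from the original substring."""
    if k <= 0 or i < 0 or i + k > len(seq) or not 0 <= pos < k:
        raise ValueError("substring or offset out of range")
    block = seq[i:i + k]
    if block[pos] == symbol:
        raise ValueError("substitution must change the symbol")
    noisy = block[:pos] + symbol + block[pos + 1:]
    return seq[:i + k] + noisy + seq[i + k:]


def duplication_root(seq: str, k: int) -> str:
    """Assumed definition: delete tandem repeats of length k until none remain."""
    i = 0
    while i + 2 * k <= len(seq):
        if seq[i:i + k] == seq[i + k:i + 2 * k]:
            # Drop the second copy and re-check the same position,
            # since the following block may repeat again.
            seq = seq[:i + k] + seq[i + 2 * k:]
        else:
            i += 1
    return seq


exact = exact_duplication("ACGT", 0, 3)           # the abstract's ACGT to ACGACGT
noisy = noisy_duplication("ACGT", 0, 3, 1, "T")   # the abstract's ACGT to ACGATGT
print(exact, duplication_root(exact, 3))
print(noisy, duplication_root(noisy, 3))
```

In this sketch the exact duplication leaves the root unchanged, while the noisy duplication yields a sequence with a different root, which is the kind of root-level error pattern the abstract says the codes are designed to correct.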


