Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage

10/15/2019
by   Kui Cai, et al.

An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this paper, we investigate codes that correct either a single indel or a single edit, and we provide linear-time algorithms that encode binary messages into codewords of length n from these codes. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with log n + O(log log n) redundant bits, while the other corrects a single indel with log n + 2 redundant bits. Both encoders are order-optimal. The former is the first known order-optimal encoder that corrects a single edit, while the latter reduces the redundancy of the best known encoder of Tenengolts (1984) by at least four bits. Over the DNA alphabet, we impose an additional constraint, the GC-balanced constraint, which requires that exactly half of the symbols of any DNA codeword be either C or G. In particular, via a modification of Knuth's balancing technique, we provide a linear-time map that translates binary messages into GC-balanced codewords, and the resulting codebook is able to correct a single indel or a single edit. These are the first known constructions of GC-balanced codes that correct a single indel or a single edit.
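To make the GC-balanced constraint and the balancing idea concrete, here is a minimal Python sketch. It shows a GC-balance check on a DNA string and the classic binary form of Knuth's balancing technique (flip a prefix until the word has equal numbers of 0s and 1s). This is an illustration only, not the paper's modified map: the function names are hypothetical, the flip index t is returned separately rather than encoded into the codeword, and no indel or edit correction is performed.

```python
def is_gc_balanced(seq: str) -> bool:
    """Return True if exactly half of the symbols of seq are C or G."""
    gc = sum(1 for s in seq if s in "CG")
    return len(seq) % 2 == 0 and 2 * gc == len(seq)


def knuth_balance(bits: list[int]) -> tuple[list[int], int]:
    """Classic Knuth balancing (illustrative, not the paper's variant).

    Flips the shortest prefix of an even-length binary word so that the
    result has as many 1s as 0s. Returns the balanced word and the
    prefix length t that was flipped.
    """
    n = len(bits)
    if n % 2:
        raise ValueError("length must be even")
    word = list(bits)
    weight = sum(word)
    for t in range(n):
        if 2 * weight == n:
            return word, t
        # Extend the flipped prefix by one position; the weight moves by +/-1.
        word[t] ^= 1
        weight += 1 if word[t] else -1
    # The weight goes from w to n - w in unit steps, so it must hit n/2 earlier.
    raise AssertionError("unreachable for even-length input")


def knuth_unbalance(word: list[int], t: int) -> list[int]:
    """Invert knuth_balance by flipping the first t bits back."""
    return [b ^ 1 for b in word[:t]] + list(word[t:])


balanced, t = knuth_balance([1, 1, 1, 1, 0, 1])
original = knuth_unbalance(balanced, t)
```

A balanced prefix length always exists because flipping one more bit changes the weight by exactly one, and the weights of the unflipped and fully flipped words lie on opposite sides of n/2. In a real scheme the index t must also be stored in the codeword, which is the source of the extra redundancy.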


research
12/21/2022

Every Bit Counts: A New Version of Non-binary VT Codes with More Efficient Encoder

In this work, we present a new version of non-binary VT codes that are c...
research
12/18/2021

Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions

We consider the problem of designing low-redundancy codes in settings wh...
research
12/14/2018

Properties and constructions of constrained codes for DNA-based data storage

We describe properties and constructions of constraint-based codes for D...
research
04/29/2022

Average Redundancy of Variable-Length Balancing Schemes à la Knuth

We study and propose schemes that map messages onto constant-weight code...
research
08/03/2022

Low-redundancy codes for correcting multiple short-duplication and edit errors

Due to its higher data density, longevity, energy efficiency, and ease o...
research
07/02/2022

Balanced reconstruction codes for single edits

Motivated by the sequence reconstruction problem initiated by Levenshtei...
research
10/28/2018

Near-Linear Time Insertion-Deletion Codes and (1+ε)-Approximating Edit Distance via Indexing

We introduce fast-decodable indexing schemes for edit distance which can...
