Generalized Unique Reconstruction from Substrings

10/10/2022
by   Yonatan Yehezkeally, et al.

This paper introduces a new family of reconstruction codes motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. Previous works considered two extreme cases for a single string: either all substrings of pre-defined lengths are read, or substrings are read with no overlap. This work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. The second extension considers the setup in which multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length ℓ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present two constructions of such codes and show that their rates approach 1 for values of ℓ that asymptotically behave like the lower bound.
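
As a rough illustration of why the read length ℓ matters for unique reconstruction, the following minimal sketch computes the multiset of all length-ℓ substrings of a string, which corresponds to the "all substrings are read" extreme mentioned above. The function name `substring_multiset` and the example strings are assumptions made for illustration only; they do not come from the paper and do not represent its code constructions.

```python
from collections import Counter


def substring_multiset(s: str, ell: int) -> Counter:
    """Multiset of all length-ell substrings of s (hypothetical helper)."""
    # Slide a window of width ell over s and count each substring seen.
    return Counter(s[i:i + ell] for i in range(len(s) - ell + 1))


# Two distinct strings chosen for illustration (not taken from the paper).
x, y = "abaca", "acaba"

# With ell = 2 both strings yield the same reads, so they cannot be told apart.
same_at_2 = substring_multiset(x, 2) == substring_multiset(y, 2)

# With ell = 3 the reads differ, so this pair becomes distinguishable.
same_at_3 = substring_multiset(x, 3) == substring_multiset(y, 3)

print(same_at_2, same_at_3)
```

In this toy case the two strings share every length-2 read but not every length-3 read, which is the kind of ambiguity that a reconstruction code is meant to exclude for a given ℓ.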

research
05/08/2022

Reconstruction from Substrings with Partial Overlap

This paper introduces a new family of reconstruction codes which is moti...
research
08/26/2021

Multi-strand Reconstruction from Substrings

The problem of string reconstruction based on its substrings spectrum ha...
research
12/23/2019

Reconstruction of Strings from their Substrings Spectrum

This paper studies reconstruction of strings based upon their substrings...
research
12/14/2018

Properties and constructions of constrained codes for DNA-based data storage

We describe properties and constructions of constraint-based codes for D...
research
03/08/2018

Synchronization Strings: Efficient and Fast Deterministic Constructions over Small Alphabets

Synchronization strings are recently introduced by Haeupler and Shahrasb...
research
10/06/2021

Coded Shotgun Sequencing

Most DNA sequencing technologies are based on the shotgun paradigm: many...
research
01/20/2020

Uncertainty of Reconstructing Multiple Messages from Uniform-Tandem-Duplication Noise

A growing number of works have, in recent years, been concerned with in-...