Fundamental Limits of Multiple Sequence Reconstruction from Substrings

05/10/2023
by Kel Levick, et al.

The problem of reconstructing a sequence from the set of its length-k substrings has received considerable attention due to its various applications in genomics. We study an uncoded version of this problem in which multiple random sources are to be reconstructed simultaneously from the union of their k-mer sets. We consider an asymptotic regime where m = n^α i.i.d. source sequences of length n are to be reconstructed from the set of their substrings of length k = β log n, and we seek to characterize the (α, β) pairs for which reconstruction is information-theoretically feasible. We show that, as n → ∞, the source sequences can be reconstructed if β > max(2α + 1, α + 2) and cannot be reconstructed if β < max(2α + 1, α + 3/2), characterizing the feasibility region almost completely. Interestingly, our result shows that there are feasible (α, β) pairs where repeats across the source strings abound, and non-trivial reconstruction algorithms are needed to achieve the fundamental limit.
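
The observation model and the two bounds above can be made concrete with a small Python sketch. It is illustrative only, not the authors' code: the binary alphabet, the base-2 logarithm, the rounding of m and k to integers, and the helper names kmer_union, sample_instance and regime are all assumptions introduced here. The regime function merely restates the asymptotic bounds from the abstract, so it says nothing about whether a particular finite instance can be reconstructed.

```python
import math
import random


def kmer_union(sources, k):
    """Return the union of the length-k substring (k-mer) sets of all sources.

    This is the observation: which source a k-mer came from, where it
    occurred, and how many times it occurred are all discarded.
    """
    observed = set()
    for s in sources:
        for i in range(len(s) - k + 1):
            observed.add(s[i:i + k])
    return observed


def sample_instance(n, alpha, beta, seed=0):
    """Draw m = round(n**alpha) i.i.d. uniform sources of length n, k = ceil(beta*log2(n)).

    Assumptions (not from the abstract): binary alphabet "01", base-2
    logarithm, and this particular rounding of m and k to integers.
    """
    rng = random.Random(seed)
    m = round(n ** alpha)
    k = math.ceil(beta * math.log2(n))
    sources = ["".join(rng.choice("01") for _ in range(n)) for _ in range(m)]
    return sources, k


def regime(alpha, beta):
    """Classify (alpha, beta) using only the asymptotic bounds stated in the abstract."""
    if beta > max(2 * alpha + 1, alpha + 2):
        return "feasible"    # achievability: beta > max(2a + 1, a + 2)
    if beta < max(2 * alpha + 1, alpha + 1.5):
        return "infeasible"  # converse: beta < max(2a + 1, a + 3/2)
    return "open"            # gap between the bounds, or a boundary point


sources, k = sample_instance(n=16, alpha=0.5, beta=2.0)
print(len(sources), k)  # 4 8
print(regime(0.25, 1.0), regime(0.25, 1.9), regime(0.25, 2.5))  # infeasible open feasible
```

For α ≥ 1 the two bounds coincide at β = 2α + 1, so only that boundary is left unresolved; for α < 1 the sketch reports an open band of width at most 1/2 (exactly 1/2 when α < 1/2, and 1 − α when 1/2 ≤ α < 1), which is the sense in which the region is characterized "almost completely".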
