Efficient average-case population recovery in the presence of insertions and deletions

07/12/2019
by Frank Ban, et al.
Several recent works have considered the trace reconstruction problem, in which an unknown source string x ∈ {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [DOS17, NazarovPeres17], highly efficient algorithms are known [PZ17, HPP18] for the average-case version, in which x is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, there is an unknown distribution D over s unknown source strings x^1,...,x^s ∈ {0,1}^n, and each sample is independently generated by drawing some x^i from D and returning an independent trace of x^i. Building on [PZ17] and [HPP18], we give an efficient algorithm for this problem. For any support size s ≤ exp(Θ(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} ⊂ {0,1}^n, and for every distribution D supported on {x^1,...,x^s}, our algorithm efficiently recovers D up to total variation distance ϵ with high probability, given access to independent traces of independent draws from D. The algorithm runs in time poly(n, s, 1/ϵ) and its sample complexity is poly(s, 1/ϵ, exp(log^{1/3} n)). This polynomial dependence on the support size s is in sharp contrast with the worst-case version (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [BCFSS19] is doubly exponential in s.
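
To make the sampling model concrete, the following is a minimal Python sketch of how one sample could be generated: draw a source string from D, then pass it through an insertion-deletion channel. The function names (`trace`, `sample`), the parameters `delete_prob` and `insert_prob`, and the specific channel (a geometric number of uniform random bits inserted before each coordinate, followed by an independent deletion of that coordinate) are assumptions made for illustration; the abstract does not fix the channel's details. The sketch covers only sample generation, not the recovery algorithm.

```python
import random


def trace(x, delete_prob, insert_prob, rng):
    """Return one trace of the bit list x.

    Assumed channel (illustrative only): before each coordinate, insert a
    geometric number of uniformly random bits, each with continuation
    probability insert_prob (< 1); then delete the coordinate itself
    independently with probability delete_prob.
    """
    out = []
    for bit in x:
        # Insert random bits ahead of this coordinate.
        while rng.random() < insert_prob:
            out.append(rng.randrange(2))
        # Keep the coordinate unless the channel deletes it.
        if rng.random() >= delete_prob:
            out.append(bit)
    return out


def sample(strings, weights, delete_prob, insert_prob, rng):
    """Draw a source string according to D (given by weights), return its trace."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 16, 3
    # Average-case setting: the support strings are uniformly random.
    strings = [[rng.randrange(2) for _ in range(n)] for _ in range(s)]
    weights = [0.5, 0.3, 0.2]  # hypothetical distribution D over the support
    for _ in range(5):
        print("".join(map(str, sample(strings, weights, 0.1, 0.1, rng))))
```

A learner in this model sees only the printed traces, without knowing which source string produced each one, and must estimate both the support and the weights.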
