Efficient average-case population recovery in the presence of insertions and deletions
Several recent works have considered the trace reconstruction problem, in which an unknown source string x ∈ {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [DOS17, NazarovPeres17], highly efficient algorithms are known [PZ17, HPP18] for the average-case version, in which x is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, there is an unknown distribution D over s unknown source strings x^1, ..., x^s ∈ {0,1}^n, and each sample is independently generated by drawing some x^i from D and returning an independent trace of x^i. Building on [PZ17] and [HPP18], we give an efficient algorithm for this problem. For any support size s ≤ exp(Θ(n^{1/3})), for a 1 - o(1) fraction of all s-element support sets {x^1, ..., x^s} ⊂ {0,1}^n, for every distribution D supported on {x^1, ..., x^s}, our algorithm efficiently recovers D up to total variation distance ε with high probability, given access to independent traces of independent draws from D. The algorithm runs in time poly(n, s, 1/ε) and its sample complexity is poly(s, 1/ε, exp(log^{1/3} n)). This polynomial dependence on the support size s is in sharp contrast with the worst-case version (when x^1, ..., x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [BCFSS19] is doubly exponential in s.
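To make the sampling model concrete, the following is a minimal Python sketch of how one sample might be generated: a source string is drawn from D and passed through an insertion-deletion channel. The specific channel used here (a geometric number of uniform random bits inserted before each coordinate, followed by an independent deletion of that coordinate) is an assumption chosen for illustration, as are the names `trace`, `sample_population_trace`, `tv_distance`, `delete_prob`, and `insert_prob`; the abstract does not fix these details. The sketch covers only the sample-generating process and the error measure, not the recovery algorithm.

```python
import random


def trace(x, delete_prob, insert_prob, rng):
    """Return one trace of the bit list x under an assumed insertion-deletion channel.

    Assumption: before each coordinate, uniform random bits are inserted while
    a coin with bias insert_prob keeps coming up heads (insert_prob must be < 1
    so the loop terminates); the coordinate is then deleted with probability
    delete_prob, independently of everything else.
    """
    out = []
    for bit in x:
        while rng.random() < insert_prob:
            out.append(rng.randint(0, 1))  # inserted uniform random bit
        if rng.random() >= delete_prob:
            out.append(bit)  # coordinate survives deletion
    return out


def sample_population_trace(strings, weights, delete_prob, insert_prob, rng):
    """Draw x^i from the distribution D given by weights, then return a trace of it."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob, rng)


def tv_distance(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 16, 3
    # Average-case setting: the support strings are uniformly random.
    support = [[rng.randint(0, 1) for _ in range(n)] for _ in range(s)]
    weights = [0.5, 0.3, 0.2]
    for _ in range(3):
        print(sample_population_trace(support, weights, 0.1, 0.1, rng))
```

A recovery algorithm in this setting sees only the output of repeated calls like `sample_population_trace`, and its estimate of the weights would be judged by `tv_distance` against the true ones.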