Population Recovery from the Deletion Channel: Nearly Matching Trace Reconstruction Bounds

04/14/2020
by   Shyam Narayanan, et al.
0

The population recovery problem asks one to recover an unknown distribution over n-bit strings given query access to independent noisy samples of strings drawn from the distribution. Recently, Ban et. al. [BCF+19] studied the problem where the unknown distribution over n-bit strings is known to be ℓ-sparse for some fixed ℓ, and the noise is induced through the deletion channel. The deletion channel is a noise model where each bit of the string is independently deleted with some fixed probability, and the retained bits are concatenated. We note that if ℓ = 1, i.e., we are trying to learn a single string, learning the distribution is equivalent to the famous trace reconstruction problem. The best known algorithms for trace reconstruction require (O(n^1/3)) samples. For population recovery under the deletion channel, Ban et. al. provided an algorithm that could learn ℓ-sparse distributions over strings using (n^1/2· (log n)^O(ℓ)) samples. In this work, we provide an algorithm that learns the distribution using only (Õ(n^1/3) ·ℓ^2) samples, by developing a higher-moment analog of the algorithms of [DOS17, NP17]. We also give the first algorithm with a runtime subexponential in n, which solves population recovery in (Õ(n^1/3) ·ℓ^3) samples and time. Notably, our dependence on n nearly matches the known upper bound when ℓ = 1, and we reduce the dependence on ℓ from doubly to nearly singly exponential. Therefore, we are able to learn the mixture even for much larger values of ℓ. For instance, Ban et. al.'s algorithm can only learn a mixture of O(log n/loglog n) strings with a subexponential number of queries, whereas we are able to learn a mixture of up to n^o(1) strings in subexponential queries and time.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/11/2019

Beyond trace reconstruction: Population recovery from the deletion channel

Population recovery is the problem of learning an unknown distribution o...
research
01/15/2018

Subpolynomial trace reconstruction for random strings and arbitrary deletion probability

The deletion-insertion channel takes as input a bit string x∈{0,1}^n, a...
research
12/04/2020

Polynomial-time trace reconstruction in the low deletion rate regime

In the trace reconstruction problem, an unknown source string x ∈{0,1}^n...
research
06/29/2020

Recovery of Sparse Signals from a Mixture of Linear Samples

Mixture of linear regressions is a popular learning theoretic model that...
research
09/02/2020

Circular Trace Reconstruction

Trace Reconstruction is the problem of learning an unknown string x from...
research
11/03/2018

Learning sparse mixtures of rankings from noisy information

We study the problem of learning an unknown mixture of k rankings over n...
research
03/04/2021

Remember What You Want to Forget: Algorithms for Machine Unlearning

We study the problem of forgetting datapoints from a learnt model. In th...

Please sign up or login with your details

Forgot password? Click here to reset