Testing Convex Truncation

05/04/2023
by   Anindya De, et al.
0

We study the basic statistical problem of testing whether normally distributed n-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set S ⊆ℝ^n. As our main algorithmic results, (1) We give a computationally efficient O(n)-sample algorithm that can distinguish the standard normal distribution N(0,I_n) from N(0,I_n) conditioned on an unknown and arbitrary convex set S. (2) We give a different computationally efficient O(n)-sample algorithm that can distinguish N(0,I_n) from N(0,I_n) conditioned on an unknown and arbitrary mixture of symmetric convex sets. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially n^√(n) samples. An easy argument shows that no finite number of samples suffices to distinguish N(0,I_n) from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish N(0,I_n) from N(0,I_n) conditioned on an unknown symmetric convex set must use Ω(n) samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/02/2019

Efficient Truncated Statistics with Unknown Truncation

We study the problem of estimating the parameters of a Gaussian distribu...
research
07/14/2022

Near-Optimal Bounds for Testing Histogram Distributions

We investigate the problem of testing whether a discrete probability dis...
research
03/03/2022

Private High-Dimensional Hypothesis Testing

We provide improved differentially private algorithms for identity testi...
research
08/27/2023

Testing Junta Truncation

We consider the basic statistical problem of detecting truncation of the...
research
10/25/2022

Gaussian Mean Testing Made Simple

We study the following fundamental hypothesis testing problem, which we ...
research
09/11/2018

Efficient Statistics, in High Dimensions, from Truncated Samples

We provide an efficient algorithm for the classical problem, going back ...
research
05/24/2016

Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models

We consider the problem of learning the underlying graph of an unknown I...

Please sign up or login with your details

Forgot password? Click here to reset