Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure
Let 𝐗 = (X_i)_{1≤i≤n} be an i.i.d. sample of square-integrable variables in ℝ^d, with common expectation μ and covariance matrix Σ, both unknown. We consider the problem of testing whether μ is η-close to zero, i.e. ‖μ‖ ≤ η against ‖μ‖ ≥ η + δ; we also tackle the more general two-sample mean closeness (also known as relevant difference) testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance δ such that both the Type I and Type II errors can be controlled at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of ‖μ‖^2 used as a test statistic, and second for estimators of the operator and Frobenius norms of Σ, which enter the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is paid to the dependence on the pseudo-dimension d_* of the distribution, defined as d_* := ‖Σ‖_2^2/‖Σ‖_∞^2. In particular, for η = 0, the minimum separation distance is Θ(d_*^{1/4} √(‖Σ‖_∞/n)), in contrast with the minimax estimation distance for μ, which is Θ(d_e^{1/2} √(‖Σ‖_∞/n)) (where d_e := ‖Σ‖_1/‖Σ‖_∞). This generalizes a phenomenon spelled out in particular by Baraud (2002).
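To make the two dimension quantities concrete, the following is a minimal Python sketch, not the paper's procedure. It assumes that ‖Σ‖_∞, ‖Σ‖_2 and ‖Σ‖_1 denote the operator, Frobenius and trace norms of Σ, and that Σ is known (in the paper it is unknown and its norms are estimated). The function names are hypothetical, the U-statistic is one standard unbiased estimator of ‖μ‖^2 chosen for illustration, and the rates are computed only up to the constants hidden in Θ(·).

```python
import numpy as np


def effective_dimensions(sigma):
    """Return (d_star, d_e) for a symmetric positive semi-definite matrix.

    Assumed conventions: d_star = ||S||_2^2 / ||S||_inf^2 (Frobenius over
    operator norm, squared) and d_e = ||S||_1 / ||S||_inf (trace over
    operator norm).
    """
    eigvals = np.linalg.eigvalsh(sigma)   # eigenvalues of a symmetric matrix
    op_norm = eigvals.max()               # operator norm
    fro_sq = np.sum(eigvals ** 2)         # squared Frobenius norm
    trace = eigvals.sum()                 # trace norm for a PSD matrix
    return fro_sq / op_norm ** 2, trace / op_norm


def sq_norm_mean_ustat(x):
    """Unbiased U-statistic for ||mu||^2 from an (n, d) sample array.

    Averages <X_i, X_j> over all ordered pairs i != j. This is an
    illustrative choice, not necessarily the estimator used in the paper.
    """
    n = x.shape[0]
    s = x.sum(axis=0)
    return (s @ s - np.sum(x * x)) / (n * (n - 1))


def rates(sigma, n):
    """Testing and estimation rates for eta = 0, up to unspecified constants."""
    d_star, d_e = effective_dimensions(sigma)
    op_norm = np.linalg.eigvalsh(sigma).max()
    testing = d_star ** 0.25 * np.sqrt(op_norm / n)
    estimation = d_e ** 0.5 * np.sqrt(op_norm / n)
    return testing, estimation
```

For an isotropic covariance Σ = I_d both quantities equal d, so the testing rate scales as d^{1/4}/√n while the estimation rate scales as d^{1/2}/√n; for a spiked covariance both dimensions can be much smaller than d.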