The All-or-Nothing Phenomenon in Sparse Linear Regression

03/12/2019
by Galen Reeves, et al.

We study the problem of recovering a hidden binary k-sparse p-dimensional vector β from n noisy linear observations Y = Xβ + W, where the X_ij are i.i.d. N(0,1) and the W_i are i.i.d. N(0,σ^2). A closely related hypothesis testing problem is to distinguish the pair (X,Y) generated from this structured model from a corresponding null model in which (X,Y) consists of purely independent Gaussian entries. In the low-sparsity regime k = o(p) and high signal-to-noise-ratio regime k/σ^2 = Ω(1), we establish an 'All-or-Nothing' information-theoretic phase transition at a critical sample size n^* = 2k log(p/k) / log(1 + k/σ^2), resolving a conjecture of Gamarnik and Zadik. Specifically, we show that if lim_{p→∞} n/n^* > 1, then the maximum likelihood estimator almost perfectly recovers the hidden vector with high probability, and moreover the true hypothesis can be detected with vanishing error probability. Conversely, if lim_{p→∞} n/n^* < 1, then it becomes information-theoretically impossible even to recover an arbitrarily small but fixed fraction of the support of the hidden vector, or to test the hypotheses strictly better than random guessing. Our proof of the impossibility result builds on two key techniques, which may be of independent interest. First, we use a conditional second moment method to upper bound the Kullback-Leibler (KL) divergence between the structured and the null model. Second, inspired by the celebrated area theorem, we establish a lower bound on the minimum mean squared error of estimating the hidden vector in terms of the KL divergence between the two models.
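To make the setting concrete, the following Python sketch (our own toy illustration, not code from the paper) samples (X, Y) from the structured model, evaluates the critical sample size n^* = 2k log(p/k)/log(1+k/σ^2), and runs the exhaustive maximum-likelihood estimator over all binary k-sparse vectors on a problem small enough for brute-force search. The dimensions p, k, σ and the choice n ≈ 1.5 n^* are arbitrary assumptions made for the example, and at such small sizes the asymptotic threshold is only indicative.

import itertools

import numpy as np

rng = np.random.default_rng(seed=0)

# Toy problem sizes (illustrative choices, not from the paper); the paper's
# asymptotic regime is k = o(p) with k/sigma^2 = Omega(1).
p, k, sigma = 30, 3, 1.0

# Critical sample size n* = 2 k log(p/k) / log(1 + k/sigma^2).
n_star = 2 * k * np.log(p / k) / np.log(1 + k / sigma**2)
n = int(np.ceil(1.5 * n_star))  # sample somewhat above the predicted threshold

# Structured model: X_ij ~ N(0,1), beta binary and k-sparse, W_i ~ N(0, sigma^2).
support = set(rng.choice(p, size=k, replace=False).tolist())
beta = np.zeros(p)
beta[list(support)] = 1.0
X = rng.standard_normal((n, p))
Y = X @ beta + sigma * rng.standard_normal(n)

# Exhaustive maximum-likelihood estimate: among all binary k-sparse vectors,
# pick the support minimizing the residual sum of squares ||Y - X beta_S||^2.
best_support, best_rss = None, np.inf
for S in itertools.combinations(range(p), k):
    rss = float(np.sum((Y - X[:, list(S)].sum(axis=1)) ** 2))
    if rss < best_rss:
        best_support, best_rss = set(S), rss

overlap = len(best_support & support) / k
print(f"n* ~= {n_star:.1f}, n = {n}, recovered fraction of support = {overlap:.2f}")

Rerunning the same sketch with n well below n^* should, by the impossibility half of the theorem, leave the recovered fraction of the support near the trivial k/p level, although the transition is sharp only in the asymptotic regime the paper studies.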

Related research

01/16/2017

High-Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition

We consider a sparse linear regression model Y=Xβ^*+W where X has a Gaus...

05/04/2018

Global testing under the sparse alternatives for single index models

For the single index model y=f(β^τx,ϵ) with Gaussian design, and β is a...

01/01/2021

Sub-Gaussian Error Bounds for Hypothesis Testing

We interpret likelihood-based test functions from a geometric perspectiv...

05/11/2022

Second-Order Asymptotics of Hoeffding-Like Hypothesis Tests

We consider a binary statistical hypothesis testing problem, where n ind...

01/10/2015

On model misspecification and KL separation for Gaussian graphical models

We establish bounds on the KL divergence between two multivariate Gaussi...

02/07/2023

Phase Transitions in the Detection of Correlated Databases

We study the problem of detecting the correlation between two Gaussian d...

11/18/2018

Information Theoretic Bound on Optimal Worst-case Error in Binary Mixture Identification

Identification of latent binary sequences from a pool of noisy observati...
