High-Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition
We consider a sparse linear regression model Y = Xβ^* + W, where X has i.i.d. Gaussian entries, W is a noise vector with mean-zero Gaussian entries, and β^* is a binary vector with support size (sparsity) k. Using a novel conditional second moment method, we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error min_β ‖Y − Xβ‖_2, where the minimization is over all k-sparse binary vectors β. The approximation reveals interesting structural properties of the underlying regression problem. In particular:

a) We establish that n^* = 2k log p / log(2k/σ^2 + 1) is a phase transition point with the following "all-or-nothing" property: when n exceeds n^*, (2k)^{-1} ‖β_2 − β^*‖_0 ≈ 0, and when n is below n^*, (2k)^{-1} ‖β_2 − β^*‖_0 ≈ 1, where β_2 is the optimal solution achieving the smallest squared error. With this we prove that n^* is the asymptotic threshold for recovering β^* information-theoretically.

b) We compute the squared error for an intermediate problem min_β ‖Y − Xβ‖_2, where the minimization is restricted to vectors β with ‖β − β^*‖_0 = 2kζ, for ζ ∈ [0, 1]. We show that a lower bound part Γ(ζ) of the estimate, which corresponds to the estimate based on the first moment method, undergoes a phase transition at three different thresholds: first at n_{inf,1} = σ^2 log p, which is the information-theoretic bound for recovering β^* when k = 1 and σ is large, then at n^*, and finally at n_{LASSO/CS}.

c) We establish a certain Overlap Gap Property (OGP) on the space of all binary vectors β when n < ck log p for a sufficiently small constant c. We conjecture that the OGP is the source of the algorithmic hardness of solving the minimization problem min_β ‖Y − Xβ‖_2 in the regime n < n_{LASSO/CS}.
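To make the scaling of these thresholds concrete, here is a minimal Python sketch (not from the paper; the parameter values and function names are illustrative) that evaluates n^* and n_{inf,1} from the abstract's formulas and compares them with the k log p order appearing in part c).

```python
import math

def n_star(k, p, sigma):
    """All-or-nothing threshold n* = 2k log p / log(2k/sigma^2 + 1),
    as stated in the abstract (illustrative helper, not from the paper)."""
    return 2 * k * math.log(p) / math.log(2 * k / sigma**2 + 1)

def n_inf_1(p, sigma):
    """Information-theoretic bound sigma^2 log p for k = 1 and large sigma."""
    return sigma**2 * math.log(p)

# Illustrative parameters (chosen arbitrarily, not taken from the paper)
k, p, sigma = 10, 10**6, 1.0
print(f"n*       = {n_star(k, p, sigma):8.1f}")  # all-or-nothing threshold
print(f"n_inf,1  = {n_inf_1(p, sigma):8.1f}")    # k = 1 information-theoretic bound
print(f"k log p  = {k * math.log(p):8.1f}")      # order of the ck log p regime in part c)
```

For these parameters n^* is a small constant multiple of k log p, illustrating the gap between the information-theoretic threshold and the ck log p regime where the OGP is established.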