
Testing noisy linear functions for sparsity

by Xue Chen, et al.

We consider the following basic inference problem: there is an unknown high-dimensional vector w ∈ R^n, and an algorithm is given access to labeled pairs (x, y) where x ∈ R^n is a measurement and y = w · x + noise. What is the complexity of deciding whether the target vector w is (approximately) k-sparse? The recovery analogue of this problem (given the promise that w is sparse, find or approximate the vector w) is the famous sparse recovery problem, with a rich body of work in signal processing, statistics, and computer science. We study the decision version of this problem (i.e., deciding whether the unknown w is k-sparse) from the vantage point of property testing. Our focus is on answering the following high-level question: when is it possible to efficiently test whether the unknown target vector w is sparse versus far-from-sparse using a number of samples which is completely independent of the dimension n? We consider the natural setting in which x is drawn from an i.i.d. product distribution D over R^n and the noise process is independent of the input x. As our main result, we give a general algorithm which solves the above-described testing problem using a number of samples which is completely independent of the ambient dimension n, as long as D is not a Gaussian. In fact, our algorithm is fully noise tolerant, in the sense that for an arbitrary w, it approximately computes the distance of w to the closest k-sparse vector. To complement this algorithmic result, we show that weakening any of our conditions makes it information-theoretically impossible for any algorithm to solve the testing problem with fewer than essentially log n samples.
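The setup in the abstract can be made concrete with a small sketch. It is not the paper's testing algorithm. It only simulates the sampling model y = w · x + noise and computes the quantity the tester is said to approximate, the distance from w to the closest k-sparse vector. Several choices here are assumptions for illustration and are not taken from the abstract: distance is measured in the ℓ2 norm, x has i.i.d. uniform ±1 coordinates (one example of a non-Gaussian product distribution), the noise is Gaussian, and the function names `distance_to_k_sparse` and `sample_pairs` are hypothetical.

```python
import numpy as np


def distance_to_k_sparse(w, k):
    """l2 distance from w to the nearest k-sparse vector (l2 is an assumed metric)."""
    w = np.asarray(w, dtype=float)
    if k >= w.size:
        return 0.0
    # The nearest k-sparse vector keeps the k largest-magnitude entries of w,
    # so the distance is the l2 norm of the remaining n - k entries.
    magnitudes = np.sort(np.abs(w))   # ascending order
    tail = magnitudes[: w.size - k]   # the n - k smallest magnitudes
    return float(np.sqrt(np.sum(tail ** 2)))


def sample_pairs(w, m, noise_std=0.1, seed=0):
    """Draw m labeled pairs (x, y) with y = w . x + noise.

    Assumptions for illustration: x has i.i.d. uniform +/-1 coordinates,
    and the noise is Gaussian and independent of x.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    x = rng.choice([-1.0, 1.0], size=(m, w.size))
    noise = rng.normal(0.0, noise_std, size=m)
    y = x @ w + noise
    return x, y
```

For example, `distance_to_k_sparse([3, 0, 4, 0], 1)` keeps the entry 4 and reports the norm of what is left. A tester in the sense of the abstract would have to estimate this value from the output of `sample_pairs` alone, without ever seeing w directly.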




Robust testing of low-dimensional functions

A natural problem in high-dimensional inference is to decide if a classi...

The algorithm for the recovery of integer vector via linear measurements

In this paper we continue the studies on the integer sparse recovery pro...

Efficient Truncated Statistics with Unknown Truncation

We study the problem of estimating the parameters of a Gaussian distribu...

Robust Testing in High-Dimensional Sparse Models

We consider the problem of robustly testing the norm of a high-dimension...

VC Dimension and Distribution-Free Sample-Based Testing

We consider the problem of determining which classes of functions can be...

Sparse Signal Processing with Linear and Nonlinear Observations: A Unified Shannon-Theoretic Approach

We derive fundamental sample complexity bounds for recovering sparse and...

Testing of Index-Invariant Properties in the Huge Object Model

The study of distribution testing has become ubiquitous in the area of p...