Compressed sensing, also known as sparse recovery, is a central object of study in data stream algorithms, with applications to monitoring network traffic, analysis of genetic data [23, 13], and many other domains. The problem can be stated as recovering an underlying signal $x \in \mathbb{R}^n$ from linear measurements, with the $\ell_p/\ell_q$-approximate recovery guarantee being
\begin{equation}
\|x - \hat{x}\|_p \le (1+\epsilon) \min_{k\text{-sparse } x'} \|x - x'\|_q,
\end{equation}
where the measurement vectors are drawn from a distribution and $\hat{x}$ is the output of the scheme. The focus of this work is on adaptive compressed sensing, in which the measurements are chosen in rounds, and the choice of measurement in each round depends on the outcomes of the measurements in previous rounds.
Adaptivity has been studied in theoretical computer science, machine learning, image processing, and many other domains [11, 21, 2]. In theoretical computer science and machine learning, adaptive compressed sensing serves as an important tool for obtaining algorithms for active learning that are sublinear in both time and space [11, 6, 21, 2]. In image processing, the study of adaptive compressed sensing has led to compressed acquisition of sequential images, with applications in celestial navigation and attitude determination.
Despite a large body of work on adaptive compressed sensing, the power of adaptivity remains a long-standing open problem. Indyk, Price, and Woodruff were the first to show that, without any assumptions on the signal $x$, adaptivity yields an asymptotically smaller number of measurements than what can be achieved in the non-adaptive setting. Specifically, for the $\ell_2/\ell_2$ guarantee, they show that adaptivity reduces the dependence on $n$ in the number of measurements from logarithmic to doubly logarithmic, whereas any non-adaptive scheme is known to require a number of measurements logarithmic in $n/k$ (Theorem 4.4 of , see also ). Improving the sample complexity as much as possible is desirable, as it may correspond to, e.g., the amount of radiation a hospital patient is exposed to, or the amount of time a patient must be present for diagnosis.
The $\ell_1/\ell_1$ problem was studied in , for which, perhaps surprisingly, a better dependence on $\epsilon$ was obtained than is possible for $\ell_2/\ell_2$ schemes. Still, the power of adaptivity for the $\ell_1/\ell_1$ recovery problem over its non-adaptive counterpart has remained unclear. A non-adaptive upper bound was shown in , while an adaptive lower bound was shown in . Recently, several works [24, 18] have looked at other values of $p$ and $q$, even values which do not correspond to normed spaces. The power of adaptivity for such error measures is also unknown.
1.1 Our Results
Our work studies the problem of adaptive compressed sensing and provides affirmative answers to the above-mentioned open questions. We improve over the best known results for the $\ell_2/\ell_2$ problem, and then provide novel adaptive compressed sensing guarantees for $\ell_p/\ell_q$ recovery for a wide range of $p$ and $q$. See Table 1 for a comparison of results.
Table 1: Guarantees, upper bounds, number of rounds, and lower bounds.
For the $\ell_2/\ell_2$ problem, we design an adaptive algorithm which requires fewer measurements than previously known. More generally, we study the $\ell_p/\ell_p$ problem for general $p$. One of our main theorems is the following.
[$\ell_p/\ell_p$ Recovery Upper Bound] Let $x \in \mathbb{R}^n$, and fix $k$, $\epsilon$, and $p$. There exists a randomized algorithm that performs adaptive linear measurements on $x$ in rounds, and with constant probability returns a vector $\hat{x}$ such that $\|x - \hat{x}\|_p \le (1+\epsilon) \min_{k\text{-sparse } x'} \|x - x'\|_p$.
Theorem 1 improves the previous sample complexity upper bound for the case $p = q = 2$. Compared with the best non-adaptive $\ell_2/\ell_2$-approximate upper bound, we show that adaptivity exponentially improves the sample complexity in its dependence on $n$ (from logarithmic to doubly logarithmic) over non-adaptive algorithms, while retaining the improved dependence on $\epsilon$ of non-adaptive algorithms. Furthermore, Theorem 1 extends the working range of adaptive compressed sensing from $p = 2$ to general values of $p$.
We also state a complementary lower bound to formalize the hardness of the above problem. [$\ell_p/\ell_p$ Recovery Lower Bound] Fix $p$; any $\ell_p/\ell_p$-approximate recovery scheme with sufficiently small constant failure probability must make a certain minimum number of measurements. This lower bound shows that our upper bound in Theorem 1 is tight up to a small multiplicative factor.
We also study the case $p \neq q$. In particular, we focus on the setting of the following theorem.
[Recovery Upper Bound] Let $x \in \mathbb{R}^n$. There exists a randomized algorithm that performs adaptive linear measurements on $x$ in rounds, and with constant probability returns a vector $\hat{x}$ whose error is bounded in terms of $\|x_{-k}\|$, where $x_{-k}$ is the vector $x$ with its $k$ largest coordinates (in absolute value) zeroed out. We also provide an improved result for such approximate recovery problems.
[Sparse Recovery Upper Bounds] Let $x \in \mathbb{R}^n$. There exists a randomized algorithm that
uses fewer linear measurements on $x$, in a larger number of rounds;
uses more linear measurements on $x$, but fewer rounds;
and with constant probability returns a vector $\hat{x}$ satisfying the recovery guarantee. Previously, the best known tradeoff required both more samples and more rounds for this approximation problem. Our results improve both the sample complexity (the first result) and the number of rounds (the second result). We summarize our results in Table 1.
1.2 Our Techniques
Sparse Recovery. Our sparse recovery scheme hashes every coordinate to one of a suitable number of buckets, and then proceeds by finding all buckets that have sufficiently large mass. Clearly, there are $O(k)$ such buckets, and since all heavy coordinates are isolated by the hashing, we can find a set of buckets that together contain all heavy coordinates, with these heavy coordinates isolated from each other. Then, we run a $1$-sparse recovery routine in each such bucket in parallel in order to find all the heavy coordinates. However, since we have many buckets, we cannot afford to take a union bound over all the $1$-sparse recovery routines called. Instead, we show that most buckets succeed, and hence we can subtract the returned elements and then run a standard CountSketch algorithm to recover everything else. This algorithm achieves an optimal number of rounds and measurements, while succeeding with the desired probability.
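As a point of reference for the CountSketch cleanup step mentioned above, the following is a minimal non-adaptive CountSketch sketch in Python; the function name and parameters are ours, for illustration only. Each repetition hashes coordinates into buckets with random signs, and each coordinate is estimated by the median of its signed bucket sums.

```python
import random
from statistics import median

def countsketch_estimate(x, num_buckets, reps, seed=0):
    """Estimate every coordinate of x from reps * num_buckets linear
    measurements: each bucket stores a random-sign sum of the
    coordinates hashed into it, and each coordinate is estimated by
    the median of its signed bucket contents across repetitions."""
    rng = random.Random(seed)
    n = len(x)
    estimates = [[] for _ in range(n)]
    for _ in range(reps):
        h = [rng.randrange(num_buckets) for _ in range(n)]  # bucket hash
        s = [rng.choice((-1, 1)) for _ in range(n)]         # sign hash
        buckets = [0.0] * num_buckets
        for i in range(n):
            buckets[h[i]] += s[i] * x[i]   # one linear measurement per bucket
        for i in range(n):
            estimates[i].append(s[i] * buckets[h[i]])
    return [median(e) for e in estimates]
```

For a signal with a single heavy coordinate and zero noise, the heavy coordinate's estimate is exact, since colliding zero coordinates contribute nothing to its bucket.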
We proceed by showing an algorithm for sparse recovery with a reduced number of measurements and rounds. This will be important for our more general scheme, saving a factor from the number of rounds and achieving optimality with respect to this quantity. For this scheme, we utilize the scheme we just developed, observing that for small sparsity its measurement complexity is already low. Our idea is then to exploit the fact that we can reduce the problem to smaller instances with logarithmic sparsity. The algorithm hashes the coordinates to buckets, and in each bucket runs the scheme above with logarithmic sparsity. Now, in each bucket there exist at most logarithmically many heavy elements, and the noise from non-heavy elements is “low” enough. The algorithm in each bucket succeeds with good probability; this fact allows us to argue that all but a small fraction of the buckets will succeed, and hence we can recover all but a small fraction of the heavy coordinates. The next step is to subtract these recovered coordinates from our initial vector, and then run a standard algorithm with decreased sparsity.
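The subtract-and-recurse structure described above can be sketched as follows. This is a toy illustration with our own naming, in which an exact top-$s$ oracle stands in for the per-bucket recovery subroutine; the point is only the peeling loop, which subtracts recovered coordinates and decreases the target sparsity.

```python
def topk_oracle(residual, s):
    """Stand-in for a sparse-recovery subroutine: returns the s
    largest-magnitude coordinates of the residual exactly."""
    idx = sorted(range(len(residual)), key=lambda i: -abs(residual[i]))[:s]
    return {i: residual[i] for i in idx}

def peel(x, k, recover_round):
    """Iterative peeling: each round recovers some heavy coordinates,
    subtracts them from the residual, and halves the target sparsity."""
    residual = list(x)
    found = {}
    sparsity = k
    while sparsity >= 1:
        for i, v in recover_round(residual, sparsity).items():
            found[i] = found.get(i, 0.0) + v  # accumulate recovered mass
            residual[i] -= v                  # subtract from residual
        sparsity //= 2
    return found
```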
Sparse Recovery. Our scheme is based on carefully invoking several schemes with different parameters. We focus our discussion on a representative setting of $p$ and $q$, then mention extensions to the general case. A main difficulty of adapting the scheme of  is that it relies upon a stronger recovery primitive, and all known schemes for that primitive, including ours, require too many measurements for our overall goal.
A key insight in  is that since the output does not need to be exactly $k$-sparse, one can compensate for mistakes in approximating the top $k$ entries of $x$ by accurately outputting enough smaller entries. For example, consider two possible signals in which the same value occurs many times. One can show, using known lower bound techniques, that distinguishing the two signals requires many measurements. Moreover, the two signals are close to each other, and any exactly $k$-sparse approximation to either one must therefore distinguish them, and so also requires many measurements. An important insight, though, is that if one does not require the output signal to be $k$-sparse, then one can output the same approximation in both cases, without actually distinguishing which case one is in!
As another example, suppose that the signal has one large coordinate and many much smaller ones. In this case, one can show that distinguishing the relevant pair of signals requires many measurements and, as before, outputting an exactly $k$-sparse signal providing the desired approximation requires many measurements. Here, if one outputs a signal that poorly approximates the large coordinate, one cannot simply find a single other coordinate to “make up” for the poor approximation on the first coordinate. However, if one were to output many coordinates of small value, then the “mass” lost by poorly approximating the first coordinate would be compensated for by the mass output on these remaining coordinates. It is not clear how to find such remaining coordinates, though, since they are much smaller; however, if one randomly subsamples a small fraction of the coordinates, then a proportional number of the small-valued coordinates survive, and these can all be found with a number of measurements proportional to the reduced sparsity. Balancing the two measurement complexities gives roughly the optimal dependence on the parameters in the number of measurements.
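The subsampling step above relies on the survivor count concentrating around its mean. The following toy Python experiment (our own construction, not from the paper) checks that when each of `support_size` equal-valued coordinates is kept independently with probability `p`, the empirical average number of survivors is close to `support_size * p`:

```python
import random

def subsample_survivors(support_size, p, trials, seed=1):
    """Average, over `trials` runs, of how many of `support_size`
    coordinates survive when each is kept independently with
    probability p; the mean is support_size * p, and Chernoff bounds
    say individual runs concentrate around it."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        counts.append(sum(rng.random() < p for _ in range(support_size)))
    return sum(counts) / trials
```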
To extend this to the adaptive case, a recurring theme of the above examples is that the top $k$ coordinates, while they need to be found, do not need to be approximated very accurately. They do, however, need to be found: consider, e.g., the case where the top $k$ entries of the signal are equal to an arbitrarily large value and the remaining entries are much smaller. We accomplish this by running a recovery scheme with one setting of the sparsity and accuracy parameters, as well as a second scheme with another setting of these parameters (up to logarithmic factors). Another theme is that the mass in the smaller coordinates, which we find to compensate for our poor approximation of the larger coordinates, also does not need to be approximated very well; we find this mass by subsampling many times and running a recovery scheme with yet another setting of the parameters. This technique is surprisingly general, and does not require the underlying error measure we are approximating to be a norm. It only uses scale-invariance and how the error measure's rate of growth compares to that of the $\ell_2$-norm.
Sparse Recovery. Our last algorithm, which concerns sparse recovery, achieves an improved number of measurements by showing that the two dominant factors in previous bounds need not multiply each other. The key insight lies in first solving the $1$-sparse recovery task with few measurements, and then extending this to the general case. To achieve this, we hash the coordinates to buckets, then solve a recovery problem with constant sparsity on a new vector, whose $j$-th coordinate equals the norm of the $j$-th bucket; this step requires only a small number of measurements. We can then run standard $1$-sparse recovery in each of the buckets returned. Extending this idea to the general case follows by plugging this subroutine into the iterative algorithm of , while ensuring that subsampling does not increase the number of measurements. This means that we have to subsample at a slower rate. The guarantee from our $1$-sparse recovery algorithm fortunately allows this slower subsampling to go through and gives the desired result.
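A $1$-sparse recovery primitive of the kind referred to above can be realized non-adaptively with $O(\log n)$ bit-test measurements. Below is a minimal Python sketch (names are ours, and this is the textbook bit-testing construction rather than the paper's specific routine) that recovers the position and value of an exactly $1$-sparse vector:

```python
def one_sparse_recover(x):
    """Recover the index and value of an exactly 1-sparse vector x
    from O(log n) non-adaptive linear measurements (bit tests)."""
    n = len(x)
    total = sum(x)                  # measurement <x, all-ones vector>
    if total == 0:
        return None                 # the zero vector
    idx = 0
    bit = 0
    while (1 << bit) < n:
        # Measurement against the indicator of indices whose bit-th bit is set.
        m = sum(x[i] for i in range(n) if (i >> bit) & 1)
        if abs(m) > abs(total) / 2:  # the single spike has this bit set
            idx |= 1 << bit
        bit += 1
    return idx, total
```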
Notation: For a vector $x \in \mathbb{R}^n$, we define $H(x, k)$ to be the set of its $k$ largest coordinates in absolute value. For a set $S$, denote by $x_S$ the vector obtained from $x$ by zeroing out every coordinate outside of $S$. We also define $x_{-k} = x_{[n] \setminus H(x, k)}$, where $[n]$ represents the set $\{1, 2, \dots, n\}$. For a set $S$, let $|S|$ denote its cardinality.
Due to space constraints, we defer the proof of Theorem 1 to the appendix.
2 Adaptive Recovery
Fix a vector $x \in \mathbb{R}^n$ and parameters $k$ and $\epsilon$. We will invoke the following oracle frequently throughout the paper.
Oracle 1.
The oracle is fed $(x, k, \epsilon)$ as input parameters, and outputs a set of coordinates of size $O(k)$ which corresponds to the support of a vector $\hat{x}$, where $\hat{x}$ can be any vector for which the recovery guarantee holds.
Suppose we subsample the coordinates of $x$, keeping each independently with some probability, and let $y$ be the subsampled vector formed from $x$. Then, with small failure probability,
Let $S$ be the set of coordinates in the subsample. By a Chernoff bound, $|S|$ is concentrated around its expectation, so the desired bound on $|S|$ holds with high probability. Next, notice that there are at least a prescribed number of elements in the subsample with large absolute value, so again by a Chernoff bound, the number of such surviving elements also concentrates. Conditioned on the latter event not happening, the claimed inequality follows. By a union bound, the conclusion holds with small failure probability. ∎
Let $\hat{x}$ be the output of the scheme on $x$ with the given parameters. Then, with small constant failure probability,
Notice that with small constant failure probability, the guarantee holds and we have
Let be such that , and define , . Then if we are done. Otherwise, let denote the size of , and define .
Since we have ∎
Fix $k$ and $\epsilon$. There exists a $(1+\epsilon)$-approximation algorithm that performs adaptive linear measurements in rounds and, with constant probability, finds a vector $\hat{x}$ such that
The algorithm is stated in Algorithm 1.
We first consider the difference .
Let be the smallest integer such that for any , .
Then for all , we have . Hence must contain at least of these indices; if not, the total squared loss is at least , a contradiction to . It follows that On the other hand, is at most , since by the guarantee
It follows that
Case 2. , and .
We claim that must contain at least a fraction of coordinates in ; if not, then the cost for missing at least a fraction of the -norm of will be at least , contradicting the guarantee. Since all coordinates ’s for have value at most , it follows that the -norm of coordinates corresponding to is at least Then
Case 3. , and .
With a slight abuse of notation, let $\hat{x}$ denote the output of the oracle with the stated parameters. Notice that there are at most a bounded number of non-zero elements in $\hat{x}$. By Lemma 2, and according to the above three cases, we conclude that, with bounded failure probability,
In order to convert the first term on the right-hand side of (3) into a term involving the tail norm (which is only a semi-norm for small exponents), we need the following inequalities: for every $x$ and every chunk size $k$, by splitting the coordinates into chunks of size $k$, we have
Define . This gives us that, for Therefore,
Let $y$ denote an independent subsample of $x$ with the given probability, and let $\hat{y}$ be the output of the algorithm with the given parameters. Notice that the adaptive guarantee holds for $y$. There are at least a prescribed number of elements in the relevant set, and every element in it has sufficiently large absolute value. Thus, with sufficiently small constant failure probability, there exists at least one element in the subsample with sufficiently large absolute value. On the other hand, by the two lemmas above,
with sufficiently small constant failure probability, given by the union bound. For the independent copies of subsamples, by a Chernoff bound, a constant fraction of them will have their largest absolute value in the relevant set, and (5) will also hold, with the overall failure probability again bounded via a union bound. Therefore, combining the above facts,
The total number of measurements will be at most
while, by a union bound, the total failure probability is bounded by a small constant, which completes the proof. ∎
3 Adaptive Sparse Recovery
In this section, we will prove Theorem 1. Our algorithm first approximates the norm of the tail of $x$. The goal is to compute a value which is not much larger than this tail norm, and also at least a constant fraction of it. This value will be used to filter out coordinates that are not large enough, while ensuring that heavy coordinates are included. We need the following lemma, which for example can be found in Section 4 of .
Using non-adaptive measurements, we can find, with constant probability, a value $V$ that approximates this tail norm up to absolute constant factors.
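A constant-factor estimate of a squared $\ell_2$ norm can be obtained non-adaptively from random-sign (AMS-style) linear measurements. The following Python sketch is our own illustration of that idea, not the paper's routine: it estimates $\|x\|_2^2$ by averaging squared sign-measurements (the tail-norm variant in the lemma additionally has to discount the heavy coordinates).

```python
import random

def estimate_sq_l2_norm(x, reps, seed=0):
    """Estimate ||x||_2^2 from `reps` random-sign linear measurements.
    For m = <s, x> with i.i.d. uniform signs s, E[m^2] = ||x||_2^2,
    so the average of m^2 over repetitions converges to the truth."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        s = [rng.choice((-1, 1)) for _ in x]
        m = sum(si * xi for si, xi in zip(s, x))  # one linear measurement
        vals.append(m * m)
    return sum(vals) / reps
```

A median-of-means aggregation over independent groups would boost this constant-probability guarantee to high probability, which is the standard amplification step.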
We use the aforementioned lemma with a small number of measurements to obtain such a value $V$ with constant probability. Now let $C$ be an absolute constant and let $h$ be a random hash function mapping coordinates to buckets. Then, with good probability, every pair of distinct heavy coordinates is hashed to different buckets. By running PartitionCountSketch, we get back a constant-factor estimate for every bucket. Let $c$ be an absolute constant to be chosen later. We set the two threshold parameters accordingly. We prove the following lemma.
Let $c$ be an absolute constant. With probability at least a fixed constant, the following holds.
Every such that there exists , will be present in .
For every , there exists exactly one coordinate with .
For every ,