# Recovering Graph-Structured Activations using Adaptive Compressive Measurements

We study the localization of a cluster of activated vertices in a graph, from adaptively designed compressive measurements. We propose a hierarchical partitioning of the graph that groups the activated vertices into few partitions, so that a top-down sensing procedure can identify these partitions, and hence the activations, using few measurements. By exploiting the cluster structure, we are able to provide localization guarantees at weaker signal to noise ratios than in the unstructured setting. We complement this performance guarantee with an information theoretic lower bound, providing a necessary signal-to-noise ratio for any algorithm to successfully localize the cluster. We verify our analysis with some simulations, demonstrating the practicality of our algorithm.

## 1 Introduction

We are interested in recovering the support of a sparse vector

observed through the noisy linear model:

$$y_i = a_i^T \mathbf{x} + \epsilon_i$$

Where and

. This support recovery problem is well known and fundamental to the theory of compressive sensing, which involves estimating a high-dimensional signal vector from few linear measurements [5]. Indeed if is a -sparse vector whose non-zero components are , it is now well known that one cannot identify these components if and one can if , provided that [14]. Indeed if is a -sparse vector whose non-zero components are , it is now well known that one can identify these components if and only if and [1, 14].

We build upon the classical results of compressive sensing by developing procedures that are adaptive and that exploit additional structure in the underlying signal. Adaptivity allows the procedure to focus measurements on activated components of the signal while structure can dramatically reduce the combinatorial search space of the problem. Combined, both ideas can lead to significant performance improvements over classical compressed sensing. This paper explores the role of adaptivity and structure in a very general support recovery problem.

Active learning and adaptivity are not new ideas to the signal processing community and a number of papers in recent years have characterized the advantages and limits of adaptive sensing over passive approaches. One of the first ideas in this direction was distilled sensing [8], which uses direct rather than compressive measurements. Inspired by that work, a number of authors have studied adaptivity in compressive sensing and shown similar performance gains [6, 7, 10]. These approaches do not incorporate any notion of structure.

The introduction of structure to the compressed sensing framework has also been explored by a number of authors [12, 4, 3]. Broadly speaking, these structural assumptions restrict the signal to a few of the linear subspaces that contain -sparse signals. With these restrictions, one can often design sensing procedures that focus on these allowed subspaces and enjoy significant performance improvements over unstructured problems. We remark that both Soni and Haupt and Balakrishnan et al. develop adaptive sensing procedures for structured problems, but under a more restrictive setting than this study [12, 3].

This paper continues in both of these directions exploring the role of adaptivity and structure in recovering activated clusters in graphs. We consider localizing activated clusters of nodes whose boundary in the graph is smaller than some parameter . This notion of structure is more general than previous studies, yet we are still able to demonstrate performance improvements over unstructured problems.

Our study of cluster identification is motivated by a number of applications in sensor networks measurement and monitoring, including identification of viruses in human or computer networks or contamination in a body of water. In these settings, we expect the signal of interest to be localized, or clustered, in the underlying network and want to develop efficient procedures that exploit this cluster structure.

In this paper, we propose two related adaptive sensing procedures for identifying a cluster of activations in a network. We give a sufficient condition on the signal-to-noise ratio (SNR) under which the first procedure exactly identifies the cluster. While this SNR is only slightly weaker than the SNR that is sufficient for unstructured problems, we show via information-theoretic arguments that one cannot hope for significantly better performance.

For the second procedure, we perform a more refined analysis and show that the required SNR depends on how our algorithmic tool captures the cluster structure. In some cases this can lead to consistent recovery at much weaker SNR. The second procedure can also be adapted to recover a large fraction of the cluster. We also explore the performance of our procedures via an empirical study. Our results demonstrate the gains from exploiting both structure and adaptivity in support recovery problems.

We put our results in the context of the compressed sensing landscape in Tables 1 and 2. Here is the cluster size and, in the structured setting, denotes the number of edges leaving the cluster. In the unstructured setting, Wainwright, and later Aeron et al., studied the passive support recovery problem while Haupt and Nowak consider the adaptive case [14, 1, 7]. These works analyze algorithms with near-optimal performance guarantees. Our work provides both upper and lower bounds for the adaptive structured setting. Focusing on different notions of structure, Balakrishnan et al. give necessary and sufficient conditions for recovering a small square of activations in a grid [3] while Soni and Haupt analyze the recovery of tree-sparse signals [12, 13]. Our work provides guarantees that depend on how well the signal is captured by our algorithmic construction. In the worst case, we guarantee exact recovery with an SNR of (Proposition 2.1) and in the best case, we can tolerate an SNR of (Theorem 2.2). It is worth mentioning that [12] obtains better results than ours, but studies a very specific setting where the graph is a rooted tree and the signal is a rooted subtree.

## 2 Main Results

Let denote a set of activated vertices in a known graph on nodes with maximal degree . We observe through noisy compressed measurements of the vector , that is we may select sensing vectors and observe where independently. We require so that the total sensing energy, or budget, is at most . We allow for adaptivity, meaning that the procedure may use the measurements to inform the choice of the subsequent vector . We assume that the signal strength is known. Our goal is to develop procedures that successfully recover in a low signal-to-noise ratio regime.

We will require the set , which we will henceforth call a cluster, to have small cut-size in the graph . Formally:

$$C^\star \in \mathcal{C}_\rho = \left\{C : \left|\{(u,v) : u \in C,\ v \notin C\}\right| \le \rho\right\}$$
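To make the cut-size constraint concrete, the following sketch counts the edges leaving a candidate cluster. The function names, the edge-list representation, and the 4 x 4 grid example are assumptions made purely for illustration; each undirected edge is counted once.

```python
def cut_size(edges, cluster):
    """Count edges (u, v) with exactly one endpoint inside `cluster`."""
    cluster = set(cluster)
    return sum((u in cluster) != (v in cluster) for u, v in edges)


def grid_edges(rows, cols):
    """Undirected edges of a rows x cols grid; vertex (r, c) has id r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))      # horizontal neighbor
            if r + 1 < rows:
                edges.append((v, v + cols))   # vertical neighbor
    return edges


edges = grid_edges(4, 4)
block = {5, 6, 9, 10}        # central 2 x 2 block (hypothetical cluster)
scattered = {0, 5, 10, 15}   # four diagonal vertices, no two adjacent
print(cut_size(edges, block), cut_size(edges, scattered))
```

A contiguous block has a smaller cut than the same number of scattered vertices, which is the sense in which a small cut size encodes cluster structure.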

Our algorithmic tool for identification of is a dendrogram , a hierarchical partitioning of .

A dendrogram is a tree of blocks where each block is a connected set of vertices in and:

1. The root of is , the set of all vertices, and the leaves of the dendrogram are all of the singletons , . The sets corresponding to the children of a block form a partition of the elements in while preserving graph connectivity in each cluster.

2. has degree at most , the maximum degree in .

3. is approximately balanced. Specifically the child of any block has size at most .

4. The height of is at most .

See Figure 1. We will see one way to construct such dendrograms in Section 2.3. Note that the results of Sharpnack et al. imply that one can construct a suitable dendrogram for any graph [11]. By the fact that each block of is a connected set of vertices, we immediately have the following proposition: A block is impure if . For any in at most blocks are impure at any level in .

### 2.1 Universal Guarantees

With a dendrogram , we can sense with measurements of the form for a parent block and recursively sense on the children blocks to identify the activated vertices. This procedure has the same flavor as the compressive binary search procedure [6]. Specifically, fix a threshold and energy parameter and when sensing on block obtain the measurement

$$y_D = \sqrt{\alpha}\, \mathbf{1}_D^T \mathbf{x} + \epsilon_D \tag{1}$$

If continue sensing on ’s children, otherwise terminate the recursion. At a fairly weak SNR and with appropriate setting for and , we can show that this procedure will exactly identify :
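The recursion can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: it assumes a line graph whose balanced binary dendrogram consists of index ranges, and it uses hypothetical thresholds (half a signal unit away from the "empty" and "full" responses) to classify a block as empty, full, or impure. All names are illustrative.

```python
import math
import random


def sense(block, cluster, alpha, mu, sigma, rng):
    """One measurement y_D = sqrt(alpha) * 1_D^T x + noise, with x = mu * 1_cluster."""
    lo, hi = block
    active = sum(1 for v in range(lo, hi) if v in cluster)
    return math.sqrt(alpha) * mu * active + rng.gauss(0.0, sigma)


def top_down_recover(n, cluster, alpha, mu, sigma, rng):
    """Recursively sense blocks [lo, hi) of a balanced binary dendrogram."""
    recovered, energy = set(), 0.0
    unit = math.sqrt(alpha) * mu            # response of one active vertex
    stack = [(0, n)]
    while stack:
        lo, hi = stack.pop()
        size = hi - lo
        y = sense((lo, hi), cluster, alpha, mu, sigma, rng)
        energy += alpha * size              # ||sqrt(alpha) 1_D||^2 = alpha |D|
        if y < 0.5 * unit:                  # assumed threshold: block looks empty
            continue
        if size == 1 or y > (size - 0.5) * unit:   # block looks fully active
            recovered.update(range(lo, hi))
            continue
        mid = (lo + hi) // 2                # impure: refine on the two children
        stack.extend([(lo, mid), (mid, hi)])
    return recovered, energy


truth = set(range(20, 29))
found, spent = top_down_recover(64, truth, 1.0, 1.0, 0.05, random.Random(0))
print(found == truth, spent)
```

Only impure blocks are refined, so the energy spent depends on how many blocks straddle the cluster boundary rather than on the number of vertices alone.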

Set . If the SNR satisfies:

$$\frac{\mu}{\sigma} \ge \sqrt{\frac{8}{\alpha} \log\left(\frac{d\rho L + 1}{\delta}\right)} \tag{2}$$

then with probability , Algorithm 1 recovers and using a sensing budget of at most .

We must set so we do not exceed our budget of . With this setting, the SNR requirement is:

$$\frac{\mu}{\sigma} \ge \sqrt{\frac{24 n}{m} \log_2(d\rho) \log\left(\frac{d\rho L + 1}{\delta}\right)}$$
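For a sense of scale, the sketch below evaluates this requirement numerically. The choice of the energy parameter as m / (3 n log2(d rho)) is inferred from the budget calculation in Appendix B, the outer logarithm is assumed to be natural, and the function name and the numeric inputs are illustrative only.

```python
import math


def algorithm1_parameters(n, m, d, rho, L, delta):
    """Energy parameter alpha and the sufficient SNR from the displayed bound."""
    # Inferred from Appendix B: total energy is at most 3 * alpha * n * log2(rho * d).
    alpha = m / (3.0 * n * math.log2(d * rho))
    snr = math.sqrt(24.0 * n / m * math.log2(d * rho)
                    * math.log((d * rho * L + 1) / delta))
    return alpha, snr


alpha, snr = algorithm1_parameters(n=1024, m=4096, d=4, rho=4, L=10, delta=0.1)
print(alpha, snr)
```

Substituting this value of the energy parameter into Equation (2) reproduces the second form of the bound, which is a quick way to check that the two displays agree.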

Algorithm 1 performs similarly to the adaptive procedures for unstructured support recovery. For constant , the SNR requirement is which is on the same order as the compressive binary search procedure [6] for recovering -sparse signals. For -sparse signals, the best results require SNR of which can be much worse than our guarantee when and is small [10, 7].

Thus, the procedure does enjoy a small benefit from exploiting structure, but the generality of our setup precludes more substantial performance gains. Indeed, we are able to show that one cannot do much better than Algorithm 1. This information-theoretic lower bound is a simple consequence of the results from Arias-Castro et al. [2].

Fix any graph and suppose . If:

$$\frac{\mu}{\sigma} = o\left(\sqrt{\frac{n}{m}}\right)$$

then . Therefore no procedure can reliably estimate .

The lower bound demonstrates one of the fundamental challenges in exploiting structure in the cluster recovery problem: since is not parameterized by cluster size, in the worst case, one should not hope for performance improvements that depend on cluster size or sparsity. More concretely, if , the set contains all singleton vertices, reducing to a completely unstructured setting. Here, the results of Davenport and Arias-Castro imply that to exactly recover a cluster of size one, it is necessary to have SNR of  [6]. Moreover, nothing in our setup prevents from being a complete graph on vertices, which also reduces to the unstructured setting.

The inherent difficulty of this problem is not only information-theoretic, but also computational. The typical way to exploit structure is to scan across the possible signal patterns, using the fact that the search space is highly restricted. Unfortunately, Karger proved that the number of cuts of size is [9], meaning that is not very restrictive. Even if we could efficiently scan all patterns in , distinguishing between two clusters with high overlap would still require high SNR. As a concrete example, Balakrishnan et al. showed that localizing a contiguous chain of activations in a line graph is impossible when [3]. The second term arises from the overlap between the contiguous blocks and is independent of both and , demonstrating the challenge in distinguishing these overlapping clusters.

### 2.2 Cluster-Specific Guarantees

The main performance bottleneck for Algorithm 1 comes from testing whether a block of size 1 is active or not. If there are no such singleton blocks, meaning that the cluster is grouped into large blocks in , we might expect that Algorithm 1 or a variant can succeed at lower SNR. We formalize this idea here, analyzing an algorithm whose performance depends on how is partitioned across the dendrogram .

We quantify this dependence with the notion of maximal blocks which are the largest blocks that are completely active. Formally is maximal if and ’s parent is impure, and we denote this set of maximal blocks . If the maximal blocks are all large, then we can hope to obtain performance improvements.

The algorithm consists of two phases. The first phase (the adaptive phase) is similar to Algorithm 1. With a threshold , and energy parameter , we sense on a block with

$$y_D = \sqrt{\alpha}\, \mathbf{1}_D^T \mathbf{x} + \epsilon_D$$

If we sense on ’s children and we construct a pruned dendrogram of all blocks , for which . The pruned dendrogram is much smaller than but it retains a large fraction of .

Since we have significantly reduced the dimensionality of the problem we can now use a passive localization procedure to identify at a low SNR. In the passive phase, we construct an orthonormal basis for the subspace:

$$\{\mathbf{1}_D : D \in \mathcal{K}\}$$

With another energy parameter , we observe for each basis vector and form the vector by stacking these observations. We then construct the vector . With the vector we solve the following optimization problem to identify the cluster ():

$$\hat{C} = \operatorname*{argmax}_{C \subseteq [n]} \frac{\mathbf{1}_C^T \hat{\mathbf{x}}}{\|\hat{\mathbf{x}}\| \sqrt{|C|}}$$

which can be solved by a simple greedy algorithm. A detailed description is in Algorithm 2. For a more concise presentation, in the following results, we omit the dependence on the maximum degree of the graph, . This localization guarantee is stated in terms of the distance .
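One way such a greedy solver could look is sketched below; this is an assumption, since Algorithm 2 is not reproduced here. For a fixed cluster size the objective is maximized by the largest coordinates of the estimate, so it suffices to sort the coordinates and scan prefixes. The norm of the estimate is a positive constant and is dropped. The function name and the example vector are illustrative.

```python
import math


def greedy_cluster(x_hat):
    """Maximize sum_{i in C} x_hat[i] / sqrt(|C|) over nonempty subsets C."""
    order = sorted(range(len(x_hat)), key=lambda i: x_hat[i], reverse=True)
    best_score, best_k, running = -math.inf, 0, 0.0
    for k, i in enumerate(order, start=1):
        running += x_hat[i]                 # sum of the k largest entries
        score = running / math.sqrt(k)
        if score > best_score:
            best_score, best_k = score, k
    return set(order[:best_k])


x_hat = [0.1, 0.9, 1.1, 1.0, -0.2, 0.05]    # hypothetical noisy estimate
print(greedy_cluster(x_hat))
```

The scan costs one sort plus a linear pass, so the optimization over all subsets does not require enumerating them.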

Set so that and (exact definitions of and are provided in the appendix)

$$\alpha = \frac{m}{n \log_2((\rho + k)\log n)}, \qquad \beta = \frac{m}{(\rho + k)\,\mathrm{polylog}(n, \rho)}$$

where . If

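$$\frac{\mu}{\sigma} = \omega\left(\frac{(\rho + k)\,\mathrm{polylog}(n, \rho)}{\sqrt{mk}} + \sqrt{\frac{n \log_2((\rho + k)\log n)}{m |M_{\min}|^2}}\right)$$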

where , then and the budget is .

The SNR requirement in the theorem decomposes into two terms, corresponding to the two phases of the algorithm, and our choice of and distributes the sensing budget evenly over the terms, allocating energy to each. Note however, that the first term, corresponding to the passive phase, has a logarithmic dependence on while the second term, corresponding to the adaptive phase, has a polynomial dependence, so in practice one should allocate more energy to the adaptive phase. With our allocation, the second term usually dominates, particularly for small and , which is a regime of interest. Then the required SNR is:

$$\frac{\mu}{\sigma} = \omega\left(\frac{1}{|M_{\min}|} \sqrt{\frac{n}{m} \log_2((\rho + k)\log n)}\right)$$

To more concretely interpret the result, we present sufficient SNR scalings for three scenarios in Table 3. We think of . The most favorable realization is when there is only one maximal block of size . Here, there is a significant gain in SNR over unstructured recovery or even Algorithm 1.

Another interesting case is when the maximal blocks are all at the same level in the dendrogram. In this case, there can be at most maximal blocks since each of the parents is impure and there can only be impure blocks per level. If the maximal blocks are approximately the same size, then , and we arrive at the requirement in the second row of Table 3. Again we see performance gains from structure, although there is some degradation.

Unfortunately, since the bound depends on , we do not always realize such gains. When is a singleton block (one node), our bound deteriorates to the third row of Table 3. We remark that modulo factors, this matches the SNR scaling for the unstructured (sparse) setting. It also nearly matches the lower bound in Theorem 2.1.

Theorem 2.2 shows that the size of is the bottleneck to recovering . If we are willing to tolerate missing the small blocks we can sense at lower SNR.

Let and . If:

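$$\frac{\mu}{\sigma} = \omega\left(\frac{(\rho + k)\,\mathrm{polylog}(n, \rho)}{\sqrt{mk}} + \frac{1}{t} \sqrt{\frac{n}{m}\,\mathrm{polylog}(n, \rho, j, t)}\right)$$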

then with probability , and .

In particular, we can recover all maximal blocks of size with SNR on the order of , which clearly shows the gain in exploiting structure in this problem.

### 2.3 Constructing Dendrograms

A general algorithm for constructing a dendrogram parallels the construction of spanning tree wavelets in Sharpnack et al. [11]. Given a spanning tree for , the root of the dendrogram is , and the children are the subtrees around a balancing vertex . The dendrogram is built recursively by identifying balancing vertices and using the subtrees as children. See Algorithm 4 for details. It is not hard to verify that this algorithm produces a dendrogram according to Definition 2.
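The key primitive in this construction is locating a balancing vertex. The sketch below is not Algorithm 4; it only illustrates that step, under the assumption that "balancing" means every component left after removing the vertex contains at most half of the tree's vertices. The adjacency-dictionary representation and the function name are illustrative.

```python
def balancing_vertex(adj):
    """Return a vertex of the tree `adj` whose removal leaves components of size <= n/2."""
    n = len(adj)
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for v in order:                        # BFS; the list grows while we iterate
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
    size = {v: 1 for v in adj}
    for v in reversed(order):              # accumulate subtree sizes bottom-up
        if parent[v] is not None:
            size[parent[v]] += size[v]
    for v in order:
        parts = [size[w] for w in adj[v] if parent.get(w) == v]
        parts.append(n - size[v])          # component on the side of v's parent
        if max(parts) <= n / 2:
            return v


path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
print(balancing_vertex(path))
```

Applying the same search recursively to each remaining subtree yields blocks whose sizes shrink geometrically with depth, which is what keeps the dendrogram height logarithmic.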

## 3 Experiments

We conducted two simulation studies to verify our theoretical results and examine the performance of our algorithms empirically. First, we empirically verify the SNR scaling in Proposition 2.1. In the second experiment, we compare both of our algorithms with the algorithm of Haupt and Nowak [7], which is an unstructured adaptive compressed sensing procedure with state-of-the-art performance.

In Figure 2 we plot the probability of successful recovery of as a function of a rescaled parameter. This parameter was chosen so that the condition on the SNR in Proposition 2.1 is equivalent to for some constant . Proposition 2.1 then implies that with this rescaling, the curves should all line up, which is the phenomenon we observe in Figure 2. Here is the two dimensional torus and was constructed using Algorithm 4.

In Figure 3 we plot the error, measured by , as a function of for three algorithms. We use both Algorithms 1 and 2 as well as the sequentially designed compressed sensing algorithm (SDC) [7], which has near-optimal performance for unstructured sparse recovery. Here is the line graph, is the balanced binary dendrogram, and so each signal is a contiguous block.

In the first figure, and since the maximal clusters are necessarily small, there should be little benefit from structure. Indeed, we see that all three algorithms perform similarly. This demonstrates that in the absence of structure, our procedures perform comparably to existing approaches for unstructured recovery. When (the second figure), we see that both Algorithms 1 and 2 outperform SDC, particularly at low SNRs. Here, as predicted by our theory, Algorithm 2 can identify a large part of the cluster at very low SNR by exploiting the cluster structure. In fact Algorithm 1 empirically performs well in this regime although we do not have theory to justify this.

## 4 Conclusion

We explore the role of structure and adaptivity in the support recovery problem, specifically in localizing a cluster of activations in a network. We show that when the cluster has small cut size, exploiting this structure can result in performance improvements in terms of signal-to-noise ratios sufficient for cluster recovery. If the true cluster coincides with a dendrogram over the graph, then weaker SNRs can be tolerated. These results do not contradict the necessary conditions for this problem, which show that one cannot do much better than the unstructured setting for exact recovery.

While our work contributes to understanding the role of structure in compressive sensing, our knowledge is still fairly limited. We now know of some specific instances where structured signals can be localized at a very weak SNR, but we do not have a full characterization of this effect. Our goal was to give such a precise characterization, but the generality of our setup resulted in an information-theoretic barrier to demonstrating significant performance gains. An interesting direction for future research is to precisely quantify settings that are neither too general nor too specific, in which structure can lead to improved sensing performance, and to develop algorithms that enjoy these gains.

## Appendix A Proof of Theorem 2.1

The proof is a simple extension of Theorem 2 from Davenport and Arias-Castro [6]. In particular, if then contains all one-sparse signals. Restricting to just these signals, the results from [6] imply that we cannot even detect if the activation is in the first or second half of the vertices unless . This results in the lower bound.

If we are also interested in introducing the cluster size parameter we can prove a similar lower bound by reduction to one-sparse testing. If then all support patterns are in so we are again in the unstructured setting. Here, the results from [2] give the lower bound.

If then we are in a structured setting in that not all support patterns are possible. However, if we look at the cycle graph, each contiguous block contributes to the cut size, so if we are allowed at least two contiguous blocks. If of the activations lie in one contiguous block, then the last activation can be distributed in any of the remaining vertices. Even if the localization procedure was provided with knowledge of the location of the activations, an SNR of would be necessary for identifying the last activation.

## Appendix B Proof of Proposition 2.1

Recall that for any block that we sense, we obtain . Consider a single block . Gaussian tail bounds reveal the following facts:

1. If , then with probability , .

2. If , then with probability , .

3. Otherwise, with probability : .

The above facts reveal that if:

$$\frac{\mu}{\sigma} \ge 2\sqrt{\frac{2\log(1/\delta)}{\alpha}}$$

then we will correctly identify if is empty, full or impure. Assuming we perform this test correctly, we only refine if it is impure, and Proposition 2 reveals that at most clusters can be impure per level. For each of these clusters that we refine, we search on at most clusters at the subsequent level. The total budget that we use is (recall that is the height of ):

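$$\begin{aligned}
\sum_{l=0}^{L} \alpha \min\left\{n, \rho d \frac{n}{2^l}\right\} &= \alpha\left(\sum_{l=0}^{\log_2(\rho d) - 1} n + \sum_{l=\log_2(\rho d)}^{L} \rho d \frac{n}{2^l}\right) \\
&= \alpha\left(n \log_2(\rho d) + \sum_{l=0}^{L - \log_2(\rho d)} \frac{\rho d}{2^{\log_2(\rho d)}} \frac{n}{2^l}\right) \\
&\le \alpha\left(n \log_2(\rho d) + 2n\right) \le 3 \alpha n \log_2(\rho d)
\end{aligned}$$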
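The bound can be checked numerically. In the sketch below the height of the dendrogram is assumed to be the ceiling of log2(n), the function name is illustrative, and the parameter values are arbitrary; the final inequality in the display needs the product of the cut size and degree to be at least 2.

```python
import math


def adaptive_budget(n, rho, d, alpha=1.0):
    """Sum of alpha * min(n, rho * d * n / 2^l) over levels l = 0..L."""
    L = math.ceil(math.log2(n))            # assumed dendrogram height
    return sum(alpha * min(n, rho * d * n / 2 ** l) for l in range(L + 1))


for n, rho, d in [(1024, 4, 4), (4096, 2, 3), (256, 10, 4)]:
    spent = adaptive_budget(n, rho, d)
    bound = 3 * n * math.log2(rho * d)
    print(n, rho, d, spent, bound, spent <= bound)
```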

Setting as in the Proposition makes this quantity smaller than . Finally, we take a union bound over the blocks that we sense on ( per level not counting the root and one more for the root) and plug in our bound on to arrive at the final rate of:

$$\frac{\mu}{\sigma} \ge \sqrt{\frac{24 n \log_2(\rho d) \log((\rho d L + 1)/\delta)}{m}}$$

The threshold is specified to ensure that the failure probability for all of the tests is at most . In the algorithm, thresholding at and favors identifying clusters as impure over identifying them as pure. In practice these thresholds may improve performance because it is worse to incorrectly mark an impure block as pure (leading to incorrect recovery of the cluster) than it is to incorrectly mark a pure block as impure (leading to an increase in measurement budget). Using these thresholds has no ramifications on our theory.

## Appendix C Proof of Theorem 2.2

To prove Theorem 2.2 we must analyze each phase of the procedure. We first turn to the adaptive phase. By setting the threshold correctly, we retain a large fraction of while removing a large number of inactive nodes. We call the set of blocks retained by the adaptive phase and we measure the fraction of lost by the projection onto the basis for the subspace spanned by the blocks in . In the passive phase, we use the fact that is small to bound the MSE of the reconstruction . We then translate this MSE guarantee into an error guarantee for .

Throughout, let denote the total number of impure blocks in the dendrogram and let denote the height of . Note that and . Recall that .

With all the results in the following sections we will be able to bound with probability as:

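$$\begin{aligned}
d(\hat{C}, C^\star) &\le \frac{4}{\mu^2 k} \|\hat{\mathbf{x}} - \mu \mathbf{1}_{C^\star}\|^2 && (3) \\
&\le \frac{4 c \sigma^2 |\mathcal{K}|}{\mu^2 k \beta} + \frac{4L}{\delta} \exp\left\{-\frac{1}{8} \alpha |M_{\min}|^2 \mu^2 / \sigma^2\right\} && (4) \\
&\le \frac{8 c \sigma^2 L^2 \left(3rd\log(rdL/\delta) + k\right)^2}{\mu^2 k m} + \frac{4L}{\delta} \exp\left\{-\frac{1}{48} \frac{m |M_{\min}|^2 \mu^2 / \sigma^2}{n \log_2(4rd^2\log(rdL/\delta) + k)}\right\} && (6)
\end{aligned}$$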

Here Equation 3 follows from our analysis of the optimization phase (Lemma C.3), and Equation 4 follows from the bounds in Section C.2. The last step follows by plugging in bounds on and if we want to allocate energy to each phase. Specifically the bound on comes from Lemma C.1 while the bound for comes from Lemma C.1. Lemma C.1 shows that the energy used in the first phase of the algorithm is and allocating energy to this phase gives the bound on . On the other hand Lemma C.1 shows that after the adaptive phase, the pruned dendrogram contains at most blocks, which is precisely the dimensionality of the subspace we sense over in the passive phase of the procedure. To summarize, setting:

$$\begin{aligned}
\alpha &\le \frac{m}{6 n \log_2(4rd^2\log(rdL/\delta) + k)} && (7) \\
\beta &\le \frac{m}{6rdL\log(rdL/\delta) + 2Lk} && (8)
\end{aligned}$$

ensures that the sensing budget is no more than .

We obtain the final result by plugging in the bounds on and . With these bounds, the first term is as long as:

$$\frac{\mu}{\sigma} = \omega\left(\frac{\rho d \log_2^2 n \, \log(\rho d \log_2^2 n / \delta) + k \log_2 n}{\sqrt{mk}}\right)$$

The second term is when:

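$$\frac{\mu}{\sigma} = \omega\left(\frac{1}{|M_{\min}|} \sqrt{\frac{n}{m} \log_2\left(\rho d^2 \log_2 n \, \log(\rho d \log_2^2 n / \delta) + k\right) \log(\log_2 n / \delta)}\right)$$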

Note that we can only apply Lemma C.3 if the right hand side of Equation 3 is . However if meets the above two requirements, then that quantity is going to zero, so for large enough this will certainly be the case.

Our analysis will focus on recovering maximal blocks , which are the largest blocks that contain only activated vertices. Formally, is maximal if and if ’s parent contains some unactivated vertices. We are also interested in identifying impure blocks, (blocks that partially overlap with ). Suppose there are such impure clusters.

The first lemma helps us bound the number of empty nodes that we retain: Threshold at where and:

$$q = \frac{\sqrt{5} - 1}{2 d_{\max}}$$

Then with probability the pruned dendrogram contains at most blocks per level for a total of at most .

###### Proof.

For the first claim, we analyze the adaptive procedure on an empty dendrogram, showing that we retain no more than per level. The proof is by induction on the level . Let the inductive hypothesis be that where is the number of nodes retained at the th level. Then by the Chernoff bound,

$$\mathbb{P}[t_l - \mathbb{E} t_l \ge \epsilon] \le \exp\left\{-\frac{\epsilon^2}{3 \mathbb{E} t_l}\right\}$$

can be bounded by since each of the blocks that we retain at the st level can have at most children and since we retain each block with probability in expectation. With a union bound across all levels, we have that with probability :

$$t_l \le dq\, t_{l-1} + \sqrt{3 dq\, t_{l-1} \log(L/\delta)}$$

Applying the inductive hypothesis and the definition of :

$$t_l \le 3\log(L/\delta)\left(dq + \sqrt{dq}\right) \le 3\log(L/\delta)$$

Thus for each empty dendrogram, we retain at most .

Each of the impure clusters can spawn off at most empty subtrees in the dendrogram. Taking a union bound over each of these empty subtrees shows that at most empty blocks are retained. There are at most active blocks, which gives us a bound on the size of . ∎

Next we compute the probability that we fail to retain a maximal cluster: For any maximal cluster , the probability that is bounded by:

$$\mathbb{P}[M \notin \mathcal{K}] \le L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha} |M| \mu / \sigma - z\right)^2\right\}$$

as long as .

###### Proof.

We fail to retain a maximal cluster if we throw away any of its ancestors in the dendrogram. All the ancestors of have at least activations so for each of ’s ancestors. All have the same variance . By a union bound and Gaussian tail inequality the failure probability is at most:

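$$\mathbb{P}[M \notin \mathcal{K}] \le L\, \mathbb{P}[y_M < \sigma z] \le L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha} |M| \mu / \sigma - z\right)^2\right\}$$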

To complete the adaptive phase, we must set so that we use at most half of the budget. The energy used in the adaptive phase is:

$$\alpha\left(3n \log_2(4rd^2\log(rdL/\delta) + |C^\star|)\right)$$

###### Proof.

At level we retain at most empty blocks, so we sense on at most empty blocks (the at most impure blocks could spawn off up to empty ones). We also sense on at most impure blocks and also sense every completely active block. In total we sense on no more than:

$$3rd^2\log(rdL/\delta) + d\rho + |C^\star| \le 4rd^2\log(rdL/\delta) + |C^\star|$$

blocks () at the st level. Since each block at the th level has size at most we can bound the total energy as:

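$$\begin{aligned}
\alpha \sum_{l=0}^{L} \min\left\{n, \left(4rd^2\log(rdL/\delta) + |C^\star|\right) \frac{n}{2^l}\right\} &\le \alpha\left(n \log_2(4rd^2\log(rdL/\delta) + |C^\star|) + \sum_{l=0}^{\infty} \frac{n}{2^l}\right) \\
&\le \alpha\left(3n \log_2(4rd^2\log(rdL/\delta) + |C^\star|)\right)
\end{aligned}$$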

Here to arrive at the second line, we noticed that at the top levels, sensing on all of the blocks is a sharper bound than the one we computed which produces the first term. The second term comes from the fact that since we sense a constant number of blocks at each level, the budget is geometrically decreasing. ∎

In particular setting:

$$\alpha = \frac{m}{6 n \log_2(4rd^2\log(rdL/\delta) + |C^\star|)}$$

the budget for the adaptive phase is .

### C.2 The Passive Phase

In the passive phase, we need to compute two key quantities, (1) the energy of that remains in the span of and (2) the estimation error of the projection that we perform. Recall that the space we are interested in is and let be a basis for this subspace. Let denote the maximal clusters retained in the adaptive phase while denotes all of the maximal clusters. Throughout this section let .

Since is a subspace of we know that:

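$$\|\mathcal{P}_U \mathbf{1}_{C^\star}\|^2 \ge \sum_{M \in \hat{\mathcal{M}}} \left(\frac{|C^\star \cap M|}{\sqrt{|M|}}\right)^2 = \sum_{M \in \hat{\mathcal{M}}} |M|$$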

which means that (using Lemma C.1):

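$$\begin{aligned}
\mathbb{E}\|(I - \mathcal{P}_U)\mathbf{1}_{C^\star}\|^2 &\le \mathbb{E}\sum_{M \notin \mathcal{K}} |M| = \sum_{M \in \mathcal{M}} |M|\, \mathbb{P}[M \notin \mathcal{K}] \\
&\le \sum_{M \in \mathcal{M}} |M| L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha} |M| \mu / \sigma - z\right)^2\right\} \\
&\le |C^\star| L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha} |M_{\min}| \mu / \sigma - z\right)^2\right\}
\end{aligned}$$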

Since is a constant, is also constant. If (this will be dominated by other restrictions on the SNR) then this expression is bounded by:

$$\le |C^\star| L \exp\left\{-\frac{1}{8} \alpha |M_{\min}|^2 \mu^2 / \sigma^2\right\}$$

Applying Markov’s inequality, we have that with probability :

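$$\|(I - \mathcal{P}_U)\mathbf{1}_{C^\star}\|^2 \le \frac{|C^\star| L}{\delta} \exp\left\{-\frac{1}{8} \alpha |M_{\min}|^2 \mu^2 / \sigma^2\right\}$$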

Now we study the passive sampling scheme. If where then:

$$\hat{\mathbf{x}} = \sqrt{1/\beta}\, U \mathbf{y} = \mathcal{P}_U \mathbf{x} + \sqrt{1/\beta}\, U \epsilon$$

So that:

$$\|\hat{\mathbf{x}} - \mathcal{P}_U \mathbf{x}\|^2 = \frac{1}{\beta} \|U \epsilon\|^2 = \frac{1}{\beta} \|z\|^2$$

where is a -dimensional Gaussian vector. Concentration results for Gaussian vectors (or Chi-squared distributions) show that there is a constant such that for large enough with probability .
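The passive phase can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the paper's Algorithm 2: the basis is built by Gram-Schmidt on the block indicators, each measurement is taken as the square root of the energy parameter times the inner product of a basis vector with the signal plus Gaussian noise, and the retained blocks in the example are hypothetical.

```python
import math
import random


def orthonormal_basis(vectors, tol=1e-9):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            coef = sum(a * b for a, b in zip(w, u))
            w = [a - coef * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:                       # skip linearly dependent indicators
            basis.append([a / norm for a in w])
    return basis


def passive_estimate(x, blocks, beta, sigma, rng):
    """Observe y_i = sqrt(beta) * u_i^T x + noise; return x_hat = sqrt(1/beta) * U y."""
    n = len(x)
    indicators = [[1.0 if v in b else 0.0 for v in range(n)] for b in blocks]
    basis = orthonormal_basis(indicators)
    y = [math.sqrt(beta) * sum(a * b for a, b in zip(u, x)) + rng.gauss(0.0, sigma)
         for u in basis]
    x_hat = [0.0] * n
    for yi, u in zip(y, basis):
        for j in range(n):
            x_hat[j] += yi * u[j] / math.sqrt(beta)
    return x_hat, len(basis) * beta          # estimate and total energy spent


x = [0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]            # mu = 2 on the cluster {2, 3}
blocks = [{0, 1, 2, 3}, {2, 3}, {4, 5, 6, 7}, {6}]       # hypothetical retained blocks
x_hat, energy = passive_estimate(x, blocks, 4.0, 0.1, random.Random(0))
print([round(v, 2) for v in x_hat], energy)
```

Because the signal in this example lies in the span of the retained blocks, the only error is the noise term, whose size scales with the number of basis vectors divided by the energy parameter.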

Putting these two bounds together gives us a high probability bound on the squared error (note that the cross term is zero since ):

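$$\begin{aligned}
\|\hat{\mathbf{x}} - \mathbf{x}\|^2 &\le \|\hat{\mathbf{x}} - \mathcal{P}_U \mathbf{x}\|^2 + \|(I - \mathcal{P}_U)\mathbf{x}\|^2 \\
&\le \frac{c \sigma^2 |\mathcal{K}|}{\beta} + \frac{|C^\star| L \mu^2}{\delta} \exp\left\{-\frac{1}{8} \alpha |M_{\min}|^2 \mu^2 / \sigma^2\right\}
\end{aligned}$$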

### C.3 Recovering C⋆

The error guarantee of the optimization phase is based on the following lemma: Let denote the solution to:

$$\operatorname*{argmax}_{C \subset [n]} \frac{\hat{\mathbf{x}}^T \mathbf{1}_C}{\|\hat{\mathbf{x}}\| \sqrt{|C|}}$$

then if :

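$$d(\hat{C}, C^\star) \triangleq 1 - \frac{|\hat{C} \cap C^\star|}{\sqrt{|\hat{C}||C^\star|}} \le \frac{4 \|\hat{\mathbf{x}} - \mathbf{x}\|_2^2}{\mu^2 |C^\star|}$$

###### Proof.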

It is immediate from the definition of the -distance that:

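$$d(\hat{C}, C^\star) \triangleq 1 - \frac{|\hat{C} \cap C^\star|}{\sqrt{|\hat{C}||C^\star|}} = \frac{1}{2} \left\|\frac{\mathbf{1}_{\hat{C}}}{\sqrt{|\hat{C}|}} - \frac{\mathbf{1}_{C^\star}}{\sqrt{|C^\star|}}\right\|^2$$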
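This identity is easy to confirm numerically. The sketch below evaluates both sides for a pair of arbitrary example sets; the function names and the sets are illustrative.

```python
import math


def cluster_distance(c_hat, c_star):
    """d(C_hat, C_star) = 1 - |C_hat & C_star| / sqrt(|C_hat| * |C_star|)."""
    return 1.0 - len(c_hat & c_star) / math.sqrt(len(c_hat) * len(c_star))


def half_squared_gap(c_hat, c_star, n):
    """0.5 * || 1_{C_hat}/sqrt|C_hat| - 1_{C_star}/sqrt|C_star| ||^2."""
    total = 0.0
    for v in range(n):
        a = (v in c_hat) / math.sqrt(len(c_hat))
        b = (v in c_star) / math.sqrt(len(c_star))
        total += (a - b) ** 2
    return 0.5 * total


c_hat = {1, 2, 3, 4}
c_star = set(range(3, 12))
print(cluster_distance(c_hat, c_star), half_squared_gap(c_hat, c_star, 16))
```

Both normalized indicators are unit vectors, so the squared distance between them is two minus twice their inner product, and that inner product is exactly the normalized overlap.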

By virtue of the fact that solves the optimization problem, we also know:

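$$\left\|\frac{\mathbf{1}_{\hat{C}}}{\sqrt{|\hat{C}|}} - \frac{\hat{\mathbf{x}}}{\|\hat{\mathbf{x}}\|}\right\|^2 = 2 - 2\frac{\hat{\mathbf{x}}^T \mathbf{1}_{\hat{C}}}{\|\hat{\mathbf{x}}\| \sqrt{|\hat{C}|}} \le \left\|\frac{\mathbf{1}_{C^\star}}{\sqrt{|C^\star|}} - \frac{\hat{\mathbf{x}}}{\|\hat{\mathbf{x}}\|}\right\|^2$$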

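Which, when coupled with the first identity, gives us:

$$d(\hat{C}, C^\star) \le 2 \left\|\frac{\mathbf{1}_{C^\star}}{\sqrt{|C^\star|}} - \frac{\hat{\mathbf{x}}}{\|\hat{\mathbf{x}}\|}\right\|^2$$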

This follows by adding and subtracting and applying the triangle inequality to . Call . Then we have:

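$$\begin{aligned}
\left\|\frac{\mathbf{1}_{C^\star}}{\sqrt{|C^\star|}} - \frac{\hat{\mathbf{x}}}{\|\hat{\mathbf{x}}\|}\right\|^2 &= 2 - 2\cos\theta \\
\|\mathbf{x} - \hat{\mathbf{x}}\|^2 &= \|\mathbf{x}\|^2 + \|\hat{\mathbf{x}}\|^2 - 2\|\mathbf{x}\|\|\hat{\mathbf{x}}\|\cos\theta \\
\frac{\|\mathbf{x} - \hat{\mathbf{x}}\|^2}{\|\mathbf{x}\|\|\hat{\mathbf{x}}\|} &= \frac{\|\mathbf{x}\|}{\|\hat{\mathbf{x}}\|} + \frac{\|\hat{\mathbf{x}}\|}{\|\mathbf{x}\|} - 2\cos\theta
\end{aligned}$$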

On the right hand side is an expression of the form . This is for as it is in this case, and with this in mind we see that:

$$d(\hat{C}, C^\star) \le 2(2 - 2\cos\theta) \le \frac{2\|\mathbf{x} - \hat{\mathbf{x}}\|^2}{\|\mathbf{x}\|\|\hat{\mathbf{x}}\|}$$

Looking just at the denominator of the right hand side, we can lower bound by:

$$\|\mathbf{x}\|\|\hat{\mathbf{x}}\| \ge \|\mathbf{x}\|^2 - \|\mathbf{x}\|\|\mathbf{x} - \hat{\mathbf{x}}\| \ge \frac{1}{2}\mu^2 |C^\star|$$

Where in the first step we used the triangle inequality, and in the second we used that and the assumption whereby . Plugging this in to the bound on concludes the proof. ∎

### C.4 Proof of Corollary 2.2

The proof of the corollary parallels that of the main theorem. In the adaptive phase, we instead show that with high probability we retain all clusters of size for some parameter . Then since we are not interested in recovering the smaller clusters, we can safely ignore the energy in that is orthogonal to . This means that the approximation error term from the previous proof can be ignored.

With probability we retain all clusters of size as long as:

 μσ≥1