
# Iterative and greedy algorithms for the sparsity in levels model in compressed sensing

Motivated by the question of optimal functional approximation via compressed sensing, we propose generalizations of the Iterative Hard Thresholding and the Compressive Sampling Matching Pursuit algorithms able to promote sparse in levels signals. We show, by means of numerical experiments, that the proposed algorithms are successfully able to outperform their unstructured variants when the signal exhibits the sparsity structure of interest. Moreover, in the context of piecewise smooth function approximation, we numerically demonstrate that the structure promoting decoders outperform their unstructured variants and the basis pursuit program when the encoder is structure agnostic.


## 1 Introduction

In classical compressed sensing, one considers the recovery of an $s$-sparse vector $x\in\mathbb{C}^N$ from noisy measurements $y=Ax+e\in\mathbb{C}^m$. It is now well understood that classical sparsity, where the vector has at most $s$ nonzero components, is but one low-dimensional signal model, and that more sophisticated models may bring significant performance gains in practice [18, 7, 30]. Well-known structured sparsity models include group or block sparsity, joint sparsity, weighted sparsity, connected tree sparsity and numerous others.

The focus of this paper is the so-called local sparsity in levels model [4]. In this model, a vector is divided into disjoint levels, and a separate sparsity is allowed within each. Thus, one now has a vector of local sparsities $\mathbf{s}=(s_1,\ldots,s_r)$ as opposed to a single sparsity $s$. While simple, this model plays a crucial role in compressed sensing for imaging, where the local sparsities are typically related to the wavelet scales, and the target vector represents the approximately sparse wavelet coefficients of an image. The sparsity in levels model, and its corresponding compressed sensing theory, allows one to design sampling strategies which leverage the characteristic local sparsity structure of images (so-called asymptotic sparsity), thus giving significant practical benefits [29, 4, 5].

Imaging aside, the sparsity in levels model also arises naturally in other contexts. For instance, it can be used to model so-called sparse and distributed or sparse and balanced vectors, which occur in parallel acquisition problems[14, 15] and radar[17]. The specific case of two levels also arises in the sparse corruptions problem [2, 24].

The focus of past work on this model has been on convex optimization-based decoders such as Quadratically-Constrained Basis Pursuit (QCBP)

$$\min_{z\in\mathbb{C}^N}\|z\|_{\ell^1}\quad\text{subject to}\quad\|Az-y\|_{\ell^2}\le\eta, \tag{1}$$

or closely related weighted variants [30], with the weights being used to promote the sparsity in levels structure. Both uniform [23] and nonuniform [4] recovery guarantees have been established for this decoder, with measurement conditions relating the number of measurements $m$ to the local sparsities $s_1,\ldots,s_r$. However, and perhaps surprisingly, no attention has been paid to alternatives to convex optimization, namely greedy and iterative methods, despite these being widely used and studied both for classical sparsity [20] and for various other structured sparsity models [7].

In this paper, we introduce and study generalizations of the classical Iterative Hard Thresholding (IHT) and Compressive Sampling Matching Pursuit (CoSaMP) algorithms for the sparsity in levels model, called IHT in Levels (IHTL) and CoSaMP in Levels (CoSaMPL), respectively. These generalizations are natural and straightforward to implement. We then present a series of numerical experiments demonstrating the benefits of these decoders in the presence of sparsity in levels. Specifically, we highlight several settings in which promoting this additional structure leads to better recovery than the classical IHT and CoSaMP algorithms. The purpose of this paper is to establish this proof of concept; we defer a full theoretical analysis to an upcoming work. However, to provide some context, we briefly discuss the theoretical tools needed to analyze sparsity in levels and highlight existing results for the QCBP decoder.

This work was motivated by the question of optimal function approximation via compressed sensing [3]. Specifically, given a function class for which the best $s$-term approximation error is known to decay like $s^{-\alpha}$ for some $\alpha>0$, can one design $m$ compressed sensing measurements and a decoder so that the resulting approximation achieves an $\mathcal{O}(m^{-\alpha})$ error? Moreover, can this be achieved by a black box, with a decoder whose runtime is polynomial in $m$? The first question has been answered affirmatively for the class of piecewise $\alpha$-Hölder functions of one variable [3]. Note that sparsity in levels is crucial for obtaining the optimal error rate. However, the decoder is based on a weighted $\ell^1$ minimization program [3], and thus does not give an affirmative answer to the second question. Striving for a true black box motivates one to consider non-optimization-based approaches, such as the IHTL and CoSaMPL algorithms introduced in this paper. With this motivation in mind, we conclude the paper with some experiments on the approximation of piecewise smooth functions via compressed sensing, comparing the introduced algorithms to (1).

## 2 Classical compressed sensing

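Recall that a vector $x\in\mathbb{C}^N$ is $s$-sparse if it has at most $s$ nonzero entries, that is,

$$|\operatorname{supp}(x)|\le s,$$

where $\operatorname{supp}(x)=\{i: x_i\neq0\}$ is the support of $x$. Classical compressed sensing concerns the recovery of a sparse vector $x$ from noisy linear measurements

$$y=Ax+e\in\mathbb{C}^m,$$

where $A\in\mathbb{C}^{m\times N}$ is the measurement matrix and $e\in\mathbb{C}^m$ is a noise vector.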

### 2.1 IHT and CoSaMP

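For a vector $x\in\mathbb{C}^N$ (not necessarily sparse), let $L_s(x)\subseteq\{1,\ldots,N\}$ be the index set of its $s$ largest entries in absolute value. The hard thresholding operator $H_s:\mathbb{C}^N\to\mathbb{C}^N$ is, for $x=(x_i)_{i=1}^N$, defined by

$$H_s(x)=(H_s(x)_i)_{i=1}^N,\qquad H_s(x)_i=\begin{cases}x_i & i\in L_s(x)\\ 0 & \text{otherwise}.\end{cases}$$

That is, $H_s(x)$ is the vector of the $s$ largest entries of $x$ with all other entries set to zero. The classical Iterative Hard Thresholding (IHT) algorithm is now defined as follows: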

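Function $\hat{x}=\mathrm{IHT}(y,A,s)$
Inputs: $y\in\mathbb{C}^m$, $A\in\mathbb{C}^{m\times N}$, sparsity $s$
Initialization: $x^{(0)}\in\mathbb{C}^N$ (e.g. $x^{(0)}=0$)
Iterate: Until some stopping criterion is met at $n=\bar{n}$, set

$$x^{(n+1)}=H_s\big(x^{(n)}+A^*(y-Ax^{(n)})\big)$$

Output: $\hat{x}=x^{(\bar{n})}$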

The idea of the IHT algorithm is to combine the classical Landweber iteration [22] with the action of a hard thresholding operator to promote sparse solutions. Specifically, IHT corresponds to a Landweber iteration with constant unit step size followed by a pruning operation performed via hard thresholding. The power of IHT lies in the extreme simplicity and efficiency of its iteration.

The ancestors of IHT are the iterated shrinkage methods [19], and IHT was introduced in the context of compressed sensing in the late 2000s [9, 10]. In the first version of the IHT algorithm [9, 10], the step size of the Landweber iteration is constant with respect to the iteration. Accelerated versions of IHT with variable step size were introduced later [11, 12]. A generalization of IHT to the union of subspaces model was studied in the context of model-based compressed sensing [7, 21].
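Because the IHT iteration is so short, it can be sketched in a few lines. The following Python/NumPy version is a minimal illustration, not the authors' code: the names `hard_threshold` and `iht`, the tie-breaking rule used to pick $L_s(x)$, and the relative-increment stopping rule with its default tolerance are all our own assumptions.

```python
import numpy as np


def hard_threshold(x, s):
    """H_s(x): keep the s largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    # Indices sorted by decreasing |x_i|; the stable sort breaks ties by
    # position, which is one admissible choice of L_s(x).
    idx = np.argsort(-np.abs(x), kind="stable")[:s]
    out[idx] = x[idx]
    return out


def iht(y, A, s, tol=1e-8, max_iter=1000):
    """IHT with unit step size: x <- H_s(x + A^*(y - A x)), starting from x = 0."""
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    for _ in range(max_iter):
        # Landweber step followed by hard thresholding
        x_new = hard_threshold(x + A.conj().T @ (y - A @ x), s)
        # Stop when the relative increment is small (assumed criterion)
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x_new), 1e-30):
            return x_new
        x = x_new
    return x
```

With $A$ equal to the identity and an $s$-sparse $y$, the first step already returns $H_s(y)=y$. For a general matrix, the unit step size may make the iteration diverge, which is why a rescaling of $A$ is used in practice.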

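The Compressive Sampling Matching Pursuit (CoSaMP) algorithm is defined as follows:

Function $\hat{x}=\mathrm{CoSaMP}(y,A,s)$
Inputs: $y\in\mathbb{C}^m$, $A\in\mathbb{C}^{m\times N}$, sparsity $s$
Initialization: $x^{(0)}\in\mathbb{C}^N$ (e.g. $x^{(0)}=0$)
Iterate: Until some stopping criterion is met at $n=\bar{n}$, set

$$\begin{aligned}
U^{(n+1)}&=\operatorname{supp}(x^{(n)})\cup L_{2s}\big(A^*(y-Ax^{(n)})\big),\\
u^{(n+1)}&\in\arg\min_{z\in\mathbb{C}^N}\big\{\|y-Az\|_{\ell^2}:\operatorname{supp}(z)\subseteq U^{(n+1)}\big\},\\
x^{(n+1)}&=H_s\big(u^{(n+1)}\big)
\end{aligned}$$

Output: $\hat{x}=x^{(\bar{n})}$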

CoSaMP was proposed in the late 2000s [26], inspired by the so-called regularized orthogonal matching pursuit algorithm [27, 28]. CoSaMP combines the principles of greedy (multiple) index selection and of orthogonal projection used in orthogonal matching pursuit with hard thresholding. It takes advantage of a greedy index selection principle typical of matching pursuit algorithms by looking for the columns of $A$ that are most correlated with the residual. After updating the support accordingly, CoSaMP performs a least-squares projection onto the active support, followed by hard thresholding to preserve sparsity. Similarly to IHT, extensions of CoSaMP to the union of subspaces model were developed and analyzed in the context of model-based compressed sensing [7, 21].
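The three steps of a CoSaMP iteration (support merge, restricted least squares, pruning) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions: the names `hard_threshold` and `cosamp`, the initialization $x^{(0)}=0$, and the relative-increment stopping rule are hypothetical choices, and the restricted least-squares problem is solved with `numpy.linalg.lstsq`.

```python
import numpy as np


def hard_threshold(x, s):
    """H_s(x): keep the s largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(-np.abs(x), kind="stable")[:s]
    out[idx] = x[idx]
    return out


def cosamp(y, A, s, tol=1e-8, max_iter=1000):
    """CoSaMP started from x = 0."""
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    for _ in range(max_iter):
        # 1) Support merge: current support plus the 2s largest proxy entries
        proxy = A.conj().T @ (y - A @ x)
        candidates = np.argsort(-np.abs(proxy), kind="stable")[: 2 * s]
        U = np.union1d(np.flatnonzero(x), candidates)
        # 2) Least-squares fit restricted to the columns indexed by U
        u = np.zeros_like(x)
        u[U] = np.linalg.lstsq(A[:, U], y, rcond=None)[0]
        # 3) Prune back to s entries
        x_new = hard_threshold(u, s)
        # Stop when the relative increment is small (assumed criterion)
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x_new), 1e-30):
            return x_new
        x = x_new
    return x
```

Note that the merged support uses the support of the current iterate $x^{(n)}$, so the least-squares step never discards indices that are already active.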

A potentially attractive feature of both the IHT and CoSaMP algorithms is that the reconstruction $\hat{x}$ is exactly $s$-sparse. This is not the case in general when $\hat{x}$ is a minimizer of (1), i.e.

$$\hat{x}\in\arg\min_{z\in\mathbb{C}^N}\|z\|_{\ell^1}\quad\text{subject to}\quad\|Az-y\|_{\ell^2}\le\eta.$$

Note that IHT and CoSaMP require knowledge of $s$ but no knowledge of the noise level $\|e\|_{\ell^2}$, whereas (1) requires no knowledge of $s$ but knowledge of the noise level through $\eta$. In fact, although stable and robust recovery guarantees for QCBP can be shown for $\eta\ge\|e\|_{\ell^2}$ (under the restricted isometry property) and for $\eta<\|e\|_{\ell^2}$ (under the restricted isometry and quotient properties), the optimal parameter tuning strategy for QCBP is $\eta=\|e\|_{\ell^2}$ [13, 34].

### 2.2 Recovery guarantees

Much as with (1), recovery guarantees for IHT and CoSaMP are typically based on the RIP:

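Let $1\le s\le N$. The $s$-th Restricted Isometry Constant (RIC) $\delta_s$ of a matrix $A\in\mathbb{C}^{m\times N}$ is the smallest $\delta\ge0$ such that

$$(1-\delta)\|x\|_{\ell^2}^2\le\|Ax\|_{\ell^2}^2\le(1+\delta)\|x\|_{\ell^2}^2,\quad\text{for all $s$-sparse }x. \tag{2}$$

If $\delta_s<1$, then $A$ is said to have the Restricted Isometry Property (RIP) of order $s$.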
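For very small matrices, the RIC in (2) can be computed exactly: it equals the largest value of $\|A_S^*A_S-I\|_{2}$ over index sets $S$ with $|S|=s$, where $A_S$ keeps the columns of $A$ indexed by $S$ (sets of size exactly $s$ suffice, since the Gram matrix of a smaller set is a principal submatrix of a larger one). The sketch below, with the hypothetical name `ric_bruteforce`, does this by exhaustive search; its cost grows combinatorially in $N$, so it is only a didactic check, not a practical tool.

```python
import itertools

import numpy as np


def ric_bruteforce(A, s):
    """Exact delta_s = max over |S| = s of ||A_S^* A_S - I||_2 (exhaustive search)."""
    delta = 0.0
    for S in itertools.combinations(range(A.shape[1]), s):
        A_S = A[:, list(S)]
        # Eigenvalues of the Gram matrix are the extreme values of
        # ||A x||^2 / ||x||^2 over vectors supported on S
        eigs = np.linalg.eigvalsh(A_S.conj().T @ A_S)
        delta = max(delta, float(np.max(np.abs(eigs - 1.0))))
    return delta
```

For instance, an orthonormal matrix has $\delta_s=0$, while a matrix with two identical unit-norm columns has $\delta_2=1$ and therefore fails the RIP of order 2.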

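Suppose that the $6s$-th RIC of $A\in\mathbb{C}^{m\times N}$ satisfies $\delta_{6s}<1/\sqrt{3}$. Then, for all $x\in\mathbb{C}^N$ and $e\in\mathbb{C}^m$, the sequence $(x^{(n)})$ defined by $\mathrm{IHT}(y,A,2s)$ with $y=Ax+e$ and $x^{(0)}=0$ satisfies, for any $n\ge0$,

$$\begin{aligned}
\|x-x^{(n)}\|_{\ell^1}&\le C\sigma_s(x)_{\ell^1}+D\sqrt{s}\|e\|_{\ell^2}+2\sqrt{s}\rho^n\|x\|_{\ell^2},\\
\|x-x^{(n)}\|_{\ell^2}&\le\frac{C}{\sqrt{s}}\sigma_s(x)_{\ell^1}+D\|e\|_{\ell^2}+\rho^n\|x\|_{\ell^2},
\end{aligned}$$

where $0<\rho<1$, $C>0$ and $D>0$ are constants only depending on $\delta_{6s}$.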

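Suppose that the $8s$-th RIC of $A\in\mathbb{C}^{m\times N}$ satisfies

$$\delta_{8s}<\sqrt{\frac{\sqrt{11/3}-1}{4}}\approx0.478.$$

Then, for all $x\in\mathbb{C}^N$ and $e\in\mathbb{C}^m$, the sequence $(x^{(n)})$ defined by $\mathrm{CoSaMP}(y,A,2s)$ with $y=Ax+e$ and $x^{(0)}=0$ satisfies, for any $n\ge0$,

$$\begin{aligned}
\|x-x^{(n)}\|_{\ell^1}&\le C\sigma_s(x)_{\ell^1}+D\sqrt{s}\|e\|_{\ell^2}+2\sqrt{s}\rho^n\|x\|_{\ell^2},\\
\|x-x^{(n)}\|_{\ell^2}&\le\frac{C}{\sqrt{s}}\sigma_s(x)_{\ell^1}+D\|e\|_{\ell^2}+2\rho^n\|x\|_{\ell^2},
\end{aligned}$$

where $0<\rho<1$, $C>0$ and $D>0$ are constants only depending on $\delta_{8s}$.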

See [20, Thms. 6.21 & 6.28], respectively. Here $\sigma_s(x)_{\ell^1}$ is the $\ell^1$-norm best $s$-term approximation error:

$$\sigma_s(x)_{\ell^1}=\min\{\|x-z\|_{\ell^1}: z\text{ is $s$-sparse}\}.$$
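The minimum defining $\sigma_s(x)_{\ell^1}$ is attained at $z=H_s(x)$, so the error is simply the $\ell^1$ norm of all but the $s$ largest-magnitude entries of $x$. A one-function sketch (the name `sigma_s_l1` is hypothetical):

```python
import numpy as np


def sigma_s_l1(x, s):
    """Best s-term approximation error in l1: sum of all but the s largest |x_i|."""
    mags = np.sort(np.abs(x))[::-1]  # magnitudes in decreasing order
    return float(np.sum(mags[s:]))
```

For example, for $x=(3,-5,1,4)$ and $s=2$, the two largest entries $-5$ and $4$ are kept and $\sigma_2(x)_{\ell^1}=3+1=4$.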

## 3 Compressed sensing with local structure

We now consider the local sparsity in levels model.

### 3.1 Local sparsity in levels

The sparsity in levels model divides a vector into separate levels, and then separately measures the sparsity within each one:

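Let $r\ge1$, $\mathbf{M}=(M_1,\ldots,M_r)$, where $1\le M_1<\cdots<M_r=N$, and $\mathbf{s}=(s_1,\ldots,s_r)$, where $s_k\le M_k-M_{k-1}$ for $k=1,\ldots,r$, with $M_0=0$. A vector $x\in\mathbb{C}^N$ is $(\mathbf{s},\mathbf{M})$-sparse if

$$\big|\operatorname{supp}(x)\cap\{M_{k-1}+1,\ldots,M_k\}\big|\le s_k,\quad k=1,\ldots,r. \tag{3}$$

We write $\Sigma_{\mathbf{s},\mathbf{M}}$ for the set of $(\mathbf{s},\mathbf{M})$-sparse vectors. We refer to $\mathbf{M}$ as sparsity levels and to $\mathbf{s}$ as local sparsities, respectively.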
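Condition (3) is easy to check programmatically. The sketch below (hypothetical name `is_sparse_in_levels`) assumes that $\mathbf{M}$ is passed as the list of right endpoints $M_1<\cdots<M_r$ and translates the 1-based level $\{M_{k-1}+1,\ldots,M_k\}$ into the 0-based Python slice `x[M[k-1]:M[k]]`.

```python
import numpy as np


def is_sparse_in_levels(x, s_loc, M):
    """Check condition (3): at most s_k nonzeros in every level k = 1, ..., r.

    M lists the right endpoints M_1 < ... < M_r = N. With M_0 = 0, the 1-based
    index set {M_{k-1}+1, ..., M_k} is the 0-based slice x[M_{k-1}:M_k].
    """
    bounds = np.concatenate(([0], M))
    return all(
        np.count_nonzero(x[bounds[k]:bounds[k + 1]]) <= s_k
        for k, s_k in enumerate(s_loc)
    )


x = np.array([1.0, 0, 0, 0, 2.0, 3.0, 0, 0])
print(is_sparse_in_levels(x, s_loc=(1, 2), M=(4, 8)))  # True
print(is_sparse_in_levels(x, s_loc=(2, 1), M=(4, 8)))  # False
```

The same 3-sparse vector is $((1,2),(4,8))$-sparse but not $((2,1),(4,8))$-sparse: the model constrains where the nonzeros sit, not only how many there are.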

In imaging applications, the levels typically correspond to wavelet scales, in which case (in one dimension) the $k$th level has size roughly $2^k$. In other applications, for instance parallel acquisition problems, the levels are typically equally sized. In function approximation, as we will consider later, it is typical to consider an $s$-sparse vector with a two-level structure based on $\mathbf{M}=(s_1,N)$ and $\mathbf{s}=(s_1,s-s_1)$. That is, the first $s_1$ coefficients are nonzero, and the remaining $s-s_1$ nonzero coefficients can be arbitrarily located within the indices $\{s_1+1,\ldots,N\}$.

### 3.2 Structured sampling or structured recovery

Having defined this structured sparsity model, we need a mechanism to exploit it. Such mechanisms fall into two broad categories. In structured sampling, one designs the measurements to promote the given structure. Conversely, in structured recovery, one designs the decoder to promote the structure. Note that the former is easily achieved, at least in theory. One simply constructs $A$ as the block diagonal matrix whose $k$th block $A_k$, corresponding to the $k$th sparsity level, is an $m_k\times(M_k-M_{k-1})$ standard compressed sensing matrix. Recovery can then be achieved in a level-by-level manner using standard decoders. Of course, such a construction is generally not possible in practice, where the measurements are constrained by the physical sensing device (e.g. Fourier measurements in imaging applications).
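To see why the block diagonal construction decouples the problem, the sketch below builds such a matrix with one Gaussian block per level and checks that the measurements split as $y_k=A_kx_k$. The helper name `block_diagonal_sensing`, the Gaussian blocks and their variance $1/m_k$ are illustrative assumptions; any standard compressed sensing matrix could be used for each block.

```python
import numpy as np


def block_diagonal_sensing(level_sizes, m_loc, rng):
    """Place one Gaussian block A_k of size m_k x (M_k - M_{k-1}) per level on the diagonal."""
    blocks = [rng.standard_normal((mk, nk)) / np.sqrt(mk) for nk, mk in zip(level_sizes, m_loc)]
    A = np.zeros((sum(m_loc), sum(level_sizes)))
    row = col = 0
    for B in blocks:
        A[row:row + B.shape[0], col:col + B.shape[1]] = B
        row += B.shape[0]
        col += B.shape[1]
    return A, blocks


rng = np.random.default_rng(1)
A, blocks = block_diagonal_sensing(level_sizes=[4, 6], m_loc=[3, 5], rng=rng)
x = rng.standard_normal(10)
y = A @ x
# The measurements decouple, y_k = A_k x_k, so each level can be decoded on its own.
assert np.allclose(y[:3], blocks[0] @ x[:4])
assert np.allclose(y[3:], blocks[1] @ x[4:])
```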

### 3.3 The Restricted Isometry Property in Levels

In the sparsity in levels setting, the standard tool for establishing uniform recovery guarantees is the Restricted Isometry Property in Levels[8]:

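Let $\mathbf{M}=(M_1,\ldots,M_r)$ be sparsity levels and $\mathbf{s}=(s_1,\ldots,s_r)$ be local sparsities. The Restricted Isometry Constant in Levels (RICL) $\delta_{\mathbf{s},\mathbf{M}}$ of a matrix $A\in\mathbb{C}^{m\times N}$ is the smallest $\delta\ge0$ such that

$$(1-\delta)\|x\|_{\ell^2}^2\le\|Ax\|_{\ell^2}^2\le(1+\delta)\|x\|_{\ell^2}^2,\quad\forall x\in\Sigma_{\mathbf{s},\mathbf{M}}. \tag{4}$$

If $\delta_{\mathbf{s},\mathbf{M}}<1$, then the matrix $A$ is said to have the Restricted Isometry Property in Levels (RIPL) of order $(\mathbf{s},\mathbf{M})$.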

Analogously to the classical setting, where the RIP is used to guarantee recovery, the RIPL is sufficient for recovery with appropriate decoders. Specifically, if a matrix has the RIPL of suitable order, then stable and robust recovery is ensured for the weighted QCBP decoder

$$\min_{z\in\mathbb{C}^N}\|z\|_{\ell^1_w}\quad\text{subject to}\quad\|Az-y\|_{\ell^2}\le\eta,$$

where $\|z\|_{\ell^1_w}=\sum_{i=1}^N w_i|z_i|$ and the weights are given by $w_i=\sqrt{s/s_k}$ for $i\in\{M_{k-1}+1,\ldots,M_k\}$, with $s=s_1+\cdots+s_r$ [30]. Note that this is a type of structured decoder. One may also employ the unstructured, unweighted QCBP decoder (1), provided the local sparsities do not differ too greatly [8].

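Measurement matrices that satisfy the RIPL can be readily designed. For example, a random matrix $A\in\mathbb{R}^{m\times N}$ with independent normal entries having mean zero and variance $1/m$ has the RIPL of order $(\mathbf{s},\mathbf{M})$ with $\delta_{\mathbf{s},\mathbf{M}}\le\delta$ with probability at least $1-\epsilon$, provided

$$m\ge C\delta^{-2}\left(\sum_{k=1}^r s_k\log\left(\frac{e(M_k-M_{k-1})}{s_k}\right)+\log(\epsilon^{-1})\right),$$

where $C>0$ is a universal constant.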
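Since the constant $C$ is not specified, one meaningful computation is to compare the structure-dependent sum $\sum_k s_k\log(e(M_k-M_{k-1})/s_k)$ with its one-level counterpart $s\log(eN/s)$, leaving out $C\delta^{-2}$ and $\log(\epsilon^{-1})$. By the log-sum inequality the levels term never exceeds the classical one. The helper names and the four-level example below are hypothetical illustrations.

```python
import numpy as np


def levels_term(s_loc, M):
    """sum_k s_k log(e (M_k - M_{k-1}) / s_k); levels with s_k = 0 contribute 0."""
    sizes = np.diff(np.concatenate(([0], M)))
    return float(sum(sk * np.log(np.e * nk / sk) for sk, nk in zip(s_loc, sizes) if sk > 0))


def classical_term(s, N):
    """s log(e N / s): the one-level (r = 1) special case."""
    return levels_term((s,), (N,))


# Four equal levels of size 256 with unbalanced local sparsities (illustrative values)
print(levels_term((20, 4, 4, 4), (256, 512, 768, 1024)))  # about 132.9
print(classical_term(32, 1024))                            # about 142.9
```

The gap widens as the local sparsities become more unbalanced, which is one way to read the benefit of knowing the levels.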

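Measurement conditions have also been shown for subsampled unitary matrices [23]. Specifically, let $U\in\mathbb{C}^{N\times N}$ be unitary. Let $\mathbf{N}=(N_1,\ldots,N_r)$ be a vector of sampling levels, where $1\le N_1<\cdots<N_r=N$, and let $\mathbf{m}=(m_1,\ldots,m_r)$ be a vector of local numbers of measurements, where $m_k\le N_k-N_{k-1}$ and $m=m_1+\cdots+m_r$ (with $N_0=0$). An $(\mathbf{N},\mathbf{m})$-multilevel random sampling scheme is a set $\Omega=\Omega_1\cup\cdots\cup\Omega_r$, where $\Omega_k$ is defined as follows: if $m_k=N_k-N_{k-1}$, then $\Omega_k=\{N_{k-1}+1,\ldots,N_k\}$; otherwise, $\Omega_k$ consists of $m_k$ values chosen uniformly and independently from $\{N_{k-1}+1,\ldots,N_k\}$. Consider the measurement matrix

$$A=P_\Omega DU\in\mathbb{C}^{m\times N},$$

where $P_\Omega$ is the row selector matrix, picking the rows of $DU$ corresponding to indices in $\Omega$, and $D\in\mathbb{C}^{N\times N}$ is a diagonal scaling matrix with entry $D_{ii}=\sqrt{(N_k-N_{k-1})/m_k}$ if $N_{k-1}<i\le N_k$. Then $A$ has the RIPL of order $(\mathbf{s},\mathbf{M})$ with $\delta_{\mathbf{s},\mathbf{M}}\le\delta$ with probability at least $1-\epsilon$, provided

$$m_k\ge C\delta^{-2}(N_k-N_{k-1})\left(\sum_{l=1}^r\mu_{k,l}s_l\right)\left(r\log^2(s)\log(m)\log(N)+\log(\epsilon^{-1})\right),\quad k=1,\ldots,r.$$

Here $s=s_1+\cdots+s_r$, and $\mu_{k,l}$ is the local coherence of the $(k,l)$th subblock of $U=(u_{ij})$, defined as

$$\mu_{k,l}=\max\big\{|u_{ij}|^2: N_{k-1}<i\le N_k,\ M_{l-1}<j\le M_l\big\}.$$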

The main point is that one can use this guarantee, along with some understanding of the local coherences, to design a sampling scheme that exploits the local sparsity in levels structure. An important instance of this setup is the case of Fourier sampling with wavelets [23], in which case $\Omega$ corresponds to the frequencies sampled. Binary sampling with the Walsh–Hadamard transform has also been considered [6]. In both cases, designing the sampling scheme in this way to exploit the underlying structure can lead to significant benefits [4, 29].
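The multilevel random sampling scheme and the matrix $A=P_\Omega DU$ can be sketched as below. This is an illustration under stated assumptions: the helper names are hypothetical, indices are 0-based, the non-saturated bands are drawn uniformly and independently (so repeats may occur), and the unitary DFT matrix stands in for $U$ only to have a concrete example; in the Fourier-wavelet setting of [23], $U$ would also involve the wavelet synthesis.

```python
import numpy as np


def multilevel_sampling(N_levels, m_loc, rng):
    """Draw an (N, m)-multilevel random sampling scheme (0-based indices).

    Band k is {N_{k-1}, ..., N_k - 1}. A saturated band (m_k = band size) is
    taken whole; otherwise m_k indices are drawn uniformly and independently.
    Also returns the matching diagonal entries sqrt((N_k - N_{k-1}) / m_k) of D.
    """
    omega, scale, lo = [], [], 0
    for hi, mk in zip(N_levels, m_loc):
        size = hi - lo
        if mk > 0:
            idx = np.arange(lo, hi) if mk == size else rng.integers(lo, hi, size=mk)
            omega.append(idx)
            scale.append(np.full(mk, np.sqrt(size / mk)))
        lo = hi
    return np.concatenate(omega), np.concatenate(scale)


def subsampled_unitary(U, N_levels, m_loc, rng):
    """A = P_Omega D U: the rows of U indexed by Omega, each scaled by its D entry."""
    omega, scale = multilevel_sampling(N_levels, m_loc, rng)
    return scale[:, None] * U[omega, :]


N = 16
U = np.fft.fft(np.eye(N), norm="ortho")  # unitary DFT matrix (illustrative choice of U)
rng = np.random.default_rng(0)
A = subsampled_unitary(U, N_levels=(4, 8, 16), m_loc=(4, 2, 3), rng=rng)
print(A.shape)  # (9, 16)
```

The scaling by $D$ makes $A^*A$ equal to the identity in expectation; when every band is saturated, $A$ coincides with $U$.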

## 4 IHT and CoSaMP in Levels

We now introduce structured recovery algorithms for the sparsity in levels model, based on IHT and CoSaMP respectively.

### 4.1 Definitions

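Fix sparsity levels $\mathbf{M}=(M_1,\ldots,M_r)$. Note that any vector $x\in\mathbb{C}^N$ can be written uniquely as $x=x_1+\cdots+x_r$, where $x_k\in\mathbb{C}^N$ with $\operatorname{supp}(x_k)\subseteq\{M_{k-1}+1,\ldots,M_k\}$. Now let $\mathbf{s}=(s_1,\ldots,s_r)$ be local sparsities. For $x\in\mathbb{C}^N$, we write $L_{\mathbf{s},\mathbf{M}}(x)$ for the set

$$L_{\mathbf{s},\mathbf{M}}(x)=\bigcup_{k=1}^r L_{s_k}(x_k).$$

In other words, this is the index set consisting, in each level $k$, of the $s_k$ largest absolute entries of $x$ in that level. With this in hand, we define the hard thresholding in levels operator $H_{\mathbf{s},\mathbf{M}}:\mathbb{C}^N\to\mathbb{C}^N$ by

$$H_{\mathbf{s},\mathbf{M}}(x)=(H_{\mathbf{s},\mathbf{M}}(x)_i)_{i=1}^N,\qquad H_{\mathbf{s},\mathbf{M}}(x)_i=\begin{cases}x_i & i\in L_{\mathbf{s},\mathbf{M}}(x)\\ 0 & \text{otherwise},\end{cases}\qquad x=(x_i)_{i=1}^N\in\mathbb{C}^N.$$

That is, $H_{\mathbf{s},\mathbf{M}}(x)$ is the vector consisting of the $s_k$ largest entries of $x$ in each level $k=1,\ldots,r$, with all other entries set to zero.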
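The hard thresholding in levels operator simply applies $H_{s_k}$ inside each level. A minimal NumPy sketch, with the hypothetical name `hard_threshold_levels`, $\mathbf{M}$ given as right endpoints, and ties broken by position:

```python
import numpy as np


def hard_threshold_levels(x, s_loc, M):
    """H_{s,M}(x): in each level k keep the s_k largest-magnitude entries of x."""
    out = np.zeros_like(x)
    bounds = np.concatenate(([0], M))
    for k, s_k in enumerate(s_loc):
        lo, hi = bounds[k], bounds[k + 1]
        # Positions of the s_k largest |x_i| inside level k (ties broken by position)
        idx = lo + np.argsort(-np.abs(x[lo:hi]), kind="stable")[:s_k]
        out[idx] = x[idx]
    return out


x = np.array([5.0, 4.0, 3.0, 0.1, 0.2, 0.3])
# Keeps 5.0 in level 1 and 0.2, 0.3 in level 2
print(hard_threshold_levels(x, s_loc=(1, 2), M=(3, 6)))
```

With the same total sparsity 3, the classical operator $H_3$ would instead keep $5$, $4$ and $3$, all from the first level, which is exactly the behavior the levels version is designed to prevent.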

The levels versions of the classical IHT and CoSaMP algorithms now follow simply by replacing the thresholding steps by the above levels versions. Specifically, IHT in Levels (IHTL) is defined by

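Function $\hat{x}=\mathrm{IHTL}(y,A,\mathbf{s},\mathbf{M})$
Inputs: $y\in\mathbb{C}^m$, $A\in\mathbb{C}^{m\times N}$, local sparsities $\mathbf{s}$, sparsity levels $\mathbf{M}$
Initialization: $x^{(0)}\in\mathbb{C}^N$ (e.g. $x^{(0)}=0$)
Iterate: Until some stopping criterion is met at $n=\bar{n}$, set

$$x^{(n+1)}=H_{\mathbf{s},\mathbf{M}}\big(x^{(n)}+A^*(y-Ax^{(n)})\big)$$

Output: $\hat{x}=x^{(\bar{n})}$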

and CoSaMP in Levels (CoSaMPL) is defined by

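Function $\hat{x}=\mathrm{CoSaMPL}(y,A,\mathbf{s},\mathbf{M})$
Inputs: $y\in\mathbb{C}^m$, $A\in\mathbb{C}^{m\times N}$, local sparsities $\mathbf{s}$, sparsity levels $\mathbf{M}$
Initialization: $x^{(0)}\in\mathbb{C}^N$ (e.g. $x^{(0)}=0$)
Iterate: Until some stopping criterion is met at $n=\bar{n}$, set

$$\begin{aligned}
U^{(n+1)}&=\operatorname{supp}(x^{(n)})\cup L_{2\mathbf{s},\mathbf{M}}\big(A^*(y-Ax^{(n)})\big),\\
u^{(n+1)}&\in\arg\min_{z\in\mathbb{C}^N}\big\{\|y-Az\|_{\ell^2}:\operatorname{supp}(z)\subseteq U^{(n+1)}\big\},\\
x^{(n+1)}&=H_{\mathbf{s},\mathbf{M}}\big(u^{(n+1)}\big)
\end{aligned}$$

Output: $\hat{x}=x^{(\bar{n})}$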
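Both levels algorithms differ from their classical counterparts only in the index-selection and thresholding steps, so a sketch can share those helpers. The following NumPy version is a minimal illustration, not the authors' implementation: the function names, the zero initialization and the relative-increment stopping rule are our own assumptions, and $L_{2\mathbf{s},\mathbf{M}}$ is obtained by doubling every local sparsity (capped automatically at the level size by slicing).

```python
import numpy as np


def levels_indices(x, s_loc, M):
    """L_{s,M}(x): in each level k, the indices of the s_k largest |x_i|."""
    bounds = np.concatenate(([0], M))
    parts = [
        bounds[k] + np.argsort(-np.abs(x[bounds[k]:bounds[k + 1]]), kind="stable")[:s_k]
        for k, s_k in enumerate(s_loc)
    ]
    return np.concatenate(parts)


def hard_threshold_levels(x, s_loc, M):
    """H_{s,M}(x): keep x on L_{s,M}(x) and zero it elsewhere."""
    out = np.zeros_like(x)
    idx = levels_indices(x, s_loc, M)
    out[idx] = x[idx]
    return out


def _converged(x_new, x, tol):
    # Relative-increment stopping rule (an assumed criterion)
    return np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x_new), 1e-30)


def ihtl(y, A, s_loc, M, tol=1e-8, max_iter=1000):
    """IHTL: x <- H_{s,M}(x + A^*(y - A x)), starting from x = 0."""
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    for _ in range(max_iter):
        x_new = hard_threshold_levels(x + A.conj().T @ (y - A @ x), s_loc, M)
        if _converged(x_new, x, tol):
            return x_new
        x = x_new
    return x


def cosampl(y, A, s_loc, M, tol=1e-8, max_iter=1000):
    """CoSaMPL: CoSaMP with L_{2s} and H_s replaced by their levels versions."""
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    doubled = [2 * s_k for s_k in s_loc]
    for _ in range(max_iter):
        proxy = A.conj().T @ (y - A @ x)
        U = np.union1d(np.flatnonzero(x), levels_indices(proxy, doubled, M))
        u = np.zeros_like(x)
        u[U] = np.linalg.lstsq(A[:, U], y, rcond=None)[0]
        x_new = hard_threshold_levels(u, s_loc, M)
        if _converged(x_new, x, tol):
            return x_new
        x = x_new
    return x
```

As with IHT, the unit step in `ihtl` may require rescaling $A$ for general matrices.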

### 4.2 Experiments

We now present a series of numerical experiments. Our aim is to demonstrate the benefits that the structure-promoting IHTL and CoSaMPL algorithms bring for sparse in levels vectors over the standard IHT and CoSaMP algorithms. To do this, we consider phase transition plots.

For each fixed total sparsity $s$ and number of measurements $m$, we generate an $(\mathbf{s},\mathbf{M})$-sparse vector $x$ of length $N$ with random support and standard normal random entries. As we shall see below, in each experiment the local sparsities $\mathbf{s}$ are tied to the total sparsity $s=s_1+\cdots+s_r$. We then compute its reconstruction $\hat{x}$ using either IHT, IHTL, CoSaMP or CoSaMPL and calculate the relative error $\|x-\hat{x}\|_{\ell^2}/\|x\|_{\ell^2}$. This is repeated for 50 trials, and the empirical success probability is calculated, where a recovery is deemed successful if its relative error is below a fixed threshold. The measurement matrix $A\in\mathbb{R}^{m\times N}$ is a Gaussian random matrix (independent, normally distributed entries with mean zero and variance $1/m$). Each algorithm is halted when either the relative difference between $x^{(n)}$ and $x^{(n+1)}$ is less than a prescribed tolerance or $n$ exceeds 1000 iterations. Moreover, we choose the initialization $x^{(0)}=0$.
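The trial loop described above can be sketched as follows. The success threshold is not stated in the text, so it is left as a required argument; the helper names, the `decoder(y, A)` calling convention and the $1/m$ variance normalization are assumptions made for illustration.

```python
import numpy as np


def random_sparse_in_levels(s_loc, M, rng):
    """(s, M)-sparse vector: s_k random positions per level, standard normal values."""
    bounds = np.concatenate(([0], M))
    x = np.zeros(bounds[-1])
    for k, s_k in enumerate(s_loc):
        pos = rng.choice(np.arange(bounds[k], bounds[k + 1]), size=s_k, replace=False)
        x[pos] = rng.standard_normal(s_k)
    return x


def success_probability(decoder, s_loc, M, m, threshold, trials=50, seed=0):
    """Fraction of trials whose relative l2 error is below `threshold`.

    `decoder(y, A)` must return an estimate of x; A has i.i.d. N(0, 1/m) entries.
    """
    rng = np.random.default_rng(seed)
    N = M[-1]
    hits = 0
    for _ in range(trials):
        x = random_sparse_in_levels(s_loc, M, rng)
        A = rng.standard_normal((m, N)) / np.sqrt(m)
        x_hat = decoder(A @ x, A)
        hits += int(np.linalg.norm(x - x_hat) / np.linalg.norm(x) < threshold)
    return hits / trials
```

Sweeping $m$ and $s$ over a grid and recording `success_probability` for each decoder yields phase transition plots of the kind used in this section.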

For IHT and IHTL, in order to obtain better performance, we rescale $A$ and $y$ by a factor $\tau>0$ and run the algorithm on $(\tau y,\tau A)$. Note that this rescaling corresponds to changing the step size of the Landweber iteration before thresholding, since $(\tau A)^*(\tau y-\tau Ax)=\tau^2A^*(y-Ax)$. Convergence of IHT is guaranteed under the sufficient condition $\|A\|_2<1$ [9]. Using random matrix theory [33], one can bound $\|A\|_2$ with high probability in the cases considered here, which determines the choice of the scaling factor $\tau$. Other than this, our results consider vanilla versions of all algorithms: our goal is to examine the benefits of sparsity in levels over classical sparsity, rather than the intrinsic performance of the decoders themselves. As a general rule, CoSaMP outperforms IHT in these experiments.
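The equivalence between rescaling and changing the step size can be checked numerically: one unit-step IHT iteration on $(\tau y,\tau A)$ coincides with an iteration of step size $\tau^2$ on $(y,A)$. In the sketch below, the choice $\tau=0.99/\|A\|_2$ (computed with `numpy.linalg.norm(A, 2)`) is a hypothetical one that enforces the sufficient condition $\|\tau A\|_2<1$; it is not necessarily the factor used in the experiments.

```python
import numpy as np


def hard_threshold(x, s):
    """H_s(x): keep the s largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(-np.abs(x), kind="stable")[:s]
    out[idx] = x[idx]
    return out


def iht_step(x, y, A, s, step=1.0):
    """One IHT iteration with a general Landweber step size."""
    return hard_threshold(x + step * (A.conj().T @ (y - A @ x)), s)


rng = np.random.default_rng(0)
m, N, s = 40, 100, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)
y = rng.standard_normal(m)
x = hard_threshold(rng.standard_normal(N), s)

# Hypothetical choice: tau slightly below 1 / ||A||_2, so that ||tau A||_2 < 1
tau = 0.99 / np.linalg.norm(A, 2)
# A unit-step iteration on (tau y, tau A) equals a step of size tau**2 on (y, A)
assert np.allclose(iht_step(x, tau * y, tau * A, s), iht_step(x, y, A, s, step=tau**2))
```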

Our first experiment, shown in Figure 1, considers the two-level case. The levels are chosen to be of equal size $N/2$, and we use various different local sparsities. As expected, when the local sparsities are balanced, there is no benefit to either IHTL or CoSaMPL over IHT or CoSaMP. However, as the local sparsities become more unbalanced, one starts to see benefits. In the most unbalanced case considered, CoSaMPL achieves successful recovery with probability one from roughly 65 measurements, while CoSaMP requires roughly 90 measurements.

In Figure 2, we consider four levels, again equally sized, and compare the IHT and CoSaMP algorithms with the IHTL and CoSaMPL algorithms for several different distributions of the local sparsities across the levels. In this experiment we use two versions of IHTL and CoSaMPL, based on two levels or four levels. The two-level algorithms use levels obtained by merging adjacent levels of the four-level structure, with the corresponding local sparsities summed. This is because a vector that is sparse in levels with respect to the four levels is also sparse in levels with respect to the merged two levels. As one would expect, the two-level algorithms give no benefit over the original (one-level) algorithms. Yet, as in the previous experiment, we see a significant benefit from the four-level algorithms. This figure considers several fixed values of the total sparsity $s$. In Figure 3 we give the full phase transitions for CoSaMP and CoSaMPL, where CoSaMPL exhibits a significantly improved transition curve. Note that CoSaMPL achieves probability-one recovery in the configurations where each level either has no nonzero entries or has all of its entries nonzero. CoSaMPL exploits this structure, whereas CoSaMP cannot.
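Merging adjacent levels is a purely combinatorial operation: the merged local sparsity is the sum of the original ones, and the merged right endpoint is that of the last original level in the group. The sketch below illustrates this with hypothetical helper names and illustrative numbers; which levels were paired in the actual experiment is not stated here, so the grouping is an input.

```python
import numpy as np


def merge_levels(s_loc, M, groups):
    """Coarsen (s, M): groups[j] consecutive original levels form merged level j."""
    new_s, new_M, k = [], [], 0
    for g in groups:
        new_s.append(sum(s_loc[k:k + g]))  # local sparsities add up
        k += g
        new_M.append(M[k - 1])             # right endpoint of the last merged level
    return tuple(new_s), tuple(new_M)


def is_sparse_in_levels(x, s_loc, M):
    """Condition (3) with 0-based slices x[M_{k-1}:M_k]."""
    bounds = np.concatenate(([0], M))
    return all(np.count_nonzero(x[bounds[k]:bounds[k + 1]]) <= s_k for k, s_k in enumerate(s_loc))


s4, M4 = (8, 4, 2, 0), (64, 128, 192, 256)   # illustrative four-level setup
s2, M2 = merge_levels(s4, M4, groups=(2, 2))  # ((12, 2), (128, 256))
```

Any vector satisfying the four-level constraints automatically satisfies the merged ones, but not conversely, which is why the coarser model carries less information for the decoder.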

Next, in Figure 4, we consider a rather different setup. Here, given $s$, we consider two-level sparsity with $\mathbf{M}=(s_1,N)$ and $\mathbf{s}=(s_1,s-s_1)$ for some $s_1<s$. In other words, the $s$-sparse vectors that are generated are nonzero in their first $s_1$ entries, with the remaining $s-s_1$ nonzero entries arbitrarily located among the indices $\{s_1+1,\ldots,N\}$. For succinctness, we consider only CoSaMP in this experiment.

The purpose of this experiment is to model a typical scenario in compressed sensing with wavelet sparsifying transforms, where the first wavelet coefficients are ‘saturated’, i.e. all nonzero. We discuss this further in the next section. For now, we simply notice the benefits of CoSaMPL over standard CoSaMP: in one of the configurations of Figure 4, CoSaMP requires 33% more measurements than CoSaMPL to achieve successful recovery.

### 4.3 Function approximation via compressed sensing

Finally, we test the proposed IHTL and CoSaMPL algorithms in the context of function approximation via compressed sensing [3]. We aim to approximate a function $f$ defined on $[0,1]$. In this context, it is convenient to adopt the terminology of encoders and decoders. An encoder is a linear mapping $\mathcal{E}$ that takes $f$ to a vector of $m$ measurements in $\mathbb{C}^m$, corresponding to the measurement phase. A decoder is a mapping $\Delta$ from $\mathbb{C}^m$ back to a space of functions, corresponding to the recovery phase. We focus on the class of piecewise $\alpha$-Hölder functions, defined as the set of functions with a finite number of discontinuities that are $\alpha$-Hölder continuous on each interval of smoothness.

We compute an approximation $\tilde{f}=\Delta(\mathcal{E}(f))$ of $f$. Our goal is to find encoder-decoder pairs $(\mathcal{E},\Delta)$ such that the approximation error decays at a rate as close as possible to the theoretically optimal rate $\mathcal{O}(m^{-\alpha})$ [25, 16, 3]. Multilevel Fourier sampling strategies have recently been shown to achieve a near-optimal approximation rate when combined with Daubechies wavelet approximation and a decoder based on $\ell^1$ minimization [3] (more precisely, the so-called weighted square-root LASSO decoder [1]). This near-optimal result relies heavily on the sparsity in levels structure. In fact, the specific pattern of sparsities in levels of wavelet coefficients that leads to the optimal approximation error rate is captured by devising an ad hoc multilevel sampling strategy which saturates the lower frequency bands and increasingly subsamples the higher ones.

The aim of the numerical experiments performed in this section is to investigate the different roles played by structure in the encoder and in the decoder. Indeed, the existence of optimal and near-optimal encoder-decoder pairs has only been proved when the structure in levels is exploited in the encoder but not in the decoder. For this reason, we consider a structure-agnostic and a structure-promoting encoder, based on random Gaussian sampling and on multilevel Fourier sampling, respectively. Let $\{\phi_j\}_{j\ge1}$ be the Haar wavelet basis, fix a truncation level $N$, and assume, for the sake of simplicity, that $m$ and $N$ are powers of 2 with $m\le N$. The two encoders are defined as follows:

Gaussian encoder (structure agnostic)

The action of $\mathcal{E}$ “shuffles” the wavelet coefficients of $f$ via random Gaussian sampling. Namely, $\mathcal{E}(f)=A\,(\langle f,\phi_j\rangle)_{j=1}^N$, where $A$ is an $m\times N$ matrix whose entries are i.i.d. Gaussian random variables with mean $0$ and variance $1/m$.

Fourier encoder (structure promoting)

It has the form $\mathcal{E}(f)=(\hat{f}(\omega))_{\omega\in\Omega}$, where $\hat{f}$ is the Fourier transform of $f$. The frequency indices $\Omega$ are sampled according to an $(\mathbf{N},\mathbf{m})$-multilevel random sampling scheme (see Section 3.3), roughly defined as follows. A first portion of the samples is used to saturate the lowest dyadic frequency bands. The remaining samples are evenly divided among the higher dyadic frequency bands via subsampling.¹

¹ More precisely, the set of frequency indices is partitioned into dyadic bands. Ordering these bands by increasing frequency and considering the first ones, one obtains the vector of sampling levels $\mathbf{N}$. The Fourier encoder saturates the lowest frequency bands, i.e., $m_k=N_k-N_{k-1}$ for these bands. In the higher bands, the local numbers of measurements $m_k$ are chosen so as to divide the remaining budget evenly, where, in the last frequency band, $m_k$ is adjusted in order to reach a total budget of exactly $m$ measurements. In order to enforce symmetry, for every band we pick samples uniformly at random from one frequency semiband and choose the frequencies in the opposite semiband in a symmetric way [3].

We test these two encoders in combination with five possible decoders, namely Basis Pursuit (BP) (i.e., (1) with $\eta=0$), IHT, CoSaMP, IHTL, and CoSaMPL. The first three decoders are structure agnostic and the last two are structure promoting. Since we want all the decoders to depend on $m$ only, we fix a relation between $m$ and their input parameters ($s$ for IHT and CoSaMP, and $(\mathbf{s},\mathbf{M})$ for IHTL and CoSaMPL). This leads to the definition of an auxiliary parameter $C>0$ such that

$$s=\operatorname{round}(m/C), \tag{5}$$

which is used as input for IHT and CoSaMP. Moreover, we consider a two-level structure defined by

$$\mathbf{M}=(s/2,N),\qquad\mathbf{s}=(s/2,s/2),$$

which are employed as input parameters for IHTL and CoSaMPL. To numerically solve BP, we use the function spg_bp from the Matlab toolbox SPGL1 [32, 31]. IHT(L) and CoSaMP(L) are run with a fixed tolerance on the relative increment and a maximum of 1000 iterations. Moreover, we use the same rescaling for IHT(L) as in Section 4.2 and let $x^{(0)}=0$.
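The map from the number of measurements $m$ and the auxiliary parameter $C$ to the decoder inputs in (5) is tiny but has two details worth pinning down. In this sketch (hypothetical name `decoder_parameters`), rounding is half-up, matching MATLAB's `round` for positive arguments rather than Python's round-half-to-even, and an odd $s$ is split with the extra unit in the second level; both are our assumptions, since the text only writes $s/2$.

```python
import math


def decoder_parameters(m, C, N):
    """Map (m, C) to the sparsity s of (5) and the two-level (M, s) used by IHTL/CoSaMPL."""
    s = math.floor(m / C + 0.5)  # round half up, like MATLAB's round for m / C > 0
    half = s // 2                # equals s/2 whenever s is even
    return s, (half, N), (half, s - half)  # s, M = (s/2, N), local sparsities (s/2, s/2)


print(decoder_parameters(256, 4, 1024))  # (64, (32, 1024), (32, 32))
```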

We consider a piecewise smooth function with 10 discontinuities:

$$f(x)=\sum_{i=1}^{10}(-1)^{\mathrm{mod}(i,5)}\,x^{\mathrm{mod}(i,3)}\,\mathrm{sign}\big(x-(1.3)^{i-9}\big),\qquad 0\le x\le1. \tag{6}$$

Its plot is shown in Figure 5.
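For reproducibility, the test function (6) can be evaluated in vectorized form. This sketch reads the jump locations as $(1.3)^{i-9}$, $i=1,\ldots,10$, which is our reconstruction of the extracted formula, and uses NumPy's conventions $0^0=1$ and $\mathrm{sign}(0)=0$.

```python
import numpy as np


def f(x):
    """Piecewise smooth test function (6), with jump locations read as 1.3**(i - 9)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for i in range(1, 11):
        # (-1)^mod(i,5) * x^mod(i,3) * sign(x - 1.3^(i-9))
        total = total + (-1.0) ** (i % 5) * x ** (i % 3) * np.sign(x - 1.3 ** (i - 9))
    return total


print(f(0.0), f(1.0))  # 1.0 -1.0
```

Sampling `f` on a fine grid in $[0,1]$ reproduces the kind of plot shown in Figure 5.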

We compare all the encoder-decoder pairs for a fixed truncation level $N$ and a range of values of $m$. We plot the relative approximation error as a function of $m$ in Figures 6 and 7 for several values of the auxiliary parameter $C$. The results are averaged over 25 runs.

In all the experiments, adding the structure in levels to IHT or CoSaMP leads to improved or, in the worst case, comparable approximation accuracy. In particular, with the structure-promoting Fourier encoder, neither IHTL nor CoSaMPL is able to outperform BP or IHT, but we observe that CoSaMPL is more robust than CoSaMP with respect to the choice of $C$. This leads to an interesting conclusion: enforcing structure via the encoder and the decoder at the same time is seemingly redundant and does not lead to any additional benefit. We also note that CoSaMP and CoSaMPL are more sensitive to variations of the auxiliary parameter $C$ than IHT and IHTL. On the other hand, with the structure-agnostic Gaussian encoder, we consistently observe the benefits of promoting the sparsity in levels structure in the decoder. Indeed, IHTL and CoSaMPL consistently outperform their unstructured variants and BP.

## 5 Conclusions and future work

We proposed two variants of the IHT and CoSaMP algorithms that promote sparse in levels signals, called IHTL and CoSaMPL, respectively. Our numerical experiments show that IHTL and CoSaMPL outperform their unstructured variants when the unknown signal is sparse in levels, especially when the local sparsities are not uniformly distributed among the levels. The benefits of using a sparsity in levels decoder have also been shown in the case of function approximation via compressed sensing, which originally motivated this work. When a structure-promoting encoder based on multilevel Fourier sampling is employed, sparse in levels decoders are only able to achieve the same accuracy as $\ell^1$ minimization, but not to outperform it. However, the CoSaMPL and IHTL decoders are able to outperform CoSaMP, IHT, and $\ell^1$ minimization when a structure-agnostic encoder based on random Gaussian sampling is employed.

The theoretical analysis of stable and robust recovery guarantees for IHTL and CoSaMPL will be presented in a subsequent paper. From the numerical viewpoint, open problems include the study of adaptive strategies to update the step size in IHTL [11, 12] and devising recipes for the automatic choice of the auxiliary parameter $C$ used in (5). Moreover, a further topic of investigation is the generalization of other greedy and iterative methods, such as orthogonal matching pursuit, to the sparsity in levels case.

###### Acknowledgements.

The authors extend their thanks to Kateryna Melnykova for useful suggestions and comments. S.B. acknowledges the support of the PIMS Postdoctoral Training Centre in Stochastics. This work was supported by the PIMS CRG in “High-dimensional Data Analysis” and by NSERC through grant R611675.

## References

• [1] B. Adcock, A. Bao, and S. Brugiapaglia (2019) Correcting for unknown errors in sparse high-dimensional function approximation. Numer. Math. (3), pp. 667–711. Cited by: §4.3.
• [2] B. Adcock, A. Bao, J. D. Jakeman, and A. Narayan (2018) Compressed sensing with sparse corruptions: Fault-tolerant sparse collocation approximations. SIAM/ASA J. Uncertain. Quantif. 6 (4), pp. 1424–1453. Cited by: §1.
• [3] B. Adcock, S. Brugiapaglia, and M. King–Roskamp (2019) Do log factors matter? on optimal wavelet approximation and the foundations of compressed sensing. arXiv:1905.10028. Cited by: §1, §4.3, §4.3, footnote 1.
• [4] B. Adcock, A. C. Hansen, C. Poon, and B. Roman (2017) Breaking the coherence barrier: a new theory for compressed sensing. Forum Math. Sigma 5. Cited by: §1, §1, §3.3.
• [5] B. Adcock, A. C. Hansen, and B. Roman (2015) The quest for optimal sampling: computationally efficient, structure-exploiting measurements for compressed sensing. In Compressed Sensing and Its Applications, Cited by: §1.
• [6] V. Antun, B. Adcock, and A. C. Hansen (2019) Uniform recovery in infinite-dimensional compressed sensing and applications to structured binary sampling. arXiv:1905.00126. Cited by: §3.3.
• [7] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde (2010) Model-based compressive sensing. IEEE Trans. Inform. Theory 56 (4), pp. 1982–2001. Cited by: §1, §1, §2.1, §2.1.
• [8] A. Bastounis and A. C. Hansen (2017) On the absence of uniform recovery in many real-world applications of compressed sensing and the restricted isometry property and nullspace property in levels. SIAM J. Imaging Sci. 10 (1), pp. 335–371. Cited by: §3.3, §3.3.
• [9] T. Blumensath and M. E. Davies (2008) Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, pp. 629–654. Cited by: §2.1, §4.2.
• [10] T. Blumensath and M. E. Davies (2009) Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27, pp. 265–274. Cited by: §2.1.
• [11] T. Blumensath and M. E. Davies (2010) Normalized iterative hard thresholding: guaranteed stability and performance. IEEE J. Sel. Top. Signal Process. 4 (2). Cited by: §2.1, §5.
• [12] T. Blumensath (2012) Accelerated iterative hard thresholding. Signal Process. 92, pp. 752–756. Cited by: §2.1, §5.
• [13] S. Brugiapaglia and B. Adcock (2018) Robustness to unknown error in sparse regularization. IEEE Trans. Inform. Theory 64 (10), pp. 6638–6661. Cited by: §2.1.
• [14] I.–Y. Chun and B. Adcock (2017) Compressed sensing and parallel acquisition. IEEE Trans. Inform. Theory 63 (8), pp. 4860–4882. Cited by: §1.
• [15] I. Y. Chun and B. Adcock (2016) Optimal sparse recovery for multi-sensor measurements. In IEEE Inf. Theory Workshop (ITW) 2016, Cited by: §1.
• [16] R. A. DeVore, G. Kyriazis, D. Leviatan, and V. M. Tikhomirov (1993) Wavelet compression and nonlinear $n$-widths. Adv. Comput. Math. 1 (2), pp. 197–214. Cited by: §4.3.
• [17] D. Dorsch and H. Rauhut (2016) Refined analysis of sparse MIMO radar. J. Fourier Anal. Appl., pp. 1–45. Cited by: §1.
• [18] M. F. Duarte and Y. C. Eldar (2011) Structured compressed sensing: from theory to applications. IEEE Trans. Signal Process. 59 (9), pp. 4053–4085. Cited by: §1.
• [19] M. Elad, B. Matalon, J. Shtok, and M. Zibulevsky (2007) A wide-angle view at iterated shrinkage algorithms. In Wavelets XII, Vol. 6701, pp. 670102. Cited by: §2.1.
• [20] S. Foucart and H. Rauhut (2013) A mathematical introduction to compressive sensing. Birkhauser. Cited by: §1, §2.2.
• [21] C. Hegde, P. Indyk, and L. Schmidt (2015) Approximation algorithms for model-based compressive sensing. IEEE Trans. Inform. Theory 61 (9). Cited by: §2.1, §2.1.
• [22] L. Landweber (1951) An iterative formula for Fredholm integrals of the first kind. Am. J. Math. 73, pp. 615–624. Cited by: §2.1.
• [23] C. Li and B. Adcock (2019) Compressed sensing with local structure: uniform recovery guarantees for the sparsity in levels class. Appl. Comput. Harmon. Anal. 46, pp. 453–477. Cited by: §1, §3.3.
• [24] X. Li (2013) Compressed sensing and matrix completion with a constant proportion of corruptions. Constr. Approx. 37, pp. 73–99. Cited by: §1.
• [25] S. G. Mallat (2009) A Wavelet Tour of Signal Processing: The Sparse Way. 3 edition, Academic Press. Cited by: §4.3.
• [26] D. Needell and J. Tropp (2008) CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26 (3), pp. 301–321. Cited by: §2.1.
• [27] D. Needell and R. Vershynin (2009) Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Found. Comput. Math. 9 (3), pp. 317–334. Cited by: §2.1.
• [28] D. Needell and R. Vershynin (2010) Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Sel. Top. Signal Process. 4 (2), pp. 310–316. Cited by: §2.1.
• [29] B. Roman, A. C. Hansen, and B. Adcock (2014) On asymptotic structure in compressed sensing. arXiv:1406.4178. Cited by: §1, §3.3.
• [30] Y. Traonmilin and R. Gribonval (2018) Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all. Appl. Comput. Harm. Anal. 45 (1), pp. 170–205. Cited by: §1, §1, §3.3.
• [31] E. van den Berg and M. P. Friedlander (2007) SPGL1: a solver for large-scale sparse reconstruction. Cited by: §4.3.
• [32] E. van den Berg and M. P. Friedlander (2008) Probing the pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31 (2), pp. 890–912. Cited by: §4.3.
• [33] R. Vershynin (2018) High-dimensional probability: an introduction with applications in data science. Vol. 47, Cambridge University Press. Cited by: §4.2.
• [34] P. Wojtaszczyk (2010) Stability and instance optimality for Gaussian measurements in compressed sensing. Found. Comput. Math. 10 (1), pp. 1–13. Cited by: §2.1.