# Accuracy guaranties for ℓ_1 recovery of block-sparse signals

We introduce a general framework to handle structured models (sparse and block-sparse with possibly overlapping blocks). We discuss new methods for their recovery from incomplete observations, corrupted with deterministic and stochastic noise, using block-ℓ_1 regularization. While the current theory provides promising bounds for the recovery errors under a number of different, yet mostly hard-to-verify conditions, our emphasis is on verifiable conditions on the problem parameters (sensing matrix and the block structure) which guarantee accurate recovery. Verifiability of our conditions not only leads to efficiently computable bounds for the recovery error but also allows us to optimize these error bounds with respect to the method parameters, and therefore to construct estimators with improved statistical properties. To justify our approach, we also provide an oracle inequality, which links the properties of the proposed recovery algorithms and the best estimation performance. Furthermore, utilizing these verifiable conditions, we develop a computationally cheap alternative to block-ℓ_1 minimization, the non-Euclidean Block Matching Pursuit algorithm. We close by presenting a numerical study to investigate the effect of different block regularizations and demonstrate the performance of the proposed recoveries.

## 1 Introduction

### The problem

Our goal in this paper is to estimate a linear transform $Bx\in\mathbb{R}^N$ of a vector $x\in\mathbb{R}^n$ from the observations

$$y=Ax+u+\xi. \tag{1}$$

Here $A$ is a given $m\times n$ sensing matrix, $B$ is a given $N\times n$ matrix, and $u+\xi$ is the observation error; in this error, $u$ is an unknown nuisance known to belong to a given compact convex set $U\subset\mathbb{R}^m$ symmetric w.r.t. the origin, and $\xi$ is random noise with known distribution.

We assume that the space $\mathbb{R}^N$ where $Bx$ lives is represented as $\mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_K}$, so that a vector $w\in\mathbb{R}^N$ is a block vector: $w=[w[1];\dots;w[K]]$ with blocks $w[k]\in\mathbb{R}^{n_k}$, $k\le K$. (We use MATLAB notation: $[A,B]$ is the horizontal concatenation of matrices $A$, $B$ of common height, while $[A;B]$ is the vertical concatenation of matrices of common width. All vectors are column vectors.) In particular, $Bx=[B_1x;\dots;B_Kx]$ with $n_k\times n$ matrices $B_k$, $k\le K$. While we do not assume that the vector $x$ is sparse in the usual sense, we do assume that the linear transform $Bx$ to be estimated is $s$-block sparse, meaning that at most a given number, $s$, of the blocks $B_kx$, $k\le K$, are nonzero.

The recovery routines we intend to consider are based on block-$\ell_1$ minimization, that is, the estimate of $w=Bx$ is $\hat w(y)=B\hat x(y)$, where $\hat x(y)$ is obtained by minimizing the norm $\sum_{k=1}^K\|(Bz)[k]\|_{(k)}$ over signals $z\in\mathbb{R}^n$ with $Az$ “fitting,” in a certain precise sense, the observations $y$. Above, $\|\cdot\|_{(k)}$ are norms, given in advance, on the spaces $\mathbb{R}^{n_k}$ where the blocks of $Bx$ take their values.

In the sequel we refer to the collection $S=(B,n_1,\dots,n_K,\|\cdot\|_{(1)},\dots,\|\cdot\|_{(K)})$, given in advance, as the representation structure (r.s.). Given such a representation structure $S$ and a sensing matrix $A$, our ultimate goal is to understand how well one can recover the $s$-block-sparse transform $Bx$ by appropriately implementing block-$\ell_1$ minimization.

### Related Compressed Sensing research

Our situation and goal form a straightforward extension of the usual sparsity/block-sparsity framework of Compressed Sensing. Indeed, the standard representation structure with $B=I_n$, $K=n$, $n_k=1$ and $\|\cdot\|_{(k)}=|\cdot|$, $k\le K$, leads to the standard Compressed Sensing setting: recovering a sparse signal $x\in\mathbb{R}^n$ from its noisy observations (1) via $\ell_1$ minimization. The case of a nontrivial block structure $\{n_k,\|\cdot\|_{(k)}\}_{k=1}^K$ and $B=I_n$ is generally referred to as block-sparse, and has been considered in numerous recent papers. Block sparsity (with $B=I_n$) arises naturally (see, e.g., EldarMishali09 and references therein) in a number of applications such as multi-band signals, measurements of gene expression levels or estimation of multiple measurement vectors sharing a joint sparsity pattern. Several methods of estimation and selection extending the “plain” $\ell_1$ minimization to block sparsity have been proposed and investigated recently. Most of the related research so far has focused on block regularization schemes, namely group Lasso recovery of the form

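$$\hat x(y)\in\operatorname*{Argmin}_{z=[z[1];\dots;z[K]]\in\mathbb{R}^n=\mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_K}}\Big\{\|Az-y\|_2^2+\lambda\sum_{k=1}^K\|z[k]\|_2\Big\}$$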

(here $\|z[k]\|_2$ is the Euclidean norm of the block). In particular, the literature on “plain Lasso” (the case of $n_1=\dots=n_K=1$) has an important counterpart on group Lasso; see, for example, Bach08GroupLasso, Ben-HaimEldar10, ChesneauHebiri08, DuarteBajwaCalderbank11, EldarKuppingerBolcskei10, EldarMishali09, GribonvalNielsen03, HuangZhang10, LiuZhang09, MeiervandeGeerBuhlmann08, NardiRinaldo08, Obozinskietal11, VikaloParvaresh07, StojnicParvareshHassibi09, YuanLin06 and the references therein. Another celebrated technique of sparse recovery, the Dantzig selector, originating from CandesTao07, has also been extended to handle block-sparse structures JamesRadchenkoLv09, groupDantzig10. Most of the cited papers focus on bounding recovery errors in terms of the magnitude of the observation noise and the “$s$-concentration” of the true signal $x$ (the distance from the set of signals with at most $s$ nonzero blocks, i.e., the sum of magnitudes of all but the $s$ largest in magnitude blocks in $x$). Typically, these results rely on the natural block analogy (“Block RIP;” see, e.g., EldarMishali09) of the celebrated Restricted Isometry Property introduced by Candès and Tao CandesTaorip05, Candes08note, or on block analogies Lounicietal10 of the Restricted Eigenvalue Property introduced in BickelRitovTsybakov08. In addition to the usual (block-)sparse recovery, our framework also allows us to handle group-sparse recovery with overlapping groups by properly defining the corresponding matrix $B$.
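For readers who want to experiment with the group Lasso displayed above, here is a minimal sketch, not a method from this paper: it minimizes $\|Az-y\|_2^2+\lambda\sum_k\|z[k]\|_2$ by proximal gradient (ISTA) with the fixed step $1/L$, $L=2\|A\|_2^2$, where the proximal step is blockwise soft thresholding. The function names `block_soft_threshold` and `group_lasso_ista`, the list of block sizes `sizes` and the iteration count are illustrative assumptions.

```python
import numpy as np

def block_soft_threshold(v, sizes, tau):
    """Prox of tau * sum_k ||v[k]||_2: shrink each block toward 0 by tau in Euclidean norm."""
    out = np.zeros_like(v)
    start = 0
    for nk in sizes:
        blk = v[start:start + nk]
        nrm = np.linalg.norm(blk)
        if nrm > tau:  # blocks with norm <= tau are set to zero
            out[start:start + nk] = (1.0 - tau / nrm) * blk
        start += nk
    return out

def group_lasso_ista(A, y, sizes, lam, n_iter=500):
    """Proximal gradient for ||Az - y||_2^2 + lam * sum_k ||z[k]||_2."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient 2 A^T (Az - y)
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ z - y)
        z = block_soft_threshold(z - grad / L, sizes, lam / L)
    return z
```

With $A=I$ the iteration reaches the exact minimizer, blockwise soft thresholding of $y$ at level $\lambda/2$, after a single step, which makes a convenient sanity check.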

### Contributions of this paper

The first (by itself, minor) novelty in our problem setting is the presence of the linear mapping $B$. We are not aware of any preceding work handling the case of a “nontrivial” (i.e., different from the identity) $B$. We qualify this novelty as minor, since in fact the case of a nontrivial $B$ can be reduced to the one of $B=I_n$. (Assuming, for example, that $B$ is an “onto” mapping, we can treat $w=Bx$ as our signal, the observations being $Py=PAx+P(u+\xi)$, where $P$ is the projector onto the orthogonal complement to the linear subspace $A(\operatorname{Ker}B)$ in $\mathbb{R}^m$; with $w=Bx$, we have $PAx=Cw$ with an explicitly given matrix $C$.) However, “can be reduced” is not the same as “should be reduced,” since problems with nontrivial mappings $B$ arise in many applications. This is the case, for example, when $x$ is the solution of a linear finite-difference equation with a sparse right-hand side (“evolution of a linear plant corrected from time to time by impulse control”), where $B$ is the matrix of the corresponding finite-difference operator. Therefore, introducing $B$ adds some useful flexibility (and, as a matter of fact, costs nothing as far as the theoretical analysis is concerned).
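As a hypothetical instance of such a nontrivial $B$ (not taken from the paper), consider the scalar plant $x_t=a\,x_{t-1}+w_t$, started from rest and driven by sparse impulse controls $w$. The trajectory $x$ is dense, but $Bx=w$ for the bidiagonal matrix $B=I-aS$, $S$ being the down-shift, so $Bx$ is sparse with blocks of size one. The function names and the value $a=0.9$ are assumptions made for the sketch.

```python
import numpy as np

def plant_difference_matrix(n, a):
    """Hypothetical B for x_t = a*x_{t-1} + w_t: (Bx)_t = x_t - a*x_{t-1}, with x_{-1} = 0."""
    return np.eye(n) - a * np.eye(n, k=-1)

def simulate_plant(w, a):
    """Trajectory of the plant driven by the control sequence w, starting from rest."""
    x = np.zeros(len(w))
    for t in range(len(w)):
        x[t] = (a * x[t - 1] if t > 0 else 0.0) + w[t]
    return x

n, a = 10, 0.9
w = np.zeros(n)
w[[2, 7]] = [1.0, -0.5]  # two impulse corrections
x = simulate_plant(w, a)
B = plant_difference_matrix(n, a)
# x has 8 nonzero entries, while Bx recovers the 2-sparse control sequence w
print(np.count_nonzero(x), np.count_nonzero(np.abs(B @ x) > 1e-12))
```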

We believe, however, that the major novelty in what follows is the emphasis on verifiable conditions on the matrix $A$ and the r.s. $S$ which guarantee good recovery of the transform $Bx$ from noisy observations of $Ax$, provided that the transform in question is nearly $s$-block sparse and the observation noise is low. Note that such efficiently verifiable guarantees cannot be obtained from the “classical” conditions used when studying theoretical properties of block-sparse recovery, with a notable exception of the Mutual Block-Incoherence condition of EldarKuppingerBolcskei10. (It has recently been proved in PfetschTillmann12 that computing the parameters involved in verification of the Nullspace condition, as well as of RIP, for sparse recovery is NP-hard.) For example, given $A$ and $S$, one cannot answer in any reasonable time whether the (Block-) Restricted Isometry or Restricted Eigenvalue property holds with given parameters. While efficient verifiability is by no means necessary for a condition to be meaningful and useful, we believe that verifiability has its value and is worth investigating. In particular, it allows us to design new recovery routines with explicit confidence bounds for the recovery error and then optimize these bounds with respect to the method parameters. In this respect, the current work extends the results of JNCS, JKNCS, JNnoisy, where recovery of the “usual” sparse vectors was considered (in the first two papers, in the case of uncertain-but-bounded observation errors, and in the third, in the case of Gaussian observation noise). Specifically, we propose here new routines of block-sparse recovery which explicitly utilize a contrast matrix, a kind of “validity certificate,” and show how these routines may be tuned to attain the best performance bounds.
In addition, verifiable conditions pave the way to efficiently designing sensing matrices which possess certifiably good recovery properties for block-sparse recovery (see JKNCSDesign for an implementation of such an approach in the usual sparsity setting).

The main body of the paper is organized as follows. In Section 2 we formulate the block-sparse recovery problem and introduce our core assumption, a family of conditions $Q_{s,q}(\kappa)$, $1\le q\le\infty$, which links the representation structure $S$ and the sensing matrix $A$ with a contrast matrix $H$. Specifically, given $s$, $q$ and a norm $\|\cdot\|$, the condition $Q_{s,q}(\kappa)$ on an $m\times M$ contrast matrix $H$ requires that

$$\forall x\in\mathbb{R}^n:\quad L_{s,q}(Bx)\le s^{1/q}\|H^TAx\|+\kappa s^{1/q-1}L_1(Bx)$$

holds, where for $w\in\mathbb{R}^N$ and $1\le p\le\infty$,

$$L_p(w)=\big\|[\|w[1]\|_{(1)};\dots;\|w[K]\|_{(K)}]\big\|_p$$

and

$$L_{s,p}(w)=\big\|[\|w[1]\|_{(1)};\dots;\|w[K]\|_{(K)}]\big\|_{s,p},$$

where $\|v\|_{s,p}$ is the norm on $\mathbb{R}^K$ defined as follows: we zero out all but the $s$ largest in magnitude entries in the vector $v$, and take the $\ell_p$-norm of the resulting $s$-sparse vector. Then, by restricting our attention to the standard representation structures, we study the relation between condition $Q_{s,q}(\kappa)$ and the usual assumptions used to validate block-sparse recovery, for example, the Restricted Isometry/Eigenvalue Properties and their block versions.

In Section 3 we introduce two recovery routines based on the $L_1(\cdot)$ norm:

• regular recovery [cf. (block-) Dantzig selector]

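$$\hat x_{\rm reg}(y)\in\operatorname*{Argmin}_{z\in\mathbb{R}^n}\big\{L_1(Bz):\ \|H^T(y-Az)\|_\infty\le\rho\big\},$$

where $\rho$ is, with probability $1-\varepsilon$, an upper bound on the $\ell_\infty$-norm of $H^T(u+\xi)$, the observation error seen through the contrast matrix;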

• penalized recovery [cf. (block-) Lasso]

$$\hat x_{\rm pen}(y)\in\operatorname*{Argmin}_{z\in\mathbb{R}^n}\big[L_1(Bz)+2s\|H^T(y-Az)\|_\infty\big],$$

where $s$ is our guess for the number of nonvanishing blocks in the true signal $Bx$.

Under condition $Q_{s,q}(\kappa)$, we establish performance guarantees for these recoveries, that is, explicit upper bounds on the size of confidence sets for the recovery error $L_p(B(\hat x-x))$, $1\le p\le q$. Our performance guarantees have the usual natural interpretation: as far as recovery of transforms $Bx$ with small $s$-block concentration is concerned (the $s$-block concentration of a block vector $w$ is defined as $L_1(w-w^s)$), everything is as if we were given direct observations of $Bx$ contaminated by noise of small magnitude.

Similar to the usual assumptions from the literature, conditions $Q_{s,q}(\kappa)$ are generally computationally intractable; nonetheless, we point out a notable exception in Section 4. When all block norms are $\|\cdot\|_{(k)}=\|\cdot\|_\infty$, the condition $Q_{s,\infty}(\kappa)$, the strongest among our family of conditions, is efficiently verifiable. Moreover, in this situation the latter condition is “fully computationally tractable,” meaning that one can efficiently optimize the bounds for the recovery error over the contrast matrices $H$ satisfying $Q_{s,\infty}(\kappa)$ to design optimal recovery routines. In addition, in Section 4.2 we establish an oracle inequality which shows that the existence of a contrast matrix $H$ satisfying condition $Q_{s,\infty}(\kappa)$ is not only sufficient but also necessary for “good recovery” of block-sparse signals in the $L_\infty$-norm when .

In Section 5 we provide a verifiable sufficient condition for the validity of for general , assuming that is -r.s. [i.e., , ], and, in addition, . This sufficient condition can be used to build a “quasi-optimal” contrast matrix . We also relate this condition to the Mutual Block-Incoherence condition of EldarKuppingerBolcskei10 developed for the case of -r.s. with . In particular, we show in Section 5.4 that the Mutual Block-Incoherence is more conservative than our verifiable condition, and thus is “covered” by the latter. “Limits of performance” of our verifiable sufficient conditions are investigated in Section 5.3.

In Section 6 we describe a computationally cheap alternative to block- recoveries—a non-Euclidean Block Matching Pursuit (NEBMP) algorithm. Assuming that is either -, or -r.s. and that the verifiable sufficient condition is satisfied, we show that this algorithm (which does not require optimization) provides performance guarantees similar to those of regular/penalized recoveries.

We close by presenting a small simulation study in Section 7.

Proofs of all results are given in the supplementary article JKNP-suppl .

## 2 Problem statement

### Notation

In the sequel, we deal with:

• signals: vectors $x\in\mathbb{R}^n$, and an $m\times n$ sensing matrix $A$;

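• representations of signals: block vectors $w\in W=\mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_K}$, and the representation matrix $B=[B_1;\dots;B_K]$, $B_k\in\mathbb{R}^{n_k\times n}$; the representation of a signal $x$ is the block vector $w=Bx$ with the blocks $w[k]=B_kx$.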

From now on, the dimension of $W$ is denoted by $N$:

$$N=n_1+\dots+n_K.$$

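The factors $\mathbb{R}^{n_k}$ of the representation space $W$ are equipped with norms $\|\cdot\|_{(k)}$; the conjugate norms are denoted by $\|\cdot\|_{(k,*)}$. A vector $w$ from $W$ is called $s$-block-sparse if the number of nonzero blocks $w[k]$ in $w$ is at most $s$. A vector $x\in\mathbb{R}^n$ will be called $s$-block-sparse if its representation $Bx$ is so. We refer to the collection $S=(B,n_1,\dots,n_K,\|\cdot\|_{(1)},\dots,\|\cdot\|_{(K)})$ as the representation structure (r.s. for short). The standard r.s. is given by $B=I_n$, $K=n$, $n_k=1$ and $\|\cdot\|_{(k)}=|\cdot|$, $k\le K$, and an $\ell_r$-r.s. is the r.s. with $\|\cdot\|_{(k)}=\|\cdot\|_r$, $k\le K$.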

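For $w\in W$, we call the number $\|w[k]\|_{(k)}$ the magnitude of the $k$th block in $w$ and denote by $w^s$ the representation vector obtained from $w$ by zeroing out all but the $s$ largest in magnitude blocks in $w$ (with the ties resolved arbitrarily). For $I\subset\{1,\dots,K\}$ and a representation vector $w$, $w_I$ denotes the vector obtained from $w$ by keeping intact the blocks $w[k]$ with $k\in I$ and zeroing out all remaining blocks. For $w\in W$ and $1\le p\le\infty$, we denote by $L_p(w)$ the $\ell_p$-norm of the vector $[\|w[1]\|_{(1)};\dots;\|w[K]\|_{(K)}]$, so that $L_p(\cdot)$ is a norm on $W$ with the conjugate norm $L_p^*(w)=\|[\|w[1]\|_{(1,*)};\dots;\|w[K]\|_{(K,*)}]\|_{p_*}$, where $p_*=p/(p-1)$. Given a positive integer $s\le K$, we set $L_{s,p}(w)=L_p(w^s)$. Note that $L_{s,p}(\cdot)$ is a norm on $W$. We define the $s$-block concentration of a vector $w$ as $L_1(w-w^s)$.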
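The quantities just defined are straightforward to compute. The sketch below assumes, for concreteness, that every block norm $\|\cdot\|_{(k)}$ is an $\ell_r$-norm with a common $r$ (the parameter `p_block`, an assumption), and evaluates the block magnitudes, $L_p(w)$, $L_{s,p}(w)$, the truncation $w^s$ and the $s$-block concentration $L_1(w-w^s)$; all function names are illustrative.

```python
import numpy as np

def block_magnitudes(w, sizes, p_block=np.inf):
    """Vector [||w[1]||_(1); ...; ||w[K]||_(K)], each block norm taken as l_{p_block}."""
    w = np.asarray(w, dtype=float)
    blocks = np.split(w, np.cumsum(sizes)[:-1])
    return np.array([np.linalg.norm(b, ord=p_block) for b in blocks])

def L_p(w, sizes, p, p_block=np.inf):
    """L_p(w): l_p-norm of the block magnitudes."""
    return np.linalg.norm(block_magnitudes(w, sizes, p_block), ord=p)

def L_sp(w, sizes, s, p, p_block=np.inf):
    """L_{s,p}(w): l_p-norm of the s largest block magnitudes."""
    mags = np.sort(block_magnitudes(w, sizes, p_block))[::-1]
    return np.linalg.norm(mags[:s], ord=p)

def truncate_blocks(w, sizes, s, p_block=np.inf):
    """w^s: keep the s largest-magnitude blocks of w (ties broken by index), zero the rest."""
    w = np.asarray(w, dtype=float)
    keep = np.argsort(-block_magnitudes(w, sizes, p_block), kind="stable")[:s]
    starts = np.concatenate(([0], np.cumsum(sizes)))
    out = np.zeros_like(w)
    for k in keep:
        out[starts[k]:starts[k + 1]] = w[starts[k]:starts[k + 1]]
    return out

def block_concentration(w, sizes, s, p_block=np.inf):
    """s-block concentration L_1(w - w^s)."""
    w = np.asarray(w, dtype=float)
    return L_p(w - truncate_blocks(w, sizes, s, p_block), sizes, 1, p_block)

w, sizes = [3.0, -1.0, 0.5, 0.5, 0.0, 2.0], [2, 2, 2]  # block magnitudes 3, 0.5, 2 (l_inf blocks)
print(L_p(w, sizes, 1), L_sp(w, sizes, 2, 1), block_concentration(w, sizes, 2))
```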

### Problem of interest

Given an observation

$$y=Ax+u+\xi, \tag{2}$$

of an unknown signal $x\in\mathbb{R}^n$, we want to recover the representation $Bx$ of $x$, knowing in advance that this representation is “nearly $s$-block-sparse,” that is, the representation can be approximated by an $s$-block-sparse one; the $L_1$-error of this approximation, that is, the $s$-block concentration $L_1(Bx-[Bx]^s)$, will be present in our error bounds.

In (2) the term $u+\xi$ is the observation error; in this error, $u$ is an unknown nuisance known to belong to a given compact convex set $U\subset\mathbb{R}^m$ symmetric w.r.t. the origin, and $\xi$ is random noise with known distribution.

### Condition $Q_{s,q}(\kappa)$

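We start by introducing the condition which will be instrumental in all subsequent constructions and results. Let a sensing matrix $A\in\mathbb{R}^{m\times n}$ and an r.s. $S=(B,n_1,\dots,n_K,\|\cdot\|_{(1)},\dots,\|\cdot\|_{(K)})$ be given, and let $s$ be a positive integer, $q\in[1,\infty]$ and $\kappa>0$. We say that a pair $(H,\|\cdot\|)$, where $H\in\mathbb{R}^{m\times M}$ and $\|\cdot\|$ is a norm on $\mathbb{R}^M$, satisfies the condition $Q_{s,q}(\kappa)$ associated with the matrix $A$ and the r.s. $S$ if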

$$\forall x\in\mathbb{R}^n:\quad L_{s,q}(Bx)\le s^{1/q}\|H^TAx\|+\kappa s^{1/q-1}L_1(Bx). \tag{3}$$
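Condition (3) must hold for every $x$, which is exactly what makes it hard to verify in general. What can be done cheaply is to search for counterexamples: a single $x$ with negative slack in (3) certifies that $(H,\|\cdot\|)$ violates $Q_{s,q}(\kappa)$, while a nonnegative minimum over random samples proves nothing. The sketch below takes $\|\cdot\|=\|\cdot\|_\infty$ and $\ell_r$ block norms with a common $r$ (both assumptions), and the function names are hypothetical.

```python
import numpy as np

def q_condition_slack(x, A, B, H, sizes, s, q, kappa, p_block=np.inf):
    """RHS minus LHS of (3) at one x, with ||.|| = l_inf (an assumption)."""
    w = B @ x
    mags = np.array([np.linalg.norm(b, ord=p_block)
                     for b in np.split(w, np.cumsum(sizes)[:-1])])
    lhs = np.linalg.norm(np.sort(mags)[::-1][:s], ord=q)          # L_{s,q}(Bx)
    rhs = (s ** (1.0 / q) * np.linalg.norm(H.T @ (A @ x), ord=np.inf)
           + kappa * s ** (1.0 / q - 1.0) * mags.sum())            # mags.sum() = L_1(Bx)
    return rhs - lhs

def sampled_min_slack(A, B, H, sizes, s, q, kappa, n_samples=1000, seed=0):
    """Smallest slack over random Gaussian x; a negative value exhibits a violation."""
    rng = np.random.default_rng(seed)
    return min(q_condition_slack(rng.standard_normal(A.shape[1]), A, B, H, sizes, s, q, kappa)
               for _ in range(n_samples))
```

For instance, $H=0$ can never satisfy (3) with $\kappa<1$: at $x=e_1$ with $A=B=I$, $s=1$, $q=\infty$ and $\kappa=1/4$ the slack is $1/4-1=-3/4$.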

The following observation is evident:

###### Observation 2.1

Given and an r.s. , let satisfy . Then satisfies for all and . Besides this, if is a positive integer, satisfies . Furthermore, if satisfies , and , a positive integer , and are such that , then satisfies . In particular, when , the fact that satisfies implies that satisfies .

### Relation to known conditions for the validity of sparse ℓ1 recovery

Note that whenever

$$S=(B,n_1,\dots,n_K,\|\cdot\|_{(1)},\dots,\|\cdot\|_{(K)})$$

is the standard r.s., the condition $Q_{s,q}(\kappa)$ reduces to the condition introduced in JNnoisy. On the other hand, condition $Q_{s,q}(\kappa)$ is closely related to other known conditions, introduced to study the properties of recovery routines in the context of block sparsity. Specifically, consider an r.s. with $B=I_n$, and let us make the following observation:

Let $(H,\|\cdot\|_\infty)$ satisfy $Q_{s,q}(\kappa)$ and let $\hat\lambda$ be the maximum of the Euclidean norms of the columns of $H$. Then

$$\forall x\in\mathbb{R}^n:\quad L_{s,q}(x)\le\hat\lambda s^{1/q}\|Ax\|_2+\kappa s^{1/q-1}L_1(x). \tag{4}$$
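The passage from (3) to (4) uses nothing but the Cauchy-Schwarz inequality applied column by column. Writing $h_j$ for the $j$th column of $H$ and assuming, as in the observation above, that the norm in (3) is $\|\cdot\|_\infty$ and that $B=I_n$,

$$\|H^TAx\|_\infty=\max_j|h_j^TAx|\le\max_j\|h_j\|_2\,\|Ax\|_2=\hat\lambda\,\|Ax\|_2,$$

and substituting this bound into (3) yields (4).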

Let us fix the r.s. with $B=I_n$ and $\|\cdot\|_{(k)}=\|\cdot\|_2$, $k\le K$. Condition (4) plays a crucial role in the performance analysis of the group Lasso and the Dantzig selector. For example, the error bounds for Lasso recovery obtained in Lounicietal10 rely upon the Restricted Eigenvalue assumption as follows: there exists $\varkappa>0$ such that

$$L_2(x^s)\le\frac{1}{\varkappa}\|Ax\|_2\quad\text{whenever } 3L_1(x^s)\ge L_1(x-x^s).$$

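In this case $L_{s,1}(x)=L_1(x^s)\le s^{1/2}L_2(x^s)\le s^{1/2}\varkappa^{-1}\|Ax\|_2$ whenever $3L_1(x^s)\ge L_1(x-x^s)$, while otherwise $4L_1(x^s)<L_1(x^s)+L_1(x-x^s)=L_1(x)$, so that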

$$\forall x\in\mathbb{R}^n:\quad L_{s,1}(x)\le\frac{s^{1/2}}{\varkappa}\|Ax\|_2+\frac14L_1(x), \tag{5}$$

which is exactly (4) with $q=1$, $\kappa=\frac14$ and $\hat\lambda=s^{-1/2}\varkappa^{-1}$ (observe that (5) is nothing but the “block version” of the Compatibility condition from BuhlmannvandeGeer09).

Recall that a sensing matrix $A$ satisfies the Block Restricted Isometry Property (see, e.g., EldarMishali09) with $\delta\in(0,1)$ and a positive integer $k$ if for every $x$ with at most $k$ nonvanishing blocks one has

$$(1-\delta)\|x\|_2^2\le x^TA^TAx\le(1+\delta)\|x\|_2^2. \tag{6}$$
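To make the verification issue concrete, here is a brute-force sketch (not an algorithm from the paper) computing, for $B=I_n$, the smallest $\delta$ for which (6) holds for all $x$ with at most $k$ nonvanishing blocks. For a fixed block support $J$, the best constant is $\max(1-\lambda_{\min}(A_J^TA_J),\,\lambda_{\max}(A_J^TA_J)-1)$, and by eigenvalue interlacing it suffices to scan supports of size exactly $k\le K$. The scan runs over $\binom{K}{k}$ supports, which is what makes this route impractical beyond toy sizes; a returned value $\ge1$ means that (6) fails for every admissible $\delta$. The function name is hypothetical.

```python
import itertools
import numpy as np

def block_rip_constant(A, sizes, k):
    """Smallest delta with (1-delta)||x||^2 <= ||Ax||^2 <= (1+delta)||x||^2 for all
    x with at most k nonzero blocks (B = I_n); brute force over all k-block supports."""
    starts = np.concatenate(([0], np.cumsum(sizes)))
    delta = 0.0
    for support in itertools.combinations(range(len(sizes)), k):
        cols = np.concatenate([np.arange(starts[j], starts[j + 1]) for j in support])
        eig = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])  # eigenvalues in ascending order
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta
```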
###### Proposition 2.1

Let satisfy for some and positive integer . Then:

The pair satisfies the condition associated with and the r.s. .

The pair satisfies the condition associated with and the r.s. .

Our last observation here is as follows: let satisfy for the r.s. given by , and let . Then satisfies for the r.s. given by .

## 3 Accuracy bounds for ℓ1 block recovery routines

Throughout this section we fix an r.s. $S$ and a sensing matrix $A$.

### 3.1 Regular ℓ1 recovery

We define the regular recovery as

$$\hat x_{\rm reg}(y)\in\operatorname*{Argmin}_u\big\{L_1(Bu):\ \|H^T(Au-y)\|\le\rho\big\}, \tag{7}$$

where the contrast matrix $H\in\mathbb{R}^{m\times M}$, the norm $\|\cdot\|$ on $\mathbb{R}^M$ and $\rho>0$ are parameters of the construction.
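When every block is a scalar ($n_k=1$, $\|\cdot\|_{(k)}=|\cdot|$) and $\|\cdot\|=\|\cdot\|_\infty$, problem (7) is a linear program in $(u,t)$: minimize $\sum_i t_i$ subject to $-t\le Bu\le t$ and $-\rho\le H^T(Au-y)\le\rho$. The sketch below solves it with SciPy's `linprog` (HiGHS backend); the restriction to scalar blocks and to the $\ell_\infty$ contrast norm are simplifying assumptions, and `regular_recovery_lp` is a hypothetical name. With $\ell_\infty$ block norms the same LP trick applies blockwise, while Euclidean blocks would call for a second-order cone solver instead.

```python
import numpy as np
from scipy.optimize import linprog

def regular_recovery_lp(A, B, H, y, rho):
    """Solve min ||Bu||_1 s.t. ||H^T(Au - y)||_inf <= rho as an LP in (u, t).

    Assumes scalar blocks (n_k = 1, |.| block norms) and the l_inf contrast norm.
    """
    n, N = A.shape[1], B.shape[0]
    G, g = H.T @ A, H.T @ y
    M = G.shape[0]
    I_N, Z = np.eye(N), np.zeros((M, N))
    # Rows: Bu - t <= 0, -Bu - t <= 0, G u <= rho + g, -G u <= rho - g.
    A_ub = np.block([[B, -I_N], [-B, -I_N], [G, Z], [-G, Z]])
    b_ub = np.concatenate([np.zeros(2 * N), rho + g, rho - g])
    c = np.concatenate([np.zeros(n), np.ones(N)])  # sum of t equals ||Bu||_1 at the optimum
    bounds = [(None, None)] * n + [(0, None)] * N  # u free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n]

# Usage: noiseless compressed sensing with a Dantzig-type contrast H = A (an assumption).
rng = np.random.default_rng(0)
m, n, s = 40, 80, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x_hat = regular_recovery_lp(A, np.eye(n), A, A @ x, rho=1e-8)
```

With $A=B=H=I$ the program decouples into scalar problems whose solution is soft thresholding of $y$ at level $\rho$, which gives a simple deterministic check.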

###### Theorem 3.1

Let $s$ be a positive integer, $q\in[1,\infty]$, $\kappa\in(0,1/2)$. Assume that the pair $(H,\|\cdot\|)$ satisfies the condition $Q_{s,q}(\kappa)$ associated with $A$ and the r.s. $S$, and let

$$\Xi=\Xi_{\rho,U}=\big\{\xi:\ \|H^T(u+\xi)\|\le\rho\ \ \forall u\in U\big\}. \tag{8}$$

Then for all $x\in\mathbb{R}^n$, $u\in U$ and $\xi\in\Xi$ one has

$$L_p\big(B[\hat x_{\rm reg}(Ax+u+\xi)-x]\big)\le\frac{4(2s)^{1/p}}{1-2\kappa}\Big[\rho+\frac{1}{2s}L_1(Bx-[Bx]^s)\Big],\quad 1\le p\le q. \tag{9}$$
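Bound (9) is fully explicit, so it can be evaluated directly. The helper below (a hypothetical name, plain Python) returns its right-hand side; it makes visible that only the factor $(2s)^{1/p}$ depends on $p$, so the $L_\infty$ bound is the tightest and the $L_1$ bound is $2s$ times larger.

```python
import math

def regular_bound(s, p, kappa, rho, concentration):
    """Right-hand side of (9); `concentration` stands for L_1(Bx - [Bx]^s)."""
    if not 0.0 <= kappa < 0.5:
        raise ValueError("(9) requires kappa < 1/2")
    return 4.0 * (2 * s) ** (1.0 / p) / (1.0 - 2.0 * kappa) * (rho + concentration / (2.0 * s))

for p in (1, 2, math.inf):
    print(p, regular_bound(s=5, p=p, kappa=0.25, rho=0.1, concentration=0.0))
```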

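The above result can be slightly strengthened by replacing the assumption that $(H,\|\cdot\|)$ satisfies $Q_{s,q}(\kappa)$, $\kappa<1/2$, with the weaker (by Observation 2.1) assumption that $(H,\|\cdot\|)$ satisfies $Q_{s,1}(\varkappa)$ with $\varkappa<1/2$ and satisfies $Q_{s,q}(\kappa)$ with some (perhaps large) $\kappa$: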

###### Theorem 3.2

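Given $A$, the r.s. $S$, an integer $s>0$, and $q\in[1,\infty]$, assume that $(H,\|\cdot\|)$ satisfies the condition $Q_{s,1}(\varkappa)$ with $\varkappa<1/2$ and the condition $Q_{s,q}(\kappa)$ with some $\kappa$, and let $\Xi$ be given by (8). Then for all $x\in\mathbb{R}^n$, $u\in U$, $\xi\in\Xi$ and $1\le p\le q$ it holds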

$$L_p\big(B[\hat x_{\rm reg}(Ax+u+\xi)-x]\big)\le\frac{4(2s)^{1/p}[1+\kappa-\varkappa]^{\frac{q(p-1)}{p(q-1)}}}{1-2\varkappa}\Big[\rho+\frac{L_1(Bx-[Bx]^s)}{2s}\Big]. \tag{10}$$

### 3.2 Penalized ℓ1 recovery

The penalized recovery is

$$\hat x_{\rm pen}(y)\in\operatorname*{Argmin}_u\big\{L_1(Bu)+\lambda\|H^T(Au-y)\|\big\}, \tag{11}$$

where $H$, $\|\cdot\|$ and a positive real $\lambda$ are parameters of the construction.

###### Theorem 3.3

Given $A$, the r.s. $S$, an integer $s$, and $q\in[1,\infty]$, assume that $(H,\|\cdot\|)$ satisfies the conditions $Q_{s,q}(\kappa)$ and $Q_{s,1}(\varkappa)$ with $\varkappa<1/2$.

Let $\lambda\ge2s$. Then for all $x\in\mathbb{R}^n$, $y\in\mathbb{R}^m$ it holds for $1\le p\le q$:

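$$L_p\big(B[\hat x_{\rm pen}(y)-x]\big)\le\frac{4\lambda^{1/p}}{1-2\varkappa}\Big[1+\frac{\kappa\lambda}{2s}-\varkappa\Big]^{\frac{q(p-1)}{p(q-1)}}\Big[\|H^T(Ax-y)\|+\frac{1}{2s}L_1(Bx-[Bx]^s)\Big]. \tag{12}$$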

In particular, with $\lambda=2s$ we have for $1\le p\le q$:

$$L_p\big(B[\hat x_{\rm pen}(y)-x]\big)\le\frac{4(2s)^{1/p}}{1-2\varkappa}\big[1+\kappa-\varkappa\big]^{\frac{q(p-1)}{p(q-1)}}\Big[\|H^T(Ax-y)\|+\frac{1}{2s}L_1(Bx-[Bx]^s)\Big]. \tag{13}$$

Let $\rho>0$ and let $\Xi$ be given by (8). Then for all $x\in\mathbb{R}^n$, $u\in U$ and all $\xi\in\Xi$ one has for $1\le p\le q$:

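$$\begin{aligned}
\lambda\ge2s\ &\Rightarrow\ L_p\big(B[\hat x_{\rm pen}(Ax+u+\xi)-x]\big)\le\frac{4\lambda^{1/p}}{1-2\varkappa}\Big[1+\frac{\kappa\lambda}{2s}-\varkappa\Big]^{\frac{q(p-1)}{p(q-1)}}\Big[\rho+\frac{1}{2s}L_1(Bx-[Bx]^s)\Big],\\
\lambda=2s\ &\Rightarrow\ L_p\big(B[\hat x_{\rm pen}(Ax+u+\xi)-x]\big)\le\frac{4(2s)^{1/p}}{1-2\varkappa}\big[1+\kappa-\varkappa\big]^{\frac{q(p-1)}{p(q-1)}}\Big[\rho+\frac{1}{2s}L_1(Bx-[Bx]^s)\Big].
\end{aligned} \tag{14}$$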
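The right-hand sides of (12) and (14) can likewise be evaluated as functions of $\lambda$. In the sketch below (names hypothetical) the argument `err` stands for $\|H^T(Ax-y)\|$ in (12) or for $\rho$ in (14); the exponent $q(p-1)/(p(q-1))$ is taken as $0$ for $p=1$, as $(p-1)/p$ for $q=\infty$, and as $1$ for $p=q=\infty$. Setting $\lambda=2s$ and $\varkappa=\kappa$ makes the bracketed factor equal to one, which recovers the form of (9), while increasing $\lambda$ beyond $2s$ only inflates the bound.

```python
import math

def bound_exponent(p, q):
    """Exponent q(p-1)/(p(q-1)) appearing in (10) and (12)-(14), with its limiting values."""
    if p == 1:
        return 0.0
    if math.isinf(q):
        return 1.0 if math.isinf(p) else (p - 1.0) / p
    if math.isinf(p) or p > q:
        raise ValueError("need 1 <= p <= q")
    return q * (p - 1.0) / (p * (q - 1.0))

def penalized_bound(s, p, q, kappa, varkappa, lam, err, concentration):
    """Right-hand side of (12) (err = ||H^T(Ax - y)||) or of (14) (err = rho), for lam >= 2s."""
    if lam < 2 * s or not 0.0 <= varkappa < 0.5:
        raise ValueError("need lam >= 2s and varkappa < 1/2")
    factor = (1.0 + kappa * lam / (2.0 * s) - varkappa) ** bound_exponent(p, q)
    return (4.0 * lam ** (1.0 / p) / (1.0 - 2.0 * varkappa) * factor
            * (err + concentration / (2.0 * s)))
```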

#### Discussion

Let us compare the error bounds of the regular and the penalized recoveries associated with the same pair $(H,\|\cdot\|)$ satisfying the condition $Q_{s,q}(\kappa)$ with $\kappa<1/2$. Given $\varepsilon\in(0,1)$, let

$$\rho_\varepsilon[H,\|\cdot\|]=\min\Big\{\rho:\ \mathrm{Prob}\big\{\xi:\ \|H^T(u+\xi)\|\le\rho\ \ \forall u\in U\big\}\ge1-\varepsilon\Big\}; \tag{15}$$

this is nothing but the smallest $\rho$ such that

$$\mathrm{Prob}(\xi\in\Xi_{\rho,U})\ge1-\varepsilon \tag{16}$$

[see (8)] and, thus, the smallest $\rho$ for which the error bound (3.1) for the regular recovery holds true with probability $1-\varepsilon$ (or at least the smallest $\rho$ for which the latter claim is supported by Theorem 3.1). With $\rho=\rho_\varepsilon[H,\|\cdot\|]$, the regular recovery guarantees (and that is the best guarantee one can extract from Theorem 3.1) that

(!) For some set $\Xi$, $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\varepsilon$, of “good” realizations of the random component $\xi$ of the observation error, one has

$$L_p\big(B[\hat x(Ax+u+\xi)-x]\big)\le\frac{4(2s)^{1/p}}{1-2\kappa}\Big[\rho_\varepsilon[H,\|\cdot\|]+\frac{L_1(Bx-[Bx]^s)}{2s}\Big],\quad 1\le p\le q, \tag{17}$$

whenever $x\in\mathbb{R}^n$, $u\in U$ and $\xi\in\Xi$.
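The quantity $\rho_\varepsilon[H,\|\cdot\|]$ in (15) is rarely available in closed form, but for $\|\cdot\|=\|\cdot\|_\infty$ it can be estimated by Monte Carlo. Since $U$ is symmetric, $\sup_{u\in U}\|H^T(u+\xi)\|_\infty=\max_j\big(|h_j^T\xi|+\mu_U(h_j)\big)$, where $h_j$ are the columns of $H$ and $\mu_U(h)=\max_{u\in U}u^Th$ is the support function of $U$, so $\rho_\varepsilon$ is the $(1-\varepsilon)$-quantile of this random variable. The sketch assumes Gaussian noise $\xi\sim\mathcal N(0,\sigma^2I_m)$ and a Euclidean ball $U=\{u:\|u\|_2\le\delta\}$, for which $\mu_U(h)=\delta\|h\|_2$; the function name and the parameters $\sigma$, $\delta$ are hypothetical.

```python
import numpy as np

def rho_eps_monte_carlo(H, sigma, delta, eps, n_samples=20000, seed=0):
    """Monte Carlo estimate of rho_eps[H, ||.||_inf] in (15), assuming xi ~ N(0, sigma^2 I)
    and U = {u : ||u||_2 <= delta}, so that mu_U(h) = delta * ||h||_2."""
    rng = np.random.default_rng(seed)
    mu = delta * np.linalg.norm(H, axis=0)        # mu_U(h_j) for each column h_j
    xi = sigma * rng.standard_normal((n_samples, H.shape[0]))
    worst = np.max(np.abs(xi @ H) + mu, axis=1)   # sup_{u in U} ||H^T(u + xi)||_inf per sample
    return float(np.quantile(worst, 1.0 - eps))
```

For $H=I_1$, $\delta=0$ and $\varepsilon=0.05$ the estimate should be close to the two-sided Gaussian quantile $1.96\,\sigma$.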

The error bound (3.3) [where we can safely set $\varkappa=\kappa$, since $Q_{s,q}(\kappa)$ implies $Q_{s,1}(\kappa)$] says that (!) holds true for the penalized recovery with $\lambda=2s$. The latter observation suggests that the penalized recovery associated with $(H,\|\cdot\|)$ and $\lambda=2s$ is better than its regular counterpart, for two reasons. First, in order to ensure (!) with the regular recovery, the “built-in” parameter $\rho$ of this recovery should be set to $\rho_\varepsilon[H,\|\cdot\|]$, and the latter quantity is not always easy to identify. In contrast, the construction of the penalized recovery is completely independent of a priori assumptions on the structure of observation errors, while automatically ensuring (!) for the error model we use. Second, and more importantly, for the penalized recovery the bound (3.2) is no more than the “worst, with confidence $1-\varepsilon$, case,” and the typical values of the quantity $\|H^T(u+\xi)\|$ which actually participates in the error bound (3.3) are essentially smaller than $\rho_\varepsilon[H,\|\cdot\|]$. Our numerical experience fully supports the above suggestion: the difference in observed performance of the two routines in question, although not dramatic, is definitely in favor of the penalized recovery. The only potential disadvantage of the latter routine is that the penalty parameter $\lambda$ should be tuned to the level $s$ of sparsity we aim at, while the regular recovery is free of any guess of this type. Of course, the “tuning” is rather loose: all we need (and experiments show that we indeed need this) is the relation $\lambda\ge2s$, so that a rough upper bound on $s$ will do; note, however, that the bound (3.3) deteriorates as $\lambda$ grows.

## 4 Tractability of condition $Q_{s,\infty}(\kappa)$, $\ell_\infty$-norm of the blocks

We have seen in Section 3 that given a sensing matrix $A$ and an r.s. $S$ such that the associated conditions $Q_{s,q}(\kappa)$ are satisfiable, we can validate the $\ell_1$ recovery of nearly $s$-block-sparse signals; specifically, we can point out $\ell_1$-type recoveries with controlled errors (and small ones, provided that so are the observation error and the deviation of the signal from an $s$-block-sparse one). The bad news here is that, in general, condition $Q_{s,q}(\kappa)$, as well as other conditions for the validity of $\ell_1$ recovery, like Block RE/RIP, cannot be verified efficiently. This means that, given a sensing matrix $A$ and an r.s. $S$, it is difficult to verify that a given candidate pair $(H,\|\cdot\|)$ satisfies condition $Q_{s,q}(\kappa)$ associated with $A$ and $S$. Fortunately, one can construct “tractable approximations” of condition $Q_{s,q}(\kappa)$, that is, verifiable sufficient conditions for the validity of $Q_{s,q}(\kappa)$. The first good news is that when all $\|\cdot\|_{(k)}$ are the uniform norms $\|\cdot\|_\infty$ and, in addition, $q=\infty$ [which, by Observation 2.1, corresponds to the strongest among the conditions $Q_{s,q}(\kappa)$ and ensures the validity of (3.1) and (3.3) in the largest possible range of values of $p$], the condition $Q_{s,\infty}(\kappa)$ becomes “fully computationally tractable.” We also intend to demonstrate that the condition $Q_{s,\infty}(\kappa)$ is in fact necessary for the risk bounds of the form (3.1)–(14) to be valid when .

### 4.1 Condition $Q_{s,\infty}(\kappa)$: Tractability and the optimal choice of the contrast matrix $H$

#### Notation

In the sequel, given $r,\theta\in[1,\infty]$ and a matrix $M$, we denote by $\|M\|_{r,\theta}$ the norm of the linear operator $u\mapsto Mu$ induced by the norms $\|\cdot\|_r$ and $\|\cdot\|_\theta$ on the argument and the image spaces:

$$\|M\|_{r,\theta}=\max_{u:\ \|u\|_r\le1}\|Mu\|_\theta.$$

We denote by the norm of the linear mapping induced by the norms , on the argument and on the image spaces. Further, stands for the transpose of the th row of and stands for th column of . Finally, is the -norm of the vector obtained from a vector by zeroing all but the largest in magnitude entries in .

#### Main result

Consider the r.s. with $\|\cdot\|_{(k)}=\|\cdot\|_\infty$, $k\le K$. We claim that in this case the condition $Q_{s,\infty}(\kappa)$ becomes fully tractable. Specifically, we have the following.

###### Proposition 4.1

Let a matrix , the r.s. , a positive integer and reals , be given.

Assume that a triple , where , is a norm on , and , is such that

(!) satisfies , and the set satisfies .

Given , , , one can find efficiently vectors in and block matrix (the blocks of are matrices) such that

$$\text{(a)}\ \ B=VB+[h_1,\dots,h_N]^TA,\qquad\text{(b)}\ \ \|V_{k\ell}\|_{\infty,\infty}\le s^{-1}\kappa\quad\forall k,\ell\le K, \tag{18}$$

(note that the matrix norm $\|M\|_{\infty,\infty}$ is simply the maximum $\ell_1$-norm of the rows of $M$).

Whenever vectors and a matrix with blocks satisfy (4.1), the matrix , the norm on and form a triple satisfying .
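Given candidate vectors $h_1,\dots,h_N$ and a matrix $V$ (obtained, e.g., from an LP solver; that step is not shown here), conditions (18)(a) and (18)(b) are straightforward to check. The sketch below (hypothetical names) returns the residual of (18)(a) and the smallest $\kappa$ allowed by (18)(b), namely $s\max_{k,\ell}\|V_{k\ell}\|_{\infty,\infty}$, using that $\|M\|_{\infty,\infty}$ is the largest $\ell_1$-norm of a row of $M$.

```python
import numpy as np

def inf_inf_norm(M):
    """||M||_{inf,inf}: the largest l_1-norm of a row of M."""
    return np.abs(M).sum(axis=1).max()

def check_conditions_18(A, B, H, V, sizes, s):
    """Return (max-abs residual of (18)(a), smallest kappa allowed by (18)(b))."""
    residual = np.abs(B - V @ B - H.T @ A).max()
    starts = np.concatenate(([0], np.cumsum(sizes)))
    K = len(sizes)
    worst = max(inf_inf_norm(V[starts[k]:starts[k + 1], starts[j]:starts[j + 1]])
                for k in range(K) for j in range(K))
    return residual, s * worst
```

For an invertible $A$ and any $V$, the choice $H=((I-V)BA^{-1})^T$ makes (18)(a) hold exactly, which gives a quick sanity check of the residual computation.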

#### Discussion

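Let a sensing matrix $A$ and an r.s. $S$ be given, along with a positive integer $s$, an uncertainty set $U$, a distribution of $\xi$ and $\varepsilon\in(0,1)$. Theorems 3.1 and 3.3 say that if a triple $(H,\|\cdot\|,\rho)$ is such that $(H,\|\cdot\|)$ satisfies $Q_{s,\infty}(\kappa)$ with $\kappa<1/2$ and $H$, $\rho$ are such that the set $\Xi$ given by (8) satisfies (16), then for the regular recovery associated with $(H,\|\cdot\|,\rho)$ and for the penalized recovery associated with $(H,\|\cdot\|)$ and $\lambda=2s$, the following holds: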

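$$\forall\,(x\in\mathbb{R}^n,\ u\in U,\ \xi\in\Xi):\quad L_p\big(B[\hat x(Ax+u+\xi)-x]\big)\le\frac{4(2s)^{1/p}}{1-2\kappa}\Big[\rho+\frac{L_1(Bx-[Bx]^s)}{2s}\Big],\quad 1\le p\le\infty, \tag{19}$$

where $\hat x$ stands for either of the two recoveries.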

Proposition 4.1 states that when applying this result, we lose nothing by restricting ourselves to triples $H=[h_1,\dots,h_N]\in\mathbb{R}^{m\times N}$, $\|\cdot\|=\|\cdot\|_\infty$, $\rho>0$, which can be augmented by an appropriately chosen matrix $V$ to satisfy relations (4.1). In the rest of this discussion, we assume that we are speaking about triples satisfying these restrictions.

The bound (4.1) is completely determined by two parameters, $\kappa$ (which should be $<1/2$) and $\rho$; the smaller these parameters are, the better the bounds. In what follows we address the issue of efficient synthesis of matrices $H$ with “as good as possible” values of $\kappa$ and $\rho$.

Observe first that $H$ and $\kappa$ should admit an extension by a matrix $V$ to a solution of the system of convex constraints (4.1)(a), (4.1)(b). In the case of $\xi\equiv0$, the best choice of $\rho$, given $H$, is

$$\rho=\max_i\mu_U(h_i),\quad\text{where }\mu_U(h)=\max_{u\in U}u^Th.$$
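For simple uncertainty sets the support function $\mu_U$ has a closed form, so the deterministic choice $\rho=\max_i\mu_U(h_i)$ is immediate. The sketch below covers two illustrative cases (assumptions, not sets used in the paper): the box $U=\{u:\|u\|_\infty\le\delta\}$, for which $\mu_U(h)=\delta\|h\|_1$, and a symmetric polytope $U=\mathrm{conv}\{\pm v_1,\dots,\pm v_J\}$, for which $\mu_U(h)=\max_j|v_j^Th|$. The function names are hypothetical.

```python
import numpy as np

def mu_box(h, delta):
    """Support function of the box {u : ||u||_inf <= delta}: delta * ||h||_1."""
    return delta * np.abs(h).sum()

def mu_sym_polytope(h, vertices):
    """Support function of conv{+v_j, -v_j}; `vertices` holds v_1..v_J as rows."""
    return np.abs(vertices @ h).max()

def rho_deterministic(H, mu):
    """rho = max_i mu_U(h_i) over the columns h_i of H (the case xi = 0)."""
    return max(mu(H[:, i]) for i in range(H.shape[1]))

H = np.array([[1.0, 0.5], [-2.0, 0.5]])
print(rho_deterministic(H, lambda h: mu_box(h, 0.5)))  # 0.5 * max(3.0, 1.0) = 1.5
```

With `vertices = np.eye(m)` the polytope is the unit $\ell_1$ ball and $\mu_U(h)=\|h\|_\infty$.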

Consequently, in this case the “achievable pairs” $(\kappa,\rho)$ form a computationally tractable convex set

 G_s = {(κ,ρ): ∃H=[h_1,…,h_N]∈R^{m×N},