Our goal in this paper is to estimate a linear transform
of a vector from the observations
Here is a given sensing matrix, is a given matrix, and is the observation error; in this error, is an unknown nuisance known to belong to a given compact convex set symmetric w.r.t. the origin, and is random noise with known distribution .
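For concreteness, the observation model described above can be written as follows. The symbol names ($A$ for the sensing matrix, $B$ for the matrix of the transform, $u$ for the nuisance, $\xi$ for the noise, $\mathcal{U}$ for the nuisance set, $P$ for the noise distribution, and the dimensions $m$, $n$, $N$) are assumptions made here for illustration; this is a sketch of the model, not a verbatim restatement of equation (1).

```latex
% Assumed notation: y = observation, x = unknown signal, Bx = transform to estimate.
\[
  y = Ax + u + \xi, \qquad
  A \in \mathbb{R}^{m \times n}, \quad
  B \in \mathbb{R}^{N \times n}, \quad
  u \in \mathcal{U} \subset \mathbb{R}^{m}, \quad
  \xi \sim P ,
\]
% where \mathcal{U} is compact, convex, and symmetric with respect to the origin.
```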
We assume that the space where lives is represented as , so that a vector is a block vector: with blocks , . (Footnote 3: We use MATLAB notation: [u, v, ..., z] is the horizontal concatenation of matrices of common height, while [u; v; ...; z] is the vertical concatenation of matrices of common width. All vectors are column vectors.) In particular, with matrices , . While we do not assume that the vector is sparse in the usual sense, we do assume that the linear transform to be estimated is -block sparse, meaning that at most a given number, , of the blocks , , are nonzero.
The recovery routines we intend to consider are based on block- minimization, that is, the estimate of is , where is obtained by minimizing the norm over signals with “fitting,” in a certain precise sense, the observations . Above, are given in advance norms on the spaces where the blocks of take their values.
In the sequel we refer to the given in advance collection as the representation structure (r.s.). Given such a representation structure and a sensing matrix , our ultimate goal is to understand how well one can recover the -block-sparse transform by appropriately implementing block- minimization.
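The block structure and the block norm that drives the recovery can be illustrated with a minimal sketch. The function names, the list-based representation of vectors, and the default choice of the Euclidean norm on every block are assumptions made for this illustration only; the setting above allows an arbitrary norm on each block.

```python
import math

def split_blocks(w, sizes):
    """Split the flat vector w into consecutive blocks of the given sizes."""
    if sum(sizes) != len(w):
        raise ValueError("block sizes must sum to len(w)")
    blocks, start = [], 0
    for n_k in sizes:
        blocks.append(w[start:start + n_k])
        start += n_k
    return blocks

def block_magnitudes(w, sizes, norm=None):
    """Norm of every block; the Euclidean norm is an assumed default."""
    norm = norm or (lambda b: math.sqrt(sum(t * t for t in b)))
    return [norm(b) for b in split_blocks(w, sizes)]

def block_l1_norm(w, sizes, norm=None):
    """Sum of the block norms, the quantity minimized by block recovery."""
    return sum(block_magnitudes(w, sizes, norm))

def num_nonzero_blocks(w, sizes, tol=0.0):
    """Number of blocks whose magnitude exceeds tol (block sparsity level)."""
    return sum(1 for m in block_magnitudes(w, sizes) if m > tol)

# Three blocks of sizes 2, 3, 1; the middle block vanishes.
w = [3.0, 4.0, 0.0, 0.0, 0.0, -2.0]
sizes = [2, 3, 1]
print(block_magnitudes(w, sizes))    # [5.0, 0.0, 2.0]
print(block_l1_norm(w, sizes))       # 7.0
print(num_nonzero_blocks(w, sizes))  # 2
```

In this example the vector is 2-block sparse even though three of its six entries are nonzero, which is the distinction between entrywise sparsity and block sparsity.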
Related Compressed Sensing research
Our situation and goal form a straightforward extension of the usual sparsity/block sparsity framework of Compressed Sensing. Indeed, the standard representation structure with , , and , , leads to the standard Compressed Sensing setting—recovering a sparse signal from its noisy observations (1) via minimization. The case of nontrivial block structure and is generally referred to as block-sparse, and has been considered in numerous recent papers. Block-sparsity (with ) arises naturally (see, e.g., EldarMishali09 and references therein) in a number of applications such as multi-band signals, measurements of gene expression levels or estimation of multiple measurement vectors sharing a joint sparsity pattern. Several methods of estimation and selection extending the “plain” -minimization to block sparsity were proposed and investigated recently. Most of the related research focused so far on block regularization schemes—group Lasso recovery of the form
(here is the Euclidean norm of the block). In particular, the literature on “plain Lasso” (the case of ) has an important counterpart on group Lasso; see, for example, Bach08GroupLasso, Ben-HaimEldar10, ChesneauHebiri08, DuarteBajwaCalderbank11, EldarKuppingerBolcskei10, EldarMishali09, GribonvalNielsen03, HuangZhang10, LiuZhang09, MeiervandeGeerBuhlmann08, NardiRinaldo08, Obozinskietal11, VikaloParvaresh07, StojnicParvareshHassibi09, YuanLin06 and the references therein. Another celebrated technique of sparse recovery, the Dantzig selector, originating from CandesTao07, has also been extended to handle block-sparse structures JamesRadchenkoLv09, groupDantzig10. Most of the cited papers focus on bounding recovery errors in terms of the magnitude of the observation noise and “-concentration” of the true signal (the distance from the space of signals with at most nonzero blocks, that is, the sum of magnitudes of all but the largest in magnitude blocks in ). Typically, these results rely on a natural block analogue (“Block RIP;” see, e.g., EldarMishali09) of the celebrated Restricted Isometry Property introduced by Candès and Tao CandesTaorip05, Candes08note or on block analogues Lounicietal10 of the Restricted Eigenvalue Property introduced in BickelRitovTsybakov08. In addition to the usual (block)-sparse recovery, our framework also allows us to handle group sparse recovery with overlapping groups by properly defining the corresponding matrix.
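As a point of reference for the block regularization schemes mentioned above, the following sketch shows block soft-thresholding, the proximal operator of a penalty equal to a multiple of the sum of Euclidean block norms. This is a standard building block of group-Lasso solvers and is not one of the recovery routines proposed in this paper; the function name and the parameter `lam` are hypothetical.

```python
import math

def block_soft_threshold(x, sizes, lam):
    """Shrink every block of x toward zero by lam in Euclidean norm.

    A block with norm at most lam is set to zero; otherwise it is scaled
    by (1 - lam / norm). This is the proximal operator of
    lam * (sum of Euclidean block norms).
    """
    out, start = [], 0
    for n_k in sizes:
        block = x[start:start + n_k]
        norm = math.sqrt(sum(t * t for t in block))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.extend(scale * t for t in block)
        start += n_k
    return out

# First block has norm 5 and is scaled by 0.8; second has norm 0.5 and is zeroed.
shrunk = block_soft_threshold([3.0, 4.0, 0.3, -0.4], [2, 2], lam=1.0)
```

The thresholding acts on whole blocks, so a block is either kept (and shrunk) or removed entirely, which is what produces block-sparse solutions.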
Contributions of this paper
The first (by itself, minor) novelty in our problem setting is the presence of the linear mapping . We are not aware of any preceding work handling the case of a “nontrivial” (i.e., different from the identity) . We qualify this novelty as minor, since in fact the case of a nontrivial can be reduced to the one of . (Footnote 4: Assuming, for example, that is an “onto” mapping, we can treat as our signal, the observations being , where is the projector onto the orthogonal complement to the linear subspace in ; with , we have with an explicitly given matrix .) However, “can be reduced” is not the same as “should be reduced,” since problems with nontrivial mappings arise in many applications. This is the case, for example, when is the solution of a linear finite-difference equation with a sparse right-hand side (“evolution of a linear plant corrected from time to time by impulse control”), where is the matrix of the corresponding finite-difference operator. Therefore, introducing adds some useful flexibility (and as a matter of fact costs nothing, as far as the theoretical analysis is concerned).
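The finite-difference example can be made concrete with a small sketch. It assumes the simplest possible plant, a first-order recursion in which the state stays constant except when an impulse is applied; the names `difference_matrix`, `simulate` and `impulses` are hypothetical and chosen for illustration.

```python
def difference_matrix(n):
    """First-order difference operator: (Bx)[0] = x[0], (Bx)[t] = x[t] - x[t-1]."""
    B = [[0.0] * n for _ in range(n)]
    for t in range(n):
        B[t][t] = 1.0
        if t > 0:
            B[t][t - 1] = -1.0
    return B

def matvec(M, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def simulate(impulses):
    """State trajectory x[t] = x[t-1] + impulses[t], starting from zero."""
    x, state = [], 0.0
    for imp in impulses:
        state += imp
        x.append(state)
    return x

# Two impulses over eight time steps.
impulses = [0.0, 2.0, 0.0, 0.0, -3.0, 0.0, 0.0, 0.0]
x = simulate(impulses)                     # dense: 7 of 8 entries are nonzero
Bx = matvec(difference_matrix(len(x)), x)  # sparse: recovers the two impulses
```

Here the signal itself is not sparse, while its image under the difference operator has only two nonzero entries, which is the situation a nontrivial transform is meant to capture.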
We believe, however, that the major novelty in what follows is the emphasis on verifiable conditions on matrix and the r.s. which guarantee good recovery of the transform from noisy observations of , provided that the transform in question is nearly -block sparse, and the observation noise is low. Note that such efficiently verifiable guarantees cannot be obtained from the “classical” conditions used when studying theoretical properties of block-sparse recovery (with a notable exception of the Mutual Block-Incoherence condition of EldarKuppingerBolcskei10). (Footnote 5: It has recently been proved in PfetschTillmann12 that computing the parameters involved in verification of the Nullspace condition as well as RIP for sparse recovery is NP-hard.) For example, given and , one cannot answer in any reasonable time if the (Block-) Restricted Isometry or Restricted Eigenvalue property holds with given parameters. While the efficient verifiability is by no means necessary for a condition to be meaningful and useful, we believe that verifiability has its value and is worthy of being investigated. In particular, it allows us to design new recovery routines with explicit confidence bounds for the recovery error and then optimize these bounds with respect to the method parameters. In this respect, the current work extends the results of JNCS, JKNCS, JNnoisy, where recovery of the “usual” sparse vectors was considered (the first two papers treat uncertain-but-bounded observation errors, and the third treats Gaussian observation noise). Specifically, we propose here new routines of block-sparse recovery which explicitly utilize a contrast matrix, a kind of “validity certificate,” and show how these routines may be tuned to attain the best performance bounds.
In addition to this, verifiable conditions pave the way for efficiently designing sensing matrices which possess certifiably good recovery properties for block-sparse recovery (see JKNCSDesign for implementation of such an approach in the usual sparsity setting).
The main body of the paper is organized as follows: in Section 2 we formulate the block-sparse recovery problem and introduce our core assumption—a family of conditions , , which links the representation structure and sensing matrix with a contrast matrix . Specifically, given and and a norm , the condition on an contrast matrix requires such that
holds, where for and ,
where is the norm on defined as follows: we zero out all but the largest in magnitude entries in vector , and take the -norm of the resulting -sparse vector. Then, by restricting our attention to the standard representation structures, we study the relation between condition and the usual assumptions used to validate block-sparse recovery, for example, Restricted Isometry/Eigenvalue Properties and their block versions.
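The norm described in words above (zero out all but the largest in magnitude entries, then take a p-norm of what remains) is easy to compute directly. The sketch below is one way to do so; the function name `top_s_p_norm` and the convention of passing `math.inf` for the uniform norm are assumptions of this illustration.

```python
import math

def top_s_p_norm(v, s, p):
    """p-norm of the vector obtained from v by keeping its s largest-magnitude entries."""
    mags = sorted((abs(t) for t in v), reverse=True)[:s]
    if p == math.inf:
        return max(mags, default=0.0)
    return sum(m ** p for m in mags) ** (1.0 / p)

v = [1.0, -5.0, 3.0, 0.0, 2.0]
print(top_s_p_norm(v, 2, 1))         # 8.0  (entries 5 and 3)
print(top_s_p_norm(v, 2, math.inf))  # 5.0
print(top_s_p_norm(v, 5, 1))         # 11.0 (the full 1-norm)
```

When s is at least the length of the vector, the quantity coincides with the ordinary p-norm, and for fixed p it is nondecreasing in s.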
In Section 3 we introduce two recovery routines based on the norm:
regular recovery [cf. (block-) Dantzig selector]
where with probability, is an upper bound on the -norm of the observation error;
penalized recovery [cf. (block-) Lasso]
where is our guess for the number of nonvanishing blocks in the true signal .
Under condition , we establish performance guarantees of these recoveries, that is, explicit upper bounds on the size of confidence sets for the recovery error , . Our performance guarantees have the usual natural interpretation: as far as recovery of transforms with small -block concentration is concerned, everything is as if we were given the direct observations of contaminated by noise of small magnitude. (Footnote 6: The -block concentration of a block vector is defined as .)
Similar to the usual assumptions from the literature, conditions are generally computationally intractable; nonetheless, we point out a notable exception in Section 4. When all block norms are , the condition , the strongest among our family of conditions, is efficiently verifiable. Moreover, in this situation, the latter condition is “fully computationally tractable,” meaning that one can efficiently optimize the bounds for the recovery error over the contrast matrices satisfying to design optimal recovery routines. In addition to this, in Section 4.2, we establish an oracle inequality which shows that the existence of a contrast matrix satisfying condition is not only sufficient but also necessary for “good recovery” of block-sparse signals in the -norm when .
In Section 5 we provide a verifiable sufficient condition for the validity of for general , assuming that is -r.s. [i.e., , ], and, in addition, . This sufficient condition can be used to build a “quasi-optimal” contrast matrix . We also relate this condition to the Mutual Block-Incoherence condition of EldarKuppingerBolcskei10 developed for the case of -r.s. with . In particular, we show in Section 5.4 that the Mutual Block-Incoherence is more conservative than our verifiable condition, and thus is “covered” by the latter. “Limits of performance” of our verifiable sufficient conditions are investigated in Section 5.3.
In Section 6 we describe a computationally cheap alternative to block- recoveries—a non-Euclidean Block Matching Pursuit (NEBMP) algorithm. Assuming that is either -, or -r.s. and that the verifiable sufficient condition is satisfied, we show that this algorithm (which does not require optimization) provides performance guarantees similar to those of regular/penalized recoveries.
We close by presenting a small simulation study in Section 7.
Proofs of all results are given in the supplementary article JKNP-suppl .
2 Problem statement
In the sequel, we deal with:
signals—vectors , and an sensing matrix ;
representations of signals—block vectors , and the representation matrix , ; the representation of a signal is the block vector with the blocks .
From now on, the dimension of is denoted by :
The factors of the representation space are equipped with norms ; the conjugate norms are denoted by . A vector from is called -block-sparse, if the number of nonzero blocks in is at most . A vector will be called -block-sparse, if its representation is so. We refer to the collection as the representation structure (r.s. for short). The standard r.s. is given by , , and , , and an -r.s. is the r.s. with , .
For , we call the number the magnitude of the th block in and denote by the representation vector obtained from by zeroing out all but the largest in magnitude blocks in (with the ties resolved arbitrarily). For and a representation vector , denotes the vector obtained from by keeping intact the blocks with and zeroing out all remaining blocks. For and , we denote by the -norm of the vector , so that is a norm on with the conjugate norm where . Given a positive integer , we set . Note that is a norm on . We define the -block concentration of a vector as .
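The two constructions introduced in this paragraph, truncation of a representation vector to its largest blocks and the block concentration, can be sketched as follows. The sketch assumes Euclidean block norms and reads the block concentration as the sum of the magnitudes of all blocks other than the s largest, in line with the description given in the introduction; the function names are hypothetical.

```python
import math

def _blocks(w, sizes):
    """Yield (start index, block) for consecutive blocks of the given sizes."""
    start = 0
    for n_k in sizes:
        yield start, w[start:start + n_k]
        start += n_k

def _magnitude(block):
    # Euclidean block norm; an assumption, any norm on the block could be used.
    return math.sqrt(sum(t * t for t in block))

def keep_largest_blocks(w, sizes, s):
    """Zero out all but the s largest-magnitude blocks of w."""
    mags = [_magnitude(b) for _, b in _blocks(w, sizes)]
    # Stable sort: ties are resolved in favor of the earlier block.
    keep = set(sorted(range(len(sizes)), key=lambda k: -mags[k])[:s])
    out = [0.0] * len(w)
    for k, (start, b) in enumerate(_blocks(w, sizes)):
        if k in keep:
            out[start:start + len(b)] = b
    return out

def block_concentration(w, sizes, s):
    """Sum of the magnitudes of all blocks except the s largest ones."""
    mags = sorted((_magnitude(b) for _, b in _blocks(w, sizes)), reverse=True)
    return sum(mags[s:])

w = [3.0, 4.0, 0.1, 0.0, 0.0, -2.0]
sizes = [2, 3, 1]                         # block magnitudes: 5, 0.1, 2
w2 = keep_largest_blocks(w, sizes, 2)     # drops the small middle block
tail = block_concentration(w, sizes, 2)   # approximately 0.1
```

A vector is exactly s-block-sparse precisely when this concentration vanishes, so the quantity measures how far a representation is from being s-block-sparse.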
Problem of interest
Given an observation
of unknown signal , we want to recover the representation of , knowing in advance that this representation is “nearly -block-sparse,” that is, the representation can be approximated by an -block-sparse one; the -error of this approximation, that is, the -block concentration, , will be present in our error bounds.
In (2) the term is the observation error; in this error, is an unknown nuisance known to belong to a given compact convex set symmetric w.r.t. the origin, and is random noise with known distribution .
We start with introducing the condition which will be instrumental in all subsequent constructions and results. Let a sensing matrix and an r.s. be given, and let be a positive integer, and . We say that a pair , where and is a norm on , satisfies the condition associated with the matrix and the r.s. , if
The following observation is evident:
Given and an r.s. , let satisfy . Then satisfies for all and . Besides this, if is a positive integer, satisfies . Furthermore, if satisfies , and , a positive integer , and are such that , then satisfies . In particular, when , the fact that satisfies implies that satisfies .
Relation to known conditions for the validity of sparse recovery
Note that whenever
is the standard r.s., the condition reduces to the condition introduced in JNnoisy . On the other hand, condition is closely related to other known conditions, introduced to study the properties of recovery routines in the context of block-sparsity. Specifically, consider an r.s. with , and let us make the following observation:
Let satisfy and let be the maximum of the Euclidean norms of columns in . Then
Let us fix the r.s. . Condition (4) with plays a crucial role in the performance analysis of the group-Lasso and Dantzig Selector. For example, the error bounds for Lasso recovery obtained in Lounicietal10 rely upon the Restricted Eigenvalue assumption as follows: there exists such that
In this case whenever , so that
Recall that a sensing matrix satisfies the Block Restricted Isometry Property (see, e.g., EldarMishali09 ) with and a positive integer if for every with at most nonvanishing blocks one has
Let satisfy for some and positive integer . Then:
The pair satisfies the condition associated with and the r.s. .
The pair satisfies the condition associated with and the r.s. .
Our last observation here is as follows: let satisfy for the r.s. given by , and let . Then satisfies for the r.s. given by .
3 Accuracy bounds for block recovery routines
Throughout this section we fix an r.s. and a sensing matrix .
3.1 Regular recovery
We define the regular recovery as
where the contrast matrix , the norm and are parameters of the construction.
Let be a positive integer, , . Assume that the pair satisfies the condition associated with and r.s. , and let
Then for all , and one has
The above result can be slightly strengthened by replacing the assumption that satisfies , , with a weaker, by Observation 2.1, assumption that satisfies with and satisfies with some (perhaps large) :
Given , r.s. , integer , and , assume that satisfies the condition with and the condition with some , and let be given by (8). Then for all , , and , it holds
3.2 Penalized recovery
The penalized recovery is
where , and a positive real are parameters of the construction.
Given , r.s. , integer , and , assume that satisfies the conditions and with and .
Let . Then for all , it holds for
In particular, with we have for
Let and be given by (8). Then for all , and all one has for
Let us compare the error bounds of the regular and the penalized recoveries associated with the same pair satisfying the condition with . Given , let
this is nothing but the smallest such that
[see (8)] and, thus, the smallest for which the error bound (3.1) for the regular recovery holds true with probability (or at least the smallest for which the latter claim is supported by Theorem 3.1). With , the regular recovery guarantees (and that is the best guarantee one can extract from Theorem 3.1) that
(!) For some set , , of “good” realizations of the random component of the observation error, one has
whenever , and .
The error bound (3.3) [where we can safely set , since implies ] says that (!) holds true for the penalized recovery with . The latter observation suggests that the penalized recovery associated with and is better than its regular counterpart, the reason being twofold. First, in order to ensure (!) with the regular recovery, the “built in” parameter of this recovery should be set to , and the latter quantity is not always easy to identify. In contrast to this, the construction of the penalized recovery is completely independent of a priori assumptions on the structure of observation errors, while automatically ensuring (!) for the error model we use. Second, and more importantly, for the penalized recovery the bound (3.2) is no more than the “worst, with confidence , case,” and the typical values of the quantity which indeed participates in the error bound (3.3) are essentially smaller than . Our numerical experience fully supports the above suggestion: the difference in observed performance of the two routines in question, although not dramatic, is definitely in favor of the penalized recovery. The only potential disadvantage of the latter routine is that the penalty parameter should be tuned to the level of sparsity we aim at, while the regular recovery is free of any guess of this type. Of course, the “tuning” is rather loose—all we need (and experiments show that we indeed need this) is the relation , so that a rough upper bound on will do; note, however, that the bound (3.3) deteriorates as grows.
4 Tractability of condition , -norm of the blocks
We have seen in Section 3 that given a sensing matrix and an r.s. such that the associated conditions are satisfiable, we can validate the recovery of nearly -block-sparse signals; specifically, we can point out -type recoveries with controlled (and small, provided so are the observation error and the deviation of the signal from an -block-sparse one). The bad news here is that, in general, condition , as well as other conditions for the validity of recovery, like Block RE/RIP, cannot be verified efficiently. The latter means that given a sensing matrix and an r.s. , it is difficult to verify that a given candidate pair satisfies the condition associated with and . Fortunately, one can construct “tractable approximations” of condition , that is, verifiable sufficient conditions for the validity of . The first good news is that when all are the uniform norms and, in addition, [which, by Observation 2.1, corresponds to the strongest among the conditions and ensures the validity of (3.1) and (3.3) in the largest possible range of values of ], the condition becomes “fully computationally tractable.” We also intend to demonstrate that the condition is in fact necessary for the risk bounds of the form (3.1)–(14) to be valid when .
4.1 Condition : Tractability and the optimal choice of the contrast matrix
In the sequel, given and a matrix , we denote by the norm of the linear operator induced by the norms and on the argument and the image spaces:
We denote by the norm of the linear mapping induced by the norms , on the argument and on the image spaces. Further, stands for the transpose of the th row of and stands for th column of . Finally, is the -norm of the vector obtained from a vector by zeroing all but the largest in magnitude entries in .
Consider r.s. . We claim that in this case the condition becomes fully tractable. Specifically, we have the following.
Let a matrix , the r.s. , a positive integer and reals , be given.
Assume that a triple , where , is a norm on , and , is such that
(!) satisfies , and the set satisfies .
Given , , , one can find efficiently vectors in and block matrix (the blocks of are matrices) such that
(note that the matrix norm is simply the maximum -norm of the rows of ).
Whenever vectors and a matrix with blocks satisfy (4.1), the matrix , the norm on and form a triple satisfying .
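The remark that the matrix norm in question reduces to a maximum over rows can be checked numerically. The sketch below assumes that the norm is the operator norm induced by the uniform norm on both the argument and the image space, in which case it equals the largest 1-norm of a row; the brute-force routine maximizes over sign vectors, which are the extreme points of the unit ball of the uniform norm. Function names are hypothetical.

```python
from itertools import product

def max_row_l1(V):
    """Largest 1-norm among the rows of V."""
    return max(sum(abs(t) for t in row) for row in V)

def induced_inf_norm_bruteforce(V):
    """Maximum of the uniform norm of Vx over all sign vectors x.

    Exponential in the number of columns; intended only as a sanity check.
    """
    n = len(V[0])
    best = 0.0
    for signs in product((-1.0, 1.0), repeat=n):
        image = [sum(a * s for a, s in zip(row, signs)) for row in V]
        best = max(best, max(abs(t) for t in image))
    return best

V = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
print(max_row_l1(V))                   # 4.0
print(induced_inf_norm_bruteforce(V))  # 4.0
```

The closed-form expression costs one pass over the matrix, which is part of what makes constraints on such block norms easy to handle in an optimization problem.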
Let a sensing matrix and an r.s. be given, along with a positive integer , an uncertainty set , a distribution of and . Theorems 3.1 and 3.3 say that if a triple is such that satisfies with and are such that the set given by (8) satisfies (16), then for the regular recovery associated with and for the penalized recovery associated with and , the following holds:
Proposition 4.1 states that when applying this result, we lose nothing by restricting ourselves to triples , , , which can be augmented by an appropriately chosen matrix to satisfy relations (4.1). In the rest of this discussion, it is assumed that we are speaking about triples satisfying the just defined restrictions.
The bound (4.1) is completely determined by two parameters: (which should be ) and ; the smaller these parameters are, the better the bounds. In what follows we address the issue of efficient synthesis of matrices with “as good as possible” values of and .