# On unified view of nullspace-type conditions for recoveries associated with general sparsity structures

We discuss a general notion of "sparsity structure" and associated recoveries of a sparse signal from its linear image of reduced dimension possibly corrupted with noise. Our approach allows for unified treatment of (a) the "usual sparsity" and "usual ℓ_1 recovery," (b) block-sparsity with possibly overlapping blocks and associated block-ℓ_1 recovery, and (c) low-rank-oriented recovery by nuclear norm minimization. The proposed recovery routines are natural extensions of the usual ℓ_1 minimization used in Compressed Sensing. Specifically we present nullspace-type sufficient conditions for the recovery to be precise on sparse signals in the noiseless case. Then we derive error bounds for imperfect (nearly sparse signal, presence of observation noise, etc.) recovery under these conditions. In all of these cases, we present efficiently verifiable sufficient conditions for the validity of the associated nullspace properties.


## 1 Introduction

We address the problem of recovering a linear image w = Bx of an unknown signal x via noisy observations

 y = Ax + ξ

of x. Here X, F, W are Euclidean spaces, A : X → F and B : X → W are given linear sensing and representation maps, and ξ is an "uncertain-but-bounded" observation error satisfying ϕ(ξ) ≤ ε (ϕ is a given norm on F, and ε ≥ 0 is a given error bound). We consider, for instance, the recovering routine of the form

 y ↦ x̂(y) ∈ Argmin_{u∈X} {∥Bu∥ : ϕ(Au − y) ≤ ε} ↦ ŵ(y) = Bx̂(y),

and we want this recovery to behave well provided that Bx is sparse in some prescribed sense. In this note, we introduce a rather general notion of sparsity structure on the representation space W which, under some structural restriction on the norm ∥·∥, allows us to point out "nullspace type" conditions for the recovery to be precise provided that x is "s-sparse" with respect to our structure. It also allows for explicit error bounds for "imperfect recovery" (noisy observations, near s-sparsity instead of the exact one, etc.). The motivation behind this construction is to present a simple unified general framework which allows, for instance, for a simple treatment of three important particular cases:

• recovering signals which are s-sparse in the usual sense (at most s nonzero entries) via ℓ_1 minimization (the corresponding nullspace property goes back to [4, 13]),

• recovering s-block-sparse signals via block-ℓ_1 minimization, and

• recovering matrices of low rank via nuclear norm minimization.

We present the respective sparsity structures and provide verifiable sufficient conditions for the validity of associated nullspace properties (and thus – for the validity of the corresponding recovery routines); the prototypes of our verifiable conditions can be found in [5, 6, 7, 8].

## 2 Problem description and recovery routines

### 2.1 Situation

Let X, F, W be Euclidean spaces, A : X → F be a linear sensing map, and B : X → W be a linear representation map. We are interested in the problem as follows:

(!) Given a noisy observation

 y = Ax + ξ (1)

of an unknown signal x ∈ X, we want to recover the representation w = Bx of x.

#### Sparsity structure

We will focus on the case when a priori information on x is expressed in terms of properly defined sparsity of the representation Bx of x. To this end, we define a sparsity structure on W, specifically, as follows:

Let ∥·∥ be a norm on W, ∥·∥_* be the conjugate norm, and P be a family of linear maps of W into itself, such that

• Every P ∈ P is a projector: P² = P;

• Every P ∈ P is assigned a nonnegative weight ν(P) and a linear map P̄ on W such that P̄P = 0;

• Whenever P ∈ P and f, g ∈ W, one has

 ∥P*f + P̄*g∥_* ≤ max[∥f∥_*, ∥g∥_*], (2)

where for a linear map Q acting from a Euclidean space E into a Euclidean space E′, Q* is the conjugate mapping acting from E′ to E.

A collection (∥·∥, P, {ν(P), P̄ : P ∈ P}) of the just introduced entities satisfying the requirements A.1-3 will be referred to as a sparsity structure on W.

From now on, given a sparsity structure, we set for a nonnegative real s

 P_s = {P ∈ P : ν(P) ≤ s}.

Given s ≥ 0, we call a vector w ∈ W s-sparse if there exists P ∈ P_s such that Pw = w. We call a vector x ∈ X s-sparse if so is its representation Bx.

We are about to present some instructive examples of the just outlined situation. Given a finite set I, we denote by R^I the space of real vectors with entries indexed by the elements of I and equip this space with the standard inner product. For a subset J ⊂ I, we set R^J = {x ∈ R^I : x_i = 0, i ∉ J}, and refer to R^J as a coordinate subspace of R^I. By P_J we denote the coordinate projector – the natural projector of R^I onto R^J.

### 2.2 Examples

#### Example I.a: ℓ1 recovery

In this example,

• W = X = R^n with the standard inner product;

• The representation map is the identity: B = Id;

• P is comprised of the coordinate projectors P_J onto all coordinate subspaces R^J of R^n, with P̄_J = Id − P_J;

• ν(P_J) = Card(J);

• ∥·∥ is the ℓ_1 norm ∥·∥_1.

Properties A.1-3 clearly take place, s-sparsity of x as defined above is the usual sparsity (at most s nonzero entries), and (3) is the usual ℓ_1 recovery.
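The ℓ_1 structure just described is easy to sanity-check numerically. Below is a small sketch (ours, purely illustrative, not from the paper): it builds the coordinate projectors P_J and verifies property A.3, which for the conjugate (ℓ_∞) norm is a coordinatewise statement — the vector P*f + (Id − P)*g agrees with f on J and with g off J.

```python
import numpy as np

def coord_projector(n, J):
    """Coordinate projector P_J on R^n: keeps the entries indexed by J, zeroes the rest."""
    P = np.zeros((n, n))
    for j in J:
        P[j, j] = 1.0
    return P

def check_A3_l1(n, J, rng):
    """Property A.3 for the l1 structure: with P = P_J, bar_P = Id - P_J, and the
    conjugate (l_inf) norm, ||P^T f + (I - P)^T g||_inf <= max(||f||_inf, ||g||_inf)."""
    P = coord_projector(n, J)
    f, g = rng.standard_normal(n), rng.standard_normal(n)
    lhs = np.linalg.norm(P.T @ f + (np.eye(n) - P).T @ g, np.inf)
    rhs = max(np.linalg.norm(f, np.inf), np.linalg.norm(g, np.inf))
    return lhs <= rhs + 1e-12

rng = np.random.default_rng(0)
ok = all(check_A3_l1(8, [0, 2, 5], rng) for _ in range(100))
```

The check passes for every draw, since the two terms have disjoint supports.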

#### Example I.b: Group ℓ1 recovery

Now we want to model the "block-sparse" situation as follows. X = R^n, and the index set {1,...,n} is represented as the union of K nonempty (and possibly overlapping) subsets I_1,...,I_K, so that to every x ∈ R^n one can associate the blocks x^ℓ = x_{I_ℓ} ∈ R^{I_ℓ}, ℓ = 1,...,K, which are the natural projections of x onto the coordinate subspaces R^{I_ℓ}. Assuming the subsets I_ℓ to be assigned with positive weights χ_ℓ, we define the sparsity of a signal x as the "weighted number of nonzero blocks x^ℓ," that is, the quantity ∑_{ℓ: x^ℓ≠0} χ_ℓ. This can be modeled by the following representation and sparsity structure:

• W = R^{I_1} × ⋯ × R^{I_K}, so that w ∈ W is a block vector w = [w^1;...;w^K] with w^ℓ ∈ R^{I_ℓ}, and Bx = [x^{I_1};...;x^{I_K}];

• P is comprised of the orthoprojectors P_I onto the subspaces {w ∈ W : w^ℓ = 0, ℓ ∉ I} associated with subsets I of the index set {1,...,K}, and P̄_I = Id − P_I, where Id stands for the identity mapping on W;

• ν(P_I) = ∑_{ℓ∈I} χ_ℓ;

• The norm ∥·∥ is defined as follows. For every ℓ ≤ K, we equip R^{I_ℓ} with a norm ∥·∥_{(ℓ)}, and set

 ∥[z^1;...;z^K]∥ = ∑_{ℓ=1}^K ∥z^ℓ∥_{(ℓ)}.

Verification of A.1-A.3 is immediate. Note that when the sets I_ℓ do not overlap, their union is {1,...,n}, and χ_ℓ = 1 for all ℓ, we find ourselves in the standard block-sparse situation: W can be naturally identified with X = R^n, making B the identity, vectors from R^n are split into K non-overlapping blocks, the norm ∥·∥ is the block-ℓ_1 norm, and s-sparsity of x means that x has at most s nonzero blocks.
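As a small illustration (ours, with hypothetical groups and weights), the representation map B, the block norm, and the weighted block sparsity of this example can be computed directly; note that with overlapping groups, Bx duplicates the shared coordinates.

```python
import numpy as np

def B_map(x, groups):
    """Representation map of Example I.b: stack the (possibly overlapping) blocks x^l."""
    return [x[np.array(g)] for g in groups]

def block_norm(w_blocks):
    """||w|| = sum_l ||w^l||_(l); here every ||.||_(l) is taken to be the l2 norm."""
    return sum(np.linalg.norm(wl) for wl in w_blocks)

def weighted_block_sparsity(x, groups, chi):
    """nu = sum of the weights chi_l over the nonzero blocks x^l."""
    return sum(c for g, c in zip(groups, chi) if np.any(x[np.array(g)] != 0))

x = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
groups = [[0, 1], [1, 2], [3, 4]]   # overlapping groups (hypothetical example)
chi = [1.0, 1.0, 2.0]               # positive weights chi_l
```

Here the first and third blocks are nonzero, so the weighted sparsity is 1 + 2 = 3, and the block-ℓ_1 norm of Bx is ∥[1,0]∥ + ∥[0,0]∥ + ∥[2,0]∥ = 3.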

#### Example II: Nuclear norm recovery

In this example,

• W = X = R^{p×q} with p ≥ q and the Frobenius inner product, and B is the identity;

• P is comprised of the mappings P(z) = P_left z P_right, where P_left ∈ R^{p×p} and P_right ∈ R^{q×q} are orthoprojectors, and P̄(z) = (I_p − P_left) z (I_q − P_right);¹

• ∥·∥ is the nuclear norm: ∥z∥ = ∑_{i=1}^q σ_i(z), where σ_1(z) ≥ σ_2(z) ≥ ⋯ ≥ σ_q(z) are the singular values of z.

¹ To avoid notational ambiguity, in the situation of Example II we denote the image of z under a mapping P by P(z) rather than by Pz, to avoid collision with the notation for matrix products like P_left z P_right.

The assumptions A.1-2 clearly take place.

To verify A.3, observe that the norm conjugate to the nuclear norm ∥·∥ is the spectral norm ∥·∥_{2,2}. We now have

 ∥f∥_{2,2} ≤ 1, ∥g∥_{2,2} ≤ 1
 ⇒ ∥P*(f)∥_{2,2} = ∥P_left f P_right∥_{2,2} ≤ 1, ∥P̄*(g)∥_{2,2} = ∥(I_p − P_left) g (I_q − P_right)∥_{2,2} ≤ 1
 ⇒ ∥P*(f) + P̄*(g)∥_{2,2} ≤ 1,

where the last ⇒ follows from the fact that the orthogonal complements to the kernels of P_left f P_right and (I_p − P_left) g (I_q − P_right), same as the images of these matrices, are orthogonal to each other.

In this case, by assigning the projectors P ∈ P with weights according to

 ν(P) = max[Rank(P_left), Rank(P_right)],

we arrive at the situation where s-sparse signals are exactly the matrices of rank at most s.
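The verification of A.3 above can also be confirmed numerically; the sketch below (ours, for illustration only) draws random rank-s orthoprojectors P_left, P_right and checks that the spectral norm of P*(f) + P̄*(g) never exceeds the maximum of the spectral norms of f and g.

```python
import numpy as np

def ortho_projector(dim, rank, rng):
    """Orthoprojector of given rank: U U^T with U having orthonormal columns."""
    U, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return U @ U.T

def check_A3_nuclear(p, q, s, rng):
    """A.3 for Example II: with the spectral (conjugate) norm,
    ||P_l f P_r + (I-P_l) g (I-P_r)||_{2,2} <= max(||f||_{2,2}, ||g||_{2,2})."""
    Pl, Pr = ortho_projector(p, s, rng), ortho_projector(q, s, rng)
    f, g = rng.standard_normal((p, q)), rng.standard_normal((p, q))
    lhs = np.linalg.norm(Pl @ f @ Pr + (np.eye(p) - Pl) @ g @ (np.eye(q) - Pr), 2)
    rhs = max(np.linalg.norm(f, 2), np.linalg.norm(g, 2))
    return lhs <= rhs + 1e-10

rng = np.random.default_rng(1)
ok = all(check_A3_nuclear(6, 4, 2, rng) for _ in range(50))
```

The inequality holds because the two summands have mutually orthogonal images and mutually orthogonal row spaces, exactly as argued above.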

## 3 Main results

### 3.1 Recovery routines

Let ϕ(·) be a norm on the image space F of the sensing map A, and let ε ≥ 0 be an a priori upper bound on the ϕ-norm of the observation error, see (1). In order to recover the representation w = Bx of the signal x underlying the observation (1), we use regular recovery – the "standard" recovery by ∥·∥-minimization as follows:

 y ↦ x̂_reg(y) ∈ Argmin_{u∈X} {∥Bu∥ : ϕ(Au − y) ≤ ε} ↦ ŵ_reg(y) = Bx̂_reg(y) (3)

and we treat ŵ_reg(y) as the resulting estimate of w = Bx.

We say that the sensing map A is s-good if the above recovery in the noiseless case (ξ = 0, ε = 0) reproduces exactly the representation Bx of every s-sparse signal x.

We also consider an alternative to (3), specifically, the penalized recovery introduced in [7, 8]. This recovery routine is given by

 y ↦ x̂_pen(y) ∈ Argmin_{u∈X} {∥Bu∥ + λϕ(Au − y)} ↦ ŵ_pen(y) = Bx̂_pen(y) (4)

where λ > 0 is a penalty parameter.
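In the classical setting of Example I.a, with ϕ taken to be the ℓ_1 norm, the penalized recovery (4) is a linear program. The sketch below (ours; B = Id, ∥·∥ = ℓ_1, ϕ = ℓ_1, all sizes and the test signal hypothetical) solves it with scipy; it is only an illustration of the routine, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def penalized_recovery_l1(A, y, lam):
    """Penalized recovery (4) with B = Id, ||.|| = l1, phi = l1:
    min_u ||u||_1 + lam * ||A u - y||_1, written as an LP in (u, t, r)
    with |u_i| <= t_i and |(Au - y)_j| <= r_j."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n), lam * np.ones(m)])
    I_n, I_m = np.eye(n), np.eye(m)
    Z_nm, Z_mn = np.zeros((n, m)), np.zeros((m, n))
    A_ub = np.block([
        [ I_n, -I_n, Z_nm],   #  u - t <= 0
        [-I_n, -I_n, Z_nm],   # -u - t <= 0
        [ A,   Z_mn, -I_m],   #  Au - r <= y
        [-A,   Z_mn, -I_m],   # -Au - r <= -y
    ])
    b_ub = np.concatenate([np.zeros(2 * n), y, -y])
    bounds = [(None, None)] * n + [(0, None)] * (n + m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.fun

rng = np.random.default_rng(2)
m, n = 30, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[[3, 17, 41]] = [2.0, -1.5, 1.0]   # 3-sparse signal
y = A @ x_true                                                 # noiseless observation
u_hat, val = penalized_recovery_l1(A, y, lam=10.0)
```

Since the true signal is feasible with objective value ∥x_true∥_1, the computed optimum can never exceed that value.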

### 3.2 s-goodness and nullspace property

We start with the following immediate observation:

###### Lemma 3.1

In the situation of section 2.1, let P ∈ P and w ∈ W be such that Pw = w. Then for every z ∈ X one has

 ∥w + Bz∥ ≥ ∥w∥ + ∥P̄Bz∥ − ∥PBz∥. (5)

In particular, if the following "nullspace property" holds true:

 ∀ (P ∈ P_s, z ∈ Ker(A), Bz ≠ 0): ∥P̄Bz∥ > ∥PBz∥, (6)

then A is s-good.

Proof. Let P ∈ P and w ∈ W be such that Pw = w. We have

 ∀ (f, g : max{∥f∥_*, ∥g∥_*} ≤ 1): ∥P*f + P̄*g∥_* ≤ 1  [by A.3]

which implies

 ∥w + Bz∥ ≥ ∥w + Bz∥ ∥P*f + P̄*g∥_*
 ≥ ⟨P*f + P̄*g, w + Bz⟩ = ⟨f, Pw + PBz⟩ + ⟨g, P̄w + P̄Bz⟩
 = ⟨f, w⟩ + ⟨f, PBz⟩ + ⟨g, P̄Bz⟩  [since w = Pw and therefore P̄w = P̄Pw = 0]
 ≥ ⟨f, w⟩ − ∥PBz∥ + ⟨g, P̄Bz⟩.

Choosing f such that ∥f∥_* = 1 and ⟨f, w⟩ = ∥w∥, and g such that ∥g∥_* ≤ 1 and ⟨g, P̄Bz⟩ = ∥P̄Bz∥, we get

 ∥w + Bz∥ ≥ ∥w∥ − ∥PBz∥ + ∥P̄Bz∥,

as claimed in (5).

Now let (6) take place, and let us prove that A is s-good. All we need to verify is that if x is s-sparse and x̂ = x̂_reg(Ax) (noiseless case, ε = 0 in (3)), then Bx̂ = Bx. Setting z = x̂ − x, so that z ∈ Ker(A), choosing P ∈ P_s such that PBx = Bx, and applying (5) with w = Bx, we get

 ∥Bx̂∥ = ∥Bx + Bz∥ ≥ ∥Bx∥ + ∥P̄Bz∥ − ∥PBz∥,

while by the origin of x̂ we have ∥Bx̂∥ ≤ ∥Bx∥. It follows that ∥P̄Bz∥ ≤ ∥PBz∥, which, by the nullspace property (6), is possible only when Bz = 0 (recall that z ∈ Ker(A) and P ∈ P_s).
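In the setting of Example I.a (B = Id, P̄_J = Id − P_J), inequality (5) specializes to ∥w + z∥_1 ≥ ∥w∥_1 + ∥z_{J^c}∥_1 − ∥z_J∥_1 for any w supported on J; the randomized check below (ours, illustrative) confirms it.

```python
import numpy as np

def check_ineq5_l1(n, J, rng):
    """Inequality (5) for Example I.a: w supported on J (so P_J w = w), z arbitrary:
    ||w + z||_1 >= ||w||_1 + ||z off J||_1 - ||z on J||_1."""
    mask = np.zeros(n, bool); mask[J] = True
    w = rng.standard_normal(n) * mask       # enforce P_J w = w
    z = rng.standard_normal(n)
    lhs = np.abs(w + z).sum()
    rhs = np.abs(w).sum() + np.abs(z[~mask]).sum() - np.abs(z[mask]).sum()
    return lhs >= rhs - 1e-12

rng = np.random.default_rng(7)
ok = all(check_ineq5_l1(10, [1, 4, 7], rng) for _ in range(200))
```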

###### Remark 3.1

Independently of any assumptions on the sparsity structure, an evident necessary condition for s-goodness of A is:

Whenever x ∈ X, P ∈ P_s, and z ∈ Ker(A) are such that PBx = Bx and Bz ≠ 0, one has ∥Bx + Bz∥ ≥ ∥Bx∥.

When modifying this condition by replacing the non-strict inequality in the conclusion with its strict counterpart, this necessary condition for s-goodness becomes sufficient. This, by the way, immediately implies the necessity and sufficiency of the standard nullspace property [4, 13] for the validity of ℓ_1 minimization, and the translation of it to the matrix case given by [9, 10]. Moreover, recently, [2] has established the sufficiency of this condition in the case of decomposable norms, i.e., sparse, non-overlapping block-sparse and low-rank recovery, following a unified view based on subdifferentials.

### 3.3 Error bounds for imperfect ∥⋅∥ recovery

#### Conditions C_s(γ, β; ϕ) and C⁺_s(γ, β; ϕ)

In the sequel, we shall use the following two conditions (where, from now on, γ ∈ (0, 1), β ≥ 0, and ϕ is a norm on the image space of A):

C_s(γ, β; ϕ):

 ∀ (z ∈ X, P ∈ P_s): ∥PBz∥ + ∥Bz∥ − ∥P̄Bz∥ ≤ βϕ(Az) + γ∥Bz∥.

C⁺_s(γ, β; ϕ): there exists a (semi)norm ∥·∥_{P_s} on W such that

 (a) ∀ (z ∈ X, P ∈ P_s): ∥PBz∥ + ∥Bz∥ − ∥P̄Bz∥ ≤ ∥Bz∥_{P_s}
 (b) ∀ z ∈ X: ∥Bz∥_{P_s} ≤ βϕ(Az) + γ∥Bz∥. (7)

Let us make the following immediate observations:

###### Remark 3.2

(i) The validity of the condition C_s(γ, β; ϕ) with some γ ∈ (0, 1) and some β < ∞ implies the validity of the nullspace property (6);

(ii) C⁺_s(γ, β; ϕ) implies C_s(γ, β; ϕ);

(iii) The (semi)norm ∥u∥_{P_s} = max_{P∈P_s} [∥Pu∥ + ∥u − P̄u∥] satisfies (7.a) (since by the Triangle inequality we have ∥u∥ − ∥P̄u∥ ≤ ∥u − P̄u∥);

(iv) Whenever γ′ ≥ γ and β′ ≥ β, validity of C_s(γ, β; ϕ) implies validity of C_s(γ′, β′; ϕ), and similarly for the condition C⁺_s(γ, β; ϕ);

(v) Consider the following condition: there exists a (semi)norm ∥·∥_{P_s} on W such that

 (a) ∀ (z ∈ X, P ∈ P_s): ∥PBz∥ + ∥Bz∥ − ∥P̄Bz∥ ≤ ∥Bz∥_{P_s}
 (b) ∀ z ∈ Ker(A): ∥Bz∥_{P_s} ≤ γ∥Bz∥.

If this condition is satisfied, then for every γ′ > γ there exists β < ∞ such that the condition C⁺_s(γ′, β; ϕ) is satisfied.

#### Error bounds for imperfect regular ∥⋅∥ recovery

are stated in the following

###### Proposition 3.1

In the situation of section 2.1, let a sparsity level s be given, and let the condition C_s(γ, β; ϕ) take place for some γ ∈ (0, 1) and β < ∞. Given a signal x ∈ X along with its observation y = Ax + ξ, where ϕ(ξ) ≤ ε, let x be "nearly s-sparse" and x̂ be "nearly x̂_reg(y)," specifically, for given nonnegative tolerances δ_x, δ_ϕ, and δ one has:

• there exists an s-sparse w ∈ W such that ∥Bx − w∥ ≤ δ_x ("near s-sparsity of x");

• one has

 (a) ϕ(Ax̂ − [Ax + ξ]) ≤ ε + δ_ϕ
 (b) ∥Bx̂∥ ≤ Opt + δ,  Opt := min_u {∥Bu∥ : ϕ(Au − y) ≤ ε} (8)

("x̂ is a nearly feasible nearly optimal solution to the optimization problem specifying x̂_reg(y)").

Then

 ∥Bx̂ − Bx∥ ≤ [β(2ε + δ_ϕ) + δ + 2δ_x] / (1 − γ). (9)

Proofs of the results of this section are given in the appendix.

#### Error bounds for penalized ∥⋅∥ recovery

###### Proposition 3.2

In the situation of section 2.1, let γ ∈ (0, 1), β < ∞, and a norm ϕ be such that the condition C_s(γ, β; ϕ) is satisfied. Let also the penalty parameter λ in (4) satisfy λ ≥ β. Finally, let the signal x underlying the observation y = Ax + ξ be "nearly s-sparse," meaning that there exists an s-sparse w ∈ W such that ∥Bx − w∥ ≤ δ_x, and let x̂ be a δ-near-optimal solution to the optimization problem specifying x̂_pen(y), namely,

 λϕ(Ax̂ − y) + ∥Bx̂∥ ≤ min_z {λϕ(Az − y) + ∥Bz∥} + δ.

Then

 ∥Bx̂ − Bx∥ ≤ [2δ_x + δ + 2λϕ(ξ)] / (1 − γ), (10)

where ξ = y − Ax is the observation noise.

#### Comment:

As compared to the plain ∥·∥-recovery (3), the penalized recovery requires a priori knowledge of a β such that the condition C_s(γ, β; ϕ) takes place; indeed, in order for the error bound (10) to be applicable, we should ensure λ ≥ β. As a compensation, the penalized recovery does not require any a priori information on the level of observation error, and as such is well suited for the case when the latter is random (or a sum of a random and a bounded component).

## 4 Application examples

In the rest of this note, we are interested in the particular forms taken by the nullspace sufficient condition for s-goodness of A (Lemma 3.1) and the error bounds for imperfect ∥·∥ recovery (Propositions 3.1 and 3.2) in the examples described in section 2.2.

### 4.1 Example I.a: ℓ1 recovery

In the situation of Example I.a, the nullspace property (6) reads

 γ_s(A) := max_x {∥x∥_{s,1} : x ∈ Ker(A), ∥x∥_1 ≤ 1} < 1/2, (11)

where ∥x∥_{s,1} is the sum of the s largest magnitudes of entries in x. This is a well-known necessary and sufficient condition for the validity of the standard sparse ℓ_1 recovery [4, 13]. It is immediately seen that the condition C⁺_s(γ, β; ϕ) is satisfied if and only if it is satisfied when setting ∥·∥_{P_s} = 2∥·∥_{s,1}, and in this case the condition C⁺_s(γ, β; ϕ) is equivalent to C_s(γ, β; ϕ). The latter condition reads

 ∀ z ∈ R^n: ∥z∥_{s,1} ≤ (β/2)ϕ(Az) + (γ/2)∥z∥_1. (12)

Validity of this relation is equivalent to the fact that the quantity γ̂_s(A) introduced in [5] satisfies γ̂_s(A) ≤ γ/2 (see [5, Theorem 2.2]; what is denoted by β in the latter reference is now β/2), and with this in mind, the error bounds (9) and (10) recover the bounds in [5, Theorem 3.1]. Besides this, one can find in [5] verifiable sufficient conditions for the validity of (12), their relations to the Restricted Isometry Property, etc.
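The quantity ∥x∥_{s,1} is cheap to compute, and although maximizing it over the kernel (to get γ_s(A) in (11)) is hard in general, sampling kernel directions gives a valid lower bound on γ_s(A); the sketch below (ours, illustrative) does exactly that.

```python
import numpy as np

def norm_s1(x, s):
    """||x||_{s,1}: sum of the s largest magnitudes of entries of x."""
    return np.sort(np.abs(x))[::-1][:s].sum()

def gamma_s_lower_bound(A, s, n_samples=2000, seed=0):
    """Monte Carlo LOWER bound on gamma_s(A) = max{||x||_{s,1} : x in Ker(A), ||x||_1 <= 1}:
    sample random kernel directions, normalize in l1, keep the best value seen."""
    rng = np.random.default_rng(seed)
    # orthonormal basis of Ker(A) from the trailing right singular vectors
    _, _, Vt = np.linalg.svd(A)
    kernel = Vt[np.linalg.matrix_rank(A):].T
    best = 0.0
    for _ in range(n_samples):
        z = kernel @ rng.standard_normal(kernel.shape[1])
        z /= np.abs(z).sum()
        best = max(best, norm_s1(z, s))
    return best

A = np.random.default_rng(3).standard_normal((20, 40))
lb = gamma_s_lower_bound(A, s=2)
```

Since ∥z∥_{s,1} ≤ ∥z∥_1 for every z, the bound always lies in (0, 1]; only a certified value below 1/2 would verify (11), so a sampled lower bound can refute but never certify the nullspace property.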

### 4.2 Example I.b: Group ℓ1 recovery

In the situation of Example I.b, given a positive real s, let us define the norm π_s on R^K as

 π_s(u) = 2 max_η {∑_{ℓ=1}^K η_ℓ|u_ℓ| : η_ℓ ∈ {0,1} ∀ℓ, ∑_{ℓ=1}^K χ_ℓη_ℓ ≤ s},

and let ∥·∥_{1,s} be the norm on W given by

 ∥w∥_{1,s} = π_s([∥w^1∥_{(1)}; ∥w^2∥_{(2)}; ...; ∥w^K∥_{(K)}]).

Observe that for every z ∈ X and every P = P_I ∈ P_s we have

 ∥PBz∥ + ∥Bz∥ − ∥P̄Bz∥ = ∑_{ℓ∈I} ∥(Bz)^ℓ∥_{(ℓ)} + ∑_{ℓ=1}^K ∥(Bz)^ℓ∥_{(ℓ)} − ∑_{ℓ∉I} ∥(Bz)^ℓ∥_{(ℓ)} = 2∑_{ℓ∈I} ∥(Bz)^ℓ∥_{(ℓ)} ≤ ∥Bz∥_{1,s},

where the concluding ≤ is given by ∑_{ℓ∈I} χ_ℓ = ν(P_I) ≤ s. Thus, with ∥·∥_{P_s} set to ∥·∥_{1,s}, the condition (7.a) is satisfied. We have arrived at the following

###### Proposition 4.1

In the situation of Example I.b, for all positive reals γ, β, the condition

 ∀ z ∈ X: ∥Bz∥_{1,s} ≤ βϕ(Az) + γ∥Bz∥ (13)

is sufficient for the validity of C⁺_s(γ, β; ϕ), and thus – for the validity of C_s(γ, β; ϕ). As a result, condition (13) with γ < 1 is sufficient for s-goodness of A and for the applicability of the error bounds (9) and (10).

#### A verifiable sufficient condition for (13)

Condition (13) can be difficult to verify. We are about to present a verifiable sufficient condition for its validity, inspired by [8]. For a linear map Q_{kℓ} : R^{I_ℓ} → R^{I_k}, let ∥Q_{kℓ}∥_{(ℓk)} be the norm of the map induced by the norms ∥·∥_{(ℓ)} and ∥·∥_{(k)} on the argument and the image spaces:

 ∥Q_{kℓ}∥_{(ℓk)} = max_u {∥Q_{kℓ}u∥_{(k)} : ∥u∥_{(ℓ)} ≤ 1}.

Let also N = n_1 + ⋯ + n_K be the dimension of W, where n_ℓ = Card(I_ℓ) are the cardinalities of the sets I_ℓ, so that W can be identified with R^N. For an N×N matrix W, let W_{kℓ} ∈ R^{n_k×n_ℓ} be the blocks of W associated with the direct product representation R^{I_1} × ⋯ × R^{I_K} of W, and let Ω[W] be the K×K matrix with the entries (Ω[W])_{kℓ} = ∥W_{kℓ}∥_{(ℓk)}, 1 ≤ k, ℓ ≤ K.
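When every block norm ∥·∥_{(ℓ)} is the ℓ_2 norm, each induced norm ∥W_{kℓ}∥_{(ℓk)} is just the spectral norm of the block, so Ω[W] is cheap to form; a sketch (ours, with hypothetical block sizes):

```python
import numpy as np

def omega(Wmat, sizes):
    """Omega[W]: the K x K matrix of induced norms of the blocks of W.
    With every ||.||_(l) the l2 norm, the induced norm of a block is its spectral norm."""
    K = len(sizes)
    offs = np.concatenate([[0], np.cumsum(sizes)])
    Om = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            block = Wmat[offs[k]:offs[k+1], offs[l]:offs[l+1]]
            Om[k, l] = np.linalg.norm(block, 2)   # spectral norm of the (k,l) block
    return Om

rng = np.random.default_rng(4)
sizes = [2, 3, 2]                      # n_l = Card(I_l), hypothetical
Wmat = rng.standard_normal((7, 7))
Om = omega(Wmat, sizes)
```

For the identity matrix, Ω is the K×K identity, since the diagonal blocks are identities and the off-diagonal blocks vanish.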

###### Proposition 4.2

In the situation of Example I.b, given γ ≥ 0, let an N×N matrix W and an m×N matrix H satisfy the relations

 (a) B = WB + HᵀA
 (b) max_{ℓ≤K} π_s(Col_ℓ(Ω[W])) ≤ γ, (14)

where Col_ℓ(Ω[W]) is the ℓ-th column of the matrix Ω[W]. Then the relation (13) holds true with

 β = β[H] := max_{v∈R^m} {∥Hᵀv∥_{1,s} : ϕ(v) ≤ 1}.

Proof. Let z ∈ X. Under the premise of the proposition, we have

 (Bz)^k = (WBz)^k + (HᵀAz)^k = ∑_{ℓ=1}^K W_{kℓ}(Bz)^ℓ + (HᵀAz)^k
 ⇒ ∥(Bz)^k∥_{(k)} ≤ ∑_{ℓ=1}^K ∥W_{kℓ}∥_{(ℓk)} ∥(Bz)^ℓ∥_{(ℓ)} + ∥(HᵀAz)^k∥_{(k)}
 ⇒ [∥(Bz)^1∥_{(1)}; ...; ∥(Bz)^K∥_{(K)}] ≤ ∑_{ℓ=1}^K ∥(Bz)^ℓ∥_{(ℓ)} Col_ℓ(Ω[W]) + [∥(HᵀAz)^1∥_{(1)}; ...; ∥(HᵀAz)^K∥_{(K)}]  (entrywise)
 ⇒ ∥Bz∥_{1,s} ≤ ∥Bz∥ max_{ℓ≤K} π_s(Col_ℓ(Ω[W])) + ∥HᵀAz∥_{1,s} ≤ γ∥Bz∥ + β[H]ϕ(Az),

and the desired conclusion follows.

#### Discussion

The sufficient condition for the validity of (13) stated in Proposition 4.2 reduces to solving a system of convex constraints in the matrix variables W, H and the scalars γ, β, namely,

 B = WB + HᵀA,  π_s(Col_ℓ(Ω[W])) ≤ γ ∀ℓ,  Ψ_s(H) := max_{v: ϕ(v)≤1} ∥Hᵀv∥_{1,s} ≤ β. (15)

This system, although convex, still can be difficult to process, since the convex functions ∥·∥_{(ℓk)}, π_s and Ψ_s can be difficult to compute. In such a case, one can replace these functions with their efficiently computable upper bounds (for details, "solvable cases," etc., see [8]). For instance, (15) is computationally tractable when

• all norms ∥·∥_{(ℓ)} are either ℓ_1, or ℓ_2, or ℓ_∞ norms (this makes the matrix Ω[W] efficiently computable);

• in appropriate scale, all weights χ_ℓ are integers from a once for ever fixed (or polynomially growing with problem's sizes) range, which makes the norm π_s efficiently computable. Note that one can always replace π_s with a reasonably tight upper bound on it, specifically, the norm

 π̂_s(u) = 2 max_η {∑_{ℓ=1}^K η_ℓ|u_ℓ| : 0 ≤ η_ℓ ≤ min[1, Floor(s/χ_ℓ)] ∀ℓ, ∑_{ℓ=1}^K χ_ℓη_ℓ ≤ s};
• ϕ is the ℓ_1 norm (so that the maximum defining Ψ_s(H) is attained at one of the vertices ±e_i of the ϕ-unit ball, making Ψ_s efficiently computable).

The last assumption is indeed restrictive. It, however, is responsible solely for the tractability of the constraint Ψ_s(H) ≤ β, and is crucial only if one's objective is to compute an upper bound on β[H]. On the other hand, it is of primary importance to ensure γ < 1, since otherwise Proposition 3.1 provides no error bound at all.
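The relaxed norm π̂_s is a small LP and hence easy to compute; here is a sketch (ours, illustrative) using scipy.optimize.linprog. With unit weights the LP relaxation is integral, so π̂_s coincides with π_s and equals twice the sum of the s largest magnitudes.

```python
import numpy as np
from scipy.optimize import linprog

def pi_hat_s(u, chi, s):
    """Upper bound pi_hat_s on pi_s: maximize 2 * sum_l eta_l * |u_l| over
    0 <= eta_l <= min(1, floor(s / chi_l)) and sum_l chi_l * eta_l <= s (an LP)."""
    u, chi = np.abs(np.asarray(u, float)), np.asarray(chi, float)
    upper = np.minimum(1.0, np.floor(s / chi))
    res = linprog(-2.0 * u,                      # maximize => minimize the negation
                  A_ub=chi[None, :], b_ub=[s],
                  bounds=list(zip(np.zeros_like(u), upper)),
                  method="highs")
    return -res.fun

u = [3.0, -1.0, 2.0, 0.5, 4.0]
chi = [1.0, 1.0, 1.0, 1.0, 1.0]
val = pi_hat_s(u, chi, s=2)   # with unit weights: 2 * (sum of the 2 largest |u_l|)
```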

Finally, we refer the reader to [8] for details on the relationship of the derived conditions with block-RIP and other sufficient conditions used in the Compressive Sensing literature.

### 4.3 Example II: Nuclear norm recovery

For a matrix z ∈ R^{p×q} with p ≥ q, let

 Σ_k(z) = ∑_{i=1}^k σ_i(z), 1 ≤ k ≤ q.

Observe that in the situation of Example II, where B is the identity and the sparsity parameter s can be w.l.o.g. restricted to be a nonnegative integer, we have (everywhere in this section, ∥·∥ is the nuclear norm)

 ∀ (z ∈ R^{p×q}, P ∈ P_s): ∥P(z)∥ ≤ Σ_s(z), ∥P̄(z)∥ ≥ ∥z∥ − Σ_{2s}(z).

Indeed, let P ∈ P_s, so that P(z) = P_left z P_right is of rank at most s. Then ∥P(z)∥ ≤ Σ_s(z) by the Singular Value Interlacement Theorem. Since the matrix P̄(z) = (I_p − P_left) z (I_q − P_right) differs from z by a matrix of rank at most 2s, by the same Singular Value Interlacement Theorem we have σ_i(P̄(z)) ≥ σ_{i+2s}(z), whence ∥P̄(z)∥ ≥ ∥z∥ − Σ_{2s}(z).
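These two inequalities are easy to confirm numerically; the sketch below (ours, illustrative) draws a random z and random rank-s projectors and checks ∥P(z)∥ ≤ Σ_s(z) and ∥P̄(z)∥ ≥ ∥z∥ − Σ_{2s}(z) in the nuclear norm.

```python
import numpy as np

def nuclear(z):
    """Nuclear norm: sum of all singular values."""
    return np.linalg.svd(z, compute_uv=False).sum()

def sigma_sum(z, k):
    """Sigma_k(z): sum of the k largest singular values of z."""
    return np.linalg.svd(z, compute_uv=False)[:k].sum()

def check_interlacement(p, q, s, rng):
    Ul, _ = np.linalg.qr(rng.standard_normal((p, s)))
    Ur, _ = np.linalg.qr(rng.standard_normal((q, s)))
    Pl, Pr = Ul @ Ul.T, Ur @ Ur.T                 # rank-s orthoprojectors
    z = rng.standard_normal((p, q))
    Pz = Pl @ z @ Pr
    Pbar_z = (np.eye(p) - Pl) @ z @ (np.eye(q) - Pr)
    return (nuclear(Pz) <= sigma_sum(z, s) + 1e-9 and
            nuclear(Pbar_z) >= nuclear(z) - sigma_sum(z, 2 * s) - 1e-9)

rng = np.random.default_rng(5)
ok = all(check_interlacement(7, 5, 2, rng) for _ in range(50))
```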

We have arrived at the following

###### Proposition 4.3

In the situation of Example II, the norm

 ∥z∥_{P_s} := Σ_s(z) + Σ_{2s}(z)

on R^{p×q} satisfies the condition (7.a), so that the condition

 ∀ z ∈ R^{p×q}: Σ_s(z) + Σ_{2s}(z) ≤ βϕ(Az) + γ∥z∥ (16)

is sufficient for the validity of C⁺_s(γ, β; ϕ), and thus – for the validity of C_s(γ, β; ϕ). As a result, condition (16) with γ < 1 is sufficient for s-goodness of A and for applicability of the error bounds (9) and (10).

Clearly, there is a gap between the above sufficient condition and the necessary nullspace condition for exact low-rank matrix recovery, which is

 2Σ_s(z) < ∥z∥  for all z ∈ Ker(A), z ≠ 0.

On the other hand, our sufficient condition is less restrictive than the only other sufficient condition known to us, the one given in [10], which requires

 2Σ_{2s}(z) < ∥z∥  for all z ∈ Ker(A), z ≠ 0.

#### A verifiable sufficient condition for (16)

Following the same exposition scheme as in Examples I.a-b, it is now time to point out a verifiable sufficient condition for the validity of (16). Let H be a linear map from the image space of A into X = R^{p×q}, so that W := Id − H*A, Id being the identity mapping on X, is a linear map from X into X. Assume that W satisfies the requirement

 ∀ z ∈ X: Σ_s(Wz) + Σ_{2s}(Wz) ≤ γ∥z∥; (17)

we claim that then (16) holds true with

 β = β[H] = max_v {Σ_s(H*v) + Σ_{2s}(H*v) : ϕ(v) ≤ 1}. (18)

Indeed, set π(z) = Σ_s(z) + Σ_{2s}(z). Assuming (17) and taking into account that z = Wz + H*Az, we have

 π(z) = π(Wz + H*Az) ≤ π(Wz) + π(H*Az) ≤ γ∥z∥ + β[H]ϕ(Az),

as required in (16).

The question is how to efficiently certify the validity of (17). The proposed answer is as follows. Note that

 Σ_k(w) = max_h {Tr(whᵀ) : ∥h∥ ≤ k, ∥h∥_{2,2} ≤ 1},

and therefore (17) is exactly the relation

 γ ≥ Opt[W] := max_{z∈R^{p×q}} {Σ_s(Wz) + Σ_{2s}(Wz) : ∥z∥ ≤ 1}
  = max_{u,v,z∈R^{p×q}} {Tr([Wz][u+v]ᵀ) : ∥z∥ ≤ 1, ∥u∥ ≤ s, ∥u∥_{2,2} ≤ 1, ∥v∥ ≤ 2s, ∥v∥_{2,2} ≤ 1}.
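The variational representation of Σ_k can be checked directly: the maximizer is h = U_k V_kᵀ built from the top k singular vector pairs of w, which has nuclear norm exactly k and spectral norm 1. A numerical sketch (ours, illustrative):

```python
import numpy as np

def sigma_k(w, k):
    """Sigma_k(w): sum of the k largest singular values."""
    return np.linalg.svd(w, compute_uv=False)[:k].sum()

rng = np.random.default_rng(6)
w = rng.standard_normal((6, 4))
k = 2
U, svals, Vt = np.linalg.svd(w)
h = U[:, :k] @ Vt[:k, :]          # feasible h: nuclear norm k, spectral norm 1
value = np.trace(w @ h.T)         # attains Sigma_k(w)
```

By construction Tr(whᵀ) = σ_1(w) + ⋯ + σ_k(w), which matches the maximum in the representation above.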

Now,

 Tr([Wz]hᵀ) = ∑_{i,j=1}^{pq} (Θ[W])_{ij} (hᵀ ⊗ z)_{ij},

where Θ[W] is a properly defined matrix, linear in W, and hᵀ ⊗ z is the Kronecker product; in other words, hᵀ ⊗ z is the block matrix with the blocks (hᵀ)_{ij} z = h_{ji} z, 1 ≤ i ≤ q, 1 ≤ j ≤ p. We conclude that

 Opt[W] ≤ max_{U,V} {∑_{i,j=1}^{pq} (Θ[W])_{ij} [U_{ij} + V_{ij}] : U ∈ Z_s, V ∈ Z_{2s}},

where