Generalized notions of sparsity and restricted isometry property. Part I: A unified framework

06/28/2017 · by Marius Junge, et al.

The restricted isometry property (RIP) is an integral tool in the analysis of various inverse problems with sparsity models. Motivated by applications in compressed sensing and the dimensionality reduction of low-rank tensors, we propose generalized notions of sparsity and provide a unified framework for the corresponding RIP, in particular when combined with isotropic group actions. Our results extend an approach by Rudelson and Vershynin to a much broader context including commutative and noncommutative function spaces. Moreover, our Banach space notion of sparsity applies to affine group actions. In particular, the generalized approach applies to higher order tensor products.


1. Introduction

The restricted isometry property (RIP) has been used as a universal tool in the analysis of many modern inverse problems with sparsity prior models. Indeed, the RIP implies that certain linear maps act as near isometries when restricted to "nice" (or sparse) vectors. Motivated by emerging big data applications such as compressed sensing and dimensionality reduction of massive data with a low-rank tensor structure, we provide a unified framework for the RIP that allows a generalized notion of sparsity and extends the existing theory to a much broader context.

Let us recall that in compressed sensing the RIP played a crucial role in providing guarantees for the recovery of sparse vectors from a small number of observations. Moreover, these guarantees were achieved by practical polynomial-time algorithms (e.g., [CT05, RV08]). In machine learning, the RIP enabled fast and guaranteed dimensionality reduction of data with a sparsity structure. The RIP has been established for various sparsity models, and in many cases it turns out to be nearly optimal in terms of the scaling of parameters for several classes of random linear operators. For example, a linear map with random subgaussian entries satisfies a near optimal RIP for the canonical sparsity model [CT05, BDDW08, KMR14], the low-rank matrix model [RFP10, CP11b], and the low-rank tensor model [RSS16]. Baraniuk et al. [BDDW08] provided an alternative elementary derivation that combines exponential concentration of a subgaussian quadratic form with a standard geometric argument and union bounds.
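As an informal numerical illustration of this subgaussian RIP phenomenon (our own sketch, not part of the cited arguments; the dimensions, the constant 4 in the number of rows, and all variable names are illustrative assumptions), the following Python snippet draws a Gaussian matrix with roughly s·log(n/s) rows and measures how far it is from an isometry on randomly sampled canonically s-sparse unit vectors.

    import numpy as np

    # Sketch: a random Gaussian matrix acting as a near isometry on s-sparse unit vectors.
    rng = np.random.default_rng(0)
    n, s = 1000, 10
    m = int(4 * s * np.log(n / s))                 # heuristic choice m ~ s log(n/s)
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaling gives E[||Ax||^2] = ||x||^2

    def random_sparse_unit_vector(n, s):
        """Unit-norm vector supported on s random coordinates."""
        x = np.zeros(n)
        support = rng.choice(n, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        return x / np.linalg.norm(x)

    # Largest observed deviation | ||Ax||^2 - 1 | over sampled sparse unit vectors:
    # a Monte Carlo proxy (in fact a lower bound) for the restricted isometry constant.
    deviations = [abs(np.linalg.norm(A @ random_sparse_unit_vector(n, s)) ** 2 - 1.0)
                  for _ in range(2000)]
    print("m =", m, " max observed deviation:", max(deviations))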

Linear operators with special structures, such as the subsampled Fourier transform, arise in practical applications. These structures are naturally given by the physics of the application (e.g., Fourier imaging), and subsampled versions of such structured linear operators can be implemented within existing physical systems. Furthermore, structured linear operators enable scalable implementation at low computational cost, which is highly desirable for dimensionality reduction. It has been shown that a partial Fourier operator satisfies a near optimal RIP for the canonical sparsity model in the context of compressed sensing [CT06, RV08, Rau10]. As another example, in quantum tomography, the linear operator given by randomly subsampled Pauli measurements was shown to satisfy a near optimal RIP for a low-rank matrix model [Liu11].
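For concreteness, a subsampled Fourier measurement map of the kind referred to above can be sketched as follows (our own sketch; the normalization by sqrt(n/m), which makes the operator isotropic in expectation under uniform row sampling, and all names are our choices).

    import numpy as np

    def partial_fourier(x, row_indices):
        """Randomly subsampled unitary DFT: keep only the selected frequency rows.

        Using the FFT, a measurement costs O(n log n) rather than O(m n)."""
        n, m = len(x), len(row_indices)
        spectrum = np.fft.fft(x) / np.sqrt(n)      # unitary DFT of the input
        return np.sqrt(n / m) * spectrum[row_indices]

    rng = np.random.default_rng(1)
    n, m = 1024, 128
    rows = rng.choice(n, size=m, replace=False)    # uniformly subsampled frequencies
    x = np.zeros(n)
    x[rng.choice(n, size=5, replace=False)] = 1.0  # a 5-sparse test vector
    y = partial_fourier(x, rows)
    print(y.shape, np.linalg.norm(y) ** 2)         # close to ||x||^2 on average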

There are applications whose setup does not fit the existing theory because the classical sparsity model does not hold and/or the assumptions on the linear operator are not satisfied. Motivated by such applications, in this paper we extend the notions of sparsity and the RIP for structured linear operators in the several ways described below.

1.1. Generalized notion of sparsity

First, we generalize the notion of sparsity. Let a Hilbert space and a centered convex body in it be given. We consider the Banach space obtained by completing the linear span of the convex body with respect to the norm given by its Minkowski functional.

Definition 1.1.

We say that a vector is -sparse if

where is the Banach space with unit ball .

The set of -sparse unit-norm vectors in , denoted by , is geometrically given as the intersection of and the unit sphere . The set of -sparse vectors, denoted by , is then the star-shaped nonconvex cone given by (or if the scalar field is complex). These two sets are visualized in Figure 1. For example, if and , then corresponds to the set of approximately -sparse vectors with respect to the canonical basis. The authors of this paper showed that existing near optimal RIP results extend from the exact canonical sparsity model to this approximately sparse model [JL15]. This generalized notion of sparsity covers a wider class of models beyond the classical atomic model. For example, in the companion Part II paper [JL17, Section 4], we demonstrate a case where a sparse vector is not represented as a finite linear combination of atoms. It also provides a machinery that optimizes the sample complexity for the RIP of a given atomic sparsity model by choosing an appropriate Banach space (see [JL17, Section 2]). In the special case where the sparsity level is 1, our theory covers an arbitrary set. (Note that taking the convex hull of a given set does not increase the number of measurements for the RIP; therefore, in this case the convex set can be taken to be the convex hull of a given set of interest.)

Figure 1. Visualization of an abstract sparsity model using a convex set and the unit sphere in a Hilbert space. Left (a): the set of -sparse unit-norm vectors (red). Right (b): the set of -sparse vectors (gray-shaded).
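To make the canonical instance of this model concrete: with the convex body chosen as the ℓ1 unit ball, a unit-norm vector belongs to the relaxed sparse set precisely when its ℓ1 norm is at most √s, which is the usual relaxation of exact s-term sparsity. A minimal sketch of this membership test (our own code; the ℓ1 norm stands in for the Minkowski functional of the chosen convex body):

    import numpy as np

    def is_relaxed_sparse(x, s):
        """Membership test for the relaxed canonical sparsity model:
        unit-norm vectors whose ell_1 norm is at most sqrt(s).

        For a different convex body, replace the ell_1 norm by the
        corresponding Minkowski functional."""
        x = np.asarray(x, dtype=float)
        x = x / np.linalg.norm(x)                 # restrict to unit-norm vectors
        return np.sum(np.abs(x)) <= np.sqrt(s) + 1e-12

    # An exactly 4-sparse unit vector qualifies (Cauchy-Schwarz gives ||x||_1 <= 2),
    # while a flat vector does not.
    e = np.zeros(100); e[:4] = 0.5
    print(is_relaxed_sparse(e, s=4))              # True
    print(is_relaxed_sparse(np.ones(100), s=4))   # False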

1.2. Vector-valued measurements

Second, we consider vector-valued measurements, which generalize conventional scalar-valued measurements. This situation arises in several practical applications. For example, in medical imaging and multi-dimensional signal acquisition, measurements are taken by sampling a transform of the input not individually but in blocks. The performance of norm minimization has been analyzed in this setup [PDG15, BBW16], and it was shown that the block sampling scheme, enforced by the application, adds a penalty to the number of measurements required for recovery. This analysis extends the noiseless part of the analogous theory for scalar-valued measurements [CP11a], which relies on a property called local isometry, a weaker version of the RIP. For stable recovery from noisy measurements, one essentially needs the RIP of the measurement operator, but the block sampling setup does not fit existing RIP results for structured linear operators. In this paper, we consider general vector-valued measurements in a Hilbert space and generalize the notion of incoherence and other properties accordingly. This extension, in particular when combined with a generalized sparsity model, requires the theory of factorization of linear operators between Banach spaces [Pis86a].
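A minimal sketch of such block sampling, assuming for illustration that each vector-valued measurement is a whole block of consecutive DFT coefficients (the block structure, sampling pattern, and names are our own assumptions, not the acquisition model of any particular application):

    import numpy as np

    def block_fourier_measurements(x, block_size, num_blocks, rng):
        """Vector-valued measurements: each observation is a whole block of
        consecutive DFT coefficients rather than a single scalar sample."""
        n = len(x)
        spectrum = np.fft.fft(x) / np.sqrt(n)
        starts = rng.choice(n // block_size, size=num_blocks, replace=False) * block_size
        # Each row of the result is one measurement taking values in C^{block_size}.
        return np.stack([spectrum[start:start + block_size] for start in starts])

    rng = np.random.default_rng(2)
    x = rng.standard_normal(512)
    Y = block_fourier_measurements(x, block_size=8, num_blocks=16, rng=rng)
    print(Y.shape)    # (16, 8): 16 measurements, each a vector in an 8-dimensional space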

1.3. Sparsity with enough symmetries and group structured RIP

We also generalize the theory of the RIP from partial Fourier measurement operators to more general group structured measurement operators, which exploit the inherent structure in the Banach space that determines a sparsity model. The canonical sparsity model is an example of the general sparsity model in Section 1.1 in which the convex set has a special structure called enough symmetries. Consider a group together with an affine representation that maps each group element to an element of the orthogonal group. An affine representation is isotropic if averaging the conjugate actions on any linear operator yields a scalar multiple of the identity. A convex set has enough symmetries if there exists an isotropic affine representation under which it is invariant. Finite dimensional Banach spaces with enough symmetries have been studied extensively (see [TJ89, DF92, Pis86a]). Our original motivation for this problem comes from considering low-rank tensor products. In fact, a nice feature of spaces with enough symmetries is their stability under tensor products.

When the convex set has enough symmetries, we consider a linear measurement operator given as a composition map obtained by sampling the (adjoint) orbit of a fixed measurement map under the group action. For certain classes of groups, the resulting group structured measurement operator admits a fast implementation. For example, in one special case the operator reduces to a partial discrete Fourier transform. If the group actions consist of circular shifts in the canonical basis and in the Fourier basis, the operator corresponds to a partial quantum Fourier transform, which is a special case of the Gabor transform. We will demonstrate the RIP of such group structured measurement operators when the group elements are selected at random.
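The time-frequency shift example can be sketched as follows (our own toy code: the random window, the uniform choice of shifts and modulations, and all names are illustrative assumptions). Each group element is a pair consisting of a circular shift and a modulation, its action on a fixed window generates the orbit, and one measurement is the inner product of the input with a randomly chosen orbit element.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 64
    window = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    window /= np.linalg.norm(window)

    def time_frequency_shift(g, a, b):
        """Action of the group element (a, b): circular shift by a, then modulation by b."""
        shifted = np.roll(g, a)
        phases = np.exp(2j * np.pi * b * np.arange(len(g)) / len(g))
        return phases * shifted

    def group_structured_measurements(x, num_samples):
        """Correlate x with randomly chosen elements of the orbit of the window
        under time-frequency shifts (a Gabor-type measurement operator)."""
        samples = []
        for _ in range(num_samples):
            a, b = rng.integers(n), rng.integers(n)      # random group element
            samples.append(np.vdot(time_frequency_shift(window, a, b), x))
        return np.array(samples)

    x = rng.standard_normal(n)
    print(group_structured_measurements(x, num_samples=20).shape)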

Again, the group structured measurement operator is a natural extension of a partial Fourier operator. Unlike the alternative extension to subsampled bounded orthogonal systems [Rau10], the group structured measurement operator is tightly connected to the given sparsity model.

1.4. Main results

We illustrate our main results in the general setup with two concrete examples in the two theorems below. These theorems provide the RIP of a random group structured measurement operator for the corresponding stylized sparsity models. Both theorems assume that the convex set, which determines the set of -sparse vectors, has enough symmetries with an isotropic affine representation, and that the induced Banach space has this convex set as its unit ball, as before. A set of random measurements is obtained by using random group actions. Specifically, we assume that the group elements are independent copies of a Haar-distributed random variable on the group.

The first theorem demonstrates our main result in the case where the convex set is a polytope given as the absolute convex hull of finitely many vectors.

Theorem 1.2 (Polytope).

Suppose that is an -dimensional Banach space whose unit ball is the absolute convex hull of points. Let be defined from as before and satisfy . Then

holds with high probability for .

Theorem 1.2 generalizes the RIP result for a partial Fourier operator (e.g., [RV08]) in the three ways discussed above. The operator norm term in Theorem 1.2 generalizes the notion of incoherence in the existing theory. Most interestingly, combined with a clever net argument, Theorem 1.2 yields the RIP of a random group structured measurement operator for low-rank tensors (see Section 6).

The second theorem deals with the sparsity model given by a "nice" Banach space whose dual has type 2 [Pis99]. (Details are explained in Section 3.3.) Here, for simplicity, we only demonstrate an example where .

Theorem 1.3 (Dual of type 2).

Suppose that is an -dimensional Banach space whose dual has type 2. Let be defined from the unit ball in as before and satisfy . Then

holds with high probability for , where denotes the type 2 constant of .

Theorem 1.3 covers many known results on the RIP of structured random linear operators and should be considered an umbrella result for this theory. Importantly, Theorem 1.3 applies to noncommutative cases such as Schatten classes, and the previous result for a partial Pauli operator applied to low-rank matrices [Liu11] is a special case.

In fact, Theorems 1.2 and 1.3 are merely exemplars of the main result in its full generality, Theorem 2.1. In the Part II paper [JL17], we also demonstrate that Theorem 2.1 provides a theory of the RIP for infinite dimensional sparsity models.

1.5. Notation

In this paper, the symbols and will be reserved for numerical constants, which may vary from line to line. We will use notation for various Banach spaces and norms. The operator norm will be denoted by . We will use the shorthand notation for the -norm for . For , the unit ball in will be denoted by . For an index set , let denote the corresponding canonical basis. The index set should be clear from the context. The identity operator will be denoted by . For a set of linear operators, the commutant, denoted by , refers to the set of linear operators that commute with all elements of the set, i.e.,

1.6. Organization

The rest of this paper is organized as follows. The main theorems are proved in Section 2. We discuss the complexity of the convex set for various sparsity models in Section 3. After a brief review of affine group representations and enough symmetries in Section 4, we collect the results from the previous sections and illustrate the implications of the main results for prototypical sparsity models in Section 5. Finally, we conclude the paper with an application of the main results to a low-rank tensor model in Section 6.

2. Rudelson-Vershynin method

In this section, we derive a unified framework that identifies a sufficient number of measurements for the RIP of structured random operators in the general setup introduced in Sections 1.1 and 1.2. We will start with the statement of the property in the general setup, followed by the proof.

2.1. RIP in the general setup

Let be a Hilbert space, a centrally symmetric convex set, and the Banach space with this unit ball, as before. Let denote the set of -sparse vectors and its intersection with the unit sphere. Let be independent random linear operators from to . For notational simplicity, we let denote the composite map defined by for . Then the measurement operator , defined by , generates a set of vector-valued linear measurements.

Our results are stated for a class of incoherent random measurement operators. We adapt the arguments by Candes and Plan [CP11a] to describe these measurement operators. In the special case of and , Candes and Plan considered a class of linear operators given by measurement maps satisfying the following two key properties: i) isotropy: for all , where is the identity matrix; ii) incoherence: is upper bounded by a numerical constant (deterministically or with high probability). In our setup, the isotropy condition extends to

(1)

But sometimes we will also consider the case where

(2)

hold with satisfying

(3)

Obviously, isotropy is a sufficient condition for the relaxed properties in (2) and (3). For the incoherence, we generalize it using a 1-homogeneous function that maps a bounded operator from to to a nonnegative number. A natural choice is the operator norm, which is consistent with the above example of and ; the operator norm in that case reduces to . However, in certain scenarios there exists a better choice than the operator norm, one that further reduces the sample complexity identifying a sufficient number of measurements for the RIP. One such example is demonstrated for the windowed Fourier transform in the Part II paper [JL17, Section 2].
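As a quick numerical sanity check of isotropy and incoherence in the classical Fourier setting (our own sketch and normalization, not from the paper), the rescaled rows of the unitary DFT matrix average exactly to the identity, and every entry of the rescaled rows has modulus one:

    import numpy as np

    n = 32
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # unitary DFT matrix
    rows = np.sqrt(n) * F                          # rescaled measurement functionals

    # Isotropy: the average of the rank-one operators a a^* over all rows is the identity.
    covariance = sum(np.outer(r, r.conj()) for r in rows) / n
    print(np.allclose(covariance, np.eye(n)))      # True

    # Incoherence: all entries have modulus 1, so the relevant norm is bounded
    # by a constant independent of the dimension.
    print(np.max(np.abs(rows)))                    # 1.0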

Under the relaxed isotropy conditions in (2) and (3), with a slight abuse of terminology, we say that satisfies the RIP on with constant if

(4)

In the special case where the isotropy in (1) is satisfied, the deviation inequality in (4) reduces to the conventional RIP. Note that is a nonnegative operator by construction. If it is a positive operator, then it induces a weighted norm of , and (4) states that this weighted norm is preserved up to a small perturbation proportional to .

Our main result is a far-reaching generalization of the RIP of a partial Fourier operator by Rudelson and Vershynin [RV08]. We adapt their derivation, which consists of the following two steps. The first step is to show that the expectation of the restricted isometry constant is upper bounded by Talagrand's functional [Tal96] of the restriction set, and then by an integral of metric entropy numbers via Dudley's theorem [LT13]. Later in this section, we show that this first step extends to the general setup with the upper bound given by

(5)

where denotes the dyadic entropy number of [CS90]. The second step is where our theory deviates significantly from the previous work [RV08]. Rudelson and Vershynin used a variation of Maurey’s empirical method [Car85] to get an upper bound on the integral in (5) for being the unit ball in , which in turn provided a near optimal sample complexity up to a logarithmic factor. Liu [Liu11] later extended the result by Rudelson and Vershynin [RV08] to the case of a partial Pauli operator applied to low-rank matrices via the dual entropy argument by Guédon et al. [GMPTJ08].

Our result is a further generalization of these results. In particular, it provides the flexibility to address the vector-valued measurement case and to optimize the sample complexity over the choice of the 1-homogeneous function. In the general setup, we need to adopt other tools from Banach space theory to obtain an analogous upper bound. For this purpose, we introduce a property of the convex set, defined as follows: we say that it is of entropy type if there exists a constant such that

(6)

holds for any composite map , where the exponent function is defined by for and by for , for technical reasons. In this paper, will denote the smallest constant satisfying (6). Note that this constant generalizes the notion of incoherence and represents the complexity of a given sparsity model, which is discussed in more detail in Section 3.

Our main theorem below identifies a sufficient number of measurements for the RIP of a random linear operator in the general setup.

Theorem 2.1.

Let , , , be defined as above. Suppose that is of entropy type . Let be independent random maps from to satisfying (2) and (3). Let and . Then there exists a numerical constant such that satisfies the RIP on with constant with probability provided

(7)
and
(8)

The moment terms in (7) and (8) are essentially probabilistic or deterministic upper bounds on and , respectively. (Indeed, a moment bound implies a tail bound by the Markov inequality, and the converse can be shown by a direct calculation with Stirling's approximation of the gamma function; see, e.g., [FR13, Chapter 7].) These two terms extend the notion of incoherence of measurement functionals with respect to the given sparsity model. On the other hand, describes the complexity of the sparsity model. In many well-known examples, it reduces to a logarithmic factor. However, if the convex set that determines the sparsity model has bad geometry, there will be a penalty in the form of a larger . These incoherence and complexity parameters are controlled by the choice of the parameter and the 1-homogeneous function .
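For instance, one standard form of the moment-to-tail direction (our illustration in the spirit of [FR13, Chapter 7], not a statement from this paper): if a random variable Z satisfies the moment bound \((\mathbb{E}|Z|^p)^{1/p} \le \alpha\sqrt{p}\) for all \(p \ge 1\), then Markov's inequality applied at level p gives

\[
\mathbb{P}\big(|Z| \ge e\,\alpha\sqrt{p}\big) \;\le\; \frac{\mathbb{E}|Z|^p}{(e\,\alpha\sqrt{p})^p} \;\le\; \Big(\frac{\alpha\sqrt{p}}{e\,\alpha\sqrt{p}}\Big)^p \;=\; e^{-p},
\]

so subgaussian-type moment growth yields an exponential tail bound. The converse direction uses the identity \(\mathbb{E}|Z|^p = \int_0^\infty p\,t^{p-1}\,\mathbb{P}(|Z|\ge t)\,dt\) together with Stirling's approximation of the gamma function.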

Remark 2.2.

A natural choice of the parameters in Theorem 2.1 is and . Then the conditions in (7) and (8) reduce to

However, as shown in [JL17, Section 2], there are cases where we can further reduce the number of measurements for the RIP in (7) by optimizing over , , and .

2.2. Proof of Theorem 2.1

Next we prove Theorem 2.1. In the course of the proof, we show that the Rudelson-Vershynin argument [RV08], originally used to derive a near optimal RIP of partial Fourier operators, generalizes to a flexible method. Let us start by recalling the relevant notation. Let be a linear map and a subset of . The dyadic entropy number [CS90] is defined by

For , we use the shorthand notation . The following equivalence between metric and dyadic entropy numbers is well known (see, e.g., [Pis99]).

Lemma 2.3 ([Pis99]).

.

Note that since is a nonincreasing sequence, coincides with the norm of in the Lorentz sequence space [BL76]. Therefore, we will use the shorthand notation to denote .
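For the reader's convenience, we recall the standard definitions in their usual normalization (this recollection is ours; the paper's exact conventions may differ slightly). For a linear map \(u \colon X \to Y\) between Banach spaces, the k-th dyadic entropy number is

\[
e_k(u) \;=\; \inf\Big\{\varepsilon > 0 \;:\; u(B_X)\ \text{can be covered by}\ 2^{k-1}\ \text{balls of radius}\ \varepsilon\ \text{in}\ Y\Big\},
\]

and for a nonincreasing nonnegative sequence \((a_k)_{k \ge 1}\) the Lorentz sequence space norm used below is

\[
\|(a_k)\|_{\ell_{p,1}} \;=\; \sum_{k \ge 1} k^{\frac{1}{p}-1}\, a_k .
\]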

The following lemma provides a key estimate in proving Theorem 

2.1.

Lemma 2.4.

Let , , , and be defined as before. Let . Let be linear maps from to and denote the composite map. Let be independent copies of . Then for all

Proof.

Define for and . Let . Then is a subgaussian process indexed by . By the tail bound result via generic chaining [Dir15, Theorem 3.2] and Dudley’s inequality [LT13], for all we have

(9)

We first compute an upper bound on the first summand on the right-hand side of (9). Let

(10)

Then we note that for we have

Let denote the maximal family of elements in such that . Then it follows that . This implies that

Using a change of variables, this implies

Next, we compute an upper bound on the second summand on the right-hand side of (9). By Khintchine's inequality, for all

Therefore

Combining these estimates yields the assertion. ∎

Corollary 2.5.

Assume the hypotheses of Lemma 2.4. Then

Proof.

By the polarization identity, we have

where . Then we apply the argument for . Note that

Then by the Cauchy-Schwarz inequality, for we have

where is defined in (10). Thus, the assertion follows by replacing by . ∎

Proposition 2.6.

Let , , and be defined as before. Let and . Let be independent random maps from to satisfying (2) with and (3). Let denote the composite map. Suppose that

  1. The linear operator satisfies .

  2. The random linear operator satisfies

    (11)

    for an absolute constant .

Then

(12)
Proof.

Let denote the left-hand side of the inequality in (12). Let be independent copies of . By the standard symmetrization, we have

By conditioning on , we deduce from Lemma 2.4 that

Let denote the factor before ; then we have . Since was arbitrary, a consequence of the Markov inequality [Dir15, Lemma A.1] implies that there exists a numerical constant such that

holds with probability . The condition in (11) implies . ∎

Proof of Theorem 2.1.

Since is of entropy type , for every we have

Then we get

and hence

By Proposition 2.6, it suffices to satisfy

and

3. Complexity of sparsity models

Our generalized sparsity model is given by scaled versions of a convex set . A sufficient number of measurements for the RIP is determined by the geometry of the resulting Banach space . In this section, we discuss the complexity of given in terms of for various sparsity models.

3.1. Relaxed canonical sparsity

In many applications, sparsity is implicitly controlled by the norm. A relaxed canonical sparsity model, which includes exactly sparse vectors and their small perturbations, is defined by . The corresponding Banach space is then . We derive an upper bound on by using a well-known application of Maurey's empirical method, which is given in the following lemma.

Lemma 3.1.

(Maurey’s empirical method) Let , be a Hilbert space, and . Then

In particular, for ,

Proof.

The first part is a direct consequence of Maurey’s empirical method [Car85, Proposition 2]. Let and , which satisfies . Since has type 2 with constant , it follows from [Car85, Proposition 2] that

Choosing yields the first assertion. Next we prove the second assertion. For , the first assertion implies

For , by the standard volume argument ([Pis86b, Lemma 1.7]), we have . Therefore,

This completes the proof. ∎
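The principle behind Maurey's empirical method can be illustrated numerically (our own toy experiment, not part of the proof): a point in the convex hull of unit-norm atoms in a Hilbert space is approximated by the average of k atoms drawn i.i.d. according to the convex weights, and the expected approximation error decays like 1/√k.

    import numpy as np

    rng = np.random.default_rng(4)
    d, N = 64, 500
    atoms = rng.standard_normal((N, d))
    atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)   # unit-norm atoms

    weights = rng.random(N)
    weights /= weights.sum()                                 # convex weights
    target = weights @ atoms                                 # a point in the convex hull

    def empirical_approximation(k):
        """Average of k atoms drawn i.i.d. with the convex weights."""
        idx = rng.choice(N, size=k, p=weights)
        return atoms[idx].mean(axis=0)

    for k in [10, 100, 1000]:
        errors = [np.linalg.norm(empirical_approximation(k) - target) for _ in range(200)]
        print(f"k = {k:4d}   mean error {np.mean(errors):.3f}   1/sqrt(k) = {1 / np.sqrt(k):.3f}")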

Proposition 3.2.

Let . Then

Proof.

According to Lemma 3.1 we have

and

The assertion follows from the definition in (6). ∎

3.2. Relaxed atomic sparsity with finite dictionary

We say that a vector is atomic -sparse if it is represented as a finite linear combination of a given set of atoms, which is called a dictionary. Here we consider the special case where the dictionary is a finite set . A relaxed atomic sparsity model is then defined by the convex hull of the dictionary, i.e., . As mentioned in the introduction, it is important to observe that if the atoms lie on the unit sphere , then the "sparse" set is no longer contained in the convex hull of a few vectors from the unit sphere. The complexity of is then upper bounded in the following corollary.
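The Minkowski functional of this model (the atomic norm associated with the dictionary) can be computed by a small linear program, which gives a concrete membership test for the relaxed atomic model. The sketch below is our own illustration; it assumes SciPy is available, uses the standard |c_i| ≤ t_i linearization, and all names and dimensions are ours.

    import numpy as np
    from scipy.optimize import linprog

    def atomic_norm(x, dictionary):
        """Minkowski functional of the absolute convex hull of the dictionary rows:
        minimize sum_i |c_i| subject to x = sum_i c_i * b_i (solved as an LP)."""
        N, d = dictionary.shape
        cost = np.concatenate([np.zeros(N), np.ones(N)])          # minimize sum of t
        A_eq = np.hstack([dictionary.T, np.zeros((d, N))])         # dictionary^T c = x
        A_ub = np.vstack([np.hstack([np.eye(N), -np.eye(N)]),      #  c - t <= 0
                          np.hstack([-np.eye(N), -np.eye(N)])])    # -c - t <= 0
        res = linprog(cost, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=x,
                      bounds=[(None, None)] * N + [(0, None)] * N)
        return res.fun

    rng = np.random.default_rng(5)
    N, d, s = 40, 20, 3
    dictionary = rng.standard_normal((N, d))
    dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

    x = dictionary[:s].sum(axis=0)                 # combination of s unit-norm atoms
    x /= np.linalg.norm(x)
    # For nearly orthogonal atoms this is roughly sqrt(s), the boundary of the model.
    print(atomic_norm(x, dictionary), np.sqrt(s))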

Corollary 3.3.

Let and . Then

Proof.

Let . Then we have