The restricted isometry property (RIP) has been used as a universal tool in the analysis of many modern inverse problems with sparsity prior models. Indeed, the RIP implies that certain linear maps act as near isometries when restricted to “nice” (or sparse) vectors. Motivated by emerging big data applications such as compressed sensing or dimensionality reduction of massive data with a low-rank tensor structure, we provide a unified framework for the RIP that allows a generalized notion of sparsity and extends the existing theory to a much broader context.
Let us recall that in compressed sensing the RIP played a crucial role in providing guarantees for the recovery of sparse vectors from a small number of observations. Moreover, these guarantees were achieved by practical polynomial time algorithms (e.g., [CT05, RV08]). In machine learning, the RIP enabled fast and guaranteed dimensionality reduction of data with a sparsity structure. The RIP has been shown for various sparsity models, and in many cases it turns out to be nearly optimal in terms of the scaling of parameters for several classes of random linear operators. For example, a linear map with random subgaussian entries satisfies a near optimal RIP for the canonical sparsity model [CT05, BDDW08, KMR14], the low-rank matrix model [RFP10, CP11b], and the low-rank tensor model [RSS16]. Baraniuk et al. [BDDW08] provided an alternative elementary derivation that combines exponential concentration of a subgaussian quadratic form with a standard geometric argument and union bounds.
Linear operators with special structures, such as the subsampled Fourier transform, arise in practical applications. These structures are naturally given by the physics of the applications (e.g., Fourier imaging), and subsampled versions of these structured linear operators can be implemented within existing physical systems. Furthermore, structured linear operators also enable scalable implementation at low computational cost, which is highly desirable for dimensionality reduction. It has been shown that a partial Fourier operator satisfies a near optimal RIP for the canonical sparsity model in the context of compressed sensing [CT06, RV08, Rau10]. As another example, in quantum tomography, the linear operator for randomly subsampled Pauli measurements was shown to satisfy a near optimal RIP for a low-rank matrix model [Liu11].
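The near-isometry behavior of a partial Fourier operator on sparse vectors can be observed numerically. The following is a minimal sketch, not taken from the paper: the helper names `partial_fourier` and `energy_ratios`, the problem sizes, and the normalization by the square root of N/m are all illustrative assumptions. It draws random rows of the unitary discrete Fourier transform and reports the ratio of squared norms after and before the map for random exactly sparse vectors.

```python
import numpy as np

def partial_fourier(N, rows):
    """Selected rows of the unitary DFT matrix, rescaled by sqrt(N / m).

    With this (assumed) normalization, the full operator (m = N) is unitary.
    """
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)      # unitary N x N DFT matrix
    return np.sqrt(N / len(rows)) * F[rows, :]

def energy_ratios(A, s, trials, rng):
    """Return ||A x||^2 / ||x||^2 for random exactly s-sparse vectors x."""
    N = A.shape[1]
    ratios = np.empty(trials)
    for t in range(trials):
        x = np.zeros(N)
        support = rng.choice(N, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        ratios[t] = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
    return ratios

rng = np.random.default_rng(0)
N, m, s = 256, 64, 5                            # illustrative sizes only
rows = rng.choice(N, size=m, replace=False)     # random subsampling of frequencies
ratios = energy_ratios(partial_fourier(N, rows), s, trials=200, rng=rng)
print(f"min ratio = {ratios.min():.3f}, max ratio = {ratios.max():.3f}")
```

Ratios close to 1 indicate that the subsampled operator nearly preserves the norm of the tested sparse vectors. Such a simulation only samples finitely many vectors, so it illustrates the RIP but does not certify it.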
There are applications whose setup does not fit the existing theory because the classical sparsity model does not hold and/or the assumptions on the linear operator are not satisfied. Motivated by such applications, in this paper we extend the notion of sparsity and the RIP for structured linear operators in several ways described below.
1.1. Generalized notion of sparsity
First, we generalize the notion of sparsity. Let $H$ be a Hilbert space and $K \subset H$ be a centered convex body. We will consider the Banach space $X$ obtained by completing the linear span of $K$ with the norm given as the Minkowski functional $p_K$ defined by $p_K(x) = \inf\{\lambda > 0 : x \in \lambda K\}$.
We say that a vector $x \in H$ is $(K,s)$-sparse if
$$\|x\|_X \le \sqrt{s}\,\|x\|_H,$$
where $X$ is the Banach space with unit ball $K$.
The set of $(K,s)$-sparse unit-norm vectors in $H$, denoted by $K_s$, is geometrically given as the intersection of $\sqrt{s}K$ and the unit sphere $S_H$. Then the set of $(K,s)$-sparse vectors, denoted by $\Gamma_s$, is the star-shaped nonconvex cone given by $\mathbb{R}K_s$ (or $\mathbb{C}K_s$ if the scalar field is complex). These two sets are visualized in Figure 1. For example, if $H = \ell_2^N$ and $K = B_1^N$, then $\Gamma_s$ corresponds to the set of approximately $s$-sparse vectors with respect to the canonical basis. The authors of this paper showed that existing near optimal RIP results extend from the exact canonical sparsity model to this approximately sparse model [JL15]. This generalized notion of sparsity covers a wider class of models beyond the classical atomic model. For example, in a companion Part II paper [JL17, Section 4], we demonstrate a case where a sparse vector is not represented as a finite linear combination of atoms. It also provides machinery for optimizing the sample complexity for the RIP of a given atomic sparsity model by choosing an appropriate Banach space (see [JL17, Section 2]). In the special case where the sparsity level is 1, our theory covers an arbitrary set. [Footnote 1: Note that taking the convex hull of a given set does not increase the number of measurements for the RIP. Therefore, the convex set can be considered as the convex hull of a given set of interest in this case.]
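For the canonical example, the relaxed notion of sparsity can be checked numerically. The sketch below assumes that the defining condition takes the standard form in which the l1 norm is at most the square root of s times the l2 norm; the function name `is_relaxed_sparse` and the test vectors are hypothetical and chosen only for illustration. By the Cauchy-Schwarz inequality every exactly s-sparse vector satisfies the condition, while a vector with all entries equal does not unless s is at least the dimension.

```python
import numpy as np

def is_relaxed_sparse(x, s, tol=1e-12):
    """Check ||x||_1 <= sqrt(s) * ||x||_2 (assumed form of the relaxed condition)."""
    return np.linalg.norm(x, 1) <= np.sqrt(s) * np.linalg.norm(x, 2) + tol

rng = np.random.default_rng(1)
N, s = 100, 4                                   # illustrative sizes only

# An exactly s-sparse unit vector passes by the Cauchy-Schwarz inequality.
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)

# A tiny dense perturbation is no longer exactly sparse,
# but it still passes the relaxed test at level s + 1.
y = x + 1e-4 * rng.standard_normal(N)

# The all-ones vector has ||.||_1 = sqrt(N) * ||.||_2, so it fails for s < N.
flat = np.ones(N)

print(is_relaxed_sparse(x, s), is_relaxed_sparse(y, s + 1), is_relaxed_sparse(flat, s))
```

The perturbed vector shows why the relaxed model is described as approximately sparse: it has full support, yet its l1 norm remains small relative to its l2 norm.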
1.2. Vector-valued measurements
Second, we consider vector-valued measurements, which generalize the conventional scalar-valued measurements. This situation arises in several practical applications. For example, in medical imaging and multi-dimensional signal acquisition, measurements are taken by sampling a transform of the input not individually but in blocks. The performance of $\ell_1$-norm minimization has been analyzed in this setup [PDG15, BBW16], and it was shown that the block sampling scheme, enforced by applications, adds a penalty to the number of measurements needed for recovery. This analysis extends the noiseless part of the analogous theory for scalar-valued measurements [CP11a], which relies on a property called local isometry, a weaker version of the RIP. For stable recovery from noisy measurements, one essentially needs the RIP of the measurement operator, but the block sampling setup does not fit existing RIP results for structured linear operators. In this paper, we consider general vector-valued measurements in a Hilbert space and generalize the notion of incoherence and other properties accordingly. This extension, in particular when combined with a generalized sparsity model, requires the theory of factorization of linear operators in Banach spaces [Pis86a].
1.3. Sparsity with enough symmetries and group structured RIP
We also generalize the theory of the RIP for partial Fourier measurement operators to more general group structured measurement operators, which exploit the inherent structure in the Banach space that determines a sparsity model. The canonical sparsity model is an example of our general sparsity model in Section 1.1, where the convex set has a special structure called enough symmetries. Let be a group and be an affine representation that maps an element in to the orthogonal group in . An affine representation is isotropic if averaging the conjugate actions on any linear operator yields a scalar multiple of the identity. A convex set has enough symmetries if there exists an isotropic affine representation such that for all . Finite dimensional Banach spaces with enough symmetries have been studied extensively (see [TJ89, DF92, Pis86a]). Our original motivation for this problem comes from considering the low-rank tensor product in . In fact, a nice feature of spaces with enough symmetries is their stability under tensor products.
Assuming that has enough symmetries, we consider a linear operator given as the composition map obtained by sampling the (adjoint) orbit of , i.e. for and , where . For certain classes of groups, the group structured measurement operator has a fast implementation. For example, if , then reduces to a partial discrete Fourier transform. If the group actions consist of circular shifts in the canonical basis and in the Fourier basis, then corresponds to a partial quantum Fourier transform, which is a special case of the Gabor transform. We will demonstrate the RIP of these group structured measurement operators when the group elements are randomly selected.
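The isotropy property, namely that averaging the conjugate actions on any linear operator gives a scalar multiple of the identity, can be verified directly for the family generated by circular shifts in the canonical basis and in the Fourier basis. The sketch below is an illustration under stated assumptions: it works with unitary matrices on a complex space rather than the orthogonal group, it averages over the N² products of shifts and modulations (conjugation is insensitive to scalar phases), and the names `shift_modulation_unitaries` and `twirl` are hypothetical.

```python
import numpy as np

def shift_modulation_unitaries(N):
    """All products M^b S^a of cyclic shifts S and modulations M on C^N."""
    S = np.roll(np.eye(N), 1, axis=0)                    # circular shift, canonical basis
    M = np.diag(np.exp(2j * np.pi * np.arange(N) / N))   # circular shift, Fourier basis
    return [np.linalg.matrix_power(M, b) @ np.linalg.matrix_power(S, a)
            for a in range(N) for b in range(N)]

def twirl(T, unitaries):
    """Average of the conjugate actions U T U^* over the given family."""
    return sum(U @ T @ U.conj().T for U in unitaries) / len(unitaries)

rng = np.random.default_rng(2)
N = 5                                                    # illustrative size only
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
avg = twirl(T, shift_modulation_unitaries(N))

# Isotropy: the average equals (trace(T) / N) times the identity.
print(np.allclose(avg, np.trace(T) / N * np.eye(N)))
```

Averaging over the modulations removes the off-diagonal entries, and averaging over the shifts equalizes the diagonal, which is why the result is a multiple of the identity with the normalized trace as the scalar.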
Again, the group structured measurement operator is a natural extension of a partial Fourier operator. Unlike the other extension to subsampled bounded orthogonal systems [Rau10], the group structured measurement operator is tightly connected to a given sparsity model.
1.4. Main results
We illustrate our main results in the general setup with two concrete examples in the two theorems below. Each theorem provides the RIP of a random group structured measurement operator for the corresponding stylized sparsity model. Both theorems assume that and the convex set , which determines the set of -sparse vectors , has enough symmetries with an isotropic affine representation and is the Banach space induced from so that is the unit ball in as before. A set of random measurements is obtained by using random group actions. Specifically, we assume that are independent copies of a Haar-distributed random variable on .
The first theorem demonstrates our main result in the case where is a polytope given as an absolute convex hull of finitely many vectors.
Theorem 1.2 (Polytope).
Suppose that is an -dimensional Banach space whose unit ball is an absolute convex hull of points. Let be defined from as before and satisfy . Then
holds with high probability for .
Theorem 1.2 generalizes the RIP result of a partial Fourier operator (e.g., [RV08]) in the three ways discussed above. The operator norm of in Theorem 1.2 generalizes the notion of incoherence in existing theory. Most interestingly, combined with a clever net argument, Theorem 1.2 enables the RIP of a random group structured measurement operator for low-rank tensors (see Section 6).
The second theorem deals with the sparsity model with respect to a “nice” Banach space whose norm dual has type $p$ [Pis99]. (Details are explained in Section 3.3.) Here, for simplicity, we only demonstrate an example where $p = 2$.
Theorem 1.3 (Dual of type 2).
Suppose that is an -dimensional Banach space such that its norm dual has type 2. Let be defined from the unit ball in as before and satisfy . Then
holds with high probability for , where denotes the type 2 constant of .
Theorem 1.3 covers many known results on the RIP of structured random linear operators and should be considered as an umbrella result for this theory. Importantly, Theorem 1.3 applies to noncommutative cases such as Schatten classes, and the previous result for a partial Pauli operator applied to low-rank matrices [Liu11] is a special example.
In this paper, the symbols and will be reserved for numerical constants, which might vary from line to line. We will use notation for various Banach spaces and norms. The operator norm will be denoted by . We will use the shorthand notation for the -norm for . For , the unit ball in will be denoted by . For a set , let denote the canonical basis for . The index set should be clear from the context. The identity operator will be denoted by . For a set of linear operators, the commutant, denoted by , refers to the set of linear operators that commute with all elements in , i.e.
The rest of this paper is organized as follows. The main theorems are proved in Section 2. We discuss the complexity of the convex set for various sparsity models in Section 3. After a brief review of affine group representations and enough symmetries in Section 4, by collecting the results from the previous sections, we illustrate the implication of the main results for prototype sparsity models in Section 5. Finally, we conclude the paper with the application of the main results for a low-rank tensor model in Section 6.
2. Rudelson-Vershynin method
In this section, we derive a unified framework that identifies a sufficient number of measurements for the RIP of structured random operators in the general setup introduced in Sections 1.1 and 1.2. We will start with the statement of the property in the general setup, followed by the proof.
2.1. RIP in the general setup
Let be a Hilbert space and be a centrally symmetric convex set and be the Banach space with unit ball as before. Let denote the set of -sparse vectors and be the intersection of and the unit sphere in . Let be independent random linear operators from to . For notational simplicity, we let denote the composite map defined by for . Then the measurement operator , defined by , generates a set of vector-valued linear measurements in .
Our results are stated for a class of incoherent random measurement operators. We adopt the arguments by Candès and Plan [CP11a] to describe these measurement operators. In the special case of and , Candès and Plan considered a class of linear operators given by measurement maps satisfying the following two key properties. i) Isotropy: for all , where is the identity matrix of size ; ii) Incoherence: is upper bounded by a numerical constant (deterministically or with high probability). In our setup, the isotropy extends to
But sometimes we will also consider the case where
holds with satisfying
Obviously, the isotropy is a sufficient condition for the relaxed properties in (2) and (3). For the incoherence, we generalize it using a 1-homogeneous function that maps a bounded map from to to a nonnegative number. A natural choice of is the operator norm, which is consistent with the above example of and . The operator norm of in this case reduces to . However, in certain scenarios, there exists a better choice of than the operator norm, one that further reduces the number of measurements sufficient for the RIP. One such example is demonstrated for the windowed Fourier transform in the Part II paper [JL17, Section 2].
In the special case where the isotropy () is satisfied, the deviation inequality in (4) reduces to the conventional RIP. Note that is a nonnegative operator by construction. If is a positive operator, then is a weighted norm of and (4) preserves this weighted norm through with a small perturbation proportional to .
Our main result is a far-reaching generalization of the RIP of a partial Fourier operator by Rudelson and Vershynin [RV08]. We adapt their derivation, which consists of the following two steps. The first step is to show that the expectation of the restricted isometry constant is upper bounded by the $\gamma_2$ functional [Tal96] of the restriction set, and then by an integral of the metric entropy number via Dudley’s theorem [LT13]. Later in this section, we show that the first step extends to the general setup with the upper bound given by
where denotes the dyadic entropy number of [CS90]. The second step is where our theory deviates significantly from the previous work [RV08]. Rudelson and Vershynin used a variation of Maurey’s empirical method [Car85] to get an upper bound on the integral in (5) for being the unit ball in , which in turn provided a near optimal sample complexity up to a logarithmic factor. Liu [Liu11] later extended the result by Rudelson and Vershynin [RV08] to the case of a partial Pauli operator applied to low-rank matrices via the dual entropy argument by Guédon et al. [GMPTJ08].
Our result is a further generalization of these results. In particular, our result provides flexibility that can address the vector-valued measurement case and optimize the sample complexity over the choice of the 1-homogeneous function on . In the general setup, we need to adopt other tools from Banach space theory to get an analogous upper bound. For this purpose, we introduce a property of the convex set , defined as follows: We say that is of entropy type if there exists a constant such that
holds for any composite map , where the exponent function is defined by for and for for technical reasons. In this paper, will denote the smallest constant that satisfies (6). Note that generalizes the notion of incoherence and represents the complexity of a given sparsity model, which is discussed in more detail in Section 3.
Our main theorem below identifies a sufficient number of measurements for the RIP of random linear operators in the general setup.
The moment terms in (7) and (8) are essentially probabilistic or deterministic upper bounds on and , respectively. [Footnote 2: Indeed, moment bounds imply a tail bound by the Markov inequality, and the converse can be shown by direct calculation with Stirling’s approximation of the gamma function (e.g., see [FR13, Chapter 7]).] These two terms extend the notion of incoherence of measurement functionals with respect to the given sparsity model. On the other hand, describes the complexity of the sparsity model. In many well-known examples, reduces to a logarithmic factor. However, if the convex set that determines the sparsity model has a bad geometry, there will be a penalty given by a larger . These incoherence and complexity parameters are controlled by the choice of the parameter and the 1-homogeneous function .
2.2. Proof of Theorem 2.1
Next we prove Theorem 2.1. In the course of the proof, we show that the Rudelson-Vershynin argument [RV08], which derives a near optimal RIP of partial Fourier operators, generalizes to a flexible method. Let us start by recalling the relevant notation. Let be a linear map and be a subset of . The dyadic entropy number [CS90] is defined by
For , we use the shorthand notation . The following equivalence between metric and dyadic entropy numbers is well known (see e.g., [Pis99]).
Lemma 2.3 ([Pis99]).
Note that since is a nonincreasing sequence, coincides with the norm of in the Lorentz sequence space [BL76]. Therefore, we will use the shorthand notation to denote .
Let , , , and be defined as before. Let . Let be linear maps from to and denote the composite map. Let be independent copies of . Then for all
We first compute an upper bound on the first summand in the right-hand-side of (9). Let
Then we note that for we have
Let denote the maximal family of elements in such that . Then it follows that . This implies that
Using a change of variables, this implies
Next, we compute an upper bound on the second summand in the right-hand-side of (9). By Khintchine’s inequality for all
Combining these estimates yields the assertion. ∎
Suppose that the hypotheses of Lemma 2.4 hold. Then
By the polarization identity, we have
where . Then we apply the argument for . Note that
Then by the Cauchy-Schwarz inequality for we have
where is defined in (10). Thus, the assertion follows by replacing by . ∎
Let denote the left-hand side of the inequality in (12). Let be independent copies of . By the standard symmetrization, we have
By conditioning on , we deduce from Lemma 2.4 that
Let be the factor before , then we have . Since was arbitrary, a consequence of the Markov inequality [Dir15, Lemma A.1] implies that there exists a numerical constant such that
holds with probability . The condition in (11) implies . ∎
3. Complexity of sparsity models
Our generalized sparsity model is given by scaled versions of a convex set . A sufficient number of measurements for the RIP is determined by the geometry of the resulting Banach space . In this section, we discuss the complexity of given in terms of for various sparsity models.
3.1. Relaxed canonical sparsity
In many applications sparsity is implicitly controlled by the $\ell_1$ norm. A relaxed canonical sparsity model, which includes exactly sparse vectors and their approximations with small perturbations, is defined by . Then the corresponding Banach space is . We derive an upper bound on by using a well-known application of Maurey’s empirical method, which is given in the following lemma.
(Maurey’s empirical method) Let , be a Hilbert space, and . Then
In particular, for ,
Choosing yields the first assertion. Next we prove the second assertion. For , the first assertion implies
For , by the standard volume argument ([Pis86b, Lemma 1.7]), we have . Therefore,
This completes the proof. ∎
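The probabilistic idea behind Maurey’s empirical method can be illustrated numerically: a point in the convex hull of unit-norm atoms in a Hilbert space is approximated by the average of k atoms drawn independently according to its convex weights, and the expected squared error is at most 1/k. The sketch below illustrates only this sampling step, not the entropy number bound of the lemma; the function name `maurey_approximation` and all sizes are assumptions made for the example.

```python
import numpy as np

def maurey_approximation(atoms, weights, k, rng):
    """Average of k atoms drawn i.i.d. with probabilities `weights` (rows are atoms)."""
    idx = rng.choice(len(weights), size=k, p=weights)
    return atoms[idx].mean(axis=0)

rng = np.random.default_rng(3)
n, M, k, trials = 20, 100, 50, 500                       # illustrative sizes only

atoms = rng.standard_normal((M, n))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)    # unit-norm atoms
weights = rng.random(M)
weights /= weights.sum()                                  # convex combination weights
x = weights @ atoms                                       # a point in the convex hull

sq_errors = np.array([
    np.linalg.norm(x - maurey_approximation(atoms, weights, k, rng)) ** 2
    for _ in range(trials)
])
print(f"mean squared error = {sq_errors.mean():.4f}, bound 1/k = {1 / k:.4f}")
```

Since each empirical average uses only k atoms, the number of distinct averages is at most the number of atoms raised to the power k, which is the counting step that turns this approximation into a covering estimate.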
Let . Then
3.2. Relaxed atomic sparsity with finite dictionary
We say that a vector is atomic -sparse if is represented as a finite linear combination of a given set of atoms, which is called a dictionary. Here we consider a special case where the dictionary is a finite set . A relaxed atomic sparsity model is defined by the convex hull of the dictionary, i.e. . As mentioned in the introduction, it is important to observe that if the points are in the unit sphere , then the “sparse” set is no longer contained in a convex hull of a set of few vectors from the unit sphere. Then the complexity of is upper bounded by the following corollary.
Let and . Then