Generalized notions of sparsity and restricted isometry property. Part II: Applications

06/28/2017 ∙ by Marius Junge, et al.

The restricted isometry property (RIP) is a universal tool for data recovery. We explore the implications of the RIP in the framework of generalized sparsity and group measurements introduced in the Part I paper. It turns out that, for a given measurement instrument, the number of measurements for the RIP can be improved by optimizing over families of Banach spaces. Second, we investigate the preservation of the difference of two sparse vectors, which is not trivial in generalized models. Third, we extend the RIP of partial Fourier measurements with random signs, at the optimal scaling of the number of measurements, to far more general group structured measurements. Lastly, we also obtain the RIP in infinite dimensions in the context of Fourier measurements, with sparsity naturally replaced by smoothness assumptions.


1. Introduction

The restricted isometry property (RIP) has been used as a universal tool in data recovery. In a companion Part I paper [JL17], we introduced the generalized notion of sparsity and provided a far-reaching generalization of the RIP theory by Rudelson and Vershynin [RV08] and subsequent improvements [Rau10, Dir15] in a unified framework. In this paper we explore how the RIP results on generalized sparsity models in the Part I paper [JL17] apply to challenging scenarios not covered by existing theory. Specifically, we illustrate our findings with the examples below.

1.1. Optimizing RIP with families of Banach spaces

The first example considers the RIP for the canonical sparsity model, which is determined by counting the number of nonzero elements. Here the measurements are obtained as inner products with functionals given as the translates of the quantum Fourier transform of a fixed measurement instrument.

The special case of has been well studied as partial Fourier measurements (see e.g. [CT06, RV08]). We are interested in a scenario where each spectral measurement is taken with a finitely supported window having a specific decaying pattern. The following theorem, obtained as a consequence of the main results in the part I paper [JL17], shows how the number of measurements for the RIP can be optimized over a choice of Banach spaces.

Theorem 1.1.

Let . Suppose that . Then

holds with high probability for independently chosen random pairs

provided

(1)

By optimizing in (1) over , one obtains the assertion if

where the sparsity parameter is defined by

For the flat vector the choice appears optimal. However, for a short window such that the magnitudes show a polynomial decay of order for some , the optimal choice is given by . Our main tool in this analysis is a flexible framework that derives the RIP for various sparsity models defined by a family of Banach spaces [JL17]. We recall that is -sparse if

where is a Banach space with unit ball . For , we see that -sparse vectors are -sparse and more generally -sparse for , where denotes the unit ball of . We derive Theorem 1.1 with , where is a conjugate pair such that .
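To make the relaxed notion concrete in the simplest case: when the Banach space is the one given by the l1 norm, one natural reading of the elided definition is that a vector is K-sparse whenever its l1 norm is at most sqrt(K) times its l2 norm, and by the Cauchy-Schwarz inequality every canonically s-sparse vector is then s-sparse in this generalized sense. The following minimal Python sketch (illustrative only; the function name and normalization are ours, not the paper's) checks this numerically.

    import numpy as np

    def ratio_sparsity(x):
        # (||x||_1 / ||x||_2)^2: equals the support size for flat vectors and,
        # by Cauchy-Schwarz, never exceeds the support size of any vector.
        x = np.asarray(x, dtype=float)
        return (np.linalg.norm(x, 1) / np.linalg.norm(x, 2)) ** 2

    rng = np.random.default_rng(0)
    n, s = 1024, 16
    x = np.zeros(n)
    x[:s] = rng.standard_normal(s)
    print(ratio_sparsity(x) <= s)        # True: every s-sparse vector qualifies
    print(ratio_sparsity(np.ones(n)))    # 1024: the flat vector is maximally spread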

1.2. Preserving distance of sparse vectors in generalized models

The conventional notion of sparsity is given by the geometry of a union of subspaces and provides a special feature that the sparsity level is sub-additive. Unfortunately, this property does not hold for our modified sparsity model, which is given by a nonconvex cone. In particular, compared to the conventional sparsity model, a central drawback in the generalization is that the difference of two -sparse vectors and is no longer -sparse. In fact, an adversarial instance of can attain the maximum (trivial) sparsity level. Therefore the RIP does not necessarily imply that the distance of sparse vectors is preserved, and one may not distinguish two generalized sparse vectors from their images in low dimension. Instead, we provide a weaker substitute for the generalized model that allows one to preserve the distance in a certain sense.
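The failure of sub-additivity is easy to witness numerically in the l1-ratio model sketched in Section 1.1: a tiny perturbation of a 1-sparse vector is still essentially 1-sparse, yet the difference, being a rescaled flat vector, attains the trivial sparsity level n, since the ratio is scale-invariant. A minimal sketch, with illustrative parameters:

    import numpy as np

    def ratio_sparsity(x):
        return (np.linalg.norm(x, 1) / np.linalg.norm(x, 2)) ** 2

    n, delta = 1024, 1e-3
    flat = np.ones(n) / np.sqrt(n)   # maximally non-sparse direction
    x = np.zeros(n)
    x[0] = 1.0                       # exactly 1-sparse
    y = x + delta * flat             # a tiny perturbation, still ~1-sparse

    print(ratio_sparsity(x))         # 1.0
    print(ratio_sparsity(y))         # slightly above 1
    print(ratio_sparsity(x - y))     # 1024: the small difference is maximally dense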

Here we adopt group action arguments to generate measurements [JL17]. Let be a finite group with an affine isotropic representation (see Section 3 for a precise definition; for intuition, one may work with the transformations given by shifts and modulations as above).

Theorem 1.2.

Let and be as above. Let

be independent copies of a Haar-distributed random variable on

. Let with . Suppose that

Then for all

holds with high probability for all -sparse vectors and such that

Moreover, the following results hold with high probability for all unit-norm and -sparse vectors and . If

then

Otherwise, .

The first part of Theorem 1.2 shows that two -sparse vectors may be distinguished from a small number of their measurements if the difference is sparse up to a certain level (much higher than for small ). On the other hand, the second part of Theorem 1.2 implies that one can distinguish two sparse unit-norm vectors if the distance between their measurements is larger than a certain threshold. Otherwise, the vectors are contained in a neighborhood of radius . These results are weaker than the analogous versions with the subadditivity of the sparsity level, but they can still be useful in some applications such as locality-sensitive hashing.

1.3. Improving group structured measurements with further randomization

In the third illustration, we discuss variations of group structured measurements combined with further randomization. The number of group structured measurements for the RIP, derived in the Part I paper [JL17], may scale worse compared to the optimal subgaussian case given by Gordon’s lemma [Gor88]. In fact, this was the case with the low-rank tensor example. We propose two different ways to improve the RIP results for group structured measurements with more randomness.

The first approach uses the composition of the group structured measurement system followed by a subgaussian matrix, where the second step further compresses the data. An application of Rosenthal’s inequality shows that this composed system achieves the same optimal scaling as a pure subgaussian measurement system. In fact, the group structured measurement system has already reduced the data dimension significantly, so the subsequent stage, given as a small subgaussian matrix, requires much less computation than applying a single large subgaussian matrix.
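A minimal numerical sketch of this two-stage architecture, using a randomly subsampled circulant system as a stand-in for the group structured first stage (circular translates form a group action) and a small Gaussian matrix as the subgaussian second stage; all names and dimensions here are illustrative assumptions, not the paper's construction:

    import numpy as np

    rng = np.random.default_rng(1)
    n, m1, m = 4096, 512, 64    # ambient dim, group-structured sketch, final sketch

    g = rng.standard_normal(n)                    # a generic instrument (placeholder)
    rows = rng.choice(n, size=m1, replace=False)  # m1 random circular translates of g

    def partial_circulant(x):
        # all n translates of g applied at once via FFT-based circular convolution,
        # then subsampled at the random rows
        conv = np.fft.ifft(np.fft.fft(g) * np.fft.fft(x)).real
        return conv[rows] / np.sqrt(m1)

    S = rng.standard_normal((m, m1)) / np.sqrt(m)  # small subgaussian second stage

    x = np.zeros(n)
    x[rng.choice(n, size=16, replace=False)] = rng.standard_normal(16)
    y = S @ partial_circulant(x)   # total cost: one FFT pair plus an m-by-m1 multiply
    print(np.linalg.norm(y) / np.linalg.norm(x))   # close to 1 for a near-isometry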

The second approach, inspired by a recent RIP result by Oymak et al. [ORS15], achieves the optimal scaling by preprocessing the data with multiplication with a random sign pattern before applying the group structured measurement system. This composition system is also interpreted as a single group structured measurement system that employs a larger group for the group actions.
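Continuing the sketch above, the sign-randomized variant simply multiplies the input by a random ±1 diagonal before applying the group structured map (again only a schematic of the preprocessing step, not the paper's exact system):

    eps = rng.choice([-1.0, 1.0], size=n)   # random sign pattern

    def randomized_measurement(x):
        return partial_circulant(eps * x)   # diagonal sign flip, then the group map

    print(np.linalg.norm(randomized_measurement(x)) / np.linalg.norm(x))  # again close to 1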

We compare the RIP results for the modified group structured measurements in the example of low-rank tensors. Here the Banach space defining a sparsity model is given as the -fold tensor product of with the largest tensor norm, which is a natural extension of the bilinear sparsity model given by the Schatten 1-class.

1.4. RIP for infinite dimensional sparsity models

Lastly, we illustrate the RIP for infinite dimensional sparsity models, which are motivated by compressive Fourier imaging. Our goal in this example is to construct a sparsity model in the function space without discretization and establish the RIP theory on this model. In Fourier imaging, the measurements are obtained as

where denotes the complex sinusoid defined by for and is a signal in . When the support of is restricted to , the sequence corresponds to the Fourier series and the map from to is bijective.

We consider measurements in the form of

(2)

where denotes the circular shift operator. We would like to preserve the norm of by measurements from translates of , which are given in (2) for . Ideally, in the noise-free case, shifts can provide unique identification of ; however, the measurement system is then ill-conditioned. As we show in the following theorem, a sparsity model in allows the -norm of the subsequence to be preserved by fewer measurements.

Theorem 1.3.

Let be a seminorm on defined by

and

for and , where denote the Lebesgue measure. Suppose that are independent copies of a uniform random variable on . Then suffices to satisfy

with high probability.

The sparsity model is determined by two parameters and . Sparsity is measured as the relative occupancy in the Lebesgue measure, and smoothness is measured by the other parameter . The seminorm is preserved by fewer translates for smaller . In particular, the number of measurements can be sublinear for .

Our theory improves on existing results in several ways, described below. First, unlike conventional compressed Fourier imaging (e.g., [CRT06]), our measurement model does not involve any discretization and is consistent with the physics of acquisition systems. Second, our setup employs a more flexible sparsity model and considers a realistic scenario where only finitely many measurements are available. Sub-Nyquist sampling of multiband signals [FB96, Fen98, ME09, MEE11] can be considered compressed sensing of analog sparse signals. The multiband sparse signal model in is defined so that the support is restricted to a few active blocks of . This is far more restrictive than our infinite dimensional sparsity model.

1.5. Organization

Thus far we have presented snapshots of our results in simplified form. Detailed results on each topic are presented in the later sections as follows. Section 2 discusses how the main results in the Part I paper [JL17] can be utilized to optimize the number of measurements for the RIP over the choice of Banach spaces and 1-homogeneous functions. This result is illustrated with the example of the partial windowed Fourier transform and its noncommutative version. Section 3 provides a general framework that preserves the distance of sparse vectors without the subadditivity of the sparsity level. Section 4 proposes two ways to further improve the number of group structured measurements for the RIP with more randomness. The obtained results are compared over the low-rank tensor example. Lastly, in Section 5, we illustrate how (semi)norms are preserved by finitely many measurements using various infinite dimensional sparsity models and identify the number of measurements in each scenario.

1.6. Notation

In this paper, the symbols and will be reserved for numerical constants, which may vary from line to line. We will use various notations for Banach spaces and norms. The norm of a Banach space is denoted by . For example, denotes the norm that defines . We will use the shorthand notation for the -norm for . The operator norm will be denoted by . For , the unit ball in will be denoted by . The identity operator will be denoted by .

2. Optimizing Restricted Isometry Property

In this section, we present two examples where one can optimize the number of measurements for the RIP with respect to given conventional sparsity models. Specifically, we consider the canonical sparsity model and the low-rank matrix model. In the literature, relaxation of these models with corresponding Banach spaces ( for the canonical sparsity model and for the low-rank matrix model) provided the RIP from a near optimal number of incoherent measurements. However, in practice, there exist physical constraints on designing the measurement system, and ideally incoherent instruments are not always available. In this situation, we demonstrate that the number of measurements for the RIP can be optimized via our general framework in the Part I paper [JL17].

2.1. RIP of subsampled short-time discrete Fourier transform

The first example provides the RIP of a partial short-time Fourier transform, which can be considered as a non-ideal version of a partial Fourier operator. Let be a window function. The windowed discrete Fourier transform of is given by

(3)

where the time indices are modulo . Let , where is the orthogonal group, be an isotropic affine representation given by

where is the usual modulation defined by

and is the circular shift modulo such that

Here, are the standard basis vectors in . Let and . Then the windowed DFT coefficient in (3) is a group measurement given by

In signal processing, particularly for large , it is common to take a spectral measurement from a finite block of the signal . The resulting measurements then correspond to the short-time discrete Fourier transform (STDFT) of with a given window function . Typically, to avoid leakage due to the discontinuity at the boundary, windows are designed with decaying magnitudes.
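As one concrete convention for (3), a single STDFT measurement is the inner product of the signal with a modulated, circularly shifted copy of the window. The sketch below, with an exponentially decaying window, is an illustrative assumption rather than the paper's exact normalization:

    import numpy as np

    n = 256
    h = np.exp(-np.arange(n) / 8.0)   # a window with decaying magnitudes (assumed shape)
    rng = np.random.default_rng(2)
    x = rng.standard_normal(n)

    def stdft_coeff(x, h, t, f):
        # <modulation(f) shift(t) h, x>: one convention for the windowed DFT coefficient
        shifted = np.roll(h, t)                          # circular shift modulo n
        mod = np.exp(2j * np.pi * f * np.arange(n) / n)  # the usual modulation
        return np.vdot(shifted * mod, x)                 # vdot conjugates its first argument

    t, f = rng.integers(n), rng.integers(n)   # one random time-frequency sample
    print(stdft_coeff(x, h, t, f))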

We consider the RIP of subsampled STDFT measurements on -sparse signals with respect to the canonical sparsity model, i.e. on the set , where counts the number of nonzero elements. In our generalized notion of sparsity, there exist convex sets such that every -sparse vector in is -sparse, i.e. , where is the Banach space with unit ball . For example, one can choose with but this is not the only choice. By using our general framework from the Part I paper [JL17], it is possible to optimize the number of measurements for the RIP on over the choice of the convex set and the parameter as shown in the following theorem.

Theorem 2.1 (Partial STDFT with a decaying window).

Let be independent copies of a uniform random variable on . For , let denote the rearrangement of in non-increasing order. Suppose that

for and , where is a constant such that . Then there exists a numerical constant such that

(4)

provided

Remark 2.2.

Alternatively, applying [JL17, Theorem 5.1] with provides the same RIP as in Theorem 2.1 for

(5)

Note that the optimized number of measurements for the RIP in Theorem 2.1 is smaller than that in (5) by a factor of .

Proof of Theorem 2.1.

Let and . Then it follows that since , where , is of type 2 and the type 2 constant is upper bounded by [Car85, Lemma 3]. Let . Then each -sparse satisfies . Therefore, by [JL17, Theorem 5.3], the assertion in (4) holds if

(6)

It remains to compute . First of all, the normalization constant satisfies

Next we compute in the following three cases for .

Case 1:

Case 2:

(7)

Case 3:

Note that the second case has the smallest upper bound. Applying (7) to (6) completes the proof. ∎

2.2. Deterministic instrument for Schatten classes

Next, we show an analogous result in the noncommutative case. Here, we consider the RIP of the group structured measurements restricted to the set of rank- matrices, where the affine representation corresponds to the double quantum Fourier transform, i.e.

Let be a Haar-distributed random variable on . Then

holds for all linear operators on . Hence the affine representation is isotropic.

We first recall the RIP result for an arbitrary instrument in this case.

Proposition 2.3.

Let and be a fixed vector such that . Then

holds provided

Proof.

The Schatten class has type with constant [TJ89]. Therefore the main technical result in [JL17, Theorem 5.3] applies here with . ∎

Moreover, for a rank- matrix we always have

and hence we may apply the previous result for and get the following corollary.

Corollary 2.4.

Let be a fixed vector such that . Then

holds provided

where

The parameter in Corollary 2.4 denotes the optimized sparsity level, which is determined by the rank and the instrument . Next we demonstrate the number of group structured measurements for the RIP given by Corollary 2.4 for particular choices of the instrument . The first example considers an ideal instrument that generates incoherent measurements.

Example 2.5.

Let , where is the -by- identity matrix. Then . For this particular choice of , we also have

Therefore, the number of distinct elements in the orbit of is instead of . In other words, we sample from possible measurements. On the other hand, since , choosing gives . The factor accounts for the geometry of the Schatten class and will disappear in the commutative case.

Similar to the partial STDFT example in the commutative case, the second example considers a non-ideal instrument with fast decaying singular values.

Example 2.6.

Let satisfy and

for , where denotes the th singular value of in the non-increasing order. Similar to the proof of Theorem 2.1, we show

We see that the function is decreasing for and the function is increasing for . Thus is the best choice, and we deduce that . This upper bound on the number of measurements for the RIP is smaller than that for the choice . Unlike the previous example, the instrument produces less incoherent measurements, which is compensated by a penalty in the number of measurements for small .

3. RIP on difference of sparse vectors

In the conventional sparsity models given by a union of subspaces, the sparsity level, measured by either the number of nonzeros or the rank, is sub-additive. An important implication is that the difference of two sparse vectors is still sparse up to the sum of the sparsity levels of each sparse vector. However, this is not the case for generalized sparsity models given by a nonconvex cone. Therefore, the RIP does not automatically preserve the distance between sparse vectors in the general setup. Nevertheless, with some careful arguments, one can show that the distance is still preserved in some weaker sense. In this section, we will discuss this problem using the notion of the multiresolution restricted isometry property (MRIP).

The MRIP was originally proposed for the canonical sparsity model by Oymak et al. [ORS15]. We generalize it to general sparsity models with a slight modification. Let be a Hilbert space and be a Banach space with unit ball , where denotes the unit ball in . Note that if has a non-empty interior, there exists a number

(8)

such that holds for all . For example, if and , then . Given the definition of , we state the definition of the MRIP as follows.

Definition 3.1 (Multiresolution restricted isometry property).

Let be a Hilbert space, be a convex set, be a Banach space with unit ball , and be the constant defined in (8). We say that satisfies the MRIP with distortion at sparsity level if

holds for all .

The following lemma shows that the MRIP can preserve the distance of two sparse vectors when the sparsity level of the difference is below a certain threshold.

Lemma 3.2.

Let , , , and be defined as above, , and . Suppose that satisfies the MRIP with distortion at sparsity level . Then for all

Moreover for any

(9)

provided that are -sparse and

Remark 3.3.

Lemma 3.2 preserves the distance of two -sparse vectors by (9) if the sparsity level of is below the threshold , which is higher than the sparsity levels of and for small . The estimate in (9) implies that the distortion is strictly less than , which implies a local injectivity.

Since the Hilbert space norm is preserved by the RIP up to a small distortion , we can always compare two sparse vectors after normalization. Suppose that . Then (9) also implies that the distortion is no bigger than . Although this distortion bound is more conservative than , which is available if the sparsity level is subadditive, it can still be useful for certain applications. For example, similar deviation bounds have been used in the analysis of iterative optimization algorithms for matrix completion (see [CW15, Lemma 5] and [ZL16, Lemma 8]). We expect that this weak preservation of the difference of two sparse vectors can be useful in generalizing existing theory to a wider class of sparsity models.

Proof of Lemma 3.2.

Let denote the difference between and . We may find such that and

Then is -sparse, where , and hence

where . Let us first assume that . Then we get

(10)

In case we get

(11)

This proves the first assertion. Next, for the second assertion, we assume that are -sparse. Then we have

(12)

If additionally satisfies

then

which implies . Therefore, we apply the estimate in (10) with the upper bound in (12). ∎

The following corollary of Lemma 3.2 shows that two unit-norm and -sparse vectors can be distinguished if they are well separated in the measurement domain; otherwise, and are close in the original space. This result is weaker than the preservation of the distance but applies to a wider class of models, and it can be used in some applications such as locality-sensitive hashing.

Corollary 3.4.

Assume the hypothesis of Lemma 3.2. Let be unit-norm and -sparse vectors. If

(13)

then

(14)

Otherwise,

Proof.

Let . Then by Lemma 3.2, we have

(15)

We proceed in the following two complementary cases.

Case 1: .
It follows that and the maximum in (15) is attained in the first term. Thus by (13) and (15), we have

which implies . Then by (13) and (15) we have

which implies (14).

Case 2: .
By (15) and the upper estimate for we deduce

which implies (14). Thus the first part is proved.

For the second part, we suppose that and hold simultaneously. By (15) and the fact that the first term achieves the maximum in (15),

Then it follows that

which is a contradiction. Therefore, implies . This completes the proof.∎

Remark 3.5.

We did not optimize the constants in Corollary 3.4. More generally, we find a conditional RIP property below. Fix . Let satisfy . If , then

Otherwise, . One can optimize the constants and to tighten the estimate.

Remark 3.6.

Corollary 3.4 implies that if the distance between and for two unit-norm sparse vectors and is larger than , then the distance between and in is equivalent to up to a constant factor. In other words, one can distinguish and from their linear measurements. However, if and are close, satisfying , then Corollary 3.4 only confirms that is less than ; i.e., one cannot distinguish two similar sparse vectors and from their measurements. Note that we did not optimize the constants in Corollary 3.4. Obviously, this result is weaker than the uniform preservation of the distance of any two sparse vectors (regardless of the amount of distance) given by the RIP with respect to an exact sparsity model. However, this weak property is still useful in applications. For example, in clustering sparse vectors, if the centroids of the clusters remain well separated under the dimensionality reduction by , then one can cluster in the compressed domain.
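A toy illustration of this clustering remark (entirely illustrative; a Gaussian matrix stands in for the measurement map, and all dimensions are assumptions): well-separated sparse centroids can be classified from compressed distances alone.

    import numpy as np

    rng = np.random.default_rng(3)
    n, m, s = 2048, 128, 10

    def sparse_unit(rng, n, s):
        v = np.zeros(n)
        v[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
        return v / np.linalg.norm(v)

    c0, c1 = sparse_unit(rng, n, s), sparse_unit(rng, n, s)  # well-separated centroids
    A = rng.standard_normal((m, n)) / np.sqrt(m)             # stand-in near-isometry

    points = [c0 + 0.05 * sparse_unit(rng, n, s) for _ in range(5)] + \
             [c1 + 0.05 * sparse_unit(rng, n, s) for _ in range(5)]
    Ac0, Ac1 = A @ c0, A @ c1

    # assign each point to the nearest centroid in the compressed domain
    labels = [int(np.linalg.norm(A @ p - Ac1) < np.linalg.norm(A @ p - Ac0))
              for p in points]
    print(labels)   # expected: five 0s then five 1s if separation survives compression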

Remark 3.7.

Corollary 3.4 provides a recovery guarantee by a nonconvex programming. Suppose that satisfies and . Let be the solution to

(16)

Since is feasible for the program in (16), we have . Moreover, also satisfies . Therefore, by Corollary 3.4, it follows that . Without the unit-norm constraint, the optimization in (16) becomes a convex program. We will pursue a guarantee for this convex program in a future work.

Remark 3.8.

We may even replace this by

Then we conclude that , and hence we can allow small random errors.

Next we show that the MRIP holds for group structured measurements. To avoid redundancy, we illustrate a single example where the sparsity model is given by a polytope .

Lemma 3.9.

Let and be an absolutely convex hull of points, and have enough symmetry with an isotropic affine representation , be the Banach space with unit ball , be independent copies of a Haar-distributed random variable on , be defined in (8), and satisfy . Then there exists a numerical constant such that satisfies the MRIP with distortion at level with probability provided

Proof.

Let . Note that

Therefore, by [JL17, Theorem 5.1],

holds provided

Since was arbitrary, applying the union bound over all satisfying gives the assertion.∎

4. Group structured measurements with further randomization

In the Part I paper [JL17], we showed that the number of randomly sampled group structured measurements for the RIP scales near optimally for certain sparsity models (e.g., sparsity models with respect to Banach spaces or ). However, in general, one needs a larger number of measurements in the group structured case than for Gaussian measurements. In this section, we propose two different ways to improve the sample complexity for the RIP of randomly sampled group structured measurements. To this end, we first recall the optimal result on the number of Gaussian measurements for the RIP. Gordon’s lemma [Gor88] shows that a minimal number of Gaussian measurements provides dimensionality reduction for an arbitrary set of unit-norm vectors.

Lemma 4.1 (Gordon’s escape through the mesh [Gor88]).

Let , be a subset of the unit sphere , and be independent copies of a standard Gaussian random vector . Then

holds provided
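In its standard formulation (stated here only as a reference sketch; the constant and the exact form in the lemma above may differ), the provision reads

    m \ge C \, \delta^{-2} \, \ell(T)^2, \qquad \ell(T) := \mathbb{E} \sup_{x \in T} \langle g, x \rangle,

where \ell(T) is the Gaussian width of T, and the conclusion is that \sup_{x \in T} \big| m^{-1} \sum_{j=1}^{m} \langle g_j, x \rangle^2 - 1 \big| \le \delta.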
