# On the Restricted Isometry Property of Centered Self Khatri-Rao Products

In this work we establish the Restricted Isometry Property (RIP) of centered column-wise self Khatri-Rao (KR) products of an n×N matrix with iid columns drawn either uniformly from a sphere or with iid sub-Gaussian entries. The self KR product is an n²×N matrix which contains as columns the vectorized (self) outer products of the columns of the original n×N matrix. Based on a result of Adamczak et al. we show that such a centered self KR product with independent heavy-tailed columns has small RIP constants of order s with probability at least 1−Cexp(−cn), provided that s ≲ n²/log²(eN/n²). Our result is applicable to various works on covariance matching, such as activity detection and MIMO gain estimation.


## I Introduction

In estimation and recovery problems related to empirical second moments, e.g. covariance matching, one often observes a matrix which is a noisy linear combination of outer products aᵢaᵢ⊤ of random but known vectors aᵢ ∈ ℝⁿ. The goal is then to estimate the unknown coefficients of this combination from the observed matrix. In the prototypical example this is an empirical covariance matrix. This problem appears in matrix and tensor recovery problems and many recent applications, see e.g. [Romera:2016, Dasarathy:2015, Ma:2010, Duan:2016, Haghighatshoar:MVV:arxiv:18] and massive MIMO [Fengler:MassiveAccess:arxiv:2019]. In the case of sparse linear combinations this yields a compressed sensing [Candes2005, Donoho2006a] problem with a random but structured measurement matrix.

There are several known properties which ensure robust and stable recovery guarantees for the vector of unknown but sparse (or compressible) coefficients. Among them is the restricted isometry property (RIP) of order s, which ensures that a measurement matrix A maps s-sparse vectors almost-isometrically, i.e., there exists δs ∈ [0,1) such that

 (1−δs)∥x∥22≤∥Ax∥22≤(1+δs)∥x∥22 (1)

holds for all s-sparse vectors x. Several upper bounds on δs have been established to ensure stable and robust recovery for certain algorithms, see e.g. [candes:rip2008, Foucart2013]. For example, it is known that ℓ1-based convex recovery algorithms succeed if δ2s < 1/√2 [Cai2014]. For a random m×N matrix with iid sub-Gaussian components it is known that this property holds with overwhelming probability for m ≳ s·log(eN/s), see for example [Foucart2013] and also [dirksen:ripgap] for further discussion of its relation to e.g. the nullspace property. When imposing additional structure on a random measurement matrix, often more measurements are required to ensure robust recovery guarantees. However, for many random ensembles it has been shown that it is still possible to achieve, up to log-factors, a linear relation between the sparsity s and the number of measurements m.

In this work we show that the structure imposed by the problem above indeed also allows robust and stable recovery in the regime s ≲ n²/log²(eN/n²), which is linear up to log-factors since the number of measurements is m = n². This addresses a conjecture raised in [Khanna:correction] for the non-centered KR product. Some of the essential proof steps have been sketched already in [Hag:isit18].

More precisely, let A be an n×N random matrix with independent columns aᵢ, i = 1,…,N. The (column-wise) self Khatri-Rao product of the matrix is defined column-wise as

 (A⊙A)i:=vec(ai⊗ai)=vec(aiaTi) (2)

where the n×n matrix aᵢ⊗aᵢ = aᵢaᵢ⊤ is the outer product of the vector aᵢ with itself and vec is the vectorization operation identifying an n×n matrix with a vector in ℝⁿ².

We assume the columns to be normalized in expectation, such that E∥aᵢ∥₂² = n, and to be drawn from an isotropic distribution, i.e.

 E{ai⊗ai}=In. (3)
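As a concrete illustration, the self KR product of (2) can be formed column-wise; the sketch below uses our own helper name `centered_self_kr`, applies the centering by Iₙ introduced later but omits the normalization factor, and draws columns uniformly from the sphere of radius √n, which is isotropic in the sense of (3):

```python
import numpy as np

def centered_self_kr(A):
    """Column-wise centered self Khatri-Rao product:
    column i is vec(a_i a_i^T - I_n); the normalization factor is omitted."""
    n, _ = A.shape
    return np.stack(
        [(np.outer(a, a) - np.eye(n)).reshape(-1) for a in A.T], axis=1)

rng = np.random.default_rng(1)
n, N = 8, 5
A = rng.standard_normal((n, N))
A *= np.sqrt(n) / np.linalg.norm(A, axis=0)   # columns uniform on the sphere of radius sqrt(n)
K = centered_self_kr(A)
print(K.shape)                                # (64, 5): an n^2 x N matrix
```

Note that each column of `K`, viewed as an n×n matrix, has trace ∥aᵢ∥₂² − n = 0 here, since the columns are exactly normalized.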

First results on the RIP property for self KR products have been established in [Khanna:KRRIP:arxivv3, Khanna:KRRIP], [Khanna:correction] (note that the results have been corrected in v3 of the preprint). In these works it has been shown that for s ≲ n the n²-dimensional KR product of a centered iid sub-Gaussian matrix has RIP with high probability, see [Khanna:KRRIP:arxivv3, Theorem 3]. Thus, the number of measurements n² scales quadratically in the sparsity s. However, we will show below that the scaling is indeed linear, up to log-factors, when centering the KR product.

We will use the work of [Ada2011] to prove a bound on the RIP constant of the centered and normalized KR product A with columns

 Ai:=√κ(n)⋅vec(ai⊗ai−In) (4)

It is easy to see that E{Ai} = 0. The factor κ(n) is a normalization which ensures that the columns of A are still normalized after centering: E∥Ai∥₂² = n². In general

 κ(n)=n2/E{∥vec(ai⊗ai−In)∥22}. (5)

Note that for a vector a ∈ ℝⁿ one has

 ∥vec(a⊗a−In)∥22=∑ni,j=1(aiaj−δij)2=∑i≠ja2ia2j+∑ni=1(a2i−1)2=(∑ni=1a2i)2−2∑ni=1a2i+n (6)
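Identity (6) is easy to check numerically for a random vector (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
a = rng.standard_normal(n)
lhs = np.sum((np.outer(a, a) - np.eye(n)) ** 2)  # ||vec(a a^T - I_n)||_2^2
S = np.sum(a ** 2)
rhs = S ** 2 - 2 * S + n                         # right-hand side of (6)
print(abs(lhs - rhs))                            # ~0 up to floating point error
```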
###### Example 1 (iid entries with normalized second moment).

Let a ∈ ℝⁿ be a random vector with components being independent copies of a random variable a with Ea² = 1. Then

 E{∥vec(a⊗a−In)∥22}=n(n−2+E{a4}) (7)
###### Example 2 (Constant Amplitude).

Let a ∈ ℝⁿ be a random vector such that its components are independent copies of a Rademacher random variable, i.e., with fixed amplitude 1 and uniformly distributed sign. From the previous example it follows that

 κ(n)=n2/(n(n−1))=n/(n−1). (8)
###### Example 3 (Spherical Distribution).

Let a be drawn uniformly from the sphere with radius √n. Then ∥a∥₂² = n and it can easily be checked that (6) gives:

 κ(n)=n/(n−1). (9)
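The constant-amplitude case of Example 2 can be verified directly: for a Rademacher vector, S = ∥a∥₂² = n deterministically, so no sampling is needed (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
a = rng.choice([-1.0, 1.0], size=n)             # Rademacher entries, |a_i| = 1
sq = np.sum((np.outer(a, a) - np.eye(n)) ** 2)  # equals n^2 - n exactly here
kappa = n ** 2 / sq                             # normalization of (5)
print(kappa, n / (n - 1))                       # both equal n/(n-1), cf. (8)
```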

## II RIP for Centered KR Products

The ψα-norm, for α ≥ 1, of a real-valued random variable Y can formally be defined as follows (note that these definitions are not unique in the literature, so the norms may differ in constants):

 ∥Y∥ψα=inf{K>0:Eexp(|Y|α/Kα)≤2} (10)

Note that ∥Y∥ψ1 ≲ ∥Y∥ψ2 for every random variable Y. If ∥Y∥ψα < ∞ is satisfied, the random variable is called sub-Gaussian for α = 2 and sub-exponential for α = 1. The definitions above extend in a canonical way to random vectors. The ψα-norm of a random vector is defined as the best uniform bound on the ψα-norm of its marginals:

 ∥X∥ψα:=sup∥x∥2=1∥⟨X,x⟩∥ψα. (11)

Random variables and vectors with α < 2 are often called heavy tailed. Note that this terminology is also relevant if the ψα-norm of a random vector grows with its dimension.
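The infimum in (10) can also be estimated empirically from samples; the sketch below (with our own helper name `psi_norm`) bisects over K and illustrates that a Gaussian has finite ψ2-norm while its centered square is sub-exponential:

```python
import numpy as np

def psi_norm(samples, alpha):
    """Empirical psi_alpha norm as in (10): smallest K with
    mean(exp(|Y|^alpha / K^alpha)) <= 2, found by bisection over K."""
    y = np.abs(np.asarray(samples, dtype=float))
    mgf = lambda K: np.mean(np.exp((y / K) ** alpha))
    lo, hi = 1e-6, 1.0
    while mgf(hi) > 2.0:        # grow hi until the constraint holds
        hi *= 2.0
    for _ in range(60):         # bisect down to the infimum
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if mgf(mid) <= 2.0 else (mid, hi)
    return hi

rng = np.random.default_rng(4)
g = rng.standard_normal(100_000)
print(psi_norm(g, 2))           # finite: g is sub-Gaussian
print(psi_norm(g**2 - 1, 1))    # finite: its centered square is sub-exponential
```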

### II-A RIP for Independent Heavy-tailed Columns

As introduced above, the KR product of a random matrix with independent sub-Gaussian isotropic columns is itself a matrix with heavy-tailed (sub-exponential) independent columns having a special structure. The RIP properties for the column-independent model with normalized sub-Gaussian isotropic columns have been established in [vershynin_2018]. In a series of works [Ada2011, Guedon2014:heavy:columns, Guedon2017] the heavy-tailed column-independent model has been further investigated and concrete results can be found for various ensembles. However, the previously investigated ensembles do not explicitly cover the structure imposed by KR products. Thus, we make use of the following generic RIP result from [Ada2011, Theorem 3.3] for matrices with iid sub-exponential columns:

###### Theorem 1 (Theorem 3.3 in [Ada2011]).

Let m and N be integers such that 1 ≤ s ≤ min(m, N). Let X₁,…,X_N ∈ ℝᵐ be independent random vectors normalized such that E∥Xᵢ∥₂² = m and let ξ := maxᵢ∥Xᵢ∥ψ1. Let θ′ ∈ (0,1) and K, K′ ≥ 1. Then for the matrix A with columns Xᵢ,

 δs(A√m)≤Cξ2√smlog⎛⎜ ⎜⎝eNs√sm⎞⎟ ⎟⎠+θ′ (12)

holds with probability larger than

 1 −exp⎛⎜ ⎜⎝−cK√slog⎛⎜ ⎜⎝eNs√sm⎞⎟ ⎟⎠⎞⎟ ⎟⎠ (13) −P(maxi≤N∥Xi∥2≥K′√m)−P(maxi≤N∣∣ ∣∣∥Xi∥22m−1∣∣ ∣∣≥θ′), (14)

where C and c_K are positive constants, c_K depending only on K.

We shall use this theorem for m = n² and Xᵢ = Aᵢ. The key to get a good bound from Theorem 1 is to

1. Show that the marginals of the columns of A have sub-exponential tails with a ψ1-norm which is independent of the dimension n.

2. Show that the norms of the columns of A concentrate well around their mean.

If the columns of A are exactly normalized, then the second point is trivially fulfilled: the latter two terms of (14) vanish and we can choose θ′ arbitrarily small. We can use the following corollary for matrices with constant column norm:

###### Corollary 1.

Let all parameters be as in Theorem 1 with the additional requirement that ∥Xᵢ∥₂² = m almost surely. Additionally we assume that m ≤ N. Then the RIP constant of order s of A/√m satisfies

 δs(A√m)<δ (15)

with probability at least 1−exp(−c′√m) as long as

 s≤cξ,δmlog2(eNcξ,δm). (16)

Here cξ,δ and c′ are constants depending only on ξ and δ.

###### Proof.

Since ∥Xᵢ∥₂² = m almost surely, the last two terms in (14) vanish for every K′ > 1 and arbitrarily small θ′ > 0. Thus

 δs≤Cξ2√smlog(eNs√s/m)=:D (17)

with probability larger than

 P(δs≤D)≥1 −exp(−cK√slog(eNs√s/m)) (18)

Set s = cm/log²(eN/(cm)) for some c ≤ 1 to be chosen later. Note that the conditions c ≤ 1 and m ≤ N guarantee that s ≤ m. Plugging into (17) we see that the RIP constant satisfies

 δs ≤Cξ2√clog(e(Ncm)3/2log3(eNcm))log(eNcm) (19) =Cξ2√c(32+3loglogeNcmlogeNcm) (20) ≤Cξ2√c(32+3e) (21) ≤3Cξ2√c (22)

where in the first line we simply substituted s, and in the last line we used (loglog x)/log x ≤ 1/e. This bound fails with probability:

 P(δs>D) ≤exp(−^cK√slog(eN√ms3/2)) (23) ≤exp(−^cK√slog(eNm)) (24) ≤exp(−^cK√c√m) (25)

where in the second line it was used that s ≤ m. The statement of the corollary follows by choosing c small enough such that 3Cξ²√c ≤ δ, and setting c′ accordingly. ∎

### II-B The Case of Sub-Gaussian iid Columns ai

We will show here that Corollary 1 holds almost unchanged if Xᵢ = Aᵢ, where Aᵢ are the columns of the centered self KR product of a matrix with sub-Gaussian iid entries as defined in (4) and Example 1. First we need to show that the columns are sub-exponential with a ψ1-norm independent of n. This is a consequence of the Hanson-Wright inequality, which states that every centered quadratic form of independent sub-Gaussian random variables is sub-exponential:

###### Theorem 2 (Hanson-Wright inequality).

Let X = (X₁,…,Xₙ) ∈ ℝⁿ be a random vector with independent components which satisfy EXᵢ = 0 and ∥Xᵢ∥ψ2 ≤ B. Let Y be an n×n matrix. Then, for every t ≥ 0,

 P{|X⊤YX−EX⊤YX|>t}≤2exp(−cmin(t2B4∥Y∥2F,tB2∥Y∥)) (26)
###### Proof.

See [Rud2013]. ∎

With ∥Y∥ and ∥Y∥F we denote here the operator norm and the Frobenius norm of the matrix Y. Note that a RV with such a mixed tail behavior is in particular sub-exponential. This can be seen by bounding its moments. Let Z be a RV with

 P(|Z|>t)≤2exp(−cmin(t2/(B4∥Y∥2F),t/(B2∥Y∥))) (27)

Since ∥Y∥ ≤ ∥Y∥F, we have P(|Z|>t) ≤ 2exp(−cmin(x²,x)) for x = t/(B²∥Y∥F). It follows, absorbing the constant c,

 E|Z|p=∫∞0P(|Z|p>u)du=p∫∞0P(|Z|>t)tp−1dt≲p(B2∥Y∥F)p(∫10e−x2xp−1dx+∫∞1e−xxp−1dx)≤2p(B2∥Y∥F)p(Γ(p/2)+Γ(p))≤4p(B2∥Y∥F)pΓ(p)≤4p(pB2∥Y∥F)p (28)

where Γ is the Gamma function. So

 (E|Z|p)1p≤cpB2∥Y∥F (29)

which is equivalent to ∥Z∥ψ1 ≤ cB²∥Y∥F by elementary properties of sub-exponential random variables.

###### Theorem 3.

Let the n×N matrix with columns aᵢ have sub-Gaussian iid entries distributed as a random variable a, satisfying Ea = 0 and ∥a∥ψ2 ≤ B and normalized such that Ea² = 1. Let Aᵢ be the i'th column of the corresponding centered self KR product as defined in (4) and Example 1. Then

 ∥Ai∥ψ1=sup∥y∥2=1∥⟨Ai,y⟩∥ψ1≤cB2 (30)

for some absolute constant c.

###### Proof.

Note that we can rewrite

 ⟨Ai,y⟩ =∑j,k√κ(n)(AijAikYjk−E{AijAik}Yjk) (31) =√κ(n)(a⊤iYai−Ea⊤iYai) (32)

where the n×n matrix Y is chosen such that vec(Y) = y, and therefore ∥Y∥F = ∥y∥₂. With this, it follows immediately from Theorem 2 and the moment bound (29) that ⟨Ai,y⟩ is sub-exponential with

 ∥⟨Ai,y⟩∥ψ1≤c√κ(n)B2∥Y∥F (33)

for some absolute constant c. It holds that ∥Y∥F = ∥y∥₂ = 1. We see from Example 1 and (5) that κ(n) = n/(n−2+Ea⁴). By Jensen's inequality Ea⁴ ≥ (Ea²)² = 1, so κ(n) ≤ n/(n−1) ≤ 2 for n ≥ 2. ∎
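The rewriting (31)–(32) at the heart of the proof is easy to verify numerically. In the minimal sketch below the normalization factor is omitted since it rescales both sides equally, and E a⊤Ya = tr(Y) is used, which holds for iid entries with Ea = 0 and Ea² = 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
a = rng.standard_normal(n)
y = rng.standard_normal(n * n)
Y = y.reshape(n, n)                            # matrix with vec(Y) = y (row-major)

Ai = (np.outer(a, a) - np.eye(n)).reshape(-1)  # centered KR column, normalization omitted
lhs = Ai @ y                                   # <A_i, y>
rhs = a @ Y @ a - np.trace(Y)                  # quadratic form minus its expectation
print(abs(lhs - rhs))                          # ~0 up to floating point error
```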

To apply Theorem 1 to A we need to show that the norms of its columns concentrate well around their mean. This is the subject of the following theorem.

###### Theorem 4.

Let Aᵢ be the columns of the centered self KR product of a centered, normalized sub-Gaussian iid matrix as in Theorem 3 and let a denote the distribution of its entries. Then for every t ∈ (0,1] it holds:

 P(maxi≤N∣∥Ai∥22/n2−1∣≥t)≤Cexp(logN−c√tn/B2) (34)

if n satisfies

 n≥1+(Ea4−1)(3/t−1) (35)
###### Proof.

By the union bound we have that

 P(maxi≤N∣∥Ai∥22/n2−1∣≥t)≤NP(∣∥Ai∥22/n2−1∣≥t) (36)

Furthermore, with the abbreviation S := ∥aᵢ∥₂², we have (see Example 1)

 ∥Ai∥22=κ(n)(S2−2S+n) (37)

which can be rewritten as

 ∥Ai∥22=κ(n)((S−n)2+2(n−1)(S−n)+n(n−1)). (38)

A direct computation using κ(n) = n/(n−2+Ea⁴) from Example 1 shows that κ(n)n(n−1)−n² = −κ(n)n(Ea⁴−1), thus

 ∥Ai∥22/n2−1 =κ(n)((S−n)2+2(n−1)(S−n)−n(Ea4−1))/n2 (39) =:a+b+c (40)

with a := κ(n)(S−n)²/n², b := 2κ(n)(n−1)(S−n)/n² and c := −κ(n)(Ea⁴−1)/n. We can estimate the one-sided tail by

 P(a+b+c>t) ≤P(a>t3)+P(b>t3)+P(c>t3) (41)

The sum S−n = ∑ⱼ(aⱼ²−1) is a sum of n independent zero-mean sub-exponential random variables with ∥aⱼ²−1∥ψ1 ≤ cB², as a centering argument and the identity ∥aⱼ²∥ψ1 = ∥aⱼ∥ψ2² for sub-Gaussian random variables show, e.g. [vershynin_2018, Ch. 2.7]. Therefore the elementary Bernstein inequality gives that

 P(|S−n|>nt)≤2exp(−~cnmin(t2B4,tB2)) (42)

Then the same argument as in (29) shows that

 P(|S−n|>nt)≤2exp(−cnt/B2) (43)

for some constant . So in particular

 P(b>t3) ≤2exp(−ctn26B2κ(n)(n−1)) (44) ≤2exp(−c′B2tn) (45)

where in the last step we used that κ(n) ≤ 2 and n−1 ≤ n. The deviation probability of a can be bounded as follows:

 P(a>t3) =P(|S−n|>√n2t3κ(n)) (46) ≤2exp(−cB2√tn23κ(n)) (47) ≤2exp(−c′√tn/B2) (48)

Finally

 P(|c|>t/3)={0 if n≥1+(Ea4−1)(3/t−1), 1 otherwise (49)

For the other tail P(a+b+c<−t), notice that a and −c are non-negative: for a this is obvious, and for −c it follows from Jensen's inequality, since Ea⁴ ≥ (Ea²)² = 1. Under condition (35) we moreover have |c| ≤ t/3, so a+b+c < −t implies −b > 2t/3, and this tail obeys the same bound as (45). ∎

The second probability term in (14) concerns a one-sided deviation of ∥Aᵢ∥₂²/n² and therefore we can also bound it by the same term as in Theorem 4:

 P(maxi≤N∥Ai∥2≥K′n) ≤NP(∥Ai∥22/n2−1>K′2−1) ≤NP(∣∥Ai∥22/n2−1∣>K′2−1) (50)

Now we can state that the result of Corollary 1 holds almost unchanged, up to different constants, for the centered self KR product of an iid sub-Gaussian matrix:

###### Theorem 5.

Let n and N be integers with n² ≤ N and let δ ∈ (0,1). Let the n×N matrix with columns aᵢ have sub-Gaussian iid entries, distributed according to a, with Ea = 0, Ea² = 1 and ∥a∥ψ2 ≤ B. Let A be the centered and rescaled self KR product of this matrix as defined in (4). Then the RIP constant of order s of A/n satisfies

 δs(An)<δ (51)

with probability larger than

 P(δs(A/n)<δ)≥1−Cexp(−cn/B2) (52)

as long as

 s≤cξ,δn2log2(eNcξ,δn2) (53)

and

 n≥max(c1logN,1+c2B2(6/δ−1)). (54)

Here cξ,δ is the constant of Corollary 1 with ξ = cB², and C, c, c₁, c₂ are universal constants.

###### Proof.

Theorem 3 shows that the columns of A are sub-exponential. So the prerequisites of Theorem 1 are fulfilled with m = n², ξ = cB² for some absolute constant c, and Xᵢ = Aᵢ. We set θ′ = δ/3 and K = 1. Furthermore we can set K′ = √2, such that K′²−1 = 1 ≥ θ′. Theorem 4, with t = θ′, and (50) show that there exist constants such that

 P(maxi≤N∥Ai∥2≥K′n)+P(maxi≤N∣∥Ai∥22/n2−1∣≥θ′)≤Cexp(logN−~cn/B2) (55)

if n ≥ 1+(Ea⁴−1)(3/θ′−1) (the latter right hand side is simply a constant, since Ea⁴ is bounded by cB⁴ for sub-Gaussian a). Choosing c₁ in condition (54) large enough, such that log N ≤ c̃n/(2B²), we can guarantee that

 exp(logN−~cn/B2)≤exp(−cn/B2). (56)

with c = c̃/2. So Theorem 1 gives that

 δs(An)≤Cξ2√sn2log⎛⎜ ⎜⎝eNs√sn2⎞⎟ ⎟⎠ (57)

holds with probability larger than

 1 −Cexp⎛⎜ ⎜⎝−c√slog⎛⎜ ⎜⎝eNs√sn2⎞⎟ ⎟⎠⎞⎟ ⎟⎠−~Cexp(−~cn/B2) (58)

Then the same calculation as in the proof of Corollary 1 shows that there is a constant cξ,δ such that choosing s according to (53) leads to the result of this theorem. ∎

### II-C Spherical Columns ai

Let the n×N matrix be such that its columns aᵢ are drawn iid from the sphere with radius √n, see Example 3. Since the columns are now exactly normalized, we can apply Corollary 1 if we can show that the columns of the centered self KR product have sub-exponential marginals with a ψ1-norm independent of the dimension. For this we can use the following result from [Ada2015], which states that a random vector which satisfies the convex concentration property also satisfies the Hanson-Wright inequality:

###### Theorem 6 (Theorem 2.5 in [Ada2015]).

Let X be a mean zero random vector in ℝⁿ which satisfies the convex concentration property with constant B. Then for any n×n matrix Y and every t > 0,

 P{|X⊤YX −EX⊤YX|>t} ≤2exp(−cmin(t22B4∥Y∥2F,tB2∥Y∥)) (59)

The convex concentration property is defined as follows

###### Definition 1 (Convex Concentration Property).

Let X be a random vector in ℝⁿ. X has the convex concentration property with constant K if for every 1-Lipschitz convex function ϕ: ℝⁿ → ℝ, we have E|ϕ(X)| < ∞ and for every t > 0,

 P{|ϕ(X)−Eϕ(X)|≥t}≤2exp(−t2/K2) (60)

A classical result states that a random vector drawn uniformly from the sphere has the even stronger concentration property for all (not necessarily convex) Lipschitz functions (e.g. [vershynin_2018, Theorem 5.1.4]):

###### Theorem 7 (Concentration on the Sphere).

Let X be uniformly distributed on the Euclidean sphere of radius √n in ℝⁿ. Then there is an absolute constant c such that for every 1-Lipschitz function f and every t > 0,

 P{f(X)−Ef(X)}≤2exp(−ct2) (61)

So in particular X has the convex concentration property with an absolute constant, and it follows by Theorem 6 that it also satisfies the tail bound of the Hanson-Wright inequality. As shown in (29), this implies that the columns of the centered self KR product are sub-exponential with ∥Aᵢ∥ψ1 ≤ c for some absolute constant c. With this we can apply Corollary 1.

###### Remark 1.

In this section we did not specifically use the property that the columns aᵢ are drawn iid from the sphere, but only their convex concentration property. So the results also hold for the larger class of normalized columns with dependent entries which satisfy the convex concentration property. E.g., it is known that X satisfies the convex concentration property if its entries are drawn without replacement from some fixed set of numbers. For more examples see [Ada2015]. Also note that the sub-Gaussian iid case of Section II-B is not covered by Theorem 6, since a vector X with sub-Gaussian iid entries does not, in general, have the convex concentration property with a constant independent of the dimension [Ada2015].

## Acknowledgments

We thank Fabian Jänsch, Radoslaw Adamczak, Saeid Haghighatshoar and Giuseppe Caire for fruitful discussions. PJ has been supported by DFG grant JU 2795/3.