In estimation and recovery problems related to empirical second moments, e.g. covariance matching, one often observes a matrix that is a noisy linear combination of outer products of random but known vectors. The goal is then to estimate the unknown coefficients from this observed matrix. In the prototypical example, this matrix is an empirical covariance matrix. This problem appears in matrix and tensor recovery and in many recent applications, see e.g. [Romera:2016, Dasarathy:2015, Ma:2010, Duan:2016, Haghighatshoar:MVV:arxiv:18], including massive MIMO [Fengler:MassiveAccess:arxiv:2019]. In the case of sparse linear combinations, this yields a compressed sensing [Candes2005, Donoho2006a] problem with a random but structured measurement matrix.
There are several known properties that ensure robust and stable recovery guarantees for the vector of unknown but sparse (or compressible) coefficients. Among them is the restricted isometry property (RIP) of order $s$. It ensures that a measurement matrix $\Phi \in \mathbb{R}^{m \times N}$ maps $s$-sparse vectors almost isometrically, i.e., there exists $\delta_s \in (0,1)$ such that

$$(1-\delta_s)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1+\delta_s)\|x\|_2^2$$

holds for all $s$-sparse vectors $x \in \mathbb{R}^N$. Several upper bounds on the RIP constant have been established that ensure stable and robust recovery for certain algorithms, see e.g. [candes:rip2008, Foucart2013]. For example, it is known that $\ell_1$-based convex recovery algorithms succeed if $\delta_{2s} < 1/\sqrt{2}$ [Cai2014]. For a random $m \times N$ matrix with iid sub-Gaussian entries, it is known that this property holds with overwhelming probability for $m \gtrsim s\log(eN/s)$, see for example [Foucart2013]. See also [dirksen:ripgap] for a further discussion of its relation to, e.g., the null space property. When additional structure is imposed on a random measurement matrix, often more measurements are required to ensure robust recovery guarantees. However, for many random ensembles it has been shown that it is still possible to achieve, up to log factors, a linear relation between the sparsity $s$ and the number of measurements $m$, meaning that $m \simeq s$.
In this work we show that the structure imposed by the problem above still allows robust and stable recovery in the regime where the sparsity $s$ grows linearly, up to log factors, in the number of measurements $m = n^2$ of the self KR product defined below. This addresses a conjecture raised in [Khanna:correction] for the non-centered KR product. Some of the essential proof steps have already been sketched in [Hag:isit18].
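More precisely, let $A \in \mathbb{R}^{n \times N}$ be a random matrix with independent columns $a_i$, $i = 1, \dots, N$. The (column-wise) self Khatri-Rao product of the matrix $A$ is defined as

$$A \ast A := \left[\operatorname{vec}(a_1 a_1^T), \dots, \operatorname{vec}(a_N a_N^T)\right] \in \mathbb{R}^{n^2 \times N},$$

where the matrix $a_i a_i^T$ is the outer product of the vector $a_i$ with itself, and $\operatorname{vec}: \mathbb{R}^{n \times n} \to \mathbb{R}^{n^2}$ is the vectorization operation identifying a matrix with a vector.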
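The sketch below illustrates the definition in NumPy. The function name `self_khatri_rao` is hypothetical and not taken from the source. It relies on the fact that the $i$-th column $\operatorname{vec}(a_i a_i^T)$ coincides with the Kronecker product $a_i \otimes a_i$.

```python
import numpy as np

def self_khatri_rao(A: np.ndarray) -> np.ndarray:
    """Column-wise self Khatri-Rao product of an n x N matrix A.

    Column i of the (n**2) x N result is vec(a_i a_i^T), where vec stacks
    the columns of a matrix on top of each other.
    """
    n, N = A.shape
    # outer[:, :, i] = a_i a_i^T, computed for all i at once
    outer = np.einsum("ji,ki->jki", A, A)
    # a Fortran-order reshape stacks the columns of each n x n slice (= vec)
    return outer.reshape(n * n, N, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
K = self_khatri_rao(A)
# each column equals the Kronecker product a_i (x) a_i
same_as_kron = all(np.allclose(K[:, i], np.kron(A[:, i], A[:, i])) for i in range(6))
```

Building all outer products in a single `einsum` call avoids a Python loop over the $N$ columns. The $n^2 \times N$ output grows quickly with $n$, however.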
We assume that the columns $a_i$ are normalized in expectation, $\mathbb{E}\|a_i\|_2^2 = n$, and drawn from an isotropic distribution, i.e. $\mathbb{E}[a_i a_i^T] = I_n$.
First results on the RIP for self KR products have been established in [Khanna:KRRIP:arxivv3, Khanna:KRRIP], [Khanna:correction] (note that the results have been corrected in v3 of the preprint). That work shows that the self KR product of a centered iid sub-Gaussian matrix has the RIP with high probability, see [Khanna:KRRIP:arxivv3, Theorem 3]. This holds in a regime where the number of measurements $m = n^2$ scales quadratically in the sparsity $s$. However, we show below that the scaling becomes linear, up to log factors, once the KR product is centered.
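A heuristic explanation of why centering matters can be read off from inner products between columns. This is not part of the proof. For independent isotropic columns, the uncentered KR columns satisfy $\langle \operatorname{vec}(a_i a_i^T), \operatorname{vec}(a_j a_j^T)\rangle = (a_i^T a_j)^2 \ge 0$, with expectation $n$. After normalization by the squared column norm, roughly $n^2$, every pair of columns is therefore positively correlated by about $1/n$. For $s$ such columns, the Gram matrix then has an eigenvalue of at least about $1 + (s-1)/n$, so small RIP constants are only possible for $s \lesssim n$. After centering, the inner product becomes $(a_i^T a_j)^2 - \|a_i\|_2^2 - \|a_j\|_2^2 + n$, whose expectation is $n - n - n + n = 0$.

The sketch below estimates the average normalized inner product numerically. It uses Rademacher ($\pm 1$) entries, a real-valued constant-amplitude choice made here purely for illustration. The function name is hypothetical.

```python
import numpy as np

def mean_normalized_inner_product(n: int, N: int, centered: bool, seed: int = 0) -> float:
    """Average normalized inner product between distinct columns of the
    (centered or uncentered) self KR product of an n x N Rademacher matrix."""
    rng = np.random.default_rng(seed)
    A = rng.choice([-1.0, 1.0], size=(n, N))   # Rademacher entries, |a_ij| = 1
    G = A.T @ A                                # G[i, j] = a_i^T a_j
    d = np.diag(G)                             # ||a_i||^2 (equal to n here)
    if centered:
        # <vec(a_i a_i^T - I), vec(a_j a_j^T - I)> = (a_i^T a_j)^2 - ||a_i||^2 - ||a_j||^2 + n
        inner = G**2 - d[:, None] - d[None, :] + n
        sq_norm = d**2 - 2 * d + n             # ||a_i a_i^T - I||_F^2
    else:
        # <vec(a_i a_i^T), vec(a_j a_j^T)> = (a_i^T a_j)^2
        inner = G**2
        sq_norm = d**2                         # ||a_i a_i^T||_F^2 = ||a_i||^4
    corr = inner / np.sqrt(np.outer(sq_norm, sq_norm))
    off_diagonal = ~np.eye(N, dtype=bool)
    return float(corr[off_diagonal].mean())

uncentered = mean_normalized_inner_product(16, 400, centered=False)  # expectation 1/16
centered = mean_normalized_inner_product(16, 400, centered=True)     # expectation 0
```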
We will use the work of [Ada2011] to prove a bound on the RIP constant of the centered and normalized KR product :
It is easy to see that the centered columns have zero mean. The scalar factor in (4) is a normalization factor that ensures the columns are still normalized after centering. In general,
Note that for a vector one has
Example 1 (iid entries with normalized second moment).
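Let $a \in \mathbb{R}^n$ be a random vector whose components are independent copies of a random variable $x$ with $\mathbb{E}x^2 = 1$. Then $\mathbb{E}\|a\|_2^4 = n\,\mathbb{E}x^4 + n(n-1)$.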
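The fourth moment in Example 1 follows from independence and $\mathbb{E}x^2 = 1$ alone. Combined with the elementary identity $\|aa^T - I_n\|_F^2 = \operatorname{tr}\big((aa^T - I_n)^2\big) = \|a\|_2^4 - 2\|a\|_2^2 + n$, valid for every vector $a \in \mathbb{R}^n$, it gives the expected squared norm of a centered column. This worked computation is added for illustration:

$$\mathbb{E}\|a\|_2^4 = \sum_{j,k=1}^n \mathbb{E}[x_j^2 x_k^2] = \sum_{j=1}^n \mathbb{E}x_j^4 + \sum_{j \ne k} \mathbb{E}x_j^2\,\mathbb{E}x_k^2 = n\,\mathbb{E}x^4 + n(n-1),$$

$$\mathbb{E}\|aa^T - I_n\|_F^2 = \mathbb{E}\|a\|_2^4 - 2\,\mathbb{E}\|a\|_2^2 + n = n^2 + n\left(\mathbb{E}x^4 - 2\right).$$

For instance, standard Gaussian entries ($\mathbb{E}x^4 = 3$) give $n^2 + n$, while Rademacher entries ($\mathbb{E}x^4 = 1$) give $n^2 - n$.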
Example 2 (Constant Amplitude).
Example 3 (Spherical Distribution).
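Let $a$ be drawn uniformly from the sphere in $\mathbb{R}^n$ with radius $\sqrt{n}$. Then $\|a\|_2^2 = n$ almost surely, so $\|aa^T - I_n\|_F^2 = n^2 - n$ holds deterministically, and (6) can easily be evaluated explicitly.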
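The sketch below uses a common way to sample from the sphere of Example 3: normalize a standard Gaussian vector, whose direction is uniform by rotation invariance, and rescale it to radius $\sqrt{n}$. The function name `sample_sphere` is hypothetical. Every sample then satisfies $\|a\|_2^2 = n$ exactly, so every centered column has the deterministic squared norm $\|aa^T - I_n\|_F^2 = n^2 - n$. The empirical covariance should also be close to $I_n$, reflecting isotropy.

```python
import numpy as np

def sample_sphere(n: int, N: int, rng: np.random.Generator) -> np.ndarray:
    """Return an n x N matrix whose columns are iid uniform on the sphere of radius sqrt(n)."""
    G = rng.standard_normal((n, N))
    # a normalized Gaussian vector is uniform on the unit sphere (rotation invariance)
    return np.sqrt(n) * G / np.linalg.norm(G, axis=0)

rng = np.random.default_rng(0)
n = 8
A = sample_sphere(n, 20000, rng)
sq_norms = np.sum(A**2, axis=0)                      # equal to n up to rounding
centered_sq_norms = sq_norms**2 - 2 * sq_norms + n   # ||a a^T - I_n||_F^2, equal to n**2 - n
empirical_cov = A @ A.T / A.shape[1]                 # close to I_n (isotropy)
```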
II RIP for Centered KR Products
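The $\psi_\alpha$-norm, $\alpha > 0$, of a real-valued random variable $X$ can formally be defined as follows. These definitions are not unique in the literature, so the resulting norms may differ by constants.

$$\|X\|_{\psi_\alpha} := \inf\left\{t > 0 : \mathbb{E}\exp\left(|X|^\alpha / t^\alpha\right) \le 2\right\}.$$

If $\|X\|_{\psi_\alpha} < \infty$, the random variable $X$ is called sub-Gaussian for $\alpha = 2$ and sub-exponential for $\alpha = 1$. The definitions above extend in a canonical way to random vectors. The $\psi_\alpha$-norm of a random vector $X \in \mathbb{R}^n$ is defined as the best uniform bound on the $\psi_\alpha$-norm of its one-dimensional marginals:

$$\|X\|_{\psi_\alpha} := \sup_{u \in S^{n-1}} \|\langle X, u\rangle\|_{\psi_\alpha}.$$

Random variables and vectors that only have a finite $\psi_\alpha$-norm for small $\alpha$, such as sub-exponential ones, are often called heavy tailed. Note that this terminology is also relevant when the $\psi_\alpha$-norm of a random vector grows with its dimension.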
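The following sketch evaluates the $\psi_\alpha$-norm of the empirical distribution of a sample by bisection. It assumes the Orlicz-type definition $\|X\|_{\psi_\alpha} = \inf\{t > 0 : \mathbb{E}\exp((|X|/t)^\alpha) \le 2\}$, which may differ by constants from other conventions. A sample estimate is, of course, not the population norm, and the function name is hypothetical. The sketch also makes the identity $\|X^2\|_{\psi_1} = \|X\|_{\psi_2}^2$ visible. Under this definition the identity holds exactly, because both conditions reduce to $\mathbb{E}\exp(X^2/t) \le 2$.

```python
import numpy as np

def empirical_psi_alpha_norm(samples, alpha: float, rel_tol: float = 1e-12) -> float:
    """Orlicz psi_alpha norm of the empirical distribution of `samples`:
    the smallest t > 0 with mean(exp((|x| / t)**alpha)) <= 2, found by bisection."""
    x = np.abs(np.asarray(samples, dtype=float))
    if not np.any(x > 0):
        return 0.0

    def orlicz_mean(t: float) -> float:
        # overflow to inf for tiny t is harmless: inf > 2 only moves the bracket
        with np.errstate(over="ignore"):
            return float(np.mean(np.exp((x / t) ** alpha)))

    lo, hi = 0.0, 1.0
    while orlicz_mean(hi) > 2.0:      # enlarge the bracket until hi is feasible
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:     # orlicz_mean is decreasing in t
        mid = 0.5 * (lo + hi)
        if orlicz_mean(mid) > 2.0:
            lo = mid
        else:
            hi = mid
    return hi

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
psi2 = empirical_psi_alpha_norm(x, 2.0)
psi1_of_square = empirical_psi_alpha_norm(x**2, 1.0)  # agrees with psi2**2
```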
II-A RIP for Independent Heavy-Tailed Columns
As introduced above, the KR product of a random matrix with independent sub-Gaussian isotropic columns is itself a matrix with heavy-tailed (sub-exponential) independent columns that have a special structure. The RIP for the column-independent model with normalized sub-Gaussian isotropic columns has been established in [vershynin_2018]. A series of works [Ada2011, Guedon2014:heavy:columns, Guedon2017] has investigated the heavy-tailed column-independent model further, and concrete results are available for various ensembles. However, these works do not explicitly discuss the structure imposed by KR products. We therefore use the following generic RIP result from [Ada2011, Theorem 3.3] for matrices with independent sub-exponential columns:
Theorem 1 (Theorem 3.3 in [Ada2011]).
Let and be integers such that . Let be independent random vectors normalized such that and let . Let , and set . Then for the matrix with columns ,
holds with probability larger than
where are universal constants.
We shall apply this theorem in dimension $n^2$ to the columns of the centered self KR product. The key to obtaining a good bound from Theorem 1 is to:
Show that the marginals of the columns have sub-exponential tails, with a sub-exponential norm that is independent of the dimension $n$.
Show that the norms of the columns concentrate well around their mean.
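The second requirement can be probed numerically using $\|aa^T - I_n\|_F^2 = \|a\|_2^4 - 2\|a\|_2^2 + n$. The hypothetical helper below measures the largest relative deviation of the centered column norms from a given reference value. As reference values we use $\mathbb{E}\|aa^T - I_n\|_F^2 = n^2 + n$ for standard Gaussian entries and $n^2 - n$ for columns on the sphere of radius $\sqrt{n}$. For Gaussian entries the deviation should shrink as $n$ grows. On the sphere it vanishes up to rounding, which is the exactly normalized situation discussed next.

```python
import numpy as np

def max_relative_norm_deviation(A: np.ndarray, expected_sq_norm: float) -> float:
    """max_i | ||a_i a_i^T - I_n||_F^2 / expected_sq_norm - 1 | over the columns a_i of A."""
    n = A.shape[0]
    r = np.sum(A**2, axis=0)            # ||a_i||^2
    centered_sq = r**2 - 2 * r + n      # ||a_i a_i^T - I_n||_F^2
    return float(np.max(np.abs(centered_sq / expected_sq_norm - 1.0)))

rng = np.random.default_rng(1)
dev_gauss = {}
for n in (10, 400):
    G = rng.standard_normal((n, 50))
    dev_gauss[n] = max_relative_norm_deviation(G, n**2 + n)

S = rng.standard_normal((50, 50))
S = np.sqrt(50) * S / np.linalg.norm(S, axis=0)          # columns on the sphere of radius sqrt(50)
dev_sphere = max_relative_norm_deviation(S, 50**2 - 50)  # zero up to rounding
```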
If the columns are exactly normalized, the second point is trivially fulfilled, the last two terms of (14) vanish, and the corresponding parameters can be chosen arbitrarily small. For matrices whose columns have constant norm, we can therefore use the following corollary:
Let all parameters be as in Theorem 1 with the additional requirement that . Additionally we assume that . Then the RIP constant of order of satisfies
with probability at least as long as
where and are some universal constants.
Let us abbreviate . Since , the last two terms in (14) vanish for all and .
with probability larger than
Let for any . Note that the conditions and guarantee that . Plugging this into (17), we see that the RIP constant satisfies
where in the first line we made use of and in the last line we used . This bound fails with probability:
where in the second line we used that . The statement of the corollary follows by choosing small enough that . ∎
II-B The Case of Sub-Gaussian iid Columns
We show here that Corollary 1 holds almost unchanged if the random vectors in Theorem 1 are the columns of the centered self KR product, as defined in (4), of a matrix with sub-Gaussian iid entries as in Example 1. First, we need to show that the columns are sub-exponential with a $\psi_1$-norm independent of the dimension. This follows from the Hanson-Wright inequality, which states that every centered quadratic form of independent sub-Gaussian random variables is sub-exponential:
Theorem 2 (Hanson-Wright inequality).
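Let $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ be a random vector with independent components that satisfy $\mathbb{E}x_j = 0$ and $\|x_j\|_{\psi_2} \le K$. Let $B$ be an $n \times n$ matrix. Then, for every $t \ge 0$,

$$\mathbb{P}\left(\left|x^T B x - \mathbb{E}\,x^T B x\right| > t\right) \le 2\exp\left(-c\min\left(\frac{t^2}{K^4\|B\|_F^2}, \frac{t}{K^2\|B\|}\right)\right),$$

where $c > 0$ is an absolute constant.
See [Rud2013] ∎
Here $\|B\|$ and $\|B\|_F$ denote the operator norm and the Frobenius norm of the matrix $B$. Note that a random variable with such mixed tail behavior is in particular sub-exponential. This can be seen by bounding its moments. Let $Z$ be a random variable with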
Since , we have for . It follows
where is the Gamma function. So
which is equivalent to by elementary properties of sub-exponential random variables.
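The moment computation can be carried out as follows. It is written here for a generic mixed tail with a single scale $L > 0$ and an absolute constant $c > 0$. This is a simplification for readability and not necessarily the exact parametrization used above. Assume

$$\mathbb{P}(|Z| > t) \le 2\exp\left(-c\min\left(\frac{t^2}{L^2}, \frac{t}{L}\right)\right), \quad t \ge 0.$$

Since $\min(u^2, u) \ge u - \tfrac14$ for all $u \ge 0$, we get $\mathbb{P}(|Z| > t) \le 2e^{c/4}e^{-ct/L}$. Hence, for $p \ge 1$,

$$\mathbb{E}|Z|^p = \int_0^\infty p\,t^{p-1}\,\mathbb{P}(|Z| > t)\,dt \le 2e^{c/4}\,p\int_0^\infty t^{p-1}e^{-ct/L}\,dt = 2e^{c/4}\,\Gamma(p+1)\left(\frac{L}{c}\right)^p.$$

Using $\Gamma(p+1) \le p^p$ and $(2e^{c/4})^{1/p} \le 2e^{c/4}$ for $p \ge 1$, we obtain

$$\left(\mathbb{E}|Z|^p\right)^{1/p} \le \frac{2e^{c/4}}{c}\,L\,p \quad \text{for all } p \ge 1,$$

which characterizes $\|Z\|_{\psi_1} \lesssim L$. For the Hanson-Wright bound with $\|B\|_F = 1$ (and hence $\|B\| \le 1$), the tail is of this form with $L = K^2$, independently of the dimension.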
To apply Theorem 1, we also need to show that the norms of the columns concentrate well around their mean. This is the subject of the following theorem.
Let be the columns of the centered self KR product of a centered, normalized sub-Gaussian iid matrix as in Theorem 3. Let denote the distribution of the entries of . Then for it holds:
By union bound we have that
Furthermore, with the abbreviation , we have (see Example 1)
which can be rewritten as
Example 1 shows that , thus
with , and . We can estimate the one-sided tail by
Therefore, is a sum of independent zero-mean sub-exponential random variables with . This follows from a centering argument and the identity $\|X^2\|_{\psi_1} = \|X\|_{\psi_2}^2$ for sub-Gaussian random variables $X$, see e.g. [vershynin_2018, Ch. 2.7]. Hence, the elementary Bernstein inequality gives
Then the same argument as in (29) shows that
for some constant . So in particular
where in the last step we used that and . The probability of deviation of can be bounded as follows:
For the other tail , notice that and are non-negative. For this is obvious; for it follows from Jensen's inequality and . Therefore . ∎
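The mechanism behind the concentration of the column norms can be summarized with Bernstein's inequality for averages of independent, mean-zero sub-exponential random variables, see e.g. [vershynin_2018, Ch. 2.8]. This sketch does not reproduce the constants or the exact statement of Theorem 4. For a column $a$ with iid entries as in Example 1, write $\|a\|_2^2/n - 1 = \frac1n\sum_{j=1}^n (x_j^2 - 1)$. Let $L := \|x^2 - 1\|_{\psi_1}$, which is bounded by a constant multiple of $\|x\|_{\psi_2}^2$. Then, with an absolute constant $c > 0$,

$$\mathbb{P}\left(\left|\frac{\|a\|_2^2}{n} - 1\right| \ge \varepsilon\right) \le 2\exp\left(-c\,n\min\left(\frac{\varepsilon^2}{L^2}, \frac{\varepsilon}{L}\right)\right), \quad \varepsilon \ge 0,$$

and a union bound over the $N$ columns gives

$$\mathbb{P}\left(\max_{i \le N}\left|\frac{\|a_i\|_2^2}{n} - 1\right| \ge \varepsilon\right) \le 2N\exp\left(-c\,n\min\left(\frac{\varepsilon^2}{L^2}, \frac{\varepsilon}{L}\right)\right).$$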
We can now state that the result of Corollary 1 holds almost unchanged, up to different constants, for the self KR product of an iid sub-Gaussian matrix:
Let and be integers such that and . Let be a random matrix with sub-Gaussian iid entries, distributed according to , with , and . Let be the centered and rescaled self-KR product of as defined in (4). Then the RIP constant of order of satisfies
for any with probability larger than
as long as
where with , for some universal constants .
Theorem 3 shows that the columns of are sub-exponential. So the prerequisites of Theorem 1 are fulfilled with , for some absolute constant , and . We set and . Furthermore, we can set , such that . Theorem 4, with and (II-B), shows that there exist constants
if . (The latter is simply a constant, since is bounded by for sub-Gaussian .) Choosing in condition (54) large enough, such that , we can guarantee that
with . So Theorem 1 gives that
holds with probability larger than
Then the same calculation as in the proof of Corollary 1 shows that there is a constant such that setting leads to the result of this theorem. ∎
II-C Spherical Columns
Let $A$ be an $n \times N$ matrix whose columns are drawn iid uniformly from the sphere of radius $\sqrt{n}$, see Example 3. Since the columns are now exactly normalized, we can apply Corollary 1 once we show that the columns of the centered self KR product have sub-exponential marginals, with a sub-exponential norm independent of the dimension. For this we use the following result from [Ada2015]. It states that a random vector satisfying the convex concentration property also satisfies the Hanson-Wright inequality:
Theorem 6 (Theorem 2.5 in [Ada2015]).
Let $X$ be a mean-zero random vector in $\mathbb{R}^n$ that satisfies the convex concentration property with constant $K$. Then, for any $n \times n$ matrix $B$ and every $t > 0$,
The convex concentration property is defined as follows:
Definition 1 (Convex Concentration Property).
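Let $X$ be a random vector in $\mathbb{R}^n$. $X$ has the convex concentration property with constant $K$ if, for every 1-Lipschitz convex function $\varphi: \mathbb{R}^n \to \mathbb{R}$, we have $\mathbb{E}|\varphi(X)| < \infty$ and, for every $t > 0$,

$$\mathbb{P}\left(|\varphi(X) - \mathbb{E}\varphi(X)| \ge t\right) \le 2\exp\left(-t^2/K^2\right).$$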
A classical result states that a random vector uniformly distributed on the sphere satisfies an even stronger concentration property. It holds for all Lipschitz functions, not only convex ones (see e.g. [vershynin_2018, Theorem 5.1.4]):
Theorem 7 (Concentration on the Sphere).
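Let $X$ be uniformly distributed on the Euclidean sphere of radius $\sqrt{n}$ in $\mathbb{R}^n$. Then there is an absolute constant $C$ such that, for every 1-Lipschitz function $f: \mathbb{R}^n \to \mathbb{R}$,

$$\|f(X) - \mathbb{E}f(X)\|_{\psi_2} \le C.$$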
So, in particular, $X$ has the convex concentration property with a constant independent of $n$. Since $X$ is symmetric and hence mean zero, Theorem 6 implies that it also satisfies the tail bound of the Hanson-Wright inequality. As shown in (29), this implies that the columns of the centered self KR product are sub-exponential, with a $\psi_1$-norm bounded by an absolute constant. With this, we can apply Corollary 1.
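The Hanson-Wright inequality applies to the marginals of the centered columns because of an exact identity. For any direction $\operatorname{vec}(B) \in \mathbb{R}^{n^2}$,

$$\langle \operatorname{vec}(aa^T - I_n), \operatorname{vec}(B)\rangle = \operatorname{tr}\big((aa^T - I_n)B\big) = a^T B a - \operatorname{tr}(B),$$

and $\operatorname{tr}(B) = \mathbb{E}[a^T B a]$ for isotropic $a$. A unit direction corresponds to $\|B\|_F = 1$, which also forces $\|B\| \le 1$, so the Hanson-Wright tail bound for such marginals does not depend on $n$. The sketch below checks the identity numerically; the helper names are hypothetical.

```python
import numpy as np

def centered_kr_marginal(a: np.ndarray, B: np.ndarray) -> float:
    """<vec(a a^T - I_n), vec(B)>, computed from the explicit centered KR column."""
    n = a.size
    column = (np.outer(a, a) - np.eye(n)).ravel(order="F")
    return float(column @ B.ravel(order="F"))

def centered_quadratic_form(a: np.ndarray, B: np.ndarray) -> float:
    """a^T B a - tr(B), i.e. the quadratic form minus its mean under isotropy."""
    return float(a @ B @ a - np.trace(B))

rng = np.random.default_rng(2)
n = 6
a = rng.standard_normal(n)
B = rng.standard_normal((n, n))
B /= np.linalg.norm(B)   # unit direction: ||B||_F = ||vec(B)||_2 = 1
```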
In this section we did not use the specific (uniform spherical) distribution of the columns of $A$, but only their exact normalization and their convex concentration property. Hence, the results also hold for the larger class of independent, exactly normalized columns whose entries may be dependent but which satisfy the convex concentration property. For example, it is known that a random vector satisfies the convex concentration property if its entries are drawn without replacement from some fixed set of numbers satisfying suitable conditions. For this and more examples, see [Ada2015]. Also note that the sub-Gaussian iid case of Section II-B is not covered by Theorem 6. A random vector with sub-Gaussian iid entries does not, in general, have the convex concentration property with a constant independent of the dimension [Ada2015].
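The effect of centering can be observed numerically. The sketch below computes a lower bound on the RIP constant $\delta_s$ of the column-normalized self KR product of a matrix with columns on the sphere (Example 3), with and without centering. Computing $\delta_s$ exactly would require examining all supports of size $s$, which is intractable in general. The sketch therefore checks only a few random supports, so the result is merely a lower bound. All names and parameter values ($n = 30$, $N = 500$, $s = 60$, so $s/m \approx 0.07$ with $m = n^2$) are illustrative choices made here. One should expect the uncentered lower bound to exceed 1, because all columns share a positive correlation of about $1/n$. The centered one should stay noticeably smaller.

```python
import numpy as np

def normalized_self_kr(A: np.ndarray, centered: bool) -> np.ndarray:
    """Self KR product of A (optionally centered by vec(I_n)) with unit-norm columns."""
    n, N = A.shape
    outer = np.einsum("ji,ki->jki", A, A)           # outer[:, :, i] = a_i a_i^T
    if centered:
        outer = outer - np.eye(n)[:, :, None]        # subtract E[a a^T] = I_n
    cols = outer.reshape(n * n, N, order="F")        # column i = vec(outer[:, :, i])
    return cols / np.linalg.norm(cols, axis=0)

def rip_lower_bound(Phi: np.ndarray, s: int, trials: int, rng: np.random.Generator) -> float:
    """max over random supports S of max(sigma_max^2 - 1, 1 - sigma_min^2) of Phi[:, S].

    delta_s is the maximum of this quantity over all supports of size s,
    so checking only random supports yields a lower bound.
    """
    N = Phi.shape[1]
    best = 0.0
    for _ in range(trials):
        S = rng.choice(N, size=s, replace=False)
        sv = np.linalg.svd(Phi[:, S], compute_uv=False)   # singular values, descending
        best = max(best, sv[0] ** 2 - 1.0, 1.0 - sv[-1] ** 2)
    return float(best)

rng = np.random.default_rng(0)
n, N, s = 30, 500, 60
G = rng.standard_normal((n, N))
A = np.sqrt(n) * G / np.linalg.norm(G, axis=0)        # columns on the sphere of radius sqrt(n)
delta_uncentered = rip_lower_bound(normalized_self_kr(A, centered=False), s, 10, rng)
delta_centered = rip_lower_bound(normalized_self_kr(A, centered=True), s, 10, rng)
```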
We thank Fabian Jänsch, Radoslaw Adamczak, Saeid Haghighatshoar and Giuseppe Caire for fruitful discussions. PJ has been supported by DFG grant JU 2795/3.