Kesten-McKay law for random subensembles of Paley equiangular tight frames

05/10/2019 ∙ by Mark Magsino, et al. ∙ 0

We apply the method of moments to prove a recent conjecture of Haikin, Zamir and Gavish (2017) concerning the distribution of the singular values of random subensembles of Paley equiangular tight frames. Our analysis applies more generally to real equiangular tight frames of redundancy 2, and we suspect similar ideas will eventually produce more general results for arbitrary choices of redundancy.


1 Introduction

Frame theory concerns redundant representation in a Hilbert space. A frame [13] is a sequence $\{\varphi_i\}_{i\in I}$ in a Hilbert space $H$ for which there exist $\alpha,\beta\in(0,\infty)$ such that

$$\alpha\|x\|^2 \le \sum_{i\in I}|\langle x,\varphi_i\rangle|^2 \le \beta\|x\|^2$$

for every $x\in H$. If every $\varphi_i$ has unit norm, then we say the frame is unit norm, and if $\alpha=\beta$, we say the frame is tight [9]. In the special case where $H=\mathbb{R}^d$, a frame is simply a spanning set, but unit norm tight frames are still interesting and useful [4, 21]. For example, equiangular tight frames are unit norm tight frames with the additional property that $|\langle\varphi_i,\varphi_j\rangle|$ is constant over the choice of pair $\{i,j\}$ with $i\neq j$. Equiangular tight frames are important because they necessarily span optimally packed lines, which in turn find applications in multiple description coding [25], digital fingerprinting [22], compressed sensing [1], and quantum state tomography [24]; see [14] for a survey.

Various applications demand control over the singular values of subensembles of frames. In quantum physics, Weaver’s conjecture [28] (equivalent to the Kadison–Singer problem [18, 8], and recently resolved in [19]) concerns the existence of subensembles of unit norm tight frames with appropriately small spectral norm. Compressed sensing [7, 11] has spurred the pursuit of explicit frames with the property that every subensemble is well conditioned [10, 5, 1]. Motivated by applications in erasure-robust analog coding, Haikin, Zamir and Gavish [16, 15] recently launched a new line of inquiry: identify frames for which the singular values of random subensembles exhibit a predictable distribution. (One might consider this to be a more detailed analogue of Tropp’s estimates on the conditioning of random subensembles [27].) Of particular interest are random subensembles of equiangular tight frames, and in this paper, we consider equiangular tight frames consisting of $2d$ vectors in $\mathbb{R}^d$, which correspond to symmetric conference matrices. (Note that such frames have already received some attention in the context of compressed sensing [1, 2].)

An $n\times n$ matrix $S$ is said to be a conference matrix if

  • $S_{ii}=0$ for every $i\in[n]$,

  • $S_{ij}\in\{\pm1\}$ for every $i,j\in[n]$ with $i\neq j$, and

  • $SS^\top=(n-1)I$.

A symmetric conference matrix of order $n$ exists whenever $n-1$ is a prime power congruent to $1$ modulo $4$ (by a Paley-based construction), and only if $n\equiv 2 \bmod 4$ and $n-1$ is a sum of two squares [17]. Explicitly, the Paley conference matrices are obtained by building a circulant matrix from the Legendre symbol and then padding with ones, for example:

where “$\pm$” denotes $\pm1$. One may verify that the above example satisfies $SS^\top=(n-1)I$. For every symmetric conference matrix $S$ of order $n$, it holds that $I+\frac{1}{\sqrt{n-1}}S$ is the Gram matrix of an equiangular tight frame consisting of $n$ vectors in $\mathbb{R}^{n/2}$ [25]. In particular, the equiangular tight frames that arise from the Paley conference matrices are known as Paley equiangular tight frames. In what follows, we consider random principal submatrices of symmetric conference matrices with the understanding that they may be identified with the Gram matrix of a random subensemble of the corresponding equiangular tight frame.
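The Paley construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it only handles a prime $q\equiv 1 \bmod 4$ (general prime powers would need finite field arithmetic), the function name `paley_conference_matrix` is hypothetical, and the choice $q=5$ is just the smallest admissible case.

```python
import numpy as np

def paley_conference_matrix(q):
    """Symmetric conference matrix of order q + 1 for a prime q with q % 4 == 1."""
    if q < 5 or q % 4 != 1 or any(q % d == 0 for d in range(2, int(q**0.5) + 1)):
        raise ValueError("this sketch only handles primes q congruent to 1 mod 4")
    residues = {(x * x) % q for x in range(1, q)}
    # Legendre symbol chi(a): 0 if a == 0, +1 if a is a nonzero square mod q, else -1.
    chi = np.array([0] + [1 if a in residues else -1 for a in range(1, q)])
    idx = np.arange(q)
    circulant = chi[(idx[None, :] - idx[:, None]) % q]  # entry (i, j) is chi(j - i)
    S = np.ones((q + 1, q + 1), dtype=int)              # pad with ones ...
    S[0, 0] = 0                                         # ... keeping a zero diagonal
    S[1:, 1:] = circulant
    return S

S = paley_conference_matrix(5)
n = S.shape[0]
gram = np.eye(n) + S / np.sqrt(n - 1)   # Gram matrix of the associated frame
print(S)
print(np.round(np.linalg.eigvalsh(gram), 6))
```

The Gram matrix has only the eigenvalues 0 and 2, each with multiplicity $n/2$, which is what one expects for a tight frame of $n$ unit vectors in $\mathbb{R}^{n/2}$.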

Given an $n\times n$ symmetric matrix $Z$ with eigenvalues $\lambda_1\le\cdots\le\lambda_n$, we let $\mu_Z$ denote the uniform probability measure over the spectrum of $Z$ (counted with multiplicity):

$$\mu_Z := \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i}.$$

This is known as the empirical spectral distribution of $Z$. If $Z$ is a random matrix, then its empirical spectral distribution $\mu_Z$ is a random measure. We say a sequence $\{\mu_n\}$ of random measures converges almost surely to a non-random absolutely continuous measure $\mu$ if for every $a,b\in\mathbb{R}$ with $a<b$, it holds that the random variable $\mu_n([a,b])$ converges to $\mu([a,b])$ almost surely.

We are interested in random matrices of a particular form. Let $J$ denote a random subset of $[n]$ such that the events $\{i\in J\}$, $i\in[n]$, are independent with probability $p$. Then for any fixed $n\times n$ matrix $Z$, we write $Z_J$ to denote the (random) principal submatrix of $Z$ with rows and columns indexed by $J$. Following [12], we define the Kesten–McKay distribution with parameter $v\ge 2$ by

$$d\mu_{\mathrm{KM}(v)}(x) := \frac{v\sqrt{4(v-1)-x^2}}{2\pi(v^2-x^2)}\,\mathbf{1}_{[-2\sqrt{v-1},\,2\sqrt{v-1}]}(x)\,dx.$$

Recall that a lacunary sequence is a set $L$ of natural numbers for which there exists $\lambda>1$ such that $n'\ge\lambda n$ for every $n,n'\in L$ with $n<n'$. We are now ready to state our main result, which corresponds to one of many conjectures posed in [16]; see Figure 1 for an illustration.

Theorem 1.

Fix , take any lacunary sequence for which there exists a sequence of symmetric conference matrices of increasing size , and consider the corresponding random matrices . Then the empirical spectral distribution of converges almost surely to the Kesten–McKay distribution with parameter .

Figure 1: Consider the Paley conference matrix of order . For each choice of , we draw and plot a histogram of the spectrum of along with a suitably scaled version of the Kesten–McKay density for . The similarity between these distributions was first observed by Haikin, Zamir and Gavish [16]. Our main result (Theorem 1) explains this phenomenon.
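The phenomenon in Figure 1 can be reproduced numerically along the following lines. This is a sketch under stated assumptions, not the authors' experiment: the order $n=1010$ and the value $p=0.3$ are arbitrary choices, the helper names are hypothetical, and the normalization $X=\frac{1}{p\sqrt{n}}S_J$ with parameter $v=1/p$ is an assumption (it is the scaling under which the second moment of the spectrum is close to $1/p$, the second moment of the Kesten–McKay law with parameter $1/p$).

```python
import numpy as np

def paley_conference_matrix(q):
    # Assumes q is a prime with q % 4 == 1 (not checked in this sketch).
    residues = {(x * x) % q for x in range(1, q)}
    chi = np.array([0] + [1 if a in residues else -1 for a in range(1, q)])
    idx = np.arange(q)
    S = np.ones((q + 1, q + 1))
    S[0, 0] = 0.0
    S[1:, 1:] = chi[(idx[None, :] - idx[:, None]) % q]
    return S

def kesten_mckay_density(x, v):
    # Standard Kesten-McKay density with parameter v >= 2, supported on |x| <= 2*sqrt(v - 1).
    x = np.asarray(x, dtype=float)
    inside = np.clip(4.0 * (v - 1.0) - x**2, 0.0, None)
    return v * np.sqrt(inside) / (2.0 * np.pi * (v**2 - x**2))

rng = np.random.default_rng(0)
q, p = 1009, 0.3                          # hypothetical choices; 1009 is a prime with 1009 % 4 == 1
S = paley_conference_matrix(q)
n = q + 1
J = np.flatnonzero(rng.random(n) < p)     # keep each index independently with probability p
X = S[np.ix_(J, J)] / (p * np.sqrt(n))    # assumed normalization of the principal submatrix
eigs = np.linalg.eigvalsh(X)

v = 1.0 / p
second_moment = np.mean(eigs**2)
hist, edges = np.histogram(eigs, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("empirical second moment:", second_moment, "target:", v)
print("max |histogram - density|:", np.max(np.abs(hist - kesten_mckay_density(centers, v))))
```

Plotting `hist` against `kesten_mckay_density(centers, v)` gives a picture in the spirit of Figure 1; a single draw at this size only suggests the limit and is no substitute for the proof.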

In the next section, we prove this theorem using the method of moments, saving the more technical portions for Section 3.

1.1 Notation

Given , let denote the diagonal matrix whose diagonal entries are the entries of . Given , let denote the induced -norm of (i.e., the largest singular value of ), and let denote the Schatten -norm of (i.e., the -norm of the singular values of ). Throughout this paper, we will investigate how quantities relate as . For example, suppose we are interested in a quantity that depends on both and some additional parameters . Then we write if for every , it holds that as . We write if there exists such that for all and , and we write if for every , there exists such that for all . Finally, we write if both and .

2 Proof of the main result

Our proof makes use of a standard sufficient condition for the almost sure convergence of random measures, which is a consequence of the moment continuity theorem, the Borel–Cantelli lemma, and Chebyshev’s inequality, cf. Exercise 2.4.6 in [26]:

Proposition 2.

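Let $\{\mu_n\}$ be a sequence of uniformly subgaussian random probability measures, and let $\mu$ be a non-random subgaussian probability measure. Suppose that for every $k\in\mathbb{N}$, it holds that

  • (i) $\lim_{n\to\infty}\mathbb{E}\int_{\mathbb{R}}x^k\,d\mu_n(x)=\int_{\mathbb{R}}x^k\,d\mu(x)$, and

  • (ii) $\sum_{n}\operatorname{Var}\big(\int_{\mathbb{R}}x^k\,d\mu_n(x)\big)<\infty$.

Then $\mu_n$ converges almost surely to $\mu$.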

As we will see, verifying hypothesis (i) in our case reduces to a combinatorics problem, whereas hypothesis (ii) can be treated separately with the help of Talagrand concentration:

Proposition 3 (Talagrand concentration, Theorem 2.1.13 in [26]).

There exists a universal constant for which the following holds: Suppose is both convex and -Lipschitz in , and let be a random vector in with independent coordinates satisfying almost surely. Then for every , it holds that

Throughout, denotes an symmetric conference matrix, we draw and put . We typically suppress the subscript . While the size of is random, its average size is , and so we use as a proxy for . As one might expect, this is a good approximation:

Lemma 4.

Put and . Then

Proof.

Since is a submatrix of , it holds that

almost surely. Similarly, almost surely. Next, let denote the (random) size of . Then , and so our bound on gives

where the last step applies the fact that

has binomial distribution. This immediately implies the desired bound on

. Finally, since almost surely, we have

which completes the result. ∎

As such, to demonstrate hypothesis (i) from Proposition 2 in our case, it suffices to prove

(1)

The Kesten–McKay moments are implicitly computed in [20], and are naturally expressed in terms of entries of Catalan’s triangle:

Proposition 5 (Lemma 2.1 in [20]).

For every and , it holds that

Recalling that , then Proposition 5 gives that (1) is equivalent to

(2)

where denotes an entry of Borel’s triangle:

To compute these limits, we first find a convenient expression for . To this end, recall that is the submatrix of with index set , and let denote the random diagonal matrix such that . Then

Considering , it follows that

(3)

It remains to show that these coefficients converge to the corresponding coefficients in (2).
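As a sanity check on the target coefficients, the moments of the Kesten–McKay law can be computed directly and compared with Borel's triangle. The sketch below rests on assumptions that are not taken from the formulas in this paper: moments are computed as weighted closed walks from the root of a $v$-regular tree (weight $v$ for a step away from the root, $v-1$ for any other step away from it, and $1$ for a step back), Borel's triangle is indexed from row $0$ and built from Catalan's triangle by binomial sums, and all function names are hypothetical. Under these conventions, the even moment of order $2m$ appears to equal $\sum_{j=0}^{m-1}(-1)^j B(m-1,j)\,v^{m-j}$, which the code checks numerically for small $m$ only.

```python
from math import comb, pi, sqrt

def km_moment(k, v):
    """k-th Kesten-McKay moment via weighted closed walks from the root of a v-regular tree."""
    weights = [1.0] + [0.0] * k               # weights[h]: total weight of walks now at depth h
    for _ in range(k):
        nxt = [0.0] * (k + 1)
        for h in range(k):
            if weights[h] == 0.0:
                continue
            nxt[h + 1] += weights[h] * (v if h == 0 else v - 1.0)  # step away from the root
            if h > 0:
                nxt[h - 1] += weights[h]                            # step back toward the root
        weights = nxt
    return weights[0]

def catalan_triangle(n, s):
    return (n - s + 1) * comb(n + s, s) // (n + 1)

def borel_triangle(n, j):
    # Assumed convention: binomial transform of row n of Catalan's triangle.
    return sum(comb(s, j) * catalan_triangle(n, s) for s in range(j, n + 1))

def km_moment_from_borel(m, v):
    """Moment of order 2m written as an alternating polynomial in v."""
    return sum((-1) ** j * borel_triangle(m - 1, j) * v ** (m - j) for j in range(m))

def km_moment_numeric(k, v, steps=200_000):
    """Midpoint-rule integral of x^k against the Kesten-McKay density (v > 2)."""
    r = 2.0 * sqrt(v - 1.0)
    h = 2.0 * r / steps
    total = 0.0
    for i in range(steps):
        x = -r + (i + 0.5) * h
        total += x**k * v * sqrt(4.0 * (v - 1.0) - x * x) / (2.0 * pi * (v * v - x * x))
    return total * h

v = 10.0 / 3.0   # corresponds to p = 0.3 under the assumption v = 1/p
for m in range(1, 5):
    print(2 * m, km_moment(2 * m, v), km_moment_from_borel(m, v))
```

The walk count is a polynomial in $v$, so it makes sense for non-integer parameters such as $v=1/p$; the numerical integral is included only to confirm that the polynomial agrees with the density.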

First, we introduce some additional notation. Taking inspiration from Bargmann invariants [3], it is convenient to write

Next, we say is a partition of into blocks if such that , and we let denote the set of all such partitions. For each partition of , we consider the set of functions whose level sets are the blocks of , namely

With this, we define

Considering (3), it therefore holds that

(4)

As such, to demonstrate (2), it suffices to determine the limit of for every partition of . We start with a quick calculation:

Lemma 6.

For every with , it holds that .

Proof.

Estimate using the triangle inequality to obtain a sum of terms, each of size at most . ∎

For each , this establishes that the coefficient of in (4) approaches zero, i.e., the corresponding coefficient in (2). Now we wish to tackle the limiting value of in general. In light of the related literature [23], it comes as no surprise that depends on whether is a so-called crossing partition. We say a partition of is crossing if there exist with for which there exist and such that . Otherwise, is said to be non-crossing. Next, for each , we let denote the unique member of such that . Consider the graph with vertex set and edges given by for every ; here, we interpret modulo so that . Let denote the set of non-crossing for which the edges of partition into simple even cycles. Finally, let denote the th Catalan number. With these notions, we can describe the limit of each :

Lemma 7 (Key combinatorial lemma).
  • Suppose . Then .

  • Suppose and the edges of partition into simple cycles of sizes . Then and

The proof of Lemma 7 is rather technical (involving multiple rounds of induction), and so we save it for Section 3. In the meantime, we demonstrate how Lemma 7 can be applied to prove that the coefficients in (4) converge to the coefficients in (2). Recall that a Dyck path of semi-length $m$ is a path in the plane from $(0,0)$ to $(2m,0)$ consisting of $m$ steps along the vector $(1,1)$, called up-steps, and $m$ steps along the vector $(1,-1)$, called down-steps, that never goes below the $x$-axis. We say a Dyck path is strict if none of the path’s interior vertices reside on the $x$-axis. Each (strict) Dyck path determines a sequence of letters from $\{U,D\}$ that represent up- and down-steps in the path; this sequence is known as a (strict) Dyck word. With these notions, we may prove the following result by leveraging the fact that Borel’s triangle counts so-called marked Dyck paths [6]; see Figure 2 for an illustration.
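The notions of Dyck word and strict Dyck word can be made concrete with a brute-force enumeration. This is a small illustrative sketch: the letters `U` and `D` for up- and down-steps and the function names are assumptions made here, and the enumeration is only practical for small semi-lengths.

```python
from math import comb

def dyck_words(m):
    """All Dyck words of semi-length m over the alphabet {U, D}."""
    words = []

    def extend(prefix, ups, downs):
        if ups == m and downs == m:
            words.append(prefix)
            return
        if ups < m:
            extend(prefix + "U", ups + 1, downs)
        if downs < ups:                      # never go below the x-axis
            extend(prefix + "D", ups, downs + 1)

    extend("", 0, 0)
    return words

def is_strict(word):
    """True if no interior vertex of the path lies on the x-axis."""
    height = 0
    for letter in word[:-1]:                 # the final vertex is allowed to return to 0
        height += 1 if letter == "U" else -1
        if height == 0:
            return False
    return True

def catalan(m):
    return comb(2 * m, m) // (m + 1)

for m in range(1, 6):
    strict = [w for w in dyck_words(m) if is_strict(w)]
    print(m, len(dyck_words(m)), len(strict), catalan(m - 1))
```

A strict Dyck word of semi-length $m\ge 1$ is exactly `U` followed by a Dyck word of semi-length $m-1$ followed by `D`, which is why the strict words are counted by the previous Catalan number.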

Figure 2: (top left) Select and , and consider the partition with all singleton blocks except for and . Observe that is a non-crossing partition. (bottom left) We depict the corresponding graph , whose vertices are the blocks of . By definition, blocks are adjacent in when they contain cyclically adjacent members of . In this case, the edges of partition into four simple cycles, which we label , , and . (right) Each simple cycle of is assigned a strict Dyck word of the cycle’s length, and we mark all but the first up-steps. The only choice for and is , and the only choice for is ; here, denotes a marked up-step. Meanwhile, has choices: and . For each selection, we traverse from to , to , etc., to and back to , labelling the edges of with the next letter from the current cycle’s Dyck word. The result is a Dyck word with marked up-steps, none of which are at ground level. We illustrate the corresponding marked Dyck paths above. Notice that can be recovered from either marked Dyck path since a cycle is born with each unmarked up-step and dies once the Dyck path returns to its height from the birth of that cycle. By Theorem 2 in [6], marked Dyck paths are counted by entries in Borel’s triangle, which explains their appearance in (2).

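The combinatorial objects used here, partitions, the crossing condition, and the graph on blocks, are easy to experiment with. The following sketch assumes the standard reading of the definitions: a partition is crossing when two distinct blocks contain $a<b<c<d$ with $a,c$ in one block and $b,d$ in the other, and the graph joins the block of $i$ to the block of $i+1$ with indices taken cyclically. Function names are hypothetical, and the check that edges split into simple even cycles is not implemented.

```python
from itertools import combinations
from math import comb

def set_partitions(k):
    """Yield all partitions of {1, ..., k} as lists of blocks."""
    def build(i, blocks):
        if i > k:
            yield [list(b) for b in blocks]
            return
        for b in blocks:                     # put i into an existing block
            b.append(i)
            yield from build(i + 1, blocks)
            b.pop()
        blocks.append([i])                   # or open a new block with i
        yield from build(i + 1, blocks)
        blocks.pop()
    yield from build(1, [])

def is_crossing(partition):
    """True if some a < b < c < d has a, c in one block and b, d in another."""
    for A, B in combinations(partition, 2):
        for first, second in ((A, B), (B, A)):
            for a, c in combinations(sorted(first), 2):
                between = any(a < b < c for b in second)
                beyond = any(d > c for d in second)
                if between and beyond:
                    return True
    return False

def block_graph_edges(partition):
    """Edges (block of i, block of i + 1) for i = 1..k, with k + 1 read as 1."""
    k = sum(len(b) for b in partition)
    block_of = {i: idx for idx, b in enumerate(partition) for i in b}
    return [(block_of[i], block_of[i % k + 1]) for i in range(1, k + 1)]

for k in range(1, 7):
    count = sum(1 for pi in set_partitions(k) if not is_crossing(pi))
    print(k, count, comb(2 * k, k) // (k + 1))
```

The printed counts agree with the Catalan numbers, the classical count of non-crossing partitions.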
Lemma 8.

It holds that

Proof.

When , the result follows from Lemma 6, and when is odd, the edges in each fail to partition into even simple cycles, and so the result follows from Lemma 7(i). Now suppose is even and . For , recall that the edges of are indexed by and partitioned into simple even cycles. Define to be the words such that for every simple cycle in with edges indexed by , the restriction is a strict Dyck word with all but its first up-steps marked (here, denotes a marked up-step). Note that strict Dyck words of semi-length are in one-to-one correspondence with Dyck words of semi-length , and so there are of them. As such, Lemma 7 implies that for every , it holds that

(5)

Let denote the set of marked Dyck words with marked up-steps, none of which are at ground level. We observe that

(6)

Then equations (5) and (6) together give

where the last step applies Theorem 2 in [6]. ∎

At this point, we are in a position to verify hypothesis (i) from Proposition 2 in our case. For hypothesis (ii), we follow the approach suggested by Remark 2.4.5 in [26] of leveraging Talagrand concentration to bound the variance. First, we pass to a setting that is more amenable to analysis with Talagrand concentration. Here and throughout, for each , we fix an matrix such that .

Lemma 9.

It holds that .

Proof.

Define , and observe that

Since commutes with and , the binomial theorem gives

and so rearranging gives

(7)

The following estimate holds for any choice of random variables :

The lemma follows from applying this estimate to (7) by induction on . ∎

Next, we establish the convexity and Lipschitz continuity required by Talagrand:

Lemma 10.

For each , consider the mapping defined by . Then is convex and -Lipschitz.

Proof.

We adopt the shorthand notation . First, is convex since satisfies the triangle inequality and is convex:

To compute a Lipschitz bound, we apply the factorization

with and to get

where the last step follows from the fact that , meaning (and similarly for ). Next, we apply the reverse triangle inequality to get

which implies the result. ∎

Finally, we apply Talagrand concentration to obtain a variance bound:

Lemma 11.

It holds that .

Proof.

Given the mapping from Lemma 10, define in terms of subgradients by

This is known as the smallest convex extension of to , and it is straightforward to verify that is convex and -Lipschitz with . Let have independent entries, each equal to with probability and otherwise. Since almost surely, it holds that has the same distribution as , and we let denote its expectation. By Talagrand concentration (Proposition 3), there exists such that

Combining with Lemma 9 then gives

as desired. ∎

We may now verify hypotheses (i) and (ii) from Proposition 2 in our case.

Proof of Theorem 1.

Put and . First, we modify the random measure so that we may apply Proposition 2 to prove the result. Indeed, fails to be a probability measure with probability , since when is the empty set. To rectify this, we define

Then it suffices to prove almost surely, since the Borel–Cantelli lemma implies almost surely, and so

for every with . Conveniently, for every and , it holds that

almost surely, and so the left-hand side inherits moments from the right-hand side.

To apply Proposition 2, we first observe that

almost surely, and so are uniformly bounded, and therefore uniformly subgaussian. Similarly, is bounded and therefore subgaussian. Fix . As a consequence of Lemma 8, it holds that

and so by Lemma 4, we have

As such, satisfies hypothesis (i) from Proposition 2. Next, Lemma 11 establishes that , and so Lemma 4 implies

Writing , select such that for every . Then

As such, also satisfies hypothesis (ii) from Proposition 2, and so almost surely, as desired. ∎

3 Proof of Lemma 7

It remains to compute, for each , the limit of

where is the set of whose level sets are the blocks of and

We begin with some basic properties of .

Lemma 12.

For every , each of the following holds:

  1. If