# Convergence Rate of Empirical Spectral Distribution of Random Matrices from Linear Codes

It is known that the empirical spectral distribution of random matrices obtained from linear codes of increasing length converges to the well-known Marchenko-Pastur law, if the Hamming distance of the dual codes is at least 5. In this paper, we prove that the convergence in probability is at least in the order of $n^{-1/4}$, where $n$ is the length of the code.

## 1 Introduction

Random matrix theory is the study of matrices whose entries are random variables. Of particular interest is the study of eigenvalue statistics of random matrices such as the empirical spectral measure. It has been broadly investigated in a wide variety of areas, including statistics [19], number theory [13], economics [14], theoretical physics [18] and communication theory [17].

Most of the matrix models in the literature are random matrices with independent entries. In a recent series of works (initiated in [2] and developed further in [1, 3, 20]), the authors considered a class of sample-covariance type matrices formed randomly from linear codes over a finite field, and proved that if the Hamming distance of the dual codes is at least 5, then as the length of the codes goes to infinity, the empirical spectral distribution of the random matrices obtained in this way converges to the well-known Marchenko-Pastur (MP) law. Since truly random matrices (i.e. random matrices with i.i.d. entries) of large size satisfy this property, this can be interpreted as saying that sequences from linear codes of dual distance at least 5 behave like random sequences among themselves. This is a new pseudo-random test for sequences and is called a “group randomness” property [1]. It may have many potential applications. It is also interesting to note that the condition that the dual distance is at least 5 is optimal, in the sense that binary first-order Reed-Muller codes, which have dual distance 4, do not satisfy this property (see [1, 3]).

How fast does the empirical spectral distribution converge to the MP law? This question is interesting in itself and important in applications, as one may wish to use linear codes of proper length to generate pseudo-random matrices. Along with proving the convergence in expectation, the authors in [20] obtained a convergence rate in terms of the length $n$ of the code. That rate is quite unsatisfactory, as numerical data show clearly that the convergence is rather fast with respect to $n$. In this paper, we prove that the convergence rate is indeed at least in the order of $n^{-1/4}$ in probability. This substantially improves the previous result.

To introduce our main result, we need some notation.

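Let $\mathcal{C}$ be a linear code of length $n$ over the finite field $\mathbb{F}_q$ of order $q$, where $q$ is a prime power. The most interesting case is that of binary linear codes, corresponding to $q = 2$. The dual code $\mathcal{C}^\perp$ consists of the $n$-tuples in $\mathbb{F}_q^n$ which are orthogonal to all codewords of $\mathcal{C}$ under the standard inner product. $\mathcal{C}^\perp$ is also a linear code. Denote by $d^\perp$ the Hamming distance of $\mathcal{C}^\perp$. It is called the dual distance of $\mathcal{C}$.

Let $\psi : \mathbb{F}_q \to \mathbb{C}^*$ be the standard additive character. To be more precise, if $\mathbb{F}_q$ has characteristic $l$, which is a prime number, then $\psi$ is given by $\psi(a) = \exp\left(2\pi i\,\mathrm{Tr}(a)/l\right)$, where $\mathrm{Tr}$ is the absolute trace mapping from $\mathbb{F}_q$ to $\mathbb{F}_l$. In particular, if $q = 2$, then the map $\psi : \mathbb{F}_2 \to \{\pm 1\}$ is defined as $\psi(a) = (-1)^a$. We extend $\psi$ component-wise to $\mathbb{F}_q^n$ and obtain the map $\psi : \mathbb{F}_q^n \to (\mathbb{C}^*)^n$. Denote by $\psi(\mathcal{C})$ the image of $\mathcal{C}$ under this map.

Denote by $\Phi_n$ a $p \times n$ matrix whose rows are chosen from $\psi(\mathcal{C})$ uniformly and independently. This makes the set $\psi(\mathcal{C})^p$ of all such matrices a probability space with the uniform probability.

Let $\mathcal{G}_n$ be the Gram matrix of $X := n^{-1/2}\Phi_n$, that is,

$$
\mathcal{G}_n = XX^* = \frac{1}{n}\Phi_n\Phi_n^*, \tag{1}
$$

where $A^*$ means the conjugate transpose of the matrix $A$. Let $\mu_n$ be the empirical spectral measure of $\mathcal{G}_n$, that is,

$$
\mu_n = \frac{1}{p}\sum_{j=1}^{p}\delta_{\lambda_j}, \tag{2}
$$

where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\mathcal{G}_n$ and $\delta_\lambda$ is the Dirac measure at the point $\lambda$. Note that $\mu_n$ is a random measure, that is, for any interval $I \subset \mathbb{R}$, the value $\mu_n(I)$ is a random variable with respect to the probability space $\psi(\mathcal{C})^p$. Our main result is as follows.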

###### Theorem 1.

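Assume that $y := p/n \in (0,1)$ is fixed. If $d^\perp \ge 5$, then

$$
\left|\mu_n(I) - \varrho_{MP,y}(I)\right| \prec n^{-1/4} \tag{3}
$$

uniformly for all intervals $I \subset \mathbb{R}$. Here $\varrho_{MP,y}$ is the probability measure of the Marchenko-Pastur law, whose density function is given by

$$
d\varrho_{MP,y}(x) = \frac{1}{2\pi x y}\sqrt{(b-x)(x-a)}\,\mathbf{1}_{[a,b]}(x)\,dx, \tag{4}
$$

where the constants $a$ and $b$ are defined as

$$
a = (1-\sqrt{y})^2, \qquad b = (1+\sqrt{y})^2, \tag{5}
$$

and $\mathbf{1}_{[a,b]}$ is the indicator function of the interval $[a,b]$.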

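The symbol $\prec$ in (3) is a standard notation for “stochastic domination” in random matrix theory (see [7] for details). Here it means that for any $\varepsilon > 0$ and any $D > 0$, there is a quantity $n_0 = n_0(\varepsilon, D)$, such that whenever $n \ge n_0$, we have

$$
\sup \mathbb{P}\left[\left|\mu_n(I) - \varrho_{MP,y}(I)\right| > n^{-1/4+\varepsilon}\right] \le n^{-D},
$$

where $\mathbb{P}$ is the probability with respect to $\psi(\mathcal{C})^p$ and the supremum is taken over all intervals $I \subset \mathbb{R}$ and all linear codes $\mathcal{C}$ of length $n$ over $\mathbb{F}_q$ with $d^\perp \ge 5$.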
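As a small numerical companion to (4) and (5), the following Python sketch evaluates the Marchenko-Pastur density and approximates the mass $\varrho_{MP,y}(I)$ of an interval by the trapezoidal rule. The helper names `mp_edges`, `mp_density` and `mp_mass`, the grid size, and the value $y = 0.5$ are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def mp_edges(y):
    """Endpoints a, b of the Marchenko-Pastur support, as in (5)."""
    return (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2

def mp_density(x, y):
    """Density (4) of the Marchenko-Pastur law for 0 < y < 1 (x is an array)."""
    a, b = mp_edges(y)
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)          # the density vanishes outside [a, b]
    xi = x[inside]
    out[inside] = np.sqrt((b - xi) * (xi - a)) / (2 * np.pi * xi * y)
    return out

def mp_mass(lo, hi, y, grid=200_001):
    """Approximate rho_{MP,y}([lo, hi]) with the trapezoidal rule."""
    a, b = mp_edges(y)
    lo, hi = max(lo, a), min(hi, b)     # clip the interval to the support
    if hi <= lo:
        return 0.0
    x = np.linspace(lo, hi, grid)
    f = mp_density(x, y)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

y = 0.5                                  # illustrative aspect ratio p / n
a, b = mp_edges(y)
total = mp_mass(a, b, y)                 # total mass, expected to be close to 1
```

The quantity `mp_mass(lo, hi, y)` plays the role of $\varrho_{MP,y}(I)$ in (3); an empirical count of eigenvalues in $I$ divided by $p$ would play the role of $\mu_n(I)$.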

For application purposes, in view of Theorem 1, binary linear codes of dual distance 5 with large length and small dimension are desirable, as they can be used to generate random matrices efficiently. Here we mention two constructions of binary linear codes with parameters $[2^m - 1, 2m]$ and dual distance 5. The first family is the dual of primitive double-error correcting BCH codes ([10]). The second family of such codes, which includes the well-known Gold codes, can be constructed as follows: Let $f : \mathbb{F}_{2^m} \to \mathbb{F}_{2^m}$ be a function such that $f(0) = 0$. Let $n = 2^m - 1$ and $\alpha$ be a primitive element of $\mathbb{F}_{2^m}$. Define a matrix

$$
H_f := \begin{bmatrix} 1 & \alpha & \alpha^2 & \cdots & \alpha^{n-1} \\ f(1) & f(\alpha) & f(\alpha^2) & \cdots & f(\alpha^{n-1}) \end{bmatrix}.
$$

Given a basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$, each element of $\mathbb{F}_{2^m}$ can be identified as an $m \times 1$ column vector in $\mathbb{F}_2^m$, hence the above $H_f$ can be considered as a binary matrix of size $2m \times n$. Denote by $\mathcal{C}_f$ the binary linear code obtained from $H_f$ as a generator matrix. Note that $\mathcal{C}_f$ has length $n$ and dimension $2m$. It is known that the dual distance of $\mathcal{C}_f$ is 5 if and only if $f$ is an almost perfect nonlinear (APN) function [9, 16]. Since there are many APN functions when $m$ is odd, this provides a general construction of binary linear codes of dual distance 5 which may be of interest for applications.
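The construction above can be sketched in a few lines of Python. The sketch below assumes $m = 5$, the Gold function $f(x) = x^3$ (a standard example of an APN function), the field representation $\mathbb{F}_{2^5} = \mathbb{F}_2[x]/(x^5 + x^2 + 1)$ with $\alpha = x$, and $p = 8$ sampled rows; all of these choices, and all function names, are illustrative assumptions rather than the authors' setup. It builds the columns of $H_f$, checks by brute force that no 1 to 4 columns sum to zero (so the dual distance is at least 5), and then samples the Gram matrix (1).

```python
import itertools
import numpy as np

M = 5                  # work in GF(2^M); M = 5 is an illustrative choice
N = 2 ** M - 1         # code length n = 2^m - 1
POLY = 0b100101        # x^5 + x^2 + 1, irreducible over GF(2); primitive since 31 is prime

def gf_mul(a, b):
    """Multiply two elements of GF(2^M) stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:     # degree reached M: reduce modulo POLY
            a ^= POLY
    return r

def build_columns():
    """Columns (alpha^i, f(alpha^i)) of H_f, packed into 2M-bit integers."""
    cols, x = [], 1
    for _ in range(N):
        fx = gf_mul(gf_mul(x, x), x)      # assumed APN function f(x) = x^3
        cols.append(x | (fx << M))
        x = gf_mul(x, 2)                  # multiply by alpha (the class of x)
    return cols

def dual_distance_at_least_5(cols):
    """True if no 1..4 columns of H_f sum to zero over GF(2)."""
    for w in range(1, 5):
        for combo in itertools.combinations(cols, w):
            acc = 0
            for c in combo:
                acc ^= c
            if acc == 0:
                return False
    return True

def sample_gram(cols, p, rng):
    """Sample p codewords uniformly and return the Gram matrix (1)."""
    k = 2 * M
    H = np.array([[(c >> i) & 1 for c in cols] for i in range(k)], dtype=np.int64)
    msgs = rng.integers(0, 2, size=(p, k))    # uniform messages give uniform codewords
    codewords = (msgs @ H) % 2
    phi = 1.0 - 2.0 * codewords               # psi(c) = (-1)^c, entrywise
    X = phi / np.sqrt(N)
    return X @ X.T

cols = build_columns()
rng = np.random.default_rng(0)
p = 8
G = sample_gram(cols, p, rng)
eigs = np.linalg.eigvalsh(G)                  # eigenvalues defining mu_n in (2)
```

Since every entry of $X$ has modulus $n^{-1/2}$, the diagonal of the Gram matrix is identically 1, so the eigenvalues always sum to $p$. With $n = 31$ the spectrum is far too small to exhibit the limit law; larger $m$ would be needed for a meaningful comparison with $\varrho_{MP,y}$.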

For truly random matrices with i.i.d. entries, finding the rate of convergence has been a long-standing question, starting from [12, 4, 5] in the early 1990s. Great progress has been made in the last 10 years, culminating in the optimal rate of convergence in terms of the size of the matrix (see [11, 7, 15]). The major technique is the use of the Stieltjes transform. In this paper we also use this technique.

The convergence rate problem for the empirical spectral distribution of large sample covariance random matrices has been studied for example in [5, 8], and in particular an optimal rate of convergence was obtained in [8] under quite general conditions. However, despite our best effort, none of the techniques in [5] and [8] can be easily applied directly to our setting. Instead we use a combination of ideas from [5] and [8]. Moreover, it is not clear to us what the optimal rate of convergence is under the general condition of linear codes with dual distance 5. We hope to address this problem in the future.

The paper is organized as follows. In Section 2 (Preliminaries) we introduce the main tool, the Stieltjes transform, together with related formulas and lemmas which will play important roles in the proof of Theorem 1. In Section 3 we show how Theorem 1 can be derived directly from a major statement in terms of the Stieltjes transform (Theorem 4). While the argument is standard, it is quite technical and non-trivial. To streamline the idea of the proof, we put some of the arguments in Section 5 (Appendix). In Section 4 we give a detailed proof of Theorem 4.

## 2 Preliminaries

### 2.1 Stieltjes Transform

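In this section we recall some basic facts about the Stieltjes transform. Interested readers may refer to [6, Chapter B.2] for more details.

Let $F$ be an arbitrary real function with bounded variation, and $\mu$ be the corresponding (signed) measure. The Stieltjes transform of $F$ (or $\mu$) is defined by

$$
s(z) := \int_{-\infty}^{\infty}\frac{dF(x)}{x-z} = \int_{-\infty}^{\infty}\frac{\mu(dx)}{x-z},
$$

where $z$ is a complex variable outside the support of $F$ (or $\mu$). In particular $s(z)$ is well-defined for all $z \in \mathbb{C}^+ := \{z \in \mathbb{C} : \Im z > 0\}$, the upper half complex plane. Here $\Im z$ is the imaginary part of $z$.

It can be verified that $s(z) \in \mathbb{C}^+$ for all $z \in \mathbb{C}^+$ whenever $\mu$ is a nonzero positive measure. The complex variable $z$ is commonly written as $z = E + i\eta$ for $E, \eta \in \mathbb{R}$.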

The Stieltjes transform is useful because a function of bounded variation (or a signed measure) can be recovered from its Stieltjes transform via the inverse formula ([12, 4]):

$$
\mu((x_1, x_2]) = F(x_2) - F(x_1) = \lim_{\eta \downarrow 0}\frac{1}{\pi}\int_{x_1}^{x_2}\Im\, s(E + i\eta)\,dE.
$$

Here $\eta \downarrow 0$ means that the real number $\eta$ approaches zero from the right. Moreover, unlike the method of moments, the convergence of the Stieltjes transform is both necessary and sufficient for the convergence of the underlying distribution (see [6, Theorem B.9]).
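The inverse formula can be tried out numerically on a toy example. The sketch below assumes a discrete probability measure with three equally weighted atoms (an arbitrary choice made only for illustration), evaluates its Stieltjes transform slightly above the real axis, and integrates $\frac{1}{\pi}\Im\, s(E + i\eta)$ over an interval containing exactly one atom. The names `stieltjes` and `mass_by_inversion` are hypothetical helpers.

```python
import numpy as np

# Toy measure: three atoms of mass 1/3 each (illustrative assumption).
atoms = np.array([0.5, 1.0, 2.0])
weights = np.array([1 / 3, 1 / 3, 1 / 3])

def stieltjes(z):
    """s(z) = sum_j w_j / (x_j - z) for the discrete measure above."""
    z = np.asarray(z, dtype=complex)
    return np.sum(weights / (atoms - z[..., None]), axis=-1)

def mass_by_inversion(x1, x2, eta, grid=20_001):
    """Approximate mu((x1, x2]) by (1/pi) * integral of Im s(E + i*eta) dE."""
    E = np.linspace(x1, x2, grid)
    vals = stieltjes(E + 1j * eta).imag / np.pi
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(E)) / 2)

# The interval (0.75, 1.5] contains only the atom at 1.0, of mass 1/3.
approx = mass_by_inversion(0.75, 1.5, eta=0.01)
```

For a fixed $\eta > 0$ the result is only an approximation of the true mass, because each atom is smeared into a Lorentzian of width $\eta$; the error shrinks as $\eta \downarrow 0$, provided the integration grid is fine enough to resolve that width.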

### 2.2 Resolvent Identities and Formulas for Green function entries

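Let $X$ be a $p \times n$ matrix. Denote by $G$ the Green function of $XX^*$, that is,

$$
G := G(z) = (XX^* - zI)^{-1},
$$

where $z \in \mathbb{C}^+$ and $I$ is the $p \times p$ identity matrix.

Given a subset $T \subset \{1, \ldots, p\}$, let $X^{(T)}$ be the $(p - |T|) \times n$ matrix obtained from $X$ by deleting the rows indexed by $T$, the remaining rows keeping their original indices, so that $(X^{(T)})_{jk} = X_{jk}$ for $j \notin T$. In addition, let $G^{(T)}$ be the Green function of $X^{(T)}(X^{(T)})^*$. We write $R$ and $R^{(T)}$ as the Green functions of $X^*X$ and $(X^{(T)})^*X^{(T)}$ respectively, and abbreviate $(T\ell) := T \cup \{\ell\}$ and $(\ell) := \{\ell\}$. Then for $\ell \notin T$, we have [8, (3.8)]

$$
\frac{1}{G^{(T)}_{\ell\ell}} = -z - z\sum_{j,k} X_{\ell j}\, R^{(T\ell)}_{jk}\, \overline{X_{\ell k}}, \tag{6}
$$

where the indices $j, k$ vary in $\{1, \ldots, n\}$, and $R^{(T\ell)}_{jk}$ is the $(j,k)$-th entry of the matrix $R^{(T\ell)}$.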

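The two Green functions $G^{(T)}$ and $R^{(T)}$ are related by the following identity ([8, Lemma 3.9]):

$$
\mathrm{Tr}\, G^{(T)} - \mathrm{Tr}\, R^{(T)} = \frac{n - (p - |T|)}{z}. \tag{7}
$$

Here $|T|$ is the cardinality of the set $T$, and $\mathrm{Tr}\, A$ is the trace of the matrix $A$.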

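Moreover, we have the following bound, a consequence of eigenvalue interlacing ([8, Lemma 3.10]):

$$
\left|\mathrm{Tr}\, G^{(T)} - \mathrm{Tr}\, G\right| \le C\eta^{-1}, \tag{8}
$$

where $C$ is a constant depending on the set $T$ only, and also the Ward identity (see [8, (3.14)] or [7, (3.6)])

$$
\sum_{k}\left|R^{(T)}_{jk}\right|^2 = \eta^{-1}\,\Im\, R^{(T)}_{jj}. \tag{9}
$$

Note that here we have written $z = E + i\eta$ for $z \in \mathbb{C}^+$.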
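The identities (6), (7) and (9) are purely algebraic and can be checked numerically on any matrix. The sketch below uses a small complex Gaussian matrix (an assumption made only for the check; the paper's matrices come from codes), with rows indexed from 0 as is usual in Python. The helpers `green` and `minor` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 7
X = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2 * n)
z = 0.8 + 0.3j
eta = z.imag

def green(A, z):
    """Green function (A - z I)^{-1} of a square matrix A."""
    return np.linalg.inv(A - z * np.eye(A.shape[0]))

def minor(X, T):
    """X^{(T)}: the matrix X with the rows indexed by T deleted."""
    keep = [i for i in range(X.shape[0]) if i not in T]
    return X[keep, :], keep

T, ell = {0}, 2                                # 0-based indices, ell not in T
XT, keep = minor(X, T)
XTl, _ = minor(X, T | {ell})
G_T = green(XT @ XT.conj().T, z)               # G^{(T)}
R_T = green(XT.conj().T @ XT, z)               # R^{(T)}
R_Tl = green(XTl.conj().T @ XTl, z)            # R^{(T ell)}

# Identity (6): Schur complement formula for the diagonal entry of G^{(T)}.
pos = keep.index(ell)                          # position of row ell inside X^{(T)}
lhs6 = 1 / G_T[pos, pos]
rhs6 = -z - z * (X[ell] @ R_Tl @ X[ell].conj())

# Identity (7): the two traces differ through the extra zero eigenvalues.
lhs7 = np.trace(G_T) - np.trace(R_T)
rhs7 = (n - (p - len(T))) / z

# Ward identity (9), checked for every row j of R^{(T)}.
lhs9 = np.sum(np.abs(R_T) ** 2, axis=1)
rhs9 = np.diag(R_T).imag / eta
```

Here `lhs6`, `lhs7`, `lhs9` should agree with `rhs6`, `rhs7`, `rhs9` up to floating-point error.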

### 2.3 Stieltjes Transform of the Marchenko-Pastur Law

The Stieltjes transform $s_{MP,y}$ of the Marchenko-Pastur distribution given in (4) can be computed as (see [5])

$$
s_{MP,y}(z) = -\frac{y + z - 1 - \sqrt{(y+z-1)^2 - 4yz}}{2yz}. \tag{10}
$$

It is well-known that $s_{MP,y}(z)$ is the unique function that satisfies the equation for $u(z)$ in

$$
u(z) = \frac{1}{1 - y - z - yz\,u(z)} \tag{11}
$$

such that $\Im\, u(z) > 0$ whenever $\Im z > 0$.
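The characterization of $s_{MP,y}$ through (11) lends itself to a direct numerical check. In the sketch below, the square-root branch in (10) is handled by computing both roots of the quadratic $yz\,u^2 - (1-y-z)\,u + 1 = 0$ equivalent to (11) and keeping the one with the larger imaginary part, relying on the uniqueness statement above. The function names and the sample point $z = 1 + 0.5i$, $y = 0.5$ are illustrative assumptions.

```python
import numpy as np

def s_mp(z, y):
    """Root of (11) lying in the upper half plane, i.e. formula (10)."""
    z = complex(z)
    root = np.sqrt((y + z - 1) ** 2 - 4 * y * z)      # principal square root
    candidates = [(1 - y - z + sign * root) / (2 * y * z) for sign in (1, -1)]
    return max(candidates, key=lambda u: u.imag)      # branch with Im u > 0

def residual(u, z, y):
    """How far u is from satisfying the self-consistent equation (11)."""
    return abs(u - 1 / (1 - y - z - y * z * u))

y, z = 0.5, 1.0 + 0.5j
u = s_mp(z, y)
res = residual(u, z, y)
```

Selecting the root by its imaginary part avoids tracking the branch cut of the square root by hand, at the cost of evaluating both candidates.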

If a function $u(z)$ satisfies Equation (11) up to a small perturbation, we then expect that $u(z)$ should be quite close to $s_{MP,y}(z)$ as well. This is quantified by the following result. First, we define

$$
\kappa := \min\{|E - a|, |E - b|\}, \tag{12}
$$

where $a$ and $b$ are the constants given in (5), and for a fixed constant $\tau > 0$, we define

$$
S_\tau := \left\{z = E + i\eta : \kappa \le \tau^{-1},\ n^{-1/4+\tau} \le \eta \le \tau^{-1}\right\}. \tag{13}
$$

###### Lemma 2.

[8, Lemma 4.5] Suppose the function $\delta(z)$ satisfies:

1. for some fixed constant for all ;

2. is Lipschitz continuous with Lipschitz constant ;

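3. for each fixed $E$, the function $\eta \mapsto \delta(E + i\eta)$ is nonincreasing for $\eta > 0$.

Suppose $u(z)$ is the Stieltjes transform of a probability measure satisfying

$$
u(z) = \frac{1}{1 - y - z - yz\,u(z)} + \Delta(z) \tag{14}
$$

for some $\Delta(z)$.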

Fix and define , where is the real part of . Suppose that

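$$
|\Delta(w)| \le \delta(w), \qquad \forall\, w \in L(z) \cup \{z\}. \tag{15}
$$

Then we have

$$
\left|u(z) - s_{MP,y}(z)\right| \le \frac{C\delta(z)}{\sqrt{\kappa + \eta + \delta(z)}},
$$

where $\kappa$ is the $z$-dependent variable defined as in (12).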

### 2.4 Convergence of Stieltjes Transform in Probability

The following result is useful to bound the convergence rate of a random Stieltjes transform in probability.

###### Lemma 3.

Let $X$ be a $p \times n$ random matrix with independent rows, $z = E + i\eta \in \mathbb{C}^+$, and $s(z)$ be the Stieltjes transform of the empirical spectral measure of $XX^*$. Then

$$
\mathbb{P}\left(|s(z) - \mathbb{E}s(z)| \ge r\right) \le 2\exp\left(-\frac{n^2\eta^2 r^2}{8p}\right).
$$

###### Proof of Lemma 3.

Note that the $(j,k)$-th entry of $XX^*$ is simply the inner product of the $j$-th and $k$-th rows of $X$. Hence varying one row of $X$ only gives an additive perturbation of $XX^*$ of rank at most two. Applying the resolvent identity [7, (2.3)], we see that the Green function $G$ is also only affected by an additive perturbation by a matrix of rank at most two and operator norm at most $2\eta^{-1}$. Then the desired result follows directly by applying McDiarmid's Lemma [7, Lemma F.3].

For the purpose of this paper, we define an $n$-dependent event $\Xi$ to hold with high probability if for any $D > 0$, there is a quantity $n_0 = n_0(D)$ such that $\mathbb{P}(\Xi) \ge 1 - n^{-D}$ for any $n \ge n_0$.

## 3 Proof of Theorem 1

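From this section onwards, let $\mathcal{C}$ be a linear code of length $n$ over $\mathbb{F}_q$ with dual distance $d^\perp \ge 5$. Let $\psi$ be the standard additive character, extended to $\mathbb{F}_q^n$ component-wise, and let $\psi(\mathcal{C})$ be the image of $\mathcal{C}$.

Let $\Phi_n$ be a $p \times n$ random matrix whose rows are picked from $\psi(\mathcal{C})$ uniformly and independently. This makes $\psi(\mathcal{C})^p$ a probability space. Let $y = p/n \in (0,1)$ be fixed. Write $X = n^{-1/2}\Phi_n$ and $\mathcal{G}_n = XX^*$, the Gram matrix of $X$. Furthermore, let $\mu_n$ be the empirical spectral measure of $\mathcal{G}_n$ given by (2).

Denote by $s_{\mathcal{G}_n}(z)$ the Stieltjes transform of $\mu_n$, which is given by

$$
s_{\mathcal{G}_n}(z) = \frac{1}{p}\sum_{j=1}^{p}\frac{1}{\lambda_j - z} = \frac{1}{p}\mathrm{Tr}\, G,
$$

where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of the matrix $\mathcal{G}_n$, and $G$ is the Green function of $XX^*$, that is, $G = (XX^* - zI)^{-1}$. Note that in this setting this Stieltjes transform is itself a random variable.

Denote

$$
s_n(z) := \mathbb{E}\, s_{\mathcal{G}_n}(z) = \frac{1}{p}\mathbb{E}\,\mathrm{Tr}\, G. \tag{16}
$$

Here $\mathbb{E}$ is the expectation with respect to the probability space $\psi(\mathcal{C})^p$.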

### 3.1 An equation for $s_n(z)$

In the following result, we write $s_n(z)$ defined in (16) in the form of the equation (11) with a small perturbation.

###### Theorem 4.

For any $z \in S_\tau$,

$$
s_n(z) = \frac{1}{1 - y - z - yz\,s_n(z)} + \Delta(z),
$$

where $\Delta(z) = O(n^{-1}\eta^{-3})$.

We remark that Theorem 4 is a major technical result regarding the expected Stieltjes transform $s_n(z)$, from which Theorem 1 can be derived directly without reference to linear codes at all. The proof of Theorem 4 is, however, quite complicated and is directly related to properties of linear codes. To streamline the idea of the proof, here we assume Theorem 4 and sketch a proof of Theorem 1. The proof of Theorem 4 is postponed to Section 4.

### 3.2 Proof of Theorem 1

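Assuming Theorem 4, we can first estimate the term $|s_{\mathcal{G}_n}(z) - s_{MP,y}(z)|$, following ideas from [7] and [8].

###### Theorem 5.

Assume that Theorem 4 holds. Then for any fixed $z \in S_\tau$, we have

$$
\left|s_{\mathcal{G}_n}(z) - s_{MP,y}(z)\right| \le n^{\tau}\left(n^{-1/4} + n^{-1}\eta^{-7/2}\right)
$$

with high probability.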

###### Proof of Theorem 5.

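We can check that all the conditions of Lemma 2 are satisfied: first, by Theorem 4 we see that (14) holds for $u = s_n$; in addition, (15) holds for $\delta(z) := Cn^{-1}\eta^{-3}$, and this function is independent of $E$, nonincreasing in $\eta$ and Lipschitz continuous with the Lipschitz constant required in Lemma 2. Hence by Lemma 2, we have

$$
\left|s_n(z) - s_{MP,y}(z)\right| \le \frac{C\delta(z)}{\sqrt{\kappa + \eta + \delta(z)}}.
$$

Note that in $S_\tau$ we have $\sqrt{\kappa + \eta + \delta(z)} \ge \sqrt{\eta}$. Therefore we have

$$
\left|s_n(z) - s_{MP,y}(z)\right| = O\left(n^{-1}\eta^{-7/2}\right) \tag{17}
$$

for all $z \in S_\tau$.

Now Lemma 3, applied with $r = n^{\tau - 1/4}$ and using $\eta \ge n^{-1/4+\tau}$ and $p = yn$, implies that

$$
\mathbb{P}\left(\left|s_{\mathcal{G}_n}(z) - s_n(z)\right| > n^{\tau - 1/4}\right) \le 2\exp\left(-\frac{n\left(n^{\tau-1/4}\right)^4}{8y}\right) = 2\exp\left(-\frac{n^{4\tau}}{8y}\right) \le n^{-D}
$$

on $S_\tau$, for any $D > 0$ and large enough $n$. Combining this with (17) completes the proof of Theorem 5. ∎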
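The arithmetic in the last display can be traced with a few lines of Python. The sketch below evaluates the bound of Lemma 3 at the smallest admissible scale $\eta = r = n^{\tau - 1/4}$ and compares it with the closed form $2\exp(-n^{4\tau}/(8y))$. The numerical values of $n$, $p$ and $\tau$, and the function names, are illustrative assumptions.

```python
import math

def lemma3_bound(n, p, eta, r):
    """Right-hand side of Lemma 3: 2 * exp(-n^2 eta^2 r^2 / (8 p))."""
    return 2.0 * math.exp(-(n ** 2) * (eta ** 2) * (r ** 2) / (8.0 * p))

def smallest_scale_bound(n, y, tau):
    """Closed form 2 * exp(-n^(4 tau) / (8 y)) from the proof of Theorem 5."""
    return 2.0 * math.exp(-(n ** (4 * tau)) / (8.0 * y))

n, p, tau = 10_000, 5_000, 0.1        # illustrative sizes, y = p / n = 0.5
y = p / n
scale = n ** (tau - 0.25)             # eta = r = n^(tau - 1/4)
lhs = lemma3_bound(n, p, eta=scale, r=scale)
rhs = smallest_scale_bound(n, y, tau)
```

Because the exponent grows like a positive power of $n$, the bound eventually falls below $n^{-D}$ for every fixed $D$; larger values of $\eta$ only make `lemma3_bound` smaller.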

Finally, armed with Theorem 5, we can derive Theorem 1 from a standard application of the Helffer-Sjöstrand formula in random matrix theory. The argument is essentially complex analysis. Interested readers may refer to Section 5 Appendix for details.

## 4 Proof of Theorem 4

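Now we give a detailed proof of Theorem 4, in which the condition that $d^\perp \ge 5$ becomes essential.

### 4.1 Linear codes with $d^\perp \ge 5$

Recall the notation from the beginning of Section 3. Let $\mathcal{C}$ be a linear code of length $n$ over $\mathbb{F}_q$. First is a simple orthogonality result for character sums over $\mathcal{C}$.

###### Lemma 6.

Let $\mathbf{a} \in \mathbb{F}_q^n$. Then

$$
\frac{1}{\#\mathcal{C}}\sum_{\mathbf{c} \in \mathcal{C}}\psi(\mathbf{a} \cdot \mathbf{c}) = \begin{cases} 1 & (\mathbf{a} \in \mathcal{C}^\perp), \\ 0 & (\mathbf{a} \notin \mathcal{C}^\perp). \end{cases}
$$

Here $\mathbf{a} \cdot \mathbf{c}$ is the usual inner product between the vectors $\mathbf{a}$ and $\mathbf{c}$.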
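Lemma 6 can be verified exhaustively for a small binary code. The sketch below assumes a $[7,4]$ binary code given by a systematic generator matrix chosen purely for illustration, uses $\psi(a) = (-1)^a$, and compares the character average with membership in the dual code for every $\mathbf{a} \in \mathbb{F}_2^7$. The names `character_average` and `in_dual` are hypothetical helpers.

```python
import itertools
import numpy as np

# Generator matrix of a small binary [7, 4] code (illustrative assumption).
Ggen = np.array([[1, 0, 0, 0, 1, 1, 0],
                 [0, 1, 0, 0, 1, 0, 1],
                 [0, 0, 1, 0, 0, 1, 1],
                 [0, 0, 0, 1, 1, 1, 1]])
k, n = Ggen.shape

# Enumerate all 2^k codewords as rows of a 0/1 matrix.
codewords = np.array([(np.array(msg) @ Ggen) % 2
                      for msg in itertools.product([0, 1], repeat=k)])

def character_average(a):
    """(1 / #C) * sum over c in C of psi(a . c), with psi(x) = (-1)^x."""
    return np.mean((-1.0) ** ((codewords @ a) % 2))

def in_dual(a):
    """a lies in the dual code iff it is orthogonal to every generator row."""
    return not np.any((Ggen @ a) % 2)

# Compare both sides of Lemma 6 for every a in F_2^n.
results = []
for bits in itertools.product([0, 1], repeat=n):
    a = np.array(bits)
    results.append((character_average(a), 1.0 if in_dual(a) else 0.0))
```

Each pair in `results` holds the left-hand and right-hand side of the lemma for one vector $\mathbf{a}$, and the two entries should coincide.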

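As in Section 3, let $\Phi_n$ be a $p \times n$ random matrix whose rows are picked from $\psi(\mathcal{C})$ uniformly and independently and let $X = n^{-1/2}\Phi_n$. Denote by $X_{jk}$ the $(j,k)$-th entry of $X$.

###### Corollary 7.

Assume $d^\perp \ge 5$. Then for any $\ell \in \{1, \ldots, p\}$,

(a) $\mathbb{E}\left(X_{\ell j}\overline{X_{\ell k}}\right) = 0$ if $j \ne k$;

(b) $\mathbb{E}\left(X_{\ell j}X_{\ell t}\overline{X_{\ell k}}\,\overline{X_{\ell s}}\right) = 0$ if the indices $j, t, k, s$ do not come in pairs. If the indices come in pairs, then $\left|\mathbb{E}\left(X_{\ell j}X_{\ell t}\overline{X_{\ell k}}\,\overline{X_{\ell s}}\right)\right| \le n^{-2}$.

Here $\mathbb{E}$ is the expectation with respect to the probability space $\psi(\mathcal{C})^p$.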

###### Proof of Corollary 7.

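For simplicity, denote by $\mathbf{e}_j \in \mathbb{F}_q^n$ the vector with a $1$ at the $j$-th entry and $0$ at all other places.

(a) It is easy to see that

$$
\mathbb{E}\left(X_{\ell j}\overline{X_{\ell k}}\right) = n^{-1}(\#\mathcal{C})^{-1}\sum_{\mathbf{c} \in \mathcal{C}}\psi(c_j - c_k) = n^{-1}(\#\mathcal{C})^{-1}\sum_{\mathbf{c} \in \mathcal{C}}\psi\left((\mathbf{e}_j - \mathbf{e}_k) \cdot \mathbf{c}\right).
$$

As $j \ne k$ and $d^\perp \ge 5$, the vector $\mathbf{e}_j - \mathbf{e}_k$ is nonzero of Hamming weight $2$, so $\mathbf{e}_j - \mathbf{e}_k \notin \mathcal{C}^\perp$, and the desired result follows directly from Lemma 6.

(b) Again we can check that

$$
\mathbb{E}\left(X_{\ell j}X_{\ell t}\overline{X_{\ell k}}\,\overline{X_{\ell s}}\right) = n^{-2}(\#\mathcal{C})^{-1}\sum_{\mathbf{c} \in \mathcal{C}}\psi\left((\mathbf{e}_j + \mathbf{e}_t - \mathbf{e}_k - \mathbf{e}_s) \cdot \mathbf{c}\right).
$$

If the indices do not come in pairs, then $\mathbf{e}_j + \mathbf{e}_t - \mathbf{e}_k - \mathbf{e}_s$ is nonzero of Hamming weight at most $4$; since $d^\perp \ge 5$, we have $\mathbf{e}_j + \mathbf{e}_t - \mathbf{e}_k - \mathbf{e}_s \notin \mathcal{C}^\perp$, and the result is zero by Lemma 6. If the indices do come in pairs, noting that $|X_{\ell j}| = n^{-1/2}$ for every entry, we also obtain the desired estimate. This completes the proof of Corollary 7. ∎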

### 4.2 Resolvent identities

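We start with the resolvent identity (6) for $T = \emptyset$. Since $|X_{\ell j}|^2 = n^{-1}$, the sum on the right of (6) can be written as

$$
z\sum_{j,k}X_{\ell j}R^{(\ell)}_{jk}\overline{X_{\ell k}} = \frac{z}{n}\sum_{j}R^{(\ell)}_{jj} + Z_\ell,
$$

where

$$
Z_\ell = z\sum_{j \ne k}X_{\ell j}R^{(\ell)}_{jk}\overline{X_{\ell k}}. \tag{18}
$$

Using (6) and (7) we have

$$
\begin{aligned}
\frac{1}{G_{\ell\ell}} &= -z - \frac{z}{n}\mathrm{Tr}\, R^{(\ell)} - Z_\ell \\
&= -z - \frac{z}{n}\left(\mathrm{Tr}\, G^{(\ell)} - \frac{n - p + 1}{z}\right) - Z_\ell \\
&= 1 - y - z - yz\,s_n(z) + Y_\ell,
\end{aligned} \tag{19}
$$

where

$$
\begin{aligned}
Y_\ell &= yz\,s_n(z) - \frac{z}{n}\mathrm{Tr}\, G^{(\ell)} + \frac{1}{n} - Z_\ell \\
&= \frac{z}{n}\left(\mathbb{E}\,\mathrm{Tr}\, G - \mathrm{Tr}\, G^{(\ell)}\right) + \frac{1}{n} - Z_\ell.
\end{aligned} \tag{20}
$$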

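### 4.3 Estimates of $Z_\ell$ and $Y_\ell$

We now give estimates on the ($z$-dependent) random variables $Z_\ell$ and $Y_\ell$. First, given $T \subset \{1, \ldots, p\}$, we denote by $\mathbb{E}^{(T)}$ the conditional expectation given $X^{(T)}$, that is, the expectation taken over the rows of $X$ indexed by $T$ only, and we write $\mathbb{E}^{(\ell)} := \mathbb{E}^{(\{\ell\})}$.

###### Lemma 8.

For any $z \in S_\tau$, we have

(a) $\mathbb{E}^{(\ell)}Z_\ell = \mathbb{E}Z_\ell = 0$;

(b) $\mathbb{E}|Z_\ell|^2 = O(n^{-1}\eta^{-2})$.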

###### Proof of Lemma 8.

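(a) From the definition of $Z_\ell$ in (18), we have

$$
\mathbb{E}^{(\ell)}Z_\ell = z\sum_{j \ne k}R^{(\ell)}_{jk}\,\mathbb{E}\left(X_{\ell j}\overline{X_{\ell k}}\right) = 0,
$$

where the first equality follows from the fact that the rows of $X$ are independent, and the second equality follows from statement (a) of Corollary 7. The proof of the result on $\mathbb{E}Z_\ell$ is similar, replacing $\mathbb{E}^{(\ell)}$ with $\mathbb{E}$.

(b) Expanding $|Z_\ell|^2$ and taking the expectation inside, noting that the rows of $X$ are independent, we have

$$
\mathbb{E}|Z_\ell|^2 = |z|^2\,\mathbb{E}\left|\sum_{j \ne k}X_{\ell j}R^{(\ell)}_{jk}\overline{X_{\ell k}}\right|^2 = |z|^2\sum_{\substack{j \ne k \\ s \ne t}}\mathbb{E}\left(R^{(\ell)}_{jk}\overline{R^{(\ell)}_{st}}\right)\mathbb{E}\left(X_{\ell j}X_{\ell t}\overline{X_{\ell k}}\,\overline{X_{\ell s}}\right).
$$

By statement (b) of Corollary 7, only the terms whose indices come in pairs contribute. Using this and the Ward identity (9), together with the trivial bound $\Im\, R^{(\ell)}_{jj} \le \eta^{-1}$ and the boundedness of $|z|$ on $S_\tau$, we obtain

$$
\mathbb{E}|Z_\ell|^2 \le \frac{C|z|^2}{n^2}\sum_{j,k}\mathbb{E}\left|R^{(\ell)}_{jk}\right|^2 = \frac{C|z|^2}{n^2\eta}\sum_{j}\mathbb{E}\,\Im\, R^{(\ell)}_{jj} \le \frac{C}{n\eta^2}.
$$

Here $C$ is a generic constant which may be different in each occurrence. ∎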

###### Lemma 9.

For any $z \in S_\tau$, we have

(a) $|\mathbb{E}Y_\ell| = O(n^{-1}\eta^{-1})$;

(b) $\mathbb{E}|Y_\ell|^2 = O(n^{-1}\eta^{-2})$.

###### Proof of Lemma 9.

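(a) By (20) we get

$$
\mathbb{E}Y_\ell = \frac{z}{n}\mathbb{E}\left(\mathrm{Tr}\, G - \mathrm{Tr}\, G^{(\ell)}\right) + \frac{1}{n} - \mathbb{E}Z_\ell = \frac{z}{n}\mathbb{E}\left(\mathrm{Tr}\, G - \mathrm{Tr}\, G^{(\ell)}\right) + \frac{1}{n},
$$

where the second equality follows from (a) of Lemma 8. Using (8) we easily obtain

$$
|\mathbb{E}Y_\ell| \le \frac{C|z|}{n\eta} \le \frac{C}{n\eta}.
$$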

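(b) We split $\mathbb{E}|Y_\ell|^2$ as

$$
\mathbb{E}|Y_\ell|^2 = \mathbb{E}|Y_\ell - \mathbb{E}Y_\ell|^2 + |\mathbb{E}Y_\ell|^2 = V_1 + V_2 + |\mathbb{E}Y_\ell|^2, \tag{21}
$$

where

$$
V_1 := \mathbb{E}\left|Y_\ell - \mathbb{E}^{(\ell)}Y_\ell\right|^2, \qquad V_2 := \mathbb{E}\left|\mathbb{E}^{(\ell)}Y_\ell - \mathbb{E}Y_\ell\right|^2.
$$

We first estimate $V_1$. By the definition of $Y_\ell$ in (20) and applying (a) of Lemma 8, we see that

$$
Y_\ell - \mathbb{E}^{(\ell)}Y_\ell = -Z_\ell + \mathbb{E}^{(\ell)}Z_\ell = -Z_\ell.
$$

Then by (b) of Lemma 8 we obtain

$$
V_1 = \mathbb{E}|Z_\ell|^2 = O(n^{-1}\eta^{-2}). \tag{22}
$$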

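Next we estimate $V_2$. Again by (20) and Lemma 8, we have

$$
\mathbb{E}^{(\ell)}Y_\ell - \mathbb{E}Y_\ell = -\frac{z}{n}\left(\mathrm{Tr}\, G^{(\ell)} - \mathbb{E}\,\mathrm{Tr}\, G^{(\ell)}\right) - \left(\mathbb{E}^{(\ell)}Z_\ell - \mathbb{E}Z_\ell\right) = -\frac{z}{n}\left(\mathrm{Tr}\, G^{(\ell)} - \mathbb{E}\,\mathrm{Tr}\, G^{(\ell)}\right).
$$

Hence

$$
V_2 = \frac{|z|^2}{n^2}\mathbb{E}\left|\mathrm{Tr}\, G^{(\ell)} - \mathbb{E}\,\mathrm{Tr}\, G^{(\ell)}\right|^2 = \frac{|z|^2}{n^2}\sum_{m \ne \ell}\mathbb{E}\left|\mathbb{E}^{(T_{m-1})}\mathrm{Tr}\, G^{(\ell)} - \mathbb{E}^{(T_m)}\mathrm{Tr}\, G^{(\ell)}\right|^2, \tag{23}
$$

where $T_0 := \emptyset$ and $T_m := \{1, \ldots, m\}$ for $1 \le m \le p$.

For $m \ne \ell$, denote $\gamma_m := \mathbb{E}^{(T_{m-1})}\mathrm{Tr}\, G^{(\ell)} - \mathbb{E}^{(T_m)}\mathrm{Tr}\, G^{(\ell)}$ and $\sigma_m := \mathrm{Tr}\, G^{(\ell)} - \mathrm{Tr}\, G^{(\ell m)}$. Since $G^{(\ell m)}$ does not depend on the $m$-th row of $X$, it is easy to check that

$$
\gamma_m = \mathbb{E}^{(T_{m-1})}\sigma_m - \mathbb{E}^{(T_m)}\sigma_m.
$$

Thus by (8) we have $|\gamma_m| \le C\eta^{-1}$.

Putting this into (23) yields

$$
V_2 \le \frac{C|z|^2}{n\eta^2} \le \frac{C}{n\eta^2}.
$$

Plugging the estimates of $|\mathbb{E}Y_\ell|$ in statement (a), $V_1$ in (22) and $V_2$ above into the equation (21), we obtain the desired estimate of $\mathbb{E}|Y_\ell|^2$. This finishes the proof of Lemma 9. ∎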

We can now complete the proof of Theorem 4.

###### Proof of Theorem 4.

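Taking reciprocals and then expectations on both sides of (19), we get

$$
\mathbb{E}G_{\ell\ell} = \mathbb{E}\frac{1}{\alpha_n + Y_\ell} = \frac{1}{\alpha_n} + A_\ell = \frac{1}{\alpha_n + \Delta_\ell}, \tag{24}
$$

where

$$
\alpha_n = 1 - y - z - yz\,s_n(z),
$$

$$
A_\ell = \mathbb{E}\frac{1}{\alpha_n + Y_\ell} - \frac{1}{\alpha_n} = -\frac{1}{\alpha_n^2}\mathbb{E}Y_\ell + \frac{1}{\alpha_n^2}\mathbb{E}\frac{Y_\ell^2}{\alpha_n + Y_\ell}, \tag{25}
$$

and

$$
\Delta_\ell = \left(\frac{1}{\alpha_n} + A_\ell\right)^{-1} - \alpha_n = -\frac{\alpha_n^2 A_\ell}{1 + \alpha_n A_\ell}. \tag{26}
$$

Multiplying both sides of (25) by $\alpha_n^2$ and using the estimate $|\alpha_n + Y_\ell|^{-1} = |G_{\ell\ell}| \le \eta^{-1}$ (see [5]), we obtain

$$
\left|\alpha_n^2 A_\ell\right| = \left|-\mathbb{E}Y_\ell + \mathbb{E}\frac{Y_\ell^2}{\alpha_n + Y_\ell}\right| \le |\mathbb{E}Y_\ell| + \frac{1}{\eta}\mathbb{E}|Y_\ell|^2. \tag{27}
$$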

Using the fact that from [5] and Lemma 9 we obtain

$$
|\Delta_\ell| \le \frac{C}{n\eta^3}
$$

for all $1 \le \ell \le p$.

Then the theorem follows directly from summing both sides of (24) over all $\ell$ and then dividing both sides by $p$. ∎

## 5 Appendix

In this section, we use the Helffer-Sjöstrand formula to prove Theorem 1 from Theorem 5. This is a standard procedure well-known in random matrix theory. We follow the idea based on [7, Appendix C].

First we define the signed measure $\hat{\mu}_n$ and its Stieltjes transform $\hat{s}_n$ by

$$
\hat{\mu}_n := \mu_n - \varrho_{MP,y}, \qquad \hat{s}_n(z) := \int\frac{\hat{\mu}_n(dx)}{x - z} = s_{\mathcal{G}_n}(z) - s_{MP,y}(z).
$$

Now fix and define . For any interval , where and are constants defined in (5), we choose a smoothed indicator function satisfying for , for , and . These imply that the supports of and have Lebesgue measure bounded by . In addition, choose a smooth even cutoff function with for for and .

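Then by the Helffer-Sjöstrand formula, we get

$$
\int f(\lambda)\,\hat{\mu}_n(d\lambda) = \frac{1}{2\pi}\iint(\partial_u + i\partial_v)\left[\left(f(u) + ivf'(u)\right)\chi(v)\right]\hat{s}_n(u + iv)\,dv\,du.
$$

As the left-hand side is real, we can write it as

$$
\begin{aligned}
\int f(\lambda)\,\hat{\mu}_n(d\lambda) = {} & -\frac{1}{2\pi}\iint_{|v| \le \tilde{\eta}} f''(u)\chi(v)\,v\,\Im\,\hat{s}_n(u + iv)\,dv\,du && (28) \\
& -\frac{1}{2\pi}\iint_{|v| > \tilde{\eta}} f''(u)\chi(v)\,v\,\Im\,\hat{s}_n(u + iv)\,dv\,du && (29) \\
& +\frac{i}{2\pi}\iint\left(f(u) + ivf'(u)\right)\chi'(v)\,\hat{s}_n(u + iv)\,dv\,du. && (30)
\end{aligned}
$$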

First, by the trivial identity and the fact that is Lipschitz continuous on the compact set , we can easily extend Theorem 5 as follows:

###### Lemma 10.

For any fixed $\varepsilon > 0$, we have, with high probability,

$$
\left|\hat{s}_n(u + iv)\right| \le n^{\varepsilon/2}\left(n^{-1/4} + n^{-1}|v|^{-7/2}\right),
$$

for all such that and .

We may now estimate the three terms appearing in (28)-(30). First, for the term in (30), by using the fact that is even with support in , we have

$$
\left|\iint\left(f(u) + ivf'(u)\right)\chi'(v)\,\hat{s}_n(u + iv)\,dv\,du\right| \le C\tilde{\eta} \tag{31}
$$

with high probability.

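We next estimate the term in (28). Since $|v|$ is small, we cannot apply Lemma 10 directly. However it can be proved that for all $u \in \mathbb{R}$, the function $v \mapsto v\,\Im\, s_{\mathcal{G}_n}(u + iv)$ is nondecreasing for $v > 0$. This implies, for $0 < v \le \tilde{\eta}$,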

 vI^sn(u+iv)≤vIsGn(u+iv)≤˜ηIsGn