1 Introduction
Random matrix theory is the study of matrices whose entries are random variables. Of particular interest are the eigenvalue statistics of random matrices, such as the empirical spectral measure. The subject has been broadly investigated in a wide variety of areas, including statistics [19], number theory [13], economics [14], theoretical physics [18] and communication theory [17]. Most of the matrix models in the literature are random matrices with independent entries. In a recent series of works (initiated in [2] and developed further in [1, 3, 20]), the authors considered a class of sample-covariance-type matrices formed randomly from linear codes over a finite field, and proved that if the Hamming distance of the dual code is at least 5, then as the length of the codes goes to infinity, the empirical spectral distribution of the random matrices obtained in this way converges to the well-known Marchenko–Pastur (MP) law. Since truly random matrices (i.e. random matrices with i.i.d. entries) of large size satisfy this property, this can be interpreted as saying that sequences from linear codes of dual distance at least 5 behave like random sequences among themselves. This is a new pseudorandomness test for sequences and is called a "group randomness" property [1]; it may have many potential applications. It is also interesting to note that the condition that the dual distance is at least 5 is optimal, in the sense that binary first-order Reed–Muller codes, which have dual distance 4, do not satisfy this property (see [1, 3]).
How fast does the empirical spectral distribution converge to the MP law? This question is interesting in itself and important in applications, as one may wish to use linear codes of suitable length to generate pseudorandom matrices. Along with proving the convergence in expectation, the authors in [20] obtained an explicit rate of convergence in terms of the length $n$ of the code. That rate is quite unsatisfactory, as numerical data show clearly that the convergence is rather fast with respect to $n$. In this paper, we prove a substantially faster rate of convergence in probability (see Theorem 1), which considerably improves the previous result.
To introduce our main result, we need some notation.
Let $\mathcal{C}$ be a linear code of length $n$ over the finite field $\mathbb{F}_q$ of order $q$, where $q$ is a prime power. The most interesting case is that of binary linear codes, corresponding to $q = 2$. The dual code $\mathcal{C}^\perp$ consists of the tuples in $\mathbb{F}_q^n$ which are orthogonal to all codewords of $\mathcal{C}$ under the standard inner product; $\mathcal{C}^\perp$ is also a linear code. Denote by $d^\perp(\mathcal{C})$ the Hamming distance of $\mathcal{C}^\perp$. It is called the dual distance of $\mathcal{C}$.
Let $\psi: \mathbb{F}_q \to \mathbb{C}^*$ be the standard additive character. To be more precise, if $\mathbb{F}_q$ has characteristic $r$, which is a prime number, then $\psi$ is given by $\psi(x) = \exp\left(2\pi i \operatorname{Tr}(x)/r\right)$, where $\operatorname{Tr}$ is the absolute trace mapping from $\mathbb{F}_q$ to $\mathbb{F}_r$. In particular, if $q = 2$, then the map is defined as $\psi(x) = (-1)^x$. We extend $\psi$ componentwise to $\mathbb{F}_q^n$ and obtain the map $\psi: \mathbb{F}_q^n \to (\mathbb{C}^*)^n$. Denote $\psi(\mathcal{C}) = \{\psi(c) : c \in \mathcal{C}\}$.
Denote by $\Phi$ a $p \times n$ matrix whose rows are chosen from $\psi(\mathcal{C})$ uniformly and independently. This makes the set $\Omega = \psi(\mathcal{C})^p$ a probability space with the uniform probability.
Let $\mathcal{G}_n$ be the Gram matrix of $\Phi$, that is,
(1) $\mathcal{G}_n = \frac{1}{n} \Phi \Phi^*,$
where $\Phi^*$ means the conjugate transpose of the matrix $\Phi$. Let $\mu_{\mathcal{G}_n}$ be the empirical spectral measure of $\mathcal{G}_n$, that is,
(2) $\mu_{\mathcal{G}_n} = \frac{1}{p} \sum_{j=1}^{p} \delta_{\lambda_j},$
where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\mathcal{G}_n$ and $\delta_x$ is the Dirac measure at the point $x$. Note that $\mu_{\mathcal{G}_n}$ is a random measure, that is, for any interval $I \subset \mathbb{R}$, the value $\mu_{\mathcal{G}_n}(I)$ is a random variable with respect to the probability space $\Omega$. Our main result is as follows.
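The objects just defined are easy to set up numerically. The following sketch is our illustration only: it takes the trivial choice $\mathcal{C} = \mathbb{F}_2^n$ (so a uniform codeword is just a uniform binary word and $\psi(x) = (-1)^x$ produces i.i.d. $\pm 1$ rows), and the parameters `n`, `p` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 500   # code length n and number of sampled rows p, so y = p/n = 0.25

# Illustration only: take C = F_2^n, so a uniform codeword is a uniform binary
# vector and psi(x) = (-1)^x turns it into a +-1 row of Phi.
Phi = (-1.0) ** rng.integers(0, 2, size=(p, n))

G = Phi @ Phi.T / n                  # Gram matrix (1/n) * Phi * Phi^T
eigs = np.linalg.eigvalsh(G)         # its p real eigenvalues

def mu(I):
    """Empirical spectral measure of an interval I = (a, b)."""
    a, b = I
    return np.mean((eigs >= a) & (eigs <= b))

# For y = 0.25 the MP law is supported on [(1-sqrt(y))^2, (1+sqrt(y))^2] = [0.25, 2.25].
print(mu((0.2, 2.3)))   # close to 1
```

Plotting a histogram of `eigs` against the MP density already shows close agreement at these sizes; the theorem below quantifies this.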
Theorem 1.
Assume that $y = p/n \in (0, 1)$ is fixed. If $d^\perp(\mathcal{C}) \geq 5$, then
(3) 
uniformly for all intervals $I \subset \mathbb{R}$. Here $\mu_{\mathrm{MP}}$ is the probability measure of the Marchenko–Pastur law, whose density function is given by
(4) $\rho_{\mathrm{MP}}(x) = \frac{1}{2\pi x y} \sqrt{(b - x)(x - a)}\, \mathbf{1}_{[a,b]}(x),$
where the constants $a$ and $b$ are defined as
(5) $a = (1 - \sqrt{y})^2, \qquad b = (1 + \sqrt{y})^2,$
and $\mathbf{1}_{[a,b]}$ is the indicator function of the interval $[a, b]$.
The symbol $\prec$ in (3) is a standard notation for "stochastic domination" in random matrix theory (see [7] for details). Here $X_n \prec Y_n$ means that for any (small) $\epsilon > 0$ and any (large) $D > 0$, there is a quantity $n_0(\epsilon, D)$ such that whenever $n \geq n_0(\epsilon, D)$, we have
$$\sup \mathbb{P}\left( |X_n| > n^{\epsilon}\, Y_n \right) \leq n^{-D},$$
where $\mathbb{P}$ is the probability with respect to $\Omega$ and the supremum is taken over all intervals $I \subset \mathbb{R}$ and all linear codes $\mathcal{C}$ of length $n$ over $\mathbb{F}_q$ with $d^\perp(\mathcal{C}) \geq 5$.
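For reference, the MP density (4)–(5) is easy to evaluate numerically. The following self-contained sketch (the function name `mp_density` is ours, not from the paper) also checks that the density integrates to 1 when $0 < y < 1$, so there is no atom at the origin.

```python
import numpy as np

def mp_density(x, y):
    """Marchenko-Pastur density with ratio 0 < y < 1, as in (4)-(5)."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2 * np.pi * y * x[inside])
    return out

y = 0.25
xs = np.linspace(0.0, 3.0, 300001)
h = xs[1] - xs[0]
mass = np.sum(mp_density(xs, y)) * h    # Riemann sum of the density over [0, 3]
print(round(mass, 3))   # 1.0
```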
For application purposes, in view of Theorem 1, binary linear codes of dual distance 5 with large length and small dimension are desirable, as they can be used to generate pseudorandom matrices efficiently. Here we mention two constructions of binary linear codes with parameters $[2^m - 1, 2m]$ and dual distance 5. The first family is the dual of the primitive double-error-correcting BCH codes ([10]). The second family of such codes, which includes the well-known Gold codes, can be constructed as follows. Let $f: \mathbb{F}_{2^m} \to \mathbb{F}_{2^m}$ be a function such that $f(0) = 0$. Let $n = 2^m - 1$ and let $\alpha$ be a primitive element of $\mathbb{F}_{2^m}$. Define a matrix
$$G_f = \begin{pmatrix} \alpha & \alpha^2 & \cdots & \alpha^n \\ f(\alpha) & f(\alpha^2) & \cdots & f(\alpha^n) \end{pmatrix}.$$
Given a basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$, each element of $\mathbb{F}_{2^m}$ can be identified with an $m \times 1$ column vector in $\mathbb{F}_2^m$; hence the above $G_f$ can be considered as a binary matrix of size $2m \times n$. Denote by $\mathcal{C}_f$ the binary linear code obtained from $G_f$ as a generator matrix. Note that $\mathcal{C}_f$ has length $2^m - 1$ and dimension $2m$. It is known that the dual distance of $\mathcal{C}_f$ is 5 if and only if $f$ is an almost perfect nonlinear (APN) function [9, 16]. Since there are many APN functions when $m$ is odd, this provides a general construction of binary linear codes of dual distance 5 which may be of interest for applications.
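This construction can be made concrete in a few lines. The sketch below is our illustration (not code from the paper): it takes $m = 5$, an assumed primitive polynomial $x^5 + x^2 + 1$ for $\mathbb{F}_{2^5}$, and the Gold function $f(x) = x^3$, which is APN, then builds the $2m \times n$ binary matrix $G_f$ (stored column by column) and checks that the resulting code has length $31$ and dimension $10 = 2m$.

```python
M = 5                # work in F_{2^5}; n = 2^5 - 1 = 31
POLY = 0b100101      # assumed primitive polynomial x^5 + x^2 + 1
n = 2 ** M - 1

def gf_mul(a, b):
    """Multiplication in F_{2^M}, elements encoded as M-bit integers."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def f(v):                        # the Gold APN function f(x) = x^3
    return gf_mul(v, gf_mul(v, v))

def bits(v):                     # identify a field element with an M-bit column
    return [(v >> j) & 1 for j in range(M)]

# columns of G_f are (alpha^i, f(alpha^i)) for i = 1..n, written over F_2
cols, x = [], 1
for _ in range(n):
    x = gf_mul(x, 0b10)          # multiply by alpha (the field element "x")
    cols.append(bits(x) + bits(f(x)))

# dimension of C_f = rank of G_f over F_2 (columns packed into 2M-bit integers)
vecs = [int("".join(map(str, c)), 2) for c in cols]
rank = 0
for bit in reversed(range(2 * M)):
    pivot = next((v for v in vecs if (v >> bit) & 1), 0)
    if pivot:
        vecs = [v ^ pivot if (v >> bit) & 1 else v for v in vecs]
        rank += 1
print(n, rank)   # 31 10
```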
For truly random matrices with i.i.d. entries, finding the rate of convergence has been a long-standing question, starting from [12, 4, 5] in the early 1990s. Great progress has been made in the last 10 years, culminating in the optimal rate of convergence of order $N^{-1}$ (up to factors of $N^{\epsilon}$), where $N$ is the size of the matrix (see [11, 7, 15]). The major technique is the use of the Stieltjes transform, which we also employ in this paper.
The convergence rate problem for the empirical spectral distributions of large sample covariance random matrices has been studied, for example, in [5, 8]; in particular, an optimal rate of convergence was obtained in [8] under quite general conditions. However, despite our best efforts, none of the techniques in [5] and [8] can be applied directly to our setting; instead we use a combination of ideas from [5] and [8]. Moreover, it is not clear to us what the optimal rate of convergence is under the general condition of linear codes with dual distance 5. We hope to address this problem in the future.
The paper is organized as follows. In Section 2 (Preliminaries) we introduce the main tool, the Stieltjes transform, together with related formulas and lemmas which play important roles in the proof of Theorem 1. In Section 3 we show how Theorem 1 can be derived directly from a major statement in terms of the Stieltjes transform (Theorem 4); while the argument is standard, it is quite technical and nontrivial, so to streamline the idea of the proof we defer some of the arguments to the Appendix (Section 5). In Section 4 we give a detailed proof of Theorem 4.
2 Preliminaries
2.1 Stieltjes Transform
In this section we recall some basic facts about the Stieltjes transform. Interested readers may refer to [6, Chapter B.2] for more details.
Let $F$ be an arbitrary real function of bounded variation, and let $\mu$ be the corresponding (signed) measure. The Stieltjes transform of $F$ (or of $\mu$) is defined by
$$m_{\mu}(z) = \int_{\mathbb{R}} \frac{1}{x - z}\, d\mu(x),$$
where $z$ is a complex variable outside the support of $F$ (or $\mu$). In particular, $m_{\mu}(z)$ is well-defined for all $z \in \mathbb{C}^+ = \{z \in \mathbb{C} : \operatorname{Im} z > 0\}$, the upper half complex plane. Here $\operatorname{Im} z$ is the imaginary part of $z$.
When $\mu$ is a probability measure, it can be verified that $\operatorname{Im} m_{\mu}(z) > 0$ for all $z \in \mathbb{C}^+$. The complex variable is commonly written as $z = E + i\eta$ for $E \in \mathbb{R}$ and $\eta > 0$.
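These definitions are easy to test numerically. The sketch below (our illustration) computes the Stieltjes transform of a small discrete measure, confirms $\operatorname{Im} m_{\mu}(z) > 0$ on $\mathbb{C}^+$, and previews the inversion formula recalled next by recovering the mass of an interval from $\operatorname{Im} m_{\mu}$ at a small $\eta$.

```python
import numpy as np

pts = np.array([0.3, 0.7, 1.1, 1.9])    # mu = average of four Dirac masses

def m_mu(z):
    """Stieltjes transform of mu at a complex point z."""
    return np.mean(1.0 / (pts - z))

z = 1.0 + 0.5j                          # z = E + i*eta in the upper half plane
assert m_mu(z).imag > 0                 # Im m_mu(z) > 0 on C^+

# recover mu([0.5, 1.5]) (which contains 0.7 and 1.1, so mass 1/2) via
# (1/pi) * integral over [a, b] of Im m_mu(E + i*eta) dE, for a small eta
eta = 1e-4
Es = np.linspace(0.5, 1.5, 40001)
vals = np.mean(1.0 / (pts[:, None] - (Es + 1j * eta)[None, :]), axis=0).imag
approx = vals.sum() * (Es[1] - Es[0]) / np.pi
print(round(approx, 2))   # 0.5
```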
The Stieltjes transform is useful because a function of bounded variation (or a signed measure) can be recovered from its Stieltjes transform via the inverse formula ([12, 4]):
$$\mu([a, b]) = \lim_{\eta \to 0^+} \frac{1}{\pi} \int_a^b \operatorname{Im} m_{\mu}(E + i\eta)\, dE,$$
valid whenever $a$ and $b$ are continuity points of $F$. Here $\eta \to 0^+$ means that the real number $\eta$ approaches zero from the right. Moreover, unlike the method of moments, the convergence of Stieltjes transforms is both necessary and sufficient for the convergence of the underlying distributions (see [6, Theorem B.9]).
2.2 Resolvent Identities and Formulas for Green Function Entries
Given a subset $\mathbb{T} \subset \{1, \ldots, p\}$, let $\Phi^{(\mathbb{T})}$ be the matrix whose $(i, j)$th entry is defined by $\Phi^{(\mathbb{T})}_{ij} = \Phi_{ij}$ for $i \notin \mathbb{T}$; that is, $\Phi^{(\mathbb{T})}$ is obtained from $\Phi$ by deleting the rows indexed by $\mathbb{T}$. In addition, let $G^{(\mathbb{T})}(z)$ and $\underline{G}^{(\mathbb{T})}(z)$ be the Green functions of $\frac{1}{n} \Phi^{(\mathbb{T})} \Phi^{(\mathbb{T})*}$ and $\frac{1}{n} \Phi^{(\mathbb{T})*} \Phi^{(\mathbb{T})}$, respectively. Then for $z \in \mathbb{C}^+$, we have [8, (3.8)]
(6) 
where the indices vary in $\{1, \ldots, p\} \setminus \mathbb{T}$, and $\Phi_{ij}$ is the $(i, j)$th entry of the matrix $\Phi$.
The two Green functions $G^{(\mathbb{T})}$ and $\underline{G}^{(\mathbb{T})}$ are related by the following identity ([8, Lemma 3.9]):
(7) $\operatorname{Tr} \underline{G}^{(\mathbb{T})}(z) = \operatorname{Tr} G^{(\mathbb{T})}(z) - \frac{n - p + |\mathbb{T}|}{z}.$
Here $|\mathbb{T}|$ is the cardinality of the set $\mathbb{T}$, and $\operatorname{Tr} A$ is the trace of a matrix $A$.
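The content of such trace identities is elementary to verify numerically: the two matrices $\frac{1}{n}\Phi\Phi^*$ and $\frac{1}{n}\Phi^*\Phi$ share their nonzero spectrum, so their resolvent traces differ only through the $n - p$ extra zero eigenvalues. A sketch for the case $\mathbb{T} = \emptyset$ (our illustration, not tied to the exact formulation in [8]):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 7
Phi = rng.choice([-1.0, 1.0], size=(p, n))
z = 0.8 + 0.3j

G  = np.linalg.inv(Phi @ Phi.T / n - z * np.eye(p))   # Green function of (1/n) Phi Phi^*
uG = np.linalg.inv(Phi.T @ Phi / n - z * np.eye(n))   # Green function of (1/n) Phi^* Phi

# the n x n matrix has the same nonzero eigenvalues plus n - p zeros, hence
# Tr uG(z) = Tr G(z) - (n - p)/z
print(abs(np.trace(uG) - (np.trace(G) - (n - p) / z)) < 1e-10)   # True
```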
2.3 Stieltjes Transform of the Marchenko–Pastur Law
The Stieltjes transform $m_{\mathrm{MP}}(z)$ of the Marchenko–Pastur distribution given in (4) can be computed as (see [5])
(10) $m_{\mathrm{MP}}(z) = \frac{1 - y - z + \sqrt{(z - 1 - y)^2 - 4y}}{2yz},$
where the branch of the square root is chosen so that $\operatorname{Im} m_{\mathrm{MP}}(z) > 0$ for $z \in \mathbb{C}^+$. It is well-known that $m_{\mathrm{MP}}$ is the unique function that satisfies the equation of $m$ in
(11) $m(z) = \frac{1}{1 - y - z - yz\, m(z)}$
such that $\operatorname{Im} m(z) > 0$ whenever $z \in \mathbb{C}^+$.
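As a sanity check, one can verify numerically that (10) solves (11). In the sketch below (our code, not from the paper) the correct branch is selected by taking the root of the underlying quadratic with positive imaginary part.

```python
import numpy as np

def m_mp(z, y):
    """Stieltjes transform of the MP law: the root of (11) with Im m > 0."""
    # (11) is equivalent to the quadratic  y*z*m^2 + (z + y - 1)*m + 1 = 0
    disc = np.sqrt((z - 1 - y) ** 2 - 4 * y + 0j)
    r1 = (1 - y - z + disc) / (2 * y * z)
    r2 = (1 - y - z - disc) / (2 * y * z)
    return r1 if r1.imag > 0 else r2

y, z = 0.25, 0.9 + 0.1j
m = m_mp(z, y)
assert m.imag > 0
# self-consistent equation (11): m = 1 / (1 - y - z - y*z*m)
print(abs(m - 1 / (1 - y - z - y * z * m)) < 1e-10)   # True
```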
If a function satisfies Equation (11) up to a small perturbation, we then expect that it should be quite close to $m_{\mathrm{MP}}$ as well. This is quantified by the following result. First, we define
(12) $\kappa = \kappa(E) := \min\{|E - a|, |E - b|\},$
where $a$ and $b$ are the constants given in (5), and for a fixed constant $C_0 > 0$, we define
(13) 
Lemma 2.
[8, Lemma 4.5] Suppose the function satisfies:

for some fixed constant for all ;

is Lipschitz continuous with Lipschitz constant ;

for each fixed , the function is nonincreasing for .
Suppose is the Stieltjes transform of a probability measure satisfying
(14) 
for some .
Fix and define , where is the real part of . Suppose that
(15) 
Then we have
where $\kappa = \kappa(E)$ is the $E$-dependent quantity defined as in (12).
2.4 Convergence of Stieltjes Transform in Probability
The following result is useful for bounding the convergence rate of a random Stieltjes transform in probability.
Lemma 3.
Let $\Phi$ be a $p \times n$ random matrix with independent rows, let $\mathcal{G}_n = \frac{1}{n} \Phi \Phi^*$, and let $m_n(z)$ be the Stieltjes transform of $\mu_{\mathcal{G}_n}$. Then for any $z = E + i\eta \in \mathbb{C}^+$ and any $t > 0$,
$$\mathbb{P}\left( \left| m_n(z) - \mathbb{E}\, m_n(z) \right| \geq t \right) \leq 2 \exp\left( - c\, t^2 p\, \eta^2 \right)$$
for some absolute constant $c > 0$.
Proof of Lemma 3.
Note that the $(i, j)$th entry of $\mathcal{G}_n$ is simply the (normalized) inner product of the $i$th and $j$th rows of $\Phi$. Hence varying one row of $\Phi$ gives an additive perturbation of $\mathcal{G}_n$ of rank at most two. Applying the resolvent identity [7, (2.3)], we see that the Green function is also affected only by an additive perturbation by a matrix of rank at most two and operator norm at most $2/\eta$. Then the desired result follows directly by applying McDiarmid's Lemma [7, Lemma F.3].
∎
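The stability underlying this proof is easy to see numerically: resampling a single row moves $m_n(z)$ by at most $O(1/(p\eta))$. A sketch (our illustration; the constant $2\pi$ below comes from combining Bai's rank inequality $\|F_{\mathcal{G}_n} - F_{\widetilde{\mathcal{G}}_n}\|_\infty \le 2/p$ with $\int dx/|x - z|^2 = \pi/\eta$, and is not optimal):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 60, 240
eta = 0.1
z = 1.0 + 1j * eta

def m_n(Phi):
    """Stieltjes transform m_n(z) = (1/p) Tr (G_n - z)^{-1} of the ESD."""
    G = np.linalg.inv(Phi @ Phi.T / n - z * np.eye(p))
    return np.trace(G) / p

Phi = rng.choice([-1.0, 1.0], size=(p, n))
Phi2 = Phi.copy()
Phi2[0] = rng.choice([-1.0, 1.0], size=n)   # resample one row: rank <= 2 change of G_n

diff = abs(m_n(Phi) - m_n(Phi2))
print(diff <= 2 * np.pi / (p * eta))   # True: a bounded difference of order 1/(p*eta)
```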
For the purposes of this paper, we say that an $n$-dependent event $E_n$ holds with high probability if for any $D > 0$, there is a quantity $n_0(D)$ such that $\mathbb{P}(E_n) \geq 1 - n^{-D}$ for any $n \geq n_0(D)$.
3 Proof of Theorem 1
From this section onwards, let $\mathcal{C}$ be a linear code of length $n$ over $\mathbb{F}_q$ with dual distance $d^\perp(\mathcal{C}) \geq 5$. Let $\psi$ be the standard additive character, extended to $\mathbb{F}_q^n$ componentwise. Write $\psi(\mathcal{C}) = \{\psi(c) : c \in \mathcal{C}\}$.
Let $\Phi$ be a random $p \times n$ matrix whose rows are picked from $\psi(\mathcal{C})$ uniformly and independently. This makes $\Omega = \psi(\mathcal{C})^p$ a probability space. Let $y = p/n \in (0, 1)$ be fixed. Write $\mathcal{G}_n = \frac{1}{n} \Phi \Phi^*$ for the Gram matrix of $\Phi$. Furthermore, let $\mu_{\mathcal{G}_n}$ be the empirical spectral measure of $\mathcal{G}_n$ given by (2).
Denote by $m_n(z)$ the Stieltjes transform of $\mu_{\mathcal{G}_n}$, which is given by
$$m_n(z) = \frac{1}{p} \sum_{j=1}^{p} \frac{1}{\lambda_j - z} = \frac{1}{p} \operatorname{Tr} G(z),$$
where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of the matrix $\mathcal{G}_n$, and $G(z)$ is the Green function of $\mathcal{G}_n$, that is, $G(z) = (\mathcal{G}_n - zI)^{-1}$. Note that in this setting the Stieltjes transform is itself a random variable.
Denote
(16) $\overline{m}_n(z) := \mathbb{E}\, m_n(z).$
Here $\mathbb{E}$ is the expectation with respect to the probability space $\Omega$.
3.1 An equation for $\overline{m}_n$
In the following result, we show that $\overline{m}_n$ defined in (16) satisfies the equation (11) up to a small perturbation.
Theorem 4.
For any ,
where .
We remark that Theorem 4 is the major technical result regarding the expected Stieltjes transform $\overline{m}_n$, from which Theorem 1 can be derived directly without reference to linear codes at all. The proof of Theorem 4, however, is quite complicated and is directly related to properties of linear codes. To streamline the ideas, here we assume Theorem 4 and sketch a proof of Theorem 1; the proof of Theorem 4 is postponed to Section 4.
3.2 Proof of Theorem 1
Theorem 5.
Proof of Theorem 5.
We can check that all the conditions of Lemma 2 are satisfied: first by Theorem 4 we see that (14) holds for ; in addition, (15) holds for , and this function is independent of , nonincreasing in and Lipschitz continuous with Lipschitz constant . Hence by Lemma 2, we have
Note that in we have . Therefore we have
(17) 
for all .
4 Proof of Theorem 4
Now we give a detailed proof of Theorem 4, in which the condition that $d^\perp(\mathcal{C}) \geq 5$ becomes essential.
4.1 Linear codes with $d^\perp(\mathcal{C}) \geq 5$
Recall the notation from the beginning of Section 3. Let $\mathcal{C}$ be a linear code of length $n$ over $\mathbb{F}_q$. First is a simple orthogonality result regarding $\psi(\mathcal{C})$.
Lemma 6.
Let $u \in \mathbb{F}_q^n$. Then
$$\frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \psi(\langle u, c \rangle) = \begin{cases} 1, & \text{if } u \in \mathcal{C}^\perp, \\ 0, & \text{otherwise}. \end{cases}$$
Here $\langle u, c \rangle = \sum_{i=1}^{n} u_i c_i$ is the usual inner product between the vectors $u$ and $c$.
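Lemma 6 is straightforward to confirm on a toy example. The sketch below (our code) uses the binary $[3, 1]$ repetition code, whose dual is the $[3, 2]$ even-weight code, and checks the character sum for every $u \in \mathbb{F}_2^3$.

```python
from itertools import product

C = [(0, 0, 0), (1, 1, 1)]                  # the [3,1] binary repetition code

def ip(u, c):
    return sum(a * b for a, b in zip(u, c)) % 2

# C^perp = all words orthogonal to every codeword (here: the even-weight words)
dual = [u for u in product((0, 1), repeat=3)
        if all(ip(u, c) == 0 for c in C)]

for u in product((0, 1), repeat=3):
    s = sum((-1) ** ip(u, c) for c in C)    # sum of psi(<u, c>) over the code
    assert s == (len(C) if u in dual else 0)

print(len(dual))   # 4
```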
As in Section 3, let $\Phi$ be a random $p \times n$ matrix whose rows are picked from $\psi(\mathcal{C})$ uniformly and independently, and let $\mathcal{G}_n = \frac{1}{n} \Phi \Phi^*$. Denote by $\Phi_{iu}$ the $(i, u)$th entry of $\Phi$.
Corollary 7.
Assume $d^\perp(\mathcal{C}) \geq 5$. Then for any $1 \leq i \leq p$ and indices $u, v, u_1, u_2, u_3, u_4 \in \{1, \ldots, n\}$:
(a) $\mathbb{E}\left[ \Phi_{iu} \overline{\Phi_{iv}} \right] = 0$ if $u \neq v$;
(b) $\mathbb{E}\left[ \Phi_{iu_1} \overline{\Phi_{iu_2}} \Phi_{iu_3} \overline{\Phi_{iu_4}} \right] = 0$ if the indices $u_1, u_2, u_3, u_4$ do not come in (conjugate) pairs. If the indices come in pairs, then the expectation equals $1$.
Here $\mathbb{E}$ is the expectation with respect to the probability space $\Omega$.
Proof of Corollary 7.
For simplicity, denote by $e_u \in \mathbb{F}_q^n$ the vector with a $1$ at the $u$th entry and $0$ at all other places.
4.2 Resolvent identities
4.3 Estimates of and
We now give estimates on the (dependent) random variable . First, given , we denote .
Lemma 8.
For any , we have
(a) ;
(b) .
Proof of Lemma 8.
The above estimates lead to the following further estimates.
Lemma 9.
For any , we have
(a) ;
(b) .
Proof of Lemma 9.
(a) By (20) we get
where the second equality follows from (a) of Lemma 8. Using (8) we easily obtain
(b) We split as
(21) 
where
We first estimate . By the definition of in (20) and applying (a) of Lemma 8, we see that
Then by (b) of Lemma 8 we obtain
(22) 
Hence
(23) 
where and for .
We can now complete the proof of Theorem 4.
Proof of Theorem 4.
Taking reciprocals and then expectations on both sides of (19), we get
(24) 
where
(25) 
and
(26) 
Multiplying on both sides of (25) and using the estimate in [5], we obtain
(27) 
Then the theorem follows directly from summing both sides of (24) over all indices $1 \leq i \leq p$ and then dividing both sides by $p$. ∎
5 Appendix
In this section, we use the Helffer–Sjöstrand formula to prove Theorem 1 from Theorem 5. This is a standard procedure, well-known in random matrix theory; we follow the approach of [7, Appendix C].
First we define the signed measure $\Delta\mu := \mu_{\mathcal{G}_n} - \mu_{\mathrm{MP}}$ and its Stieltjes transform $\Delta m(z) := m_n(z) - m_{\mathrm{MP}}(z)$.
Now fix and define . For any interval , where and are constants defined in (5), we choose a smoothed indicator function satisfying for , for , and . These imply that the supports of and have Lebesgue measure bounded by . In addition, choose a smooth even cutoff function with for for and .
Then by the HelfferSjöstrand formula, we get
As the left-hand side is real, we can write it as
(28)  
(29)  
(30) 
First, by the trivial identity and the fact that is Lipschitz continuous on the compact set , we can easily extend Theorem 5 as follows:
Lemma 10.
For any fixed , we have, with high probability,
for all such that and .