 # Surjectivity of near square random matrices

We show that a nearly square iid random integral matrix is surjective over the integral lattice with very high probability. This answers a question by Koplewitz. Our result extends to sparse matrices as well as to matrices of dependent entries.


## 1. Introduction

In this note we study random rectangular matrices of size , where and , and the entries are i.i.d. copies of a random variable $\xi$ taking integral values and such that for any prime $p$

$$\max_{x\in\mathbb{Z}/p\mathbb{Z}}\mathbf{P}(\xi=x)\le 1-\alpha_n, \tag{1}$$

where $\alpha_n$ is a parameter allowed to depend on $n$. Such distributions are called $\alpha_n$-balanced.

For random square matrices of random Bernoulli entries taking values 0 and 1 with probability 1/2, the problem of estimating the probability $p_n$ of being singular has attracted quite a lot of attention. In the early 60's Komlós showed . This bound was significantly improved by Kahn, Komlós, and Szemerédi in the 90's to . About ten years ago, Tao and Vu improved the bound to . We also refer the reader to by Rudelson and Vershynin for implicit bounds of type . The most recent record is due to Bourgain, Vu and Wood, who show:

###### Theorem 1.1.

$$p_n\le\Big(\frac{1}{\sqrt{2}}+o(1)\Big)^n.$$

These results imply that with very high probability the linear map is injective over (and hence the lattice has full rank in .) Another fundamental question of interest is the surjectivity onto , more specifically:

Is it true that with high probability is also surjective over (in other words, the quotient group is trivial)?

Unfortunately, the answer to this question turns out to be negative: with high probability is never surjective over . To explain this at the heuristic level, assume that the vector $e_1$ is in the image space , then (assuming that is non-singular)

$$x=M_{n\times n}^{-1}(e_1)=\big((M_{n\times n}^{-1})_{11},\dots,(M_{n\times n}^{-1})_{1n}\big)^T\in\mathbb{Z}^n.$$

However, we have , where is the matrix obtained from by removing the first row and the -th column. By the co-factor expansion

$$\det(M_{n\times n})=\sum_{i=1}^{n}(-1)^{i-1}M_{1i}\det(M_{1i}). \tag{2}$$

But as the are independent from the submatrices , it is highly unlikely that the random sum becomes smaller than all so that the components of are all integral.

Having seen that is unlikely to be surjective, it is natural to think of rectangular matrices which might have better chance to be surjective. In fact, in the past several years there have been exciting developments (see for instance [8, 10, 17, 18]) in the study of for various ensembles of . For instance a special version of a recent result by Wood [18, Corollary 3.4] shows:

###### Theorem 1.2.

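Let $u$ be a fixed integer. Let $M_{n\times(n+u)}$ be a random matrix with entries being iid copies of an $\alpha_n$-balanced random variable of fixed $\alpha_n$. Let $\mathcal{P}$ be a finite set of primes, then

$$\lim_{n\to\infty}\mathbf{P}\big(\mathrm{Cok}(M_{n\times(n+u)})_{\mathcal{P}}\simeq\{\mathrm{id}\}\big)=\prod_{p\in\mathcal{P}}\prod_{k=1}^{\infty}\big(1-p^{-k-u}\big),$$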

where is the product of -Sylow subgroups of and the cokernel is the quotient group .

We remark that is fixed and in this result. However as increases, the probability on the right hand side of the limit becomes arbitrarily small. Hence it follows that

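$$\limsup_{n\to\infty}\mathbf{P}\big(\mathrm{Cok}(M_{n\times n})\simeq\{\mathrm{id}\}\big)\le\inf_{\mathcal{P}}\lim_{n\to\infty}\mathbf{P}\big(\mathrm{Cok}(M_{n\times n})_{\mathcal{P}}\simeq\{\mathrm{id}\}\big)=0,$$

which formally answers our question above.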
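To see numerically why the right-hand side of Theorem 1.2 collapses for square matrices, the following minimal sketch evaluates the double product for growing sets of primes. The helper names `primes_up_to` and `limiting_probability`, and the truncation of the inner product at `terms` factors, are illustrative assumptions and not part of the original argument.

```python
from math import prod


def primes_up_to(limit):
    """Return all primes <= limit using a simple sieve."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def limiting_probability(primes, u, terms=60):
    """Truncated product over p in primes and k = 1..terms of (1 - p^(-k-u))."""
    return prod(
        1.0 - float(p) ** (-(k + u))
        for p in primes
        for k in range(1, terms + 1)
    )


if __name__ == "__main__":
    for bound in (2, 10, 100, 1000):
        ps = primes_up_to(bound)
        print(bound, limiting_probability(ps, 0), limiting_probability(ps, 1))
```

With `u = 0` the printed values keep shrinking as more primes are included, because the product contains the factors $(1-1/p)$; with `u = 1` they decrease much more slowly, which is the heuristic reason extra columns help.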

In the opposite direction, it has been conjectured by Koplewitz [7, 8] that

###### Conjecture 1.3.

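Let the matrix entries be iid copies of an $\alpha_n$-balanced random variable of fixed $\alpha_n$. Then for any fixed constant $\varepsilon>0$,

$$\lim_{n\to\infty}\mathbf{P}\big(\mathrm{Cok}(M_{n\times\lfloor(1+\varepsilon)n\rfloor})\simeq\{\mathrm{id}\}\big)=1.$$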

Also, with together with

$$\lim_{n\to\infty}\mathbf{P}\big(\mathrm{Cok}(M_{n\times(n+u)})\simeq\{\mathrm{id}\}\big)=1.$$

To support these conjectures, Koplewitz himself showed in [7, Theorem 1] (see also [8, Theorem 30]) that . In the same paper he also confirmed Conjecture 1.3 for random matrices of entries distributed according to the Haar measure over the profinite completion of .

In this note we confirm the first conjecture. In fact we are able to extend the result to very sparse matrices. More specifically, we can assume $\xi$ to take integer values as in (1) with

$$\alpha_n\ge\frac{C_0\log n}{n} \tag{3}$$

for a sufficiently large constant $C_0$.

###### Theorem 1.4 (Main result).

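Let $\alpha_n$ be as in (3). Assume furthermore that $\xi$ is bounded with probability one. Then for every $A>0$, there exist $B$ and an absolute constant $c>0$ such that

$$\mathbf{P}\Big(\mathrm{Cok}\big(M_{n\times\left(n+\left\lfloor B\left(\frac{\log n}{\alpha_n}\log\left(\frac{\log n}{\alpha_n}\right)+\log n\right)\right\rfloor\right)}\big)\simeq\{\mathrm{id}\}\Big)\ge 1-O\big(n^{-A}+e^{-c\alpha_n n}\big).$$

In particular, if $\alpha_n$ is fixed then

$$\mathbf{P}\big(\mathrm{Cok}(M_{n\times\lfloor n+\log^{1+o(1)}n\rfloor})\simeq\{\mathrm{id}\}\big)\ge 1-O\big(n^{-\omega(1)}\big); \tag{4}$$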

as well as if then

$$\mathbf{P}\big(\mathrm{Cok}(M_{n\times\lfloor(1+o(1))n\rfloor})\simeq\{\mathrm{id}\}\big)\ge 1-O\big(n^{-\omega(1)}\big). \tag{5}$$

Note that a balanced assumption on is necessary as the results no longer hold for instance if we work with the Bernoulli ensemble; in this case the matrix cannot be surjective modulo 2 for even . Note also by considering the ensemble with we can see that roughly the stated number of additional columns is necessary up to multiplicative constants, just by considering rows that are identically

We will also discuss an extension to a family of matrices of dependent entries, see Section 3. Our method is short and direct. We will first prove a slightly weaker version (Theorem 2.5) by relying on a totally elementary lemma by Odlyzko (Lemma 2.3). We then refine the method by using a more involved result by Maples from  (Theorem 2.9). However, as  appears to be slightly incomplete and contains several (minor) errors, we will take this opportunity to recast Maples’ proof toward our sparsest settings. Along the way, we show that this approach also yields a completely new singularity bound for sparse integral matrices.

###### Theorem 1.5.

There exists an absolute constant $c>0$ such that as long as the entries of $M_{n\times n}$ are iid copies of $\xi$ distributed as in (3) (which is not necessarily bounded) then

$$p_n\le e^{-c\alpha_n n}.$$

We notice that the recent paper by Basak and Rudelson addressed the singularity (and more generally the least singular value) for a general family of sparse matrices. Unfortunately, Theorem 1.5 does not seem to follow from that work because we impose no restriction on the spectral norm of the matrix.

## 2. Proof of Theorem 1.4

We assume that

$$\mathbf{P}(|\xi|\le K_0)=1,$$

for some positive constant $K_0$. This assumption is only for Theorem 1.4, but not for Theorem 1.5.

A natural approach is to show that the equation system

$$M_{n\times m}x=e_i,$$

has solutions $x\in\mathbb{Z}^m$, for any standard unit vector $e_i$. However such an approach does not look simple as we would have to prove cancellation of extremely large numbers involving determinants of minors (see also the discussion around (2)). Instead, we will prove surjectivity by reducing our matrices over finite fields via the following result.

###### Lemma 2.1.

[7, Lemma 5] Let . A matrix is surjective if and only if the modulo matrix is surjective for every prime . Here is the matrix over given by .

###### Proof.

(of Lemma 2.1) Assume that is surjective, then contains a submatrix (depending on ) of size such that . Thus . This implies that the columns of generate , and hence is a full-rank integer lattice. In particular, the lattice co–volume (which is independent of ) is finite and divides for all . Now assume that . Then as , is not divisible by . But this holds for all prime , a contradiction. ∎
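The equivalence in Lemma 2.1 can be checked by hand on small examples. The sketch below is not taken from the proof above: it relies on the standard Smith normal form fact that an $n\times m$ integer matrix with $m\ge n$ maps onto $\mathbb{Z}^n$ exactly when the gcd of its $n\times n$ minors equals 1, and compares this with the rank of the reduction modulo a prime. All function names are hypothetical.

```python
from itertools import combinations
from math import gcd


def det_int(rows):
    """Exact integer determinant via fraction-free (Bareiss) elimination."""
    a = [list(r) for r in rows]
    n = len(a)
    sign, prev = 1, 1
    for k in range(n - 1):
        if a[k][k] == 0:
            swap = next((i for i in range(k + 1, n) if a[i][k] != 0), None)
            if swap is None:
                return 0
            a[k], a[swap] = a[swap], a[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]


def gcd_of_maximal_minors(m):
    """gcd of all n x n minors of an n x m integer matrix (m >= n)."""
    n, cols = len(m), len(m[0])
    g = 0
    for chosen in combinations(range(cols), n):
        g = gcd(g, det_int([[row[j] for j in chosen] for row in m]))
    return g


def rank_mod_p(m, p):
    """Rank of the matrix reduced modulo the prime p (Gaussian elimination)."""
    a = [[x % p for x in row] for row in m]
    rank = 0
    for col in range(len(a[0])):
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], -1, p)  # modular inverse, Python 3.8+
        a[rank] = [(x * inv) % p for x in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank


def is_surjective_over_Z(m):
    """Surjectivity onto Z^n, using the gcd-of-minors criterion."""
    return gcd_of_maximal_minors(m) == 1
```

For instance, `[[2, 0], [0, 3]]` has determinant 6 and drops rank modulo 2 and 3, while appending the column `(1, 1)` gives minors 6, 2, -3 with gcd 1, so the enlarged matrix is surjective over the integers and has full rank modulo every prime.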

By this lemma, for our problem we need to show that is surjective (or equivalently, has rank in ) for every prime . This does not seem to be an easier task, but in what follows we show that there is a way to restrict the treatment to a set of only a few primes.

Our first ingredient is the following simple bound (see also [Ng-repulsion, Lemma 3.9]).

###### Lemma 2.2.

Let be a prime. Let be a given parameter that might depend on . Let be a matrix of size whose entries are iid copies of a random variable from (3). Then the probability that has rank at most in is smaller than .

To prove this result we rely on a useful result by Odlyzko .

###### Lemma 2.3.

Let $H$ be a subspace of dimension $d$ in $(\mathbb{Z}/p\mathbb{Z})^n$. Then if $X=(\xi_1,\dots,\xi_n)$ is a random vector whose entries are iid copies of a random variable from (3), then

$$\mathbf{P}(X\in H)\le(1-\alpha_n)^{n-d}.$$

We insert a proof of this well-known result here for completeness.

###### Proof of Lemma 2.3.

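Let $H_1,\dots,H_d$ be a basis for $H$. By permuting coordinates, we may assume without loss of generality that the restrictions $\tilde H_1,\dots,\tilde H_d$ of these vectors to the first $d$ coordinates are again linearly independent. Consider the event $X\in H$. From the linear independence of $\tilde H_1,\dots,\tilde H_d$ there are unique $c_1,\dots,c_d$ so that

$$(\xi_1,\dots,\xi_d)^t=\sum_{i=1}^{d}c_i\tilde H_i.$$

Hence conditioning on $(\xi_1,\dots,\xi_d)$, if $X\in H$ then

$$X=\sum_{i=1}^{d}c_iH_i.$$

In particular, each value $\xi_{d+1},\dots,\xi_n$ is determined. However the probability of each of these events is at most $1-\alpha_n$, and so by independence the event $X\in H$ holds with probability at most $(1-\alpha_n)^{n-d}$. ∎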
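Odlyzko's bound can be verified exhaustively on small cases. The following sketch enumerates a subspace of $(\mathbb{Z}/p\mathbb{Z})^n$ given by a spanning set and sums the probabilities of its vectors under an iid entry law; the function names and the example law are illustrative assumptions.

```python
from itertools import product


def span_mod_p(generators, p):
    """All vectors in the span of `generators` over Z/pZ, as a set of tuples."""
    n = len(generators[0])
    vectors = set()
    for coeffs in product(range(p), repeat=len(generators)):
        vectors.add(tuple(
            sum(c * g[i] for c, g in zip(coeffs, generators)) % p
            for i in range(n)
        ))
    return vectors


def subspace_dimension(size, p):
    """Dimension d of a subspace with p**d elements."""
    d = 0
    while p ** d < size:
        d += 1
    return d


def prob_in_subspace(generators, p, law):
    """Return (P(X in H), dim H) for X with iid entries distributed as `law`.

    `law` maps residues modulo p to probabilities.
    """
    H = span_mod_p(generators, p)
    total = 0.0
    for v in H:
        pr = 1.0
        for x in v:
            pr *= law.get(x, 0.0)
        total += pr
    return total, subspace_dimension(len(H), p)


if __name__ == "__main__":
    p, law = 3, {0: 0.5, 1: 0.3, 2: 0.2}   # max atom 0.5, so alpha = 0.5
    alpha = 1.0 - max(law.values())
    gens = [(1, 1, 0), (0, 1, 2)]
    prob, d = prob_in_subspace(gens, p, law)
    print(prob, "<=", (1 - alpha) ** (len(gens[0]) - d))
```

For the uniform law on $\mathbb{Z}/2\mathbb{Z}$ the bound holds with equality, since a $d$-dimensional subspace contains exactly $2^d$ of the $2^n$ equally likely vectors.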

Now we turn to the quadratic estimate.

###### Proof of Lemma 2.2.

Let . Assume that the columns span the column space of . For now assume that . Let be the subspace spanned by . We are considering the event that . By Lemma 2.3, for any ,

$$\mathbf{P}(X_i\in H)\le(1-\alpha_n)^{n-d}\le e^{-\alpha_n\varepsilon_n n}.$$

Applying this bound for and using independence we obtain

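$$\mathbf{P}_{X_i,\,d+1\le i\le n}\big(\mathcal{E}_{1,\dots,d}\mid X_1,\dots,X_d\big)=\mathbf{P}_{X_i,\,d+1\le i\le n}\big(X_{d+1},\dots,X_n\in H\mid H\big)\le e^{-\alpha_n\varepsilon_n^2n^2}.$$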

Taking the union bound over at most choices of we conclude the proof. ∎

### 2.4. A simpler result

To get the main idea, in this subsection we show:

###### Theorem 2.5.

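For every $A>0$, there exists sufficiently large $B$ such that

$$\mathbf{P}\big(\mathrm{Cok}(M_{n\times(n+u)})\simeq\{\mathrm{id}\}\big)\ge 1-n^{-A},$$

where

$$u=\left\lfloor B\,\frac{\log^2 n}{\alpha_n}+\sqrt{\frac{n\log n}{\alpha_n}}\right\rfloor.$$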

Note that does not drop below .

In what follows we prove Theorem 2.5. The same argument will also be used to deal with matrices of dependent entries. Let $\mathcal{P}_n$ be the set of primes up to $(K_0n)^{n/2}$, that is

$$\mathcal{P}_n:=\{p\text{ prime},\ p\le(K_0n)^{n/2}\}. \tag{6}$$

By taking the union bound, Lemma 2.2 then implies:

###### Corollary 2.6.

Let be the event that the matrix has rank at least in for all . Then

$$\mathbf{P}(\mathcal{E})\ge 1-e^{-\alpha_n\varepsilon_n^2n^2+n\log n+n+n\log K_0/2}.$$

Set

$$\varepsilon_n:=\sqrt{\frac{3\log n}{\alpha_n n}}.$$
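As a quick check of this choice, one can substitute $\varepsilon_n$ into the exponent of Corollary 2.6. The computation below is only a sketch, and it reads the last term of that exponent as $(n\log K_0)/2$, which is an assumption about the intended grouping.

```latex
\begin{aligned}
\alpha_n\varepsilon_n^2n^2
  &= \alpha_n\cdot\frac{3\log n}{\alpha_n n}\cdot n^2 = 3n\log n,\\
-\alpha_n\varepsilon_n^2n^2+n\log n+n+\tfrac{n}{2}\log K_0
  &= -2n\log n+n\Big(1+\tfrac12\log K_0\Big)\\
  &\le -n\log n
  \quad\text{whenever } \log n\ge 1+\tfrac12\log K_0 .
\end{aligned}
```

Under this reading the failure probability is at most $e^{-n\log n}$ for all sufficiently large $n$.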

With this value of $\varepsilon_n$, Corollary 2.6 implies

$$\mathbf{P}(\mathcal{E})\ge 1-e^{-n\log n}.$$

###### Lemma 2.7 (surjectivity for small primes).

For any there is a sufficiently large so that with probability at least , the random matrix is surjective over for all simultaneously, and is as in Theorem 2.5.

###### Proof.

(of Lemma 2.7) It suffices to show that with high probability has full rank in (which would then imply surjectivity in ).

We consider the submatrix , the restriction of to the first columns. Let be the event defined in Corollary 2.6, i.e. that has rank at least over for all . We thus have

$$\mathbf{P}_{M_{n\times n}}(\mathcal{E})\ge 1-e^{-n\log n}.$$

Consider also the event that , where by Theorem 1.5

$$\mathbf{P}(\mathcal{E}_{\neq 0})\ge 1-n^{-A},$$

provided that for large .

Now we condition on satisfying and and show that with high probability (with respect to the last columns) that is surjective over all .

Let be the collection of prime divisors of . Because by the Hadamard bound, the random set has small size, say

$$|\mathcal{P}^*|\le n^2.$$

Case 1. When but , then has full rank in , and so does .

Case 2. Consider , we estimate the probability of the event that has full rank.

Let be the column subspace of , for which by assumption

$$d_0:=n-\dim(H_0)\le n-(1-\varepsilon_n)n\le\sqrt{\frac{3n\log n}{\alpha_n}}.$$

We next expose the remaining vectors in groups. For , at step we will add column vectors to the set of already exposed column vectors , where

$$k_i:=\left\lceil\frac{B\log n}{\alpha d_{i-1}}\right\rceil,$$

and where is the codimension of the subspace generated by . Notice that in this exposing process the choice of depends on , a decreasing sequence throughout the process.

Next let be the event that . In other words, is the event that after adding the vectors of group we have a strict decrease in the co-rank,

$$d_i\le d_{i-1}-1.$$

Assuming that , then by Lemma 2.3, and by independence of the column vectors,

 P(Fi|∧i−1j=0Fj∧E∧E≠0,dim(Hi−1)

By Bayes’ rule, with probability at least , after adding columns, the matrix has full rank in .

Taking union bound over all primes , we obtain that with probability at least the obtained matrix has full rank in for all .

By Case 1. and Case 2., we have seen that with satisfying and , the matrix is surjective simultaneously over for all with the desired probability. The proof is then complete after unfolding the conditioning on (using Corollary 2.6). ∎
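The column-exposure idea behind Case 2 can be illustrated with a small simulation: expose iid columns one at a time modulo a fixed prime and record how many are needed before the span is everything. This is only a sketch under assumptions that are not in the original: entries are taken to be 1 with probability `alpha` and 0 otherwise, columns are added singly rather than in groups of size $k_i$, and the names `reduce_against` and `columns_until_full_rank` are hypothetical.

```python
import random


def reduce_against(basis, v, p):
    """Reduce v modulo p against `basis`, a dict {pivot index: vector}.

    Vectors are stored in insertion order; each stored vector is zero at the
    pivots of the vectors stored before it, so one pass in that order suffices.
    """
    v = [x % p for x in v]
    for pivot, b in basis.items():
        if v[pivot]:
            f = v[pivot]
            v = [(x - f * y) % p for x, y in zip(v, b)]
    return v


def columns_until_full_rank(n, p, alpha, rng, max_columns=10_000):
    """Expose random columns until they span (Z/pZ)^n.

    Returns (number of columns exposed, rank reached).
    """
    basis = {}
    used = 0
    while len(basis) < n and used < max_columns:
        col = [1 if rng.random() < alpha else 0 for _ in range(n)]
        used += 1
        v = reduce_against(basis, col, p)
        pivot = next((i for i, x in enumerate(v) if x), None)
        if pivot is not None:
            inv = pow(v[pivot], -1, p)  # modular inverse, Python 3.8+
            basis[pivot] = [(x * inv) % p for x in v]
    return used, len(basis)


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.5, 0.2, 0.1):
        used, rank = columns_until_full_rank(40, 2, alpha, rng)
        print(f"alpha={alpha}: {used} columns to reach rank {rank}")
```

The number of columns beyond `n` that such a run needs plays the role of $u$; sparser laws (smaller `alpha`) typically require more extra columns, in line with the dependence of $u$ on $\alpha_n$.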

###### Proof.

(of Theorem 2.5) We condition on the event . Note that with probability one . This shows that with prime , . Hence on the matrix is surjective over for all .

Furthermore, Lemma 2.7 implies that with probability at least , for all the random matrix is surjective over . Hence altogether our matrix is surjective in by Lemma 2.1. ∎

### 2.8. Proof of Theorem 1.4

Now we turn to our main theorem, where the proof is similar but instead of Lemma 2.3 we will be using the following result by Maples (see either [9, Theorem 1.2] or [10, Corollary 1.3].)

###### Theorem 2.9.

Let be any prime. Assume that the entries of are iid copies of from (1) with from (3). Then for all with a sufficiently small absolute constant we have

 P(rank(Mn×n/p)=n−k)=O(nk(p−k2+e−cαn)). (7)

In fact [10, Corollary 1.3] says much more, that the bound is precisely

 p−k2k∏i=1(1−p−i)−1∞∏i=k+1(1−p−i)+O(e−cαn). (8)

However, we will not need this later result (given that it has not been formally verified, especially for the sparse case). Note that (8), in its limit form (), is a simple consequence of the aforementioned paper [18, Corollary 3.5] by Wood. Back to , as this paper has some mistakes (for instance [9, Proposition 2.3] is incorrect, see the appendix for further discussion), for transparency we will recast an almost complete proof of Theorem 2.9 in the appendix. Theorem 2.9 and Theorem 1.5 will then follow as a byproduct.

Let be the set of primes up to , where is the sufficiently small constant from Theorem 2.9, i.e.

$$\mathcal{P}'_n:=\{p\text{ prime},\ p\le e^{c\alpha_n n/2}\}.$$

Note that (defined in (6)). By applying (7) with for a sufficiently large constant to each and taking the union bound

 ∑p∈P′nnk(p−k2+O(e−cαnn))=n−ω(1).

###### Corollary 2.10.

Let be the event that the matrix has rank at least in for all . Then

$$\mathbf{P}(\mathcal{E}')\ge 1-n^{-\omega(1)}.$$

We next prove an analog of Lemma 2.7. Set

$$u=\left\lfloor B\cdot\left(\frac{\log n}{\alpha_n}\log\frac{\log n}{\alpha_n}+\log n\right)\right\rfloor,$$

for a sufficiently large constant $B$.

###### Lemma 2.11.

With probability at least , the random matrix is surjective over for all simultaneously.

###### Proof of Lemma 2.11.

Again, it suffices to show that with high probability has full rank in each .

We consider the submatrix , the restriction of to the first columns. Let be the event implied by Corollary 2.10 that this matrix has rank at least over for all . We thus have

$$\mathbf{P}_{M_{n\times n}}(\mathcal{E}')\ge 1-n^{-\omega(1)}.$$

Consider also the event that from Theorem 1.5. Conditioning on satisfying and , we will show that with high probability (with respect to the last columns) that is surjective over all .

To do this, similarly to the proof of Lemma 2.7, let be the collection of prime divisors of , then clearly the random set has size at most .

Case 1. When but , then has full rank in , and so does .

Case 2. Consider , we estimate the probability of the event that has full rank over . For this, first note that under , if (that is ) then the corank of over is at most . Now if , as , the corank of over for these large must be at most . So in either case the corank is at most .

Let be the column subspace of , for which by assumption

$$d_0:=n-\dim(H_0)\le(c\alpha_n)^{-1}\log n+C_1\log n.$$

Similarly to the proof of Theorem 2.5, for , we will add column vectors to the set of already exposed column vectors , where is the codimension of the subspace generated by .

Let be the event that . By Lemma 2.3, and by independence of the column vectors,

 P(Fi|∧i−1j=0Fj∧E′∧E≠0,dim(Hi−1)

By Bayes’ rule, with probability at least , after adding columns, the matrix has full rank in . (It is possible to improve the total number of extra columns by a more careful analysis of the but we will not do so here for simplicity.)

Taking the union bound over all primes , we obtain that with probability at least the matrix has full rank in for all .

We have seen that with satisfying and , the matrix is surjective simultaneously over for all with the desired probability. The proof is then complete after unfolding the conditioning on , knowing that these events hold with very high probability. ∎

Finally, for Theorem 1.4, conditioning on the event , with prime we have , and hence on the matrix is surjective over for all .

On the other hand, Lemma 2.11 implies that with probability at least , for all the random matrix is surjective over .

## 3. Some remarks

We have studied random matrices of independent entries. It is natural to consider Conjecture 1.3 for other families of matrices of dependent entries. Here we discuss one such model.

Let be a random symmetric matrix, where for simplicity we assume that the entries are iid copies of a bounded random variable from (1) with fixed . It follows from [12, 16] that for this model the singularity probability can be bounded by

$$p_n=n^{-\omega(1)}. \tag{9}$$

Heuristically, arguing similarly to (2) (where we expose both columns and rows at the same time to obtain a quadratic variant of (2)), we can show that with high probability the matrix is not surjective over . Actually an analog of Theorem 1.2 has been established in for this model (to be more precise, M. M. Wood studied the Laplacian, but her result also covers the non-normalized ensemble), which confirms the above heuristic. However, we will show that by adding a few more (say ) independent rows, the matrix becomes surjective.

###### Theorem 3.1.

Let be a random matrix where its restriction to the first columns is a symmetric matrix as above, and the last columns are independent with entries being iid copies of . Then for any , there exists such that for

$$\mathbf{P}\big(\mathrm{Cok}(M_{n\times(n+u)})\simeq\{\mathrm{id}\}\big)\ge 1-n^{-A}.$$

To justify this result, we establish the following analog of Lemma 2.2.

###### Lemma 3.2.

Let be a prime. Let be a given parameter that might depend on . Let be a symmetric matrix where are iid copies of a bounded random variable from (1) with fixed . Then the probability that has rank at most in is smaller than .

###### Proof of Lemma 3.2.

Let . Assume that the columns span the column space of . For now assume that . Let be the span of . We are considering the event that . Now as is dependent on , we cannot estimate the probability of directly by Odlyzko's bound. However, we can get rid of the dependence by deleting the corresponding common entries as below.

For set

$$I_{d+j}:=\{1,\dots,d+j\}.$$

For any and we denote by the restriction of over the components indexed by . For convenience we also denote by the subspace generated by . Assume that , then the following holds

• , and more generally ;

• the vector is independent of ;

• the vectors are mutually independent.

Now as has rank at most in , by Lemma 2.3 we have

$$\mathbf{P}\big(X_{d+j+1}|_{I_{d+j}}\in H|_{I_{d+j}}\big)\le(1-\alpha_n)^{j}.$$

Applying this bound for and using the independence of , we obtain

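$$\mathbf{P}_{X_i,\,d+1\le i\le n}\big(\mathcal{E}_{1,\dots,d}\mid X_1,\dots,X_d\big)\le\prod_{j=1}^{n-d-1}(1-\alpha_n)^{j}\le e^{-\alpha_n\varepsilon_n^2n^2/2}.$$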

Taking union bound over at most choices of we conclude the proof. ∎

We can now complete the proof of Theorem 3.1 verbatim as in the proof of Theorem 2.5 with fixed . Indeed, Corollary 2.6 follows from Lemma 3.2, and Lemma 2.7 can be justified similarly (conditioning on (9)) because the last columns are mutually independent, and are independent from .

## Appendix A The corank estimate: proof of Theorem 2.9

We will work in a more general setting. Let be a prime power and be the finite field with

elements. We say that a probability distribution

in is -balanced (for some ) if for every additive subgroup in and

 μ(s+T)≤1−αn.

In the general finite field setting, we will assume

$$\alpha_n\ge n^{-1/2+\varepsilon}\quad\text{for any }\varepsilon>0. \tag{10}$$

In the more specific setting when $q=p$ is a prime (which is the setting of Theorem 2.9), as there is no non-trivial additive subgroup in $\mathbb{F}_p$, we will assume

$$\max_{x\in\mathbb{F}_p}\mu(x)=1-\alpha_n$$

where

$$\alpha_n\ge\frac{C_0\log n}{n},\quad\text{for a sufficiently large constant }C_0. \tag{11}$$

In what follows is a random matrix where the entries are independent and identically distributed according to an -balanced either from (10) or (11), and . Notice that in either case, we do not assume the support of to be bounded. Recall that are the columns of and