# The rank of sparse random matrices

Generalising prior work on the rank of random matrices over finite fields [Coja-Oghlan and Gao 2018], we determine the rank of a random matrix with prescribed numbers of non-zero entries in each row and column over any field. The rank formula turns out to be independent of both the field and the distribution of the non-zero matrix entries. The proofs are based on a blend of algebraic and probabilistic methods inspired by ideas from mathematical physics.


## 1. Introduction

### 1.1. Background and motivation

Determining the spectrum of a random matrix is a problem of fundamental importance to mathematical physics. A related but more basic question is to find the multiplicity of zero as an eigenvalue, or equivalently to compute the rank. The intricacy of this task depends vitally on the density of the random matrix and on the underlying field. For instance, a simple moment calculation suffices to prove that a random -matrix with entries drawn uniformly and independently from the field has rank with high probability. The argument essentially coincides with the proof of the Gilbert-Varshamov bound from coding theory [19, 39]. By contrast, the case of dense -matrices over the rationals, first settled by Komlós  in the 1960s, requires a fairly sophisticated argument. Furthermore, the moment calculation that works beautifully for dense -matrices breaks down in the sparse case where the average number of non-zero entries per row or column is bounded. In effect, despite the long history of the rank problem for random matrices, there has not been a comprehensive formula for the rank of sparse random matrices over general fields.

The present paper contributes such a formula. Specifically, we prove that the recent rank formula for random matrices over finite fields from  actually holds for sparse random matrices over any field. This result requires a significant extension of the approach developed in . Indeed, that proof, effectively based on a probabilistic counting argument, breaks down over infinite fields where it is no longer possible to sum over all elements of the kernel. The technical contribution here is to show how the probabilistic arguments from  can be replaced by algebraic ones. A key ingredient of this novel algebraic approach is an abstract proposition that shows how a slight random perturbation rids any given matrix over any field of short linear relations. We believe that this tool might be of independent interest and that it might find other applications in random matrix theory.

We proceed to state the rank formula, the main result of the paper. Subsequently, in Section 1.3, we discuss further related work and elaborate on the difficulties posed by working with arbitrary (infinite) fields and how we cope with them.

### 1.2. The rank formula

Following  we consider a fairly general model of sparse random matrices. The model allows us to specify the distribution of the number of non-zero entries in the rows and columns, with only mild assumptions on the moments of these distributions. Specifically, let

be integer-valued random variables such that

for some real . Let and . Also let be any field and let be an -valued random variable. Further, let be a sequence of mutually independent copies of , respectively. Moreover, for an integer divisible by the greatest common divisor of the support of let be a Poisson random variable, independent of everything else. Given the event

$$\sum_{i=1}^{n} d_i = \sum_{i=1}^{m} k_i, \tag{1.1}$$

draw a simple bipartite graph with vertex sets and such that the degree of equals and the degree of equals for all uniformly at random. Then naturally induces a random matrix whose non-zero entries correspond to the edges of . Namely, let be the -matrix with entries

$$A_{ij} = \mathbb{1}\{a_i x_j \in E(G)\} \cdot \chi_{i,j}. \tag{1.2}$$

Thus, the th column of features precisely non-zero entries, and similarly the th row contains non-zero entries. Routine arguments show that the random matrix model is well-defined, i.e., that (for large enough ) the event (1.1) occurs and there exists a simple graph with the desired degrees with positive probability [11, Proposition 1.9].

Let

$$D(x) = \sum_{h=1}^{\infty} \mathbb{P}[d = h]\, x^h, \qquad K(x) = \sum_{h=1}^{\infty} \mathbb{P}[k = h]\, x^h$$

be the probability generating functions of . Since , the sums converge and are continuously differentiable on the unit interval. Therefore,

$$\Phi(x) = D\bigl(1 - K'(x)/k\bigr) + \frac{d}{k}\bigl(K(x) + (1 - x)K'(x) - 1\bigr)$$

exists for all .

###### Theorem 1.1.

For any field and for all as above we have

$$\lim_{n\to\infty} \frac{\operatorname{rk}(A)}{n} = 1 - \max_{\alpha\in[0,1]} \Phi(\alpha) \quad \text{in probability.} \tag{1.3}$$

For finite fields the formula (1.3) was established recently . Moreover, a simple but elegant argument due to Lelarge  shows that the r.h.s. of (1.3) provides an upper bound on the rank over any field. Hence, the contribution of the present paper consists in the matching lower bound.

The right-hand side of (1.3) depends only on the degree distributions and . In other words, the very same expression that yields the rank of a random matrix over also gives the rank over the reals, the -adic numbers or, say, a structure as rich as a function field. Also the distribution from which the non-zero matrix entries are drawn is inconsequential. This hints at the rank being driven by an abstract, general principle. Part of the present contribution is to elucidate this abstract explanation.

The formula (1.3) does not generally match the most immediate guess that one might be tempted to put forward. Indeed, there is a simple graph-theoretic upper bound on the rank, the 2-core bound . The 2-core of the matrix is obtained by repeatedly applying the following two operations to the matrix.

• remove any all-zero columns,

• remove any columns with a single non-zero entry along with the row where that non-zero entry occurs.
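The two peeling operations above can be sketched in a few lines of Python. This is only an illustration under assumptions made here: the matrix is given as a list of rows, and the function name `two_core` and its return convention (the surviving row and column indices) are hypothetical choices, not notation from the paper.

```python
def two_core(matrix):
    """Return (rows, cols): the row and column indices that survive the peeling."""
    rows = set(range(len(matrix)))
    cols = set(range(len(matrix[0]))) if matrix else set()
    changed = True
    while changed:
        changed = False
        for j in sorted(cols):
            # Non-zero entries of column j among the rows that are still present.
            support = [i for i in rows if matrix[i][j] != 0]
            if len(support) == 0:
                cols.discard(j)           # operation 1: all-zero column
                changed = True
            elif len(support) == 1:
                cols.discard(j)           # operation 2: column with one non-zero entry ...
                rows.discard(support[0])  # ... together with the row containing it
                changed = True
    return sorted(rows), sorted(cols)


A = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(two_core(A))
```

In this toy example the first column has a single non-zero entry, so it is removed together with the first row, and the last column is all-zero; the remaining 2×2 block has two non-zero entries in every column and survives.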

The matrix that results from this process is a minor of . Let be the number of rows and columns of . By construction, any vector in the kernel of extends in at least one way to a vector in the kernel of . In particular, the set of all extensions of the zero vector in the kernel of to a vector in the kernel of is a linear subspace of . A few lines of linear algebra reveal that the dimension of this subspace, and hence of , is lower bounded by . Furthermore, [11, Theorem 1.3] shows that

$$\lim_{n\to\infty} \frac{n - n^{*} - (m - m^{*})}{n} = \Phi(\rho) \quad \text{in probability, where} \quad \rho = \sup\{x \in [0,1] : \Phi'(x) = 0\}.$$

Consequently, this purely graph-theoretic consideration yields the upper bound

$$\limsup_{n\to\infty} \frac{\operatorname{rk}(A)}{n} \le 1 - \Phi(\rho). \tag{1.4}$$

In fact, non-rigorous physics deliberations led to the ‘prediction’ that the bound (1.4) is generally tight for matrices over finite fields [2, 33].

However, Lelarge  refuted this prediction. Indeed, combining the formula for the matching number of random bipartite graphs from  with the Leibniz determinant formula, Lelarge obtained the upper bound

$$\limsup_{n\to\infty} \frac{\operatorname{rk}(A)}{n} \le 1 - \max_{\alpha\in[0,1]} \Phi(\alpha).$$

He also produced an example of a pair of degree distributions for which .

(Footnote 1: The argument is as easy as it is oblivious to the field and to : if the matrix has rank , then there exists a regular minor. The determinant of this minor is therefore non-zero. Hence, in the Leibniz expansion of some permutation renders a non-zero contribution. This permutation thus induces a matching of size in the random bipartite graph . Hence, an upper bound on the matching number entails an upper bound on the rank.)

Let us conclude this section by glimpsing at a few immediate applications of Theorem 1.1 to specific random matrix models. The 2-core bound turns out to be tight in the first two examples but not in the last one.

Figure 1. Left: the function $\Delta \mapsto 2 - \max_{\alpha\in[0,1]} \exp(-\Delta\exp(\Delta(\alpha-1))) + (1+(1-\alpha)\Delta)\exp(\Delta(\alpha-1))$ for Example 1.2. Middle: the function $d \mapsto 1 - \max_{\alpha\in[0,1]} \exp(-d\alpha^{k-1}) - d(1 - k\alpha^{k-1} + (k-1)\alpha^{k})/k$ from Example 1.3 with $k=3$. Right: the function $\Phi(x)$ from Example 1.4.

###### Example 1.2 (the adjacency matrix of random bipartite graphs).

Let be a random bipartite graph on vertices such that for any the edge is present with probability independently. Setting for a fixed , we see that for large the degrees of the are asymptotically distributed. Indeed, with the choice and the adjacency matrix and can be coupled such that w.h.p. Hence, Theorem 1.1 implies that

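$$\lim_{n\to\infty} \frac{\operatorname{rk} A(G(n,n,p))}{2n} = 1 - \max_{\alpha\in[0,1]} \Phi(\alpha), \quad \text{with} \quad \Phi(x) = \exp(-\Delta\exp(\Delta(x-1))) + (1 + (1-x)\Delta)\exp(\Delta(x-1)) - 1.$$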

A bit of calculus shows that in this example, i.e., the 2-core bound is tight.

###### Example 1.3 (fixed row sums).

Motivated by the minimum spanning tree problem in weighted random graphs, Cooper, Frieze and Pegden  studied the rank of the random matrix with degree distributions fixed and over the field . The same rank formula was obtained independently in . Extending the results from [4, 14], Theorem 1.1 shows that the rank of the random matrix with these degrees over any field is given by

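$$\lim_{n\to\infty} \frac{\operatorname{rk} A}{n} = 1 - \max_{\alpha\in[0,1]} \Phi(\alpha), \quad \text{where} \quad \Phi(x) = \exp(-d x^{k-1}) - \frac{d}{k}\bigl(1 - k x^{k-1} + (k-1)x^{k}\bigr).$$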

Once more  shows that the 2-core bound is tight in this example.
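The rank formula of Example 1.3 is easy to evaluate numerically. The following sketch assumes, as the shape of $\Phi$ suggests, Poisson column degrees with mean $d$ and exactly $k$ non-zero entries per row; the function names `phi` and `rank_limit` and the grid search are choices made here for illustration, not part of the paper.

```python
import math


def phi(x, d, k):
    """Phi(x) from Example 1.3 (mean column degree d, k non-zero entries per row)."""
    return math.exp(-d * x ** (k - 1)) - (d / k) * (1 - k * x ** (k - 1) + (k - 1) * x ** k)


def rank_limit(d, k, grid=10_000):
    """Approximate 1 - max_{alpha in [0,1]} Phi(alpha) on a uniform grid."""
    best = max(phi(i / grid, d, k) for i in range(grid + 1))
    return 1.0 - best


print(rank_limit(2.4, 3))
```

For $d = 2.4$ and $k = 3$ the grid maximum of $\Phi$ is attained at $\alpha = 0$, where $\Phi(0) = 1 - d/k$, so the printed value is approximately $d/k = 0.8$. A grid search only approximates the maximum; a finer grid or a proper one-dimensional optimiser would be needed near parameter values where the maximiser jumps.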

###### Example 1.4 (non-exact 2-core bound).

There are plenty of choices of for which the 2-core bound fails to be tight, but degree distributions that render graphs whose 2-core is combinatorially instable furnish particularly egregious offenders. In such graphs the removal of a very small number of randomly chosen vertices likely causes the 2-core to unravel entirely. Analytically, this instability manifests itself in from (1.4) being a local minimum of . For instance, letting be the distributions with and , we find and , while the global maximum is attained at .

### 1.3. Discussion and related work

Prior work on the rank of random matrices relies on two separate sets of techniques, depending on whether the average number of non-zero entries per row/column is bounded or unbounded.

#### 1.3.1. Dense matrices

The difficulty of the rank problem for dense matrices depends on the distribution of the matrix entries. For instance, a square matrix with independent Gaussian entries is almost surely regular for the obvious reason that the submanifold of singular matrices has Lebesgue measure zero. In effect, random -matrices with independent Gaussian entries almost surely have full real rank. By contrast, the case of matrices with independent uniform entries is more subtle. Komlós  proved by way of the determinant that such square matrices are regular with high probability. As a consequence, a random -matrix with independent uniform entries has full real rank with high probability. Vu  presented a simpler proof of Komlós’ result. In fact, an intriguing conjecture, which has inspired an impressive line of research, e.g. [23, 37, 38], asserts that the dominant reason for a square random -matrix being singular is the existence of a pair of identical rows or columns.

Interestingly enough, the probability that a dense square matrix with entries drawn uniformly from a finite field is singular converges to a number strictly between zero and one as the size of the matrix tends to infinity. In fact, Kovalenko  obtained a very precise formula for the distribution of the rank of random matrices with independent uniform entries over . The result extends to arbitrary finite fields . The rank of dense matrices with a positive fraction of non-zero entries drawn from non-uniform distributions has been investigated as well [31, 32].

A further line of work deals with the rank of random matrices that are sparser but still have an unbounded average number of non-zero entries in each row or column. Balakin  and Blömer, Karp and Welzl  dealt with the rank of such matrices over finite fields. Furthermore, Costello and Vu [16, 15] studied the real rank of random symmetric matrices of a similar density. The basic combinatorial phenomena exhibited in all these works are broadly similar to those that drive the connectivity threshold of the binomial random graph, which occurs when the average degree is . Namely, random graphs of average degree are already ‘essentially connected’, apart from a few isolated vertices or possibly the odd bounded-sized component. Thus, the obstacle to connectivity is purely local. Similarly, the random matrix with non-zero entries per row essentially has full rank, apart from a very small number of linear relations caused by local defects. In the words of , “dependency should come from small configurations”.

#### 1.3.2. Sparse matrices

Matters are quite different in the sparse case, i.e., if the average number of non-zero entries per row or column is bounded. In fact, we shall discover that the formula from Theorem 1.1 is driven by “dependency coming from large configurations”, i.e., by minimally linearly dependent sets of unbounded size. But let us first review the literature on sparse matrices.

The first major contribution was a paper by Dubois and Mandler  on the random -XORSAT problem. Translated into the language of linear algebra, this problem asks for what ratios the rank of a random -matrix over with precisely three one-entries per row is equal to w.h.p. The distribution of the random matrix is essentially the same as in Example 1.3. Dubois and Mandler pinpointed the precise threshold . The proof relies on a delicate moment calculation. To be precise, the proof strategy is to calculate the expected size of the kernel of the 2-core matrix . Because the entries of are stochastically dependent, the calculation turns out to be moderately delicate. In fact, matters get worse when one considers a greater number of non-zero entries per row. This more general problem, known as random -XORSAT, was solved independently by Dietzfelbinger et al.  and by Pittel and Sorkin . Also considering more general fields with complicates the moment calculation enormously. Yet, undertaking a technical tour de force, Falke and Goerdt  managed to extend the moment approach to .

Ayre, Coja-Oghlan, Gao and Müller  proposed a different strategy to cope with sparse random matrices with precisely non-zero entries per row over general finite fields (the model from Example 1.3). Instead of performing a moment calculation,  relies on a coupling argument inspired by the Aizenman-Sims-Starr scheme from mathematical physics . The basic idea is to set up a coupling of a random matrix with columns and a slightly larger random matrix with columns and to estimate the expected difference of their nullities. Roughly speaking, this boils down to calculating the probability that a random vector from the kernel of the matrix with rows ‘survives’ the addition of another random row. The proof strategy was subsequently extended by Coja-Oghlan and Gao  to obtain the -rank of the general sparse random matrix model that we also study in the present paper.

Unfortunately, none of the aforementioned proof strategies extend to infinite fields. The moment method from [17, 18, 20, 34] does not extend because it inherently assumes that the set of potential vectors in the kernel stems from a finite ground set. Similarly, the idea harnessed in [4, 11] of calculating the probability that a random vector from the kernel ‘survives’ the addition of a new row breaks down once the kernel is infinite. Nonetheless, we will discover in the next section how the strategy from  can be repaired thanks to the addition of a new ingredient. Namely, we will replace the probabilistic reasoning from  by a more abstract algebraic insight.

The single prior contribution on the real rank of sparse random matrices is due to Bordenave, Lelarge and Salez , who computed the rank of the (symmetric) adjacency matrix of a random graph. The random graph model that they consider is fairly general: they study random graphs with a prescribed tree limit in the Benjamini-Schramm topology. This model encompasses the sparse binomial random graph as well as random graphs with given degrees (subject to a moment condition). However,  requires a technical assumption on the Benjamini-Schramm limit to ensure that the 2-core bound is tight. The proof is based on extending fixed point calculations on random trees to the actual random graph, an approach that has been dubbed the ‘objective method’ .

### 1.4. Preliminaries

We use standard asymptotic notation etc. to refer to the limit . Additionally, we use the symbols etc. to refer to the limit . Further, we denote by a size-biased version of the random variable , i.e.,

$$\mathbb{P}[\hat{k} = h] = \frac{h\,\mathbb{P}[k = h]}{\mathbb{E}[k]}.$$
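As a small worked illustration of size-biasing, the sketch below computes the size-biased version of a finite probability mass function. The dictionary representation and the name `size_biased` are assumptions made here; exact fractions are used so that the result can be checked by hand.

```python
from fractions import Fraction


def size_biased(pmf):
    """Size-biased version of a pmf given as {value: probability}."""
    mean = sum(h * p for h, p in pmf.items())
    # Each value h is reweighted proportionally to h; values of weight zero drop out.
    return {h: h * p / mean for h, p in pmf.items() if h * p != 0}


k_pmf = {3: Fraction(1, 2), 5: Fraction(1, 2)}
print(size_biased(k_pmf))
```

Here the mean is 4, so the size-biased variable takes the value 3 with probability 3/8 and the value 5 with probability 5/8: larger values are favoured in proportion to their size.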

There is a natural representation of matrices by graphs. Indeed, given an matrix the Tanner graph is the bipartite graph with vertex set in which and are connected by an edge iff . Thus, the nodes represent the columns of or, equivalently, the variables of the homogeneous linear system induced by . We therefore refer to as the variable nodes of . Moreover, the represent the rows of or, equivalently, the individual linear equations of which the homogeneous system defined by is composed. Following coding theory terminology, we refer to as the check nodes. Of course, merely represents the positions of the non-zero entries of but not their values.
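To make the correspondence concrete, here is a minimal sketch that reads off the Tanner graph of a small matrix. Representing the check node of row `i` and the variable node of column `j` by the index pair `(i, j)` is a convention chosen here for illustration.

```python
def tanner_graph(matrix):
    """Edges (i, j): the check node of row i is adjacent to the variable node of column j."""
    return [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v != 0]


A1 = [[2, 0, 5],
      [0, 0, 7]]
A2 = [[1, 0, 1],
      [0, 0, 1]]
print(tanner_graph(A1))
print(tanner_graph(A1) == tanner_graph(A2))
```

The two matrices have different entries but the same Tanner graph, which reflects the remark that the graph records only the positions of the non-zero entries and not their values.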

Conversely, given a bipartite multi-graph on a set $V$ of variable nodes and a set $F$ of check nodes we define a random matrix as follows. Suppose that , with the same edge possibly occurring several times in the multi-set. Moreover, let be independent copies of . Then we let

$$A_{ax}(G) = \sum_{i=1}^{\ell} \chi_i\, \mathbb{1}\{f_i v_i = ax\} \qquad (x \in V,\ a \in F). \tag{1.5}$$

Thus, each -edge in gives rise to a non-zero summand in (1.5). In effect, the matrix has at most non-zero entries (due to possible cancellations).

## 2. Proof strategy

Like in the two prior contributions [4, 11] that dealt with the case of random matrices over finite fields, the scaffolding of the proof is provided by a coupling argument reminiscent of the ‘Aizenman-Sims-Starr scheme’ of mathematical physics . But the way we put this framework to work will be quite different. Basically, to deal with arbitrary fields we will replace the probabilistic deliberations from  by more abstract algebraic ones, which constitute the main novelty of this paper. The algebraic approach leads to a simplified proof even in the case of finite fields. Yet fortunately, for much of the technical legwork, particularly coupling arguments clarifying the relations between various random matrix models, we can resort to , as this part is independent of the underlying field. We proceed to set out the key elements of this proof strategy.

### 2.1. The Aizenman-Sims-Starr scheme

In a nutshell, the prescription of the Aizenman-Sims-Starr scheme goes as follows . In order to calculate the mean of a random variable on a random ‘system’ of size in the limit , calculate the difference upon going to a system of size . To this end, design a coupling of the systems of sizes and such that the latter results from the former by adding only a bounded number of elements. To apply this recipe to the rank problem, write for the random matrix from (1.2) with columns. Since adding or removing a single row can only change the rank by one, Azuma’s inequality shows that it suffices to compute . In fact, since the upper bound on the rank already follows from , we merely need to bound from below, or equivalently bound from above. We are thus tempted to write

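$$\limsup_{n\to\infty} \frac{1}{n}\mathbb{E}[\operatorname{nul}(A_n)] = \limsup_{n\to\infty} \frac{1}{n}\sum_{N=1}^{n-1} \bigl(\mathbb{E}[\operatorname{nul}(A_{N+1})] - \mathbb{E}[\operatorname{nul}(A_N)]\bigr) \le \limsup_{n\to\infty} \mathbb{E}[\operatorname{nul}(A_{n+1})] - \mathbb{E}[\operatorname{nul}(A_n)]. \tag{2.1}$$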

Further, to calculate the last expression we should couple and such that the former is obtained from the latter by adding a few rows and a column.

But this strategy runs into trouble because of the rigidity of the random matrix model . Indeed, depending on the precise choice of the distributions , may or may not be defined for all due to divisibility issues. (As a concrete example, consider the case deterministically. Then , which is possible only if is divisible by .) Following , we will deal with this issue by way of a relaxed random matrix model that, without significantly affecting the rank, allows for a bit of wiggling room. Formally, fix a parameter . Then for any integer we construct a random matrix as follows. Let be Poisson variables with means

$$\mathbb{E}[M_{n,i}] = (1 - \varepsilon)\,\mathbb{P}[k = i]\, dn/k \tag{2.2}$$

that are mutually independent as well as independent of everything else. Let

$$m_{\varepsilon,n} = \sum_{i \ge 0} M_{n,i}. \tag{2.3}$$

Further, obtain a random Tanner graph with variable nodes and check nodes , , by drawing a random maximal matching of the complete bipartite graph with vertex sets

$$\bigcup_{h=1}^{n} \{x_h\} \times [d_h] \quad \text{and} \quad \bigcup_{i \ge 3} \bigcup_{j=1}^{M_i} \{a_{i,j}\} \times [i]. \tag{2.4}$$

For each matching edge insert an edge between and into . Finally, let be the random matrix induced by this Tanner graph. We observe that it suffices to estimate the nullity of .

###### Proposition 2.1 ([11, Proposition 2.4]).

We have .

The random matrix is designed to mimic the matrix obtained from the original model by deleting every row with probability independently. (Of course, the latter model would be unworkable because it is still not defined for all .) Indeed, for each the expected number of rows with non-zero entries in equals , which explains (2.2). Further, the construction of is akin to the well-known pairing model of random graphs with given degrees . Specifically, think of the vertices in the two sets (2.4) as sockets and of the edges of the matching as wires. Since (2.2) ensures that the expected number of sockets corresponding to check nodes equals while the expected number of variable sockets equals , and since these numbers are tightly concentrated, with high probability will occupy all check sockets but leave about variable sockets vacant. We refer to the vacant sockets as cavities.
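The socket-and-wire picture can be sketched as follows. This is a simplified illustration, not the construction from the paper: the check degrees are passed in directly instead of being generated from the Poisson counts in (2.2), and the names `pairing_model`, `var_degrees` and `check_degrees` are hypothetical.

```python
import random


def pairing_model(var_degrees, check_degrees, rng):
    """Wire check sockets to variable sockets by a uniformly random maximal matching.

    Returns (edges, cavities): edges are (check, variable) pairs, possibly repeated,
    and cavities are the variable sockets that remain vacant.
    """
    var_sockets = [(h, s) for h, d in enumerate(var_degrees) for s in range(d)]
    check_sockets = [(a, s) for a, k in enumerate(check_degrees) for s in range(k)]
    # Shuffling both socket lists and pairing them position by position yields
    # a uniformly random maximal matching between the two socket sets.
    rng.shuffle(var_sockets)
    rng.shuffle(check_sockets)
    matched = min(len(var_sockets), len(check_sockets))
    edges = [(check_sockets[t][0], var_sockets[t][0]) for t in range(matched)]
    cavities = var_sockets[matched:]
    return edges, cavities


edges, cavities = pairing_model([2, 2, 2], [3], random.Random(0))
print(edges, cavities)
```

With six variable sockets and three check sockets, all check sockets are occupied and three variable sockets stay vacant as cavities. Since the same check and variable may be wired more than once, the result is in general a multi-graph.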

The cavities provide the manoeuvring space that we need to couple and . A first idea should be to couple and such that the former is obtained from the latter by adding one variable along with new adjacent checks. Additionally, the new checks get connected with some random cavities of . Thus,

$$A_{\varepsilon,n+1} = \begin{pmatrix} A_{\varepsilon,n} & 0 \\ B & C \end{pmatrix}, \tag{2.5}$$

where has columns and is a column vector. The expected numbers of non-zero entries of are bounded. But this direct coupling does not quite suffice to estimate the nullity for a subtle reason. Therefore, we will instead obtain both and by adding a few rows/columns to a common base matrix , which is close to in total variation. Hence, instead of (2.5) we obtain a coupling of the form

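$$A_{\varepsilon,n+1} = \begin{pmatrix} A' & 0 \\ B & C \end{pmatrix}, \qquad A_{\varepsilon,n} = \begin{pmatrix} A' \\ B' \end{pmatrix}. \tag{2.6}$$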

Since this coupling works for all , we replace (2.1) by

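$$\limsup_{n\to\infty} \frac{1}{n}\mathbb{E}[\operatorname{nul}(A_{\varepsilon,n})] \le \limsup_{n\to\infty} \mathbb{E}[\operatorname{nul}(A_{\varepsilon,n+1}) - \operatorname{nul}(A_{\varepsilon,n})]. \tag{2.7}$$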

Hence, in light of Proposition 2.1, the task of proving Theorem 1.1 comes down to establishing the following.

###### Proposition 2.2.

We have

The rest of the paper largely deals with the proof of Proposition 2.2. We begin by reviewing the approach pursued in  in the case of finite fields.

### 2.2. Finite fields and stochastic independence

The coupling scheme (2.6) basically reduces our task to computing the difference

$$\mathbb{E}\left[\operatorname{nul}\begin{pmatrix} A' & 0 \\ B & C \end{pmatrix} - \operatorname{nul} A'\right],$$

where are sparse. For finite fields computing the difference of the nullities is equivalent to computing

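$$\left|\ker\begin{pmatrix} A' & 0 \\ B & C \end{pmatrix}\right| \Big/ \left|\ker A'\right| = \frac{1}{|\ker A'|} \sum_{\sigma \in \ker A',\ \tau \in \mathbb{F}} \mathbb{1}\left\{ (B\ \ C)\binom{\sigma}{\tau} = 0 \right\} \tag{2.8}$$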

The right hand side of (2.8) admits a probabilistic interpretation. Indeed, the expression just equals the expected number of ways in which a uniformly random extends to a vector in the kernel of the enhanced matrix obtained by attaching . Clearly, in order to calculate this expectation we need to confront the stochastic dependencies among the entries of . To be precise, with the set of columns where has a non-zero entry, we need to get a handle on the stochastic dependencies among . Crucially, the expected size of is bounded.

A key lemma from  deals with dependencies among bounded numbers of entries of . Specifically, the following definition introduces a small perturbation, applicable to any matrix, which, we will show, mostly eliminates dependencies among small numbers of coordinates.

###### Definition 2.3.

Let be an matrix and let be an integer. Let be uniformly random and mutually independent column indices. Then is obtained by adding new rows to such that for each the th new row has precisely one non-zero entry, namely a one in the th column.
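The perturbation of Definition 2.3 simply appends unit rows at random positions, which pegs the corresponding variables to zero. A minimal sketch, assuming the matrix is a list of rows; the name `perturb` and the choice to also return the pegged column indices are made here for illustration.

```python
import random


def perturb(matrix, theta, rng):
    """Append theta unit rows whose single one-entry sits in a uniformly random column."""
    n = len(matrix[0])
    pegged = [rng.randrange(n) for _ in range(theta)]  # independent, repetitions allowed
    unit_rows = [[1 if j == i else 0 for j in range(n)] for i in pegged]
    return [list(row) for row in matrix] + unit_rows, pegged


A = [[1, 1, 0],
     [0, 1, 1]]
A_pert, pegged = perturb(A, 2, random.Random(1))
print(A_pert, pegged)
```

Each appended row encodes the equation that one randomly chosen variable equals zero, so every vector in the kernel of the perturbed matrix vanishes on the pegged coordinates.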

Let denote the uniform distribution on .

###### Lemma 2.4 ([4, Lemma 3.1]).

For any , and for any finite field there exists such that for any matrix over the following is true. Choose uniformly at random. Then with probability at least the matrix satisfies

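$$\sum_{I \subseteq [n] : |I| = \ell}\ \max_{\tau \in \mathbb{F}^{I}} \Bigl| \mu_{A[\theta]}(\{\forall i \in I : \sigma_i = \tau_i\}) - \prod_{i \in I} \mu_{A[\theta]}(\{\sigma_i = \tau_i\}) \Bigr| < \delta n^{\ell}. \tag{2.9}$$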

In words, for most sets of coordinates the joint distribution of the coordinates is close to a product distribution in total variation distance. Furthermore, the number of rows that we add to is bounded in terms of only; i.e., does not depend on the size of or on the matrix itself. Lemma 2.4 and its proof are inspired by the ‘pinning lemma’ from .

Equipped with Lemma 2.4,  proceeds by applying the Aizenman-Sims-Starr scheme to the enhanced matrix , with a suitable choice of the parameter . Just as in (2.6), the matrices are obtained by attaching sparse random to a common base matrix . One thus has to control the joint distribution of a random on the columns where features a non-zero entry (and similarly for ). Since is typically bounded and the set turns out to be ‘sufficiently random’, thanks to Lemma 2.4 the may be treated as stochastically independent, which enables the proof of Proposition 2.2 for finite fields.

Unfortunately, this strategy breaks down on infinite fields . The simple reason is that the kernel of may very well be infinite. In effect, expressions such as (2.8) do not make sense. Furthermore, as generally there is no such thing as the ‘uniform distribution on the kernel’, (2.9) does not make much sense either.

### 2.3. General fields and linear independence

We overcome these difficulties by working with linear rather than stochastic independence. While Lemma 2.4 shows that the perturbation likely eliminates most stochastic dependencies amongst small numbers of entries of a random , the key insight in the present work is that the very same perturbation also eliminates most short linear dependencies. This algebraic approach works over any field, not just a finite one. The following definition furnishes the necessary terminology. Recall that the support of a vector is defined as .

###### Definition 2.5.

Let be an -matrix over a field .

• A set is a relation of if there exists a row vector such that .

• If is a relation of , then we call frozen in . Let be the set of all frozen .

• A set is a proper relation of if is a relation of .

• For , we say that is -free if there are no more than proper relations of size .
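The notions in Definition 2.5 can be tested mechanically on small examples. The sketch below rests on our reading of the definition (a set $I$ is a relation if some linear combination of the rows has non-empty support contained in $I$) and on the elementary reformulation that this happens exactly when deleting the columns indexed by $I$ lowers the rank. It works over the rationals with exact fractions; all function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank over the rationals via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_relation(A, I):
    """True iff some row combination y satisfies: supp(yA) is non-empty and inside I."""
    I = set(I)
    if not I:
        return False
    rest = [[v for j, v in enumerate(row) if j not in I] for row in A]
    return rank(A) > rank(rest)


def frozen(A):
    """Indices i such that {i} is a relation, i.e. x_i = 0 on the whole kernel."""
    return {i for i in range(len(A[0])) if is_relation(A, {i})}


def is_proper_relation(A, I):
    """True iff I with the frozen indices removed is still a relation."""
    return is_relation(A, set(I) - frozen(A))


A = [[1, 0, 0],   # x_0 = 0: variable 0 is frozen
     [0, 1, 1]]   # x_1 + x_2 = 0: a relation among variables 1 and 2
print(frozen(A), is_proper_relation(A, {0, 1}), is_proper_relation(A, {1, 2}))
```

In the example, variable 0 is frozen. The set {0, 1} is a relation only because it contains the frozen index, so it is not proper, whereas {1, 2} is a proper relation.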

If a set is a relation of , then by adding up suitable multiples of the rows of the homogeneous linear system we can infer a non-trivial linear relation among the variables only. In the simplest case the set may be a singleton. Then the resulting linear relation involves only. Thus, the th component of any vector in the kernel of must be equal to zero. In this case we say that variable is frozen. Further, excluding frozen variables, a proper relation of renders a non-trivial dependency amongst two or more of the variables. Finally, is -free if only few -subsets induce a proper relation. With these concepts in place, the promised algebraic generalisation of Lemma 2.4 reads as follows.

###### Proposition 2.6.

For any , there exists such that for any matrix over any field the following is true. With chosen uniformly at random, is -free with probability greater than .

The proof of Proposition 2.6, which we defer to Section 3, relies on a potential function argument. Proposition 2.6 is an actual generalisation of Lemma 2.4; for if is finite and is -free, then the bound (2.9) is easily verified to hold as well. Indeed, as we will see in Section 5, unless is a proper relation we have

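$$\mu_{A[\theta]}(\{\forall i \in I : \sigma_i = \tau_i\}) = \prod_{i \in I} \mu_{A[\theta]}(\{\sigma_i = \tau_i\}) \quad \text{for all } \tau \in \mathbb{F}^{I}.$$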

Thus, the absence of linear dependencies implies that of stochastic ones.

We prove Proposition 2.2 for general fields by combining Proposition 2.6 with the Aizenman-Sims-Starr coupling argument. As we saw in Section 2.1, this comes down to studying the change of the rank upon addition of a few rows or columns. The following lemma shows how this calculation can be performed in the absence of proper relations.

###### Lemma 2.7.

Let be matrices of size , and , respectively, and let be the set of all indices of non-zero columns of . Moreover, obtain from by replacing for each the ’th column of by zero. Unless is a proper relation of we have

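$$\operatorname{nul} A - \operatorname{nul}\begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = \operatorname{rk}(B'\ \ C) - n'. \tag{2.10}$$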

Thus, we will use Proposition 2.6 to ensure that most likely the random set of non-zero columns of the matrix that we attach to does not form a proper relation. Then Lemma 2.7 shows that we can calculate the change in rank by merely considering the set of frozen variables and the rank of the matrix that we attach to .
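The identity (2.10) can be checked by hand on a toy instance. The sketch below computes both sides over the rationals, under our reading of Lemma 2.7: $B'$ is obtained from $B$ by zeroing the columns of frozen variables of $A$, and $n'$ is the number of columns of $C$. The helper names and the example matrices are choices made here.

```python
from fractions import Fraction


def rank(rows):
    """Rank over the rationals via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def nullity(rows):
    return len(rows[0]) - rank(rows)


def frozen(A):
    """Column i is frozen iff deleting it lowers the rank (x_i = 0 on the kernel)."""
    full = rank(A)
    return {i for i in range(len(A[0]))
            if rank([[v for j, v in enumerate(r) if j != i] for r in A]) < full}


def lemma_2_7_sides(A, B, C):
    """Return (left-hand side, right-hand side) of (2.10)."""
    F = frozen(A)
    B_prime = [[0 if j in F else v for j, v in enumerate(row)] for row in B]
    big = [row + [0] * len(C[0]) for row in A] + [b + c for b, c in zip(B, C)]
    lhs = nullity(A) - nullity(big)
    rhs = rank([b + c for b, c in zip(B_prime, C)]) - len(C[0])
    return lhs, rhs


A = [[1, 0, 0], [0, 1, 1]]                       # variable 0 is frozen
print(lemma_2_7_sides(A, [[1, 1, 0]], [[1]]))    # non-zero columns {0, 1}: not a proper relation
print(lemma_2_7_sides(A, [[0, 1, 1]], [[0]]))    # non-zero columns {1, 2}: a proper relation
```

In the first call both sides agree, as the lemma predicts. In the second call the set of non-zero columns of $B$ is a proper relation of $A$ and the two sides differ, which illustrates why the hypothesis cannot simply be dropped.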

Before delving into the technical details of the proofs of Propositions 2.2 and 2.6 and Lemma 2.7 we show how to complete the proof of Theorem 1.1.

### 2.4. Proof of Theorem 1.1

We require the following concentration bound for the nullity of .

###### Lemma 2.8 ([11, Lemma 5.7]).

For any we have .

Theorem 1.1 is now an immediate consequence of Proposition 2.1, Proposition 2.2 and Lemma 2.8. Indeed, Proposition 2.2 implies together with (2.7) that

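$$\limsup_{\varepsilon \to 0} \lim_{n\to\infty} \frac{1}{n}\mathbb{E}[\operatorname{nul}(A_{\varepsilon,n})] \le \max_{\alpha\in[0,1]} \Phi(\alpha).$$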

Hence, Lemma 2.8 shows that for any for sufficiently small we have

$$\limsup_{n\to\infty} \mathbb{P}\Bigl[\operatorname{nul}(A_{\varepsilon,n})/n > \delta + \max_{\alpha\in[0,1]} \Phi(\alpha)\Bigr] = 0.$$

Combining this bound with Proposition 2.1, we conclude that for any for small enough ,

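$$\liminf_{n\to\infty} \mathbb{P}\Bigl[\operatorname{rk}(A_n)/n \ge 1 - \max_{\alpha\in[0,1]} \Phi(\alpha) - \delta\Bigr] = \liminf_{n\to\infty} \mathbb{P}\Bigl[\operatorname{nul}(A_n)/n \le \delta + \max_{\alpha\in[0,1]} \Phi(\alpha)\Bigr] = 1. \tag{2.11}$$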

Since the upper bound on the rank follows from , (2.11) implies that converges to in probability, as claimed. ∎

## 3. Algebraic considerations

In this section we prove Proposition 2.6 and Lemma 2.7. The somewhat delicate proof of the former is based on a blend of probabilistic and algebraic arguments. The proof of the latter is purely algebraic and fairly elementary.

### 3.1. Proof of Proposition 2.6

Choosing sufficiently large, we may safely assume that the number of columns of the matrix is large enough. Moreover, given any matrix we define a minimal -relation of as a relation of of size that does not contain a proper subset that is a relation of . Let be the set of all minimal -relations of and set . Thus, is just the number of frozen variables of . Additionally, let and .

The proof of Proposition 2.6 is based on a potential function argument. To get started we observe that

$$\mathcal{R}_1(A[t]) \subseteq \mathcal{R}_1(A[t+1]) \quad \text{for all } t \ge 0. \tag{3.1}$$

This inclusion implies that the random variable

$$\Delta_t = \frac{\mathbb{E}[R_1(A[t+\ell]) \mid A[t]] - R_1(A[t])}{n}$$

is non-negative. The random variable gauges the increase in frozen variables upon addition of more rows that expressly freeze specific variables. Thus, ‘big’ values of , say , witness a kind of instability as pegging a few variables to zero entails that another variables get frozen to zero due to implicit linear relations. We will exploit the observation that, since and is monotonically increasing in , such instabilities cannot occur for many . Thus, the expectation will serve as our potential. A similar potential was used in  to prove Lemma 2.4; but in the present more general context the analysis of the potential is significantly more subtle. The following lemma puts a lid on the potential.

###### Lemma 3.1.

We have $E[\Delta_\theta] \le \ell/\Theta$.

###### Proof.

For any $r\ge 0$ we have

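$$\sum_{j\ge 0} E[\Delta_{r+j\ell}] = \frac{1}{n}\sum_{j\ge 0}\Big(E\left[R_1(A[r+(j+1)\ell])\right]-E\left[R_1(A[r+j\ell])\right]\Big) \le \frac{1}{n}\lim_{j\to\infty} E\left[R_1(A[r+j\ell])\right] = 1.$$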

Summing this bound over $r=0,\ldots,\ell-1$, we obtain

$$\sum_{\theta\in[\Theta]} E[\Delta_\theta] \le \sum_{r=0}^{\ell-1}\sum_{j\ge 0} E[\Delta_{r+j\ell}] \le \ell. \tag{3.2}$$

Since $\theta\in[\Theta]$ is chosen uniformly and independently of everything else, dividing (3.2) by $\Theta$ completes the proof. ∎
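The telescoping argument can be watched at work in a toy simulation. The sketch below is only an illustration under stated assumptions, not the model analysed in the paper: it works over GF(2), draws a matrix with `m` rows of `k` ones each (a hypothetical stand-in for the prescribed-degree model), takes $A[t]$ to be $A$ with $t$ appended unit rows that peg uniformly random variables to zero, and calls a variable frozen if its unit vector lies in the row space. All function names are made up for this example.

```python
import random


def insert(basis, r):
    """Insert bitmask row r into an XOR basis keyed by leading bit."""
    while r:
        top = r.bit_length() - 1
        if top not in basis:
            basis[top] = r
            return True
        r ^= basis[top]
    return False


def in_span(basis, r):
    """Return True if bitmask row r lies in the GF(2) span of the basis."""
    while r:
        top = r.bit_length() - 1
        if top not in basis:
            return False
        r ^= basis[top]
    return True


def frozen_count(rows, n):
    """Number of variables v whose unit vector e_v is in the row space."""
    basis = {}
    for r in rows:
        insert(basis, r)
    return sum(in_span(basis, 1 << v) for v in range(n))


def frozen_trajectory(n, m, k, steps, seed=0):
    """Frozen-variable counts R_1(A[0]), ..., R_1(A[steps]) in the toy model."""
    rng = random.Random(seed)
    # Toy sparse matrix: m rows, each with k ones in random positions.
    rows = [sum(1 << c for c in rng.sample(range(n), k)) for _ in range(m)]
    counts = []
    for _ in range(steps + 1):
        counts.append(frozen_count(rows, n))
        # Peg a uniformly random variable to zero by appending a unit row.
        rows.append(1 << rng.randrange(n))
    return counts


if __name__ == "__main__":
    n = 40
    counts = frozen_trajectory(n=n, m=20, k=3, steps=30, seed=1)
    increments = [(b - a) / n for a, b in zip(counts, counts[1:])]
    print(counts)
    print("sum of normalised increments:", sum(increments))
```

Because the counts are non-decreasing and never exceed `n`, the normalised increments sum to at most 1, which is the deterministic analogue of the bound used in the proof above.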

The following lemma shows that unless $A[t]$ is $(\varepsilon,\ell)$-free, there exist many minimal $h$-relations for some $2\le h\le\ell$.

###### Lemma 3.2.

If $A[t]$ fails to be $(\varepsilon,\ell)$-free then there exists $2\le h\le\ell$ such that $R_h(A[t])\ge\varepsilon n^h/\ell$.

###### Proof.

Assume that

$$R_h(A[t]) < \varepsilon n^h/\ell \quad\text{for all } 2\le h\le\ell. \tag{3.3}$$

Since every proper relation of size $\ell$ contains a minimal $h$-relation for some $2\le h\le\ell$, (3.3) implies that $A[t]$ possesses fewer than $\varepsilon n^\ell$ proper relations of size $\ell$ in total. Hence, if (3.3) holds, then $A[t]$ is $(\varepsilon,\ell)$-free. ∎
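The counting step in this proof can be spelled out as follows. This is a sketch under the assumption that $A[t]$ is $(\varepsilon,\ell)$-free when it has fewer than $\varepsilon n^\ell$ proper relations of size $\ell$; the definition itself is stated elsewhere in the paper and is not reproduced here.

```latex
% Assumption: (\varepsilon,\ell)-free means fewer than \varepsilon n^\ell
% proper relations of size \ell.
\begin{align*}
\#\{\text{proper relations of size } \ell\}
  &\le \sum_{h=2}^{\ell} R_h(A[t])\, n^{\ell-h}
  && \text{(extend a minimal $h$-relation by $\ell-h$ further indices)} \\
  &< \sum_{h=2}^{\ell} \frac{\varepsilon n^{h}}{\ell}\, n^{\ell-h}
  && \text{(by (3.3))} \\
  &= \frac{\ell-1}{\ell}\,\varepsilon n^{\ell} \;<\; \varepsilon n^{\ell}.
\end{align*}
```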

As a next step we show that $\Delta_t$ is large if $A[t]$ possesses many minimal $h$-relations for some $2\le h\le\ell$.

###### Lemma 3.3.

If $R_h(A[t])\ge\varepsilon n^h/\ell$ for some $2\le h\le\ell$, then $\Delta_t\ge\varepsilon^2/(2\ell^2)$.

###### Proof.

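Let $\mathcal{R}_{v,t,h}$ be the set of all relations $I\in\mathcal{R}_h(A[t])$ that contain $v$ and set $r_{v,t,h}=|\mathcal{R}_{v,t,h}|$. Moreover, let $V_{t,h}$ be the set of all $v$ with $r_{v,t,h}\ge\varepsilon h n^{h-1}/(2\ell)$. Since $R_h(A[t])\ge\varepsilon n^h/\ell$, double counting yields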

$$|V_{t,h}| \ge \frac{\varepsilon n}{2\ell}. \tag{3.4}$$

Consider along with a minimal -relation . If , i.e., comprises and the next indices that get pegged, then . Indeed, since is a minimal -relation of there is a row vector such that . Hence, if , then we can extend to a row vector such that by picking appropriate values for the last entries, and thus . Furthermore, since is uniformly random, we conclude that

$$P\left[I=\{v,i_{t+1},\ldots,i_{t+h-1}\} \mid A[t]\right] = (h-1)!/n^{h-1} \ge n^{1-h}. \tag{3.5}$$

Now, because every $v\in V_{t,h}$ satisfies $r_{v,t,h}\ge\varepsilon h n^{h-1}/(2\ell)$, (3.5) implies that

$$P\left[v\in\mathcal{F}(A[t+h-1]) \mid A[t]\right] \ge r_{v,t,h}/n^{h-1} \ge \varepsilon h/(2\ell). \tag{3.6}$$

We also notice that $V_{t,h}\cap\mathcal{F}(A[t])=\emptyset$ because no minimal $h$-relation with $h\ge 2$ contains a frozen variable. Therefore, combining (3.1), (3.4) and (3.6) and using linearity of expectation, we obtain

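$$\Delta_t \ge \frac{1}{n}\sum_{v\in V_{t,h}} P\left[v\in\mathcal{F}(A[t+h-1]) \mid A[t]\right] \ge \frac{\varepsilon h\,|V_{t,h}|}{2\ell n} \ge \frac{\varepsilon^2 h}{4\ell^2} \ge \frac{\varepsilon^2}{2\ell^2},$$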

as desired. ∎
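For readers who want the double counting behind (3.4) in full, here is one way to carry it out. It is a sketch that assumes $V_{t,h}$ collects the indices $v$ with $r_{v,t,h}\ge\varepsilon h n^{h-1}/(2\ell)$, which is the threshold used in (3.6), and it uses the trivial bound $r_{v,t,h}\le n^{h-1}$.

```latex
% Assumption: V_{t,h} = \{ v : r_{v,t,h} \ge \varepsilon h n^{h-1}/(2\ell) \}.
\begin{align*}
\sum_{v} r_{v,t,h} &= h\,R_h(A[t]) \;\ge\; \frac{\varepsilon h n^{h}}{\ell}
  && \text{(each minimal $h$-relation is counted once per element)} \\
\sum_{v\notin V_{t,h}} r_{v,t,h} &< n\cdot\frac{\varepsilon h n^{h-1}}{2\ell}
  = \frac{\varepsilon h n^{h}}{2\ell}
  && \text{(definition of $V_{t,h}$)} \\
|V_{t,h}|\, n^{h-1} &\ge \sum_{v\in V_{t,h}} r_{v,t,h}
  \;\ge\; \frac{\varepsilon h n^{h}}{2\ell}
  && \text{(since $r_{v,t,h}\le n^{h-1}$)} \\
|V_{t,h}| &\ge \frac{\varepsilon h n}{2\ell} \;\ge\; \frac{\varepsilon n}{2\ell}.
\end{align*}
```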

Combining Lemmas 3.2 and 3.3, we immediately obtain the following.

###### Corollary 3.4.

If $A[t]$ fails to be $(\varepsilon,\ell)$-free then $\Delta_t\ge\varepsilon^2/(2\ell^2)$.

We have all the ingredients in place to complete the proof of Proposition 2.6.

###### Proof of Proposition 2.6.

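We define $T$ as the set of all $t\in[\Theta]$ such that $P[A[t]\text{ fails to be }(\varepsilon,\ell)\text{-free}]\ge\varepsilon/2$, so that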

$$P\left[A[\theta]\text{ is }(\varepsilon,\ell)\text{-free}\right] > 1-\varepsilon/2-P[\theta\in T]. \tag{3.7}$$

Hence, we are left to estimate $P[\theta\in T]$. Applying Corollary 3.4, we obtain for every $t\in T$,

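$$E[\Delta_t] \ge \frac{\varepsilon^2}{2\ell^2}\cdot P\left[A[t]\text{ fails to be }(\varepsilon,\ell)\text{-free}\right] \ge \frac{\varepsilon^3}{4\ell^2}. \tag{3.8}$$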

Moreover, averaging (3.8) over $t\in T$ and applying Lemma 3.1, we obtain

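$$\frac{\varepsilon^3}{4\ell^2}\cdot P[\theta\in T] = \frac{\varepsilon^3}{4\ell^2}\cdot\frac{|T|}{\Theta} \le \frac{1}{\Theta}\sum_{t\in T} E[\Delta_t] \le E[\Delta_\theta] \le \frac{\ell}{\Theta}.$$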

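Consequently, choosing $\Theta$ sufficiently large (by the preceding display, $\Theta\ge 8\ell^3/\varepsilon^4$ suffices), we can ensure that $P[\theta\in T]\le\varepsilon/2$. Thus, the assertion follows from (3.7). ∎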

### 3.2. Proof of Lemma 2.7

We are going to derive Lemma 2.7 from the following simpler statement.

###### Lemma 3.5.

Let be a matrix, let be a matrix and let be a matrix. Let be the set of all indices of non-zero columns of . Unless is a relation of we have

$$\mathrm{nul}\,A-\mathrm{nul}\begin{pmatrix}A&0\\B&C\end{pmatrix} = \mathrm{rk}(B\ C)-n'. \tag{3.9}$$

###### Proof.

Suppose that is not a relation of . We begin by showing that

$$\mathrm{nul}\,A-\mathrm{nul}\begin{pmatrix}A\\B\end{pmatrix} = \mathrm{rk}(B). \tag{3.10}$$

Writing for the rows of and for the rank and applying a row permutation if necessary, we may assume that are linearly independent. Hence, to establish (3.10) it suffices to prove that for all ,

$$\mathrm{rk}\begin{pmatrix}A\\B_1\\\vdots\\B_\ell\end{pmatrix} = \mathrm{rk}(A)+\ell. \tag{3.11}$$

In other words, we need to show that does not belong to the space spanned by and the rows of . Indeed, assume that . Then and thus , in contradiction to the assumption that is no relation of . Hence, we obtain (3.11) and thus (3.10). Finally, to complete the proof of (3.9) we apply (3.10) to the matrices and , obtaining

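$$\mathrm{nul}\,A+n'-\mathrm{nul}\begin{pmatrix}A&0\\B&C\end{pmatrix} = \mathrm{nul}(A\ 0)-\mathrm{nul}\begin{pmatrix}A&0\\B&C\end{pmatrix} = \mathrm{rk}(B\ C),$$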

as desired. ∎
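The identity (3.9) is easy to sanity-check numerically. The sketch below does so over the rationals with exact arithmetic. It assumes the usual notion of a relation, namely that a set $I$ of column indices is a relation of $A$ if some row vector $y$ satisfies $\emptyset\ne\mathrm{supp}(yA)\subseteq I$; that definition is given elsewhere in the paper, and the helper names here are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank over the rationals via exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    ncols = len(M[0]) if M else 0
    rk = 0
    for c in range(ncols):
        piv = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk


def nullity(rows):
    """Dimension of the right kernel: number of columns minus rank."""
    return len(rows[0]) - rank(rows)


def is_relation(A, I):
    """Assumed definition: the row space of A contains a non-zero vector
    supported inside I, i.e. deleting the columns in I lowers the rank."""
    outside = [[x for j, x in enumerate(row) if j not in I] for row in A]
    return rank(A) > rank(outside)


def both_sides(A, B, C):
    """Return (left, right) sides of identity (3.9)."""
    n_prime = len(C[0])
    top = [list(ra) + [0] * n_prime for ra in A]
    bottom = [list(rb) + list(rc) for rb, rc in zip(B, C)]
    lhs = nullity(A) - nullity(top + bottom)
    rhs = rank(bottom) - n_prime
    return lhs, rhs


if __name__ == "__main__":
    # Non-zero columns of B are I = {2}, which is not a relation of A.
    A, B, C = [[1, 0, 0], [0, 1, 0]], [[0, 0, 1]], [[1]]
    print(is_relation(A, {2}), both_sides(A, B, C))
    # Here I = {0} is a relation of A, and the identity fails.
    A2, B2, C2 = [[1, 0, 0]], [[1, 0, 0]], [[0]]
    print(is_relation(A2, {0}), both_sides(A2, B2, C2))
```

The second example shows that the hypothesis cannot simply be dropped: once the non-zero columns of $B$ form a relation of $A$, the two sides of (3.9) may differ.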

###### Proof of Lemma 2.7.

For any the ’th standard unit row vector can be written as a linear combination of the rows of . Since elementary row operations do not alter the nullity of a matrix, we therefore find

$$\mathrm{nul}\begin{pmatrix}A&0\\B&C\end{pmatrix} = \mathrm{nul}\begin{pmatrix}A&0\\B'&C\end{pmatrix}.$$

The assertion thus follows from Lemma 3.5. ∎

## 4. Coupling arguments

In this section we prove Proposition 2.2 by coupling and . Fortunately, we can reuse some of the considerations from prior work, where the very same coupling was set up for random matrices over finite fields. Some parts of the technical legwork that was used, e.g., to bound the contribution from exceptional cases generalise to arbitrary fields without the need to change a single iota, and so we will simply refer to  for those bits. However, the calculations of the nullity that we conduct here differ significantly from those performed in . The reason is that we will seize the algebraic perspective provided by Proposition 2.6 and Lemma 2.7. Although the present arguments are more general as they cover both finite and infinite fields in one sweep, they are actually simpler and more transparent than their probabilistic counterparts in .

### 4.1. The coupling scheme

We proceed to construct the coupling of and in detail. As hinted at earlier in equation (2.6), we will actually construct a coupling under which both and