 # On the Bias of Reed-Muller Codes over Odd Prime Fields

We study the bias of random bounded-degree polynomials over odd prime fields and show that, with probability exponentially close to 1, such polynomials have exponentially small bias. This also yields an exponential tail bound on the weight distribution of Reed-Muller codes over odd prime fields. These results generalize bounds of Ben-Eliezer, Hod, and Lovett who proved similar results over F_2. A key to our bounds is the proof of a new precise extremal property for the rank of sub-matrices of the generator matrices of Reed-Muller codes over odd prime fields. This extremal property is a substantial extension of an extremal property shown by Keevash and Sudakov for the case of F_2. Our exponential tail bounds on the bias can be used to derive exponential lower bounds on the time for space-bounded learning of bounded-degree polynomials from their evaluations over odd prime fields.

## 1 Introduction

Reed-Muller codes are among the oldest error correcting codes, first introduced by Muller  and Reed  in the 1950s. These codes were initially defined in terms of bounded-degree multivariate polynomials over but the same definition can be applied over any finite field. To be more precise, the Reed-Muller code over finite field , denoted , takes the message as the coefficients of some -variate polynomial of degree at most over , and the encoding is simply the evaluation of that polynomial over all possible inputs chosen from .

A function $f$ into a finite field is balanced if all elements of its co-domain occur an equal number of times as outputs of $f$. The bias of $f$ is a measure of its fractional deviation from being balanced. Since each codeword in a Reed-Muller code is the evaluation of a (polynomial) function over all elements of its domain, the definition of bias directly applies to the codewords of a Reed-Muller code.

Some elements of a Reed-Muller code are very far from balanced (for example the 0 polynomial yields the all-0 codeword, and the codeword for the polynomial has value 1 much more frequently than average) but since, as we might expect, randomly-chosen polynomials behave somewhat like randomly-chosen functions, most codewords are close to being balanced. We quantify that statement and show that for all prime fields, only an exponentially small fraction of Reed-Muller codewords (equivalently, an exponentially small fraction of polynomials of bounded degree) have as much as an exponentially small deviation from perfect balance. That is, at most an exponentially small fraction of polynomials have more than an exponentially small bias. Such a result is already known for the case of   so we will only need to prove the statement for odd prime fields.

We now define bias formally and discuss its applications. In the case that , the bias of ,

$$\mathrm{bias}(f):=\frac{1}{2^n}\sum_{x\in\mathbb{F}_2^n}(-1)^{f(x)}=\Pr_{x\in_R\mathbb{F}_2^n}[f(x)=0]-\Pr_{x\in_R\mathbb{F}_2^n}[f(x)=1].$$

More generally, for $p$ a prime, $\omega=e^{2\pi i/p}$, and $j\in\mathbb{F}_p^*$, we define the $j$-th order bias of $f:\mathbb{F}_p^n\to\mathbb{F}_p$ as

$$\mathrm{bias}_j(f):=\frac{1}{p^n}\sum_{x\in\mathbb{F}_p^n}\omega^{j\cdot f(x)}.$$

Prior uses of bias over these larger co-domains often focus only on the case of a single  (e.g., [6, 11]) since they consider structural implications of bias. However, the use of different values of is essential for the applications of bias to bounding the imbalance of functions and codewords since, for , one can have functions with 1st-order bias 0 that are very far from balanced. It turns out that it is necessary and sufficient to bound for all (or, equivalently, all integers with since ) in order to bound the imbalance: A standard exponential summation argument (e.g., Proposition 2.1 in [2, 3]), shows that for every ,

$$\left|\Pr_{x\in_R\mathbb{F}_p^n}[f(x)=b]-\frac{1}{p}\right|\le\max_{j\in\mathbb{F}_p^*}|\mathrm{bias}_j(f)|.$$
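To make the definition concrete, the following is a minimal brute-force sketch that computes the $j$-th order bias of a function on $\mathbb{F}_p^n$ and compares the largest imbalance with the largest bias, as in the inequality above. It assumes $\omega=e^{2\pi i/p}$; the helper names `bias` and `max_imbalance` and the example polynomial $x_1x_2$ are illustrative choices, not taken from the paper, and the enumeration is only feasible for very small $p$ and $n$.

```python
import cmath
import math
from itertools import product

def bias(f, p, n, j):
    """j-th order bias of f: F_p^n -> F_p, computed by brute force."""
    # Assumption: omega is the primitive p-th root of unity e^{2 pi i / p}.
    omega = cmath.exp(complex(0, 2 * math.pi / p))
    total = sum(omega ** ((j * f(x)) % p) for x in product(range(p), repeat=n))
    return total / p ** n

def max_imbalance(f, p, n):
    """max over b of |Pr[f(x) = b] - 1/p| for uniform x in F_p^n."""
    counts = [0] * p
    for x in product(range(p), repeat=n):
        counts[f(x)] += 1
    return max(abs(c / p ** n - 1 / p) for c in counts)

p, n = 3, 2
f = lambda x: (x[0] * x[1]) % p  # hypothetical example polynomial x1 * x2

max_bias = max(abs(bias(f, p, n, j)) for j in range(1, p))
imbalance = max_imbalance(f, p, n)
print(max_bias, imbalance)
```

For this example the value 0 is taken 5 times out of 9, so the imbalance is $2/9$, while both non-trivial biases have absolute value $1/3$.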

For Reed-Muller codes, the bias of a codeword exactly determines its fraction (and hence number) of non-zero entries, which is called the weight of the codeword. (In the case of $\mathbb{F}_2$ the bias is determined by the weight but that is not true for $\mathbb{F}_p$ for odd prime $p$.) The distribution of weights of codewords in Reed-Muller codes over plays a critical role in many applications in coding theory and in many other applications in theoretical computer science. As a consequence, the weight distribution of Reed-Muller codes over has been the subject of considerable study. For degrees and , the exact weight distribution (and hence the distribution of the bias) for has been known for roughly 50 years [19, 16]. For other degrees, precise bounds are only known for weights up to 2.5 times the minimum distance of such codes [12, 13] but this is very far from the balanced regime.

For general constant degrees, Kaufman, Lovett and Porat  give a somewhat tight bound on the weight distribution for Reed-Muller codes over , and Abbe, Shpilka, and Wigderson  generalize the result to linear degrees. These results yield tail bounds for the number of codewords with bias approaching 0 and, using the cases for arbitrarily small constant bias, imply good bounds for list-decoding algorithms [9, 14].

Ben-Eliezer, Hod, and Lovett  proved sharper bounds showing that the fraction of codewords with more than exponentially small bias (of the form for constant ) is at most for constant where is the dimension of the code. (For they also showed that this fraction of codewords is tight by exhibiting a set of codewords in of size for that has such a bias.) This bound was used by [2, 3, 8] to show that learning bounded degree polynomials over from their evaluations with success probability requires space or time .

#### Our Results

We generalize the results of Ben-Eliezer, Hod, and Lovett  to show that only an exponentially small fraction of polynomials over prime fields can have non-negligible bias. Formally speaking, let denote the set of polynomials of degree at most in variables over , and let denote the set of monic monomials of degree at most in variables. (The Reed-Muller code has dimension and satisfies .)
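As an illustration of these sets, here is a small sketch that enumerates monic monomials by their exponent vectors and evaluates them at a point. It assumes that individual exponents are at most $p-1$ (since $x^p=x$ as a function on $\mathbb{F}_p$), so that distinct monomials give distinct functions; the function names `monomials` and `evaluate` are hypothetical.

```python
from itertools import product

def monomials(p, d, n):
    """Exponent vectors of monic monomials of total degree <= d in n variables.

    Assumption: each individual exponent is at most p - 1, because
    x^p = x as a function on F_p.
    """
    return [e for e in product(range(p), repeat=n) if sum(e) <= d]

def evaluate(e, x, p):
    """Value in F_p of the monomial with exponent vector e at the point x."""
    value = 1
    for exponent, coordinate in zip(e, x):
        value = value * pow(coordinate, exponent, p) % p
    return value

M = monomials(3, 2, 2)          # 1, x2, x2^2, x1, x1*x2, x1^2 over F_3
num_polynomials = 3 ** len(M)   # one coefficient in F_3 per monomial
print(len(M), num_polynomials)
```

Under this assumption a polynomial is identified with its coefficient vector, so the number of polynomials is $p$ raised to the number of monomials.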

Our main result is the following theorem:

###### Theorem 1.1.

For any there are constants depending on such that for any odd prime , for all integers and all , we have

$$\Pr_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|>p^{-c_1n/d}\right]\le p^{-c_2|\mathcal{M}_p(d,n)|}.$$

Using this theorem together with the methods of our companion paper [2, 3] or of , we obtain that any algorithm that learns polynomials over of degree at most with probability at least from their evaluations on random inputs either requires time or space . For the details, see .

The following corollary of Theorem 1.1 is also immediate:

###### Corollary 1.2.

For any there are constants such that for any odd prime and integers , with , the number of codewords of of weight at most is at most .

There is a limit to the amount that Theorem 1.1 can be improved, as shown by the following proposition:

###### Proposition 1.3.

For any there are constants and depending on such that for all integers and all , we have

$$\Pr_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|>p^{-c''n/d}\right]\ge p^{-c'|\mathcal{M}_p(d,n)|}.$$

As part of our proof of Theorem 1.1, we must prove the following tight bound on the rank of the evaluations of monomials of degree at most on sets of points. Alternatively this can be seen as the extremal dimension of the span of truncated Reed-Muller codes at sizes that are powers of the field size.

###### Lemma 1.4.

Let be a subset of such that . Then the dimension of the subspace spanned by is at least .

Though this is all that we require to prove Theorem 1.1, we prove it as a special case of a more general theorem that gives an exact extremal characterization of the dimension of the span of truncated Reed-Muller codes of all sizes. This generalizes a characterization for the case of proved by Keevash and Sudakov .

###### Theorem 1.5.

Let and let . For with ,

$$\dim\left\langle\{(q(x))_{q\in\mathcal{M}_p(d,n)} : x\in S\}\right\rangle\ge\dim\left\langle\{(q(x))_{q\in\mathcal{M}_p(d,r)} : x\in T\}\right\rangle,$$

where $T$ consists of the $|S|$ lexicographically minimal vectors in $\mathbb{F}_p^r$. (This is an equality when $S$ is also lexicographically minimal.)

Thus, the extremal value of the dimension is a function that is independent of . As part of the proof of Theorem 1.5, we characterize a variety of properties of .
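The extremal statement can be explored numerically on tiny instances. The sketch below computes the rank over $\mathbb{F}_p$ of the monomial evaluation vectors of a point set and compares every 3-point subset of $\mathbb{F}_3^2$ with the 3 lexicographically minimal points, for degree $d=1$. Several choices here are assumptions for illustration: exponents are capped at $p-1$, the set $T$ is taken inside $\mathbb{F}_p^n$ itself rather than $\mathbb{F}_p^r$, and all function names are hypothetical.

```python
from itertools import combinations, product

def monomials(p, d, n):
    # Assumption: individual exponents are at most p - 1.
    return [e for e in product(range(p), repeat=n) if sum(e) <= d]

def evaluate(e, x, p):
    value = 1
    for exponent, coordinate in zip(e, x):
        value = value * pow(coordinate, exponent, p) % p
    return value

def rank_mod_p(rows, p):
    """Rank of a matrix over F_p via Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [v * inv % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                c = rows[i][col]
                rows[i] = [(a - c * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def eval_dim(S, p, d, n):
    """Dimension of the span of the evaluation vectors (q(x))_q for x in S."""
    M = monomials(p, d, n)
    return rank_mod_p([[evaluate(e, x, p) for e in M] for x in S], p)

p, n, d, size = 3, 2, 1, 3
points = list(product(range(p), repeat=n))   # generated in lexicographic order
T = points[:size]                            # lexicographically minimal points
dim_T = eval_dim(T, p, d, n)
min_dim = min(eval_dim(S, p, d, n) for S in combinations(points, size))
print(dim_T, min_dim)
```

In this instance the three lexicographically minimal points lie on a line, so their evaluation vectors span only 2 dimensions, and no other 3-point set does worse.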

#### Proof Overview

Our basic approach is a generalization of the high level outline of  to odd prime fields, though parts of the argument are substantially more complex:

We begin by using a moment method, showing that the moment $\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}[|\mathrm{bias}_j(f)|^{t}]$ is bounded for suitable $t$. Because we are dealing with odd prime fields rather than $\mathbb{F}_2$ we restrict ourselves to the case that $t$ is even. For bounding these high moments, we reduce the problem to lower bounding the rank of certain random matrices (Lemma 2.4). This is the place where we can apply Lemma 1.4 to prove the bound.

For the case of handled in , a similar property to Lemma 2.4 (Lemma 4 in ), which follows from an extremal characterization of polynomial evaluations by Keevash and Sudakov , was independently shown to follow more simply via an algorithmic construction that avoids consideration of any subset size that is not a power of 2. Unfortunately, this simpler algorithmic construction seems to break down completely for the case of odd prime fields.

We instead provide the full extremal characterization for all set sizes, analogous to the Keevash and Sudakov characterization for . This is the major source of technical difficulty in our paper. Like Keevash and Sudakov, we show that the proof of our extremal characterization is equivalent to proving the sub-additivity of a certain arithmetic function. However, proving this sub-additivity property is an order of magnitude more involved since it involves sub-additivity over terms for arbitrary rather than just over the two terms required for the case of .

#### Discussion and Related Work

Prior to our work, the main approach to analyzing the bias of polynomials over arbitrary prime fields has been to take a structural point of view. The general idea is to show that polynomials of large bias must have this bias because of some structural property. For polynomials of degree , a complete structural characterization has been known for more than a century (). Green and Tao  initiated the modern study of the relationship between the bias and the structure of polynomials over finite fields. Kaufman, Lovett, and Porat  used this approach to obtain their bounds on bias over . Over general prime fields, Haramaty and Shpilka  gave sharper structural properties for polynomials of degrees . In papers  for constant degree and  for large degree, Bhowmick and Lovett generalized the result of  to show that if a degree polynomial has large bias, then can be expressed as a function of a constant number of polynomials of degree at most . These bounds are sufficient to analyze the list-decoding properties of Reed-Muller codes. However, all of these structural results, except for the characterization of degree 2 polynomials, are too weak to obtain the bounds on sub-constant bias that we derive. Indeed, none is sufficient even to derive Corollary 1.2.

An open problem that remains from our work, as well as that of Ben-Eliezer, Hod, and Lovett  is whether the amount of the bias can be improved still further by removing the factor from the exponent in the bias in the statement of Theorem 1.1 for some range of values of growing with . Though Proposition 1.3 (and its analogue in ) show that a large number of polynomials have bias , we would need to extend them to say that for all there is a such that the conclusion of the proposition holds in order to rule out improving the bias in Theorem 1.1.

#### Organization

The proof of Theorem 1.1, except for the proof of Lemma 1.4, is in Section 2. Section 2 also contains the proof of Proposition 1.3. In Section 3 we reduce the proof of Lemma 1.4, and that of the general extremal rank property of Theorem 1.5, to proving the sub-additivity of the arithmetic function . In Section 4 we introduce some properties of , and finally in Section 5 we prove the sub-additivity of .

## 2 The bias of random polynomials over odd prime fields

In this section we prove Theorem 1.1. To provide tail bounds on the bias, we first characterize its high moments, focusing on even moments to ensure that they are real-valued.

###### Lemma 2.1.

Let be an odd prime and . For , let and be chosen uniformly at random from . Then

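$$\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}\right]=\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\forall q\in\mathcal{M}_p(d,n),\ \sum_{k=1}^{t}q(x^{(k)})=\sum_{k=1}^{t}q(y^{(k)})\right].$$

###### Proof.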

Note that , therefore . So we have

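$$
\begin{aligned}
\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}\right]
&=\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[\mathrm{bias}_j(f)^{t}\cdot\mathrm{bias}_{-j}(f)^{t}\right]\\
&=\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[\prod_{k=1}^{t}\mathbb{E}_{x^{(k)}}\left[\omega^{j\cdot f(x^{(k)})}\right]\cdot\prod_{k=1}^{t}\mathbb{E}_{y^{(k)}}\left[\omega^{-j\cdot f(y^{(k)})}\right]\right]\\
&=\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[\mathbb{E}_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\omega^{j\cdot\left(\sum_{k=1}^{t}f(x^{(k)})-\sum_{k=1}^{t}f(y^{(k)})\right)}\right]\right]\\
&=\mathbb{E}_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[\omega^{j\cdot\left(\sum_{k=1}^{t}f(x^{(k)})-\sum_{k=1}^{t}f(y^{(k)})\right)}\right]\right]
\end{aligned}
$$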

For each let denote the coefficient of in . We identify with its vector of coefficients and choose uniformly by choosing the uniformly. Therefore

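$$
\begin{aligned}
\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}\right]
&=\mathbb{E}_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[\omega^{j\cdot\sum_{q\in\mathcal{M}_p(d,n)}f_q\cdot\left(\sum_{k=1}^{t}q(x^{(k)})-\sum_{k=1}^{t}q(y^{(k)})\right)}\right]\right]\\
&=\mathbb{E}_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\prod_{q\in\mathcal{M}_p(d,n)}\mathbb{E}_{f_q\in_R\mathbb{F}_p}\left[\omega^{j\cdot f_q\cdot\left(\sum_{k=1}^{t}q(x^{(k)})-\sum_{k=1}^{t}q(y^{(k)})\right)}\right]\right]\\
&=\mathbb{E}_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\mathbf{1}\left(\forall q\in\mathcal{M}_p(d,n),\ \sum_{k=1}^{t}q(x^{(k)})-\sum_{k=1}^{t}q(y^{(k)})=0\right)\right]\\
&=\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\forall q\in\mathcal{M}_p(d,n),\ \sum_{k=1}^{t}q(x^{(k)})=\sum_{k=1}^{t}q(y^{(k)})\right]
\end{aligned}
$$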

where the second equality follows since for all . ∎
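The identity of Lemma 2.1 can be checked exhaustively on a tiny instance. The sketch below computes the left-hand side by enumerating all coefficient vectors and the right-hand side by counting collisions between the monomial-sum vectors of $t$-tuples of points. It assumes $\omega=e^{2\pi i/p}$ and monomials with individual exponents at most $p-1$; the names `moment_lhs` and `collision_rhs` are hypothetical, and the enumeration scales exponentially, so it is only a sanity check.

```python
import cmath
import math
from collections import Counter
from itertools import product

def monomials(p, d, n):
    # Assumption: individual exponents are at most p - 1.
    return [e for e in product(range(p), repeat=n) if sum(e) <= d]

def evaluate(e, x, p):
    value = 1
    for exponent, coordinate in zip(e, x):
        value = value * pow(coordinate, exponent, p) % p
    return value

def eval_table(p, d, n):
    """One row per point x of F_p^n, holding (q(x)) for all monomials q."""
    M = monomials(p, d, n)
    return [[evaluate(e, x, p) for e in M] for x in product(range(p), repeat=n)]

def moment_lhs(p, d, n, t, j):
    """E_f[|bias_j(f)|^(2t)] over uniformly random coefficient vectors."""
    table = eval_table(p, d, n)
    roots = [cmath.exp(complex(0, 2 * math.pi * k / p)) for k in range(p)]
    num_coeffs = len(table[0])
    total = 0.0
    for coeffs in product(range(p), repeat=num_coeffs):
        s = sum(roots[(j * sum(c * v for c, v in zip(coeffs, row))) % p]
                for row in table)
        total += abs(s / len(table)) ** (2 * t)
    return total / p ** num_coeffs

def collision_rhs(p, d, n, t):
    """Pr[for all q: sum_k q(x^(k)) = sum_k q(y^(k))] for uniform points."""
    table = eval_table(p, d, n)
    sums = Counter()
    for rows in product(table, repeat=t):
        sums[tuple(sum(col) % p for col in zip(*rows))] += 1
    num_tuples = len(table) ** t
    return sum(c * c for c in sums.values()) / num_tuples ** 2

lhs = moment_lhs(3, 2, 2, 2, 1)
rhs = collision_rhs(3, 2, 2, 2)
print(lhs, rhs)
```

The right-hand side is computed as a collision probability: two independent $t$-tuples satisfy all the constraints exactly when their vectors of monomial sums coincide.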

Now let us look at the probability

$$\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\forall q\in\mathcal{M}_p(d,n),\ \sum_{k=1}^{t}q(x^{(k)})=\sum_{k=1}^{t}q(y^{(k)})\right].$$

We view as arbitrary fixed values and we will upper bound this probability following the analysis of a similar probability in . That is, we will upper bound the probability that this holds by considering a special subset that allows us to derive a linear system whose rank will bound the probability that the constraints indexed by all hold.

We divide the $n$ variables arbitrarily into two disjoint parts $L$ and $R$ with $|L|=\lfloor n/d\rfloor$. $\mathcal{M}'$ consists of all monomials of degree at most $d$ that have degree 1 on $L$ and degree at most $d-1$ on $R$.

We use the following properties of the , whose proof we defer to later, to show that contains a significant fraction of all monomials in .

###### Proposition 2.2.

If for some then

• there exists a constant such that for sufficiently large , if then

$$|\mathcal{M}_p(d,n')|\ge\gamma'|\mathcal{M}_p(d,n)|.$$

• If there exist constants such that for sufficiently large ,

$$\rho_1|\mathcal{M}_p(d,n)|\le\frac{n}{d}\cdot|\mathcal{M}_p(d-1,n)|\le\rho_2|\mathcal{M}_p(d,n)|.$$

###### Corollary 2.3.

Let . If for some , then there exists a constant such that for sufficiently large ,

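$$|\mathcal{M}'|=\left\lfloor\frac{n}{d}\right\rfloor\cdot\left|\mathcal{M}_p\left(d-1,\,n-\left\lfloor\frac{n}{d}\right\rfloor\right)\right|\ge\gamma\cdot|\mathcal{M}_p(d,n)|.$$

###### Proof.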

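The equality follows immediately from the definition of $\mathcal{M}'$. Let $n'=n-\lfloor n/d\rfloor$. Then

$$
\begin{aligned}
|\mathcal{M}'|&=\left\lfloor\frac{n}{d}\right\rfloor\cdot|\mathcal{M}_p(d-1,n')|\\
&\ge\frac{n'}{2d}\,|\mathcal{M}_p(d-1,n')|&&\text{since } d\le n\\
&\ge\frac{\rho_1}{2}\,|\mathcal{M}_p(d,n')|&&\text{by Proposition 2.2(b)}\\
&\ge\frac{\rho_1\gamma'}{2}\,|\mathcal{M}_p(d,n)|&&\text{by Proposition 2.2(a)}
\end{aligned}
$$

and setting $\gamma=\rho_1\gamma'/2$ yields the claim. ∎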

Let denote the event that for all . To simplify notation, since we think of as fixed, for each define by . Since any is of the form for some and a monomial of degree at most on , requires that

$$b_q=\sum_{k=1}^{t}q(x^{(k)})=\sum_{k=1}^{t}q'(x^{(k)}_R)\cdot x^{(k)}_i,$$

where for , we write for restricted to the coordinates in . We view these constraints as a system of linear equations over the set of variables for and whose coefficients are given by the values of for for all . Observe that for different values of we get separate and independent subsystems of equations with precisely the same coefficients but potentially different constant terms since depends on both and . Therefore the probability that is a solution is the product of the probabilities for the individual choices of .

For each , there is a matrix for a system of linear equations on for each , having one constraint for each polynomial of degree at most on . Observe that .

In particular, it follows that

$$\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\mathcal{E}\mid(x^{(1)}_R,\ldots,x^{(t)}_R)=x_R\right]\le p^{-\mathrm{rank}(Q_{x_R})\cdot|L|}.\tag{1}$$

We now see that for almost all choices of , if is at least a constant factor larger than then the rank of is large. This follows by replacing by , by , by and by in the following lemma.

###### Lemma 2.4.

For any there is a constant such that there exist constants and such that for and , if is chosen uniformly at random from , then the matrix given by satisfies

$$\Pr_{x}\left[\mathrm{rank}(Q_x)\le\gamma|\mathcal{M}_p(d,n)|\right]\le p^{-c|\mathcal{M}_p(d+1,n)|}.$$

We first show how to use Lemma 2.4 to prove Theorem 1.1.

###### Proof of Theorem 1.1.

Let , and set and and as in Lemma 2.4. Let . We first bound the expected value of . By Lemma 2.1 and the definition of event we have

$$\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}\right]=\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}[\mathcal{E}].$$

Let and . Now by definition,

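$$
\begin{aligned}
\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}[\mathcal{E}]
&\le\Pr_{x_R}\left[\mathrm{rank}(Q_{x_R})\le\gamma|\mathcal{M}_p(d',n')|\right]\\
&\quad+\Pr_{x^{(1)},\ldots,x^{(t)},y^{(1)},\ldots,y^{(t)}}\left[\mathcal{E}\text{ and }\mathrm{rank}(Q_{x_R})>\gamma|\mathcal{M}_p(d',n')|\text{ for }x_R=(x^{(1)}_R,\ldots,x^{(t)}_R)\right]\\
&\le\Pr_{x_R}\left[\mathrm{rank}(Q_{x_R})\le\gamma|\mathcal{M}_p(d',n')|\right]+p^{-\gamma|\mathcal{M}_p(d',n')|\cdot|L|}
\end{aligned}
$$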

by (1). Observe that so we can apply Lemma 2.4 with , , and to derive that

$$\Pr_{x_R}\left[\mathrm{rank}(Q_{x_R})\le\gamma|\mathcal{M}_p(d',n')|\right]\le p^{-c|\mathcal{M}_p(d'+1,n')|}.$$

Therefore,

$$\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}\right]\le p^{-c|\mathcal{M}_p(d,n')|}+p^{-\gamma|\mathcal{M}_p(d-1,n')|\cdot|L|}.\tag{2}$$

Now, for sufficiently large , by Proposition 2.2(a), and by Corollary 2.3 . Therefore,

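$$\mathbb{E}_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}\right]\le p^{-c\gamma'|\mathcal{M}_p(d,n)|}+p^{-\gamma^{2}|\mathcal{M}_p(d,n)|}\le p^{-c'|\mathcal{M}_p(d,n)|}$$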

for some constant . Now we can apply Markov’s inequality to obtain that for any .

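$$
\begin{aligned}
\Pr_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|>p^{-c_1n/d}\right]
&=\Pr_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|^{2t}>p^{-2t\cdot c_1n/d}\right]\\
&\le\frac{p^{-c'|\mathcal{M}_p(d,n)|}}{p^{-2t\cdot c_1n/d}}\\
&=p^{2t\cdot c_1n/d-c'|\mathcal{M}_p(d,n)|}
\end{aligned}
$$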

By definition, for a fixed . Therefore, by Proposition 2.2(b), . By choosing , we obtain that and setting we derive that

$$\Pr_{f\in_R\mathcal{P}_p(d,n)}\left[|\mathrm{bias}_j(f)|>p^{-c_1n/d}\right]\le p^{-c_2|\mathcal{M}_p(d,n)|}$$

as required. ∎

It remains to prove Lemma 2.4 and Proposition 2.2. We first prove Lemma 2.4 using Lemma 1.4. The proof of Lemma 1.4 is quite involved and forms the bulk of the paper; it is given in the subsequent sections.

###### Proof of Lemma 2.4 using Lemma 1.4.

Let for and let be the minimum of from Proposition 2.2 and from Corollary 2.3. Fix . We will first check the probability that an arbitrary fixed set of columns spans the whole matrix, and then apply a union bound to obtain the final result.

Let denote the linear space spanned by those columns. Recall that each column of is the evaluation of all monomials of degree at most at some point . (Since , distinct elements of have distinct evaluations.)

Let integer be maximal such that there are at least distinct elements of with evaluations that are in . Then by Lemma 1.4, we have . But since can be spanned by vectors, we have

$$\gamma|\mathcal{M}_p(d,n)|\ge b\ge\dim(V)\ge|\mathcal{M}_p(d,r)|$$

By Proposition 2.2(a), we have

$$|\mathcal{M}_p(d,\lceil n(1-1/d)\rceil)|\ge\gamma|\mathcal{M}_p(d,n)|\ge|\mathcal{M}_p(d,r)|$$

So . There are distinct evaluations and fewer than of them fall into . So a uniform random evaluation is in with probability . Since the other columns of are chosen uniformly and independently, the probability that these columns span the whole matrix is at most

$$\left(p^{1-\lfloor n/d\rfloor}\right)^{t-b}\le\left(p^{1-\lfloor n/d\rfloor}\right)^{(\eta-\gamma)|\mathcal{M}_p(d,n)|}$$

since for some to be chosen later. Since , we have and we can apply Proposition 2.2 to get that

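$$\left(p^{1-\lfloor n/d\rfloor}\right)^{t-b}\le p^{-(\eta-\gamma)\frac{n}{d}\cdot|\mathcal{M}_p(d,n)|/2}\le p^{-(\eta-\gamma)\rho_1|\mathcal{M}_p(d+1,n)|/2}$$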

for some . Therefore, by a union bound over all choices of columns we have

Note that , so we have

Note that for any constant , is . Therefore, for fixed constant , we can choose a sufficiently large such that

for some constant . ∎

### 2.1 Proof of Proposition 2.2

We first give basic inequalities regarding that are independent of the choice of .

###### Proposition 2.5.

For ,

$$\sum_{i=0}^{d}\binom{n}{i}\le|\mathcal{M}_p(d,n)|\le\sum_{i=0}^{d}\binom{n-1+i}{i}=\binom{n+d}{d}$$

###### Proof.

It is well known that there are non-negative integer solutions to the equation . Thus by iterating degrees we have

$$|\mathcal{M}_p(d,n)|\le\sum_{i=0}^{d}\binom{n-1+i}{i}=\binom{n+d}{d}$$

On the other hand, if we only consider multi-linear terms, we will get

$$\sum_{i=0}^{d}\binom{n}{i}\le|\mathcal{M}_p(d,n)|.$$
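The two bounds of Proposition 2.5 are easy to confirm numerically for small parameters. The following sketch counts monomials by brute force, assuming individual exponents are at most $p-1$, and compares the count with the multilinear lower bound and the unrestricted upper bound; it also checks the binomial identity used for the upper bound. The function names and the parameter ranges are illustrative choices.

```python
from itertools import product
from math import comb

def count_monomials(p, d, n):
    """Number of exponent vectors in {0..p-1}^n with total degree <= d.

    Assumption: individual exponents are capped at p - 1.
    """
    return sum(1 for e in product(range(p), repeat=n) if sum(e) <= d)

def check_bounds(primes=(3, 5), max_n=4):
    """Return (p, d, n, lower, size, upper) for every small case tested."""
    results = []
    for p in primes:
        for n in range(1, max_n + 1):
            for d in range(1, n + 1):
                lower = sum(comb(n, i) for i in range(d + 1))   # multilinear terms
                upper = comb(n + d, d)                          # all monomials
                # Identity: sum_i C(n-1+i, i) = C(n+d, d).
                assert sum(comb(n - 1 + i, i) for i in range(d + 1)) == upper
                results.append((p, d, n, lower, count_monomials(p, d, n), upper))
    return results

results = check_bounds()
for p, d, n, lower, size, upper in results:
    print(p, d, n, lower, size, upper)
```

The upper bound is attained whenever $d\le p-1$, since the exponent cap is then never active.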

We now prove part (a): For where , let denote the set of monomials of the form , . Then we have . Therefore

$$\frac{|\mathcal{M}_p(d,n')|}{|\mathcal{M}_p(d,n)|}\ge\min_{e:\sum_i e_i\le d}\frac{|M_{e,n'}|}{|M_{e,n}|}$$

Now, for fixed , consider the following process to generate elements in : we first choose elements from , then apply a permutation over to get . We claim that this process generates each monomial in equally many times, if we go over all elements and all permutations. Indeed, for arbitrary monomials and , can be generated by if and only if can be generated by . Moreover, the number of occurrences of each monomial is precisely the number of satisfying permutations, hence only depends on . Therefore, we have

$$\frac{|M_{e,n'}|}{|M_{e,n}|}=\frac{\binom{n'}{k}}{\binom{n}{k}}$$

This quantity is a decreasing function of . Hence we have