# On Multilinear Forms: Bias, Correlation, and Tensor Rank

In this paper, we prove new relations between the bias of multilinear forms, the correlation between multilinear forms and lower degree polynomials, and the rank of tensors over $\mathrm{GF}(2) = \{0, 1\}$. We show the following results for multilinear forms and tensors.

1. **Correlation bounds.** We show that a random $d$-linear form has exponentially low correlation with low-degree polynomials. More precisely, for $d \ll 2^{o(k)}$, we show that a random $d$-linear form $f(X_1, X_2, \dots, X_d) : (\mathrm{GF}(2)^k)^d \to \mathrm{GF}(2)$ has correlation $2^{-k(1-o(1))}$ with any polynomial of degree at most $d/10$. This result is proved by giving near-optimal bounds on the bias of a random $d$-linear form, which is in turn proved by giving near-optimal bounds on the probability that a random rank-$t$ $d$-linear form is identically zero.

2. **Tensor-rank vs. bias.** We show that if a $d$-dimensional tensor has small rank, then the bias of the associated $d$-linear form is large. More precisely, given any $d$-dimensional tensor $T : [k]^d \to \mathrm{GF}(2)$ of rank at most $t$, the bias of the associated $d$-linear form $f_T(X_1, \dots, X_d) := \sum_{(i_1, \dots, i_d) \in [k]^d} T(i_1, i_2, \dots, i_d)\, X_{1, i_1} \cdot X_{2, i_2} \cdots X_{d, i_d}$ is at least $\left(1 - \frac{1}{2^{d-1}}\right)^t$.

The above bias vs. tensor-rank connection suggests a natural approach to proving nontrivial tensor-rank lower bounds for $d = 3$. In particular, we use this approach to prove that the finite field multiplication tensor has tensor rank at least $3.52k$, matching the best known lower bound for any explicit tensor in three dimensions over $\mathrm{GF}(2)$.


## 1 Introduction

This work is motivated by two fundamental questions regarding “explicit constructions” in complexity theory: finding functions uncorrelated with low degree polynomials, and finding tensors with high tensor rank.

##### Functions uncorrelated with low degree polynomials.

The first question is that of finding an explicit function uncorrelated with low degree polynomials. More concretely, we seek functions $f : \mathbb{F}_2^n \to \mathbb{F}_2$ such that for every polynomial $P$ of degree at most $d$,

$$\Pr_{x \in \mathbb{F}_2^n}\left[f(x) = P(x)\right] \le \frac{1}{2} + \varepsilon_n.$$

It is well known (and easy to prove) that a random function $f$ has this property with $\varepsilon_n$ superpolynomially small (and even exponentially small); the challenge is to find an explicit such $f$.

A solution to this problem will have immediate applications in Boolean circuit complexity. It will give hard-on-average problems for $\mathsf{AC}^0[\oplus]$, and via the Nisan–Wigderson hardness vs. randomness technique [NW94], it will give pseudorandom generators against $\mathsf{AC}^0[\oplus]$ (improving upon analogous results for $\mathsf{AC}^0$ from the late 1980s). The original motivation for an explicit function with small $\varepsilon_n$ came from the seminal work of Razborov [Raz87] and Smolensky [Smo87], who showed that any function computable by a sub-exponential sized $\mathsf{AC}^0[\oplus]$ circuit correlates well with some low-degree polynomial, and furthermore that the majority function has only small correlation with low-degree polynomials. The Nisan–Wigderson paradigm [NW94] of pseudorandom generator construction requires explicit functions with exponentially small $\varepsilon_n$. The current best known constructions of explicit functions [Raz87, Smo87, BK12, VW08] that cannot be approximated by low-degree polynomials come in two flavors: (a) polynomially small $\varepsilon_n$ for large degree bounds ($d$ as large as $\Theta(\sqrt{n})$), or (b) exponentially small $\varepsilon_n$ for small degree bounds ($d = O(\log n)$). However, we do not know of any explicit function that exhibits exponentially small correlation against low-degree polynomials of polynomially large (or even super-logarithmically large) degree. For a nice survey on correlation with low degree polynomials, see [Vio09].

##### Tensors with high rank.

The second question is that of finding an explicit tensor of high tensor rank. Tensors are a high-dimensional generalization of ($2$-dimensional) matrices. Just as a matrix of size $k \times k$ over a field $\mathbb{F}$ is given by a map $M : [k] \times [k] \to \mathbb{F}$, a tensor of dimension $d$ and size $k$ is given by a map $T : [k]^d \to \mathbb{F}$. A tensor $T$

is said to be of rank one if there exist vectors $x_1, \dots, x_d \in \mathbb{F}^k$

such that $T = x_1 \otimes x_2 \otimes \cdots \otimes x_d$, or equivalently, for all $(i_1, \dots, i_d) \in [k]^d$, we have $T(i_1, \dots, i_d) = x_1(i_1) \cdot x_2(i_2) \cdots x_d(i_d)$. A tensor is said to be of tensor-rank at most $t$ if it can be written as the sum of $t$ rank one tensors. We seek explicit tensors with tensor-rank as high as possible.

It is well known (and easy to prove) that a random tensor has tensor rank as large as $\Omega(k^{d-1}/d)$. The challenge is to find an explicit such $T$ with tensor rank larger than $k^{\lfloor d/2 \rfloor}$. A substantial improvement on this lower bound for any explicit tensor will have immediate applications in arithmetic circuit complexity; for $d = 3$, it will give improved arithmetic circuit lower bounds [Str73], and for large $d$ it will give superpolynomial arithmetic formula lower bounds [Raz13, CKSV16]. For general odd $d$, a lower bound of $2k^{\lfloor d/2 \rfloor} + k - \Theta(d \log k)$ was shown for an explicit tensor by Alexeev et al. [AFT11], while for even $d$, no lower bounds better than the trivial $k^{d/2}$ bound are known for any explicit tensor.

Unlike matrix rank, we do not have a good understanding of tensor-rank even for 3-dimensional tensors. For instance, it is known that for a given 3-dimensional tensor $T$ over the rationals, the problem of deciding if the rank of $T$ is at most $t$ is NP-hard [Hås90]. In the case of dimension three, the tensor-rank of very specific tensors like the matrix multiplication tensor [Blä99, Shp03], the finite field multiplication tensor [CC88, STV92] and the polynomial multiplication tensor [BD80, Kam05] has been studied in prior works. For this case, the current best lower bound known for any explicit tensor over $\mathbb{F}_2$ is a lower bound of $3.52k$ for the finite field multiplication tensor due to Chudnovsky and Chudnovsky [CC88, STV92], which builds on the lower bound result of Brown and Dobkin [BD80] for the polynomial multiplication tensor. For general fields, the best known lower bound for any explicit tensor is $(5/2) n^2 - 3n$ for the $n \times n$ matrix multiplication tensor due to Bläser [Blä99].

Also relevant to this discussion is a recent result of Efremenko et al. [EGOW17], who showed that a fairly general class of lower bound techniques called rank methods are not strong enough to give lower bounds on tensor rank stronger than $2^d \cdot k$. In a nutshell, not only can we not prove good tensor rank lower bounds, we do not even have techniques which 'in principle' could be useful for such lower bounds!

### 1.1 Our results

We make contributions to both the above questions by studying multilinear forms and their bias. A $d$-linear form is a map $f : (\mathbb{F}_2^k)^d \to \mathbb{F}_2$ which is linear in each of its $d$ arguments. The bias of a $d$-linear form is defined as follows.

$$\mathrm{bias}(f) := \left| \mathbb{E}_{x_1, \dots, x_d \in \mathbb{F}_2^k}\left[(-1)^{f(x_1, \dots, x_d)}\right] \right|.$$

This measures the difference between the probability of output $0$ and output $1$. Similarly, the correlation of a $d$-linear form $f$ with another function $g$ is defined as $\mathrm{Corr}(f, g) := \mathrm{bias}(f - g)$, which measures the difference between the probabilities (on a random input) that $f$ and $g$ agree and disagree.

A $d$-linear form can naturally be viewed as a polynomial of degree $d$ in $n = dk$ variables. We can then ask: for some $d' < d$, is there a $d$-linear form $f$ such that the correlation of $f$ with every degree-$d'$ polynomial in these $n$ variables is small? Knowing the existence of a $d$-linear form that achieves this small correlation property gives a significantly reduced search space for finding an explicit $f$ with small correlation with lower degree polynomials. Our first result gives a positive answer to this question for a large range of $d$ and $k$.

Theorem A (informal).   Let $k, d$ be integers with $d \le 2^{o(k)}$, and let $n = dk$. Then with high probability, for a uniformly random $d$-linear form $f : (\mathbb{F}_2^k)^d \to \mathbb{F}_2$, we have that for all polynomials $P$ of degree at most $d/10$:

$$\mathrm{Corr}(f, P) \le 2^{-k(1 - o(1))} = 2^{-\frac{n}{d}(1 - o(1))}.$$

Moreover, for every $d$-linear form, there is a degree-$0$ polynomial $P$ (namely the constant $0$ polynomial) such that

$$\mathrm{Corr}(f, P) \ge \Omega(2^{-k}).$$

For small enough $d$, the above theorem actually holds with $\mathrm{Corr}(f, P) \le O(2^{-k})$.

An important step towards proving Theorem A is a precise understanding of the distribution of the bias of a random $d$-linear form. Along the way, we give tight upper bounds on the probability that the sum of $t$ random rank-$1$ $d$-dimensional tensors equals $0$.

Previously, a beautiful result of Ben-Eliezer, Hod and Lovett [BHL12] showed that for all $d$, there are polynomials of degree $d$ whose correlation with polynomials of degree at most $d - 1$ is exponentially small. The results are incomparable; the polynomials in [BHL12] need not come from a $d$-linear form, and for this more general setting the bound might not be tight, but on the positive side [BHL12] can handle larger $d$ while proving correlation bounds against polynomials of degree as large as $d - 1$.

A $d$-linear form $f$ can also be naturally viewed as a $d$-dimensional tensor. Indeed, $f$ can be completely specified by the tensor $T$ of values $f(e_{i_1}, \dots, e_{i_d})$, as the indices $i_1, \dots, i_d$ vary in $[k]$. We can then ask: are there natural properties of the $d$-linear form $f$ which would imply that the tensor rank of $T$ is high?

We show that having low bias, which is a simple measure of pseudorandomness for $d$-linear forms, already implies something nontrivial about the tensor rank. We prove a lower bound on the tensor rank in terms of the bias of the form.

Theorem B.   Let $f$ be a $d$-linear form. Let $T$ be its associated tensor, and let $t$ be the rank of $T$. Then

$$\mathrm{bias}(f) \ge \left(1 - \frac{1}{2^{d-1}}\right)^t.$$

In particular, if $\mathrm{bias}(f) \le 2^{-k}$, then

$$t \ge k \cdot \frac{\log 2}{\log \frac{2^{d-1}}{2^{d-1} - 1}}.$$

Moreover, for every $t$ there is a tensor $T$ with tensor rank at most $t$ such that the following is true.

$$\mathrm{bias}(f_T) \le \left(1 - \frac{1}{2^{d-1}}\right)^t + \frac{d}{2^k}.$$

This lower bound on tensor rank in terms of bias is almost optimal for any fixed $d$. It implies that any explicit $d$-linear form with low bias (such $d$-linear forms are easy to construct) automatically must have tensor rank $\Omega_d(k)$. Purely from the point of view of proving tensor rank lower bounds for explicit tensors, these results are only interesting in the case of $d = 3$ (for larger $d$ the implied tensor rank lower bounds fail to beat trivial explicit tensor rank lower bounds).

For $d = 3$, this gives a natural and clean route to proving nontrivial tensor rank lower bounds for explicit tensors. In particular, trilinear forms with nearly minimal bias of $2^{-k(1 - o(1))}$ must have tensor rank at least $(2.41 - o(1))k$ (which happens to be tight). A finer analysis of our arguments shows that trilinear forms with exactly minimal bias, such as the finite field multiplication tensor, have tensor rank at least $3.52k$, thus matching the best known explicit tensor rank lower bound for $3$-dimensional tensors [BD80, CC88, STV92]. It also immediately implies a nontrivial tensor rank lower bound for the matrix multiplication tensor (but still far from the best known bounds [Shp03, Blä99]).

### 1.2 Methods

Underlying our main results, Theorem A and Theorem B, are two related combinatorial bounds involving rank-$t$ $d$-linear forms. We now state these bounds for the special case of $d = 3$. For $i \in [t]$, let $x_i, y_i, z_i \in \mathbb{F}_2^k$. Let $P_i$ be the trilinear form defined as

$$P_i(u, v, w) = \langle u, x_i \rangle \cdot \langle v, y_i \rangle \cdot \langle w, z_i \rangle.$$

Now, consider the trilinear form $P$ given by

$$P(u, v, w) = \sum_{i=1}^{t} P_i(u, v, w).$$

Then, we have the following.

1. If the $x_i, y_i, z_i$ are picked uniformly at random from $\mathbb{F}_2^k$, then the probability that $P$ is identically $0$ is very small. Concretely,

$$\Pr_{x_i, y_i, z_i}\left[P \equiv 0\right]$$

is about $2^{-kt}$, provided $t$ is not too large. This bound is essentially optimal.

2. For arbitrary $x_i, y_i, z_i$, the bias of $P$ is large. Concretely,

$$\min_{x_i, y_i, z_i}\left[\mathrm{bias}(P)\right] \ge (3/4)^t.$$

This bound is also essentially optimal.
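As a small sanity check of the second bound, the following brute-force computation (our own illustrative script, not from the paper; the choice $k = 3$ and the random trials are ours) verifies that a sum of $t$ rank-$1$ trilinear forms over $\mathbb{F}_2^3$ always has bias at least $(3/4)^t$:

```python
import itertools, random

k, trials = 3, 20
random.seed(0)

def bias_trilinear(xs, ys, zs):
    # bias of P(u,v,w) = sum_i <u,x_i><v,y_i><w,z_i> over F_2,
    # computed by brute force over all (u, v, w) in (F_2^k)^3
    total = 0
    for u in itertools.product((0, 1), repeat=k):
        for v in itertools.product((0, 1), repeat=k):
            for w in itertools.product((0, 1), repeat=k):
                val = 0
                for x, y, z in zip(xs, ys, zs):
                    a = sum(ui * xi for ui, xi in zip(u, x)) % 2
                    b = sum(vi * yi for vi, yi in zip(v, y)) % 2
                    c = sum(wi * zi for wi, zi in zip(w, z)) % 2
                    val ^= a * b * c
                total += (-1) ** val
    return abs(total) / 2 ** (3 * k)

results = []
for _ in range(trials):
    t = random.randint(1, 4)
    xs = [[random.randint(0, 1) for _ in range(k)] for _ in range(t)]
    ys = [[random.randint(0, 1) for _ in range(k)] for _ in range(t)]
    zs = [[random.randint(0, 1) for _ in range(k)] for _ in range(t)]
    results.append((t, bias_trilinear(xs, ys, zs)))

# every sampled instance satisfies bias(P) >= (3/4)^t
assert all(b >= 0.75 ** t - 1e-9 for t, b in results)
```

For $t = 1$ with nonzero vectors the bias is exactly $3/4$, showing the bound is tight there.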

We now give an outline of the proofs of Theorem A and Theorem B.

The proof of Theorem A follows the high-level outline of [BHL12]. We first use the method of moments to show that for a fixed $n$-variate polynomial $P$ of degree at most $d/10$, the correlation of a random $d$-linear form $f$ with $P$ is small with extremely high probability. Then, by a union bound over all such $P$, we conclude that a random $f$ is uncorrelated with all $P$ with quite high probability.

Implementing this approach gives rise to some natural and interesting questions about rank-$1$ tensors. How many rank-$1$ tensors can lie in a given low dimensional linear space of tensors? Given a collection of random rank-$1$ tensors, what is the probability that the dimension of the space spanned by them is small? What is the probability that the sum of $t$ random rank-$1$ tensors equals $0$? We investigate these questions using linear-algebraic ideas, and obtain near-optimal answers for all of them.

For example, the case $d = 3$ requires us to study the probability that

$$\sum_{i=1}^{t} x_i \otimes y_i \otimes z_i = 0.$$

By some simple manipulations, this reduces to bounding the probability that the linear space of matrices

$$\mathrm{span}\{x_i \otimes y_i : i \in [t]\}$$

has low dimension. We bound this by studying the probability that $x_i \otimes y_i$ lies in the linear space

$$\mathrm{span}\{x_j \otimes y_j : j \in [i-1]\}.$$

This final probability is bounded using the following general theorem.

Lemma.   For any linear space $U$ of $k \times k$ matrices of dimension $u$, the probability that a random rank-$1$ matrix $x \otimes y$ lies in $U$ is at most $\frac{2^{u/k}}{2^k} + \frac{2}{2^k}$.

The proof of this lemma is hands on, and uses basic linear algebra and some elementary analytic inequalities. The key is to take an echelon form basis for $U$. We use this basis to understand which coordinates are "important"; i.e., they have the property that the corresponding projection of $x \otimes y$ lies in the corresponding projection of $U$ with noticeable probability for a random $x, y$.

The above lemma is essentially tight, and there are two very different kinds of tight examples $U$. The sets of important coordinates in these two examples look very different. Because of this, our final proof involves proving tight upper bounds on an analytic maximization problem that has multiple very different global maxima.
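The $d = 2$ specialization of Lemma 3.6 below bounds $\Pr[x \otimes y \in U]$ by $\frac{2}{2^k} + \frac{2^{u/k}}{2^k}$. The following exhaustive check over random subspaces (our own script; $k = 3$ and the sampling scheme are our choices) confirms this bound numerically:

```python
import itertools, random

k = 3
random.seed(1)

def span(vectors):
    # all F_2-linear combinations of the given k*k-bit integers
    s = {0}
    for v in vectors:
        s |= {w ^ v for w in s}
    return s

def outer(x, y):
    # rank-1 matrix x (x) y, flattened into a k*k-bit integer
    m = 0
    for i in range(k):
        for j in range(k):
            if x[i] & y[j]:
                m |= 1 << (i * k + j)
    return m

checks = []
for _ in range(50):
    U = span([random.getrandbits(k * k) for _ in range(random.randint(0, 6))])
    u = len(U).bit_length() - 1  # dim(U), since |U| = 2^dim(U)
    hits = sum(outer(x, y) in U
               for x in itertools.product((0, 1), repeat=k)
               for y in itertools.product((0, 1), repeat=k))
    checks.append((hits / 4 ** k, 2 ** (u / k) / 2 ** k + 2 / 2 ** k))

# the exact probability never exceeds the claimed bound
assert all(prob <= bound + 1e-9 for prob, bound in checks)
```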

For Theorem B, which gives a relationship between tensor rank and bias, the proof proceeds in the contrapositive. We show that any $d$-linear form whose underlying tensor has low rank must have high bias. Let us illustrate the underlying ideas in the case of $d = 3$. Here, we are given the trilinear form $P$, defined as

$$P(u, v, w) = \sum_{i=1}^{t} \langle x_i, u \rangle \cdot \langle y_i, v \rangle \cdot \langle z_i, w \rangle.$$

We want to show that this has high bias if $t$ is small. The key claim that we show is the following.

Lemma.   For at least a $(3/4)^t$ fraction of the pairs $(v, w) \in \mathbb{F}_2^k \times \mathbb{F}_2^k$, we have that for all $i \in [t]$:

$$\langle v, y_i \rangle \cdot \langle w, z_i \rangle = 0.$$

For any fixed $i$, the set of $(v, w)$ satisfying the above is the union of two codimension-$1$ subspaces of $\mathbb{F}_2^k \times \mathbb{F}_2^k$, and thus a random $(v, w)$ satisfies it with probability $3/4$. The above lemma shows that the probability of all $t$ of these events happening together is at least as large as it would have been had they been independent. Since for every such pair $(v, w)$ the form $P(\cdot, v, w)$ vanishes identically, this immediately gives $\mathrm{bias}(P) \ge (3/4)^t$.
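The correlation inequality in the lemma can be verified exhaustively for small parameters. The script below (our own check; $k = 3$ and the random constraint sets are our choices) confirms both the single-constraint probability of exactly $3/4$ and the $(3/4)^t$ lower bound on the fraction of "good" pairs:

```python
import itertools, random

k = 3
random.seed(2)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % 2

def good_fraction(ys, zs):
    # fraction of pairs (v, w) with <v,y_i><w,z_i> = 0 for every i
    count = 0
    for v in itertools.product((0, 1), repeat=k):
        for w in itertools.product((0, 1), repeat=k):
            if all(dot(v, y) * dot(w, z) == 0 for y, z in zip(ys, zs)):
                count += 1
    return count / 4 ** k

# one constraint with nonzero y, z is satisfied on exactly 3/4 of all pairs
assert good_fraction([(1, 0, 0)], [(0, 1, 0)]) == 0.75

# t random constraints behave at least as well as independent ones
fractions = []
for _ in range(30):
    t = random.randint(1, 5)
    ys = [tuple(random.randint(0, 1) for _ in range(k)) for _ in range(t)]
    zs = [tuple(random.randint(0, 1) for _ in range(k)) for _ in range(t)]
    fractions.append((t, good_fraction(ys, zs)))

assert all(f >= 0.75 ** t - 1e-9 for t, f in fractions)
```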

## 2 Preliminaries

Unless otherwise stated, we always work over the field $\mathbb{F}_2$. We use capital letters $X, Y, Z$ etc. to denote formal variables or sets of formal variables, and small letters to denote instantiations of these formal variables.

For integers $n \ge d \ge 0$, denote by $\mathrm{Poly}(n, d)$ the set of all degree $\le d$ multilinear polynomials in $\mathbb{F}_2[X]$, where $X = (X_1, \dots, X_n)$ is a variable set. Note that every $P \in \mathrm{Poly}(n, d)$ naturally corresponds to a unique map $\mathbb{F}_2^n \to \mathbb{F}_2$.

### 2.1 Bias and Correlation

Two fundamental notions used in this paper are those of bias and correlation, which we now define.

###### Definition 2.1 (Bias).

Bias of a function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ is defined as

$$\mathrm{bias}(f) := \left| \mathbb{E}_{x \in \mathbb{F}_2^n}\left[(-1)^{f(x)}\right] \right|.$$

The bias of a $\{0, 1\}$-valued function is defined analogously, via the standard identification of $\{0, 1\}$ with $\mathbb{F}_2$.

###### Definition 2.2 (Correlation).

We define the correlation between two functions $f, g : \mathbb{F}_2^n \to \mathbb{F}_2$ by

$$\mathrm{Corr}(f, g) := \mathrm{bias}(f - g).$$

Given a function $f$, we will be interested in its maximum correlation with low degree polynomials. Towards this we define

$$\mathrm{Corr}(f, d) := \max_{g \in \mathrm{Poly}(n, d)} \mathrm{Corr}(f, g).$$

More generally, given a class $\mathcal{C}$ of functions, we define

$$\mathrm{Corr}(f, \mathcal{C}) := \max_{g \in \mathcal{C}} \mathrm{Corr}(f, g).$$
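These definitions are small enough to implement by brute force, which can be a useful way to build intuition. The following sketch (our own illustrative code; the representation of polynomials as monomial sets is an implementation choice) computes $\mathrm{Corr}(f, d)$ exactly for tiny $n$:

```python
import itertools

def multilinear_polys(n, d):
    # all multilinear polynomials of degree <= d over F_2, each represented
    # as a tuple of monomials (a monomial is a frozenset of variable indices)
    monomials = [frozenset(s)
                 for r in range(d + 1)
                 for s in itertools.combinations(range(n), r)]
    for coeffs in itertools.product((0, 1), repeat=len(monomials)):
        yield tuple(m for m, c in zip(monomials, coeffs) if c)

def evaluate(poly, x):
    return sum(all(x[i] for i in m) for m in poly) % 2

def bias(f, n):
    total = sum((-1) ** f(x) for x in itertools.product((0, 1), repeat=n))
    return abs(total) / 2 ** n

def corr_with_degree(f, n, d):
    # Corr(f, d) = max over g in Poly(n, d) of bias(f - g)
    return max(bias(lambda x: (f(x) + evaluate(g, x)) % 2, n)
               for g in multilinear_polys(n, d))

# the AND of two bits has correlation exactly 1/2 with degree-1 polynomials
f_and = lambda x: x[0] & x[1]
assert corr_with_degree(f_and, 2, 1) == 0.5
```

At degree $d = 2$ the class contains $X_0 X_1$ itself, so the correlation jumps to $1$.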

### 2.2 Tensors and d-linear forms

Tensors are generalizations of matrices to higher dimensions.

###### Definition 2.3 (Tensors and Tensor rank).

Let $d$ and $k$ be natural numbers. A $d$-dimensional tensor of size $k$ over a field $\mathbb{F}$ is a map $T : [k]^d \to \mathbb{F}$. $T$ is said to be of rank one if there exist vectors $x_1, \dots, x_d \in \mathbb{F}^k$ such that for every $(i_1, \dots, i_d) \in [k]^d$, $T(i_1, \dots, i_d) = x_1(i_1) \cdot x_2(i_2) \cdots x_d(i_d)$. The rank of $T$ is the minimum $t$ such that $T$ can be written as a sum of $t$ rank one tensors.
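For $d = 2$, tensor rank coincides with ordinary matrix rank, so the definition can be exercised concretely: a sum of $t$ rank-one matrices always has matrix rank at most $t$. The following sketch (our own code; the bitmask row representation is an implementation choice) checks this over $\mathbb{F}_2$:

```python
import random

k = 4
random.seed(3)

def f2_rank(rows):
    # Gaussian elimination over F_2; rows are k-bit integers
    rank = 0
    rows = list(rows)
    for bit in reversed(range(k)):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rank += 1
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows if r != pivot]
    return rank

def rank_one(x, y):
    # rows of the rank-one matrix x (x) y, each row as a k-bit integer
    return [y if xi else 0 for xi in x]

for _ in range(100):
    t = random.randint(1, 3)
    rows = [0] * k
    for _ in range(t):
        x = [random.randint(0, 1) for _ in range(k)]
        y = random.getrandbits(k)
        for i, r in enumerate(rank_one(x, y)):
            rows[i] ^= r
    # a sum of t rank-one matrices has rank at most t
    assert f2_rank(rows) <= t
```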

Every matrix can be naturally associated with a bilinear polynomial, and in some cases, one can study the properties of this bilinear polynomial as a proxy for studying various properties of the matrix itself. This paradigm also generalizes to tensors, as the following definition indicates.

###### Definition 2.4 (Tensors as Multilinear Forms).

Let $T : [k]^d \to \mathbb{F}_2$ be a $d$-dimensional tensor. Then, the set-multilinear polynomial associated with $T$ is the polynomial $f_T$ in the $dk$ variables $\{X_{j, i}\}_{j \in [d], i \in [k]}$ over $\mathbb{F}_2$ defined as follows.

$$f_T(X_{1,1}, X_{1,2}, \dots, X_{d,k}) = \sum_{(i_1, i_2, \dots, i_d) \in [k]^d} T(i_1, i_2, \dots, i_d) \cdot \prod_{j=1}^{d} X_{j, i_j}.$$

Given the above association between $d$-dimensional tensors and $d$-linear forms, we will use the terms tensor and $d$-linear form interchangeably.
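The association in Definition 2.4 is easy to state in code. The sketch below (our own illustrative script) builds a rank-one tensor and checks that its set-multilinear polynomial factors into a product of inner products, as the definition predicts:

```python
import itertools, random

k, d = 2, 3
random.seed(4)

def f_T(T, xs):
    # evaluate the set-multilinear polynomial of tensor T at xs = (x_1,...,x_d)
    total = 0
    for idx in itertools.product(range(k), repeat=d):
        term = T[idx]
        for j, i in enumerate(idx):
            term *= xs[j][i]
        total += term
    return total % 2

# a rank-one tensor T = v_1 (x) v_2 (x) v_3 gives f_T(x) = prod_j <x_j, v_j>
vs = [[random.randint(0, 1) for _ in range(k)] for _ in range(d)]
T = {idx: vs[0][idx[0]] * vs[1][idx[1]] * vs[2][idx[2]]
     for idx in itertools.product(range(k), repeat=d)}

for xs in itertools.product(itertools.product((0, 1), repeat=k), repeat=d):
    prod = 1
    for x, v in zip(xs, vs):
        prod *= sum(a * b for a, b in zip(x, v)) % 2
    assert f_T(T, xs) == prod
```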

### 2.3 Some explicit tensors

We now define some explicit tensors which we use at various places in this paper. We start with the trace function.

#### 2.3.1 Trace tensor

###### Definition 2.5.

$\mathrm{Trace} : \mathbb{F}_{2^k} \to \mathbb{F}_2$ is the $\mathbb{F}_2$-linear map defined as follows.

$$\mathrm{Trace}(\alpha) = \alpha + \alpha^2 + \alpha^{2^2} + \dots + \alpha^{2^{k-1}}.$$
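The trace map can be computed concretely in a small field. The sketch below (our own code; the choice of $k = 3$ and the modulus $x^3 + x + 1$ for $\mathbb{F}_{2^3}$ are illustrative assumptions) verifies that $\mathrm{Trace}$ lands in $\mathbb{F}_2$, is $\mathbb{F}_2$-linear, and takes each value equally often:

```python
IRR, K = 0b1011, 3  # GF(2^3) represented modulo the irreducible x^3 + x + 1

def gf_mul(a, b):
    # carry-less multiplication of field elements, then reduction modulo IRR
    p = 0
    for i in range(K):
        if (b >> i) & 1:
            p ^= a << i
    for i in range(2 * K - 2, K - 1, -1):
        if (p >> i) & 1:
            p ^= IRR << (i - K)
    return p

def trace(a):
    # Trace(a) = a + a^2 + a^4 over GF(2^3); field addition is XOR
    a2 = gf_mul(a, a)
    a4 = gf_mul(a2, a2)
    return a ^ a2 ^ a4

assert all(trace(a) in (0, 1) for a in range(8))                      # lands in F_2
assert all(trace(a ^ b) == trace(a) ^ trace(b)
           for a in range(8) for b in range(8))                       # F_2-linear
assert sum(trace(a) for a in range(8)) == 4                           # balanced
```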

The $\mathrm{Trace}$ map will be useful for us as we define the candidate hard tensor for our lower bounds.

###### Definition 2.6.

Let $\mathrm{Tr} : (\mathbb{F}_2^k)^3 \to \mathbb{F}_2$ be the function defined as follows.

$$\mathrm{Tr}(X, Y, Z) := \mathrm{Trace}(X \cdot Y \cdot Z),$$

where $X \cdot Y \cdot Z$ denotes multiplication over the larger field $\mathbb{F}_{2^k}$ when $X, Y, Z$ are viewed as encodings of elements in $\mathbb{F}_{2^k}$.

Since $\mathrm{Trace}$ is an $\mathbb{F}_2$-linear map, the function $\mathrm{Tr}$ can be viewed as a $3$-linear polynomial in the variables $X, Y, Z$. For the rest of this paper, when we say $\mathrm{Tr}$, we refer to this natural $3$-linear polynomial and the three dimensional tensor associated with it. We remark that, up to change of basis, this is the finite field multiplication tensor, which was analyzed by Chudnovsky–Chudnovsky [CC88] and Shparlinski–Tsfasman–Vladut [STV92].

#### 2.3.2 Matrix multiplication tensor

###### Definition 2.7.

The tensor $M_n$ corresponding to the product of two $n \times n$ matrices is defined as

$$M_n(X, Y, Z) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} X_{i,j} Y_{j,k} Z_{i,k}.$$

Here, $X = (X_{i,j})$, $Y = (Y_{j,k})$ and $Z = (Z_{i,k})$ are $n \times n$ matrices of formal variables.

Note that $M_n(X, Y, Z)$ is the trace of the matrix product $X \cdot Y \cdot Z^{T}$. In other words, $M_n(X, Y, Z) = \mathrm{tr}(X Y Z^{T})$. Note this is the matrix trace and is different from the trace function considered in the previous section, where we viewed $X, Y, Z$ as elements of the large field.
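The identity between the triple sum and the matrix trace is mechanical to check. The following sketch (our own code; $n = 3$ and the random trials are illustrative choices) verifies it over $\mathbb{F}_2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3

def M_n(X, Y, Z):
    # sum_{i,j,k} X[i,j] * Y[j,k] * Z[i,k] over F_2
    total = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                total += X[i, j] * Y[j, k] * Z[i, k]
    return total % 2

for _ in range(25):
    X, Y, Z = (rng.integers(0, 2, size=(n, n)) for _ in range(3))
    # the triple sum agrees with the trace of X Y Z^T, reduced mod 2
    assert M_n(X, Y, Z) == np.trace(X @ Y @ Z.T) % 2
```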

## 3 Correlation of random d-linear forms

In this section, we study the correlation of random $d$-linear forms with lower degree polynomials.
Our main result in this section is the following theorem, which states that a random $d$-linear form is uncorrelated with polynomials of degree at most $d/10$ under certain conditions.

###### Theorem 3.1.

Let $k, d$ be integers with $d \le 2^{o(k)}$, and set $n = dk$.

Pick a uniformly random $d$-linear form $f : (\mathbb{F}_2^k)^d \to \mathbb{F}_2$. Then, with probability $1 - o(1)$, $f$ has the following property. For all polynomials $P$ with degree at most $d/10$, we have,

$$\mathrm{Corr}(f, P) < 2^{-(1 - o(1)) n / d}.$$

Along the way, we develop several tools to understand the bias of random $d$-linear forms. For example, we show that a random $d$-linear form is unbiased with extremely high probability.

###### Theorem 3.2.

Let $\varepsilon > 0$ be fixed. Let $k, d$ be integers with $d \le 2^{\varepsilon k / 5}$, and consider a uniformly random $d$-linear form $f : (\mathbb{F}_2^k)^d \to \mathbb{F}_2$. Then,

$$\Pr\left[\mathrm{bias}(f) \ge 2^{-(1 - \varepsilon) k}\right] \le 2^{-\Omega(\varepsilon^2 k^d)}.$$
###### Remark 3.3.

Note that any $d$-linear form vanishes if any one of the $d$ blocks of variables is set to zero. Hence, the bias of any $d$-linear form (or equivalently its correlation with the constant $0$ polynomial) is at least $2^{-k}$. Theorem 3.2 states that it is extremely unlikely for a random $d$-linear form to have even slightly more bias, while Theorem 3.1 states that it is extremely unlikely for a random $d$-linear form to have slightly better correlation with any polynomial of degree at most $d/10$.
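The $2^{-k}$ lower bound on the bias of every $d$-linear form can be confirmed by brute force in a tiny case. The following script (our own check; $k = 2$, $d = 3$ and the random tensors are illustrative choices) computes the bias directly from the tensor:

```python
import itertools, random

k, d = 2, 3
random.seed(6)

def bias_of_tensor(T):
    # bias of the d-linear form of tensor T, by brute force over (F_2^k)^d
    total = 0
    for xs in itertools.product(itertools.product((0, 1), repeat=k), repeat=d):
        val = 0
        for idx in itertools.product(range(k), repeat=d):
            if T[idx] and all(xs[j][i] for j, i in enumerate(idx)):
                val ^= 1
        total += (-1) ** val
    return abs(total) / 2 ** (k * d)

biases = []
for _ in range(40):
    T = {idx: random.randint(0, 1) for idx in itertools.product(range(k), repeat=d)}
    biases.append(bias_of_tensor(T))

# every d-linear form vanishes whenever one block of inputs is zero,
# so its bias is at least 2^-k
assert all(b >= 2 ** -k - 1e-9 for b in biases)
```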

The key ingredient in the proofs of the above theorems is the following theorem on the distribution of the sum of random rank-$1$ tensors.

###### Theorem 3.4.

Let $\varepsilon > 0$ be a constant. Let $k, d, t$ be integers with $d \le 2^{\varepsilon k / 5}$ and $t \le \varepsilon k^{d-1} / 10$. Let the vectors

$$x^{(i,j)}, \quad i \in [t],\ j \in [d],$$

be picked independently and uniformly distributed in

$\mathbb{F}_2^k$. Then,

$$\Pr\left[\sum_{i=1}^{t} \bigotimes_{j=1}^{d} x^{(i,j)} = 0\right] \le 2^{-(1 - \varepsilon/2) \cdot k t}.$$
###### Remark 3.5.

If any one block of vectors (say wlog $\{x^{(i,1)}\}_{i \in [t]}$, the first block of vectors) are all $0$ (this happens with probability $2^{-kt}$), then the $d$-dimensional tensor $\sum_{i=1}^{t} \bigotimes_{j=1}^{d} x^{(i,j)} = 0$. The above theorem states that the probability of the $d$-linear form vanishing is not significantly larger.

In turn, the proof of the above theorem is based on the following lemma, which gives an upper bound on the probability that a random rank-$1$ tensor lies in a fixed low dimensional subspace.

###### Lemma 3.6.

Let be integers and be a subspace of of dimension . Let be picked independently and uniformly at random, and let . Then,

 Pr[T∈U]≤d2k+2u/kd−12k.
###### Remark 3.7.

Let $U = V \otimes \mathbb{F}_2^{k^{d-1}}$, where $V$ is an $m$-dimensional subspace of $\mathbb{F}_2^k$. Note, $\dim(U) = u = m \cdot k^{d-1}$. Clearly, $\Pr[T \in U] \ge \Pr[x_1 \in V] = \frac{2^m}{2^k} = \frac{2^{u/k^{d-1}}}{2^k}$. The above lemma states that the probability is not significantly larger than this for any other $U$ of the same dimension.

In the next subsection, we show how Theorem 3.1 and Theorem 3.2 follow from Theorem 3.4. After that, we prove Theorem 3.4 by studying the distribution of the dimension of the span of a collection of random rank-$1$ tensors.

### 3.1 Proofs of Theorem 3.1 and Theorem 3.2

We first prove Theorem 3.2.

###### Proof of Theorem 3.2.

We want to bound $\Pr[\mathrm{bias}(f) \ge 2^{-(1-\varepsilon)k}]$. We shall do so by bounding the $t$-th moment of $\mathrm{bias}(f)$ for a suitable choice of $t$ and applying Markov's inequality.

Let $T$ denote the tensor associated with $f$. Thus the entries $T(\ell_1, \dots, \ell_d)$ are all independent and uniformly distributed in $\mathbb{F}_2$. (Note that for a $d$-linear form, the inner expectation $\mathbb{E}_{x}[(-1)^{f(x)}]$ is always nonnegative, so we may drop the absolute value in $\mathrm{bias}(f)$.)

We now compute the $t$-th moment of $\mathrm{bias}(f)$.

$$\begin{aligned}
\mathbb{E}_f\left[(\mathrm{bias}(f))^t\right] &= \mathbb{E}_f\left[\left(\mathbb{E}_{x^{(1)}, \dots, x^{(d)} \sim \mathbb{F}_2^k}\left[(-1)^{f(x^{(1)}, \dots, x^{(d)})}\right]\right)^t\right] \\
&= \mathbb{E}_f\left[\prod_{i \in [t]} \left(\mathbb{E}_{x^{(i,1)}, \dots, x^{(i,d)} \sim \mathbb{F}_2^k}\left[(-1)^{f(x^{(i,1)}, \dots, x^{(i,d)})}\right]\right)\right] \\
&= \mathbb{E}_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[\mathbb{E}_f\left[(-1)^{\sum_{i=1}^{t} f(x^{(i,1)}, \dots, x^{(i,d)})}\right]\right] \\
&= \mathbb{E}_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[\prod_{(\ell_1, \dots, \ell_d) \in [k]^d} \mathbb{1}_{\sum_{i=1}^{t} \prod_{j=1}^{d} x^{(i,j)}_{\ell_j} = 0}\right] \\
&= \mathbb{E}_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[\mathbb{1}_{\forall (\ell_1, \dots, \ell_d) \in [k]^d,\ \sum_{i=1}^{t} \prod_{j=1}^{d} x^{(i,j)}_{\ell_j} = 0}\right] \\
&= \Pr_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[\forall (\ell_1, \dots, \ell_d) \in [k]^d,\ \sum_{i=1}^{t} \prod_{j=1}^{d} x^{(i,j)}_{\ell_j} = 0\right] \\
&= \Pr_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[\sum_{i=1}^{t} \bigotimes_{j=1}^{d} x^{(i,j)} = 0\right].
\end{aligned}$$

Setting $t = \varepsilon k^{d-1}/10$, Theorem 3.4 tells us that

$$\mathbb{E}_f\left[(\mathrm{bias}(f))^t\right] \le 2^{-(1 - \varepsilon/2) k t}.$$

Using Markov's inequality (applied to the nonnegative random variable $(\mathrm{bias}(f))^t$),

$$\Pr_f\left[\mathrm{bias}(f) \ge 2^{-(1-\varepsilon)k}\right] \le \frac{2^{-(1 - \varepsilon/2) k t}}{2^{-(1 - \varepsilon) k t}} = 2^{-\varepsilon k t / 2} \le 2^{-\Omega(\varepsilon^2 k^d)},$$

as claimed. ∎
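The moment identity $\mathbb{E}_f[(\mathrm{bias}(f))^t] = \Pr[\sum_i \bigotimes_j x^{(i,j)} = 0]$ can be verified exactly in a tiny case. The following script (our own check; the parameters $k = d = t = 2$ are chosen so that everything is enumerable) computes both sides as exact rationals and finds them equal:

```python
import itertools
from fractions import Fraction

k, t = 2, 2  # bilinear case (d = 2): tensors are k x k matrices over F_2
vectors = list(itertools.product((0, 1), repeat=k))
matrices = list(itertools.product((0, 1), repeat=k * k))

def f(M, u, v):
    # the bilinear form u^T M v over F_2
    return sum(M[i * k + j] * u[i] * v[j] for i in range(k) for j in range(k)) % 2

# left side: the t-th moment of the bias of a uniformly random bilinear form
lhs = Fraction(0)
for M in matrices:
    s = sum((-1) ** f(M, u, v) for u in vectors for v in vectors)
    lhs += Fraction(s, 4 ** k) ** t
lhs /= len(matrices)

# right side: probability that a sum of t random rank-one matrices is zero
hits = 0
for choice in itertools.product(vectors, repeat=2 * t):
    xs, ys = choice[:t], choice[t:]
    hits += all(sum(x[i] * y[j] for x, y in zip(xs, ys)) % 2 == 0
                for i in range(k) for j in range(k))
rhs = Fraction(hits, len(vectors) ** (2 * t))

assert lhs == rhs == Fraction(29, 128)
```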

We now use a similar argument to prove Theorem 3.1.

###### Proof of Theorem 3.1.

Fix an arbitrary small constant $\varepsilon > 0$. Let $\mathcal{C}$ denote the class $\mathrm{Poly}(n, d/10)$ of polynomials of degree at most $d/10$ in the $n = dk$ variables. We want to show that with high probability over the choice of $f$, we have that for every $P \in \mathcal{C}$, $\mathrm{Corr}(f, P) < 2^{-(1-\varepsilon)k}$.

Fix $P \in \mathcal{C}$ and consider the $t$-th moment of $\mathrm{bias}(f - P)$. Imitating the proof of Theorem 3.2, we get

$$\begin{aligned}
\mathbb{E}_f\left[(\mathrm{bias}(f - P))^t\right] &= \mathbb{E}_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[(-1)^{\sum_{i=1}^{t} P(x^{(i,1)}, \dots, x^{(i,d)})} \cdot \mathbb{1}_{\forall (\ell_1, \dots, \ell_d) \in [k]^d,\ \sum_{i=1}^{t} \prod_{j=1}^{d} x^{(i,j)}_{\ell_j} = 0}\right] \\
&\le \mathbb{E}_{\{x^{(i,j)}\}_{i \in [t], j \in [d]}}\left[\mathbb{1}_{\forall (\ell_1, \dots, \ell_d) \in [k]^d,\ \sum_{i=1}^{t} \prod_{j=1}^{d} x^{(i,j)}_{\ell_j} = 0}\right] \\
&= \Pr\left[\sum_{i=1}^{t} \bigotimes_{j=1}^{d} x^{(i,j)} = 0\right].
\end{aligned}$$

Now we will apply Theorem 3.4. Observe that since $d \le 2^{o(k)}$, for large enough $k$ we have,

$$d < 2^{\varepsilon k / 5}.$$

As in the proof of Theorem 3.2, we set $t = \varepsilon k^{d-1}/10$, invoke Theorem 3.4 and apply Markov's inequality to get,

$$\Pr_f\left[\mathrm{bias}(f - P) \ge 2^{-(1-\varepsilon)k}\right] \le 2^{-\varepsilon^2 k^d / 20}.$$

Now $\mathrm{Corr}(f, \mathcal{C}) = \max_{P \in \mathcal{C}} \mathrm{bias}(f - P)$. Thus, by a union bound over all $P \in \mathcal{C}$, we have the following.

$$\Pr_f\left[\mathrm{Corr}(f, \mathcal{C}) \ge 2^{-(1-\varepsilon)k}\right] \le |\mathcal{C}| \cdot 2^{-\varepsilon^2 k^d / 20}. \tag{1}$$

It remains to estimate $|\mathcal{C}|$. We show below that $|\mathcal{C}| \le 2^{o(k^d)}$. The proof of this estimate works for any class $\mathcal{C} = \mathrm{Poly}(n, \ell)$ as long as $\ell$ satisfies $\ell \le d/2$. Note that $|\mathcal{C}| = 2^{\binom{n}{\le \ell}}$, where $\binom{n}{\le \ell} := \sum_{i=0}^{\ell} \binom{n}{i}$ is the number of multilinear monomials of degree at most $\ell$. Let $\delta$ denote $d/n = 1/k$.

$$\binom{n}{\le \ell} \le \binom{n}{\le d/2} \le \left(\frac{2en}{d}\right)^{d/2} = \left(\frac{2e}{\delta}\right)^{\delta n / 2} = o\left(\left(\frac{1}{\delta}\right)^{\delta n}\right) \quad [\text{since } \delta = o(1)] \quad = o(k^d).$$

Combining this with Equation (1), we get,

$$\Pr_f\left[\mathrm{Corr}(f, \mathcal{C}) \ge 2^{-(1-\varepsilon)k}\right] \le 2^{o(k^d)} \cdot 2^{-\varepsilon^2 k^d / 20}.$$

Since this holds for every constant $\varepsilon > 0$, we get the desired result. ∎
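The binomial estimate $\binom{n}{\le d/2} \le (2en/d)^{d/2}$ used in the proof above can be sanity-checked numerically over a range of parameters (our own check; the parameter grid is an illustrative choice):

```python
import math

def binom_sum(n, m):
    # sum_{i=0}^{m} C(n, i): the number of multilinear monomials of degree <= m
    return sum(math.comb(n, i) for i in range(m + 1))

# check sum_{i <= d/2} C(n, i) <= (2en/d)^(d/2) for a grid of (k, d), n = dk
checks = []
for k in range(2, 12):
    for d in range(2, 10):
        n = d * k
        lhs = binom_sum(n, d // 2)
        rhs = (2 * math.e * n / d) ** (d / 2)
        checks.append(lhs <= rhs)

assert all(checks)
```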

### 3.2 Random rank-1 tensors

In this subsection, we first prove Lemma 3.6 on the probability that a random rank-$1$ tensor lies in a fixed low-dimensional subspace. We then give a corollary of this lemma which bounds the probability that a collection of random rank-$1$ tensors spans a very low dimensional subspace. This corollary will be used in the proof of Theorem 3.4.

###### Proof of Lemma 3.6.

Define

$$f_{d,k}(u) = \left(1 - \left(1 - \frac{1}{2^k}\right)^{d-1}\right) + \left(1 - \frac{1}{2^k}\right)^{d-1} \cdot \frac{2^{u/k^{d-1}}}{2^k}.$$

We will prove, by induction on $d$, the following stronger bound.

$$\Pr[T \in U] \le f_{d,k}(u).$$

The fact that this implies the lemma follows from the observations that $1 - \left(1 - \frac{1}{2^k}\right)^{d-1} \le \frac{d-1}{2^k} \le \frac{d}{2^k}$ and that $\left(1 - \frac{1}{2^k}\right)^{d-1} \le 1$.

##### Base case.

The case $d = 1$ is trivial: here $T = x_1$ is uniformly distributed in $\mathbb{F}_2^k$, so $\Pr[T \in U] = \frac{2^u}{2^k} = f_{1,k}(u)$. We now show the statement holds for larger $d$.

##### Induction step.

Let $d > 1$. We will view $\mathbb{F}_2^{k^d}$ as $\left(\mathbb{F}_2^{k^{d-1}}\right)^k$. Every element $v$ of $\mathbb{F}_2^{k^d}$ can thus be written as a tuple $(v_1, \dots, v_k)$, where each $v_i$ is an element of $\mathbb{F}_2^{k^{d-1}}$ (thus the $k^d$ coordinates are partitioned into $k$ blocks of coordinates, with each block having $k^{d-1}$ coordinates). We let $\pi_i$ be the $i$th projection map, mapping $v$ to $v_i$.

With this convention, we take a basis for $U$ in row echelon form. Concretely, this gives us a basis $B$ for $U$, such that $B$ is a disjoint union of $B_1, \dots, B_k$ ($B_j$ is the set of basis vectors pivoted in the $j$th block of coordinates), such that,

• for all $j \in [k]$ and $v \in B_j$, we have $\pi_i(v) = 0$ for all $i < j$,

• the vectors $\pi_j(v)$, as $v$ varies in $B_j$, are linearly independent.

Define $U_j := \mathrm{span}\{\pi_j(v) : v \in B_j\}$. Thus we have $\dim(U_j) = |B_j|$ and

$$\sum_{j=1}^{k} \dim(U_j) = \dim(U).$$

For $j < i$, we define a linear map $\psi_{ij} : U_j \to \mathbb{F}_2^{k^{d-1}}$ by defining $\psi_{ij}$ on a basis for $U_j$:

$$\psi_{ij}(\pi_j(v)) = \pi_i(v), \quad \forall v \in B_j.$$

Then we have the following basic claim (which follows immediately from the above echelon form representation of $U$).

###### Claim 3.8.

Let $v \in \mathbb{F}_2^{k^d}$. Then $v \in U$ only if there exist $u_1 \in U_1, \dots, u_k \in U_k$ such that for each $i \in [k]$ we have

$$\pi_i(v) = u_i + \sum_{j < i} \psi_{ij}(u_j).$$

To simplify notation, we will denote $x_1$ by $y$ and $x_2 \otimes \cdots \otimes x_d$ by $\tilde{z}$, so that $T = y \otimes \tilde{z}$ with $\tilde{z} \in \mathbb{F}_2^{k^{d-1}}$. We want to find an upper bound on $\Pr[y \otimes \tilde{z} \in U]$.

###### Claim 3.9.

Let $\tilde{z} \in \mathbb{F}_2^{k^{d-1}}$ be fixed, and let $S = \{i \in [k] : \tilde{z} \in U_i\}$. Then,

$$\Pr_{y \in \mathbb{F}_2^k}\left[y \otimes \tilde{z} \in U\right] \le \frac{2^{|S|}}{2^k}.$$
###### Proof.

For fixed $\tilde{z}$, given the random variable $y \in \mathbb{F}_2^k$, we define random variables $u_1, \dots, u_k$ by: $u_i = y_i \cdot \tilde{z} + \sum_{j < i} \psi_{ij}(u_j)$. Note that $u_i \in \mathbb{F}_2^{k^{d-1}}$. Also note that $u_i$ is only a function of $y_1, \dots, y_i$. By Claim 3.8, $y \otimes \tilde{z} \in U$ only if for all $i \in [k]$, $u_i \in U_i$.

$$\begin{aligned}
\Pr_{y \in \mathbb{F}_2^k}\left[y \otimes \tilde{z} \in U\right] &\le \Pr_y\left[\forall i \in [k],\ u_i \in U_i\right] \\
&= \prod_{i=1}^{k} \Pr\left[u_i \in U_i \mid u_1 \in U_1, \dots, u_{i-1} \in U_{i-1}\right] \\
&= \prod_{i=1}^{k} \mathbb{E}_{u_1 \in U_1, \dots, u_{i-1} \in U_{i-1}}\left[\Pr_{u_i}\left[u_i \in U_i \mid u_1, \dots, u_{i-1}\right]\right]
\end{aligned}$$