# Sharp variance-entropy comparison for nonnegative Gaussian quadratic forms

In this article we study quadratic forms in n independent standard normal random variables. We show that among nonnegative quadratic forms, a diagonal form with equal coefficients maximizes differential entropy when variance is fixed. We also prove that differential entropy of a weighted sum of i.i.d. exponential random variables with nonnegative weights is maximized, under fixed variance, when the weights are equal.


## 1. Introduction

For a random variable $X$ with density $f$ its Shannon differential entropy is defined by the formula $h(X) = -\int f \ln f$, provided that this integral converges, with the convention that $0 \ln 0 = 0$. It is a classical fact that if $X$ is a random variable with finite second moment and $G$ is a Gaussian random variable satisfying $\operatorname{Var}(G) = \operatorname{Var}(X)$, then $h(X) \le h(G)$. Thus, a Gaussian random variable maximizes entropy under fixed variance (note that even if $X$ has finite second moment, the integral in the definition of $h(X)$ may diverge to $-\infty$, but never to $+\infty$). This statement can be rewritten in the form of a variance-entropy comparison as follows: for any random variable $X$ with finite second moment one has $h(X) \le \frac{1}{2}\ln \operatorname{Var}(X) + \frac{1}{2}\ln(2\pi e)$, see e.g. Theorem 8.6.5 in . Due to the Pinsker-Csiszár-Kullback inequality, see [11, 5, 8], one has , whenever $G$ is a Gaussian random variable with the same mean and variance as the random variable $X$. Here stands for the total variation distance. Hence the quantity $h(G) - h(X)$ is a strong measure of closeness to Gaussianity. In fact we have $h(G) - h(X) = D(X \,\|\, G)$, where $D(X \,\|\, G)$ is the so-called Kullback–Leibler divergence (or relative entropy).

###### Definition 1.

Let be a standard Gaussian random vector in $\mathbb{R}^n$ ( stands for the identity matrix). For a symmetric real matrix $A$ we define . The random variable $X_A$ is called a Gaussian quadratic form (in $n$ variables). If $A$ is additionally positive semi-definite, then $X_A$ is called a nonnegative Gaussian quadratic form.

Our main result reads as follows.

###### Theorem 1.

Let be a nonnegative Gaussian quadratic form. Then

$$h(X_A) \le h(\chi^2(n)) + \frac{1}{2}\ln \operatorname{Var}(X_A) - \frac{1}{2}\ln(2n)$$

with equality if and only if for some . Here $\chi^2(n)$ is a random variable with a chi-square distribution with $n$ degrees of freedom. Equivalently, if is a Gaussian random variable with the same variance as $X_A$, then .

In this article denotes the standard Euclidean norm and stands for the standard scalar product in . By we denote the unit Euclidean sphere centered at the origin. We also take . By we denote equality in distribution of random variables. We also implicitly assume that in our abstract statements all integrals and expected values are well-defined and may have values . Those statements are then used in very concrete settings where those quantities are easily seen to be well-defined and finite.

###### Remark 1.

Theorem 1 shows that, in the sense of relative entropy, a Gaussian random variable cannot be approximated too well by a nonnegative Gaussian quadratic form, that is, if , then .
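To get a feeling for the size of this obstruction, the following sketch evaluates the gap between the Gaussian value $\frac{1}{2}\ln(2\pi e)$ and the normalized chi-square entropy $h(\chi^2(n)) - \frac{1}{2}\ln(2n)$ appearing in Theorem 1. It is only an illustration: the closed-form entropy of a gamma distribution used below is a standard formula that is not taken from this article, and the function names are hypothetical. The digamma function is evaluated only at integers and half-integers, via the recurrence $\psi(x+1) = \psi(x) + 1/x$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_half_integer(k):
    """Return psi(k/2) for a positive integer k, using psi(x + 1) = psi(x) + 1/x."""
    if k % 2 == 0:
        x, value, steps = 1.0, -EULER_GAMMA, k // 2 - 1
    else:
        x, value, steps = 0.5, -EULER_GAMMA - 2.0 * math.log(2.0), (k - 1) // 2
    for _ in range(steps):
        value += 1.0 / x
        x += 1.0
    return value


def chi_square_entropy(n):
    """Differential entropy of chi^2(n), i.e. of a Gamma(shape n/2, scale 2) law.

    Assumed standard formula: a + ln(scale) + ln Gamma(a) + (1 - a) * psi(a).
    """
    a = n / 2.0
    return a + math.log(2.0) + math.lgamma(a) + (1.0 - a) * digamma_half_integer(n)


def gaussianity_gap(n):
    """Gap between (1/2) ln(2 pi e) and the variance-normalized entropy of chi^2(n)."""
    normalized = chi_square_entropy(n) - 0.5 * math.log(2.0 * n)
    return 0.5 * math.log(2.0 * math.pi * math.e) - normalized


for n in (1, 2, 5, 10, 50):
    print(n, round(gaussianity_gap(n), 4))
```

Under these assumptions the gap is strictly positive for every $n$ and shrinks as $n$ grows, which matches the picture that the bound of Theorem 1 becomes less restrictive in high dimension.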

As we shall explain in Section 2, rotation invariance of the standard Gaussian random vector in allows us to reduce Theorem 1 to the case of diagonal quadratic forms with nonnegative entries. The study of these diagonal forms is the core of our investigation. We shall prove the following proposition.

###### Proposition 1.

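Let $g_1, \ldots, g_n$ be i.i.d. standard Gaussian random variables. Then for any nonnegative real numbers $d_1, \ldots, d_n$ satisfying $\sum_{i=1}^n d_i^2 = 1$ one has

$$h\Big(\sum_{i=1}^n d_i g_i^2\Big) \le h\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n g_i^2\Big),$$

with equality if and only if $d_1 = \ldots = d_n = \frac{1}{\sqrt{n}}$.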

Proposition 1 gives a solution to a particular instance of a more general problem.

###### Problem 1.

For a given sequence of i.i.d. random variables with finite second moment find the maximum of the function . What if is replaced with ?

Note that if are i.i.d. then and hence in the above problem we are looking for the maximum of entropy of weighted sums of i.i.d. random variables under fixed variance. Let us now discuss the state of the art of Problem 1. In the celebrated article Artstein, Ball, Barthe and Naor showed that if is a sequence of i.i.d. random variables with variance and , then the sequence is nondecreasing. The convergence of this sequence to , where is a standard Gaussian random variable, was established much earlier by Barron in , under minimal conditions that for at least one (see also the work of Linnik for some partial results).

In view of these results the following natural question arises: is it always true that the maximum in Problem 1 is achieved when ? Unfortunately, the answer to this question is negative even for symmetric random variables in the case , as shown in . In fact, solving Problem 1 is a difficult and complex issue even for the simplest random variables . As an example, let us mention the case of being uniformly distributed in (see Question 3 in Section 6), in which case it is believed that the maximum is attained for equal coefficients, but as far as we know it has not yet been proven.

The only general result that we are aware of is Theorem 8 in , where the problem was solved in the case of being i.i.d. Gaussian mixtures, that is, random variables of the form , where random variables and random variables are independent. In fact, the authors showed a stronger statement: if are i.i.d. Gaussian mixtures and in the Schur order, then . Let us recall that the definition of the Schur order is that for vectors and with nonnegative entries we have iff for all with equality for , where and are nonincreasing rearrangements of the sequences and . Note that for any we have , which shows that indeed in this case gives the maximum in Problem 1, whereas , gives the minimum. The latter is in fact true not only for Gaussian mixtures, but for arbitrary i.i.d. random variables , which is an easy consequence of the famous entropy power inequality of Shannon and Stam (see [12, 13]) in the following linearized form: if the real numbers satisfy , then for a sequence of independent random variables one has .
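The Schur order recalled above can be checked mechanically by comparing partial sums of the nonincreasing rearrangements. The following is a minimal sketch; the function name `is_majorized_by` and the tolerance parameter are illustrative choices, not notation from the article.

```python
def is_majorized_by(a, b, tol=1e-12):
    """Return True if a precedes b in the Schur (majorization) order.

    Both vectors must have the same length. Partial sums of the
    nonincreasing rearrangement of a must not exceed those of b,
    and the total sums must agree (up to the tolerance tol).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_sorted = sorted(a, reverse=True)
    b_sorted = sorted(b, reverse=True)
    partial_a = 0.0
    partial_b = 0.0
    for x, y in zip(a_sorted, b_sorted):
        partial_a += x
        partial_b += y
        if partial_a > partial_b + tol:
            return False
    return abs(partial_a - partial_b) <= tol


weights = [0.1, 0.4, 0.3, 0.2]          # any nonnegative vector summing to 1
uniform = [0.25, 0.25, 0.25, 0.25]      # the most "spread out" vector
point_mass = [1.0, 0.0, 0.0, 0.0]       # the most "concentrated" vector

print(is_majorized_by(uniform, weights))     # uniform lies below weights
print(is_majorized_by(weights, point_mass))  # weights lies below the point mass
```

In this example the uniform vector sits below every nonnegative vector with the same sum, and the point mass sits above it, which is the two-sided comparison used in the discussion of Problem 1.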

Proposition 1 provides a solution to Problem 1 in the case , where are independent random variables. It turns out that our method, which we call the method of intersecting densities (recently introduced in in the context of moments of log-concave random variables), can also be applied to tackle the case of being independent one-sided exponential random variables with parameter (in fact, in this case the proof is slightly easier, and thus we shall present it before the proof of Proposition 1).

###### Proposition 2.

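Suppose $X_1, \ldots, X_n$ are i.i.d. random variables with densities . Then for any nonnegative $d_1, \ldots, d_n$ satisfying $\sum_{i=1}^n d_i^2 = 1$ we have

$$h\Big(\sum_{i=1}^n d_i X_i\Big) \le h\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i\Big),$$

with equality if and only if $d_1 = \ldots = d_n = \frac{1}{\sqrt{n}}$.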
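For $n = 2$ the statement can be probed numerically. The sketch below assumes standard exponential variables $E_1, E_2$ (differential entropy is translation invariant, so centering does not matter) and approximates $-\int f \ln f$ with a midpoint rule. The helper names, the integration range and the grid size are illustrative assumptions; this is a sanity check, not part of the proof.

```python
import math


def hypoexp_density(y, d1, d2):
    """Density at y > 0 of d1*E1 + d2*E2 for independent Exp(1) variables, d1 != d2."""
    return (math.exp(-y / d1) - math.exp(-y / d2)) / (d1 - d2)


def gamma2_density(y, d):
    """Density at y > 0 of d*(E1 + E2), a gamma law with shape 2 and scale d."""
    return y * math.exp(-y / d) / (d * d)


def entropy(density, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the differential entropy on (0, upper)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        f = density((i + 0.5) * width)
        if f > 0.0:
            total -= f * math.log(f) * width
    return total


equal = 1.0 / math.sqrt(2.0)
h_equal = entropy(lambda y: gamma2_density(y, equal))

h_unequal = {}
for d1 in (0.99, 0.9, 0.8):
    d2 = math.sqrt(1.0 - d1 * d1)  # keeps d1^2 + d2^2 = 1, i.e. fixed variance
    h_unequal[d1] = entropy(lambda y: hypoexp_density(y, d1, d2))

print("equal weights:", h_equal)
for d1, value in h_unequal.items():
    print("d1 =", d1, "entropy:", value)
```

With these choices every unequal pair should give a smaller entropy than the equal-weight pair, while staying above the entropy $1$ of a single exponential variable, in line with Proposition 2 and the entropy power inequality.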

This article is organized as follows. In Section 2 we show how Proposition 1 implies Theorem 1. In Section 3 we describe our key method. The proof of Proposition 2 is given in Section 4, whereas Section 5 is devoted to the proof of Proposition 1. Finally, in Section 6 we present some open problems.

## 2. Reduction to diagonal quadratic forms

We begin with the following simple lemma.

###### Lemma 1.

Let be a Gaussian quadratic form in variables and let be an orthogonal transformation in . Then has the same distribution as . In particular, every Gaussian quadratic form has the same distribution as a certain Gaussian quadratic form with being diagonal. If additionally was assumed to be nonnegative, then the associated diagonal matrix has nonnegative entries.

###### Proof.

Let . Note that because of rotation invariance of , the random vector has the same distribution as . We have , which has the same distribution as . To prove the second part it suffices to observe that every symmetric matrix is diagonalizable by a certain orthogonal change of basis . If the matrix is positive semi-definite, then the resulting diagonal matrix clearly has nonnegative entries.

###### Lemma 2.

Let $X_A$ be a Gaussian quadratic form. Then $\mathbb{E} X_A = \operatorname{tr}(A)$ and $\operatorname{Var}(X_A) = 2\operatorname{tr}(A^2)$.

###### Proof.

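By Lemma 1 and by the invariance of $\operatorname{tr}(A)$ and $\operatorname{tr}(A^2)$ under matrix similarity, the statement is invariant under the transformation for any orthogonal matrix . We can therefore assume that $A$ is diagonal. In this case $X_A = \sum_{i=1}^n a_{ii} g_i^2$, where $a_{ii}$ are some real numbers and $g_i$ are i.i.d. standard Gaussian random variables. Clearly, $\mathbb{E} X_A = \sum_{i=1}^n a_{ii} = \operatorname{tr}(A)$. Moreover,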

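$$\begin{aligned} \mathbb{E} X_A^2 &= \mathbb{E}\Big(\sum_{i=1}^n a_{ii} g_i^2\Big)^2 = \sum_{i=1}^n a_{ii}^2\, \mathbb{E} g_i^4 + \sum_{i \ne j} a_{ii} a_{jj}\, \mathbb{E} g_i^2 g_j^2 = 3\sum_{i=1}^n a_{ii}^2 + \sum_{i \ne j} a_{ii} a_{jj} \\ &= 2\sum_{i=1}^n a_{ii}^2 + \Big(\sum_{i=1}^n a_{ii}\Big)^2 = 2\operatorname{tr}(A^2) + (\operatorname{tr}(A))^2. \end{aligned}$$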

Hence, $\operatorname{Var}(X_A) = \mathbb{E} X_A^2 - (\mathbb{E} X_A)^2 = 2\operatorname{tr}(A^2)$.
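The moment identities of Lemma 2 can also be verified exactly for a non-diagonal matrix, without diagonalizing it. The sketch below is an independent check, not the argument of the proof: it expands $\mathbb{E}\langle Ag, g\rangle^2$ entrywise and uses the Isserlis (Wick) formula for fourth moments of standard Gaussians, which is an assumption imported from outside the article. The matrix and the helper names are illustrative.

```python
from itertools import product


def wick4(i, j, k, l):
    """E[g_i g_j g_k g_l] for i.i.d. standard Gaussians (sum over the three pairings)."""
    return (i == j) * (k == l) + (i == k) * (j == l) + (i == l) * (j == k)


def second_moment(A):
    """E[(sum_{i,j} A[i][j] g_i g_j)^2], expanded entrywise."""
    n = len(A)
    return sum(A[i][j] * A[k][l] * wick4(i, j, k, l)
               for i, j, k, l in product(range(n), repeat=4))


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# An arbitrary symmetric integer matrix (it need not be positive semi-definite).
A = [[2, 1, 0],
     [1, 3, -1],
     [0, -1, 1]]

mean = trace(A)                          # E X_A = tr(A)
variance = second_moment(A) - mean ** 2  # Var X_A = E X_A^2 - (E X_A)^2

print(mean, second_moment(A), variance, 2 * trace(matmul(A, A)))
```

Because all entries are integers, the comparison is exact and involves no sampling error.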

Now we show how Proposition 1 implies Theorem 1.

###### Proof of Theorem 1.

Thanks to Lemma 1, we can assume that is diagonal, that is, for some . Since for any random variable and any non-zero real number one has and , the statement is invariant under scaling of . Thus, one can also assume that . In this case, due to Lemma 2, one has . Hence, Proposition 1 yields that . The equality cases follow easily from Lemma 1 and equality cases in Proposition 1. ∎

## 3. General strategy & the method of intersecting densities

We begin by recalling the following standard bound for the entropy.

###### Lemma 3.

Suppose $p$ and $q$ are probability densities of random variables and , respectively. Take . Then

• (a) , that is, ,

• (b) if , then .

###### Proof.

(a) We can assume that the support of is contained in the support of (otherwise the right-hand side is and there is nothing to prove). Since for we have , one gets

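$$-\int p \ln p + \int p \ln q = \int_{\operatorname{supp}(p)} (p \ln q - p \ln p) = \int_{\operatorname{supp}(p)} p \ln(q/p) \le \int_{\operatorname{supp}(p)} p\,(q/p - 1) \le 0.$$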

(b) From part (a) we have . ∎

In our proof of Proposition 1, in order to verify the assumption of Lemma 3(b), we shall use a trick that we call the method of intersecting densities. The next lemma describes this crucial idea. Let us first introduce the following definition.

###### Definition 2.

Let be a measurable function. We say that changes sign at a point if one of the following conditions holds:

• there exist such that and is positive a.e. on , nonpositive a.e. on and negative on some subset of of positive measure;

• there exist such that and is negative a.e. on , nonnegative a.e. on and positive on some subset of of positive measure.

We call such the sign change point of . If has precisely sign change points, then we say that changes sign exactly times.

Let us observe that if all sign change points of are , then is either nonpositive a.e. on or nonnegative a.e. on .

###### Lemma 4.

Suppose , where are arbitrary real numbers and . Suppose also that are real random variables with densities and supported in , such that , , and the function changes sign exactly three times and is positive a.e. before the first sign change point. Then .

###### Proof.

Our goal is to prove the inequality . Because of our assumptions, we have for . Our desired inequality is therefore equivalent to

 (1)

where are arbitrary real numbers. A crucial step now is to explore the freedom of the choice of these three numbers. We know that changes sign exactly three times at some points . We choose so that for . This can be done because the matrix , associated to the system of linear equations that have to satisfy, is a Vandermonde matrix.

Let . We now show that the integrand in (1) is nonnegative, which will clearly finish the proof (the obtained inequality will be strict because it will also easily follow that this integrand is not an a.e. zero function). We already know that and that changes sign at and is positive a.e. before the first sign change point . Since close to the function is positive (note that ), it is enough to show that also changes its sign at and that these are the only sign change points of this function.

To show this we observe that the function has the form for some real numbers . This function is clearly smooth on . It is enough to show that has only three zeros and none of them is a zero of (then we easily conclude that the zeros correspond to sign changes). Suppose that has more than three zeros, counting multiplicities ( is a zero of multiplicity if for , where is the th derivative of , with the convention that ). Since itself has at least three distinct zeros, by Rolle’s theorem we deduce that has at least three distinct zeros. But for we have Thus, the equation is equivalent to the quadratic equation , which cannot have more than two solutions (unless vanishes identically, which clearly does not hold in our case as ). We arrived at a contradiction. ∎

In Lemma 4 we assumed that changes sign exactly three times and that and . Our next lemma shows that the conditions and are enough to guarantee that changes sign at least three times.

###### Lemma 5.

Let be integers and let be measurable. Suppose that changes sign at exactly points. Assume moreover that for all . Then .

###### Proof.

We prove the lemma by contradiction. Assume that . Let be the sign change points of . From our assumption, for every polynomial of degree at most one has . Let us take and . We have . On the other hand, does not change sign since changes sign exactly at the same points as . Since is not identically zero, we get , contradiction. ∎

###### Corollary 1.

Suppose are real random variables with densities and , such that and . Then the function changes sign at least three times.

###### Proof.

It is enough to apply Lemma 5 with and . ∎

## 4. Proof of Proposition 2

###### Lemma 6.

Suppose that are i.i.d. random variables having values in the interval and having strictly positive density on , where . Let be a measurable function. Suppose that for every , every and for all satisfying we have

$$\mathbb{E}\Phi(s + d_1 Y_1 + d_2 Y_2) \le \mathbb{E}\Phi(s + c_1 Y_1 + c_2 Y_2). \tag{2}$$

Then whenever satisfy in the Schur order, then

$$\mathbb{E}\Phi\Big(\sum_{i=1}^n d_i Y_i\Big) \le \mathbb{E}\Phi\Big(\sum_{i=1}^n c_i Y_i\Big). \tag{3}$$

If the inequality (2) is always strict and is not a permutation of , then (3) is also strict.

In particular, for we have

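$$\mathbb{E}\Phi(Y_1) \le \mathbb{E}\Phi\Big(\sum_{i=1}^n d_i Y_i\Big) \le \mathbb{E}\Phi\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i\Big). \tag{4}$$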

Moreover, if (2) is always strict, then the left inequality in (4) is strict whenever has at least two non-zero coordinates, whereas the right inequality in (4) is strict whenever .

###### Proof.

We first show that if satisfy and , then is well-defined, that is, is a.s. satisfied. Indeed,

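$$\begin{aligned} s + d_1 Y_1 + d_2 Y_2 &> -l\sqrt{(n-2)(1-\delta^2)} - l(d_1 + d_2) \ge -l\sqrt{(n-2)(1-\delta^2)} - l\sqrt{2(d_1^2 + d_2^2)} \\ &= -l\big(\sqrt{n-2}\cdot\sqrt{1-\delta^2} + \sqrt{2}\,\delta\big) \ge -l\sqrt{n-2+2}\cdot\sqrt{1-\delta^2+\delta^2} = -l\sqrt{n}, \end{aligned}$$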

where the last inequality results from the Cauchy-Schwarz inequality.

We now consider a random variable , where . Observe that a.s.

$$S > -l\sum_{i=3}^n d_i \ge -l\sqrt{(n-2)\sum_{i=3}^n d_i^2} = -l\sqrt{(n-2)(1-\delta^2)},$$

again by the Cauchy-Schwarz inequality. After substituting $S$ for $s$ in (2) and taking expectation with respect to $S$ we get

$$\mathbb{E}\Phi\Big(\sum_{i=1}^n d_i Y_i\Big) \le \mathbb{E}\Phi\Big(c_1 Y_1 + c_2 Y_2 + \sum_{i=3}^n d_i Y_i\Big), \tag{5}$$

for satisfying . The inequality (5) is strict if (2) is strict for every . The inequality (3) (together with its strict version) follows from (5) by using the standard fact that if are two vectors with nonnegative coordinates, then can be obtained from by applying a finite sequence of operations of the form

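$$T^{(\lambda)}_{j,k}(z_1, \ldots, z_n) = (z_1, \ldots, z_{j-1}, \lambda z_j + (1-\lambda) z_k, z_{j+1}, \ldots, z_{k-1}, \lambda z_k + (1-\lambda) z_j, z_{k+1}, \ldots, z_n),$$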

where and , see  Chapter 2, Lemma B.1. The inequality (5) with and shows that after applying (and, by symmetry, also every , ) to the vector of squares of coordinates, the corresponding expectation cannot decrease (in the case we clearly have equality). Moreover, it increases if (so that ) whenever (5) is strict. The desired inequality is therefore obtained by applying finitely many such intermediate inequalities. Note that if the inequality (2) is always strict and is not a permutation of , then (3) is strict, since in this case we would need to do the operation with at least once.

The last part follows from (3) (and its strict version) by observing that for every one has . ∎

###### Remark 2.

In fact, for our purposes we only need to know that under the assumptions of Lemma 6 we have for . This can be alternatively obtained as follows. Start with the vector . Suppose that not all the numbers are equal. Then, by symmetry, we can assume that . By exchanging the pair for , where and , and applying (5), we can see that by considering instead of we can increase the number of coordinates equal to , while not decreasing the corresponding expectation. After at most such steps we would arrive at the vector .
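The equalization procedure of this remark can be sketched on the vector of squares. The code below reflects one reading of the remark and is an assumption on our part: a pair consisting of an entry above $1/n$ and an entry below $1/n$ is replaced by $1/n$ and the remaining mass of the pair. Exact rational arithmetic is used so that equality with $1/n$ can be tested safely; the function name `equalize` is hypothetical.

```python
from fractions import Fraction


def equalize(squares):
    """Drive a vector of squared weights (summing to 1) to (1/n, ..., 1/n).

    Each step takes the largest and the smallest entry, sets the largest
    to 1/n and moves the excess to the smallest, so the sum is preserved.
    Returns the list of all intermediate vectors, starting with the input.
    """
    n = len(squares)
    target = Fraction(1, n)
    current = list(squares)
    path = [tuple(current)]
    while any(x != target for x in current):
        j = max(range(n), key=lambda i: current[i])  # entry above 1/n
        k = min(range(n), key=lambda i: current[i])  # entry below 1/n
        current[k] = current[j] + current[k] - target
        current[j] = target
        path.append(tuple(current))
    return path


start = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(0)]
path = equalize(start)
for vector in path:
    print([str(x) for x in vector])
```

Each step fixes at least one more coordinate at $1/n$, so the loop stops after at most $n - 1$ steps; in the example above two steps suffice.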

The next lemma is borrowed from . We prove it here just for completeness.

###### Lemma 7.

Suppose that , are real numbers. Then the function is either identically zero or it has at most zeroes in the interval .

###### Proof.

We proceed by induction on . The statement is trivial for . Assume that the assertion is true for and, without loss of generality, that is not of the form . The equation is equivalent to where is non-constant. To prove our assertion by contradiction, suppose that the latter has more than solutions in . Then Rolle’s theorem shows that the function

$$\tilde{h}'(t) = (b_2 - b_1) a_2 t^{b_2 - b_1 - 1} + \cdots + (b_n - b_1) a_n t^{b_n - b_1 - 1},$$

which is not identically zero, has at least zeros. This contradicts the inductive hypothesis.

###### Lemma 8.

Let be independent random variables with densities and let . Suppose satisfy . Let and be the densities of and , respectively. Then changes sign exactly three times and is positive before the first sign change point.

###### Proof.

By our assumptions, and . By Corollary 1, the function changes sign at least three times. We shall show that it changes sign at most three times. We will consider four cases.

Case 1: , . A standard computation shows that

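$$f_U(x) = \frac{e^{-\frac{x + d_1 + d_2}{d_2}} - e^{-\frac{x + d_1 + d_2}{d_1}}}{d_2 - d_1}\, \mathbf{1}_{[-(d_1 + d_2), \infty)}(x), \qquad f_V(x) = \frac{e^{-\frac{x + c_1 + c_2}{c_2}} - e^{-\frac{x + c_1 + c_2}{c_1}}}{c_2 - c_1}\, \mathbf{1}_{[-(c_1 + c_2), \infty)}(x).$$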

The inequality together with the constraint implies that . The function is supported in and is continuous. On we have . Therefore, the function does not change sign in the interval . On we have for some real numbers , . Since , from Lemma 7 it follows that this function has at most zeros in the real line (use the change of variables ). Thus, it has at most sign changes.
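The sign pattern in Case 1 can be observed numerically on a grid. The sketch below evaluates the two densities displayed above for one illustrative choice of weights with $d_1^2 + d_2^2 = c_1^2 + c_2^2 = 1$ and counts sign changes of their difference. Several things here are assumptions rather than statements from the article: the specific weights, the order of subtraction (the density with the larger support minus the other one), the grid, and the tolerance below which values are treated as zero.

```python
import math


def centered_hypoexp_density(x, w1, w2):
    """Density of the form displayed above, for distinct positive weights w1, w2."""
    shift = w1 + w2
    if x < -shift:
        return 0.0
    y = x + shift
    return (math.exp(-y / w2) - math.exp(-y / w1)) / (w2 - w1)


def sign_pattern(values, tol=1e-13):
    """Signs of the entries whose absolute value exceeds tol (near-zeros are skipped)."""
    return [1 if v > 0 else -1 for v in values if abs(v) > tol]


def count_sign_changes(signs):
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


d = (math.sqrt(1.0 - 0.81), 0.9)  # less balanced weights, d1^2 + d2^2 = 1
c = (0.6, 0.8)                    # more balanced weights, c1^2 + c2^2 = 1

grid = [-1.5 + 0.001 * i for i in range(21501)]  # covers [-1.5, 20]
difference = [centered_hypoexp_density(x, *c) - centered_hypoexp_density(x, *d)
              for x in grid]
signs = sign_pattern(difference)

print("sign changes:", count_sign_changes(signs), "first sign:", signs[0])
```

A grid count can miss crossings that lie closer together than the grid step, so this is a plausibility check of the "exactly three sign changes, positive at first" pattern and not a substitute for the argument via Lemma 7.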

Case 2: , . We provide a direct proof even though this case follows from Case 1 via a continuity argument. The density of is now equal to . The function is again continuous. The proof is similar to the proof in Case 1, except for the fact that now on the function has the form , where are distinct. It is enough to show that it has at most distinct zeros. Suppose that this function has at least distinct zeros. Then the same can be said about . Thus, by Rolle’s theorem, has at least zeros. But is a sum of two exponential functions, which has exactly one zero, a contradiction.

Case 3: , . In this case , so on the function is of the form , where are distinct. From Lemma 7, this function can have at most distinct zeros. Thus, has at most zeros in . Again, our function is positive on . We already know that we must have at least three sign change points. The additional third sign change point is therefore the point (note that in the present case is the only discontinuity point of in ).

Case 4: , . Now the function on has the form