Entropy versus variance for symmetric log-concave random variables and related problems

11/01/2018 · by Mokshay Madiman, et al. · University of Delaware, University of Warsaw, Carnegie Mellon University

We show that the uniform distribution minimises entropy among all symmetric log-concave distributions with fixed variance. We construct a counter-example regarding monotonicity and entropy comparison of weighted sums of independent identically distributed log-concave random variables.

1. Introduction

It is a classical fact going back to Boltzmann [8] that when the variance of a real-valued random variable X is kept fixed, the differential entropy is maximized by taking X to be Gaussian. As is standard in information theory, we use the definition of Shannon [20]: the differential entropy (or simply entropy, henceforth, since we have no need to deal with discrete entropy in this note) of a random vector X with density f is defined as

h(X) = −∫ f log f,

provided that this integral exists; this definition has a minus sign relative to Boltzmann's H-functional. It is easy to see that if one tried to minimize the entropy instead of maximizing it, there is no minimum among random variables with densities. Indeed, a discrete random variable with variance 1 has differential entropy −∞, and densities of probability measures approaching such a discrete distribution in an appropriate topology would have differential entropies converging to −∞ as well. Nonetheless, it is of significant interest to identify minimizers of entropy within structured subclasses of probability measures. For instance, it was observed independently by Keith Ball (unpublished) and in [7] that the question of minimizing entropy under a covariance matrix constraint within the class of log-concave measures on ℝⁿ is intimately tied to the well-known hyperplane or slicing conjecture in convex geometry.
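As a quick worked illustration of the two extremes at a fixed variance σ² (a standard computation, included only to make the comparison concrete): if U is uniform on [−a, a], then Var(U) = a²/3 and

h(U) = log(2a) = ½ log(12 · a²/3) = ½ log(12 Var(U)),

while a Gaussian G with variance σ² has

h(G) = ½ log(2πe σ²).

Since 2πe ≈ 17.08 > 12, the Gaussian indeed has the larger entropy at any fixed variance; Theorem 1 below identifies the uniform value ½ log(12 σ²) as the minimum over symmetric log-concave distributions.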

More generally, log-concave distributions emerge naturally from the interplay between information theory and convex geometry, and have recently been a very fruitful and active topic of research (see the recent survey [19]). A probability density f on ℝⁿ is called log-concave if it is of the form f = e^(−V) for a convex function V. Our first goal in this note is to establish some sharp inequalities relating the entropy (and in fact, the more general class of Rényi entropies) to moments of log-concave distributions. Our second goal is to falsify a conjecture about Schur-concavity of weighted sums of i.i.d. random variables from a log-concave distribution, motivated by connections to the monotonicity of entropy in the central limit theorem as well as by questions about capacity-achieving distributions for additive non-Gaussian noise channels. Finally, we make an unconnected remark about a variant, for complex-valued random variables, of a recent entropy power inequality for dependent random variables of Hao and Jog [13].

First, we show that among all symmetric log-concave probability distributions on the real line with fixed variance, the uniform distribution has minimal entropy. In fact, we obtain a slightly more general result.

Theorem 1.

Let X be a symmetric log-concave random variable and let p > 0. Then h(X) admits a sharp lower bound in terms of the p-th absolute moment E|X|^p, with equality if and only if X is a uniform random variable.

It is instructive to write this inequality using the entropy power N(X) = e^(2h(X)). In the special case p = 2, corresponding to the variance, we have the sandwich inequality

12 Var(X) ≤ N(X) ≤ 2πe Var(X),

with both inequalities being sharp in the class of symmetric log-concave random variables (the one on the left, coming from Theorem 1, gives equality uniquely for the uniform distribution, while the one on the right, coming from the maximum entropy property of the Gaussian, gives equality uniquely for the Gaussian distribution). Note that 2πe ≈ 17.08, so the range of the entropy power given the variance is quite constrained for symmetric log-concave random variables.
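A minimal numerical sketch of this sandwich (the densities, grid and step size below are arbitrary illustrative choices of ours, not taken from the paper): discretize a few symmetric log-concave densities and compare e^(2h) with 12·Var and 2πe·Var.

```python
import numpy as np

# Grid-based check of 12*Var(X) <= exp(2*h(X)) <= 2*pi*e*Var(X) for a few
# symmetric log-concave densities (Riemann-sum approximations of h and Var).

x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]

def entropy_and_variance(fx):
    fx = fx / (fx.sum() * dx)                  # renormalize against discretization error
    p = fx[fx > 0]
    h = -(p * np.log(p)).sum() * dx            # differential entropy
    mean = (x * fx).sum() * dx
    var = ((x - mean) ** 2 * fx).sum() * dx
    return h, var

densities = {
    "uniform on [-1, 1]": np.where(np.abs(x) <= 1, 0.5, 0.0),
    "standard Gaussian":  np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    "Laplace, scale 1":   0.5 * np.exp(-np.abs(x)),
}
for name, fx in densities.items():
    h, var = entropy_and_variance(fx)
    print(f"{name:18s}: 12*Var = {12*var:5.2f} <= e^(2h) = {np.exp(2*h):5.2f}"
          f" <= 2*pi*e*Var = {2*np.pi*np.e*var:5.2f}")
```

The uniform density saturates the left inequality and the Gaussian the right one, up to discretization error.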

We note that Theorem 1 can be viewed as a sharp version, in the symmetric case, of some of the estimates from [15]. However, finding the sharp version is quite delicate and one needs significantly more sophisticated methods, as explained below. For related upper bounds on the variance in terms of the entropy for mixtures of densities of a special form, see the recent work [9].

Our argument comprises two main steps: first, we reduce the problem to simple random variables (compactly supported, piecewise exponential density), using ideas and techniques developed by Fradelizi and Guédon [12] in order to elucidate the sophisticated localization technique of Lovász and Simonovits [14]. Then, in order to verify the inequality for such random variables, we prove a two-point inequality.

To motivate our next result, we first recall the notion of Schur majorisation. A vector a = (a_1, …, a_n) in ℝⁿ is majorised by another vector b = (b_1, …, b_n), usually denoted a ≺ b, if the nonincreasing rearrangements a_1* ≥ … ≥ a_n* and b_1* ≥ … ≥ b_n* of a and b satisfy a_1* + … + a_k* ≤ b_1* + … + b_k* for each k ≤ n − 1 and a_1 + … + a_n = b_1 + … + b_n. For instance, any vector with nonnegative coordinates adding up to 1 is majorised by the vector (1, 0, …, 0) and majorises the vector (1/n, …, 1/n). It was conjectured in [4] that for two independent identically distributed log-concave random variables X and Y, the function t ↦ h(√t X + √(1−t) Y) is nondecreasing on [0, 1/2]. A way to generalise this is to ask whether for any i.i.d. copies X_1, …, X_n of a log-concave random variable we have h(√a_1 X_1 + … + √a_n X_n) ≥ h(√b_1 X_1 + … + √b_n X_n), provided that a ≺ b. This property (called Schur-concavity in the literature) holds for symmetric Gaussian mixtures, as recently shown in [11]. We suspect it holds for uniform random variables (see also [1]), but here we prove that it does not hold in general for log-concave symmetric distributions, for the following reason: if it held then, splitting one of the summands into m equally weighted independent copies (which only makes the weight vector smaller in the majorisation order), the resulting sequence of entropies would be nondecreasing in m, and since it converges to the entropy of the sum in which that summand is replaced by an independent Gaussian G with the same variance, we would have in particular that h(X_1 + … + X_{n+1}) ≤ h(X_1 + … + X_n + G). We construct examples where the opposite holds.
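To make the majorisation order concrete, here is a small self-contained check of a ≺ b via partial sums of nonincreasing rearrangements (an illustration only; the function name and the sample vectors are ours).

```python
def majorized(a, b, tol=1e-12):
    """Return True if the vector a is majorised by b (a ≺ b)."""
    if abs(sum(a) - sum(b)) > tol:             # majorisation requires equal total sums
        return False
    a_sorted = sorted(a, reverse=True)         # nonincreasing rearrangements
    b_sorted = sorted(b, reverse=True)
    pa = pb = 0.0
    for s, t in zip(a_sorted, b_sorted):
        pa, pb = pa + s, pb + t                # partial sums of a must stay below those of b
        if pa > pb + tol:
            return False
    return True

n = 5
flat = [1.0 / n] * n                           # (1/n, ..., 1/n): majorised by every such vector
spike = [1.0] + [0.0] * (n - 1)                # (1, 0, ..., 0): majorises every such vector
some = [0.4, 0.3, 0.2, 0.1, 0.0]
print(majorized(flat, some), majorized(some, spike))   # True True
```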

Theorem 2.

There exists a symmetric log-concave random variable X with variance 1 such that if X_1, X_2, … are its independent copies and n is large enough, we have

h(X_1 + … + X_n + X_{n+1}) > h(X_1 + … + X_n + Z),

where Z is a standard Gaussian random variable, independent of the X_i.

Our proof is based on sophisticated and remarkable Edgeworth type expansions recently developed in [5] en route to obtaining precise rates of convergence in the entropic central limit theorem.

Theorem 2 can be compared with the celebrated monotonicity of entropy property proved in [3] (see [16, 21, 10] for simpler proofs and [17, 18] for extensions), which says that for any random variable X the sequence h((X_1 + … + X_n)/√n) is nondecreasing in n, where X_1, X_2, … are its independent copies.
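A minimal numerical sketch of this monotonicity (the Laplace summands, grid and step size are arbitrary illustrative choices of ours): the entropies h((X_1 + … + X_n)/√n) can be approximated by repeated convolution of densities on a grid, using the identity h(S/√n) = h(S) − ½ log n.

```python
import numpy as np

# Convolution-based estimate of h((X_1 + ... + X_n)/sqrt(n)) for Laplace summands
# with variance 1, using h(S/sqrt(n)) = h(S) - (1/2) log n.

dx = 2e-3
x = np.arange(-12.0, 12.0, dx)
b = 1 / np.sqrt(2)                              # Laplace scale giving variance 1
f1 = np.exp(-np.abs(x) / b) / (2 * b)

def entropy(f):
    p = f[f > 1e-300]
    return -(p * np.log(p)).sum() * dx

f_sum = f1.copy()                               # density of X_1 + ... + X_n
for n in range(1, 6):
    if n > 1:
        f_sum = np.convolve(f_sum, f1, mode="same") * dx   # add one more summand
        f_sum /= f_sum.sum() * dx                          # renormalize
    print(f"n = {n}: h((X_1+...+X_n)/sqrt(n)) ~ {entropy(f_sum) - 0.5*np.log(n):.4f}")
print(f"Gaussian limit: {0.5 * np.log(2 * np.pi * np.e):.4f}")
```

The printed values increase with n towards the Gaussian entropy ½ log(2πe), in line with the monotonicity property.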

Theorem 2 also has consequences for the understanding of capacity-achieving distributions for additive noise channels with non-Gaussian noise.

Our theorem also provides an example of two independent symmetric log-concave random variables X and Y with the same variance such that h(X + Y) > h(X + G), where G is a Gaussian random variable with the same variance as X and Y, independent of them; this is again in contrast to symmetric Gaussian mixtures (see [11]). It is an interesting question, already posed in [11], whether in general, for two i.i.d. summands, swapping one for a Gaussian with the same variance increases the entropy.

Our third result is a short remark regarding a recent inequality by Hao and Jog (see their paper [13] for motivation and proper discussion): if X = (X_1, …, X_n) is an unconditional random vector in ℝⁿ, then (1/n) h(X) ≤ h((X_1 + … + X_n)/√n). Recall that a random vector X is called unconditional if for every choice of signs ε_1, …, ε_n ∈ {−1, +1}, the vector (ε_1 X_1, …, ε_n X_n) has the same distribution as X. We remark that a complex analogue of this inequality also holds, and the proof is essentially trivial thanks to the existence of complex n × n matrices with determinant of modulus 1 and all entries of modulus 1/√n.
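One family of matrices with these two properties is given by the normalized discrete Fourier transform matrices (this is only an example of a suitable matrix, not necessarily the one used in the paper). The snippet below verifies both properties numerically.

```python
import numpy as np

def normalized_dft(n):
    """Unitary DFT matrix: every entry has modulus 1/sqrt(n) and |det| = 1."""
    row, col = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * row * col / n) / np.sqrt(n)

n = 8
U = normalized_dft(n)
print(np.allclose(np.abs(U), 1 / np.sqrt(n)))   # all entries have modulus 1/sqrt(n)
print(np.isclose(abs(np.linalg.det(U)), 1.0))   # determinant has modulus 1
print(np.allclose(U @ U.conj().T, np.eye(n)))   # U is unitary
```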

Theorem 3.

Let X = (X_1, …, X_n) be a random vector in ℂⁿ which is complex-unconditional, that is, for every complex numbers z_1, …, z_n such that |z_j| = 1 for every j, the vector (z_1 X_1, …, z_n X_n) has the same distribution as X. Then

(1/n) h(X) ≤ h((X_1 + … + X_n)/√n).

In the subsequent sections we present proofs of our theorems and provide some additional remarks.

2. Proof of Theorem 1

Let F be the set of all even log-concave probability density functions on ℝ. For p > 0, define the following functionals on F: the entropy,

h(f) = −∫ f log f,

and the p-th moment,

m_p(f) = ∫ |x|^p f(x) dx.

Our goal is to show that the entropy is minimised, among densities in F with a fixed p-th moment, by the uniform density.

Reduction

Bounded support

First we argue that it suffices to consider compactly supported densities. Let F_δ be the set of all densities from F which are supported in the interval [−δ, δ]. Given f in F, by considering its truncation to [−δ, δ], renormalised to be a probability density, which is in F_δ, and checking that the entropy and the p-th moment of the truncation tend to h(f) and m_p(f) as δ → ∞, we reduce the problem to an infimum over the classes F_δ. Consequently, to prove Theorem 1, it suffices to show that, for every δ > 0, the desired inequality holds for every density in F_δ.

Degrees of freedom

We shall argue that the last infimum is attained at densities which on [0, δ] are first constant and then decrease exponentially. Fix positive numbers δ and m and consider the set A of densities in F_δ with p-th moment equal to m. We treat A as a subset of the space of all real functions on [−δ, δ] equipped with the topology of pointwise convergence, which is a locally convex Hausdorff space (later on, this will be needed to employ Krein-Milman type theorems).


Step I. We show that the infimum of the entropy over A is finite and attained at a point from the set of the extremal points of A.

Let us recall that a nonempty set E contained in A (a subset of a vector space) is an extremal subset of A if, whenever (y + z)/2 ∈ E for some elements y and z of A, both y and z are in E. Notice that this definition does not require the convexity of A. Moreover, x is an extremal point of A if {x} is an extremal subset of A. We remark that for a convex function, the set of points of A where its supremum over A is attained (if nonempty) is an extremal subset of A (for instance, see Lemma 7.64 in [2]).
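For a quick illustration of the definition: if A is the square [0, 1]² in ℝ², then each closed edge of A is an extremal subset, the four vertices are exactly the extremal points, and no interior point is extremal, since an interior point can be written as (y + z)/2 with y ≠ z in A. Likewise, the set of maximizers over A of a linear (hence convex) functional is a vertex or an edge, in accordance with the remark above.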

An application of Zorn’s lemma together with a separation type theorem shows that every nonempty compact extremal subset of a set A contained in a locally convex Hausdorff vector space contains at least one extremal point of A (see Lemma 7.65 in [2]). Therefore, it remains to show

  • (a) the infimum of the entropy over A is greater than −∞,

  • (b) the set of the extremisers of the entropy over A is nonempty and compact.

First we note a standard lemma.

Lemma 4.

For every even log-concave probability density f, we have

m_p(f) ≤ Γ(p + 1) / (2 f(0))^p,

with equality for the symmetric exponential density.

Proof.

By homogeneity we can assume that f(0) = 1/2. Consider g(x) = (1/2) e^(−|x|), so that g(0) = f(0) and ∫ g = ∫ f = 1. By log-concavity, there is exactly one sign change point x_0 ∈ (0, ∞) for f − g. We have

∫ (|x|^p − x_0^p)(f(x) − g(x)) dx ≤ 0,

since the integrand is nonpositive, and hence m_p(f) ≤ m_p(g). It remains to verify the lemma for g, which holds with equality. ∎

To see (a), we observe that by Lemma 4 combined with the inequality h(f) ≥ −log ‖f‖_∞ = −log f(0), we get h(f) ≥ log 2 + (1/p) log( m_p(f) / Γ(p + 1) ), which gives (a).

To see (b), let H = inf{h(f) : f ∈ A} and take a sequence (f_n) of functions from A such that h(f_n) → H. To proceed we need another elementary lemma.

Lemma 5.

Let (f_n) be a sequence of functions in A. Then there exists a subsequence converging pointwise, except possibly at two points, to a function in F_δ.

Proof.

As noted above, the functions from A are uniformly bounded (by Lemma 4) and thus, using a standard diagonal argument, by passing to a subsequence, we can assume that (f_n(q)) converges for every rational q (in [−δ, δ]), say to g(q). Notice that g is log-concave on the rationals, that is, g((q + r)/2)² ≥ g(q) g(r) for all rationals q and r such that (q + r)/2 is also a rational. Moreover, g is even and nonincreasing on the nonnegative rationals. Let x_0 = sup{q ≥ 0 : g(q) > 0}. If x > x_0, then pick any rational q ∈ (x_0, x) and observe that 0 ≤ f_n(x) ≤ f_n(q) → g(q) = 0, so f_n(x) → 0. Next, by monotonicity and log-concavity, the one-sided limits of g along the rationals agree at every point of (0, x_0) (these limits exist by the monotonicity of g), so g extends to a function continuous on (0, x_0). Now for any x ∈ (0, x_0), take rationals q and r such that q < x < r. Since f_n(r) ≤ f_n(x) ≤ f_n(q), we get

g(r) ≤ liminf_n f_n(x) ≤ limsup_n f_n(x) ≤ g(q),

and letting q ↑ x and r ↓ x we conclude that (f_n(x)) is convergent, to f(x), say. We also set, say, f(±x_0) = 0. Then (f_n) converges to f at all but possibly the two points ±x_0, the function f is even and log-concave. By Lebesgue’s dominated convergence theorem, ∫ f = lim_n ∫ f_n = 1, so f belongs to F_δ. ∎

By the lemma, f_{n_k} → f pointwise (off at most two points) for some subsequence and some f ∈ F_δ. By the Lebesgue dominated convergence theorem, m_p(f) = lim_k m_p(f_{n_k}) = m and h(f) = lim_k h(f_{n_k}) = H, so f ∈ A and the set of extremisers is nonempty. To show that this set is compact, we repeat the same argument.


Step II. Every extremal point of A has at most 2 degrees of freedom.

Recall the notion of degrees of freedom of log-concave functions introduced in [12]. The degree of freedom of a log-concave function f is the largest integer k such that there exist ε > 0 and linearly independent continuous functions ψ_1, …, ψ_k defined on the support of f such that for every θ = (θ_1, …, θ_k) ∈ [−ε, ε]^k, the function f + θ_1 ψ_1 + … + θ_k ψ_k is log-concave.

Suppose f ∈ A is an extremal point and has more than two degrees of freedom. Then there are continuous functions ψ_1, ψ_2, ψ_3 (supported in [−δ, δ]) and ε > 0 such that for all θ ∈ [−ε, ε]³ the function f + θ_1 ψ_1 + θ_2 ψ_2 + θ_3 ψ_3 is log-concave. Note that the space of solutions θ ∈ ℝ³ to the system of equations

∫ (θ_1 ψ_1 + θ_2 ψ_2 + θ_3 ψ_3) = 0,   ∫ |x|^p (θ_1 ψ_1(x) + θ_2 ψ_2(x) + θ_3 ψ_3(x)) dx = 0

is of dimension at least 1. Therefore this space intersected with the cube [−ε, ε]³ contains a symmetric interval and, in particular, two antipodal points θ and −θ. Take g_+ = f + θ_1 ψ_1 + θ_2 ψ_2 + θ_3 ψ_3 and g_− = f − θ_1 ψ_1 − θ_2 ψ_2 − θ_3 ψ_3, which are both in A. Then f = (g_+ + g_−)/2 and therefore f is not an extremal point.


Step III. Densities with at most 2 degrees of freedom are simple.

We want to determine all nonincreasing log-concave functions on [0, δ] with degree of freedom at most 2. Suppose x_1 < x_2 < … < x_m are points of differentiability of the potential V (where f = e^(−V)) at which V has pairwise different positive slopes, and define a small perturbation of f localized near each of these points. We claim that each perturbed function is a log-concave nonincreasing function for all sufficiently small values of the perturbation parameters. To prove log-concavity we observe that on each interval between consecutive points x_i the perturbed function is of the form treated in Lemma 1 in [12], and on the initial interval it is of a similar form, so log-concavity follows from Lemma 1 in [12]. We also have to ensure that the density is nonincreasing: on the initial interval this follows from the fact that the slope of the perturbed potential stays nonnegative for small perturbations, and on the other intervals we have similar expressions, which follows from the fact that V has a strictly positive slope at the relevant points.

From this it follows that if there are at least two points of differentiability of V with different positive slopes, then f has degree of freedom at least 3. It follows that the only functions with degree of freedom at most 2 are, up to normalisation, constant on an initial subinterval of [0, δ] and exponentially decreasing on the rest.

A two-point inequality

It remains to show that for every density f on [−δ, δ] which is constant on a central interval and decreases exponentially outside it (with a positive normalising constant, a nonnegative exponential rate and a nonnegative length of the exponential part), we have the desired inequality, with equality if and only if f is uniform. If either the rate or the length of the exponential part is zero, then f is a uniform density and we directly check that there is equality. Therefore let us from now on assume that both are positive and prove the strict inequality. Since the left-hand side does not change when f(x) is replaced by s f(s x) for any positive s, we may fix the scale, and the condition ∫ f = 1 then becomes an explicit relation between the remaining parameters. Computing the p-th moment and the entropy of such a density explicitly and combining the two expressions reduces the desired inequality to an inequality between elementary functions of the parameters.

Therefore, the proof of Theorem 1 is complete once we show the following two-point inequality.

Lemma 6.

For nonnegative , positive and we have

Proof.

Integrating by parts, we can rewrite the left hand side as for a Borel measure on (which is absolutely continuous on with density and has the atom ). With and fixed, this is a strictly convex function of (by Hölder’s inequality). The right hand side is linear as a function of . Therefore, it suffices to check the inequality for and . For the inequality becomes equality. For , after computing the integral and exponentiating both sides, the inequality becomes

where we put and , which are positive. We lower-bound the right hand side using the estimate , , by

Therefore it suffices to show that

After moving everything on one side, plugging in , , expanding and simplifying, it becomes

where

It suffices to prove that these functions are nonnegative for . This is clear for . For , we check that and

For , we check that , and

It follows that and are nonnegative for . ∎
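As an independent numerical sanity check of the variance case of Theorem 1, e^(2h(X)) ≥ 12 Var(X), on densities of the simple form above (constant near the origin, exponentially decreasing further out), one can discretize a few such densities; everything below, including the parameter ranges, is an illustrative choice of ours.

```python
import numpy as np

# Check of e^(2*h) >= 12*Var on symmetric densities that are constant on [-a, a]
# and decay exponentially (rate lam) for |x| > a, truncated to [-60, 60].

x = np.linspace(-60.0, 60.0, 600001)
dx = x[1] - x[0]

def plateau_exponential(a, lam):
    f = np.where(np.abs(x) <= a, 1.0, np.exp(-lam * (np.abs(x) - a)))
    return f / (f.sum() * dx)                   # normalize to a probability density

rng = np.random.default_rng(0)
for _ in range(5):
    a, lam = rng.uniform(0.1, 3.0), rng.uniform(0.2, 3.0)
    f = plateau_exponential(a, lam)
    p = f[f > 0]
    h = -(p * np.log(p)).sum() * dx             # differential entropy
    var = (x**2 * f).sum() * dx                 # the mean is 0 by symmetry
    print(f"a = {a:.2f}, lam = {lam:.2f}: e^(2h) = {np.exp(2*h):8.3f} >= 12*Var = {12*var:8.3f}")
```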

Remark 7.

If we put and in the inequality from Lemma 6, we get (in particular ). We suspect that this necessary condition is also sufficient for the inequality to hold for all positive and .

3. Proof of Theorem 2

Let us denote

Z_n = (X_1 + … + X_n)/√n,

and let p_n be the density of Z_n and let φ be the density of the standard Gaussian distribution. Since X is assumed to be log-concave, it satisfies E|X|^k < ∞ for all k. According to the Edgeworth-type expansion described in [5] (Theorem 3.2 in Chapter 3), we have (with any s ≥ 2)

(1 + |x|^s) ( p_n(x) − φ(x) − Σ_{k=1}^{s−2} q_k(x) n^(−k/2) ) = o(n^(−(s−2)/2))   uniformly in x,

where the functions q_k are given by

q_k(x) = φ(x) Σ H_{k+2j}(x) Π_{m=1}^{k} (1/r_m!) ( γ_{m+2}/(m+2)! )^{r_m}.

Here H_j are Hermite polynomials,

H_j(x) = (−1)^j e^(x²/2) (d^j/dx^j) e^(−x²/2),

and the summation runs over all nonnegative integer solutions (r_1, …, r_k) to the equation r_1 + 2 r_2 + … + k r_k = k, and one uses the notation j = r_1 + … + r_k. The numbers γ_m are the cumulants of X, namely

γ_m = i^(−m) (d^m/dt^m) log E e^(itX) |_{t=0}.

Let us calculate the first two correction terms. Under our assumption (symmetry of X and Var X = 1), we have γ_3 = 0 and γ_4 = E X⁴ − 3. Therefore q_1 = 0 and

q_2(x) = φ(x) (γ_4/4!) H_4(x).

We get that for any x,

p_n(x) = φ(x) + (γ_4/(24 n)) H_4(x) φ(x) + o(n^(−1)) (1 + x⁴)^(−1).

Let f be the density of X. Let us assume that it is of the form f = φ + ψ, where ψ is even, smooth and compactly supported (say, supported in [−1, 1]) with bounded derivatives. Moreover, we assume that ∫ ψ = ∫ x² ψ(x) dx = 0 and that ∫ x⁴ ψ(x) dx ≠ 0. Multiplying ψ by a very small constant we can ensure that f is log-concave.
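A rough numerical sketch of this kind of construction (the particular bump, the size of the small constant and the grid are our own illustrative choices, and the moment constraints above are not enforced exactly here): perturb the standard Gaussian density by a small, even, smooth, compactly supported bump and check positivity, discrete log-concavity and the fourth cumulant.

```python
import numpy as np

# Perturb the standard Gaussian density by a small compactly supported bump,
# renormalize, and inspect positivity, discrete log-concavity and the fourth
# cumulant (generically nonzero, unlike for the Gaussian itself).

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

bump = np.where(np.abs(x) < 1, np.exp(-1.0 / np.maximum(1 - x**2, 1e-12)), 0.0)
eps = 0.01                                       # small constant multiplying the bump
f = phi + eps * bump
f /= f.sum() * dx                                # renormalize to a probability density

m2 = (x**2 * f).sum() * dx
m4 = (x**4 * f).sum() * dx
gamma4 = m4 - 3 * m2**2                          # fourth cumulant (the mean is 0 by symmetry)

print("min f            :", f.min())                           # strictly positive
print("log-concave (num):", np.diff(np.log(f), 2).max() < 0)   # discrete second differences of log f
print("fourth cumulant  :", gamma4)
```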

We are going to use Theorem 1.3 from [6]. To check the assumptions of this theorem, we first observe that for any we have

since has bounded support. We have to show that for sufficiently big there is

Since is symmetric, we can assume that . Then

where we have used the fact that , has a bounded support contained in and . We conclude that

and thus

(In this proof and denote sufficiently large and sufficiently small universal constants that may change from one line to another. On the other hand, , and denote constants that may depend on the distribution of .) Moreover, for we have

so

Let us define . Note that , where . We have

We first bound . Note that

Assuming without loss of generality that , we have

We also have

Moreover, assuming without loss of generality that ,