It is a classical fact going back to Boltzmann that when the variance of a real-valued random variable $X$ is kept fixed, the differential entropy is maximized by taking $X$ to be Gaussian. As is standard in information theory, we use the definition of Shannon: the differential entropy (or simply entropy, henceforth, since we have no need to deal with discrete entropy in this note) of a random vector $X$ with density $f$ is defined as
\[
h(X) = -\int f \log f,
\]
provided that this integral exists, this definition having a minus sign relative to Boltzmann's $H$-functional. It is easy to see that if one tried to minimize
the entropy instead of maximizing it, there is no minimum among random variables with densities; indeed, a discrete random variable with variance 1 has differential entropy $-\infty$, and densities of probability measures approaching such a discrete distribution in an appropriate topology would have differential entropies converging to $-\infty$ as well. Nonetheless, it is of significant interest to identify minimizers of entropy within structured subclasses of probability measures. For instance, it was observed independently by Keith Ball (unpublished) and in  that the question of minimizing entropy under a covariance matrix constraint within the class of log-concave measures on $\mathbb{R}^n$
is intimately tied to the well known hyperplane or slicing conjecture in convex geometry.
More generally, log-concave distributions emerge naturally from the interplay between information theory and convex geometry, and have recently been a very fruitful and active topic of research (see the recent survey ). A probability density $f$ on $\mathbb{R}^n$ is called log-concave if it is of the form $f = e^{-V}$ for a convex function $V : \mathbb{R}^n \to (-\infty, +\infty]$
. Our first goal in this note is to establish some sharp inequalities relating the entropy (and in fact, the more general class of Rényi entropies) to moments of log-concave distributions. Our second goal is to falsify a conjecture about Schur-concavity of weighted sums of i.i.d. random variables from a log-concave distribution, motivated by connections to the monotonicity of entropy in the central limit theorem as well as by questions about capacity-achieving distributions for additive non-Gaussian noise channels. Finally we make an unconnected remark about a variant for complex-valued random variables of a recent entropy power inequality for dependent random variables of Hao and Jog.
First, we show that among all symmetric log-concave probability distributions on $\mathbb{R}$ with fixed variance, the uniform distribution has minimal entropy. In fact, we obtain a slightly more general result.
Let $X$ be a symmetric log-concave random variable and $p > 0$. Then
\[
h(X) \geq \frac{1}{p} \log\big( (p+1)\, \mathbb{E}|X|^p \big) + \log 2,
\]
with equality if and only if $X$ is a uniform random variable.
It is instructive to write this inequality using the entropy power $N(X) = \frac{1}{2\pi e}\, e^{2 h(X)}$, in which case it becomes
\[
N(X) \geq \frac{2}{\pi e} \big( (p+1)\, \mathbb{E}|X|^p \big)^{2/p}.
\]
In the special case $p = 2$ corresponding to the variance, we have the sandwich inequality
\[
\frac{6}{\pi e}\, \mathbb{E} X^2 \leq N(X) \leq \mathbb{E} X^2,
\]
with both inequalities being sharp in the class of symmetric log-concave random variables (the one on the left, coming from Theorem 1, attains equality uniquely for the uniform distribution, while the one on the right, coming from the maximum entropy property of the Gaussian, attains equality uniquely for the Gaussian distribution). Note that $\frac{6}{\pi e} \approx 0.70$, so the range of entropy power given variance is quite constrained for symmetric log-concave random variables.
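As a quick numerical sanity check (an illustration of ours, not part of the paper), the closed-form entropies of the uniform, Laplace, and Gaussian laws with variance $1$ confirm both the extremal ordering and the ratio $6/(\pi e)$:

```python
import math

# Closed-form differential entropies (in nats) of symmetric variance-1 laws.
var = 1.0

# Uniform on [-a, a]: variance a^2/3, entropy log(2a).
a = math.sqrt(3 * var)
h_uniform = math.log(2 * a)

# Laplace with variance 2b^2: entropy 1 + log(2b).
b = math.sqrt(var / 2)
h_laplace = 1 + math.log(2 * b)

# Gaussian: entropy (1/2) log(2*pi*e*var).
h_gauss = 0.5 * math.log(2 * math.pi * math.e * var)

# Uniform should minimize, Gaussian should maximize.
assert h_uniform < h_laplace < h_gauss

# Entropy power N(X) = exp(2h)/(2*pi*e); the uniform/Gaussian ratio is 6/(pi e).
def N(h):
    return math.exp(2 * h) / (2 * math.pi * math.e)

ratio = N(h_uniform) / N(h_gauss)
print(ratio, 6 / (math.pi * math.e))  # both ≈ 0.7026
```

The Laplace distribution sits strictly between the two extremes, as the sandwich inequality predicts.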
We note that Theorem 1
can be viewed as a sharp version in the symmetric case of some of the estimates from. However, finding the sharp version is quite delicate and one needs significantly more sophisticated methods, as explained below. For related upper bounds on the variance in terms of the entropy for mixtures of densities of the form , see the recent work .
Our argument comprises two main steps: first we reduce the problem to simple random variables (compactly supported, piecewise exponential density), using ideas and techniques developed by Fradelizi and Guédon  in order to elucidate the sophisticated localization technique of Lovász and Simonovits . Then, in order to verify the inequality for such random variables, we prove a two-point inequality.
To motivate our next result, we first recall the notion of Schur majorisation. One vector $a = (a_1, \dots, a_n)$ in $\mathbb{R}^n$ is majorised by another one $b = (b_1, \dots, b_n)$, usually denoted $a \prec b$, if the nonincreasing rearrangements $a_1^* \geq \dots \geq a_n^*$ and $b_1^* \geq \dots \geq b_n^*$ of $a$ and $b$ satisfy the inequalities $\sum_{i=1}^k a_i^* \leq \sum_{i=1}^k b_i^*$ for each $1 \leq k \leq n-1$ and $\sum_{i=1}^n a_i^* = \sum_{i=1}^n b_i^*$. For instance, any vector $a$ with nonnegative coordinates adding up to $1$ is majorised by the vector $(1, 0, \dots, 0)$ and majorises the vector $(\frac{1}{n}, \dots, \frac{1}{n})$. It was conjectured in  that for two independent identically distributed log-concave random variables $X$ and $Y$, the function $t \mapsto h\big(\sqrt{t}\, X + \sqrt{1-t}\, Y\big)$ is nondecreasing on $[0, \frac{1}{2}]$. A way to generalise this is to ask whether for any i.i.d. copies $X_1, \dots, X_n$ of a log-concave random variable, we have $h(a_1 X_1 + \dots + a_n X_n) \geq h(b_1 X_1 + \dots + b_n X_n)$, provided that $(a_1^2, \dots, a_n^2) \prec (b_1^2, \dots, b_n^2)$. This property (called Schur-concavity in the literature) holds for symmetric Gaussian mixtures as recently shown in . We suspect it holds for uniform random variables (see also ), but here we prove it does not hold in general for log-concave symmetric distributions for the following reason: since $(\frac{1}{n}, \dots, \frac{1}{n}) \prec (\frac{1}{n-1}, \dots, \frac{1}{n-1}, 0)$, if it held, then the sequence $h\big(\frac{X_1 + \dots + X_n}{\sqrt{n}}\big)$ would be nondecreasing and, as it converges to $h(Z)$, where $Z$ is an independent Gaussian random variable with the same variance as $X_1$, we would have in particular that $h\big(\frac{X_1 + \dots + X_n}{\sqrt{n}}\big) \leq h\big(\frac{X_1 + \dots + X_{n-1} + Z}{\sqrt{n}}\big)$. We construct examples where the opposite holds.
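The majorisation relation can be checked mechanically. The short sketch below (illustrative code of ours, with a numerical tolerance we chose) verifies the two extreme relations mentioned above for a sample probability vector:

```python
def prec(a, b, tol=1e-12):
    """True if a ≺ b (a is majorised by b): every partial sum of the
    nonincreasing rearrangement of a is at most the corresponding
    partial sum for b, and the total sums agree."""
    a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    pa = pb = 0.0
    for x, y in zip(a, b):
        pa, pb = pa + x, pb + y
        if pa > pb + tol:
            return False
    return abs(pa - pb) <= tol

# Any probability vector sits between the two extremes from the text.
v = [0.5, 0.2, 0.2, 0.1]
n = len(v)
assert prec(v, [1, 0, 0, 0])       # v ≺ (1, 0, ..., 0)
assert prec([1 / n] * n, v)        # (1/n, ..., 1/n) ≺ v
assert not prec([1, 0, 0, 0], v)   # and not conversely
```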
There exists a symmetric log-concave random variable $X$ with variance $1$ such that if $X_1, X_2, \dots$ are its independent copies and $n$ is large enough, we have
\[
h\!\left( \frac{X_1 + \dots + X_n}{\sqrt{n}} \right) > h\!\left( \frac{X_1 + \dots + X_{n-1} + Z}{\sqrt{n}} \right),
\]
where $Z$ is a standard Gaussian random variable, independent of the $X_i$.
Our proof is based on sophisticated and remarkable Edgeworth type expansions recently developed in  en route to obtaining precise rates of convergence in the entropic central limit theorem.
Theorem 2 can be compared with the celebrated monotonicity of entropy property proved in  (see [16, 21, 10] for simpler proofs and [17, 18] for extensions), which says that for any random variable $X$ the sequence $h\big(\frac{X_1 + \dots + X_n}{\sqrt{n}}\big)$ is nondecreasing, where $X_1, X_2, \dots$ are its independent copies.
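For intuition (a numerical illustration of ours, not the paper's argument), this monotonicity can be observed for uniform summands by approximating densities on a grid and entropies by Riemann sums; numpy is assumed:

```python
import math
import numpy as np

# Grid-based densities; differential entropy via a Riemann sum.
L, M = 12.0, 48001
x = np.linspace(-L, L, M)
dx = x[1] - x[0]

def entropy(f):
    p = f[f > 1e-300]
    return float(-(p * np.log(p)).sum() * dx)

# X1 uniform on [-sqrt(3), sqrt(3)] (variance 1).
f1 = np.where(np.abs(x) <= math.sqrt(3), 1 / (2 * math.sqrt(3)), 0.0)

# Density of (X1 + X2)/sqrt(2): convolve, then rescale y -> sqrt(2)*y.
s = np.convolve(f1, f1) * dx                  # density of X1 + X2
xs = np.linspace(-2 * L, 2 * L, 2 * M - 1)    # its grid
f2 = math.sqrt(2) * np.interp(math.sqrt(2) * x, xs, s)

h1, h2 = entropy(f1), entropy(f2)
h_gauss = 0.5 * math.log(2 * math.pi * math.e)
print(h1, h2, h_gauss)  # ≈ 1.243 < 1.396 < 1.419, increasing toward the Gaussian
```

Here $h\big(\frac{X_1+X_2}{\sqrt{2}}\big)$ has the closed form $\frac{1}{2} + \log\sqrt{6}$ (scaled triangular density), which the grid computation reproduces.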
Theorem 2 also has consequences for the understanding of capacity-achieving distributions for additive noise channels with non-Gaussian noise.
Our theorem also provides an example of two independent symmetric log-concave random variables $X$ and $Y$ with the same variance such that $h(X + Y) > h(X + Z)$, where $Z$ is a Gaussian random variable with the same variance as $X$ and $Y$, independent of them, which is again in contrast to symmetric Gaussian mixtures (see ). It is an interesting question, already posed in , whether in general, for two i.i.d. summands, swapping one for a Gaussian with the same variance increases entropy.
Our third result is a short remark regarding a recent inequality by Hao and Jog (see their paper  for motivation and proper discussion): if $X = (X_1, \dots, X_n)$ is an unconditional random vector in $\mathbb{R}^n$, then $\frac{1}{n} h(X) \leq h\big( \frac{X_1 + \dots + X_n}{\sqrt{n}} \big)$. Recall that a random vector $X$ is called unconditional if for every choice of signs $\varepsilon_1, \dots, \varepsilon_n \in \{-1, +1\}$, the vector $(\varepsilon_1 X_1, \dots, \varepsilon_n X_n)$ has the same distribution as $X$. We remark that a complex analogue of this inequality also holds and the proof is essentially trivial thanks to the existence of complex matrices with determinant of modulus $n^{n/2}$ and all entries of modulus $1$.
Let $X = (X_1, \dots, X_n)$ be a random vector in $\mathbb{C}^n$ which is complex-unconditional, that is, for all complex numbers $\theta_1, \dots, \theta_n$ such that $|\theta_j| = 1$ for every $j$, the vector $(\theta_1 X_1, \dots, \theta_n X_n)$ has the same distribution as $X$. Then
\[
\frac{1}{n}\, h(X) \leq h\!\left( \frac{X_1 + \dots + X_n}{\sqrt{n}} \right).
\]
In the subsequent sections we present proofs of our theorems and provide some additional remarks.
2. Proof of Theorem 1
Let $\mathcal{F}$ be the set of all even log-concave probability density functions on $\mathbb{R}$. Define for $f \in \mathcal{F}$ the following functionals: the entropy,
\[
h(f) = -\int f \log f,
\]
and the $p$-th moment,
\[
m_p(f) = \int |x|^p f(x)\, dx.
\]
Our goal is to show that
\[
\inf_{f \in \mathcal{F}} \left( h(f) - \frac{1}{p} \log\big( (p+1)\, m_p(f) \big) \right) = \log 2.
\]
First we argue that it suffices to consider compactly supported densities. Let $\mathcal{F}_L$ be the set of all densities from $\mathcal{F}$ which are supported in the interval $[-L, L]$. Given $f \in \mathcal{F}$, by considering $f_L = f \mathbf{1}_{[-L,L]} / \int_{-L}^{L} f$, which is in $\mathcal{F}_L$, and checking that $h(f_L)$ and $m_p(f_L)$ tend to $h(f)$ and $m_p(f)$ as $L \to \infty$, we get
\[
\inf_{f \in \mathcal{F}} \left( h(f) - \frac{1}{p} \log\big( (p+1)\, m_p(f) \big) \right) = \inf_{L > 0}\, \inf_{f \in \mathcal{F}_L} \left( h(f) - \frac{1}{p} \log\big( (p+1)\, m_p(f) \big) \right).
\]
This last infimum can be further rewritten as
Consequently, to prove Theorem 1, it suffices to show that for every compactly supported $f \in \mathcal{F}$, we have
\[
h(f) \geq \frac{1}{p} \log\big( (p+1)\, m_p(f) \big) + \log 2.
\]
Degrees of freedom
We shall argue that the last infimum is attained at densities which on $[0, \infty)$ are first constant and then decrease exponentially. Fix positive numbers and and consider the set of densities . We treat as a subset of which is a locally convex Hausdorff space (later on, this will be needed to employ Krein-Milman type theorems).
Step I. We show that is finite and attained at a point from the set of the extremal points of .
Let us recall that a set $A$ is an extremal subset of $F$ (where $F$ is a subset of a vector space) if it is nonempty and if, whenever $\lambda x + (1-\lambda) y \in A$ for some $\lambda \in (0,1)$ and some elements $x$ and $y$ of $F$, both $x$ and $y$ are in $A$. Notice that this definition does not require the convexity of $A$. Moreover, $x$ is an extremal point of $F$ if $\{x\}$ is extremal. We remark that for a convex function, the set of points where its supremum is attained (if nonempty) is an extremal subset (for instance, see Lemma 7.64 in ).
An application of Zorn’s lemma together with a separation type theorem shows that every nonempty compact extremal subset of a locally convex Hausdorff vector space contains at least one extremal point (see Lemma 7.65 in ). Therefore, it remains to show
the set of the extremisers of entropy,
is nonempty and compact.
First we note a standard lemma.
For every even log-concave function , we have
By homogeneity we can assume that . Consider such that . By log-concavity, there is exactly one sign change point for . We have,
since the integrand is nonpositive. It remains to verify the lemma for , which holds with equality. ∎
To see (a), we observe that by Lemma 4 combined with the inequality , we get , which gives (a).
To see (b), let and take a sequence of functions from such that . To proceed we need another elementary lemma.
Let be a sequence of functions in . Then there exists a subsequence converging pointwise to a function in .
As noted above, the functions from are uniformly bounded (by Lemma 4) and thus, using a standard diagonal argument, by passing to a subsequence, we can assume that converges for every rational (in ), say to . Notice that is log-concave on the rationals, that is , for all rationals and such that is also a rational. Moreover, is even and nonincreasing on . Let . If , then pick any rational and observe that , so . The function is continuous on . If , consider rationals such that . Then, by monotonicity and log-concavity,
thus (these limits exist by the monotonicity of ). Now for any , take rationals and such that and . Since, , we get
therefore . Thus, is convergent, to say . We also set, say . Then converges to at all but two points , the function is even and log-concave. By Lebesgue’s dominated convergence theorem, . ∎
By the lemma, for some subsequence and . By the Lebesgue dominated convergence theorem, , so . To show that the set is compact, we repeat the same argument.
Step II. Every extremal point of the above set of densities has at most $2$ degrees of freedom.
Recall the notion of degrees of freedom of log-concave functions introduced in . The degree of freedom of a log-concave function $f$ is the largest integer $k$ such that there exist $\varepsilon > 0$ and linearly independent continuous functions $g_1, \dots, g_k$ defined on the support of $f$ such that for every $\delta = (\delta_1, \dots, \delta_k) \in (-\varepsilon, \varepsilon)^k$, the function $f + \delta_1 g_1 + \dots + \delta_k g_k$ is log-concave.
Suppose $f$ belongs to the set above and has more than two degrees of freedom. Then there are continuous functions $g_1, g_2, g_3$ (supported in ) and $\varepsilon > 0$ such that for all $\delta \in (-\varepsilon, \varepsilon)^3$ the function $f + \delta_1 g_1 + \delta_2 g_2 + \delta_3 g_3$ is log-concave. Note that the space of solutions $(\delta_1, \delta_2, \delta_3)$ to the system of equations
\[
\int \big( \delta_1 g_1 + \delta_2 g_2 + \delta_3 g_3 \big) = 0, \qquad \int |x|^p \big( \delta_1 g_1 + \delta_2 g_2 + \delta_3 g_3 \big)\, dx = 0
\]
is of dimension at least $1$. Therefore this space intersected with the cube $(-\varepsilon, \varepsilon)^3$ contains a symmetric interval and, in particular, two antipodal points $\delta$ and $-\delta$. Take $f_+ = f + \sum_i \delta_i g_i$ and $f_- = f - \sum_i \delta_i g_i$, which are both in . Then $f = \frac{1}{2}(f_+ + f_-)$, and therefore $f$ is not an extremal point.
Step III. Densities with at most $2$ degrees of freedom are simple.
We want to determine all nonincreasing log-concave functions on with degree of freedom at most $2$. Suppose are points of differentiability of the potential $V = -\log f$, such that . Define
We claim that is a log-concave non-increasing function for , with sufficiently small. To prove log-concavity we observe that on each interval the function is of the form . On the interval it is of the form . Log-concavity follows from Lemma 1 in . We also have to ensure that the density is nonincreasing. On it follows from the fact that
for small . On the other intervals we have similar expressions
which follows from the fact that for some .
From this it follows that if there are points , such that , then has degree of freedom at least $3$. It follows that the only function with degree of freedom at most $2$ is of the form
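Numerically, one can check the $p=2$ case of Theorem 1 on densities of this constant-then-exponential type, since they are symmetric, log-concave, and compactly supported. The sketch below is a sanity check of ours (the parametrization and normalization are our choices, not the paper's):

```python
import math

def entropy_and_variance(a, lam, b, m=200000):
    """Even density: c on [-a, a], c*exp(-lam*(|x| - a)) on a < |x| <= b.
    Returns (entropy, variance) via midpoint-rule integration on [0, b]."""
    # Normalization: 2 * c * (a + (1 - exp(-lam*(b - a))) / lam) = 1.
    c = 1.0 / (2 * (a + (1 - math.exp(-lam * (b - a))) / lam))
    h = var = 0.0
    dx = b / m
    for i in range(m):
        x = (i + 0.5) * dx
        f = c if x <= a else c * math.exp(-lam * (x - a))
        h -= 2 * f * math.log(f) * dx   # even density: double the half-line
        var += 2 * x * x * f * dx
    return h, var

# N(X) >= (6 / (pi e)) * Var(X) should hold, with equality only for uniform.
for (a, lam, b) in [(1.0, 1.0, 4.0), (0.3, 2.5, 3.0), (2.0, 0.4, 6.0)]:
    h, var = entropy_and_variance(a, lam, b)
    N = math.exp(2 * h) / (2 * math.pi * math.e)
    assert N >= (6 / (math.pi * math.e)) * var - 1e-6
```

The log of such a density is constant and then linear, hence concave, so these densities are indeed admissible test cases for the theorem.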
A two-point inequality
It remains to show that for every density of the form
where is a positive normalising constant and and are nonnegative, we have
with equality if and only if is uniform. If either or are zero, then is a uniform density and we directly check that there is equality. Therefore let us from now on assume that both and are positive and we shall prove the strict inequality. Since the left-hand side does not change when is replaced by for any positive , we shall assume that . Then the condition is equivalent to . We have
Putting these together yields
Therefore, the proof of Theorem 1 is complete once we show the following two-point inequality.
For nonnegative , positive and we have
Integrating by parts, we can rewrite the left hand side as for a Borel measure on (which is absolutely continuous on with density and has the atom ). With and fixed, this is a strictly convex function of (by Hölder’s inequality). The right hand side is linear as a function of . Therefore, it suffices to check the inequality for and . For the inequality becomes equality. For , after computing the integral and exponentiating both sides, the inequality becomes
where we put and , which are positive. We lower-bound the right hand side using the estimate , , by
Therefore it suffices to show that
After moving everything to one side, plugging in , , expanding and simplifying, it becomes
It suffices to prove that these functions are nonnegative for . This is clear for . For , we check that and
For , we check that , and
It follows that and are nonnegative for . ∎
If we put and in the inequality from Lemma 6, we get (in particular ). We suspect that this necessary condition is also sufficient for the inequality to hold for all positive and .
3. Proof of Theorem 2
Let us denote
and let be the density of and let be the density of . Since is assumed to be log-concave, it satisfies for all . According to the Edgeworth-type expansion described in  (Theorem 3.2 in Chapter 3), we have (with any )
Here the functions are given by
where are Hermite polynomials,
and the summation runs over all nonnegative integer solutions to the equation , and one uses the notation . The numbers are the cumulants of , namely
Let us calculate . Under our assumption (symmetry of and ), we have and . Therefore and
We get that for any
Let be the density of . Let us assume that it is of the form , where is even, smooth and compactly supported (say, supported in ) with bounded derivatives. Moreover, we assume that and that . Multiplying by a very small constant we can ensure that is log-concave.
We are going to use Theorem 1.3 from . To check the assumptions of this theorem, we first observe that for any we have
since has bounded support. We have to show that for sufficiently big $n$ there is
Since is symmetric, we can assume that . Then
where we have used the fact that , has a bounded support contained in and . We conclude that
(In this proof and denote sufficiently large and sufficiently small universal constants that may change from one line to another. On the other hand, , and denote constants that may depend on the distribution of .) Moreover, for we have
Let us define . Note that , where . We have
We first bound . Note that
Assuming without loss of generality that , we have
We also have
Moreover, assuming without loss of generality that ,