 # Rényi entropy and variance comparison for symmetric log-concave random variables

We show that for any $\alpha>0$ the Rényi entropy of order $\alpha$ is minimized, among all symmetric log-concave random variables with fixed variance, either by a uniform distribution or by a two-sided exponential distribution. The first case occurs for $\alpha\in(0,\alpha^*]$ and the second case for $\alpha\in[\alpha^*,\infty)$, where $\alpha^*$ satisfies the equation $\frac{1}{\alpha^*-1}\log\alpha^*=\frac12\log 6$, that is $\alpha^*\approx 1.241$.

## 1. Introduction

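For a random variable $X$ with density $f$ its Rényi entropy of order $\alpha\in(0,\infty)\setminus\{1\}$ is defined as

$$h_\alpha(X)=h_\alpha(f)=\frac{1}{1-\alpha}\log\left(\int f^\alpha(x)\,dx\right),$$

assuming that the integral converges. If $\alpha\to1$ one recovers the usual Shannon differential entropy $h(f)=h_1(f)=-\int f\log f$. Also, by taking limits one can define $h_0(f)=\log|\operatorname{supp}(f)|$, where $\operatorname{supp}(f)$ stands for the support of $f$, and $h_\infty(f)=-\log\|f\|_\infty$, where $\|f\|_\infty$ is the essential supremum of $f$.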
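The definition can be checked numerically on two densities that play a central role later. The following is a minimal sketch, assuming a plain midpoint rule on a bounded window; the names `renyi_entropy`, `uniform` and `laplace` are illustrative and do not come from the paper.

```python
import math

def renyi_entropy(density, alpha, lo, hi, steps=200_000):
    """Midpoint-rule approximation of log(integral of f^alpha) / (1 - alpha).

    Assumes alpha != 1 and that the density is negligible outside [lo, hi].
    """
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        total += density(x) ** alpha
    return math.log(total * width) / (1.0 - alpha)

def uniform(x):
    # Uniform density on [-1, 1].
    return 0.5 if abs(x) <= 1.0 else 0.0

def laplace(x):
    # Two-sided exponential density with unit rate.
    return 0.5 * math.exp(-abs(x))

if __name__ == "__main__":
    for alpha in (0.5, 2.0):
        print(alpha,
              renyi_entropy(uniform, alpha, -1.0, 1.0),
              renyi_entropy(laplace, alpha, -40.0, 40.0))
```

For the uniform density on $[-1,1]$ the value is $\log 2$ for every order, while for the two-sided exponential density a direct integration gives $\log 2+\frac{\log\alpha}{\alpha-1}$.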

It is a well-known fact that for any random variable $X$ one has

$$h(X)\le\frac12\log\operatorname{Var}(X)+\frac12\log(2\pi e)$$

with equality only for Gaussian random variables, see e.g. Theorem 8.6.5 in . The problem of maximizing Rényi entropy under fixed variance has been considered independently by Costa, Hero and Vignat in  and by Lutwak, Yang and Zhang in , where the authors showed, in particular, that for the maximizer is of the form

$$f(x)=c_0\left(1+(1-\alpha)(c_1x)^2\right)_+^{\frac{1}{\alpha-1}},$$

which will be called the generalized Gaussian density. Any density satisfying shows that for the supremum of $h_\alpha$ under fixed variance is infinite. One may also ask for reverse bounds. However, the infimum of the functional $h_\alpha$ under fixed variance is $-\infty$ as can be seen by considering for which the variance stays bounded whereas as . Therefore, it is reasonable to restrict the problem to a class of densities in which the Rényi entropy remains lower bounded in terms of the variance. In this context it is natural to consider the class of log-concave densities, namely densities having the form $f=e^{-V}$, where $V$ is convex. In  it was proved that for any symmetric log-concave random variable $X$ one has

$$h(X)\ge\frac12\log\operatorname{Var}(X)+\frac12\log12$$

with equality if and only if $X$ is a uniform random variable. In the present article we shall extend this result to general Rényi entropy. Namely, we shall prove the following theorem.

###### Theorem 1.

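Let $X$ be a symmetric log-concave random variable and $\alpha>0$, $\alpha\neq1$. Define $\alpha^*$ to be the unique solution to the equation $\frac{1}{\alpha-1}\log\alpha=\frac12\log6$ ($\alpha^*\approx1.241$). Then

$$h_\alpha(X)\ge\frac12\log\operatorname{Var}(X)+\frac12\log12\qquad\text{for }\alpha\le\alpha^*$$

and

$$h_\alpha(X)\ge\frac12\log\operatorname{Var}(X)+\frac12\log2+\frac{\log\alpha}{\alpha-1}\qquad\text{for }\alpha\ge\alpha^*.$$

For $\alpha<\alpha^*$ equality holds if and only if $X$ is a uniform random variable on a symmetric interval, while for $\alpha>\alpha^*$ the bound is attained only for the two-sided exponential distribution. When $\alpha=\alpha^*$, the two previously mentioned densities are the only cases of equality.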
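The threshold $\alpha^*$ is defined only implicitly, so a short numerical sketch may help. It assumes a plain bisection on the equation $\frac{\log\alpha}{\alpha-1}=\frac12\log6$ from the abstract; the function names and the bracketing interval $[1.01,2]$ are illustrative choices, not part of the paper.

```python
import math

def alpha_star(tol=1e-14):
    """Solve log(alpha) / (alpha - 1) = 0.5 * log(6) by bisection.

    The left-hand side is decreasing in alpha, exceeds the target at 1.01
    and is below it at 2, so the root is bracketed by [1.01, 2].
    """
    target = 0.5 * math.log(6.0)
    lo, hi = 1.01, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.log(mid) / (mid - 1.0) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def uniform_bound(variance):
    # Entropy of a uniform variable with the given variance (any order).
    return 0.5 * math.log(variance) + 0.5 * math.log(12.0)

def exponential_bound(variance, alpha):
    # Order-alpha entropy of a two-sided exponential variable with the given variance.
    return 0.5 * math.log(variance) + 0.5 * math.log(2.0) + math.log(alpha) / (alpha - 1.0)

if __name__ == "__main__":
    a = alpha_star()
    print(a, uniform_bound(1.0), exponential_bound(1.0, a))
```

At $\alpha=\alpha^*$ the two lower bounds coincide, which is why both extremal densities give equality there.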

The above theorem for trivially follows from the case as already observed in  (see Theorem 5 therein). This is due to the monotonicity of Rényi entropy in . As we can see the case of Theorem 1 is a strengthening of the main result of , as in this case and the right hand sides are the same.

This article is organized as follows. In Section 2 we reduce Theorem 1 to the case $\alpha=\alpha^*$. In Section 3 we further simplify the problem by reducing it to simple functions via the concept of degrees of freedom. Section 4 contains the proof for these simple functions. In the last section we derive two applications of our main result.

## 2. Reduction to the case α=α∗

The following lemma is well known. We present its proof for completeness. The proof of point (ii) is taken from . As pointed out by the authors, it can also be derived from Theorem 2 in  or from Theorem VII.2 in .

###### Lemma 2.

Suppose $f$ is a probability density in $\mathbb{R}^n$.

• (i) The function $p\mapsto\int f^p$ is log-convex on $(0,\infty)$.

• (ii) If $f$ is log-concave then the function $p\mapsto p^n\int f^p$ is log-concave on $(0,\infty)$.

###### Proof.

(i) This is a simple consequence of Hölder’s inequality.

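(ii) The function $f$ can be written as $f=e^{-V}$, where $V$ is convex. Changing variables we get $p^n\int f^p(x)\,dx=\int e^{-pV(z/p)}\,dz$. For any convex $V$ the so-called perspective function $W(z,p)=pV(z/p)$ is convex on $\mathbb{R}^n\times(0,\infty)$. Indeed, for $\lambda\in[0,1]$, $p_1,p_2>0$ and $z_1,z_2\in\mathbb{R}^n$ we have

$$\begin{aligned}W(\lambda z_1+(1-\lambda)z_2,\lambda p_1+(1-\lambda)p_2)&=(\lambda p_1+(1-\lambda)p_2)\,V\!\left(\frac{\lambda p_1\frac{z_1}{p_1}+(1-\lambda)p_2\frac{z_2}{p_2}}{\lambda p_1+(1-\lambda)p_2}\right)\\&\le\lambda p_1V\!\left(\frac{z_1}{p_1}\right)+(1-\lambda)p_2V\!\left(\frac{z_2}{p_2}\right)=\lambda W(z_1,p_1)+(1-\lambda)W(z_2,p_2).\end{aligned}$$

Since $p^n\int f^p=\int e^{-W(z,p)}\,dz$, the assertion follows from Prékopa's theorem from  saying that a marginal of a log-concave function is again log-concave. ∎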

###### Remark.

The use of the term perspective function appeared in , however the convexity of this function was known much earlier.

The next corollary is a simple consequence of Lemma 2. The right inequality of this corollary appeared in , whereas the left inequality is classical.

###### Corollary 3.

Let $f$ be a log-concave probability density in $\mathbb{R}^n$. Then for any $p>q>0$ we have

$$0\le h_q(f)-h_p(f)\le n\,\frac{\log q}{q-1}-n\,\frac{\log p}{p-1}.$$

In fact the first inequality is valid without the log-concavity assumption.

###### Proof.

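To prove the first inequality we observe that due to Lemma 2 the function $\phi_1$ defined by $\phi_1(p)=\log\int f^p$ is convex. From the monotonicity of slopes of $\phi_1$ we get that $\frac{\phi_1(q)-\phi_1(1)}{q-1}\le\frac{\phi_1(p)-\phi_1(1)}{p-1}$, which together with the fact that $\phi_1(1)=0$ gives $h_p(f)\le h_q(f)$.

Similarly, to prove the right inequality we note that $\phi_2(p)=\log\left(p^n\int f^p\right)$ is concave with $\phi_2(1)=0$. Thus $\frac{\phi_2(p)}{p-1}\le\frac{\phi_2(q)}{q-1}$ gives $n\,\frac{\log p}{p-1}-h_p(f)\le n\,\frac{\log q}{q-1}-h_q(f)$, which finishes the proof. ∎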
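Corollary 3 can be sanity-checked on a density whose Rényi entropy is available in closed form. The sketch below uses the one-dimensional Gaussian, for which $h_\alpha=\frac12\log(2\pi\sigma^2)+\frac12\frac{\log\alpha}{\alpha-1}$; this choice of test density and the function names are assumptions made for illustration only.

```python
import math

def gaussian_renyi_entropy(alpha, sigma=1.0):
    """Closed-form Renyi entropy of order alpha != 1 for N(0, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * math.log(alpha) / (alpha - 1.0)

def corollary3_gap_and_bound(q, p, n=1):
    """Return (h_q - h_p, n log q/(q-1) - n log p/(p-1)) for the Gaussian, q < p."""
    gap = gaussian_renyi_entropy(q) - gaussian_renyi_entropy(p)
    bound = n * math.log(q) / (q - 1.0) - n * math.log(p) / (p - 1.0)
    return gap, bound

if __name__ == "__main__":
    for q, p in [(0.2, 0.9), (0.5, 2.0), (1.5, 3.0)]:
        gap, bound = corollary3_gap_and_bound(q, p)
        print(q, p, gap, bound, 0.0 <= gap <= bound)
```

For the Gaussian the gap equals exactly half of the upper bound, so both inequalities hold with room to spare; equality in the right inequality is reached by the two-sided exponential density instead.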

Having Corollary 3 we can easily reduce Theorem 1 to the case $\alpha=\alpha^*$. Indeed, the case $\alpha<\alpha^*$ follows from the left inequality of Corollary 3 ($h_\alpha$ is non-increasing in $\alpha$). The case $\alpha>\alpha^*$ is a consequence of the right inequality of the above corollary, according to which the quantity $h_\alpha-\frac{\log\alpha}{\alpha-1}$ is non-decreasing in $\alpha$.

## 3. Reduction to simple functions via degrees of freedom

The content of this section is a rather straightforward adaptation of the method from . Therefore, we shall only sketch the arguments.

By a standard approximation argument it is enough to prove our inequality for functions from the set $\mathcal{F}_L$ of all continuous even log-concave probability densities supported on $[-L,L]$. Thus, it suffices to show that

$$\inf\left\{h_{\alpha^*}(f):\ f\in\mathcal{F}_L,\ \operatorname{Var}(f)=\sigma^2\right\}\ge\log\sigma+\frac12\log2+\frac{\log\alpha^*}{\alpha^*-1}.\tag{1}$$

Take . We shall show that is attained on . Equivalently, since it suffices to show that is attained on

. We first argue that this supremum is finite. This follows from the estimate

and from the inequality , see Lemma 1 in . Next, let be a sequence of functions from such that . According to Lemma 2 from , by passing to a subsequence one can assume that pointwise, where is some function from . Since , by the Lebesgue dominated convergence theorem we get that and therefore the supremum is attained on .

Now, we say that is an extremal point in if cannot be written as a convex combination of two different functions from , that is, if for some and , then necessarily . It is easy to observe that if is not extremal, then it cannot be a maximizer of on . Indeed, if for some and with , then the strict convexity of implies

$$\int f^{\alpha^*}=\int\left(\lambda f_1+(1-\lambda)f_2\right)^{\alpha^*}<\lambda\int f_1^{\alpha^*}+(1-\lambda)\int f_2^{\alpha^*}\le M.$$

This shows that in order to prove (1) it suffices to consider only the functions being extremal points of . Finally, according to Steps III and IV of the proof of Theorem 1 from  these extremal points are of the form

$$f(x)=c\,\mathbf{1}_{[0,a]}(|x|)+c\,e^{-\gamma(|x|-a)}\mathbf{1}_{[a,a+b]}(|x|),\qquad a+b=L,\ c>0,\ a,b,\gamma\ge0,$$

where it is also assumed that .

## 4. Proof for the case α=α∗

Due to the previous section, we can restrict ourselves to probability densities of the form

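$$f(x)=c\,\mathbf{1}_{[0,a]}(|x|)+c\,e^{-\gamma(|x|-a)}\mathbf{1}_{[a,a+b]}(|x|),\qquad a,b,\gamma\ge0.$$

The inequality is invariant under the scaling $f(x)\mapsto\lambda f(\lambda x)$ for any positive $\lambda$, so we can assume that $\gamma=1$ (note that in the case $\gamma=0$ we get equality). We have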

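$$\int_{\mathbb{R}}f^\alpha=c^\alpha\int_{\mathbb{R}}\mathbf{1}_{[0,a]}(|x|)\,dx+c^\alpha\int_{\mathbb{R}}e^{-\alpha|x|}\mathbf{1}_{[0,b]}(|x|)\,dx=2c^\alpha\left(a+\frac{1-e^{-\alpha b}}{\alpha}\right)$$

and thus

$$h_\alpha(f)=\frac{1}{1-\alpha}\log\int_{\mathbb{R}}f^\alpha=\frac{1}{1-\alpha}\log\left(2c^\alpha\left(a+\frac{1-e^{-\alpha b}}{\alpha}\right)\right).$$

Moreover,

$$\operatorname{Var}(f)=2c\int_{\mathbb{R}}x^2\mathbf{1}_{[0,a]}(x)\,dx+2c\int_{\mathbb{R}}(x+a)^2e^{-x}\mathbf{1}_{[0,b]}(x)\,dx=2c\left(\frac{a^3}{3}+\int_0^b(x+a)^2e^{-x}\,dx\right),$$

so our inequality can be rewritten as

$$\frac{1}{1-\alpha^*}\log\left(2c^{\alpha^*}\left(a+\frac{1-e^{-\alpha^*b}}{\alpha^*}\right)\right)+\frac{\log\alpha^*}{1-\alpha^*}\ge\frac12\log\left(2c\left(\frac{a^3}{3}+\int_0^b(x+a)^2e^{-x}\,dx\right)\right)+\frac12\log2,$$

which is

$$\frac{1}{1-\alpha^*}\log\left(2c^{\alpha^*}\left(a\alpha^*+1-e^{-\alpha^*b}\right)\right)\ge\frac12\log\left(2c\left(\frac{a^3}{3}+\int_0^b(x+a)^2e^{-x}\,dx\right)\right)+\frac12\log2.$$

The constraint $\int f=1$ gives $c=\frac{1}{2(a+1-e^{-b})}$. After multiplying both sides by $2$, exponentiating both sides and plugging the expression for $c$ in, we get the equivalent form of the inequality, $G(a,b,\alpha^*)\ge0$, where

$$G(a,b,\alpha)=2\left(a\alpha+1-e^{-\alpha b}\right)^{\frac{2}{1-\alpha}}\left(a+1-e^{-b}\right)^{\frac{1-3\alpha}{1-\alpha}}-\left(\frac{a^3}{3}+\int_0^b(x+a)^2e^{-x}\,dx\right).\tag{2}$$

We will also write $G(a,b)=G(a,b,\alpha^*)$.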
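The function in (2) is easy to evaluate numerically, which gives a quick plausibility check of the claimed nonnegativity before going through the proof. The sketch below is an illustration under stated assumptions: `alpha_star` is a hypothetical bisection helper, the integral is replaced by its elementary antiderivative $-e^{-x}\left((x+a)^2+2(x+a)+2\right)$, and the grid is an arbitrary choice that avoids the degenerate point $a=b=0$.

```python
import math

def alpha_star(tol=1e-14):
    # Bisection for log(alpha)/(alpha-1) = 0.5*log(6); hypothetical helper.
    target, lo, hi = 0.5 * math.log(6.0), 1.01, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.log(mid) / (mid - 1.0) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_integral(a, b):
    # Closed form of the integral of (x + a)^2 * exp(-x) over [0, b].
    return (a * a + 2 * a + 2) - math.exp(-b) * ((a + b) ** 2 + 2 * (a + b) + 2)

def G(a, b, alpha):
    # Direct transcription of formula (2); needs a + b > 0.
    A = a * alpha + 1.0 - math.exp(-alpha * b)
    B = a + 1.0 - math.exp(-b)
    power_term = 2.0 * A ** (2.0 / (1.0 - alpha)) * B ** ((1.0 - 3.0 * alpha) / (1.0 - alpha))
    return power_term - (a ** 3 / 3.0 + tail_integral(a, b))

def grid_minimum(alpha, a_values, b_values):
    return min(G(a, b, alpha) for a in a_values for b in b_values)

if __name__ == "__main__":
    a_star = alpha_star()
    a_grid = [0.25 * k for k in range(0, 21)]   # a in [0, 5]
    b_grid = [0.1 * k for k in range(1, 101)]   # b in [0.1, 10]
    print(grid_minimum(a_star, a_grid, b_grid))
    print(G(2.0, 0.0, a_star))  # uniform density, expected to be ~0
```

On such a grid the minimum stays nonnegative up to rounding, and the value at $b=0$ (the uniform density) vanishes, in line with the equality cases of Theorem 1. A grid check of this kind is of course no substitute for the proof.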

To finish the proof we shall need the following lemma.

###### Lemma 4.

The following holds:

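• (a) $\frac{\partial^4}{\partial a^4}G(a,b)\ge0$ holds for every $a,b\ge0$,

• (b) $\lim_{a\to\infty}\frac{\partial^3}{\partial a^3}G(a,b)=0$ for every $b\ge0$,

• (c) $\lim_{a\to\infty}\frac{\partial^2}{\partial a^2}G(a,b)\ge0$ for every $b\ge0$,

• (d) $\frac{\partial}{\partial a}G(a,b)\big|_{a=0}\ge0$ for every $b\ge0$,

• (e) $G(0,b)\ge0$ for every $b\ge0$.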

With these claims at hand it is easy to conclude the proof. Indeed, one easily gets, one by one,

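$$\frac{\partial^3}{\partial a^3}G(a,b)\le0,\quad\frac{\partial^2}{\partial a^2}G(a,b)\ge0,\quad\frac{\partial}{\partial a}G(a,b)\ge0,\quad G(a,b)\ge0,\qquad b\ge0.$$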

The proof of points (d) and (e) relies on the following simple lemma.

###### Lemma 5.

Let $f(x)=\sum_{n=0}^\infty a_nx^n$, where the series is convergent for every nonnegative $x$. If there exists a nonnegative integer $N$ such that $a_n\ge0$ for $n<N$ and $a_n\le0$ for $n\ge N$, then $f$ changes sign on $(0,\infty)$ at most once. Moreover, if at least one coefficient $a_n$ is positive and at least one negative, then there exists $x_0>0$ such that $f>0$ on $(0,x_0)$ and $f<0$ on $(x_0,\infty)$.

###### Proof.

Clearly the function $f(x)x^{-N}$ is nonincreasing on $(0,\infty)$, so the first claim follows. To prove the second part we observe that for small $x$ the function $f$ must be strictly positive and $f(x)x^{-N}$ is strictly decreasing on $(0,\infty)$. ∎
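The mechanism behind Lemma 5 can be seen on a small example. The sketch below assumes a hypothetical polynomial $1+2x-x^3-3x^4$, whose coefficients are nonnegative below index $N=3$ and nonpositive from there on, and checks on a grid that it changes sign once and that $f(x)x^{-N}$ decreases. The polynomial, the grid and the function names are illustrative and not taken from the paper.

```python
def evaluate(coeffs, x):
    """Evaluate sum(coeffs[n] * x**n) with Horner's scheme."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def count_sign_changes(coeffs, xs):
    """Count sign changes of the polynomial along the increasing grid xs."""
    values = [evaluate(coeffs, x) for x in xs]
    signs = [v > 0 for v in values if v != 0.0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def scaled_values(coeffs, xs, N):
    """Values of f(x) / x**N, the quantity used in the proof of Lemma 5."""
    return [evaluate(coeffs, x) / x ** N for x in xs]

if __name__ == "__main__":
    coeffs = [1.0, 2.0, 0.0, -1.0, -3.0]        # a_0..a_4, sign switch at N = 3
    grid = [0.01 * k for k in range(1, 501)]    # x in (0, 5]
    print(count_sign_changes(coeffs, grid))
    ratios = scaled_values(coeffs, grid, 3)
    print(all(r >= s for r, s in zip(ratios, ratios[1:])))
```

The same check applies to any truncated power series with this coefficient pattern, which is how the lemma is used in the proofs of points (d) and (e).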

With this preparation we are ready to prove Lemma 4.

###### Proof of Lemma 4.

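(a) This point is the crucial observation of the proof. It turns out that

$$\frac{\partial^4G}{\partial a^4}(a,b,\alpha)=8\alpha(\alpha+1)(3\alpha-1)\left(1+a-e^{-b}\right)^{\frac{3\alpha-1}{\alpha-1}}\left(1+a\alpha-e^{-b\alpha}\right)^{\frac{2}{1-\alpha}}\times\left(\frac{e^b-\alpha e^{b\alpha}+(\alpha-1)e^{b+b\alpha}}{(\alpha-1)\left(e^b(a+1)-1\right)\left(e^{b\alpha}(a\alpha+1)-1\right)}\right)^4,$$

which is nonnegative whenever $\alpha\ge\frac13$, in particular for $\alpha=\alpha^*$.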

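(b) By a direct computation we have

$$\begin{aligned}\frac{\partial^3G(a,b,\alpha)}{\partial a^3}={}&-2-\frac{4\alpha}{(1-\alpha)^3}\left(1+a-e^{-b}\right)^{\frac{2}{\alpha-1}}\left(1+a\alpha-e^{-b\alpha}\right)^{\frac{1-3\alpha}{\alpha-1}}\\&\times\Big[(\alpha+1)(3\alpha-1)\left(1+a\alpha-e^{-b\alpha}\right)^3-2\alpha^3(\alpha+1)\left(1+a-e^{-b}\right)^3\\&\quad+3\alpha(\alpha+1)(3\alpha-1)\left(1+a-e^{-b}\right)^2\left(1+a\alpha-e^{-b\alpha}\right)\\&\quad+6\alpha(1-3\alpha)\left(1+a-e^{-b}\right)\left(1+a\alpha-e^{-b\alpha}\right)^2\Big].\end{aligned}$$

When $a$ tends to infinity with $b$ fixed this converges to

$$-2-\frac{4\alpha}{(1-\alpha)^3}\,\alpha^{\frac{1-3\alpha}{\alpha-1}}\left((\alpha+1)(3\alpha-1)\alpha^3-2\alpha^3(\alpha+1)+3\alpha^2(\alpha+1)(3\alpha-1)+6\alpha^3(1-3\alpha)\right),$$

which is $12\alpha^{\frac{2}{1-\alpha}}-2$, since the expression in the last parentheses equals $3\alpha^2(\alpha-1)^3$. If $\alpha=\alpha^*$, using the equality $(\alpha^*)^{\frac{1}{\alpha^*-1}}=\sqrt6$, we get that this expression is equal to $0$.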

(c) Again a direct computation yields

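$$\begin{aligned}\frac{\partial^2G(a,b,\alpha)}{\partial a^2}={}&\frac{4\alpha^2(\alpha+1)\left(a-e^{-b}+1\right)\left(1-e^{-b}a^{-1}+a^{-1}\right)^{\frac{2\alpha}{\alpha-1}}\left(\alpha-e^{-\alpha b}a^{-1}+a^{-1}\right)^{-\frac{2\alpha}{\alpha-1}}}{(\alpha-1)^2}\\&+\frac{8\alpha(1-3\alpha)\left(a-e^{-b}+1\right)\left(1-e^{-b}a^{-1}+a^{-1}\right)^{\frac{\alpha+1}{\alpha-1}}\left(\alpha-e^{-\alpha b}a^{-1}+a^{-1}\right)^{-\frac{\alpha+1}{\alpha-1}}}{(\alpha-1)^2}\\&+\frac{4\alpha(3\alpha-1)\left(a-e^{-b}+1\right)\left(1-e^{-b}a^{-1}+a^{-1}\right)^{\frac{2}{\alpha-1}}\left(\alpha-e^{-\alpha b}a^{-1}+a^{-1}\right)^{-\frac{2}{\alpha-1}}}{(\alpha-1)^2}\\&-2a+2e^{-b}-2.\end{aligned}$$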

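As $a$ tends to infinity, we have

$$\left(1-e^{-b}a^{-1}+a^{-1}\right)^w=1+w\left(1-e^{-b}\right)a^{-1}+o(a^{-1})$$

and

$$\left(\alpha-e^{-\alpha b}a^{-1}+a^{-1}\right)^w=\alpha^w+w\left(1-e^{-\alpha b}\right)\alpha^{w-1}a^{-1}+o(a^{-1}).$$

Using these formulas together with the above expression for the second derivative easily gives

$$\frac{\partial^2G(a,b,\alpha)}{\partial a^2}=h_1(\alpha)\,a+h_2(b,\alpha)+o(1),$$

where

$$h_1(\alpha)=12\alpha^{-\frac{2}{\alpha-1}}-2$$

and

$$h_2(b,\alpha)=2\left(e^{-b}-1\right)+\frac{4\alpha\left(\alpha^{\frac{1}{1-\alpha}}-\alpha^{\frac{\alpha}{1-\alpha}}\right)^2}{(\alpha-1)^3}\left(2\left(\alpha-1-\alpha e^{-b}+e^{-b\alpha}\right)+3\left(1-e^{-b}\right)\alpha(\alpha-1)\right).$$

We have $h_1(\alpha^*)=0$. Moreover,

$$\frac{4\alpha^*\left((\alpha^*)^{\frac{1}{1-\alpha^*}}-(\alpha^*)^{\frac{\alpha^*}{1-\alpha^*}}\right)^2}{(\alpha^*-1)^3}=\frac{4\alpha^*\left(\frac{1}{\sqrt6}-\frac{1}{\sqrt6\,\alpha^*}\right)^2}{(\alpha^*-1)^3}=\frac{2}{3\alpha^*(\alpha^*-1)}.$$

Hence,

$$\lim_{a\to\infty}\frac{\partial^2G(a,b,\alpha^*)}{\partial a^2}=h_2(b,\alpha^*)=\frac{4}{3\alpha^*(\alpha^*-1)}\left(\left(1-e^{-b}\right)\alpha^*-\left(1-e^{-b\alpha^*}\right)\right).$$

This expression is nonnegative for $b\ge0$ since the function $g(x)=1-e^{-bx}$ is concave, so we have $\frac{g(\alpha^*)}{\alpha^*}\le g(1)$ as $g(0)=0$ and $\alpha^*>1$ (monotonicity of slopes).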

(e) To illustrate our method, before proceeding with the proof of (d) we shall prove (e), as the idea of the proof of (d) is similar, but the details are more complicated. Our goal is to show the inequality

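$$\left(1-e^{-\alpha^*b}\right)^{\frac{2}{1-\alpha^*}}\left(1-e^{-b}\right)^{\frac{1-3\alpha^*}{1-\alpha^*}}\ge1-\frac{b^2+2b+2}{2}e^{-b}.\tag{3}$$

After taking the logarithm of both sides our inequality reduces to nonnegativity of

$$\phi(b)=\frac{2}{1-\alpha^*}\log\left(1-e^{-\alpha^*b}\right)+\frac{1-3\alpha^*}{1-\alpha^*}\log\left(1-e^{-b}\right)-\log\left(1-\frac{b^2+2b+2}{2}e^{-b}\right).$$

We have

$$\phi'(b)=\frac{2\alpha^*}{(1-\alpha^*)\left(e^{\alpha^*b}-1\right)}+\frac{1-3\alpha^*}{(1-\alpha^*)\left(e^b-1\right)}+\frac{b^2}{b^2+2b-2e^b+2}.$$

It turns out that $\phi'$ changes sign on $(0,\infty)$ at most once. To show that, first clear the denominators (they have fixed sign on $(0,\infty)$) to obtain the expression

$$2\alpha^*\left(b^2+2b-2e^b+2\right)\left(e^b-1\right)+(1-3\alpha^*)\left(e^{\alpha^*b}-1\right)\left(b^2+2b-2e^b+2\right)+b^2(1-\alpha^*)\left(e^b-1\right)\left(e^{\alpha^*b}-1\right).\tag{4}$$

Now we will apply Lemma 5 to (4). That expression can be rewritten as

$$-4\alpha^*\left(\sum_{n=3}^\infty\frac{b^n}{n!}\right)\left(\sum_{n=1}^\infty\frac{b^n}{n!}\right)+(6\alpha^*-2)\left(\sum_{n=1}^\infty\frac{(\alpha^*b)^n}{n!}\right)\left(\sum_{n=3}^\infty\frac{b^n}{n!}\right)+b^2(1-\alpha^*)\left(\sum_{n=1}^\infty\frac{b^n}{n!}\right)\left(\sum_{n=1}^\infty\frac{(\alpha^*b)^n}{n!}\right),$$

so the $n$-th coefficient in the Taylor expansion is equal to

$$\begin{aligned}a_n&=(6\alpha^*-2)\left(\sum_{j=1}^{n-3}\frac{(\alpha^*)^j}{j!(n-j)!}\right)-4\alpha^*\left(\sum_{j=1}^{n-3}\frac{1}{j!(n-j)!}\right)+(1-\alpha^*)\left(\sum_{j=1}^{n-3}\frac{(\alpha^*)^j}{j!(n-2-j)!}\right)\\&\le\frac{1}{n!}(6\alpha^*-2)(\alpha^*+1)^n+\frac{1-\alpha^*}{(n-2)!}\left((\alpha^*+1)^{n-2}-1-(\alpha^*)^{n-2}\right)\\&\le\frac{6}{n!}(\alpha^*+1)^n-\frac{n(n-1)}{30\,n!}(\alpha^*+1)^n+\frac{8n^2}{n!}(\alpha^*)^n.\end{aligned}$$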

When , we have and , so is less than zero for . It can be checked (preferably using computational software) that the rest of coefficients satisfy the pattern from Lemma 5, with for , for and for .
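The coefficient check mentioned above can be reproduced with a few lines of code. The following is a minimal sketch, assuming the three finite sums in the formula for $a_n$ are evaluated directly in floating point; `alpha_star`, `taylor_coefficient` and `first_negative_index` are hypothetical helper names, and the printed index is only as reliable as this floating-point evaluation.

```python
import math

def alpha_star(tol=1e-14):
    # Bisection for log(alpha)/(alpha-1) = 0.5*log(6); hypothetical helper.
    target, lo, hi = 0.5 * math.log(6.0), 1.01, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.log(mid) / (mid - 1.0) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def taylor_coefficient(n, alpha):
    """n-th Taylor coefficient of expression (4), using the three sums over j = 1..n-3."""
    js = range(1, n - 2)
    s1 = sum(alpha ** j / (math.factorial(j) * math.factorial(n - j)) for j in js)
    s2 = sum(1.0 / (math.factorial(j) * math.factorial(n - j)) for j in js)
    s3 = sum(alpha ** j / (math.factorial(j) * math.factorial(n - 2 - j)) for j in js)
    return (6.0 * alpha - 2.0) * s1 - 4.0 * alpha * s2 + (1.0 - alpha) * s3

def first_negative_index(alpha, n_max=60):
    """Smallest n >= 5 with a negative coefficient, or None if there is none up to n_max."""
    for n in range(5, n_max + 1):
        if taylor_coefficient(n, alpha) < 0.0:
            return n
    return None

if __name__ == "__main__":
    a = alpha_star()
    for n in range(4, 21):
        print(n, taylor_coefficient(n, a))
    print("first negative index:", first_negative_index(a))
```

Two small cases can be verified by hand from the sums: the coefficient of $b^4$ vanishes for every $\alpha$, and the coefficient of $b^5$ equals $\frac{\alpha(\alpha-1)}{12}$, which is positive for $\alpha>1$.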

This way we have proved that $\phi'$ changes sign in exactly one point $b_0\in(0,\infty)$. Thus, $\phi$ is first increasing and then decreasing. Since $\lim_{b\to0^+}\phi(b)=0$ and $\lim_{b\to\infty}\phi(b)=0$, the assertion follows.

(d) We have to show that

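$$\frac{\left(1-e^{-b}\right)^{\frac{2\alpha^*}{\alpha^*-1}}\left(1-e^{-b\alpha^*}\right)^{-\frac{1+\alpha^*}{\alpha^*-1}}}{\alpha^*-1}\left[(3\alpha^*-1)\left(1-e^{-b\alpha^*}\right)-2\alpha^*\left(1-e^{-b}\right)\right]\ge1-(b+1)e^{-b}.$$

Let $\varphi_1(b)$ be the expression on the left side and $\varphi_2(b)$ the expression on the right side. Both $\varphi_1$ and $\varphi_2$ are positive for $b>0$, so we can take the logarithm of both sides. We will now show that $(\log\varphi_1)'-(\log\varphi_2)'$ changes sign at most once on $(0,\infty)$. We have

$$\begin{aligned}(\log\varphi_1)'-(\log\varphi_2)'={}&\frac{2\alpha^*}{\left(e^b-1\right)(\alpha^*-1)}-\frac{(\alpha^*+1)\alpha^*}{(\alpha^*-1)\left(e^{b\alpha^*}-1\right)}\\&+\frac{\alpha^*(3\alpha^*-1)e^b-2\alpha^*e^{b\alpha^*}}{e^b(1-3\alpha^*)+2\alpha^*e^{b\alpha^*}+(\alpha^*-1)e^{b\alpha^*+b}}-\frac{b}{e^b-b-1}.\end{aligned}$$

Multiplying the above expression by the product of the denominators does not change its sign, since each of the denominators is positive. After this multiplication we get the expression

$$\begin{aligned}&\left[-\left(e^b-1\right)\left(e^b-1-b\right)(\alpha^*+1)\alpha^*+2\left(e^b-1-b\right)\alpha^*\left(e^{b\alpha^*}-1\right)-b\left(e^b-1\right)(\alpha^*-1)\left(e^{b\alpha^*}-1\right)\right]\\&\quad\times\left(e^b(1-3\alpha^*)+2\alpha^*e^{b\alpha^*}+(\alpha^*-1)e^{b\alpha^*+b}\right)\\&+\alpha^*(\alpha^*-1)\left(e^b-1\right)\left(e^b-1-b\right)\left(e^{b\alpha^*}-1\right)\left(e^b(3\alpha^*-1)-2e^{b\alpha^*}\right).\end{aligned}$$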

Let us consider the Taylor series of this function (it is clear that the series converges to the function everywhere). It can be shown (again using computational software) that coefficients of this series up to order are nonnegative and coefficients of order greater than , but lesser than are negative. Now we will show negativity of coefficients of order at least (our bound will be very crude, so it would not work, if we replaced with lower number). Firstly we note that

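$$e^b(1-3\alpha^*)+2\alpha^*e^{b\alpha^*}+(\alpha^*-1)e^{b\alpha^*+b}$$

has $n$-th Taylor coefficient equal to

$$\frac{1-3\alpha^*+2(\alpha^*)^{n+1}+(\alpha^*-1)(\alpha^*+1)^n}{n!}\ge\frac{1-3\alpha^*+2\alpha^*+\alpha^*-1}{n!}=0,$$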

so all its coefficients are nonnegative. Thus we can change expression in square brackets to (we discard the first term and bound from above the second and third one) to increase every Taylor coefficient of main expression. Now we want to show the negativity of coefficients of order at least for

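$$\left(e^b-1\right)\left(e^{b\alpha^*}-1\right)\left[\left(\frac52-\frac b5\right)\left(e^b(1-3\alpha^*)+2\alpha^*e^{b\alpha^*}+(\alpha^*-1)e^{b\alpha^*+b}\right)+\alpha^*(\alpha^*-1)\left(e^b-b-1\right)\left((3\alpha^*-1)e^b-2e^{b\alpha^*}\right)\right]$$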

The expression in square brackets has -th Taylor coefficient equal to zero for , while for it is

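$$\begin{aligned}c_n={}&\frac{5(1-3\alpha^*)}{2\,n!}+\frac{3\alpha^*-1}{5(n-1)!}+\frac{5(\alpha^*)^{n+1}}{n!}-\frac{2(\alpha^*)^n}{5(n-1)!}+\frac{5(\alpha^*-1)(\alpha^*+1)^n}{2\,n!}-\frac{(\alpha^*-1)(\alpha^*+1)^{n-1}}{5(n-1)!}\\&+\alpha^*(\alpha^*-1)(3\alpha^*-1)\frac{2^n-n-1}{n!}-\frac{2\alpha^*(\alpha^*-1)}{n!}\left((\alpha^*+1)^n-(\alpha^*)^n-n(\alpha^*)^{n-1}\right).\end{aligned}$$

Using the bounds

$$\frac{5(1-3\alpha^*)}{2\,n!}\le0,\qquad-\frac{2(\alpha^*)^n}{5(n-1)!}\le0,\qquad\alpha^*(\alpha^*-1)(3\alpha^*-1)\frac{2^n-n-1}{n!}\le\frac{2^n}{n!}$$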

and

 2α∗(α∗−1)n!((α∗)n+n(α∗)n−1)≤(n+1)(α∗)nn!,−(α∗−1)(α∗+1)n−15(n−1)!≤−45n10n!(α∗−