
# The entropic barrier is n-self-concordant

For any convex body K ⊆ℝ^n, S. Bubeck and R. Eldan introduced the entropic barrier on K and showed that it is a (1+o(1)) n-self-concordant barrier. In this note, we observe that the optimal bound of n on the self-concordance parameter holds as a consequence of the dimensional Brascamp-Lieb inequality.


## 1 Introduction

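Let $K \subseteq \mathbb{R}^n$ be a convex body. In [bubeckeldan2019entropic], S. Bubeck and R. Eldan introduced the entropic barrier $f^\star$, defined as follows. First, let $f$ denote the logarithmic Laplace transform of the uniform measure on $K$,

$$f(\theta) := \ln \int_K \exp\langle\theta, x\rangle \, \mathrm{d}x. \tag{1}$$

Then, define $f^\star$ to be the Fenchel conjugate of $f$,

$$f^\star(x) := \sup_{\theta \in \mathbb{R}^n} \{\langle\theta, x\rangle - f(\theta)\}.$$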

They proved the following result.

###### Theorem 1 ([bubeckeldan2019entropic, Theorem 1]).

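The function $f^\star$ is strictly convex on $\operatorname{int} K$. Also, the following statements hold.

1. $f^\star$ is self-concordant, i.e.

 $$\nabla^3 f^\star(x)[h,h,h] \le 2\,\lvert\langle h, \nabla^2 f^\star(x)\, h\rangle\rvert^{3/2}, \qquad \text{for all } x \in \operatorname{int} K,\ h \in \mathbb{R}^n.$$
2. $f^\star$ is a $\nu$-self-concordant barrier, i.e.

 $$\nabla^2 f^\star(x) \succeq \frac{1}{\nu}\, \nabla f^\star(x)\, \nabla f^\star(x)^{\mathsf{T}}, \qquad \text{for all } x \in \operatorname{int} K,$$

 with $\nu = (1+o(1))\,n$.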

Self-concordant barriers are most well-known for their prominent role in the theory of interior-point methods for optimization [nesterov1995interiorpoint], but they also find applications to numerous other problems such as online linear optimization with bandit feedback [abernethy2008banditlinear] (indeed, the latter was a motivating example for the introduction of the entropic barrier in [bubeckeldan2019entropic]).

A central theoretical question in the study of self-concordant barriers is: for any convex domain $K \subseteq \mathbb{R}^n$, does there exist a $\nu$-self-concordant barrier for $K$, and if so, what is the optimal value of the parameter $\nu$? In their seminal work [nesterov1995interiorpoint], Y. Nesterov and A. Nemirovskii constructed for each $K$ a universal barrier with $\nu = O(n)$. On the other hand, explicit examples (e.g. the simplex and the cube) show that the best possible self-concordance parameter is $n$ [nesterov1995interiorpoint, Proposition 2.3.6]. The situation was better understood for convex cones, on which the canonical barrier was shown to be $n$-self-concordant independently by R. Hildebrand and D. Fox [hildebrand2014canonicalbarrier, fox2015canonical]. Then, in [bubeckeldan2019entropic], S. Bubeck and R. Eldan introduced the entropic barrier and showed that it is $(1+o(1))\,n$-self-concordant on general convex bodies, and $n$-self-concordant on convex cones; further, they showed that the universal barrier is also $n$-self-concordant on convex cones. Subsequently, Y. Lee and M. Yue settled the question of obtaining optimal self-concordant barriers for general convex bodies by proving that the universal barrier is always $n$-self-concordant [leeyue2021universalbarrier].

The purpose of this note is to describe the following observation.

###### Theorem 2.

The entropic barrier on any convex body $K \subseteq \mathbb{R}^n$ is an $n$-self-concordant barrier.

Besides improving the result of [bubeckeldan2019entropic], the theorem shows that the entropic barrier provides a second example of an optimal self-concordant barrier for general convex bodies; to the best of the author’s knowledge, no other optimal self-concordant barriers are known.

We will provide two distinct proofs of Theorem 2. First, we will observe that Theorem 2 is an immediate consequence of the following theorem, which was obtained independently in [nguyen2014dimensionalvariance, wang2014heatcapacity]; see also [fradelizimadimanwang2016infocontent].

###### Theorem 3.

Let $\mu \propto \exp(-V)$ be a log-concave density on $\mathbb{R}^n$. Then,

$$\operatorname{var}_\mu V \le n.$$

In turn, as discussed in [nguyen2014dimensionalvariance, bolleygentilguillin2018brascamplieb], Theorem 3 is related to certain dimensional improvements of the Brascamp-Lieb inequality. We state a version of this inequality which is convenient for the present discussion.

###### Theorem 4 ([bolleygentilguillin2018brascamplieb, Proposition 4.1]).

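Let $\mu \propto \exp(-V)$ be a log-concave density on $\mathbb{R}^n$, where $V$ is of class $\mathcal{C}^2$ and $\nabla^2 V \succ 0$. Then, for all compactly supported $g : \mathbb{R}^n \to \mathbb{R}$, it holds that

$$\operatorname{var}_\mu g \le \mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle - \frac{\operatorname{cov}_\mu(g, V)^2}{n - \operatorname{var}_\mu V}.$$

It is straightforward to see that Theorem 4 implies Theorem 3. Indeed, via a routine approximation argument, we may assume that $V$ satisfies the hypothesis of Theorem 4. Taking $g = V$ (which is justified via another approximation argument) and rearranging the inequality of Theorem 4 yields

$$\operatorname{var}_\mu V \le \frac{n\, \mathbb{E}_\mu\langle\nabla V, (\nabla^2 V)^{-1} \nabla V\rangle}{n + \mathbb{E}_\mu\langle\nabla V, (\nabla^2 V)^{-1} \nabla V\rangle} \le n.$$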
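As a quick sanity check of the bound $\operatorname{var}_\mu V \le n$, the following sketch estimates $\operatorname{var}_\mu V$ by Monte Carlo for two product densities. Both examples are chosen here for illustration and do not appear in the note: the standard Gaussian, for which $V(x) = \lVert x\rVert^2/2$ up to a constant and $\operatorname{var}_\mu V = n/2$, and the product Laplace density $\mu \propto \exp(-\lVert x\rVert_1)$, for which $\operatorname{var}_\mu V = n$, so the bound is attained. The function names, sample size, and seed are arbitrary assumptions.

```python
import random


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def estimate_var_potential(n, num_samples=100_000, seed=0):
    """Monte Carlo estimates of var_mu V for two log-concave densities on R^n.

    Returns (gaussian_estimate, laplace_estimate).
    """
    rng = random.Random(seed)
    gaussian_v, laplace_v = [], []
    for _ in range(num_samples):
        # Standard Gaussian: V(x) = |x|^2 / 2 (additive constants do not change the variance).
        gaussian_v.append(0.5 * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        # Product Laplace: V(x) = |x|_1, and each |x_i| is Exponential(1).
        laplace_v.append(sum(rng.expovariate(1.0) for _ in range(n)))
    return variance(gaussian_v), variance(laplace_v)


n = 5
gauss_est, laplace_est = estimate_var_potential(n)
print(f"n = {n}: Gaussian var V ~ {gauss_est:.3f}, Laplace var V ~ {laplace_est:.3f}")
```

Up to sampling error, the two estimates should be close to $n/2$ and $n$ respectively.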

Next, in our second approach to Theorem 2, we observe that a key step in the proof of Theorem 3 given by [wang2014heatcapacity] is a tensorization principle. It is then natural to wonder whether such a principle can be applied directly to deduce Theorem 2. Indeed, we have the following elementary lemma.

###### Lemma 1.

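Suppose that for each $n \in \mathbb{N}$ and each convex body $K \subseteq \mathbb{R}^n$, we have a function $\phi_{n,K} : \operatorname{int} K \to \mathbb{R}$ such that $\phi_{n,K}$ is a $\nu(n)$-self-concordant barrier for $K$. Also, suppose that the following consistency condition holds:

$$\phi_{m+n, K \times K'}(x, x') = \phi_{m,K}(x) + \phi_{n,K'}(x'), \tag{2}$$

for all $m, n \in \mathbb{N}$, all convex bodies $K \subseteq \mathbb{R}^m$, $K' \subseteq \mathbb{R}^n$, and all $x \in \operatorname{int} K$, $x' \in \operatorname{int} K'$. Then, for every $k \in \mathbb{N}$, $\phi_{n,K}$ is a $(\nu(kn)/k)$-self-concordant barrier for $K$.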

We will check that the entropic barrier satisfies the consistency condition described in the previous lemma in Section 4. Combined with the second statement in Theorem 1, it yields another proof of Theorem 2.

The remainder of this note is organized as follows. In Section 2, we will explain the connection between Theorem 2 and Theorem 3, thereby deducing the former from the latter. Then, so as to make this note more self-contained, in Section 3 we will provide two proofs of the dimensional Brascamp-Lieb inequality (Theorem 4). The first proof follows [bolleygentilguillin2018brascamplieb] and proceeds via a dimensional improvement of Hörmander’s method. The second “proof”, which is only sketched, shows how the dimensional Brascamp-Lieb inequality may be obtained from a convexity principle: the entropy functional is convex along generalized Wasserstein geodesics which arise from Bregman divergence couplings [ahnchewi2021mirrorlangevin]. The second argument appears to be new. Finally, in Section 4, we present the tensorization argument as encapsulated in Lemma 1.

## 2 From the entropic barrier to the dimensional Brascamp-Lieb inequality

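In this section, we follow [bubeckeldan2019entropic]. The entropic barrier has a fruitful interpretation in terms of an exponential family of probability distributions defined over the convex body $K$. For each $\theta \in \mathbb{R}^n$, we define the density $p_\theta$ on $K$ via

$$p_\theta(x) := \frac{\exp\langle\theta, x\rangle}{\int_K \exp\langle\theta, x'\rangle \, \mathrm{d}x'} \, \mathbb{1}\{x \in K\}. \tag{3}$$

Since $f$ (defined in (1)) is essentially the logarithmic moment-generating function of $p_\theta$, the derivatives of $f$ yield cumulants of $p_\theta$. In particular,

$$\nabla f(\theta) = \mathbb{E}_{p_\theta} X, \qquad \nabla^2 f(\theta) = \operatorname{cov}_{p_\theta} X.$$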

By convex duality, the mappings $\nabla f : \mathbb{R}^n \to \operatorname{int} K$ and $\nabla f^\star : \operatorname{int} K \to \mathbb{R}^n$ are inverses of each other. From the classical duality between the logarithmic moment-generating function and entropy, we can also deduce that

$$f^\star(x) = \mathcal{H}(p_{\nabla f^\star(x)}),$$

where $\mathcal{H}$ denotes the entropy functional

$$\mathcal{H}(p) := \int p \ln p. \tag{4}$$

Note the sign convention, which is opposite the usual one in information theory. We use this convention as it is convenient for $\mathcal{H}$ to be convex.

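The self-concordance parameter of $f^\star$ is the least $\nu$ such that

$$\langle\nabla f^\star(x), [\nabla^2 f^\star(x)]^{-1} \nabla f^\star(x)\rangle \le \nu, \qquad \text{for all } x \in \operatorname{int} K.$$

Taking $\theta = \nabla f^\star(x)$, so that $[\nabla^2 f^\star(x)]^{-1} = \nabla^2 f(\theta)$, equivalently we require

$$\langle\theta, \nabla^2 f(\theta)\, \theta\rangle \le \nu, \qquad \text{for all } \theta \in \mathbb{R}^n,$$

which has the probabilistic interpretation

$$\operatorname{var}_{p_\theta}\langle\theta, X\rangle \le \nu, \qquad \text{for all } \theta \in \mathbb{R}^n. \tag{5}$$

From the definition (3), we see that the density $p_\theta \propto \exp(-V)$ is log-concave, where $V(x) = -\langle\theta, x\rangle$ for $x \in K$. By applying Theorem 3 to $p_\theta$, we immediately deduce that (5) holds with $\nu = n$.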
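To make the probabilistic bound (5) concrete, here is a minimal sketch for the one-dimensional body $K = [0, 1]$, an example chosen here and not taken from the note. In this case $f(\theta) = \ln((e^\theta - 1)/\theta)$, and differentiating twice gives the closed form $\theta^2 f''(\theta) = 1 - \bigl((\theta/2)/\sinh(\theta/2)\bigr)^2$, which is always below $n = 1$. The code compares this closed form with a direct quadrature of $\operatorname{var}_{p_\theta}\langle\theta, X\rangle$; the function names and grid size are illustrative assumptions.

```python
import math


def scaled_variance_quadrature(theta, num_intervals=20_000):
    """theta^2 * Var_{p_theta}(X) for p_theta(x) ∝ exp(theta * x) on K = [0, 1] (trapezoid rule)."""
    h = 1.0 / num_intervals
    shift = max(theta, 0.0)  # subtract the largest exponent to avoid overflow
    mass = first = second = 0.0
    for i in range(num_intervals + 1):
        x = i * h
        endpoint = 0.5 if i in (0, num_intervals) else 1.0
        w = endpoint * math.exp(theta * x - shift)
        mass += w
        first += w * x
        second += w * x * x
    mean = first / mass
    return theta ** 2 * (second / mass - mean ** 2)


def scaled_variance_closed_form(theta):
    """Closed form 1 - ((theta/2) / sinh(theta/2))^2, from f(theta) = ln((e^theta - 1) / theta)."""
    if theta == 0.0:
        return 0.0
    half = 0.5 * theta
    return 1.0 - (half / math.sinh(half)) ** 2


for theta in (-20.0, -3.0, 0.5, 3.0, 20.0):
    print(theta, scaled_variance_quadrature(theta), scaled_variance_closed_form(theta))
```

The printed values approach 1 as $\lvert\theta\rvert$ grows but never exceed it, consistent with $\nu = n = 1$ for an interval.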

## 3 Proof of the dimensional Brascamp-Lieb inequality

Next, we wish to give some proofs of the dimensional Brascamp-Lieb inequality (Theorem 4). Classically, the Brascamp-Lieb inequality reads as follows.

###### Theorem 5 ([brascamplieb1976]).

Let $\mu \propto \exp(-V)$ be a density on $\mathbb{R}^n$, where $V$ is a convex function of class $\mathcal{C}^2$. Then, for every locally Lipschitz $g : \mathbb{R}^n \to \mathbb{R}$,

$$\operatorname{var}_\mu g \le \mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle. \tag{6}$$

The Brascamp-Lieb inequality is a Poincaré inequality for the measure $\mu$ corresponding to the Newton-Langevin diffusion [chewietal2020mirrorlangevin]. When $V$ is strongly convex, $\nabla^2 V \succeq \alpha I_n$, it recovers the usual Poincaré inequality

$$\operatorname{var}_\mu g \le \frac{1}{\alpha}\, \mathbb{E}_\mu[\lVert\nabla g\rVert^2].$$

See [bobkovledoux2000brunnmintobrascamplieblsi, bakrygentilledoux2014, cordero2017transport] for various proofs of Theorem 5.

Since the inequality (6) makes no explicit reference to the dimension, it actually holds in infinite-dimensional space. In contrast, Theorem 4 asserts that (6) can be improved by subtracting an additional non-negative term from the right-hand side in any finite dimension. This is referred to as a dimensional improvement of the Brascamp-Lieb inequality.
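The size of the dimensional improvement can be seen in a small worked example. The sketch below takes $\mu = N(0, 1)$ in dimension $n = 1$, so that $V(x) = x^2/2$ and $(\nabla^2 V)^{-1} = 1$, and evaluates the variance, the classical bound (6), and the dimensional bound for two polynomial test functions. The example, the function names, and the quadrature parameters are assumptions made here; polynomials are not compactly supported, so this presumes the inequality extends to them by approximation.

```python
import math


def gauss_expect(func, half_width=12.0, steps=4800):
    """E[func(X)] for X ~ N(0, 1), via the trapezoid rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


def brascamp_lieb_terms(g, dg, n=1):
    """Return (var g, classical bound, dimensional bound) for mu = N(0, 1), V(x) = x^2 / 2."""
    potential = lambda x: 0.5 * x * x
    mean_g = gauss_expect(g)
    mean_v = gauss_expect(potential)
    var_g = gauss_expect(lambda x: (g(x) - mean_g) ** 2)
    var_v = gauss_expect(lambda x: (potential(x) - mean_v) ** 2)
    cov_gv = gauss_expect(lambda x: (g(x) - mean_g) * (potential(x) - mean_v))
    energy = gauss_expect(lambda x: dg(x) ** 2)  # (V'')^{-1} = 1 for the Gaussian
    return var_g, energy, energy - cov_gv ** 2 / (n - var_v)


square = brascamp_lieb_terms(lambda x: x * x, lambda x: 2.0 * x)
cubic = brascamp_lieb_terms(lambda x: x ** 3 + x * x, lambda x: 3.0 * x * x + 2.0 * x)
print(square)
print(cubic)
```

For $g(x) = x^2$ the three quantities are (up to quadrature error) $2$, $4$, and $2$: the classical bound is off by a factor of two, while the dimensional bound is attained. For $g(x) = x^3 + x^2$ they are $17$, $31$, and $29$.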

### 3.1 Proof by Hörmander’s $L^2$ method

We now present the proof of Theorem 4 given in [bolleygentilguillin2018brascamplieb]. The starting point for Hörmander’s method is to first dualize the Poincaré inequality.

###### Proposition 1 ([barthecorderoerausquin2013invariances, Lemma 1]).

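Let $\mu \propto \exp(-V)$ be a probability density on $\mathbb{R}^n$, where $V$ is of class $\mathcal{C}^1$. Define the corresponding generator $\mathcal{L}$ on smooth functions $g$ via

$$\mathcal{L} g := -\Delta g + \langle\nabla V, \nabla g\rangle.$$

Suppose $A$ is a matrix-valued function mapping $\mathbb{R}^n$ into the space of symmetric positive definite matrices such that for all smooth $u$,

$$\mathbb{E}_\mu[(\mathcal{L} u)^2] \ge \mathbb{E}_\mu\langle\nabla u, A \nabla u\rangle. \tag{7}$$

Then, for all $g$, it holds that

$$\operatorname{var}_\mu g \le \mathbb{E}_\mu\langle\nabla g, A^{-1} \nabla g\rangle.$$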
###### Proof.

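We may assume $\mathbb{E}_\mu g = 0$. This condition is certainly necessary for the equation $\mathcal{L} u = g$ to be solvable; in order to streamline the proof, we will assume that a solution $u$ exists. (This assumption can be avoided by invoking [corderoerausquinfradelizimaurey2014bconj] and using a density argument; see [barthecorderoerausquin2013invariances] for details.)

Using the integration by parts formula for the generator,

$$\mathbb{E}_\mu[g\, \mathcal{L} u] = \mathbb{E}_\mu\langle\nabla g, \nabla u\rangle,$$

we obtain

$$\operatorname{var}_\mu g = \mathbb{E}_\mu[g^2] = 2\, \mathbb{E}_\mu[g\, \mathcal{L} u] - \mathbb{E}_\mu[(\mathcal{L} u)^2] \le 2\, \mathbb{E}_\mu\langle\nabla g, \nabla u\rangle - \mathbb{E}_\mu\langle\nabla u, A \nabla u\rangle.$$

Next, since $2\langle x, y\rangle \le \langle x, A x\rangle + \langle y, A^{-1} y\rangle$ for all $x, y \in \mathbb{R}^n$, it implies

$$\operatorname{var}_\mu g \le \mathbb{E}_\mu\langle\nabla g, A^{-1} \nabla g\rangle.$$

∎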

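The key idea now is that the condition (7) can be verified with the help of the curvature of the potential $V$. Indeed, assume now that $V$ is of class $\mathcal{C}^2$ and that $\nabla^2 V \succ 0$. By direct calculation, one verifies the commutation relation

$$\nabla \mathcal{L} u = (\mathcal{L} + \nabla^2 V) \nabla u. \tag{8}$$

Hence,

$$\mathbb{E}_\mu[(\mathcal{L} u)^2] = \mathbb{E}_\mu\langle\nabla u, \nabla \mathcal{L} u\rangle = \mathbb{E}_\mu\langle\nabla u, (\mathcal{L} + \nabla^2 V) \nabla u\rangle = \mathbb{E}_\mu\langle\nabla u, \nabla^2 V\, \nabla u\rangle + \mathbb{E}_\mu[\lVert\nabla^2 u\rVert_{\mathrm{HS}}^2], \tag{9}$$

where the last equality follows from the integration by parts formula for the generator applied to each coordinate separately: $\mathbb{E}_\mu\langle\nabla u, \mathcal{L} \nabla u\rangle = \sum_{i=1}^n \mathbb{E}_\mu[\partial_i u\, \mathcal{L} \partial_i u] = \sum_{i=1}^n \mathbb{E}_\mu[\lVert\nabla \partial_i u\rVert^2] = \mathbb{E}_\mu[\lVert\nabla^2 u\rVert_{\mathrm{HS}}^2]$. Since the second term is non-negative, Proposition 1 (with $A = \nabla^2 V$) now implies the Brascamp-Lieb inequality (Theorem 5).

In order to obtain the dimensional improvement of the Brascamp-Lieb inequality (Theorem 4), we will imitate the proof of Proposition 1, only now we will use the additional term in the above identity.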

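###### Proof of Theorem 4.

As before, let $\mathbb{E}_\mu g = 0$. However, we introduce an additional trick and consider $u$ not necessarily satisfying $\mathcal{L} u = g$; this will help to optimize the bound at the end of the argument. Following the computations in Proposition 1 and using the key identity (9), we obtain

$$
\begin{aligned}
\operatorname{var}_\mu g = \mathbb{E}_\mu[g^2]
&= \mathbb{E}_\mu[(g - \mathcal{L} u)^2] + 2\, \mathbb{E}_\mu[g\, \mathcal{L} u] - \mathbb{E}_\mu[(\mathcal{L} u)^2] \\
&= \mathbb{E}_\mu[(g - \mathcal{L} u)^2] + 2\, \mathbb{E}_\mu\langle\nabla g, \nabla u\rangle - \mathbb{E}_\mu\langle\nabla u, \nabla^2 V\, \nabla u\rangle - \mathbb{E}_\mu[\lVert\nabla^2 u\rVert_{\mathrm{HS}}^2] \\
&\le \mathbb{E}_\mu[(g - \mathcal{L} u)^2] + \mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle - \mathbb{E}_\mu[\lVert\nabla^2 u\rVert_{\mathrm{HS}}^2].
\end{aligned}
$$

For the last term, we use the inequality

$$\mathbb{E}_\mu[\lVert\nabla^2 u\rVert_{\mathrm{HS}}^2] \ge \frac{1}{n}\, (\mathbb{E}_\mu \Delta u)^2.$$

From integration by parts,

$$\mathbb{E}_\mu \Delta u = \mathbb{E}_\mu\langle\nabla V, \nabla u\rangle = \mathbb{E}_\mu[V\, \mathcal{L} u] = \operatorname{cov}_\mu(g, V) + \mathbb{E}_\mu[V\, (\mathcal{L} u - g)].$$

We now choose $u$ such that $\mathcal{L} u = g + a\, (V - \mathbb{E}_\mu V)$ for some $a \in \mathbb{R}$ to be chosen later. For brevity of notation, write $\mathbf{C} := \operatorname{cov}_\mu(g, V)$ and $\mathbf{V} := \operatorname{var}_\mu V$. Then,

$$
\begin{aligned}
\operatorname{var}_\mu g - \mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle
&\le a^2\, \mathbf{V} - \frac{1}{n}\, (\mathbf{C} + a \mathbf{V})^2 \\
&= \frac{\mathbf{V}\, (n - \mathbf{V})}{n}\, \Bigl(a - \frac{\mathbf{C}}{n - \mathbf{V}}\Bigr)^2 - \frac{\mathbf{C}^2\, \mathbf{V}}{n\, (n - \mathbf{V})} - \frac{\mathbf{C}^2}{n}.
\end{aligned}
$$

Observe that this inequality entails $\mathbf{V} \le n$, or else the coefficient of $a^2$ would be negative and we could send $a \to \infty$ and arrive at a contradiction. Optimizing over $a$, i.e. taking $a = \mathbf{C}/(n - \mathbf{V})$, we obtain

$$\operatorname{var}_\mu g \le \mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle - \frac{\mathbf{C}^2}{n - \mathbf{V}}.$$

∎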
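The completing-the-square step in the proof above is easy to get wrong by a sign, so here is a small numeric check. It evaluates the quadratic $a^2 \mathbf{V} - (\mathbf{C} + a \mathbf{V})^2 / n$ and its completed-square form at a few values of $a$, and confirms that the minimum over $a$ equals $-\mathbf{C}^2/(n - \mathbf{V})$. The numeric values of $\mathbf{C}$, $\mathbf{V}$, and $n$ are arbitrary illustrative choices with $0 < \mathbf{V} < n$.

```python
def bound_in_a(a, c, v, n):
    """The quantity a^2 V - (C + a V)^2 / n as a function of the free parameter a."""
    return a * a * v - (c + a * v) ** 2 / n


def completed_square(a, c, v, n):
    """The same quantity after completing the square in a (valid when 0 < V < n)."""
    lead = v * (n - v) / n  # coefficient of a^2, positive when V < n
    return lead * (a - c / (n - v)) ** 2 - c * c * v / (n * (n - v)) - c * c / n


c, v, n = 0.7, 1.3, 4   # arbitrary illustrative values of C, V, and n
a_star = c / (n - v)    # minimizer of the quadratic in a
print(bound_in_a(a_star, c, v, n), -c * c / (n - v))
```

The two printed numbers agree up to rounding, which is exactly the dimensional correction term in the final bound.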

### 3.2 Proof by convexity of the entropy along Bregman divergence couplings

It is well-known that Poincaré inequalities are obtained from linearizing transportation inequalities. In [cordero2017transport], D. Cordero-Erausquin obtained the Brascamp-Lieb inequality (Theorem 5) by linearizing the following inequality:

$$\mathcal{D}_V(\rho \,\|\, \mu) \le \mathsf{KL}(\rho \,\|\, \mu), \qquad \text{for all } \rho \in \mathcal{P}(\mathbb{R}^n). \tag{10}$$

Here, $\mu \propto \exp(-V)$ on $\mathbb{R}^n$; $\mathcal{P}(\mathbb{R}^n)$ denotes the space of probability measures on $\mathbb{R}^n$; $\mathsf{KL}(\cdot \,\|\, \cdot)$ is the Kullback-Leibler (KL) divergence; and $\mathcal{D}_V(\cdot \,\|\, \cdot)$ is the Bregman divergence coupling cost, defined as

$$\mathcal{D}_V(\rho \,\|\, \mu) = \inf_{\gamma \in \mathsf{couplings}(\rho, \mu)} \int D_V(x, y) \, \mathrm{d}\gamma(x, y),$$

with

$$D_V(x, y) := V(x) - V(y) - \langle\nabla V(y), x - y\rangle.$$

On the other hand, together with K. Ahn in [ahnchewi2021mirrorlangevin], the author obtained the transportation inequality (10) as a consequence of a convexity principle in optimal transport. It is therefore natural to ask whether the dimensional Brascamp-Lieb inequality (Theorem 4) can be obtained directly from (a strengthening of) this principle. This is indeed the case, and it is the goal of the present section to describe this argument.

Making the argument fully rigorous, however, would entail substantial technical complications which would detract from the focus of this note. In any case, a complete proof of the dimensional Brascamp-Lieb inequality is already present in [bolleygentilguillin2018brascamplieb]. Hence, we will work on a purely formal level and assume that everything is smooth, bounded, etc. Also, the computations are rather similar to the proof of Theorem 4 given in the previous section. Nevertheless, the argument seems interesting enough to warrant presenting it here.

The main difference with the preceding proof is that the Bochner formula (implicit in the commutation relation (8)) is replaced by the convexity principle.

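###### Proof sketch of Theorem 4.

Throughout the proof, let $\varepsilon > 0$ be small. Let $h$ be bounded and satisfy $\mathbb{E}_\mu h = 0$, so that $\mu_\varepsilon := (1 + \varepsilon h)\, \mu$ defines a valid probability density on $\mathbb{R}^n$. Our aim is to first strengthen the transportation inequality (10), at least infinitesimally, and then to linearize it.

Let $(X_\varepsilon, X)$ be an optimal coupling for the Bregman divergence coupling cost $\mathcal{D}_V(\mu_\varepsilon \,\|\, \mu)$. In [ahnchewi2021mirrorlangevin], the following facts were proven:

1. There is a function $u_\varepsilon : \mathbb{R}^n \to \mathbb{R}$ such that $\nabla V(X) = (\nabla V - \nabla u_\varepsilon)(X_\varepsilon)$, and $V - u_\varepsilon$ is convex.

2. The entropy functional $\mathcal{H}$ (defined in (4)) is convex in the sense that

 $$\mathcal{H}(\mu_\varepsilon) \ge \mathcal{H}(\mu) + \mathbb{E}\langle[\nabla_{W_2} \mathcal{H}(\mu)](X), X_\varepsilon - X\rangle. \tag{11}$$

Here, $\nabla_{W_2} \mathcal{H}(\mu) = \nabla \ln \mu$ is the Wasserstein gradient of the entropy functional, cf. [ambrosio2008gradient, villani2009ot, santambrogio2015ot].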

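Write $X_\varepsilon = T_\varepsilon(X)$. Since $X \sim \mu$ and $X_\varepsilon \sim \mu_\varepsilon$, the change of variables formula implies

$$\frac{\mu(x)}{\mu_\varepsilon(T_\varepsilon(x))} = \frac{\mu(x)}{\mu(T_\varepsilon(x))\, (1 + \varepsilon h(T_\varepsilon(x)))} = \det \nabla T_\varepsilon(x). \tag{12}$$

To linearize this equation, write $T_\varepsilon = \mathrm{id} + \varepsilon T + o(\varepsilon)$ and $u_\varepsilon = \varepsilon u + o(\varepsilon)$. Then, the definition of $u_\varepsilon$ yields

$$
\begin{aligned}
\nabla V(x) &= (\nabla V - \nabla u_\varepsilon)(x + \varepsilon T(x) + o(\varepsilon)) \\
&= \nabla V(x) + \varepsilon\, \nabla^2 V(x)\, T(x) - \varepsilon\, \nabla u(x) + o(\varepsilon)
\end{aligned}
$$

which implies

$$T_\varepsilon(x) = x + \varepsilon\, [\nabla^2 V(x)]^{-1} \nabla u(x) + o(\varepsilon).$$

Taking logarithms in (12) and expanding to first order in $\varepsilon$,

$$
\begin{aligned}
\ln \mu(x) - \ln \mu(T_\varepsilon(x)) - \ln(1 + \varepsilon h(T_\varepsilon(x)))
&= -\varepsilon\, \langle\nabla \ln \mu(x), [\nabla^2 V(x)]^{-1} \nabla u(x)\rangle - \varepsilon h(x) + o(\varepsilon) \\
&= \varepsilon\, \langle\nabla V(x), [\nabla^2 V(x)]^{-1} \nabla u(x)\rangle - \varepsilon h(x) + o(\varepsilon)
\end{aligned}
$$

and

$$
\begin{aligned}
\ln \det \nabla T_\varepsilon(x)
&= \ln \det \nabla\bigl(\mathrm{id} + \varepsilon\, [\nabla^2 V]^{-1} \nabla u + o(\varepsilon)\bigr)(x) \\
&= \ln \det\bigl(I_n + \varepsilon\, \nabla([\nabla^2 V]^{-1} \nabla u)(x) + o(\varepsilon)\bigr) \\
&= \varepsilon \operatorname{div}([\nabla^2 V]^{-1} \nabla u)(x) + o(\varepsilon).
\end{aligned}
$$

To interpret this, we introduce a new generator, denoted $\hat{\mathcal{L}}$ to avoid confusion with the previous section, defined by

$$\hat{\mathcal{L}} u := \operatorname{div}([\nabla^2 V]^{-1} \nabla u) - \langle\nabla V, [\nabla^2 V]^{-1} \nabla u\rangle.$$

With this sign convention, the new generator satisfies the integration by parts formula

$$\mathbb{E}_\mu[u\, \hat{\mathcal{L}} v] = -\mathbb{E}_\mu\langle\nabla u, [\nabla^2 V]^{-1} \nabla v\rangle;$$

the sign is immaterial below, since these expectations only enter through their squares. In this notation, the preceding computations yield

$$\hat{\mathcal{L}} u = -h + o(1).$$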

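Next, to strengthen (11), we repeat the proof. From (12),

$$\mathcal{H}(\mu_\varepsilon) = \int \mu_\varepsilon \ln \mu_\varepsilon = \int \mu \ln(\mu_\varepsilon \circ T_\varepsilon) = \int \mu \ln \frac{\mu}{\det \nabla T_\varepsilon} = \mathcal{H}(\mu) - \int \mu \ln \det \nabla T_\varepsilon.$$

From the second-order expansion of $-\ln \det$ around $I_n$,

$$
\begin{aligned}
-\int \mu \ln \det \nabla T_\varepsilon
&\ge -\int \mu \ln \det I_n - \int \mu\, \langle I_n, \nabla T_\varepsilon - I_n\rangle + \frac{1}{2} \int \mu\, \lVert\nabla T_\varepsilon - I_n\rVert_{\mathrm{HS}}^2 + o(\varepsilon^2) \\
&\ge -\int \mu \operatorname{tr}(\nabla T_\varepsilon - I_n) + \frac{1}{2n}\, \Bigl(\int \mu \operatorname{tr}(\nabla T_\varepsilon - I_n)\Bigr)^2 + o(\varepsilon^2) \\
&= -\int \mu \operatorname{div}(T_\varepsilon - \mathrm{id}) + \frac{1}{2n}\, \Bigl(\int \mu \operatorname{div}(T_\varepsilon - \mathrm{id})\Bigr)^2 + o(\varepsilon^2) \\
&= \int \mu\, \langle\nabla \ln \mu, T_\varepsilon - \mathrm{id}\rangle + \frac{1}{2n}\, \Bigl(\int \mu\, \langle\nabla \ln \mu, T_\varepsilon - \mathrm{id}\rangle\Bigr)^2 + o(\varepsilon^2).
\end{aligned}
$$

Recalling that $\nabla_{W_2} \mathcal{H}(\mu) = \nabla \ln \mu = -\nabla V$, we have established

$$
\begin{aligned}
\mathcal{H}(\mu_\varepsilon) - \mathcal{H}(\mu) - \mathbb{E}\langle[\nabla_{W_2} \mathcal{H}(\mu)](X), X_\varepsilon - X\rangle
&\ge \frac{1}{2n}\, \Bigl(\int \mu\, \langle\nabla V, T_\varepsilon - \mathrm{id}\rangle\Bigr)^2 + o(\varepsilon^2) \\
&= \frac{\varepsilon^2}{2n}\, \{\mathbb{E}_\mu[V\, \hat{\mathcal{L}} u]\}^2 + o(\varepsilon^2).
\end{aligned}
$$

The next step is to write down the strengthened transportation inequality. Indeed, if we add a suitable additive constant to $V$ so that $\mu = \exp(-V)$, then

$$
\begin{aligned}
\mathsf{KL}(\mu_\varepsilon \,\|\, \mu)
&= \mathbb{E}_{\mu_\varepsilon} V + \mathcal{H}(\mu_\varepsilon) \\
&\ge \underbrace{\mathbb{E} V(X) + \mathcal{H}(\mu)}_{=\, \mathsf{KL}(\mu \,\|\, \mu) \,=\, 0} + \mathbb{E}\bigl\langle\underbrace{[\nabla V + \nabla_{W_2} \mathcal{H}(\mu)](X)}_{=\, [\nabla_{W_2} \mathsf{KL}(\cdot \,\|\, \mu)](\mu) \,=\, 0},\, X_\varepsilon - X\bigr\rangle \\
&\qquad + \underbrace{\mathbb{E}[V(X_\varepsilon) - V(X) - \langle\nabla V(X), X_\varepsilon - X\rangle]}_{=\, \mathcal{D}_V(\mu_\varepsilon \,\|\, \mu)} + \frac{\varepsilon^2}{2n}\, \{\mathbb{E}_\mu[hV]\}^2 + o(\varepsilon^2) \\
&\ge \mathcal{D}_V(\mu_\varepsilon \,\|\, \mu) + \frac{\varepsilon^2}{2n}\, \{\mathbb{E}_\mu[hV]\}^2 + o(\varepsilon^2).
\end{aligned}
$$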

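Finally, it remains to linearize the transportation inequality. On one hand, it is classical that

$$\mathsf{KL}(\mu_\varepsilon \,\|\, \mu) = \frac{\varepsilon^2}{2}\, \mathbb{E}_\mu[h^2] + o(\varepsilon^2).$$

On the other hand, we can guess that

$$
\begin{aligned}
\mathcal{D}_V(\mu_\varepsilon \,\|\, \mu)
&= \frac{1}{2}\, \mathbb{E}\langle X_\varepsilon - X, \nabla^2 V(X)\, (X_\varepsilon - X)\rangle + o(\varepsilon^2) \\
&= \frac{\varepsilon^2}{2}\, \mathbb{E}_\mu\langle\nabla u, (\nabla^2 V)^{-1} \nabla u\rangle + o(\varepsilon^2) \\
&\ge \frac{\varepsilon^2}{2}\, \frac{\{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla u\rangle\}^2}{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle} + o(\varepsilon^2) \\
&= \frac{\varepsilon^2}{2}\, \frac{\{\mathbb{E}_\mu[g\, \hat{\mathcal{L}} u]\}^2}{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle} + o(\varepsilon^2) \\
&= \frac{\varepsilon^2}{2}\, \frac{\{\mathbb{E}_\mu[g h]\}^2}{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle} + o(\varepsilon^2).
\end{aligned}
$$

A rigorous proof of this inequality is given as [cordero2017transport, Lemma 3.1].

Thus, we obtain

$$\frac{1}{2}\, \frac{\{\mathbb{E}_\mu[g h]\}^2}{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle} + \frac{1}{2n}\, \{\mathbb{E}_\mu[hV]\}^2 \le \frac{1}{2}\, \mathbb{E}_\mu[h^2] + o(1).$$

Now we let $\varepsilon \searrow 0$ and choose $h = (g - \mathbb{E}_\mu g) + a\, (V - \mathbb{E}_\mu V)$ for some $a \in \mathbb{R}$. Writing $\mathbf{C} := \operatorname{cov}_\mu(g, V)$ and $\mathbf{V} := \operatorname{var}_\mu V$, it yields

$$\frac{(\operatorname{var}_\mu g + a \mathbf{C})^2}{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle} + \frac{1}{n}\, (\mathbf{C} + a \mathbf{V})^2 \le \operatorname{var}_\mu g + 2a \mathbf{C} + a^2\, \mathbf{V}.$$

Actually, choosing $a$ to optimize this inequality and simplifying the resulting expression may be cumbersome, so with our foresight from the earlier proof of Theorem 4, we now take $a = \mathbf{C}/(n - \mathbf{V})$. After some algebra,

$$\frac{\bigl(\operatorname{var}_\mu g + \mathbf{C}^2/(n - \mathbf{V})\bigr)^2}{\mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle} \le \operatorname{var}_\mu g + \frac{\mathbf{C}^2}{n - \mathbf{V}},$$

which of course yields

$$\operatorname{var}_\mu g \le \mathbb{E}_\mu\langle\nabla g, (\nabla^2 V)^{-1} \nabla g\rangle - \frac{\mathbf{C}^2}{n - \mathbf{V}}.$$

∎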

## 4 A tensorization trick

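We begin by verifying that the entropic barrier has the consistency property (2). Let $f_K$ denote the function (1), where we now explicitly denote the dependence on the convex body $K$. Also, let $f_K^\star$ denote the corresponding entropic barrier. Then, for convex bodies $K \subseteq \mathbb{R}^m$ and $K' \subseteq \mathbb{R}^n$, we see that

$$
\begin{aligned}
f_{K \times K'}(\theta, \theta')
&= \ln \int_{K \times K'} \exp(\langle\theta, x\rangle + \langle\theta', x'\rangle) \, \mathrm{d}x \, \mathrm{d}x' \\
&= \ln \int_K \exp\langle\theta, x\rangle \, \mathrm{d}x + \ln \int_{K'} \exp\langle\theta', x'\rangle \, \mathrm{d}x' = f_K(\theta) + f_{K'}(\theta').
\end{aligned}
$$

Hence,

$$f_{K \times K'}^\star(x, x') = \sup_{\theta \in \mathbb{R}^m,\, \theta' \in \mathbb{R}^n} \{\langle\theta, x\rangle + \langle\theta', x'\rangle - f_K(\theta) - f_{K'}(\theta')\} = f_K^\star(x) + f_{K'}^\star(x').$$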
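The additivity of the logarithmic Laplace transform over products can be checked numerically in the simplest case. The sketch below takes two intervals $K = [0, 1]$ and $K' = [0, 2]$ (examples chosen here, not from the note), computes $f_{K \times K'}$ by a two-dimensional midpoint rule on the rectangle, and compares it with $f_K + f_{K'}$ computed in closed form. The function names, the grid size, and the values of $\theta$, $\theta'$ are illustrative assumptions.

```python
import math


def log_laplace_interval(theta, length):
    """f_K(theta) for K = [0, length], in closed form: ln((exp(theta * length) - 1) / theta)."""
    if theta == 0.0:
        return math.log(length)
    return math.log(math.expm1(theta * length) / theta)


def log_laplace_rectangle(theta, theta_prime, length, length_prime, grid=300):
    """f_{K x K'}(theta, theta') for [0, length] x [0, length_prime], by a 2-D midpoint rule."""
    hx, hy = length / grid, length_prime / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * hx
        for j in range(grid):
            y = (j + 0.5) * hy
            total += math.exp(theta * x + theta_prime * y)
    return math.log(total * hx * hy)


lhs = log_laplace_rectangle(1.5, -0.8, 1.0, 2.0)
rhs = log_laplace_interval(1.5, 1.0) + log_laplace_interval(-0.8, 2.0)
print(lhs, rhs)
```

The two values agree up to the quadrature error of the midpoint rule, reflecting the factorization of the integral over $K \times K'$.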

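Finally, we check that the tensorization property automatically improves the bound on the self-concordance parameter of $f^\star$ obtained in [bubeckeldan2019entropic].

###### Proof of Lemma 1.

Let $k \in \mathbb{N}$. By assumption, the self-concordant barrier $\phi_{kn, K^k}$ on $K^k$ satisfies $\phi_{kn, K^k}(\boldsymbol{x}) = \sum_{j=1}^k \phi_{n,K}(x_j)$ for $\boldsymbol{x} = (x_1, \dots, x_k)$. Also, we are given that

$$\nabla^2 \phi_{kn, K^k}(\boldsymbol{x}) \succeq \frac{1}{\nu(kn)}\, \nabla \phi_{kn, K^k}(\boldsymbol{x})\, \nabla \phi_{kn, K^k}(\boldsymbol{x})^{\mathsf{T}}. \tag{13}$$

Via elementary calculations,

$$\nabla \phi_{kn, K^k}(\boldsymbol{x}) = \bigl(\nabla \phi_{n,K}(x_1), \dots, \nabla \phi_{n,K}(x_k)\bigr)$$

and

$$\nabla^2 \phi_{kn, K^k}(\boldsymbol{x}) = \begin{bmatrix} \nabla^2 \phi_{n,K}(x_1) & & \\ & \ddots & \\ & & \nabla^2 \phi_{n,K}(x_k) \end{bmatrix}.$$

Let $x \in \operatorname{int} K$ and let $\boldsymbol{x} := (x, \dots, x) \in K^k$. Also, take $v \in \mathbb{R}^n$ and $\boldsymbol{v} := (v, \dots, v) \in \mathbb{R}^{kn}$. By (13), we know that

$$
\begin{aligned}
k\, \langle v, \nabla^2 \phi_{n,K}(x)\, v\rangle
&= \langle\boldsymbol{v}, \nabla^2 \phi_{kn, K^k}(\boldsymbol{x})\, \boldsymbol{v}\rangle \ge \frac{1}{\nu(kn)}\, \langle\boldsymbol{v}, \nabla \phi_{kn, K^k}(\boldsymbol{x})\rangle^2 \\
&= \frac{k^2}{\nu(kn)}\, \langle v, \nabla \phi_{n,K}(x)\rangle^2
\end{aligned}
$$

which proves

$$\nabla^2 \phi_{n,K}(x) \succeq \frac{k}{\nu(kn)}\, \nabla \phi_{n,K}(x)\, \nabla \phi_{n,K}(x)^{\mathsf{T}}$$

and gives the claim. ∎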

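###### Proof of Theorem 2.

According to Theorem 1, we know that the entropic barrier in $n$ dimensions is $\nu(n)$-self-concordant, with $\nu(n) = (1 + o(1))\, n$ as $n \to \infty$. By Lemma 1, it is actually $(\nu(kn)/k)$-self-concordant, for any $k \in \mathbb{N}$. Let $k \to \infty$ to deduce that it is in fact $n$-self-concordant. ∎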
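The limiting step in this proof can be illustrated with a toy computation. The sketch below uses a hypothetical parameter $\nu(n) = n\,(1 + n^{-1/2})$, which has the form $(1 + o(1))\,n$ but is otherwise an assumption made here for illustration (the actual rate in [bubeckeldan2019entropic] is not used), and evaluates the tensorized parameter $\nu(kn)/k$ for growing $k$.

```python
import math


def nu(n):
    """Hypothetical (1 + o(1)) n parameter; the 1/sqrt(n) rate is an illustrative assumption."""
    return n * (1.0 + 1.0 / math.sqrt(n))


def tensorized_parameter(n, k):
    """Parameter nu(kn) / k obtained by applying Lemma 1 to the k-fold product K^k."""
    return nu(k * n) / k


n = 3
for k in (1, 10, 100, 10_000, 1_000_000):
    print(k, tensorized_parameter(n, k))
```

For this choice, $\nu(kn)/k = n\,(1 + (kn)^{-1/2})$, which decreases to $n$ as $k \to \infty$, mirroring the argument above.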