
# On the Existence of the Augustin Mean

The existence of a unique Augustin mean and its invariance under the Augustin operator are established for arbitrary input distributions with finite Augustin information for channels with countably generated output σ-algebras. The existence is established by representing the conditional Rényi divergence as a lower semicontinuous and convex functional in an appropriately chosen uniformly convex space and then invoking the Banach–Saks property in conjunction with the lower semicontinuity and the convexity. A new family of operators is proposed to establish the invariance of the Augustin mean under the Augustin operator for orders greater than one. Some members of this new family strictly decrease the conditional Rényi divergence, when applied to the second argument of the divergence, unless the second argument is a fixed point of the Augustin operator.


## I Introduction

In the sixties and seventies, Shannon’s fundamental result was strengthened for memoryless channels in terms of three exponent functions:

1. For codes operating at rates below the Shannon capacity, the exponential decay rate of the error probability with the block length is bounded from below by the random coding exponent [1, 2, 3, 4, 5] and from above by the sphere packing exponent [6, 7, 4, 5].

2. For codes operating at rates above the Shannon capacity, the exponential rate at which the probability of correct transmission (decoding) vanishes with the block length is equal to the strong converse exponent [8, 9, 10].

These exponent functions have been characterized in terms of Gallager’s functions [11], auxiliary channels [12, 13], and Augustin information measures [5]. To obtain the right exponent functions for cost constrained codes in terms of Gallager’s functions, one has to apply the Lagrange multipliers method in a somewhat non-standard way described in [1, 2, 3]. The corresponding modification works for convex composition constraints as well; see [5, 14]. This non-standard application of the Lagrange multipliers method to Gallager’s function has recently been shown to be equivalent to the standard application of the Lagrange multipliers method to the Augustin information measures in [15, §5]. However, the Lagrange multipliers method is unnecessary to express the exponent functions in terms of Augustin information measures, either for composition constrained codes or for cost constrained codes. The right exponent functions are obtained by imposing the same constraints on the domain of the supremum defining the Augustin capacity in terms of the Augustin information [5, 16, 17, 18, 19, 20, 21, 15, 22, 23, 24, 25]. Such characterizations permit relatively simple derivations of tight polynomial prefactors under certain symmetry hypotheses [23, 24].

Both the Augustin information and the Rényi information (i.e. a scaled and reparametrized version of Gallager’s function [26]) can be seen as generalizations of the mutual information. However, unlike the mutual information and the Rényi information, the Augustin information does not have a closed form expression. The order $\alpha$ Augustin information for the input distribution $p$ is defined as

$$I_\alpha(p;W) \triangleq \inf_{q\in\mathcal{P}(\mathcal{Y})} D_\alpha(W\|q|p) \tag{1}$$

where $\mathcal{P}(\mathcal{Y})$ is the set of all probability measures on the output space. For the case when the output set is a finite set (e.g. when $W$ is a discrete memoryless channel as in [17, 27]), the compactness of $\mathcal{P}(\mathcal{Y})$, the lower semicontinuity of the Rényi divergence in its second argument [28, Thm 15], and the extreme value theorem imply the existence of an order $\alpha$ Augustin mean $q_{\alpha,p}$ satisfying

$$I_\alpha(p;W) = D_\alpha(W\|q_{\alpha,p}|p). \tag{2}$$

The Augustin mean is unique because of the strict convexity of the Rényi divergence in its second argument described in [28, Thm 12]. Other properties of the Augustin mean and information established in [5, 15] can be derived independently, once the existence of a unique Augustin mean is established.
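For a finite output set, the minimization over the second argument of the conditional Rényi divergence can be explored numerically. The following is a minimal sketch, not the method of the paper: it assumes a hypothetical binary channel `W`, a hypothetical input distribution `p`, strictly positive probabilities, and a brute-force grid over $q=(t,1-t)$ in place of an exact minimizer.

```python
import numpy as np

def renyi_divergence(w, q, alpha):
    """Order-alpha Renyi divergence in nats; assumes strictly positive entries."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    if alpha == 1.0:
        return float(np.sum(w * (np.log(w) - np.log(q))))
    return float(np.log(np.sum(w**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def conditional_renyi(W, q, p, alpha):
    """D_alpha(W||q|p): p-average of D_alpha(W(x)||q); row x of W is W(.|x)."""
    return sum(px * renyi_divergence(Wx, q, alpha) for px, Wx in zip(p, W))

def augustin_grid_search(W, p, alpha, grid=2001):
    """Approximate the infimum over q for a binary output alphabet by scanning
    q = (t, 1 - t) over a uniform grid; returns (value, minimizing q)."""
    candidates = [np.array([t, 1.0 - t]) for t in np.linspace(1e-6, 1 - 1e-6, grid)]
    values = [conditional_renyi(W, q, p, alpha) for q in candidates]
    best = int(np.argmin(values))
    return values[best], candidates[best]

W = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # hypothetical binary-input binary-output channel
p = np.array([0.3, 0.7])     # hypothetical input distribution
q_p = p @ W                  # output distribution induced by p

info_half, mean_half = augustin_grid_search(W, p, alpha=0.5)
info_one, mean_one = augustin_grid_search(W, p, alpha=1.0)
print("alpha=0.5:", info_half, mean_half)
print("alpha=1.0:", info_one, mean_one, "q_p =", q_p)
```

In this sketch the order one minimizer lands next to the output distribution `q_p`, while the order one half minimizer generally does not, which is the sense in which other orders lack a closed form.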

For channels whose output space is an arbitrary measurable space $(\mathsf{Y},\mathcal{Y})$, we no longer have the compactness of $\mathcal{P}(\mathcal{Y})$, and establishing the existence of the Augustin mean becomes a more delicate issue. The existence has been established for arbitrary channels in [5, 15] for the case when $p$ is a probability mass function with a finite support set. In addition, closed form expressions for the Augustin mean have been derived for certain special cases: for Gaussian input distributions on scalar or vector Gaussian channels in [15] and for the Augustin capacity achieving input distribution on additive exponential noise channels with a mean constraint in [25]. But a general existence result for the Augustin mean has not been proved yet; see Remark 4 of §IV for a discussion regarding [25].

In this paper, we prove, under the finite Augustin information hypothesis, the existence of a unique Augustin mean, its invariance under the Augustin operator, and its equivalence to the measure defined in (31), which is absolutely continuous in the output distribution $q_p$ generated by the input distribution $p$. Our presentation is as follows. In §II, we introduce our model and notation and prove that the infimum defining the Augustin information in (1) can be taken over the probability measures that are absolutely continuous in $q_p$, rather than the whole $\mathcal{P}(\mathcal{Y})$. In §III, we first use the Radon–Nikodym theorem to express this optimization in $\mathcal{L}^{\tau}(q_p)$ for some $\tau$, with the help of a functional corresponding to the conditional Rényi divergence. Then we show that this functional inherits the convexity and the norm lower semicontinuity from the conditional Rényi divergence and use them together with the Banach–Saks property to establish the existence of a unique Augustin mean. In §IV, we propose a new family of operators related to the Augustin operator, establish a new monotonicity property for the conditional Rényi divergence (see Lemma 6), and use it to establish the invariance of the Augustin mean under the Augustin operator. In §V, we briefly discuss the novelty of our approach in comparison to the previous analysis methods, as we see it.

## II Preliminaries

For any measurable space $(\mathsf{Y},\mathcal{Y})$, we denote the set of all probability measures on it by $\mathcal{P}(\mathcal{Y})$. With a slight abuse of notation, we denote the set of all probability measures that are absolutely continuous with respect to a finite measure $\mu$ by $\mathcal{P}(\mu)$. For finite measures, we use $\mathcal{M}^{+}$ instead of $\mathcal{P}$. We use $\|\cdot\|$ for the total variation norm and the corresponding metric.

###### Definition 1.

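For any $\alpha\in(0,\infty]$, $w\in\mathcal{P}(\mathcal{Y})$, and $q\in\mathcal{M}^{+}(\mathcal{Y})$, the order $\alpha$ Rényi divergence between $w$ and $q$ is

$$D_\alpha(w\|q) \triangleq \begin{cases} \dfrac{1}{\alpha-1}\ln\displaystyle\int\left(\dfrac{dw}{d\nu}\right)^{\alpha}\left(\dfrac{dq}{d\nu}\right)^{1-\alpha}\nu(dy) & \alpha\in\mathbb{R}_+\setminus\{1\} \\[2ex] \displaystyle\int\dfrac{dw}{d\nu}\left[\ln\dfrac{dw}{d\nu}-\ln\dfrac{dq}{d\nu}\right]\nu(dy) & \alpha=1 \\[2ex] \ln\operatorname*{ess\,sup}_{\nu}\dfrac{dw/d\nu}{dq/d\nu} & \alpha=\infty \end{cases}$$

where $\nu$ is any measure satisfying $w\prec\nu$ and $q\prec\nu$.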

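If $w,q\in\mathcal{P}(\mathcal{Y})$, then $D_\alpha(w\|q)$ is positive unless $w=q$ by [28, Thm. 8], and the following Pinsker’s inequality holds by [28, Thms. 3 and 31]:

$$D_\alpha(w\|q) \ge \frac{1\wedge\alpha}{2}\,\|w-q\|^{2}. \tag{3}$$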
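Pinsker’s inequality (3) is easy to probe numerically on a finite alphabet. The sketch below is only a sanity check under stated assumptions: the distributions are random draws with strictly positive entries, and the total variation norm is taken to be the sum of absolute differences $\sum_y |w(y)-q(y)|$, which is the convention under which the constant $\frac{1\wedge\alpha}{2}$ applies.

```python
import numpy as np

def renyi_divergence(w, q, alpha):
    """Order-alpha Renyi divergence in nats; assumes strictly positive entries."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    if alpha == 1.0:
        return float(np.sum(w * (np.log(w) - np.log(q))))
    return float(np.log(np.sum(w**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def pinsker_gap(w, q, alpha):
    """Left side of (3) minus its right side; non-negative if (3) holds."""
    tv = float(np.sum(np.abs(w - q)))          # total variation norm ||w - q||
    return renyi_divergence(w, q, alpha) - min(1.0, alpha) / 2.0 * tv**2

rng = np.random.default_rng(0)
gaps = []
for _ in range(1000):
    w = rng.dirichlet(np.ones(4))              # random distribution on 4 points
    q = rng.dirichlet(np.ones(4))
    for alpha in (0.3, 0.5, 1.0, 2.0):
        gaps.append(pinsker_gap(w, q, alpha))

min_gap = min(gaps)
print("smallest observed gap:", min_gap)
```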

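We denote the set of all transition probabilities from $(\mathsf{X},\mathcal{X})$ to $(\mathsf{Y},\mathcal{Y})$ by $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ and model the channel $W$ as a transition probability in $\mathcal{P}(\mathcal{Y}|\mathcal{X})$. (See [26, Definition 9], [29, Definition 10.7.1] for the formal definition.) Thus [29, Thm. 10.7.2] ensures the existence of a joint distribution $p\circledast W$ on $\mathcal{X}\otimes\mathcal{Y}$ for any input distribution $p$ in $\mathcal{P}(\mathcal{X})$. We call the $\mathcal{Y}$-marginal of $p\circledast W$ the output distribution induced by $p$ and denote it by $q_p$:

$$q_p(E) \triangleq p\circledast W(\mathsf{X}\times E) \qquad \forall E\in\mathcal{Y}. \tag{4}$$

Applying [29, Thm. 10.7.2] to the set $\mathsf{X}\times E$, we get

$$q_p(E) = \int_{\mathsf{X}} W(E|x)\,p(dx) \qquad \forall E\in\mathcal{Y}. \tag{5}$$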

With a slight abuse of notation, for a $W\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$ and $x\in\mathsf{X}$, we denote the probability measure $W(\cdot|x)$ by $W(x)$, whenever it is possible to do so without any ambiguity.

###### Definition 2.

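For any $\alpha\in(0,\infty]$, countably generated $\sigma$-algebra $\mathcal{Y}$ of subsets of $\mathsf{Y}$, $W\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$, $q\in\mathcal{M}^{+}(\mathcal{Y})$, and $p\in\mathcal{P}(\mathcal{X})$, the order $\alpha$ conditional Rényi divergence for the input distribution $p$ is

$$D_\alpha(W\|q|p) \triangleq \int D_\alpha(W(x)\|q)\,p(dx). \tag{6}$$

We assume $\mathcal{Y}$ to be countably generated, so as to ensure the $\mathcal{X}$-measurability of the integrand in (6) by [15, Lemma 37]. (Strictly speaking, [15, Lemma 37] establishes the $\mathcal{X}$-measurability under a more restrictive hypothesis, but a similar proof works in the general case.)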

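For the $\alpha=1$ case, one can confirm by substitution that the conditional Rényi divergence can be expressed in terms of the joint distribution $p\circledast W$ induced by $p$ as follows

$$D_1(W\|q|p) = D_1(p\circledast W\,\|\,p\otimes q) \qquad \forall q\in\mathcal{P}(\mathcal{Y}), \tag{7}$$

where $p\otimes q$ is the product measure. Furthermore, (5) and (7) can be used to confirm by substitution that

$$D_1(W\|q|p) = D_1(W\|q_p|p) + D_1(q_p\|q) \qquad \forall q\in\mathcal{P}(\mathcal{Y}). \tag{8}$$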
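The order one identity $D_1(W\|q|p)=D_1(W\|q_p|p)+D_1(q_p\|q)$ can be checked directly on a finite example. The sketch below assumes randomly drawn, strictly positive `W`, `p`, and `q`; these are illustrative inputs, not objects from the paper.

```python
import numpy as np

def kl(w, q):
    """Order-one Renyi (Kullback-Leibler) divergence in nats, positive entries."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    return float(np.sum(w * (np.log(w) - np.log(q))))

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(3), size=4)   # 4 inputs, 3 outputs; row x is W(.|x)
p = rng.dirichlet(np.ones(4))           # input distribution
q = rng.dirichlet(np.ones(3))           # arbitrary candidate output distribution
q_p = p @ W                             # output distribution induced by p

lhs = sum(px * kl(Wx, q) for px, Wx in zip(p, W))
rhs = sum(px * kl(Wx, q_p) for px, Wx in zip(p, W)) + kl(q_p, q)
print(lhs, rhs)
```

Since $D_1(q_p\|q)\ge 0$ with equality only at $q=q_p$, the decomposition shows why the order one infimum is attained at the output distribution.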
###### Definition 3.

For any $\alpha\in(0,\infty]$, countably generated $\sigma$-algebra $\mathcal{Y}$, $W\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$, and $p\in\mathcal{P}(\mathcal{X})$, the order $\alpha$ Augustin information for the input distribution $p$ is given by (1).

For the $\alpha=1$ case, (8) provides us a closed form expression of the Augustin information by (3): $I_1(p;W)=D_1(W\|q_p|p)$. For other orders, however, a general closed form expression does not exist either for the Augustin information or for the probability measure that achieves the infimum given in (1), called the Augustin mean. Nevertheless, $q_p$ can be used to restrict the domain of the optimization problem defining the Augustin information as follows.

###### Lemma 1.

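For any $\alpha\in(0,\infty]$, countably generated $\sigma$-algebra $\mathcal{Y}$, $W\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$, and $p\in\mathcal{P}(\mathcal{X})$,

$$I_\alpha(p;W) = \inf_{q\in\mathcal{P}(q_p)} D_\alpha(W\|q|p). \tag{9}$$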
###### Proof.

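Any $q\in\mathcal{P}(\mathcal{Y})$ can be written as the sum of absolutely continuous and singular components with respect to $q_p$ by the Lebesgue decomposition theorem [29, Thm. 3.2.3], i.e. there exist $q_\sim$ and $q_\perp$ such that $q=q_\sim+q_\perp$, $q_\sim\prec q_p$, and $q_\perp\perp q_p$. Hence, there exists an $E\in\mathcal{Y}$ satisfying $q_p(E)=1$ and $q_\perp(E)=0$ because $q_\perp\perp q_p$. Then $W(E|x)=1$ $p$-a.s. by (5) and consequently

$$D_\alpha(W(x)\|q) = D_\alpha(W(x)\|q_\sim) \qquad p\text{-a.s.}$$

Thus $D_\alpha(W\|q|p)=\infty$ for all $q$ satisfying $\|q_\sim\|=0$ and

$$D_\alpha(W\|q|p) = D_\alpha\!\left(W\,\Big\|\,\tfrac{q_\sim}{\|q_\sim\|}\,\Big|\,p\right) - \ln\|q_\sim\| \tag{10}$$

for all $q$ satisfying $\|q_\sim\|>0$. Then we can replace $\mathcal{P}(\mathcal{Y})$ with $\mathcal{P}(q_p)$ in (1), without changing the value of the infimum because $\tfrac{q_\sim}{\|q_\sim\|}\in\mathcal{P}(q_p)$ and $\|q_\sim\|\le 1$. ∎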
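The normalization step behind (10) rests on how the Rényi divergence reacts to rescaling its second argument. As a worked check, assuming the standard integral form of the divergence for a finite order $\alpha\neq 1$, a constant $c>0$, and a reference measure $\nu$ dominating both arguments:

```latex
\begin{align*}
D_\alpha(w\|c\,q)
  &= \frac{1}{\alpha-1}\ln\int\Big(\frac{dw}{d\nu}\Big)^{\alpha}
     \Big(c\,\frac{dq}{d\nu}\Big)^{1-\alpha}\nu(dy) \\
  &= \frac{1}{\alpha-1}\ln c^{\,1-\alpha}
     + \frac{1}{\alpha-1}\ln\int\Big(\frac{dw}{d\nu}\Big)^{\alpha}
     \Big(\frac{dq}{d\nu}\Big)^{1-\alpha}\nu(dy) \\
  &= D_\alpha(w\|q) - \ln c .
\end{align*}
```

Applying this with $c$ equal to the total mass of the absolutely continuous component, and averaging over the input distribution, produces the logarithmic correction term in (10); since that mass is at most one, normalizing can only decrease the conditional divergence.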

## III Existence of a Unique Augustin Mean

The uniform convexity of $\mathcal{L}^{\tau}(q_p)$ for $\tau\in(1,\infty)$ plays a central role in our proof of the existence of a unique Augustin mean for input distributions with finite Augustin information. (Usually, $p$ rather than $\tau$ is used to name the norm and the associated Banach space. We deviate from the convention to reserve the symbol $p$ for the input distributions.) Let us first recall the definition of the $\tau$-norm. For any $\tau\ge 1$ and $\mathcal{Y}$-measurable function $f$, the $\tau$-norm of $f$ is

$$\|f\|_{\tau} \triangleq \left(\int |f(y)|^{\tau}\,q_p(dy)\right)^{1/\tau}. \tag{11}$$

The set of all functions with a finite $\tau$-norm forms a complete normed vector space, i.e. a Banach space, under pointwise addition and scalar multiplication by [29, Thm. 4.1.3]:

$$\mathcal{L}^{\tau}(q_p) \triangleq \{f:\|f\|_{\tau}<\infty\}. \tag{12}$$

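As a result of the Radon–Nikodym theorem [29, Thm. 3.2.2], we know that elements of $\mathcal{P}(q_p)$ can be represented via their Radon–Nikodym derivatives with respect to $q_p$, which will be non-negative functions of unit norm in $\mathcal{L}^{1}(q_p)$. By taking the pointwise $\tau$-th root of these Radon–Nikodym derivatives, we can obtain analogous representations in $\mathcal{L}^{\tau}(q_p)$ for any positive $\tau$. Motivated by these observations, we define the following subsets of $\mathcal{L}^{\tau}(q_p)$:

$$\mathcal{B}^{\tau}(q_p) \triangleq \{f\in\mathcal{L}^{\tau}(q_p): f(y)\ge 0 \;\; q_p\text{-a.s.}\}, \tag{13}$$

$$\mathcal{B}^{\tau}_{1}(q_p) \triangleq \{f\in\mathcal{B}^{\tau}(q_p): \|f\|_{\tau}=1\}, \tag{14}$$

$$\mathcal{B}^{\tau}_{\le 1}(q_p) \triangleq \{f\in\mathcal{B}^{\tau}(q_p): \|f\|_{\tau}\le 1\}. \tag{15}$$

Let $\omega_\tau:\mathcal{B}^{\tau}(q_p)\to\mathcal{M}^{+}(q_p)$ be the function defined through the following relation

$$\omega_\tau(f)(E) \triangleq \int_E [f(y)]^{\tau}\,q_p(dy) \qquad \forall f\in\mathcal{B}^{\tau}(q_p),\; E\in\mathcal{Y}. \tag{16}$$

Using the conditional Rényi divergence and $\omega_1$, we can define the functional $D_\alpha(W\|\omega_1(\cdot)|p)$ on $\mathcal{B}^{1}(q_p)$, which inherits the convexity and norm lower semicontinuity from the Rényi divergence by the linearity and continuity of $\omega_1$. Lemmas 2 and 3 demonstrate that for an appropriately chosen $\tau$, the functional $D_\alpha(W\|\omega_\tau(\cdot)|p)$ on $\mathcal{B}^{\tau}(q_p)$ inherits the convexity and norm lower semicontinuity as well. These observations are important because, unlike $\mathcal{L}^{1}(q_p)$, $\mathcal{L}^{\tau}(q_p)$ is uniformly convex for any $\tau\in(1,\infty)$, and thus it has the Banach–Saks property.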
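On a finite output set, the representation of a probability measure by the pointwise $\tau$-th root of its density can be written out concretely. The sketch below is illustrative only: `q_p` and `q` are hypothetical strictly positive distributions on three points, and `tau_norm` and `omega` are helper names introduced here for the discrete analogues of (11) and (16).

```python
import numpy as np

def tau_norm(f, q_p, tau):
    """Discrete analogue of the tau-norm with respect to the measure q_p."""
    return float(np.sum(np.abs(f)**tau * q_p) ** (1.0 / tau))

def omega(f, q_p, tau):
    """Discrete analogue of omega_tau: the measure with density f**tau w.r.t. q_p."""
    return f**tau * q_p

q_p = np.array([0.5, 0.3, 0.2])    # hypothetical reference output distribution
q = np.array([0.2, 0.2, 0.6])      # hypothetical probability measure to represent
tau = 2.0

f = (q / q_p) ** (1.0 / tau)       # tau-th root of the density of q w.r.t. q_p

print(tau_norm(f, q_p, tau))       # unit norm, up to rounding
print(omega(f, q_p, tau))          # recovers q
print(omega(3.0 * f, q_p, tau))    # scaling f by c scales the measure by c**tau
```

The last line illustrates the homogeneity $\omega_\tau(cf)=c^{\tau}\omega_\tau(f)$, which is what ties the norm of $f$ to the total mass of the measure it represents.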

###### Definition 4.

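Let the functional $\mathcal{D}_\alpha$ be

$$\mathcal{D}_\alpha(f) \triangleq D_\alpha(W\|\omega_{\tau_\alpha}(f)|p) \tag{17}$$

for all $f\in\mathcal{B}^{\tau_\alpha}(q_p)$ and $\alpha\in(0,\infty]$, where

$$\tau_\alpha \triangleq \begin{cases} 2 & \alpha\in[0.5,\infty] \\ \dfrac{1}{1-\alpha} & \alpha\in(0,0.5) \end{cases}. \tag{18}$$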
###### Lemma 2.

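For all $\alpha\in(0,\infty]$, the functional $\mathcal{D}_\alpha$, defined in (17), is convex on $\mathcal{B}^{\tau_\alpha}(q_p)$.

###### Lemma 3.

For all $\alpha\in(0,\infty]$, the functional $\mathcal{D}_\alpha$, defined in (17), is norm lower semicontinuous on $\mathcal{B}^{\tau_\alpha}(q_p)$.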

Proofs of Lemmas 2 and 3 are presented in Appendix -A and Appendix -B.
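The convexity asserted in Lemma 2 can be spot-checked on a finite example. The sketch below is a numerical illustration rather than a proof: it assumes a randomly drawn channel and input distribution with strictly positive entries, restricts attention to finite orders, and samples random pairs of strictly positive functions `f` and `g`; `functional` is a helper name for the discrete analogue of (17).

```python
import numpy as np

def tau_alpha(alpha):
    """Exponent from (18): 2 for alpha >= 0.5 and 1/(1-alpha) otherwise."""
    return 2.0 if alpha >= 0.5 else 1.0 / (1.0 - alpha)

def renyi_divergence(w, q, alpha):
    """Order-alpha Renyi divergence in nats; q may be any positive finite measure."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    if alpha == 1.0:
        return float(np.sum(w * (np.log(w) - np.log(q))))
    return float(np.log(np.sum(w**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def functional(f, W, p, q_p, alpha):
    """Discrete analogue of (17): conditional divergence with second argument
    equal to the measure with density f**tau_alpha with respect to q_p."""
    q = f ** tau_alpha(alpha) * q_p
    return sum(px * renyi_divergence(Wx, q, alpha) for px, Wx in zip(p, W))

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(3), size=4)   # hypothetical channel, row x is W(.|x)
p = rng.dirichlet(np.ones(4))           # hypothetical input distribution
q_p = p @ W

worst = -np.inf                         # largest observed convexity violation
for alpha in (0.25, 0.5, 0.8, 1.0, 3.0):
    for _ in range(200):
        f = rng.uniform(0.1, 2.0, size=3)
        g = rng.uniform(0.1, 2.0, size=3)
        lam = rng.uniform()
        mixed = functional(lam * f + (1 - lam) * g, W, p, q_p, alpha)
        chord = lam * functional(f, W, p, q_p, alpha) \
            + (1 - lam) * functional(g, W, p, q_p, alpha)
        worst = max(worst, mixed - chord)

print("largest value of mixed - chord:", worst)
```

A non-positive `worst` (up to rounding) is consistent with convexity along every sampled segment.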

###### Lemma 4.

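For all $\alpha\in(0,\infty]$, there exists an $f_\alpha\in\mathcal{B}^{\tau_\alpha}(q_p)$ satisfying $\|f_\alpha\|_{\tau_\alpha}=1$ and

$$\mathcal{D}_\alpha(f_\alpha) = I_\alpha(p;W). \tag{19}$$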
###### Proof.

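Note that $\omega_\tau\!\left(\tfrac{f}{\|f\|_\tau}\right)=\tfrac{\omega_\tau(f)}{\|f\|_\tau^{\tau}}$ for all $f\in\mathcal{B}^{\tau}(q_p)$ with $\|f\|_\tau>0$ and $\tau$ by (16). Thus

$$\mathcal{D}_\alpha\!\left(\tfrac{f}{\|f\|_{\tau_\alpha}}\right) = \mathcal{D}_\alpha(f) + \ln\|f\|_{\tau_\alpha}^{\tau_\alpha} \tag{20}$$

for all $f\in\mathcal{B}^{\tau_\alpha}(q_p)$ with $\|f\|_{\tau_\alpha}>0$ by (17). Consequently,

$$\inf_{f\in\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)}\mathcal{D}_\alpha(f) = \inf_{f\in\mathcal{B}^{\tau_\alpha}_{1}(q_p)}\mathcal{D}_\alpha(f).$$

Hence the definition of $\omega_{\tau_\alpha}$, the Radon–Nikodym theorem [29, Thm 3.2.2], and Lemma 1 imply

$$\inf_{f\in\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)}\mathcal{D}_\alpha(f) = I_\alpha(p;W). \tag{21}$$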

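Thus there exists a sequence $\{f_n\}_{n\in\mathbb{Z}_+}\subset\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)$ satisfying

$$\mathcal{D}_\alpha(f_n) \downarrow I_\alpha(p;W), \tag{22}$$

$$\sum_{n\in\mathbb{Z}_+}\left[\mathcal{D}_\alpha(f_n)-I_\alpha(p;W)\right] < \infty. \tag{23}$$

(Such a sequence can be obtained, for example, by choosing each $f_n$ so that $\mathcal{D}_\alpha(f_n)$ exceeds the infimum in (21) by a summable amount.) $\mathcal{L}^{\tau_\alpha}(q_p)$ has the Banach–Saks property for $\alpha\in(0,\infty]$ by [29, Cor. 4.7.17], because it is uniformly convex by [29, Thm. 4.7.15]. Thus for the norm bounded sequence $\{f_n\}$, there exist a subsequence $\{f_{n_k}\}$ and an $f_\alpha\in\mathcal{L}^{\tau_\alpha}(q_p)$ such that

$$\lim_{k\to\infty}\left\|\frac{f_{n_1}+\cdots+f_{n_k}}{k}-f_\alpha\right\|_{\tau_\alpha} = 0. \tag{24}$$

Furthermore, $f_\alpha\in\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)$ because $\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)$ is closed and $\tfrac{f_{n_1}+\cdots+f_{n_k}}{k}\in\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)$ for all $k$ by the non-negativity of the $f_n$’s and the triangle inequality of $\|\cdot\|_{\tau_\alpha}$.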

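The norm lower semicontinuity of $\mathcal{D}_\alpha$ established in Lemma 3, $f_\alpha\in\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)$, and (24) imply

$$\mathcal{D}_\alpha(f_\alpha) \le \liminf_{k\to\infty}\mathcal{D}_\alpha\!\left(\frac{f_{n_1}+\cdots+f_{n_k}}{k}\right). \tag{25}$$

On the other hand, the convexity of $\mathcal{D}_\alpha$ established in Lemma 2 implies

$$\mathcal{D}_\alpha\!\left(\frac{f_{n_1}+\cdots+f_{n_k}}{k}\right) \le \frac{\mathcal{D}_\alpha(f_{n_1})+\cdots+\mathcal{D}_\alpha(f_{n_k})}{k}. \tag{26}$$

Thus $\mathcal{D}_\alpha(f_\alpha)\le I_\alpha(p;W)$ by (22), (23), (25) and (26). Hence, (19) follows from (21) and the fact that $f_\alpha\in\mathcal{B}^{\tau_\alpha}_{\le 1}(q_p)$. Furthermore, $\|f_\alpha\|_{\tau_\alpha}=1$ as a result of (19), (20), and (21). ∎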
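The way (22), (23), (25), and (26) combine can be spelled out as a single chain. The following is a sketch of that step, writing $\mathcal{D}_\alpha$ for the functional in (17) and assuming that $I_\alpha(p;W)$ is finite so that the differences below are well defined:

```latex
\begin{align*}
\mathcal{D}_\alpha(f_\alpha)
 &\le \liminf_{k\to\infty}
      \mathcal{D}_\alpha\Big(\tfrac{f_{n_1}+\cdots+f_{n_k}}{k}\Big)
      && \text{by (25)} \\
 &\le \liminf_{k\to\infty}\frac{1}{k}\sum_{j=1}^{k}\mathcal{D}_\alpha(f_{n_j})
      && \text{by (26)} \\
 &= I_\alpha(p;W) + \liminf_{k\to\infty}\frac{1}{k}\sum_{j=1}^{k}
      \big[\mathcal{D}_\alpha(f_{n_j})-I_\alpha(p;W)\big] \\
 &\le I_\alpha(p;W) + \liminf_{k\to\infty}\frac{1}{k}\sum_{n\in\mathbb{Z}_+}
      \big[\mathcal{D}_\alpha(f_{n})-I_\alpha(p;W)\big]
      && \text{by (22)} \\
 &= I_\alpha(p;W)
      && \text{by (23)}.
\end{align*}
```

The fourth line uses that every summand is non-negative by (22), so the partial sum over the subsequence is dominated by the full series, which is finite by (23) and therefore vanishes after division by $k$.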

For finite orders, Lemma 5 expresses Lemma 4 in terms of probability measures and strengthens it with a uniqueness assertion for the finite Augustin information case.

###### Lemma 5.

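For any $\alpha\in\mathbb{R}_+$, channel $W\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$ with a countably generated output $\sigma$-algebra $\mathcal{Y}$, and input distribution $p\in\mathcal{P}(\mathcal{X})$ with a finite order $\alpha$ Augustin information, there exists a unique $q_{\alpha,p}\in\mathcal{P}(\mathcal{Y})$ satisfying

$$I_\alpha(p;W) = D_\alpha(W\|q_{\alpha,p}|p), \tag{27}$$

called the order $\alpha$ Augustin mean for the input distribution $p$. Furthermore, $q_{\alpha,p}$ is absolutely continuous in $q_p$, i.e. $q_{\alpha,p}\prec q_p$.

The proof of Lemma 5 is presented in Appendix -C.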

## IV Fixed Point Properties of Augustin Mean

The existence of a unique Augustin mean and its absolute continuity in $q_p$ are important observations. But they do not provide an easy way to decide whether $q_{\alpha,p}=q$ for a $q\in\mathcal{P}(\mathcal{Y})$ or not. For input distributions that are probability mass functions with a finite support set, this issue was addressed by characterizing $q_{\alpha,p}$ as the only fixed point of the Augustin operator that is equivalent to $q_p$; see [5, Lemma 34.2], [15, Lemma 13]. (This is the case even for certain quantum models [30, Proposition 4].) Our main goal in this section is to establish an analogous characterization of the Augustin mean for a general input distribution merely by assuming that $I_\alpha(p;W)$ is finite; see Lemma 7. Let $\mathcal{Q}_{\alpha,p}$, $\mathsf{X}^{q}_{\alpha,p}$, and $\mathcal{X}^{q}_{\alpha,p}$ be

$$\mathcal{Q}_{\alpha,p} \triangleq \{q\in\mathcal{P}(\mathcal{Y}): D_\alpha(W\|q|p)<\infty\},$$

$$\mathsf{X}^{q}_{\alpha,p} \triangleq \{x: D_\alpha(W(x)\|q)<\infty\},$$

$$\mathcal{X}^{q}_{\alpha,p} \triangleq \{E\cap\mathsf{X}^{q}_{\alpha,p}: E\in\mathcal{X}\}.$$
###### Definition 5.

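For any $\alpha\in\mathbb{R}_+$, countably generated $\sigma$-algebra $\mathcal{Y}$ of subsets of $\mathsf{Y}$, $W\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$, $p\in\mathcal{P}(\mathcal{X})$, and $q\in\mathcal{Q}_{\alpha,p}$, let $W^{q}_{\alpha}$ be

$$\frac{dW^{q}_{\alpha}(x)}{d\nu} \triangleq e^{(1-\alpha)D_\alpha(W(x)\|q)}\left(\frac{dW(x)}{d\nu}\right)^{\alpha}\left(\frac{dq}{d\nu}\right)^{1-\alpha}. \tag{28}$$

Then $W^{q}_{\alpha}$ defines a transition probability in $\mathcal{P}(\mathcal{Y}|\mathcal{X}^{q}_{\alpha,p})$, called the order $\alpha$ tilted channel.

###### Remark 1.

If $q\in\mathcal{Q}_{\alpha,p}$, then $p(\mathsf{X}^{q}_{\alpha,p})=1$. Hence, for input distributions that are absolutely continuous in $p$, the fact that $W^{q}_{\alpha}$ is an element of $\mathcal{P}(\mathcal{Y}|\mathcal{X}^{q}_{\alpha,p})$ rather than $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ is inconsequential.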

###### Definition 6.

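Under the hypothesis of Lemma 5, the Augustin operator $T_{\alpha,p}$ is defined as

$$T_{\alpha,p}(q)(E) \triangleq \mathbf{E}_p\!\left[W^{q}_{\alpha}(E|X)\right] \qquad \forall E\in\mathcal{Y},\; q\in\mathcal{Q}_{\alpha,p}. \tag{29}$$

Furthermore, for any $\beta\in\mathbb{R}_+$ satisfying $D_\beta(T_{\alpha,p}(q)\|q)<\infty$, the tilted Augustin operator $T^{\beta}_{\alpha,p}$ is defined as

$$\frac{dT^{\beta}_{\alpha,p}(q)}{d\nu} \triangleq e^{(1-\beta)D_\beta(T_{\alpha,p}(q)\|q)}\left(\frac{dT_{\alpha,p}(q)}{d\nu}\right)^{\beta}\left(\frac{dq}{d\nu}\right)^{1-\beta}. \tag{30}$$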

The Augustin operator has been used before either implicitly [31, 7, 16] or explicitly [5, 15, 30, 25]. However, to the best of our knowledge, the tilted Augustin operator is first defined and analyzed in the present work.
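For a finite channel the Augustin operator reduces to a few array operations, and repeatedly applying it gives a simple way to search for its fixed point. The sketch below is an illustration under stated assumptions, not the paper's construction: the channel and input distribution are random with strictly positive entries, the order is fixed at a hypothetical value below one, and the iteration is started from the output distribution induced by `p`.

```python
import numpy as np

def renyi_divergence(w, q, alpha):
    """Order-alpha Renyi divergence in nats; assumes strictly positive entries."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    if alpha == 1.0:
        return float(np.sum(w * (np.log(w) - np.log(q))))
    return float(np.log(np.sum(w**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def conditional_renyi(W, q, p, alpha):
    """D_alpha(W||q|p): p-average of D_alpha(W(x)||q); row x of W is W(.|x)."""
    return sum(px * renyi_divergence(Wx, q, alpha) for px, Wx in zip(p, W))

def augustin_operator(W, q, p, alpha):
    """Finite-alphabet Augustin operator: the p-average of the tilted channel,
    whose row x is proportional to W(.|x)**alpha * q**(1 - alpha)."""
    tilted = W**alpha * q**(1.0 - alpha)
    tilted /= tilted.sum(axis=1, keepdims=True)   # normalize each row
    return p @ tilted

rng = np.random.default_rng(1)
W = rng.dirichlet(np.ones(3), size=4)   # hypothetical channel: 4 inputs, 3 outputs
p = rng.dirichlet(np.ones(4))           # hypothetical input distribution
alpha = 0.6                             # hypothetical order in (0, 1)

q = p @ W                               # start from the induced output distribution
values = [conditional_renyi(W, q, p, alpha)]
for _ in range(200):
    q = augustin_operator(W, q, p, alpha)
    values.append(conditional_renyi(W, q, p, alpha))

residual = float(np.sum(np.abs(augustin_operator(W, q, p, alpha) - q)))
print("divergence at start and end:", values[0], values[-1])
print("fixed point residual:", residual)
```

In this example the recorded divergences never increase and the residual shrinks toward zero, which matches the role of the Augustin mean as a fixed point of the operator.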

###### Lemma 6.

Under the hypothesis of Lemma 5, if either and , or and , then for any we have

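$$\begin{aligned} D_\alpha(W\|q|p) - D_\alpha\!\left(W\,\big\|\,T^{\beta}_{\alpha,p}(q)\,\big|\,p\right) &\ge \beta\,D_{1-\beta|\alpha-1|^{+}}\!\left(T_{\alpha,p}(q)\,\|\,q\right) + (1-\beta)\,D_{\beta}\!\left(T_{\alpha,p}(q)\,\|\,q\right) \\ &\ge \frac{\beta\,(2-\beta(\alpha\vee 1))}{2}\,\left\|T_{\alpha,p}(q)-q\right\|^{2}. \end{aligned}$$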

A particular case of Lemma 6 for $\alpha\in(0,1)$ and $\beta=1$ was proved in [5, p. 236] and [15, (B.4)], and was used to show that the Augustin mean is a fixed point of the Augustin operator in [5, Lemma 34.2] and [15, Lemma 13 (c)] for $\alpha\in(0,1)$. (Although we will not rely on it, it is worth mentioning that $T^{\beta}_{\alpha,p}(q)=q$ holds either for all positive real $\beta$’s or for none.) Lemma 6 allows us to invoke this simpler argument for establishing the fixed point property for the $\alpha\in(1,\infty)$ case.
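The monotonicity in Lemma 6 can be probed numerically on a finite channel. The sketch below is deliberately restricted to orders $\alpha\in(0,1)$ and $\beta\in(0,1]$, a range chosen here for illustration in which the middle bound reads $\beta D_1 + (1-\beta)D_\beta$ and the final bound reads $\frac{\beta(2-\beta)}{2}\|T_{\alpha,p}(q)-q\|^2$; the channel, input distribution, and candidate measures are random with strictly positive entries, and the total variation norm is taken as the sum of absolute differences.

```python
import numpy as np

def renyi_divergence(w, q, alpha):
    """Order-alpha Renyi divergence in nats; assumes strictly positive entries."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    if alpha == 1.0:
        return float(np.sum(w * (np.log(w) - np.log(q))))
    return float(np.log(np.sum(w**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def conditional_renyi(W, q, p, alpha):
    """D_alpha(W||q|p): p-average of D_alpha(W(x)||q); row x of W is W(.|x)."""
    return sum(px * renyi_divergence(Wx, q, alpha) for px, Wx in zip(p, W))

def augustin_operator(W, q, p, alpha):
    """p-average of the tilted channel with rows prop. to W**alpha * q**(1-alpha)."""
    tilted = W**alpha * q**(1.0 - alpha)
    tilted /= tilted.sum(axis=1, keepdims=True)
    return p @ tilted

def tilted_augustin_operator(W, q, p, alpha, beta):
    """Normalized geometric mixture T(q)**beta * q**(1-beta), as in (30)."""
    t = augustin_operator(W, q, p, alpha)
    u = t**beta * q**(1.0 - beta)
    return u / u.sum()

rng = np.random.default_rng(2)
W = rng.dirichlet(np.ones(3), size=4)   # hypothetical channel
p = rng.dirichlet(np.ones(4))           # hypothetical input distribution

slack_first, slack_second = [], []
for alpha in (0.3, 0.7):
    for beta in (0.25, 0.5, 1.0):
        for _ in range(100):
            q = rng.dirichlet(np.ones(3))
            t = augustin_operator(W, q, p, alpha)
            drop = conditional_renyi(W, q, p, alpha) - conditional_renyi(
                W, tilted_augustin_operator(W, q, p, alpha, beta), p, alpha)
            middle = beta * renyi_divergence(t, q, 1.0) \
                + (1.0 - beta) * renyi_divergence(t, q, beta)
            lower = beta * (2.0 - beta) / 2.0 * float(np.sum(np.abs(t - q))) ** 2
            slack_first.append(drop - middle)
            slack_second.append(middle - lower)

print(min(slack_first), min(slack_second))
```

Non-negative slacks in both lists are consistent with the two inequalities, and they show that the divergence strictly decreases whenever $T_{\alpha,p}(q)\neq q$.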

###### Proof.
 Dα(W∥q|p)−Dα(W∥Tβα,p(q)∣∣p) =11−αEp[ln∫(dTβα,p(q)dq)1−αWqα(dy|X)] (a)≥⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩11−αEp[∫ln(dTβα,p(q)dq)1−αWqα(dy|X)]if~{}α<111−αlnEp[ ∫(dTβα,p(q)dq)1−αWqα(dy|X)]if~{}α>1 (b)=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩∫dTα,p(q)dqln(dTβα,p(q)dq)q(dy)if~{}α<111−α