# On the Conditional Smooth Renyi Entropy and its Applications in Guessing and Source Coding

A novel definition of the conditional smooth Rényi entropy, different from that of Renner and Wolf, is introduced. It is shown that our definition of the conditional smooth Rényi entropy is appropriate for giving lower and upper bounds on the optimal guessing moment in a guessing problem where the guesser is allowed to stop guessing and declare an error. Further, a general formula for the optimal guessing exponent is given. In particular, a single-letterized formula for mixtures of i.i.d. sources is obtained. Another application, in the problem of source coding with common side-information available at the encoder and decoder, is also demonstrated.


## I Introduction

Let us consider the problem of guessing the value of a random variable $X$ by asking questions of the form "Is $X$ equal to $x$?". This guessing game was introduced by Massey [1], where the average number of guesses required to identify $X$ was investigated. Subsequently, Arikan [2] gave a tight bound on the guessing moment for $\rho > 0$. He also investigated the problem of guessing $X$ with the side-information $Y$. The result of Arikan [2] shows that the Rényi entropy [3] (resp. the conditional Rényi entropy [4]) plays an important role in giving upper and lower bounds on the guessing moment without (resp. with) side-information.

In this paper, we consider a variation of the problem of guessing $X$ with the side-information $Y$ such that the guesser is allowed to stop guessing and declare an error. We evaluate the expected value of the cost of guessing under a constraint on the probability of error. To do this, we introduce the conditional smooth Rényi entropy.

The concept of the "smoothed" version of the Rényi entropy was introduced by Renner and Wolf [5, 6]. They defined the conditional $\varepsilon$-smooth Rényi entropy of order $\alpha$, and showed the significance of two special cases in coding problems. Roughly speaking, they showed that (i) one special case characterizes the minimum codeword length in the source coding problem of $X$ with the side-information $Y$ available at the decoder, under the constraint that the probability of decoding error is at most $\varepsilon$, and (ii) the other characterizes the amount of uniform randomness that can be extracted from $X$.

Seeing the results of Arikan [2] and Renner and Wolf [5, 6], it is natural to expect that the smooth Rényi entropy of [5, 6] can be used to characterize the cost of guessing allowing error. However, that definition is not appropriate for the analysis of the guessing problem. In this paper, we introduce another "smoothed" version of the conditional Rényi entropy, denoted $H^\varepsilon_\alpha(X|Y)$. Then, by using $H^\varepsilon_\alpha(X|Y)$, we give lower and upper bounds on the minimum cost of guessing the value of $X$ with the side-information $Y$ under the constraint that the guessing error probability is at most $\varepsilon$. Further, we demonstrate another application of $H^\varepsilon_\alpha(X|Y)$ in the source coding problem. Our contributions are summarized as follows.

### I-A Contributions

First we introduce a novel definition of the conditional $\varepsilon$-smooth Rényi entropy of order $\alpha$, and then investigate its properties. Our definition, like that of Renner and Wolf, involves a minimization over a set of non-negative functions satisfying a particular condition. Our first contribution, Theorem 1, characterizes the non-negative function attaining the minimum in the definition of $H^\varepsilon_\alpha(X|Y)$ for $\alpha \in (0,1)$. This characterization is useful in the proofs of our theorems on guessing mentioned below. Further, we investigate the asymptotic behavior of the conditional smooth Rényi entropy by using the information spectrum method [7]. In particular, in Theorem 2, we show that the asymptotic value of the conditional smooth Rényi entropy for a mixture of i.i.d. sources is determined by the conditional entropy of a component of the mixture. This result allows us to give single-letterized formulas in the guessing and source coding problems mentioned below.

Next we investigate the problem of "guessing allowing error", i.e., the problem of guessing $X$ with the side-information $Y$ where the guesser can stochastically choose, at each step of guessing, (i) to give up and declare an error or (ii) to continue guessing. The cost of guessing is evaluated in the same way as in Arikan [2]; the cost is $i^\rho$ for some $\rho > 0$ if the value is correctly guessed at the $i$-th step. We consider the minimization of the expected value of the guessing cost under the constraint that the error probability is at most $\varepsilon$. Our results, Theorems 3 and 4, give lower and upper bounds on the minimum cost by using $H^\varepsilon_\alpha(X|Y)$. Further, a general formula for the exponent of the optimal guessing cost is derived; see Theorem 5. (By a general formula, we mean that we consider sequences of guessing problems and do not place any underlying structure such as stationarity, memorylessness, or ergodicity on the source [7, 8].) In particular, a single-letterized formula is given for mixtures of i.i.d. sources.

The last contribution of this paper is to show the significance of our conditional smooth Rényi entropy in the problem of source coding. We consider the variable-length lossless coding problem of the source $X$ with the common side-information $Y$ available at the encoder and decoder. We allow the decoder to make a decoding error with probability at most $\varepsilon$. Then, we evaluate the exponential moment $M_\rho$ of the codeword length. In a similar manner as in the guessing problem, our results show that $H^\varepsilon_\alpha(X|Y)$ can be used to characterize the minimum value of $M_\rho$: Theorems 6 and 7 give lower and upper bounds on the minimum value of $M_\rho$ by using $H^\varepsilon_\alpha(X|Y)$, and then Theorem 8 gives a general formula for the exponent of the minimum value of $M_\rho$.

### I-B Related Work

As mentioned above, the concept of smooth Rényi entropy was first introduced by Renner and Wolf [5, 6]. Properties of the smooth Rényi entropy were investigated by Koga [9] using majorization theory. As shown in Corollary 1, one of the results in [9] can be obtained as a corollary of our Theorem 1.

It is known that two special cases of the smooth Rényi entropies have clear operational meanings, respectively in fixed-length source coding [5, 6, 10] and in the intrinsic randomness problem [5, 6, 11]. Similarly, the smooth Rényi divergence also finds applications in several coding problems; see, e.g., [12, 13, 14]. To the author's best knowledge, this paper is the first to give a clear operational meaning to the conditional smooth Rényi entropy of order $\alpha \in (0,1)$ in guessing and source coding.

As mentioned above, Arikan [2] showed the significance of the Rényi entropy in the problem of guessing. Recently, tighter bounds on guessing moments were given by Sason and Verdú [15], where the Rényi entropy is also used. The guessing problem has been studied in various contexts, such as guessing subject to distortion [16, 17] and large-deviation perspectives on guessing [18, 19]; see, e.g., [15] and references therein.

Campbell [20] proposed the exponential moment of the codeword length as an alternative to the average codeword length as a criterion for variable-length lossless source coding, and gave upper and lower bounds on the exponential moment in terms of the Rényi entropy. (It should be mentioned here that a general problem of optimizing the exponential moment of a given cost function was investigated by Merhav [21, 22].) On the other hand, the problem of variable-length source coding allowing errors was investigated under the criterion of the average codeword length by Koga and Yamamoto [23] and Kostina et al. [24, 25]. In [26], the author gave a generalization of Campbell's result to the case where decoding errors are allowed. Recently, a similar result without the prefix condition on codewords was given by Sason and Verdú [15]. Our results in Section IV can be seen as a generalization of [26], since the result of [26] is obtained as a special case of our results.

In this paper, the two problems of guessing and source coding are investigated. The relation between the limiting guessing exponent and the limiting exponent of the moment generating function of codeword lengths in source coding was pointed out by Arikan and Merhav [16]; see also [18]. As expected, Theorems 5 and 8 below reveal the equivalence between the optimal asymptotic exponent of the guessing cost and that of the exponential moment of codeword lengths in source coding.

### I-C Paper Organization

The rest of the paper is organized as follows. In Section II, the conditional $\varepsilon$-smooth Rényi entropy of order $\alpha$ is defined, and its properties are investigated. The problems of guessing and source coding are investigated in Sections III and IV, respectively. Concluding remarks and directions for future work are provided in Section V. To ensure that the main ideas are seamlessly communicated in the main text, we relegate all proofs to the appendices.

## II Conditional Smooth Rényi Entropy

Let $\mathcal{X}$ and $\mathcal{Y}$ be finite or countably infinite sets. For a distribution $P_{XY}$ on $\mathcal{X} \times \mathcal{Y}$, let $B^\varepsilon(P_{XY})$ be the set of non-negative functions $Q$ with domain $\mathcal{X} \times \mathcal{Y}$ such that $Q(x,y) \le P_{XY}(x,y)$ for all $(x,y) \in \mathcal{X} \times \mathcal{Y}$, and $\sum_{(x,y)} Q(x,y) \ge 1 - \varepsilon$. Then, for $\alpha \in (0,1)$ and $\varepsilon \in [0,1)$, the conditional $\varepsilon$-smooth Rényi entropy of order $\alpha$ is defined as (throughout this paper, $\log$ denotes the natural logarithm)

$$H^\varepsilon_\alpha(X|Y) \triangleq \frac{\alpha}{1-\alpha} \log r^\varepsilon_\alpha(X|Y) \tag{1}$$

where

$$r^\varepsilon_\alpha(X|Y) \triangleq \inf_{Q \in B^\varepsilon(P_{XY})} \sum_{y \in \mathcal{Y}} \Bigg[\sum_{x \in \mathcal{X}} [Q(x,y)]^\alpha\Bigg]^{1/\alpha}. \tag{2}$$

In the following, we assume that $\alpha \in (0,1)$. (As seen below, the conditional $\varepsilon$-smooth Rényi entropy of order $\alpha \in (0,1)$ plays an important role in guessing and source coding.) Hence, $H^\varepsilon_\alpha(X|Y)$ can be rewritten as

$$H^\varepsilon_\alpha(X|Y) = \inf_{Q \in B^\varepsilon(P_{XY})} \frac{\alpha}{1-\alpha} \log \sum_{y \in \mathcal{Y}} \Bigg[\sum_{x \in \mathcal{X}} [Q(x,y)]^\alpha\Bigg]^{1/\alpha} \tag{3}$$

$$= \inf_{Q \in B^\varepsilon(P_{XY})} \frac{\alpha}{1-\alpha} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg[\sum_{x \in \mathcal{X}} \bigg[\frac{Q(x,y)}{P_Y(y)}\bigg]^\alpha\Bigg]^{1/\alpha}. \tag{4}$$

In the case of $\varepsilon = 0$, $H^0_\alpha(X|Y)$ is equivalent to the conditional Rényi entropy of order $\alpha$ (introduced by Arimoto [4]):

$$H^0_\alpha(X|Y) = \frac{\alpha}{1-\alpha} \log \sum_{y \in \mathcal{Y}} \Bigg[\sum_{x \in \mathcal{X}} [P_{XY}(x,y)]^\alpha\Bigg]^{1/\alpha}. \tag{5}$$

In the case without side-information, the definition reduces to the $\varepsilon$-smooth Rényi entropy of order $\alpha$, which is defined as

$$H^\varepsilon_\alpha(X) \triangleq \inf_{Q \in B^\varepsilon(P_X)} \frac{\alpha}{1-\alpha} \log \Bigg[\sum_{x \in \mathcal{X}} [Q(x)]^\alpha\Bigg]^{1/\alpha} \tag{6}$$

$$= \inf_{Q \in B^\varepsilon(P_X)} \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} [Q(x)]^\alpha \tag{7}$$

where $B^\varepsilon(P_X)$ is the set of non-negative functions $Q$ with domain $\mathcal{X}$ such that $Q(x) \le P_X(x)$ for all $x \in \mathcal{X}$, and $\sum_{x \in \mathcal{X}} Q(x) \ge 1 - \varepsilon$.

Here, it should be emphasized that our definition (1) of $H^\varepsilon_\alpha(X|Y)$ is slightly different from that of Renner and Wolf [6]. In [6], the conditional smooth Rényi entropy is defined as

$$\tilde{H}^\varepsilon_\alpha(X|Y) \triangleq \frac{1}{1-\alpha} \log \tilde{r}^\varepsilon_\alpha(X|Y) \tag{8}$$

where

$$\tilde{r}^\varepsilon_\alpha(X|Y) \triangleq \inf_{Q \in B^\varepsilon(P_{XY})} \max_{y \in \mathcal{Y}: P_Y(y) > 0} \sum_{x \in \mathcal{X}} \bigg[\frac{Q(x,y)}{P_Y(y)}\bigg]^\alpha. \tag{9}$$

To see the difference, rewrite $\tilde{H}^\varepsilon_\alpha(X|Y)$ for $\alpha \in (0,1)$ as

$$\tilde{H}^\varepsilon_\alpha(X|Y) = \inf_{Q \in B^\varepsilon(P_{XY})} \frac{1}{1-\alpha} \log \max_{y \in \mathcal{Y}: P_Y(y) > 0} \sum_{x \in \mathcal{X}} \bigg[\frac{Q(x,y)}{P_Y(y)}\bigg]^\alpha \tag{10}$$

$$= \inf_{Q \in B^\varepsilon(P_{XY})} \frac{\alpha}{1-\alpha} \log \max_{y \in \mathcal{Y}: P_Y(y) > 0} \Bigg[\sum_{x \in \mathcal{X}} \bigg[\frac{Q(x,y)}{P_Y(y)}\bigg]^\alpha\Bigg]^{1/\alpha}. \tag{11}$$

Comparing (11) with (4), we can see that the average over $y$ is taken in (4), while the maximum over $y$ is taken in (11). Hence it is apparent that

$$H^\varepsilon_\alpha(X|Y) \le \tilde{H}^\varepsilon_\alpha(X|Y) \tag{12}$$

and equality does not hold in general.

###### Remark 1.

The author thinks that the availability of the side-information $Y$ makes the difference between $H^\varepsilon_\alpha(X|Y)$ and $\tilde{H}^\varepsilon_\alpha(X|Y)$. In the problem of guessing considered in Section III, the guesser can change the strategy according to the given value $y$. Similarly, in the problem of source coding with common side-information considered in Section IV, the encoder can choose the encoding function according to the given $y$. Hence, in these problems, the "average with respect to $Y$" is significant in the coding theorems. On the other hand, in the problems considered in [6], the encoder (or the extractor) cannot access $Y$ and has to prepare for the worst case. Hence, the "maximum with respect to $Y$" is significant in [6].

Now, we show several properties of $H^\varepsilon_\alpha(X|Y)$. First we investigate the function attaining the infimum in the definition of $H^\varepsilon_\alpha(X|Y)$. To do this, we introduce some notation. For each $y \in \mathcal{Y}$ and $i = 1, 2, \dots$, let $x^y_i$ be the $i$-th most probable symbol given $y$; i.e., the sequence $(x^y_i)_i$ is defined so that

$$P_{X|Y}(x^y_1|y) \ge P_{X|Y}(x^y_2|y) \ge P_{X|Y}(x^y_3|y) \ge \cdots. \tag{13}$$

Then, for each $y \in \mathcal{Y}$ and a given $\varepsilon_y \in [0,1)$, let $i^*_y$ be the minimum integer such that

$$\sum_{i=1}^{i^*_y} P_{X|Y}(x^y_i|y) \ge 1 - \varepsilon_y \tag{14}$$

and let

$$Q^*_{\varepsilon_y}(x^y_i|y) \triangleq \begin{cases} P_{X|Y}(x^y_i|y), & i = 1, 2, \dots, i^*_y - 1, \\ 1 - \varepsilon_y - \sum_{i=1}^{i^*_y - 1} P_{X|Y}(x^y_i|y), & i = i^*_y, \\ 0, & i > i^*_y. \end{cases} \tag{15}$$

For , let and for all .

###### Theorem 1.

By using notations introduced above, we have

$$H^\varepsilon_\alpha(X|Y) = \inf_{(\varepsilon_y)_{y \in \mathcal{Y}} \in E_0(\varepsilon)} \frac{\alpha}{1-\alpha} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg[\sum_{i=1}^{i^*_y} \big[Q^*_{\varepsilon_y}(x^y_i|y)\big]^\alpha\Bigg]^{1/\alpha} \tag{16}$$

where $E_0(\varepsilon)$ is the set of vectors $(\varepsilon_y)_{y \in \mathcal{Y}}$ satisfying $0 \le \varepsilon_y \le 1$ for all $y \in \mathcal{Y}$, and $\sum_{y \in \mathcal{Y}} P_Y(y)\,\varepsilon_y \le \varepsilon$.

Theorem 1 will be proved in Appendix A. As a corollary, we obtain a known property of $H^\varepsilon_\alpha(X)$, which is proved in part (A) of Theorem 1 of [9].

###### Corollary 1.

Assume that the elements of $\mathcal{X}$ are sorted so that $P_X(x_1) \ge P_X(x_2) \ge \cdots$, and let $i^*$ be the minimum integer such that $\sum_{i=1}^{i^*} P_X(x_i) \ge 1 - \varepsilon$. Then

$$Q^*_\varepsilon(x_i) \triangleq \begin{cases} P_X(x_i), & i = 1, 2, \dots, i^* - 1, \\ 1 - \varepsilon - \sum_{i=1}^{i^* - 1} P_X(x_i), & i = i^*, \\ 0, & i > i^* \end{cases} \tag{17}$$

attains the infimum in the definition of $H^\varepsilon_\alpha(X)$; i.e.,

$$H^\varepsilon_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{i=1}^{i^*} \big[Q^*_\varepsilon(x_i)\big]^\alpha. \tag{18}$$
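The construction in Corollary 1 is easy to compute directly. The following sketch (an illustration under our reading of (17)–(18), not code from the paper; the function name is our own) sorts the probabilities in decreasing order, keeps the largest ones until their total reaches $1-\varepsilon$, truncates the last kept probability, and evaluates (18) with the natural logarithm.

```python
import math

def smooth_renyi_entropy(probs, alpha, eps):
    """H_alpha^eps(X) via Corollary 1: keep the largest probabilities
    until their total reaches 1 - eps, truncating the last kept one.
    Natural logarithm, as in the paper. Illustrative sketch only."""
    assert 0 < alpha < 1 and 0 <= eps < 1
    p = sorted(probs, reverse=True)        # P_X(x_1) >= P_X(x_2) >= ...
    q, total = [], 0.0
    for pi in p:
        if total + pi >= 1.0 - eps:        # reached i = i*: keep the remainder
            q.append(1.0 - eps - total)
            break
        q.append(pi)
        total += pi
    return (1.0 / (1.0 - alpha)) * math.log(sum(qi ** alpha for qi in q))
```

With `eps = 0` this reduces to the ordinary Rényi entropy (7), and increasing `eps` can only decrease the value, since probability mass is removed from the tail.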

Next, we investigate the asymptotic behavior of the conditional $\varepsilon$-smooth Rényi entropy by using the information spectrum method [7]. Let us consider a pair of correlated general sources $(\mathbf{X}, \mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^\infty$, which is a sequence of pairs of correlated random variables on the $n$-th Cartesian products of $\mathcal{X}$ and $\mathcal{Y}$. The joint distribution of $(X^n, Y^n)$ is denoted by $P_{X^n Y^n}$, which is not required to satisfy the consistency condition. Given $\alpha \in (0,1)$ and $\varepsilon \in [0,1)$, let us define $H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y})$ as

$$H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y}) \triangleq \lim_{\delta \downarrow 0} \limsup_{n \to \infty} \frac{1}{n} H^{\varepsilon+\delta}_\alpha(X^n|Y^n). \tag{19}$$

As shown in the following sections, this quantity plays an important role in the general formulas for guessing and source coding.

Here it is worth noting that $H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y})$ is non-negative for all $\alpha \in (0,1)$ and $\varepsilon \in [0,1)$. Indeed, we can prove the stronger fact that

$$\liminf_{n \to \infty} \frac{1}{n} H^\varepsilon_\alpha(X^n|Y^n) \ge 0, \quad \alpha \in (0,1),\ \varepsilon \in [0,1). \tag{20}$$

We will prove (20) in Appendix B.

To give a single-letterized form of $H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y})$, we consider a mixture of i.i.d. sources. Let us consider distributions $P_{X_i Y_i}$ ($i = 1, \dots, m$) on $\mathcal{X} \times \mathcal{Y}$. A general source $(\mathbf{X}, \mathbf{Y})$ is said to be a mixture of these sources if there exist weights $\alpha_1, \dots, \alpha_m$ satisfying (i) $\alpha_i > 0$ ($i = 1, \dots, m$), (ii) $\sum_{i=1}^m \alpha_i = 1$, and (iii) for all $n$, all $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$,

$$P_{X^n Y^n}(x^n, y^n) = \sum_{i=1}^m \alpha_i P_{X^n_i Y^n_i}(x^n, y^n) \tag{21}$$

$$= \sum_{i=1}^m \alpha_i \prod_{t=1}^n P_{X_i Y_i}(x_t, y_t). \tag{22}$$

For later use, let $A_i \triangleq \alpha_1 + \cdots + \alpha_i$ ($i = 1, \dots, m$) and $A_0 \triangleq 0$. Further, to simplify the analysis, we assume that

$$H(X_1|Y_1) > H(X_2|Y_2) > \cdots > H(X_m|Y_m) \tag{23}$$

where $H(X_i|Y_i)$ is the conditional entropy determined by $P_{X_i Y_i}$:

$$H(X_i|Y_i) \triangleq \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{X_i Y_i}(x,y) \log \frac{1}{P_{X_i|Y_i}(x|y)}. \tag{24}$$

Then, $H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y})$ of the mixture is characterized as in the following theorem.

###### Theorem 2.

Let $(\mathbf{X}, \mathbf{Y})$ be a mixture of i.i.d. sources satisfying (23). Fix $\alpha \in (0,1)$, $\varepsilon \in [0,1)$, and let $i$ be determined so that $\sum_{j < i} \alpha_j \le \varepsilon < \sum_{j \le i} \alpha_j$. Then, we have

$$H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y}) = H(X_i|Y_i). \tag{25}$$

Theorem 2 will be proved in Appendix C.

###### Remark 2.

Although Theorem 2 assumes that the components are i.i.d., this assumption is not crucial. Indeed, the only property of i.i.d. sources used in the proof of the theorem is that the AEP [27] holds, i.e.,

$$\lim_{n \to \infty} \Pr\left\{\left|\frac{1}{n} \log \frac{1}{P_{X^n_i|Y^n_i}(X^n_i|Y^n_i)} - H(X_i|Y_i)\right| > \zeta\right\} = 0 \tag{26}$$

for all $i = 1, \dots, m$ and any $\zeta > 0$. Hence, it is straightforward to extend the theorem so that it applies to mixtures of stationary and ergodic sources.
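The conditional entropy (24) of a component is a straightforward computation from its joint pmf. The following sketch (an illustration; the dictionary-based representation of the joint pmf is our own assumption, not the paper's notation) evaluates it in nats.

```python
import math

def conditional_entropy(P_XY):
    """H(X|Y) in nats from a joint pmf {(x, y): p}, as in eq. (24)."""
    P_Y = {}
    for (x, y), p in P_XY.items():
        P_Y[y] = P_Y.get(y, 0.0) + p
    # log(1 / P_{X|Y}(x|y)) = log(P_Y(y) / P_{XY}(x, y))
    return sum(p * math.log(P_Y[y] / p) for (x, y), p in P_XY.items() if p > 0)
```

For a mixture satisfying (23), each component's $H(X_i|Y_i)$ can be computed this way and compared against the threshold condition in Theorem 2.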

## III Guessing

In this section, we assume that the alphabet $\mathcal{X}$ is finite with $|\mathcal{X}| = K$. (We need the finiteness of $\mathcal{X}$ only to make the proof of the converse theorem simple.)

A guessing strategy for $X$ given $Y$ is defined by a collection of pairs $(\sigma_y, \pi_y)$, for each $y \in \mathcal{Y}$, of (i) a permutation $\sigma_y$ on $\mathcal{X}$ and (ii) a vector $\pi_y = (\pi_y(1), \dots, \pi_y(K))$ satisfying $0 \le \pi_y(i) \le 1$ for all $i$. Given the side information $Y = y$, the "guesser" corresponding to the strategy guesses the value of $X$ in the following manner. At the $i$-th step ($i = 1, \dots, K$), the guesser determines whether to "give up" or not; the guesser gives up and stops guessing with probability $\pi_y(i)$, and an error of guessing is declared. If the guesser does not give up, then the value $x$ satisfying $\sigma_y(x) = i$ is chosen as the "guessed value". Guessing continues until the guesser gives up or until the value of $X$ is correctly guessed. It should be noted here that the guessing function studied in [2] corresponds to the guessing strategy such that $\pi_y(i) = 0$ for all $y$ and $i$.

In this paper, we evaluate the "cost" of guessing as follows. If guessing is stopped before the value of $X$ is correctly guessed, then a constant cost $c_e$ is incurred as a "penalty". Otherwise, the cost of guessing is $i^\rho$ when the value of $X$ is correctly guessed at the $i$-th step, where $\rho > 0$ is a constant. For each $y \in \mathcal{Y}$, let

$$\lambda_y(i) \triangleq \prod_{j=1}^i \big(1 - \pi_y(j)\big), \quad i = 1, 2, \dots, K. \tag{27}$$

Then we can see that, given $Y = y$, the conditional probability of the event "the value of $X$ is correctly guessed at the $i$-th step before giving up" is

$$\lambda_y(i)\, P_{X|Y}\big(\sigma_y^{-1}(i)\big|y\big) \tag{28}$$

and thus the conditional probability of the event "the guesser gives up before guessing the value of $X$ correctly" is

$$1 - \sum_{i=1}^K \lambda_y(i)\, P_{X|Y}\big(\sigma_y^{-1}(i)\big|y\big) = 1 - \sum_{x \in \mathcal{X}} \lambda_y(\sigma_y(x))\, P_{X|Y}(x|y). \tag{29}$$

Hence, the error probability $p_e$, i.e., the probability that guessing is stopped before the value of $X$ is correctly guessed, is given by

$$p_e = \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{1 - \sum_{i=1}^K \lambda_y(i)\, P_{X|Y}\big(\sigma_y^{-1}(i)\big|y\big)\Bigg\} \tag{30}$$

$$= \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{1 - \sum_{x \in \mathcal{X}} \lambda_y(\sigma_y(x))\, P_{X|Y}(x|y)\Bigg\} \tag{31}$$

$$= 1 - \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} \lambda_y(\sigma_y(x))\, P_{XY}(x,y), \tag{32}$$

and the expected value of the cost is given by

$$\bar{C}'_\rho = \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{\sum_{i=1}^K \lambda_y(i)\, P_{X|Y}\big(\sigma_y^{-1}(i)\big|y\big)\, i^\rho\Bigg\} + p_e c_e. \tag{33}$$

For some applications, it may be natural to simply minimize the total cost $\bar{C}'_\rho$. However, for some applications it is not easy to evaluate the precise value of the penalty $c_e$ for stopping the guessing. (Although we assume that $c_e$ is a constant, it may depend on the true value of $X$, the number of guesses before giving up, etc.; it heavily depends on the application. Hence we leave the general cost case as future work and concentrate on the first term of (33).) In such a situation, we may consider the cost of guessing and the penalty separately, and minimize

$$\bar{C}_\rho(G|X,Y) \triangleq \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{\sum_{i=1}^K \lambda_y(i)\, P_{X|Y}\big(\sigma_y^{-1}(i)\big|y\big)\, i^\rho\Bigg\} \tag{34}$$

$$= \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{\sum_{x \in \mathcal{X}} \lambda_y(\sigma_y(x))\, P_{X|Y}(x|y)\, \sigma_y(x)^\rho\Bigg\} \tag{35}$$

$$= \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} \lambda_y(\sigma_y(x))\, P_{XY}(x,y)\, \sigma_y(x)^\rho \tag{36}$$

under a constraint on the probability of stopping the guessing. Further, if the minimum value of $\bar{C}_\rho$ under the constraint that $p_e \le \varepsilon$ is known, it is not hard to optimize $\bar{C}'_\rho$ as well. So, we study the problem of minimizing $\bar{C}_\rho$ under the constraint that $p_e \le \varepsilon$ for a given constant $\varepsilon$; the results are summarized in the following theorems.

###### Theorem 3.

Fix $\rho > 0$ and $\varepsilon \in [0,1)$. For any guessing strategy $G$ satisfying $p_e \le \varepsilon$, the expected value of the cost must satisfy

$$\bar{C}_\rho \ge (1 + \log K)^{-\rho} \exp\Big\{\rho\, H^\varepsilon_{\frac{1}{1+\rho}}(X|Y)\Big\}. \tag{37}$$
###### Theorem 4.

Fix $\rho > 0$ and $\varepsilon \in [0,1)$. There exists a guessing strategy $G$ such that the error probability satisfies $p_e \le \varepsilon$ and the expected value of the cost satisfies

$$\bar{C}_\rho \le \exp\Big\{\rho\, H^\varepsilon_{\frac{1}{1+\rho}}(X|Y)\Big\}. \tag{38}$$

Theorems 3 and 4 will be proved in Appendix D.

###### Remark 3.

The proof of Theorem 4 reveals that "the optimal guesser throws a dice at most once"; i.e., for each $y \in \mathcal{Y}$, the vector $\pi_y$ of the optimal strategy satisfies

$$\pi_y(i) = \begin{cases} 0, & i < i^*_y, \\ \pi, & i = i^*_y, \\ 1, & i > i^*_y \end{cases} \tag{39}$$

for some $\pi \in [0,1]$. In other words, the optimal guesser makes the first $i^*_y - 1$ guesses deterministically (or until the value of $X$ is correctly guessed), and then moves on to the $i^*_y$-th guess with probability $1 - \pi$.
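To make the cost and error expressions concrete, the following sketch evaluates $p_e$ and $\bar{C}_\rho$ from (27), (32), and (36) for an arbitrary strategy; the dictionary-based representation of $P_{XY}$, $\sigma_y$, and $\pi_y$ is our own illustrative assumption, not notation from the paper.

```python
def evaluate_strategy(P_XY, sigma, pi, rho):
    """Return (guessing cost, error probability) of a strategy.

    P_XY:  dict {(x, y): probability}
    sigma: dict {y: {x: position}} -- sigma[y][x] is the 1-based step
           at which x is guessed when Y = y
    pi:    dict {y: [pi_y(1), ..., pi_y(K)]} -- give-up probabilities
    """
    cost, p_success = 0.0, 0.0
    for (x, y), p in P_XY.items():
        i = sigma[y][x]
        lam = 1.0
        for j in range(i):            # lambda_y(i) = prod_{j<=i} (1 - pi_y(j)), eq. (27)
            lam *= 1.0 - pi[y][j]
        cost += lam * p * i ** rho    # contribution to the cost, eq. (36)
        p_success += lam * p          # mass guessed correctly
    return cost, 1.0 - p_success      # p_e from eq. (32)
```

With all give-up probabilities set to zero, the error probability is zero and the cost reduces to Arikan's $\rho$-th guessing moment.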

Now, let us consider the asymptotic behavior of the cost of guessing; in particular, we investigate the asymptotic behavior of the exponent of the cost. We define the achievability of an exponent as follows.

###### Definition 1.

Given a constant $\varepsilon \in [0,1)$ and a general source $(\mathbf{X}, \mathbf{Y})$, a value $E_g$ is said to be $\varepsilon$-achievable if there exists a sequence of strategies $\{G_n\}$ satisfying

$$\limsup_{n \to \infty} p_e(G_n|X^n, Y^n) \le \varepsilon \tag{40}$$

and

$$\limsup_{n \to \infty} \frac{1}{n} \log \bar{C}_\rho(G_n|X^n, Y^n) \le E_g. \tag{41}$$

The infimum of $\varepsilon$-achievable values is denoted by $E_g(\rho, \varepsilon|\mathbf{X}, \mathbf{Y})$.

Then we have the following theorem, where $H^\varepsilon_\alpha(\mathbf{X}|\mathbf{Y})$ defined in (19) characterizes the optimal asymptotic exponent of the guessing cost.

###### Theorem 5.

For any $\rho > 0$ and $\varepsilon \in [0,1)$,

$$E_g(\rho, \varepsilon|\mathbf{X}, \mathbf{Y}) = \rho\, H^\varepsilon_{1/(1+\rho)}(\mathbf{X}|\mathbf{Y}). \tag{42}$$

The theorem will be proved in Appendix D-C. Combining Theorem 5 with Theorem 2, we obtain a single-letterized characterization of $E_g$ for a mixed source.

###### Corollary 2.

Let $(\mathbf{X}, \mathbf{Y})$ be a mixture of i.i.d. sources satisfying (23). Then, for any $\rho > 0$ and $\varepsilon \in [0,1)$,

$$E_g(\rho, \varepsilon|\mathbf{X}, \mathbf{Y}) = \rho\, H(X_i|Y_i) \tag{43}$$

where $i$ is determined so that $\sum_{j < i} \alpha_j \le \varepsilon < \sum_{j \le i} \alpha_j$.

## IV Source Coding

A variable-length source code for $X$ with the common side-information $Y$ is determined by a collection of triplets $(\mathcal{C}_y, \varphi_y, \psi_y)$, for each $y \in \mathcal{Y}$, of (i) a set $\mathcal{C}_y$ of finite-length binary strings, (ii) a stochastic encoder mapping $\varphi_y \colon \mathcal{X} \to \mathcal{C}_y$, and (iii) a decoder mapping $\psi_y \colon \mathcal{C}_y \to \mathcal{X}$. Without loss of generality, we assume that each $\mathcal{C}_y$ contains no unused strings. Further, we assume that $\mathcal{C}_y$ satisfies the prefix condition for each $y \in \mathcal{Y}$.

Note that we allow the encoder mapping to be stochastic. Let $W_{\varphi_y}(c|x)$ be the probability that $x$ is encoded into $c$ by $\varphi_y$. Then, the error probability $p_e$ of the code is defined as

$$p_e \triangleq \sum_{y \in \mathcal{Y}} P_Y(y) \Pr\{X \ne \psi_y(\varphi_y(X)) \mid Y = y\} \tag{44}$$

$$= \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{\sum_{x \in \mathcal{X}} P_{X|Y}(x|y) \sum_{c:\, x \ne \psi_y(c)} W_{\varphi_y}(c|x)\Bigg\} \tag{45}$$

$$= \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{XY}(x,y) \sum_{c:\, x \ne \psi_y(c)} W_{\varphi_y}(c|x). \tag{46}$$

The length of a codeword $c$ (in bits) is denoted by $\|c\|$. Let $\ell(x|y)$ be the length function (in nats):

$$\ell(x|y) \triangleq \big\|\varphi_y(x)\big\| \log 2. \tag{47}$$

In this study, we focus on the exponential moment of the length function. For a given $\rho > 0$, the exponential moment of the length function is defined as

$$M_\rho \triangleq \mathbb{E}_{P_{XY}}\big[\exp\{\rho\, \ell(X|Y)\}\big] \tag{48}$$

$$= \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg\{\sum_{x \in \mathcal{X}} P_{X|Y}(x|y) \sum_{c \in \mathcal{C}_y} W_{\varphi_y}(c|x) \exp\{\rho \|c\| \log 2\}\Bigg\} \tag{49}$$

$$= \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{XY}(x,y) \sum_{c \in \mathcal{C}_y} W_{\varphi_y}(c|x) \exp\{\rho \|c\| \log 2\} \tag{50}$$

where $\mathbb{E}_{P_{XY}}$ denotes the expectation with respect to the distribution $P_{XY}$.
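For a deterministic encoder, (48)–(50) collapse to a single sum over $(x,y)$. The following sketch computes $M_\rho$ under that assumption; the dictionary-based code table is our own illustrative representation, not the paper's notation.

```python
import math

def exponential_moment(P_XY, code, rho):
    """M_rho = E[exp(rho * l(X|Y))] for a deterministic encoder.

    P_XY: dict {(x, y): probability}
    code: dict {y: {x: binary codeword string}};
          l(x|y) = len(codeword) * log 2, as in eq. (47)
    """
    return sum(p * math.exp(rho * len(code[y][x]) * math.log(2.0))
               for (x, y), p in P_XY.items())
```

For example, with a single side-information value and a prefix code assigning lengths 1, 2, 2 to probabilities 0.5, 0.25, 0.25, the moment at $\rho = 1$ is $0.5 \cdot 2 + 0.25 \cdot 4 + 0.25 \cdot 4 = 3$.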

###### Remark 4.

Without loss of optimality, we can assume that the decoder mapping $\psi_y$ is deterministic for all $y \in \mathcal{Y}$. Indeed, for a given encoder, we can choose $\psi_y$ so that

$$\psi_y(c) = \operatorname*{arg\,max}_{x \in \mathcal{X}} W_{\varphi_y}(c|x)\, P_{X|Y}(x|y). \tag{51}$$

We consider the problem of minimizing $M_\rho$ under the constraint that $p_e \le \varepsilon$ for a given constant $\varepsilon$; the results are summarized in the following theorems.

###### Theorem 6.

Fix $\rho > 0$ and $\varepsilon \in [0,1)$. For any code satisfying $p_e \le \varepsilon$, the moment $M_\rho$ must satisfy

$$M_\rho \ge \exp\Big\{\rho\, H^\varepsilon_{\frac{1}{1+\rho}}(X|Y)\Big\}. \tag{52}$$
###### Theorem 7.

Fix $\rho > 0$ and $\varepsilon \in [0,1)$. There exists a code such that $p_e \le \varepsilon$ and $M_\rho$ satisfies

$$M_\rho \le 2^{2\rho} \exp\Big\{\rho\, H^\varepsilon_{\frac{1}{1+\rho}}(X|Y)\Big\} + \varepsilon\, 2^\rho. \tag{53}$$

Theorems 6 and 7 will be proved in Appendix E.

###### Remark 5.

While we allow the encoder mapping to be stochastic in Theorem 7, it can be seen that "the optimal encoder throws a dice at most once"; cf. Remark 3. Hence, it is not hard to modify the theorem for the case where only deterministic encoder mappings are allowed. We omit the details; see Proposition 1 of [26] for the corresponding result.

Now, let us consider the asymptotic behavior of the exponential moment of the length function. In the same way as in Section III, we define the achievability of an exponent as follows.

###### Definition 2.

Given a constant $\varepsilon \in [0,1)$ and a general source $(\mathbf{X}, \mathbf{Y})$, a value $E_s$ is said to be $\varepsilon$-achievable if there exists a sequence of variable-length codes $\{\Phi_n\}$ satisfying

$$\limsup_{n \to \infty} p_e(\Phi_n|X^n, Y^n) \le \varepsilon \tag{54}$$

and

$$\limsup_{n \to \infty} \frac{1}{n} \log M_\rho(\Phi_n|X^n, Y^n) \le E_s. \tag{55}$$

The infimum of $\varepsilon$-achievable values is denoted by $E_s(\rho, \varepsilon|\mathbf{X}, \mathbf{Y})$.

Then we have the following general formula, which will be proved in Appendix E-C.

###### Theorem 8.

For any $\rho > 0$ and $\varepsilon \in [0,1)$,

$$E_s(\rho, \varepsilon|\mathbf{X}, \mathbf{Y}) = \rho\, H^\varepsilon_{1/(1+\rho)}(\mathbf{X}|\mathbf{Y}). \tag{56}$$

Combining Theorem 8 with Theorem 2, we obtain a single-letterized characterization of $E_s$ for a mixed source.

###### Corollary 3.

Let $(\mathbf{X}, \mathbf{Y})$ be a mixture of i.i.d. sources satisfying (23). Then, for any $\rho > 0$ and $\varepsilon \in [0,1)$,

$$E_s(\rho, \varepsilon|\mathbf{X}, \mathbf{Y}) = \rho\, H(X_i|Y_i) \tag{57}$$

where $i$ is determined so that $\sum_{j < i} \alpha_j \le \varepsilon < \sum_{j \le i} \alpha_j$.

## V Concluding Remarks

In this paper, a novel definition of the conditional smooth Rényi entropy was introduced, and its significance in the problems of guessing and source coding was demonstrated.

Although the properties of the conditional smooth Rényi entropy are investigated in Section II, we consider only the case of $\alpha \in (0,1)$. It is an important direction for future work to investigate its properties for orders $\alpha > 1$. On the other hand, in the coding theorems in Sections III and IV, it is sufficient to consider the conditional smooth Rényi entropy of order $\alpha \in (0,1)$. It is also an important direction for future work to find an operational meaning of the conditional smooth Rényi entropy of order $\alpha > 1$.

In Section IV, we assumed that the common side-information $Y$ is available at the encoder and the decoder. As mentioned in Remark 1, the availability of $Y$ at the encoder is very important. The author conjectures that, in the case where $Y$ is available only at the decoder, instead of