
# Rényi Entropy Rate of Stationary Ergodic Processes

In this paper, we examine the Rényi entropy rate of stationary ergodic processes. For a special class of stationary ergodic processes, we prove that the Rényi entropy rate always exists and can be polynomially approximated by its defining sequence; moreover, using the Markov approximation method, we show that the Rényi entropy rate can be exponentially approximated by that of the Markov approximating sequence, as the Markov order goes to infinity. For the general case, by constructing a counterexample, we disprove the conjecture that the Rényi entropy rate of a general stationary ergodic process always converges to its Shannon entropy rate as α goes to 1.


## 1 Introduction

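Let $\mathcal{Z}$ be a finite alphabet. Let $Z_1^n = (Z_1, Z_2, \dots, Z_n)$ be a sequence of random variables over $\mathcal{Z}$ with distribution $\mu_n$, and let $z_1^n$ denote its realization. Given $\alpha$, the $\alpha$-th order Rényi entropy of $Z_1^n$, first suggested by Alfred Rényi [21], is defined as

$$H_\alpha(Z_1^n)=\begin{cases}\dfrac{\log\sum_{z_1^n}\left(\mu_n(z_1^n)\right)^{\alpha}}{1-\alpha} & \text{if } \alpha\neq 1,\\[2ex] H(Z_1^n) & \text{if } \alpha=1,\end{cases}$$

where

$$H(Z_1^n)\triangleq-\sum_{z_1^n}\mu_n(z_1^n)\log\mu_n(z_1^n)$$

is the Shannon entropy of $Z_1^n$. An easy application of L’Hôpital’s rule shows that

$$\lim_{\alpha\to 1}H_\alpha(Z_1^n)=H(Z_1^n). \tag{1}$$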
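
As a quick numerical illustration of the definition and of the limit (1), the following minimal sketch evaluates the Rényi entropy of a small distribution for orders near 1. The distribution `mu` and the function name `renyi_entropy` are assumptions chosen for illustration only; natural logarithms are used.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy (natural log) of a finite distribution.

    For alpha == 1 this returns the Shannon entropy, matching the
    two-case definition above.
    """
    support = [p for p in probs if p > 0]
    if alpha == 1:
        return -sum(p * math.log(p) for p in support)
    return math.log(sum(p ** alpha for p in support)) / (1 - alpha)

# Hypothetical distribution, used only to illustrate the limit in (1).
mu = [0.5, 0.25, 0.125, 0.125]
shannon = renyi_entropy(mu, 1)

for alpha in (0.9, 0.99, 0.999, 1.001, 1.01, 1.1):
    gap = renyi_entropy(mu, alpha) - shannon
    print(f"alpha={alpha}: H_alpha - H = {gap:+.6f}")
```

The printed gaps shrink as the order approaches 1 from either side, which is the behaviour (1) describes for a fixed block length.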

Rényi entropy is a fundamental notion in a number of scientific and engineering disciplines, such as coding theory [5], chaotic dynamical systems [7], statistical mechanics [16], statistical inference [18], quantum mechanics [3], multi-fractal analysis [14], economics [12], guessing [1], hypothesis testing [2], and so forth.

Now, consider a stationary stochastic process $Z$ over the alphabet $\mathcal{Z}$. Let

$$H(Z)\triangleq\lim_{n\to\infty}\frac{H(Z_1^n)}{n}$$

be the Shannon entropy rate of $Z$. Then, the $\alpha$-th order Rényi entropy rate of $Z$ is defined as

$$H_\alpha(Z)\triangleq\lim_{n\to\infty}\frac{H_\alpha(Z_1^n)}{n},$$

when the limit exists. In contrast to Rényi entropy, which has been extensively studied, some basic properties of the Rényi entropy rate have long been poorly understood. First, it remains unknown whether the Rényi entropy rate is well-defined for a general stationary ergodic process. Second, regarding its connection with the Shannon entropy rate, given (1), one is naturally tempted to propose the following conjecture:

###### Conjecture 1.1.

Let $Z$ be a stationary ergodic process. Then

$$\lim_{\alpha\to 1}H_\alpha(Z)=H(Z).$$

However, this conjecture has been neither proved nor disproved in the literature.

On the positive side, some special cases have been handled and feature clean solutions. When $Z$ is an independent and identically distributed (i.i.d.) process, $H_\alpha(Z)$ boils down to nothing but $H_\alpha(Z_1)$. For a finite-state ergodic Markov process $Z$, using the Perron-Frobenius theory (see, e.g., [17, 22]), it has been proved in [19] that

$$H_\alpha(Z)=\frac{\log\lambda_{\max}}{1-\alpha} \tag{2}$$

and that $H_\alpha(Z)$ converges to the Shannon entropy rate $H(Z)$ as $\alpha$ goes to $1$, where $\lambda_{\max}$ is the largest real eigenvalue of the $|\mathcal{Z}|$-dimensional matrix obtained by raising each entry of the transition probability matrix of $Z$ to the power $\alpha$. It turns out that similar results are also valid for mixing processes: for a weakly $\psi$-mixing process $Z$, it has been shown in [11] that $H_\alpha(Z)$ is well-defined for suitable $\alpha$ and always goes to $H(Z)$ as $\alpha$ goes to $1$; on the other hand, using Kingman’s subadditive ergodic theorem [15], it has been proved in [25] that the Rényi entropy rate of any order exists for the so-called weakly mixing processes.
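
Formula (2) can be checked numerically on a small chain. The sketch below is illustrative only: the two-state transition matrix is a hypothetical example, and a plain power iteration stands in for a general eigenvalue solver (an assumption that is adequate for entrywise-positive matrices). It compares $\log\lambda_{\max}/(1-\alpha)$ with the normalized block entropy $H_\alpha(Z_1^n)/n$, using the identity $\sum_{z_1^n} p^\alpha(z_1^n) = (\pi^\alpha)^{\top} R^{n-1}\mathbf{1}$, where $\pi$ is the stationary distribution and $R$ is the entrywise $\alpha$-th power of the transition matrix.

```python
import math

def perron_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of an entrywise-positive square matrix (power iteration)."""
    size = len(matrix)
    vec = [1.0] * size
    value = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        value = max(new)                 # converges to the Perron eigenvalue
        vec = [x / value for x in new]   # keep the largest component equal to 1
    return value

def markov_renyi_rate(transition, alpha):
    """Right-hand side of (2): log(lambda_max) / (1 - alpha)."""
    powered = [[p ** alpha for p in row] for row in transition]
    return math.log(perron_eigenvalue(powered)) / (1 - alpha)

def stationary_distribution(transition, iterations=2000):
    """Stationary distribution of an ergodic chain by repeated multiplication."""
    size = len(transition)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        pi = [sum(pi[i] * transition[i][j] for i in range(size)) for j in range(size)]
    return pi

def normalized_block_renyi_entropy(transition, alpha, n):
    """H_alpha(Z_1^n) / n for the stationary chain, without enumerating sequences."""
    size = len(transition)
    powered = [[p ** alpha for p in row] for row in transition]
    weights = [p ** alpha for p in stationary_distribution(transition)]
    for _ in range(n - 1):
        weights = [sum(weights[i] * powered[i][j] for i in range(size)) for j in range(size)]
    return math.log(sum(weights)) / ((1 - alpha) * n)

# Hypothetical two-state ergodic chain.
P = [[0.9, 0.1], [0.4, 0.6]]
rate = markov_renyi_rate(P, 2.0)
for n in (10, 100, 400):
    print(n, normalized_block_renyi_entropy(P, 2.0, n), rate)
```

When both rows of the transition matrix coincide, the chain is i.i.d. and `markov_renyi_rate` reduces to the Rényi entropy of a single symbol, in line with the i.i.d. remark above.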

The contributions of this paper can be summarized as follows. We first focus our attention on the Rényi entropy rate of a special family of stationary ergodic processes which contains hidden Markov processes [6] as special cases. More precisely, we will examine a random process $Y$ under the “uniform boundedness” and “exponential forgetting” properties (see Section 2 for details). Using a refined Bernstein blocking method [4], we first show that the Rényi entropy rate exists, and that the convergence rate of $H_\alpha(Y_1^n)/n$ to $H_\alpha(Y)$ is $O(n^{-\gamma})$, where $\gamma$ can be arbitrarily close to $1$. Note that for the special case when $\alpha = 1$ (the Shannon entropy case), it is well known (see, e.g., [9]) that the convergence rate is $O(n^{-1})$. So, in some sense, the derived convergence rate is sharp. Borrowing results from the theory of nonnegative matrices, we also establish that $H_\alpha(Y)$ can be exponentially approximated by the Rényi entropy rate of the approximating Markov process, as the Markov order goes to infinity. As opposed to the polynomial convergence rate of $H_\alpha(Y_1^n)/n$, this exponential convergence rate allows us to compute $H_\alpha(Y)$ more efficiently, at least in some special situations.

We then examine the Rényi entropy rate of general stationary ergodic processes, for which we show that Conjecture 1.1 is not true. Note that the answer to Conjecture 1.1 is clearly negative if the ergodicity assumption is dropped: the example in Section IV of [19] shows that for some reducible Markov chain $Z$, $H_\alpha(Z)$ fails to converge to $H(Z)$ as $\alpha$ goes to $1$. Although the existing results for i.i.d., Markov [20] and weakly $\psi$-mixing processes [11] might suggest a positive answer to Conjecture 1.1, we will construct a stationary ergodic counterexample whose Rényi entropy rate does not converge to the Shannon entropy rate as the Rényi order goes to $1$. The main tool employed in the construction is the cutting and stacking method, which is well known in ergodic theory but has attracted little attention in the field of information theory.

The remainder of this paper is organized as follows. First, we focus our attention on the special random process mentioned above. We show in Section 2.1 that the normalized Rényi entropy $H_\alpha(Y_1^n)/n$ converges to $H_\alpha(Y)$ polynomially. By introducing the Markov approximation sequence, we prove in Section 2.2 that the Rényi entropy rate of this sequence of Markov chains does converge to $H_\alpha(Y)$, and moreover, that the rate of convergence is exponential. Next, we turn to the construction of the stationary ergodic counterexample that disproves Conjecture 1.1. Some preliminaries on the cutting and stacking method are given in Section 3.1. Then, based on this method, the construction of our counterexample is presented in Section 3.2, followed by the derivation of several properties of the counterexample in Section 3.3. As elaborated on in Section 3.4, these properties immediately imply that as $\alpha$ goes to $1$, the Rényi entropy rate fails to converge to the Shannon entropy rate for the constructed stationary ergodic process.

## 2 Rényi Entropy Rate of a Special Class of Random Processes

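In this section, we focus on a stationary process $Y$ satisfying the following two conditions:

(i) uniform boundedness: there exist constants $0 < C_L \le C_U$ such that for any realization sequence $y_1^n$,

$$C_L\le p(y_n \mid y_1^{n-1})\le C_U;$$

(ii) exponential forgetting: for any fixed $\alpha$, there exist constants $C_F > 0$ and $0 < \rho_F < 1$ such that for any $n$ and for any two realization sequences $y_1^{k}$ and $\hat{y}_1^{\hat{k}}$ that agree on their most recent $n$ symbols, it holds that

$$\left|p^{\alpha}(y_k \mid y_1^{k-1})-p^{\alpha}(\hat{y}_{\hat{k}} \mid \hat{y}_1^{\hat{k}-1})\right|\le C_F\rho_F^{n}.$$

A typical example satisfying the above conditions is given below.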

###### Example 2.1.

A hidden Markov chain is a finite-state Markov chain observed through a discrete memoryless channel. To be more specific, let $\mathcal{X}$ be the input alphabet, $\mathcal{Z}$ be the output alphabet, $X$ be a finite-state Markov chain and $\{p(z \mid x)\}$ be the channel transition probabilities. Then the distribution of a hidden Markov process $Z$ is given by

$$p(z_1^n)=\sum_{x_1^n}p(x_1^n,z_1^n)=\sum_{x_1^n}p(x_1)\,p(z_1 \mid x_1)\prod_{i=2}^{n}p(x_i \mid x_{i-1})\,p(z_i \mid x_i)$$

for any realization sequence $z_1^n$. If we further assume that $Z$ satisfies the following two conditions:

1. the input Markov chain is irreducible and aperiodic,

2. the channel transition probability matrix is strictly positive,

then it has been verified in [10] that $Z$ satisfies Conditions (i) and (ii). Here, we remark that as special cases, i.i.d. processes and irreducible and aperiodic finite-state Markov chains also satisfy Conditions (i) and (ii).
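
To make the hidden Markov distribution concrete, the following minimal sketch evaluates $p(z_1^n)$ with the standard forward recursion (which carries out the sum over hidden paths $x_1^n$ without enumerating them) and then computes the normalized Rényi entropy $H_\alpha(Z_1^n)/n$ by brute-force enumeration of output sequences. All numerical parameters (a two-state input chain started from its stationary distribution and a binary symmetric channel with crossover probability 0.1) are hypothetical, and the function names are illustrative.

```python
import math
from itertools import product

def hmm_sequence_probability(init, trans, emit, observations):
    """p(z_1^n): sum over hidden paths x_1^n, computed by the forward recursion."""
    states = range(len(init))
    forward = [init[x] * emit[x][observations[0]] for x in states]
    for z in observations[1:]:
        forward = [
            sum(forward[prev] * trans[prev][x] for prev in states) * emit[x][z]
            for x in states
        ]
    return sum(forward)

def normalized_renyi_entropy(init, trans, emit, alpha, n):
    """H_alpha(Z_1^n) / n by enumerating all output sequences (small n only)."""
    outputs = range(len(emit[0]))
    total = sum(
        hmm_sequence_probability(init, trans, emit, z) ** alpha
        for z in product(outputs, repeat=n)
    )
    return math.log(total) / ((1 - alpha) * n)

# Hypothetical parameters: irreducible, aperiodic input chain and a strictly
# positive channel, as required by the two conditions in the example.
trans = [[0.9, 0.1], [0.2, 0.8]]
init = [2 / 3, 1 / 3]              # stationary distribution of `trans`
emit = [[0.9, 0.1], [0.1, 0.9]]    # binary symmetric channel, crossover 0.1

for n in (2, 4, 6, 8, 10):
    print(n, normalized_renyi_entropy(init, trans, emit, 0.5, n))
```

The enumeration costs $|\mathcal{Z}|^n$ evaluations, so this is only a sanity check for short blocks.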

In the remainder of this section, we will first prove that for any fixed $\alpha$, $H_\alpha(Y)$ exists and the convergence rate of $H_\alpha(Y_1^n)/n$ is polynomial. Then, making use of the Markov approximation, we show that when $\rho_F$ is small enough, the Rényi entropy rate of the Markov approximating sequence converges exponentially to $H_\alpha(Y)$. Note that the requirement for $\rho_F$ to be small can be justified in some practical situations: for a binary symmetric channel operating in the high signal-to-noise ratio regime, that is, roughly, when its crossover probability is “close” to $0$, it has been observed (see, e.g., [10]) that $\rho_F$ is also “close” to $0$.

Before moving to the next section, let us introduce the following definition.

###### Definition 2.2.

For a stochastic process $X$, its $m$-th order Markov approximation [8] is a stochastic process $X^{(m)}$ with distribution $p^{(m)}$ such that:

• $X^{(m)}$ is an $m$-th order Markov process, that is, for any realization $x_1^n$ with $n > m$,

$$p^{(m)}(x_1^n)=p^{(m)}(x_1^m)\cdot p^{(m)}(x_{m+1} \mid x_1^m)\cdots p^{(m)}(x_n \mid x_{n-m}^{n-1});$$

• the $(m+1)$-dimensional distributions of $X$ and $X^{(m)}$ are the same, namely,

$$p^{(m)}(x_1^{m+1})=p(x_1^{m+1}).$$

###### Remark 2.3.

If $Y$ satisfies Conditions (i) and (ii), then for any $m$, $Y^{(m)}$ also satisfies these two conditions with the same constants (which are independent of $m$).

Throughout the remainder of this section, we will always assume that $\alpha \neq 1$, since $\alpha = 1$ corresponds to the Shannon entropy rate case. Furthermore, we always use $Y$ to denote a stationary process satisfying Conditions (i) and (ii) and $Y^{(m)}$ to denote the $m$-th order Markov approximation of $Y$.

### 2.1 Convergence of $\{H_\alpha(Y_1^n)/n\}$

The following theorem establishes the existence of the Rényi entropy rate $H_\alpha(Y)$; moreover, it establishes the convergence of $H_\alpha(Y_1^n)/n$ to $H_\alpha(Y)$ and gives a rate of convergence. Here, we note from Remark 2.3 that the theorem also applies to the $m$-th order Markov approximation $Y^{(m)}$ for any $m$.

###### Theorem 2.4.

For any $0 < \gamma < 1$, there exists a constant $C > 0$ such that for all $n$,

$$\left|\frac{H_\alpha(Y_1^n)}{n}-H_\alpha(Y)\right|\le Cn^{-\gamma}.$$

###### Proof.

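We only prove the theorem for the case $0 < \alpha < 1$, since the remaining cases can be handled similarly.

For any constant $0 < \gamma < 1$, let

$$\lambda=\frac{1+\gamma}{2},\qquad \beta=\frac{1-\gamma}{2},\qquad p=n^{\lambda},\qquad q=\frac{n^{\beta}}{2},\qquad \omega=n^{1-\lambda}.$$

Now we use the Bernstein blocking method (see [4]) to consecutively partition the sequence $y_1^n$ into small pieces of length $q$, $q$ and $p-2q$. To be more specific, define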

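$$\xi_i\triangleq p^{\alpha}\!\left(y_{(i-1)p+1}^{(i-1)p+q}\,\middle|\,y_1^{(i-1)p}\right),\quad \eta_i\triangleq p^{\alpha}\!\left(y_{(i-1)p+q+1}^{(i-1)p+2q}\,\middle|\,y_1^{(i-1)p+q}\right),\quad \zeta_i\triangleq p^{\alpha}\!\left(y_{(i-1)p+2q+1}^{ip}\,\middle|\,y_1^{(i-1)p+2q}\right)$$

and their truncated versions

$$\hat{\xi}_i\triangleq p^{\alpha}\!\left(y_{(i-1)p+1}^{(i-1)p+q}\,\middle|\,y_{(i-2)p+q+1}^{(i-1)p}\right),\quad \hat{\eta}_i\triangleq p^{\alpha}\!\left(y_{(i-1)p+q+1}^{(i-1)p+2q}\right),\quad \hat{\zeta}_i\triangleq p^{\alpha}\!\left(y_{(i-1)p+2q+1}^{ip}\,\middle|\,y_{(i-1)p+q+1}^{(i-1)p+2q}\right).$$

Then, using the fact that for $i \neq j$, the $y$-sequences associated with $\hat{\eta}_i\hat{\zeta}_i\hat{\xi}_{i+1}$ and $\hat{\eta}_j\hat{\zeta}_j\hat{\xi}_{j+1}$ are both of length $p$ and their index sets are non-overlapping, we have

$$\begin{aligned}
\sum_{y_1^n}p^{\alpha}(y_1^n)
&=\sum_{y_1^n}\xi_1\eta_1\zeta_1\,\xi_2\eta_2\zeta_2\cdots\xi_\omega\eta_\omega\zeta_\omega\\
&\overset{(a)}{\le}\sum_{y_1^n}C_U^{\alpha q\omega}\,\eta_1\zeta_1\,\eta_2\zeta_2\cdots\eta_\omega\zeta_\omega\\
&\overset{(b)}{\le}\sum_{y_1^n}C_U^{\alpha q\omega}\left(\frac{C_U}{C_L}\right)^{\alpha q\omega}\hat{\eta}_1\zeta_1\,\hat{\eta}_2\zeta_2\cdots\hat{\eta}_\omega\zeta_\omega\\
&\overset{(c)}{\le}\sum_{y_1^n}\left(\frac{C_U^2}{C_L}\right)^{\alpha q\omega}\left(1+\frac{C_F\rho_F^{q}}{C_L^{\alpha}}\right)^{(p-2q)\omega}\hat{\eta}_1\hat{\zeta}_1\,\hat{\eta}_2\hat{\zeta}_2\cdots\hat{\eta}_\omega\hat{\zeta}_\omega\\
&\overset{(d)}{\le}\left(\frac{C_U^2}{C_L}\right)^{\alpha q\omega}\left(1+\frac{C_F\rho_F^{q}}{C_L^{\alpha}}\right)^{(p-2q)\omega}\sum_{y_1^n}\left[\left(\frac{1}{C_L^{\alpha}}\right)^{q\omega}(\hat{\eta}_1\hat{\zeta}_1\hat{\xi}_2)\cdots(\hat{\eta}_\omega\hat{\zeta}_\omega\hat{\xi}_{\omega+1})\right]\\
&=\left(\frac{C_U}{C_L}\right)^{2\alpha q\omega}\left(1+\frac{C_F\rho_F^{q}}{C_L^{\alpha}}\right)^{(p-2q)\omega}\left(\sum_{y_1^{n^{\lambda}}}p^{\alpha}(y_1^{n^{\lambda}})\right)^{\omega},
\end{aligned}\tag{3}$$

where for (a) and (b), we have used Condition (i) to drop all $\xi_i$’s and to replace all $\eta_i$’s by their truncated versions; for (c), we have applied Conditions (i) and (ii) to replace all $\zeta_i$’s by their truncated versions; and for (d), we have applied Condition (i) to add $\hat{\xi}_2,\dots,\hat{\xi}_{\omega+1}$.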

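Taking logarithms and dividing both sides of (3) by $n$, we obtain

$$\begin{aligned}
\frac{\log\sum_{y_1^n}p^{\alpha}(y_1^n)}{n}
&\le\frac{\omega\log\sum_{y_1^{n^{\lambda}}}p^{\alpha}(y_1^{n^{\lambda}})}{n}+\frac{2\alpha q\omega\log\left(\frac{C_U}{C_L}\right)}{n}+\frac{(p-2q)\omega\log\left(1+\frac{C_F\rho_F^{q}}{C_L^{\alpha}}\right)}{n}\\
&\le\frac{\log\sum_{y_1^{n^{\lambda}}}p^{\alpha}(y_1^{n^{\lambda}})}{n^{\lambda}}+\alpha\log\left(\frac{C_U}{C_L}\right)n^{-\gamma}+\frac{C_F\rho_F^{q}}{C_L^{\alpha}},
\end{aligned}$$

where we have used $\omega/n = n^{-\lambda}$, $2q\omega/n = n^{\beta-\lambda} = n^{-\gamma}$, $(p-2q)\omega \le n$ and $\log(1+x)\le x$. Note that $0 < \rho_F < 1$ and $q = n^{\beta}/2$ imply

$$\frac{C_F\rho_F^{q}}{C_L^{\alpha}}\le\alpha\log\left(\frac{C_U}{C_L}\right)n^{-\gamma}$$

for sufficiently large $n$. It then follows that

$$\frac{\log\sum_{y_1^n}p^{\alpha}(y_1^n)}{n}\le\frac{\log\sum_{y_1^{n^{\lambda}}}p^{\alpha}(y_1^{n^{\lambda}})}{n^{\lambda}}+2\alpha\log\left(\frac{C_U}{C_L}\right)n^{-\gamma},$$

which immediately implies

$$\frac{H_\alpha(Y_1^n)}{n}\le\frac{H_\alpha(Y_1^{n^{\lambda}})}{n^{\lambda}}+C_1n^{-\gamma} \tag{4}$$

for some constant $C_1$. Applying a parallel argument to the other direction, we obtain that for sufficiently large $n$,

$$\frac{H_\alpha(Y_1^n)}{n}\ge\frac{H_\alpha(Y_1^{n^{\lambda}})}{n^{\lambda}}+C_2n^{-\gamma} \tag{5}$$

for some constant $C_2$. Choosing a constant $\tilde{C}$ with $\tilde{C} > \max\{|C_1|, |C_2|\}$, we derive from (4) and (5) that

$$\left|\frac{H_\alpha(Y_1^n)}{n}-\frac{H_\alpha(Y_1^{n^{\lambda}})}{n^{\lambda}}\right|<\tilde{C}n^{-\gamma}. \tag{6}$$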

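Now consider any $m, n$ with $m > n \ge N$, where $N$ is a sufficiently large number to be determined later. Pick a number $\xi$ between $1$ and $\sqrt{2}$. Let $t$ be the positive integer such that

$$t'\triangleq t+\log_{\xi}\log_{\xi}n-\log_{\xi}\log_{\xi}m\in[1,2).$$

Then $m^{\xi^{t'}} = n^{\xi^{t}}$. Let

$$\lambda_1=\xi^{-t'},\qquad \gamma_1=2\xi^{-t'}-1,\qquad \lambda_2=\xi^{-1},\qquad \gamma_2=2\xi^{-1}-1.$$

Then $0 < \gamma_1, \gamma_2 < 1$, $m^{1/\lambda_1} = n^{1/\lambda_2^{t}}$, and

$$\begin{aligned}
\left|\frac{H_\alpha(Y_1^m)}{m}-\frac{H_\alpha(Y_1^n)}{n}\right|
&\overset{(e)}{\le}\tilde{C}m^{-\gamma_1/\lambda_1}+\tilde{C}n^{-\gamma_2/\lambda_2}+\tilde{C}n^{-\gamma_2/\lambda_2^{2}}+\cdots+\tilde{C}n^{-\gamma_2/\lambda_2^{t}}\\
&\overset{(f)}{\le}\tilde{C}m^{\xi^{2}-2}+\frac{\tilde{C}n^{-(2-\xi)}}{1-n^{-(2-\xi)(\xi-1)}},
\end{aligned}\tag{7}$$

where (e) follows from the inequality (6), applied along the chain $n, n^{1/\lambda_2}, \dots, n^{1/\lambda_2^{t}} = m^{1/\lambda_1}$, and (f) follows from $t' < 2$ and the fact that $\xi^{k}\ge 1+k(\xi-1)$ for any integer $k \ge 0$. For any given $\varepsilon > 0$, by choosing a sufficiently large $N$ such that

$$\tilde{C}N^{\xi^{2}-2}<\varepsilon/2\quad\text{and}\quad\frac{\tilde{C}N^{-(2-\xi)}}{1-N^{-(2-\xi)(\xi-1)}}<\varepsilon/2,$$

we derive from (7) that

$$\left|\frac{H_\alpha(Y_1^m)}{m}-\frac{H_\alpha(Y_1^n)}{n}\right|<\varepsilon$$

for any $m > n \ge N$. Thus the sequence $\{H_\alpha(Y_1^n)/n\}$ is Cauchy, and thereby convergent. Furthermore, for any positive integers $n$ and $k$ with $n$ sufficiently large, we have

$$\begin{aligned}
\left|\frac{H_\alpha(Y_1^n)}{n}-\frac{H_\alpha(Y_1^{n^{1/\lambda^{k}}})}{n^{1/\lambda^{k}}}\right|
&\le\left|\frac{H_\alpha(Y_1^n)}{n}-\frac{H_\alpha(Y_1^{n^{1/\lambda}})}{n^{1/\lambda}}\right|+\left|\frac{H_\alpha(Y_1^{n^{1/\lambda}})}{n^{1/\lambda}}-\frac{H_\alpha(Y_1^{n^{1/\lambda^{2}}})}{n^{1/\lambda^{2}}}\right|\\
&\quad+\cdots+\left|\frac{H_\alpha(Y_1^{n^{1/\lambda^{k-1}}})}{n^{1/\lambda^{k-1}}}-\frac{H_\alpha(Y_1^{n^{1/\lambda^{k}}})}{n^{1/\lambda^{k}}}\right|\\
&\le\tilde{C}n^{-\gamma/\lambda}+\tilde{C}n^{-\gamma/\lambda^{2}}+\cdots+\tilde{C}n^{-\gamma/\lambda^{k}}\\
&\le\frac{\tilde{C}n^{-\gamma}}{1-n^{\gamma-\gamma/\lambda}}\le\frac{\tilde{C}}{n^{\gamma}-n^{\frac{2\gamma^{2}}{1+\gamma}}}\le\frac{2\tilde{C}}{n^{\gamma}}.
\end{aligned}$$

Then, letting $k$ tend to infinity, we have, for all sufficiently large $n$,

$$\left|\frac{H_\alpha(Y_1^n)}{n}-H_\alpha(Y)\right|\le\frac{2\tilde{C}}{n^{\gamma}}.$$

The proof is then complete with an appropriately chosen common constant $C$ for all $n$. ∎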

### 2.2 Convergence of $\{H_\alpha(Y^{(m)})\}$

When it comes to the computation of $H_\alpha(Y)$, the convergence of $H_\alpha(Y_1^n)/n$ as in Theorem 2.4 may be too slow to be applied in practice. In this section, we show that under some additional assumptions, $H_\alpha(Y)$ can be approximated by another exponentially convergent sequence that can be efficiently computed.

Our motivation comes from the fact that the Rényi entropy rate of a Markov process features a simple formula as in (2). For any $m$, let $Y^{(m)}$ be the $m$-th order Markov approximation of $Y$. It is obvious from Definition 2.2 that as $m$ goes to infinity, $Y^{(m)}$ converges in distribution to the original process $Y$; moreover, we note from [19] that $H_\alpha(Y^{(m)})$ is well-defined for all $m$. Indeed, we have the following theorem.

###### Theorem 2.5.

$H_\alpha(Y^{(m)})$ converges to $H_\alpha(Y)$ as $m$ goes to infinity.

###### Proof.

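Note that for any $m$ and $n$, we have

$$\left|H_\alpha(Y^{(m)})-H_\alpha(Y)\right|\le\left|H_\alpha(Y^{(m)})-\frac{H_\alpha((Y^{(m)})_1^n)}{n}\right|+\left|\frac{H_\alpha((Y^{(m)})_1^n)}{n}-\frac{H_\alpha(Y_1^n)}{n}\right|+\left|\frac{H_\alpha(Y_1^n)}{n}-H_\alpha(Y)\right|. \tag{8}$$

We first deal with the first and third terms of the RHS of (8). It follows from Theorem 2.4 (applied to $Y$ and $Y^{(m)}$, which satisfy Conditions (i) and (ii) with the same constants) that for any given $\varepsilon > 0$, there exists $N$ such that for any $n \ge N$ and any $m$,

$$\left|H_\alpha(Y)-\frac{H_\alpha(Y_1^n)}{n}\right|\le\varepsilon/3,\qquad \left|H_\alpha(Y^{(m)})-\frac{H_\alpha((Y^{(m)})_1^n)}{n}\right|\le\varepsilon/3.$$

Now, for the second term in the RHS of (8), we have

$$\begin{aligned}
\left|\frac{H_\alpha((Y^{(m)})_1^n)}{n}-\frac{H_\alpha(Y_1^n)}{n}\right|
&=\left|\frac{1}{(1-\alpha)n}\left[\log\sum_{y_1^n}\left(p^{(m)}(y_1^n)\right)^{\alpha}-\log\sum_{y_1^n}p^{\alpha}(y_1^n)\right]\right|\\
&=\left|\frac{1}{(1-\alpha)n}\log\frac{\sum_{y_1^n}\left(p^{(m)}(y_1^n)\right)^{\alpha}}{\sum_{y_1^n}p^{\alpha}(y_1^n)}\right|\\
&=\left|\frac{1}{(1-\alpha)n}\log\frac{\sum_{y_1^n}p^{\alpha}(y_1^m)\,p^{\alpha}(y_{m+1}\mid y_1^m)\cdots p^{\alpha}(y_n\mid y_{n-m}^{n-1})}{\sum_{y_1^n}p^{\alpha}(y_1^m)\,p^{\alpha}(y_{m+1}\mid y_1^m)\cdots p^{\alpha}(y_n\mid y_1^{n-1})}\right|.
\end{aligned}$$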

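Denoting the $i$-th factors in the numerator and in the denominator by the simpler notations $a_i$ and $b_i$, respectively, we continue to derive

$$\begin{aligned}
\left|\frac{H_\alpha((Y^{(m)})_1^n)}{n}-\frac{H_\alpha(Y_1^n)}{n}\right|
&=\left|\frac{1}{(1-\alpha)n}\log\frac{\sum_{y_1^n}\prod_{i=1}^{n-m+1}a_i}{\sum_{y_1^n}\prod_{i=1}^{n-m+1}b_i}\right|\\
&=\left|\frac{1}{(1-\alpha)n}\log\left[1+\frac{\sum_{y_1^n}\left(\prod_{i=1}^{n-m+1}a_i-\prod_{i=1}^{n-m+1}b_i\right)}{\sum_{y_1^n}\prod_{i=1}^{n-m+1}b_i}\right]\right|\\
&\le\left|\frac{1}{(1-\alpha)n}\cdot\frac{\sum_{y_1^n}\left(\prod_{i=1}^{n-m+1}a_i-\prod_{i=1}^{n-m+1}b_i\right)}{\sum_{y_1^n}\prod_{i=1}^{n-m+1}b_i}\right|\\
&\le\frac{1}{|1-\alpha|n}\cdot\frac{\sum_{y_1^n}\sum_{i=1}^{n-m+1}|a_i-b_i|}{\sum_{y_1^n}b_1\cdots b_{n-m+1}}\\
&\overset{(g)}{\le}\frac{1}{|1-\alpha|n}\cdot\frac{|\mathcal{Y}|^{n}(n-m+1)C_F\rho_F^{m}}{|\mathcal{Y}|^{n}C_L^{n-m+1}}\\
&\le\frac{1}{|1-\alpha|}\cdot\frac{C_F\rho_F^{m}}{C_L^{n-m+1}},
\end{aligned}\tag{9}$$

where Condition (ii) is used in (g). Since $0 < \rho_F < 1$ and $C_L > 0$, we can let $n$ grow with $m$ slowly enough that $\rho_F^{m}/C_L^{n-m+1}$ tends to $0$ as $m$ goes to infinity, which, together with (9), implies that for the $\varepsilon$ given above, there exists an $M$ such that for all $m \ge M$,

$$\left|\frac{H_\alpha((Y^{(m)})_1^n)}{n}-\frac{H_\alpha(Y_1^n)}{n}\right|\le\varepsilon/3.$$

It then follows from (8) that

$$\left|H_\alpha(Y^{(m)})-H_\alpha(Y)\right|\le\varepsilon \tag{10}$$

as long as $m \ge M$. The desired convergence then follows from the arbitrariness of $\varepsilon$. ∎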

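Having established the convergence of $H_\alpha(Y^{(m)})$ to $H_\alpha(Y)$, we now turn to its convergence rate.

First of all, for any fixed $m$, by a usual $m$-step blocking argument, we can transform $Y^{(m)}$ into a first-order Markov chain over a larger alphabet. To be more specific, define a new process $W^{(m)}$ such that

$$W^{(m)}_i=\left(Y^{(m)}_i,Y^{(m)}_{i+1},\cdots,Y^{(m)}_{i+m-1}\right),\qquad i=1,2,\cdots.$$

Apparently, $W^{(m)}$ is a first-order Markov chain over the alphabet $\mathcal{Y}^{m}$. Let $R^{(m)}$ be the matrix obtained by taking the $\alpha$-th power of each entry of the transition probability matrix of $W^{(m)}$, and let $\lambda^{(m)}$ be the largest eigenvalue of $R^{(m)}$. Recalling from (2) that

$$H_\alpha(Y^{(m)})=\frac{\log\lambda^{(m)}}{1-\alpha},$$

in order to derive the convergence rate of $\{H_\alpha(Y^{(m)})\}$, we only need to compare $\lambda^{(m)}$ and $\lambda^{(m+1)}$. Observing that $\lambda^{(m)}$ and $\lambda^{(m+1)}$ are the largest eigenvalues of two matrices whose dimensions are different, we first “upscale” the matrix $R^{(m)}$ by viewing $Y^{(m)}$ as an $(m+1)$-th order Markov chain with the corresponding $|\mathcal{Y}|^{m+1}$-dimensional transition probability matrix, whose entrywise $\alpha$-th power we denote by $\tilde{R}^{(m)}$. It can then be readily verified that $\tilde{R}^{(m)}$ has the same largest eigenvalue as $R^{(m)}$. Hence, it suffices for us to compare $\tilde{R}^{(m)}$ and $R^{(m+1)}$, both of which are of dimension $|\mathcal{Y}|^{m+1}$.

Assuming $\rho_F$ is small enough, the following theorem uses the previous observation to establish the exponential convergence of $H_\alpha(Y^{(m)})$ as $m \to \infty$.

###### Theorem 2.6.

If $\rho_F$ is sufficiently small, then $H_\alpha(Y^{(m)})$ converges to $H_\alpha(Y)$ exponentially as $m \to \infty$.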
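
The blocking construction above can be tried out numerically. The following minimal sketch computes $\lambda^{(m)}$ and $\log\lambda^{(m)}/(1-\alpha)$ for the $m$-th order Markov approximation of a small hidden Markov process. The hidden Markov parameters are hypothetical, the input chain is assumed to start from its stationary distribution, and a plain power iteration is used in place of a general eigenvalue solver; the blocked chain moves from the context $(y_1,\dots,y_m)$ to $(y_2,\dots,y_{m+1})$ with probability $p(y_{m+1}\mid y_1^m)$.

```python
import math
from itertools import product

def hmm_sequence_probability(init, trans, emit, observations):
    """p(y_1^n) of a hidden Markov process via the forward recursion."""
    states = range(len(init))
    forward = [init[x] * emit[x][observations[0]] for x in states]
    for y in observations[1:]:
        forward = [
            sum(forward[prev] * trans[prev][x] for prev in states) * emit[x][y]
            for x in states
        ]
    return sum(forward)

def markov_approx_renyi_rate(init, trans, emit, alpha, m, iterations=300):
    """log(lambda^(m)) / (1 - alpha) for the m-th order Markov approximation."""
    outputs = range(len(emit[0]))
    contexts = list(product(outputs, repeat=m))

    def prob(seq):
        return hmm_sequence_probability(init, trans, emit, seq)

    # Nonzero entries of R^(m): alpha-th powers of p(y_{m+1} | y_1^m).
    weight = {
        (c, y): (prob(c + (y,)) / prob(c)) ** alpha
        for c in contexts
        for y in outputs
    }
    # Power iteration on the blocked matrix; each row has |Y| nonzero entries.
    vec = {c: 1.0 for c in contexts}
    value = 1.0
    for _ in range(iterations):
        new = {c: sum(weight[c, y] * vec[c[1:] + (y,)] for y in outputs) for c in contexts}
        value = max(new.values())
        vec = {c: x / value for c, x in new.items()}
    return math.log(value) / (1 - alpha)

# Hypothetical hidden Markov process (stationary start, binary symmetric channel).
trans = [[0.9, 0.1], [0.2, 0.8]]
init = [2 / 3, 1 / 3]
emit = [[0.9, 0.1], [0.1, 0.9]]

for m in range(1, 6):
    print(m, markov_approx_renyi_rate(init, trans, emit, 2.0, m))
```

With a noiseless channel (identity emission matrix) the process is itself a first-order Markov chain, so the values for $m=1$ and $m=2$ coincide; this is a small instance of the observation that upscaling does not change the largest eigenvalue.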

###### Proof.

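According to Theorem 2.5, it suffices for us to show the exponential convergence of the sequence $\{\lambda^{(m)}\}$.

Let $\Delta_m \triangleq R^{(m+1)}-\tilde{R}^{(m)}$. It follows from Condition (ii) that the absolute value of each nonzero entry of $\Delta_m$ is upper bounded by $C_F\rho_F^{m}$. Applying the Collatz-Wielandt formula (see, e.g., [13]), we have

$$\begin{aligned}
\lambda^{(m+1)}
&=\max_{x>0}\min_i\frac{[R^{(m+1)}x]_i}{x_i}\\
&=\max_{x>0}\min_i\frac{[(\tilde{R}^{(m)}+\Delta_m)x]_i}{x_i}\\
&=\max_{x>0}\min_i\left\{\frac{[\tilde{R}^{(m)}x]_i}{x_i}+\frac{[\Delta_mx]_i}{x_i}\right\}\\
&\ge\max_{x>0}\left\{\min_i\frac{[\tilde{R}^{(m)}x]_i}{x_i}+\min_j\frac{[\Delta_mx]_j}{x_j}\right\},
\end{aligned}\tag{11}$$

where $x$ is a $|\mathcal{Y}|^{m+1}$-dimensional column vector and $x_i$ denotes the $i$-th component of $x$. Let the vector $v$ be the right eigenvector of $\tilde{R}^{(m)}$ for which the maximum in the Collatz-Wielandt formula for $\lambda^{(m)}$ is achieved (note from the Perron-Frobenius theorem [17, 22] that $v$ is a positive vector since $\tilde{R}^{(m)}$ is a nonnegative irreducible matrix). Then we continue from (11) as follows:

$$\begin{aligned}
\lambda^{(m+1)}
&\ge\min_i\frac{[\tilde{R}^{(m)}v]_i}{v_i}+\min_j\frac{[\Delta_mv]_j}{v_j}\\
&=\lambda^{(m)}+\min_j\frac{[\Delta_mv]_j}{v_j}\\
&\ge\lambda^{(m)}-C_F\rho_F^{m}\cdot|\mathcal{Y}|\cdot\max_{i,j}\frac{v_i}{v_j},
\end{aligned}\tag{12}$$

where we have used the fact that each row of $R^{(m+1)}$ has exactly $|\mathcal{Y}|$ strictly positive entries, so that each row of $\Delta_m$ has at most $|\mathcal{Y}|$ nonzero entries.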
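
The Collatz-Wielandt formula used in (11) and (12) says that, for a nonnegative irreducible matrix $A$ with largest eigenvalue $\lambda$, every positive vector $x$ satisfies $\min_i [Ax]_i/x_i \le \lambda \le \max_i [Ax]_i/x_i$, with equality at the Perron eigenvector. The following minimal sketch checks this numerically; the $2\times 2$ matrix is a hypothetical example, and power iteration is an assumed stand-in for a general eigenvalue routine.

```python
import math
import random

def perron_pair(matrix, iterations=500):
    """Largest eigenvalue and a positive eigenvector of a positive matrix."""
    size = len(matrix)
    vec = [1.0] * size
    value = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        value = max(new)
        vec = [x / value for x in new]
    return value, vec

def collatz_wielandt_bounds(matrix, x):
    """Return (min_i [Ax]_i / x_i, max_i [Ax]_i / x_i) for a positive vector x."""
    size = len(matrix)
    ratios = [sum(matrix[i][j] * x[j] for j in range(size)) / x[i] for i in range(size)]
    return min(ratios), max(ratios)

# Hypothetical positive matrix (entrywise square of a 2-state transition matrix).
A = [[0.81, 0.01], [0.16, 0.36]]
lam, v = perron_pair(A)

rng = random.Random(0)
for _ in range(5):
    x = [rng.uniform(0.1, 10.0) for _ in A]
    lower, upper = collatz_wielandt_bounds(A, x)
    print(f"{lower:.6f} <= {lam:.6f} <= {upper:.6f}")

# At the Perron eigenvector the two bounds meet.
lower, upper = collatz_wielandt_bounds(A, v)
print(math.isclose(lower, upper, rel_tol=1e-9))
```

In (12), the lower bound is evaluated at the eigenvector $v$ of $\tilde{R}^{(m)}$ rather than at the maximizer for $R^{(m+1)}$, which is why only an inequality is obtained.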

We now claim that for any $i$ and $j$, the ratio $v_i/v_j$ can be bounded by

 (CLCU</