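Let $\mathcal{X}$ be a finite alphabet. Let
$X_1^n = (X_1, X_2, \dots, X_n)$ be a sequence of random variables over $\mathcal{X}$ with distribution $p(\cdot)$, and let $x_1^n$ denote its realization. Given $\alpha > 0$ with $\alpha \neq 1$, the $\alpha$-th order Rényi entropy of $X_1^n$, first suggested by Alfred Rényi , is defined as
$$H_\alpha(X_1^n) = \frac{1}{1-\alpha}\log\sum_{x_1^n \in \mathcal{X}^n} p^\alpha(x_1^n).$$
Let $H(X_1^n) = -\sum_{x_1^n \in \mathcal{X}^n} p(x_1^n)\log p(x_1^n)$ denote the Shannon entropy of $X_1^n$. An easy application of L’Hôpital’s rule shows that
$$\lim_{\alpha \to 1} H_\alpha(X_1^n) = H(X_1^n). \tag{1}$$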
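The definition and the limit (1) can be checked numerically. The following is a minimal sketch, assuming natural logarithms and a hypothetical pmf of $X_1^2$ over a binary alphabet. The function names `renyi_entropy` and `shannon_entropy` are illustrative only.

```python
import math

def renyi_entropy(probs, alpha):
    """alpha-th order Rényi entropy (natural log) of a finite pmf; alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    # Zero-probability outcomes contribute 0 ** alpha = 0 for alpha > 0, so skip them.
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

def shannon_entropy(probs):
    """Shannon entropy (natural log) of a finite pmf."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical pmf of X_1^2 over {0,1}^2, listed in the order 00, 01, 10, 11.
pmf = [0.4, 0.1, 0.2, 0.3]
for alpha in (0.5, 0.99, 0.9999, 1.0001, 1.01, 2.0):
    print(f"alpha={alpha}: H_alpha={renyi_entropy(pmf, alpha):.6f}")
print(f"Shannon:      H={shannon_entropy(pmf):.6f}")
```

As $\alpha$ approaches $1$ from either side, the printed values approach the Shannon entropy, in line with (1). The Rényi entropy is also non-increasing in $\alpha$.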
Rényi entropy is a fundamental notion in a number of scientific and engineering disciplines, such as coding theory , chaotic dynamical systems , statistical mechanics , statistical inference , quantum mechanics , multi-fractal analysis , economics , guessing , hypothesis testing , and so forth.
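Now, consider a stationary stochastic process $X = \{X_n\}_{n=1}^{\infty}$ over the alphabet $\mathcal{X}$. Let
$$H(X) = \lim_{n\to\infty}\frac{1}{n}H(X_1^n)$$
be the Shannon entropy rate of $X$. Then, the $\alpha$-th order Rényi entropy rate of $X$ is defined as
$$H_\alpha(X) = \lim_{n\to\infty}\frac{1}{n}H_\alpha(X_1^n)$$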
when the limit exists. Rényi entropy has been extensively studied, but some basic properties of the Rényi entropy rate have long been poorly understood. First, the fundamental problem of whether the Rényi entropy rate of a general stationary ergodic process is well-defined remains open. Second, regarding its connection with the Shannon entropy rate, (1) naturally suggests the following conjecture:
Conjecture 1.1. Let $X$ be a stationary ergodic process. Then
$$\lim_{\alpha\to 1} H_\alpha(X) = H(X).$$
However, this conjecture has been neither proved nor disproved in the literature.
On the positive side, some special cases have been handled and admit clean solutions. When $X$ is an independent and identically distributed (i.i.d.) process, $H_\alpha(X)$ reduces to $H_\alpha(X_1)$. For a finite-state ergodic Markov process $X$, using the Perron-Frobenius theory (see, e.g., [17, 22]), it has been proved in  that
$$H_\alpha(X) = \frac{1}{1-\alpha}\log\lambda_\alpha \tag{2}$$
and that $H_\alpha(X)$ converges to the Shannon entropy rate $H(X)$ as $\alpha$ goes to $1$, where
$\lambda_\alpha$ is the largest real eigenvalue of the $|\mathcal{X}|$-dimensional matrix $R_\alpha = (p_{ij}^\alpha)_{i,j\in\mathcal{X}}$ with $p_{ij} = P(X_2 = j \mid X_1 = i)$. It turns out that similar results are also valid for mixing processes. For a weakly -mixing process $X$, it has been shown in  that $H_\alpha(X)$ is well-defined for  and always goes to $H(X)$ as $\alpha$ goes to $1$. On the other hand, using Kingman’s subadditive ergodic theorem , it has been proved in  that the Rényi entropy rate of any order exists for the so-called weakly mixing processes.
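Formula (2) is straightforward to evaluate numerically. The following is a minimal sketch, assuming natural logarithms and a hypothetical two-state transition matrix. It computes the Perron root of $R_\alpha$ with NumPy and compares the result with the Shannon entropy rate $-\sum_i \pi_i \sum_j p_{ij}\log p_{ij}$, where $\pi$ is the stationary distribution. The function names are illustrative only.

```python
import numpy as np

def markov_renyi_rate(P, alpha):
    """Formula (2): H_alpha(X) = log(lambda_alpha) / (1 - alpha), where lambda_alpha
    is the largest real eigenvalue of the entrywise power R_alpha = (p_ij ** alpha)."""
    R = np.asarray(P, dtype=float) ** alpha
    # For a nonnegative irreducible matrix, the Perron root is real and no smaller
    # than the real part of any other eigenvalue.
    lam = np.max(np.linalg.eigvals(R).real)
    return np.log(lam) / (1.0 - alpha)

def markov_shannon_rate(P):
    """Shannon entropy rate -sum_i pi_i sum_j p_ij log p_ij of an ergodic chain."""
    P = np.asarray(P, dtype=float)
    w, v = np.linalg.eig(P.T)                       # left eigenvectors of P
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()                              # stationary distribution
    logP = np.log(P, out=np.zeros_like(P), where=P > 0)
    return float(-np.sum(pi[:, None] * P * logP))

P = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical two-state transition matrix
for alpha in (0.5, 0.999, 1.001, 2.0):
    print(alpha, markov_renyi_rate(P, alpha))
print("Shannon rate:", markov_shannon_rate(P))
```

The same function can also be used to confirm the i.i.d. remark. With identical rows, $R_\alpha$ has rank one, and (2) reduces to $H_\alpha(X_1)$.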
The contributions of this paper can be summarized as follows. We first focus on the Rényi entropy rate of a special family of stationary ergodic processes, which contains hidden Markov processes  as special cases. More precisely, we examine a random process with the “uniform boundedness” and “exponential forgetting” properties (see Section 2 for details). Using a refined Bernstein blocking method , we first show that the Rényi entropy rate $H_\alpha(X)$ exists and that $\frac{1}{n}H_\alpha(X_1^n)$ converges to $H_\alpha(X)$ at rate $O(n^{-\beta})$, where the exponent $\beta$ can be arbitrarily close to $1$. Note that for the special case $\alpha = 1$ (the Shannon entropy case), it is well known (see, e.g., ) that the convergence rate is $O(1/n)$. So, in some sense, the derived convergence rate is sharp. Borrowing results from the theory of nonnegative matrices, we also establish that $H_\alpha(X)$ can be exponentially approximated by the Rényi entropy rate of the approximating Markov process as the Markov order goes to infinity. In contrast to the polynomial convergence rate of $\frac{1}{n}H_\alpha(X_1^n)$, this exponential convergence rate allows us to compute $H_\alpha(X)$ more efficiently, at least in some special situations.
We then examine the Rényi entropy rate of general stationary ergodic processes, for which we show that Conjecture 1.1 fails. Note that the answer to Conjecture 1.1 is clearly negative if the ergodicity assumption is dropped: the example in Section IV of  shows that for some reducible Markov chain, $H_\alpha(X)$ fails to converge to $H(X)$ as $\alpha$ goes to $1$. The existing results for i.i.d., Markov  and weakly -mixing processes  might suggest a positive answer to Conjecture 1.1. Nevertheless, we will construct a stationary ergodic counterexample whose Rényi entropy rate does not converge to the Shannon entropy rate as the Rényi order goes to $1$. The main tool employed in the construction is the cutting and stacking method, a well-known technique in ergodic theory that has received relatively little attention in information theory.
The remainder of this paper is organized as follows. First, we focus on the special random process mentioned above. We show in Section 2.1 that the normalized Rényi entropy $\frac{1}{n}H_\alpha(X_1^n)$ converges to $H_\alpha(X)$ polynomially. By introducing the Markov approximation sequence, we prove in Section 2.2 that the Rényi entropy rates of this sequence of Markov chains converge to $H_\alpha(X)$, and moreover, that the rate of convergence is exponential. Next, we turn to the construction of the stationary ergodic counterexample that disproves Conjecture 1.1. Some preliminaries on the cutting and stacking method are given in Section 3.1. Then, based on this method, the construction of our counterexample is presented in Section 3.2, followed by the derivation of several of its properties in Section 3.3. As elaborated on in Section 3.4, these properties immediately imply that, as $\alpha$ goes to $1$, the Rényi entropy rate of the constructed stationary ergodic process fails to converge to its Shannon entropy rate.
2 Rényi Entropy Rate of a Special Class of Random Processes
In this section, we focus on a stationary process $X$ satisfying the following two conditions:
(i) uniform boundedness: there exist such that for any realization sequence ,
(ii) exponential forgetting: for any fixed , there exist and such that for any and for any two realization sequences and with , it holds that
A typical example satisfying the above conditions is given below.
A hidden Markov chain is a finite-state Markov chain observed through a discrete memoryless channel. To be more specific, let $\mathcal{Y}$ be the input alphabet, $\mathcal{X}$ be the output alphabet, $Y = \{Y_n\}$ be a finite-state Markov chain over $\mathcal{Y}$, and $p(x \mid y)$, $y \in \mathcal{Y}$, $x \in \mathcal{X}$, be the channel transition probabilities. Then the distribution of a hidden Markov process $X = \{X_n\}$ is given by
$$p(x_1^n) = \sum_{y_1^n \in \mathcal{Y}^n} p(y_1)\prod_{i=2}^{n} p(y_i \mid y_{i-1})\prod_{i=1}^{n} p(x_i \mid y_i)$$
for any realization sequence $x_1^n \in \mathcal{X}^n$. If we further assume that $X$ satisfies the following two conditions:
the input Markov chain is irreducible and aperiodic,
the channel transition probability matrix is strictly positive,
then it has been verified in  that $X$ satisfies Conditions (i) and (ii). Here, we remark that, as special cases, i.i.d. processes and irreducible and aperiodic finite-state Markov chains also satisfy Conditions (i) and (ii).
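The sum over $y_1^n$ in the hidden Markov formula has $|\mathcal{Y}|^n$ terms. In practice, it is usually evaluated with the forward recursion. The following is a minimal sketch, assuming hypothetical parameters: an irreducible, aperiodic two-state input chain started from its stationary law, and a binary symmetric channel with crossover probability $0.05$, so the channel matrix is strictly positive. It checks the recursion against the brute-force sum. The function names are illustrative only.

```python
import itertools
import numpy as np

def hmm_prob(x, mu, A, B):
    """p(x_1^n) of a hidden Markov process via the forward recursion.
    mu: initial law of the input chain, A: its transition matrix,
    B[y, x]: channel probability of output x given input y."""
    f = mu * B[:, x[0]]            # f[y] = P(X_1 = x_1, Y_1 = y)
    for xi in x[1:]:
        f = (f @ A) * B[:, xi]     # f[y] = P(X_1^i = x_1^i, Y_i = y)
    return float(f.sum())

def hmm_prob_bruteforce(x, mu, A, B):
    """The same probability as an explicit sum over all input sequences y_1^n."""
    total = 0.0
    for y in itertools.product(range(len(mu)), repeat=len(x)):
        term = mu[y[0]] * B[y[0], x[0]]
        for i in range(1, len(x)):
            term *= A[y[i - 1], y[i]] * B[y[i], x[i]]
        total += term
    return total

# Hypothetical parameters: stationary start mu (mu @ A == mu) and a BSC(0.05).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
mu = np.array([2 / 3, 1 / 3])
B = np.array([[0.95, 0.05], [0.05, 0.95]])
print(hmm_prob((0, 0, 1, 1), mu, A, B), hmm_prob_bruteforce((0, 0, 1, 1), mu, A, B))
```

Because the input chain starts from its stationary law, the output process is stationary. This is the setting in which Conditions (i) and (ii) are used.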
In the remainder of this section, we will first prove that for any fixed $\alpha$, $H_\alpha(X)$ exists and that the convergence rate of $\frac{1}{n}H_\alpha(X_1^n)$ is polynomial. Then, making use of the Markov approximation, we show that when is small enough, the Rényi entropy rate of the Markov approximating sequence converges exponentially to $H_\alpha(X)$. Note that the requirement for to be small can be justified in some practical situations: for a binary symmetric channel operating in the high signal-to-noise ratio regime, or roughly, when its crossover probability is “close” to $0$, it has been observed (see, e.g., ) that is also “close” to .
Before moving to the next section, let us introduce the following definition.
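For a stochastic process $X$, its $m$-th order Markov approximation  is a stochastic process $X^{(m)}$ with distribution $p^{(m)}$ such that:
$X^{(m)}$ is an $m$-th order Markov process, that is, for any realization $x_1^n$ with $n > m$,
$$p^{(m)}(x_n \mid x_1^{n-1}) = p^{(m)}(x_n \mid x_{n-m}^{n-1});$$
the $(m+1)$-dimensional distributions of $X$ and $X^{(m)}$ are the same, namely,
$$p^{(m)}(x_1^{m+1}) = p(x_1^{m+1}) \quad \text{for all } x_1^{m+1} \in \mathcal{X}^{m+1}.$$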
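For a stationary process, one way to satisfy both requirements of Definition 2.2 is as follows. Start from the $m$-dimensional marginal, then extend with the kernel $p(x_{i-m}^{i})/p(x_{i-m}^{i-1})$. The following is a minimal sketch of that construction, assuming a hypothetical stationary hidden Markov example, so that $p(x_1^n)$ is concrete and not itself Markov. The names `p` and `markov_approx_prob` are illustrative only.

```python
import itertools
import numpy as np

# Hypothetical example process: a stationary hidden Markov process (input chain A
# started from its stationary law mu, binary symmetric channel B), chosen only so
# that p(x_1^n) is concrete and non-Markov.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
mu = np.array([2 / 3, 1 / 3])
B = np.array([[0.95, 0.05], [0.05, 0.95]])

def p(x):
    """p(x_1^n) of the example process via the forward recursion; p(()) = 1."""
    if not x:
        return 1.0
    f = mu * B[:, x[0]]
    for xi in x[1:]:
        f = (f @ A) * B[:, xi]
    return float(f.sum())

def markov_approx_prob(x, m):
    """p^(m)(x_1^n) for one construction of the m-th order Markov approximation:
    start from p(x_1^m), then multiply by p(x_{i-m}^i) / p(x_{i-m}^{i-1})."""
    x = tuple(x)
    if len(x) <= m:
        return p(x)            # p^(m) and p agree on blocks of length <= m
    prob = p(x[:m])
    for i in range(m, len(x)):
        ctx = x[i - m:i]       # the m most recent symbols
        denom = p(ctx)
        prob *= p(ctx + (x[i],)) / denom if denom > 0 else 0.0
    return prob

x = (0, 1, 1, 0, 1)
print(p(x), markov_approx_prob(x, 1), markov_approx_prob(x, 3))
```

By construction, the conditional law of each symbol depends only on the previous $m$ symbols. On blocks of length $m+1$, the construction returns exactly $p$.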
If $X$ satisfies Conditions (i) and (ii), then for any $m$, its $m$-th order Markov approximation also satisfies these two conditions with the same constants (which are independent of $m$).
Throughout the remainder of this section, we will always assume that $\alpha \neq 1$, since $\alpha = 1$ corresponds to the Shannon entropy rate case. Furthermore, we always use $X$ to denote a stationary process satisfying Conditions (i) and (ii) and $X^{(m)}$ to denote the $m$-th order Markov approximation of $X$.
2.1 Convergence of $\frac{1}{n}H_\alpha(X_1^n)$
The following theorem establishes the existence of the Rényi entropy rate $H_\alpha(X)$. Moreover, it establishes the convergence of $\frac{1}{n}H_\alpha(X_1^n)$ to $H_\alpha(X)$ and gives a rate of convergence. By Remark 2.3, the theorem also applies to the $m$-th order Markov approximation for any $m$.
For any , there exists a constant such that for all ,
We only prove the theorem for the case , since the cases and can be similarly handled.
For any constant , let
Now we use the Bernstein blocking method (see ) to consecutively partition the sequence into small pieces of length , and . To be more specific, define
and their truncated versions
Then, using the fact that for , the -sequences associated with and are both of length and their index sets are non-overlapping, we have
where for and , we have used Condition to drop all ’s and replaced all ’s by their truncated versions; for , we have applied Conditions and to replace all ’s by their truncated versions; and for , we have applied Condition to add .
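The index bookkeeping of the Bernstein blocking step can be illustrated in isolation. The exact block lengths used in the proof are not reproduced here. The following is a minimal sketch with hypothetical parameters `p` (big blocks) and `q` (small separating blocks). It splits $\{1, \dots, n\}$ into alternating big and small blocks plus a leftover block, which mirrors the idea that big blocks sit far enough apart for exponential forgetting to make their probabilities nearly factorize.

```python
def bernstein_blocks(n, p, q):
    """Split the index set {1, ..., n} into alternating 'big' blocks of length p and
    'small' separating blocks of length q, plus a final leftover block of length
    less than p + q. Returns (big_blocks, small_blocks, leftover) as ranges."""
    if p < 1 or q < 0:
        raise ValueError("need p >= 1 and q >= 0")
    big, small = [], []
    start = 1
    while start + p + q - 1 <= n:          # room for one more big + small pair
        big.append(range(start, start + p))
        small.append(range(start + p, start + p + q))
        start += p + q
    return big, small, range(start, n + 1)

big, small, leftover = bernstein_blocks(23, 5, 2)
print([list(b) for b in big])
print([list(s) for s in small], list(leftover))
```

The blocks are pairwise disjoint and together cover $\{1, \dots, n\}$. This non-overlapping structure is what the argument above uses when it treats the big-block sequences separately.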
Taking logarithms and dividing both sides of (3) by , we obtain
Note that and imply
for sufficiently large . It then follows that
which immediately implies
for some constant . Applying a parallel argument to the other direction, we obtain that for sufficiently large ,
Now consider any with , where is a sufficiently large number to be determined later. Pick a number between and (e.g., ). Let be the positive integer such that
Then, and . Let
where follows from the inequality (6) and follows from the fact that for any . For any given , by choosing a sufficiently large such that
we derive from (2.1) that
for any Thus the sequence is Cauchy, and thereby convergent. Furthermore, for any positive integers and with , we have
Then, letting tend to infinity, we have, for all sufficiently large ,
The proof is then complete with an appropriately chosen common constant for all . ∎
2.2 Convergence of
When it comes to the computation of $H_\alpha(X)$, the convergence of $\frac{1}{n}H_\alpha(X_1^n)$ established in Theorem 2.4 may be too slow for practical use. In this section, we show that under some additional assumptions, $H_\alpha(X)$ can be approximated by another, exponentially convergent sequence that can be computed efficiently.
Our motivation comes from the fact that the Rényi entropy rate of a Markov process has the simple formula (2). For any $m$, let $X^{(m)}$ be the $m$-th order Markov approximation of $X$. It is obvious from Definition 2.2 that as $m$ goes to infinity, $X^{(m)}$ converges in distribution to the original process $X$. Moreover, we note from  that $H_\alpha(X^{(m)})$ is well-defined for all $m$. Indeed, we have the following theorem.
Theorem 2.5. $\lim_{m\to\infty} H_\alpha(X^{(m)}) = H_\alpha(X)$.
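Note that for any $m$ and $n$, we have
$$\big|H_\alpha(X^{(m)}) - H_\alpha(X)\big| \le \Big|H_\alpha(X^{(m)}) - \tfrac{1}{n}H_\alpha\big((X^{(m)})_1^n\big)\Big| + \Big|\tfrac{1}{n}H_\alpha\big((X^{(m)})_1^n\big) - \tfrac{1}{n}H_\alpha(X_1^n)\Big| + \Big|\tfrac{1}{n}H_\alpha(X_1^n) - H_\alpha(X)\Big|. \tag{8}$$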
We first deal with the first and third terms of the RHS of (8). It follows from Theorem 2.4 (applied to and , which satisfy Conditions and ) that for any given , there exists such that for any and any ,
Now, for the second term in the RHS of (8), we have
Replacing ’s with simpler notations ’s and ’s, we continue to derive
where Condition is used in . Noting that , , we deduce that there exists such that . Setting we have , which, together with (9), implies that for the given above, there exists an such that for all ,
It then follows from (8) that
as long as . The desired convergence then follows from the arbitrariness of . ∎
Having established the convergence of $H_\alpha(X^{(m)})$ to $H_\alpha(X)$, we now turn to its convergence rate.
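First of all, for any fixed $m$, by a usual $m$-step blocking argument, we can transform $X^{(m)}$ into a first-order Markov chain over a larger alphabet. To be more specific, define a new process $Y = \{Y_n\}$ such that
$$Y_n = \big(X^{(m)}_n, X^{(m)}_{n+1}, \dots, X^{(m)}_{n+m-1}\big).$$
Apparently, $Y$ is a first-order Markov chain over the alphabet $\mathcal{X}^m$. Let $P^{(m)}$ be the transition probability matrix of $Y$, let $R^{(m)}_\alpha$ be the matrix obtained by taking the $\alpha$-th power of each entry of $P^{(m)}$, and let $\lambda^{(m)}_\alpha$ be the largest eigenvalue of $R^{(m)}_\alpha$. Recall from (2) that
$$H_\alpha(X^{(m)}) = \frac{1}{1-\alpha}\log\lambda^{(m)}_\alpha.$$
Therefore, in order to derive the convergence rate of $H_\alpha(X^{(m)})$, we only need to compare $\lambda^{(m)}_\alpha$ and $\lambda^{(m+1)}_\alpha$. Since these are the largest eigenvalues of two matrices of different dimensions, we first “upscale” the matrix $R^{(m)}_\alpha$ by viewing $X^{(m)}$ as an $(m+1)$-th order Markov chain with the corresponding $|\mathcal{X}|^{m+1}$-dimensional transition probability matrix $\tilde{P}^{(m)}$. It can then be readily verified that the entrywise $\alpha$-th power $\tilde{R}^{(m)}_\alpha$ of $\tilde{P}^{(m)}$ has the same largest eigenvalue as $R^{(m)}_\alpha$. Hence, it suffices for us to compare $\tilde{R}^{(m)}_\alpha$ and $R^{(m+1)}_\alpha$, both of which are of dimension $|\mathcal{X}|^{m+1}$.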
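Together, the blocking step and formula (2) give a direct way to compute the Rényi entropy rate of the $m$-th order Markov approximation. The following is a minimal sketch, assuming natural logarithms, $m \ge 1$, and a hypothetical stationary hidden Markov example process. It builds the $|\mathcal{X}|^m \times |\mathcal{X}|^m$ transition matrix of the block chain, raises its entries to the power $\alpha$, and takes the Perron root. The function names are illustrative only.

```python
import itertools
import numpy as np

# Hypothetical example process: a stationary hidden Markov process (input chain A
# started from its stationary law mu, binary symmetric channel B).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
mu = np.array([2 / 3, 1 / 3])
B = np.array([[0.95, 0.05], [0.05, 0.95]])
K = B.shape[1]  # size of the output alphabet

def p(x):
    """p(x_1^n) by the forward recursion (p of the empty word is 1)."""
    if not x:
        return 1.0
    f = mu * B[:, x[0]]
    for xi in x[1:]:
        f = (f @ A) * B[:, xi]
    return float(f.sum())

def block_transition_matrix(m):
    """Transition matrix of the block chain (X_n, ..., X_{n+m-1}) under the m-th
    order Markov approximation: state (x_1..x_m) moves to (x_2..x_m, a) with
    probability p(x_1^m a) / p(x_1^m)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    states = list(itertools.product(range(K), repeat=m))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        for a in range(K):
            P[index[s], index[s[1:] + (a,)]] = p(s + (a,)) / p(s)
    return P

def renyi_rate_markov_approx(m, alpha):
    """log(Perron root of the entrywise alpha-power) / (1 - alpha), as in (2)."""
    R = block_transition_matrix(m) ** alpha
    lam = np.max(np.linalg.eigvals(R).real)
    return float(np.log(lam) / (1.0 - alpha))

for m in range(1, 6):
    print(m, renyi_rate_markov_approx(m, 2.0))
```

Each row of the block matrix has exactly $|\mathcal{X}|$ positive entries, one per appended symbol. The matrix size grows like $|\mathcal{X}|^m$, so this direct computation is practical only for moderate $m$.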
Assuming is small enough, the following theorem uses the previous observation to establish the exponential convergence of $H_\alpha(X^{(m)})$ as $m \to \infty$.
If , then $H_\alpha(X^{(m)})$ converges to $H_\alpha(X)$ exponentially as $m \to \infty$.
According to Theorem 2.5, it suffices for us to show the exponential convergence of the sequence .
Let . It follows from Condition that the absolute value of each nonzero entry of is upper bounded by . Applying the Collatz-Wielandt formula (see, e.g., ), we have
where is a column vector and denote the -th component of . Let the vector be the right eigenvector of such that the equality is achieved (note from the Perron-Frobenius theorem [17, 22] that is a positive vector since is a nonnegative irreducible matrix). Then we continue from (2.2) as follows:
where we have used the fact that each row of has exactly strictly positive entries.
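The Collatz-Wielandt formula used above bounds the Perron root of a nonnegative irreducible matrix $M$ by $\min_i (Mx)_i/x_i$ and $\max_i (Mx)_i/x_i$ for any strictly positive vector $x$. The bounds are tight at the Perron eigenvector. The following is a minimal sketch that checks this numerically, assuming a hypothetical randomly generated strictly positive matrix in place of the specific matrices in the proof.

```python
import numpy as np

def collatz_wielandt_bounds(M, x):
    """min_i (Mx)_i / x_i <= Perron root of M <= max_i (Mx)_i / x_i,
    valid for a nonnegative irreducible M and any strictly positive vector x."""
    ratios = (M @ x) / x
    return float(ratios.min()), float(ratios.max())

rng = np.random.default_rng(0)
M = rng.random((6, 6)) ** 1.5                        # hypothetical positive test matrix
rho = float(np.max(np.abs(np.linalg.eigvals(M))))    # spectral radius = Perron root

x = rng.random(6) + 0.1                              # an arbitrary positive vector
lo, hi = collatz_wielandt_bounds(M, x)
print(lo, rho, hi)

# The bounds become tight at the (positive) right Perron eigenvector.
w, V = np.linalg.eig(M)
v = np.abs(V[:, np.argmax(w.real)].real)
print(collatz_wielandt_bounds(M, v))
```

Choosing the test vector to be the eigenvector of a nearby matrix, as the proof does, turns these bounds into a quantitative comparison of two Perron roots.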
We now claim that for any , can be bounded by