I Introduction
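The Rényi entropy [11] of order $\alpha$ of a discrete distribution (probability mass function) $p$ with finite support $\mathbb{S}$, defined as
$$H_\alpha(p) = \frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} p(x)^\alpha$$
for $\alpha>0$, $\alpha\neq 1$, is a generalization of the Shannon entropy, $H(p)$, in that $\lim_{\alpha\to 1}H_\alpha(p)=H(p)$. (For ease of reference, a table summarising the Shannon entropy and cross-entropy measures as well as the Kullback-Leibler (KL) divergence is provided in Appendix A.) Similarly, the Rényi divergence (of order $\alpha$) between two discrete distributions $p$ and $q$ with common finite support $\mathbb{S}$, given by
$$D_\alpha(p\Vert q) = \frac{1}{\alpha-1}\ln\sum_{x\in\mathbb{S}} p(x)^\alpha q(x)^{1-\alpha},$$
reduces to the KL divergence, $D_{\mathrm{KL}}(p\Vert q)$, as $\alpha\to 1$.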
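As a quick numerical illustration of the two limits just stated, the following sketch evaluates the Rényi entropy and divergence at an order close to one and compares them with their Shannon counterparts. The pmfs `p` and `q` and all function names are hypothetical, chosen only for illustration, and natural logarithms are assumed.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy (in nats) of a pmf p, for alpha > 0, alpha != 1."""
    return math.log(sum(px ** alpha for px in p)) / (1.0 - alpha)

def renyi_divergence(p, q, alpha):
    """Renyi divergence (in nats) between pmfs p and q on a common support."""
    total = sum(px ** alpha * qx ** (1.0 - alpha) for px, qx in zip(p, q))
    return math.log(total) / (alpha - 1.0)

def shannon_entropy(p):
    return -sum(px * math.log(px) for px in p)

def kl_divergence(p, q):
    return sum(px * math.log(px / qx) for px, qx in zip(p, q))

# Illustrative pmfs with full common support (assumed values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

alpha = 1.0 + 1e-6  # order close to one
print(renyi_entropy(p, alpha), shannon_entropy(p))
print(renyi_divergence(p, q, alpha), kl_divergence(p, q))
```

Each printed pair should agree to several decimal places, in line with the limiting behaviour as the order tends to one.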
Since the introduction of these measures, several other Rényi-type information measures have been put forward, each obeying the condition that its limit as $\alpha$ goes to one reduces to a Shannon-type information measure (e.g., see [16] and the references therein for three different order-$\alpha$ extensions of Shannon’s mutual information due to Sibson, Arimoto and Csiszár).
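Many of these definitions admit natural counterparts in the (absolutely) continuous case (i.e., when the involved distributions have a probability density function (pdf)), giving rise to information measures such as the Rényi differential entropy for pdf $p$ with support $\mathbb{S}$,
$$h_\alpha(p) = \frac{1}{1-\alpha}\ln\int_{\mathbb{S}} p(x)^\alpha\,dx,$$
and the Rényi (differential) divergence between pdfs $p$ and $q$ with common support $\mathbb{S}$,
$$D_\alpha(p\Vert q) = \frac{1}{\alpha-1}\ln\int_{\mathbb{S}} p(x)^\alpha q(x)^{1-\alpha}\,dx.$$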
The Rényi cross-entropy between distributions $p$ and $q$ is an analogous generalization of the Shannon cross-entropy $H(p;q)$. Two definitions for this measure have been suggested. In [12], mirroring the fact that Shannon’s cross-entropy satisfies $H(p;q) = D_{\mathrm{KL}}(p\Vert q) + H(p)$, the authors define the Rényi cross-entropy as
$$\tilde{H}_\alpha(p;q) := D_\alpha(p\Vert q) + H_\alpha(p). \tag{1}$$
In contrast, prior to [12], the authors of [15] introduced the Rényi cross-entropy in their study of the so-called shifted Rényi measures (expressed as the logarithm of weighted generalized power means). Specifically, upon simplifying Definition 6 in [15], their expression for the Rényi cross-entropy between distributions $p$ and $q$ is given by
$$H_\alpha(p;q) := \frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} p(x)\,q(x)^{\alpha-1}. \tag{2}$$
For the continuous case, the definition in (2) can be readily converted to yield the Rényi differential cross-entropy between pdfs $p$ and $q$:
$$h_\alpha(p;q) := \frac{1}{1-\alpha}\ln\int_{\mathbb{S}} p(x)\,q(x)^{\alpha-1}\,dx. \tag{3}$$
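To make the difference between the two discrete definitions concrete, the sketch below implements both, assuming that (1) is the sum of the Rényi divergence and the Rényi entropy and that (2) is the logarithm of $\sum_x p(x)q(x)^{\alpha-1}$ scaled by $1/(1-\alpha)$. The function names and the pmfs are hypothetical.

```python
import math

def cross_entropy_def1(p, q, alpha):
    """Definition (1): Renyi divergence D_alpha(p||q) plus Renyi entropy H_alpha(p)."""
    div = math.log(sum(px ** alpha * qx ** (1.0 - alpha) for px, qx in zip(p, q))) / (alpha - 1.0)
    ent = math.log(sum(px ** alpha for px in p)) / (1.0 - alpha)
    return div + ent

def cross_entropy_def2(p, q, alpha):
    """Definition (2): (1/(1-alpha)) * ln sum_x p(x) q(x)^(alpha-1)."""
    return math.log(sum(px * qx ** (alpha - 1.0) for px, qx in zip(p, q))) / (1.0 - alpha)

def shannon_cross_entropy(p, q):
    return -sum(px * math.log(qx) for px, qx in zip(p, q))

# Illustrative pmfs (assumed values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

for alpha in (0.5, 2.0, 1.0 + 1e-6):
    print(alpha, cross_entropy_def1(p, q, alpha), cross_entropy_def2(p, q, alpha))
print("Shannon cross-entropy:", shannon_cross_entropy(p, q))
```

Both functions coincide with the Rényi entropy of `p` when `q` equals `p`, and both approach the Shannon cross-entropy as the order tends to one, but they generally differ for other orders.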
As the Rényi differential divergence and entropy were already calculated for numerous distributions in [5] and [14], respectively, determining the Rényi differential cross-entropy using the definition in (1) is straightforward. As such, this paper’s focus is to establish closed-form expressions of the Rényi differential cross-entropy as defined in (3) for various distributions, as well as to derive the Rényi cross-entropy rate for two important classes of sources with memory, Gaussian and Markov sources.
Motivation for determining formulae for the Rényi cross-entropy extends beyond idle curiosity. The Shannon differential cross-entropy was used as a loss function for the design of deep learning generative adversarial networks (GANs) [6]. Recently, the Rényi differential cross-entropy measures in (3) and (1) were used in [1, 2] and [12], respectively, to generalize the original GAN loss function. It is shown in [1] and [2] that the resulting Rényi-centric generalized loss function preserves the equilibrium point satisfied by the original GAN based on the Jensen-Rényi divergence [8], a natural extension of the Jensen-Shannon divergence [9]. In [12], a different Rényi-type generalized loss function is obtained and is shown to benefit from stability properties. Improved stability and system performance are shown in [1, 2] and [12] by virtue of the parameter $\alpha$, which can be judiciously used to fine-tune the adopted generalized loss functions; these recover the original GAN loss function as $\alpha\to 1$.
The rest of this paper is organised as follows. In Section II, basic properties of the Rényi cross-entropy are examined. In Section III, the Rényi differential cross-entropy for members of the exponential family is calculated. In Section IV, the Rényi differential cross-entropy between two different distributions is obtained. In Section V, the Rényi differential cross-entropy rate is derived for stationary Gaussian sources. Finally, in Section VI, the Rényi cross-entropy rate is established for finite-alphabet time-invariant Markov sources.
II Basic Properties of the Rényi cross-entropy and differential cross-entropy
For the Rényi cross-entropy $H_\alpha(p;q)$ to deserve its name it would be preferable that it satisfies at least two key properties: it reduces to the Rényi entropy when $p=q$ and its limit as $\alpha$ goes to one is the Shannon cross-entropy. Similarly, it is desirable that the Rényi differential cross-entropy $h_\alpha(p;q)$ reduces to the Rényi differential entropy when $p=q$ and its limit as $\alpha$ tends to one yields the Shannon differential cross-entropy. In both cases, the former property is trivial, and the latter property was proven in [2] for the continuous case under some finiteness conditions (in the discrete case, the result holds directly via L’Hôpital’s rule).
It is also proven in [2] that the Rényi differential cross-entropy $h_\alpha(p;q)$ is non-increasing in $\alpha$ by showing that its derivative with respect to $\alpha$ is non-positive. The same monotonicity property holds in the discrete case.
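The monotonicity property in the discrete case can be checked numerically. The sketch below evaluates the discrete Rényi cross-entropy, taken here to be $\frac{1}{1-\alpha}\ln\sum_x p(x)q(x)^{\alpha-1}$, on an increasing grid of orders; the pmfs and the grid are assumed values used only for illustration.

```python
import math

def renyi_cross_entropy(p, q, alpha):
    """(1/(1-alpha)) * ln sum_x p(x) q(x)^(alpha-1), for alpha > 0, alpha != 1."""
    return math.log(sum(px * qx ** (alpha - 1.0) for px, qx in zip(p, q))) / (1.0 - alpha)

# Illustrative pmfs and an increasing grid of orders (assumed values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
orders = [0.25, 0.5, 0.75, 1.5, 2.0, 4.0, 8.0]

values = [renyi_cross_entropy(p, q, a) for a in orders]
is_nonincreasing = all(u >= v for u, v in zip(values, values[1:]))

for a, v in zip(orders, values):
    print(a, v)
print("non-increasing on the grid:", is_nonincreasing)
```

A grid check of this kind is only a sanity test on one example, not a substitute for the derivative argument in [2].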
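Like its Shannon counterpart, the Rényi cross-entropy is non-negative ($H_\alpha(p;q)\ge 0$), while the Rényi differential cross-entropy can be negative. This is easily verified when, for example, $p$ and $q$ are both Gaussian (normal) distributions with zero mean and a sufficiently small common variance, and parallels the same lack of non-negativity of the Shannon differential cross-entropy.

We close this section by deriving the cross-entropy limit, $\lim_{\alpha\to\infty}H_\alpha(p;q)$. To begin with, for any constant $c>0$, we have
$$\lim_{\alpha\to\infty}\frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} c\,q(x)^{\alpha-1} = \lim_{\alpha\to\infty}\left[\frac{\ln c}{1-\alpha} + \frac{2-\alpha}{1-\alpha}H_{\alpha-1}(q)\right] = -\ln q_M, \tag{4}$$
where $q_M := \max_{x\in\mathbb{S}} q(x)$ and where we have used the fact that for the Rényi entropy, $\lim_{\alpha\to\infty}H_\alpha(q) = -\ln q_M$. Now, denoting the minimum and maximum values of $p$ over $\mathbb{S}$ by $p_m$ and $p_M$, respectively, we have that for $\alpha>1$,
$$\frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} p(x)\,q(x)^{\alpha-1} \le \frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} p_m\,q(x)^{\alpha-1}$$
and
$$\frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} p(x)\,q(x)^{\alpha-1} \ge \frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}} p_M\,q(x)^{\alpha-1},$$
and hence by (4) we obtain
$$\lim_{\alpha\to\infty}H_\alpha(p;q) = -\ln q_M. \tag{5}$$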
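The limit in (5) can be observed numerically. The following sketch assumes the limiting value is minus the logarithm of the largest probability under $q$, and evaluates the cross-entropy at increasingly large orders using a log-sum-exp formulation to avoid underflow. The pmfs and the function name are hypothetical.

```python
import math

def renyi_cross_entropy_stable(p, q, alpha):
    """(1/(1-alpha)) * ln sum_x p(x) q(x)^(alpha-1), computed via log-sum-exp."""
    logs = [math.log(px) + (alpha - 1.0) * math.log(qx) for px, qx in zip(p, q)]
    m = max(logs)
    log_sum = m + math.log(sum(math.exp(t - m) for t in logs))
    return log_sum / (1.0 - alpha)

# Illustrative pmfs (assumed values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

limit = -math.log(max(q))  # assumed form of the limit in (5)
for alpha in (10.0, 100.0, 1000.0, 10000.0):
    print(alpha, renyi_cross_entropy_stable(p, q, alpha), limit)
```

The gap to the limit shrinks roughly like $1/\alpha$, which is consistent with the sandwich argument based on the minimum and maximum values of $p$.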
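III Rényi Differential Cross-Entropy for Exponential Family Distributions
A pdf $f$ with support $\mathbb{S}$ belongs to the exponential family [3] if it can be written in the form
$$f(x) = c(\eta)\,b(x)\exp\big(\eta\cdot T(x)\big), \quad x\in\mathbb{S}, \tag{6}$$
for some real-valued (measurable) functions $c$, $b$, and $T$. (Note that $\eta$ and consequently $T$ can be vectors in cases where the distribution admits multiple parameters.)
Here $\eta$ is known as the natural parameter of the distribution, $T(x)$ is the sufficient statistic and $c(\eta)$ is the normalization constant in the sense that $\int_{\mathbb{S}} f(x)\,dx = 1$ for all $\eta$ within the parameter space. The pdf in (6) can also be written as
$$f(x) = b(x)\exp\big(\eta\cdot T(x) + A(\eta)\big), \tag{7}$$
where $A(\eta) := \ln c(\eta)$. Examples of distributions in the exponential family include the Gaussian, Beta, and exponential distributions.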
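Lemma 1.
Let $f_1$ and $f_2$ be pdfs of the same type in the exponential family with natural parameters $\eta_1$ and $\eta_2$, respectively. Define $f_h$ as being of the same type as $f_1$ and $f_2$ but with natural parameter $\eta_h := \eta_1 + (\alpha-1)\eta_2$ (assumed to lie in the natural parameter space). Then
$$h_\alpha(f_1;f_2) = \frac{A(\eta_1) - A(\eta_h) + \ln E_h}{1-\alpha} - A(\eta_2), \tag{8}$$
where $E_h := \mathbb{E}_{f_h}\big[b(X)^{\alpha-1}\big] = \int_{\mathbb{S}} b(x)^{\alpha-1} f_h(x)\,dx$, with $A$ and $b$ as in (7).
Proof.
Using the form (7) for $f_1$, $f_2$ and $f_h$,
$$\int_{\mathbb{S}} f_1(x) f_2(x)^{\alpha-1}\,dx = e^{A(\eta_1) + (\alpha-1)A(\eta_2)}\int_{\mathbb{S}} b(x)^{\alpha-1}\,b(x)\,e^{\eta_h\cdot T(x)}\,dx = e^{A(\eta_1) + (\alpha-1)A(\eta_2) - A(\eta_h)}\,E_h.$$
Taking the logarithm and multiplying by $1/(1-\alpha)$ yields (8). ∎
Remark.
If $b(x) = b$ is a constant for all $x\in\mathbb{S}$, then $\ln E_h = (\alpha-1)\ln b$.
In many cases, we have that $b(x) = 1$ on $\mathbb{S}$, and thus the term $\ln E_h$ disappears in (8).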
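As a worked instance of Lemma 1, consider two exponential pdfs with rates $\lambda_1$ and $\lambda_2$. Assuming the parametrisation $\eta=-\lambda$, $T(x)=x$, $A(\eta)=\ln\lambda$ and $b(x)=1$, the lemma gives $h_\alpha(f_1;f_2) = \frac{\ln\lambda_1-\ln\lambda_h}{1-\alpha}-\ln\lambda_2$ with $\lambda_h=\lambda_1+(\alpha-1)\lambda_2>0$. The sketch below compares this expression (our own derivation under those assumptions, not an entry quoted from Table I) with direct numerical integration; all function names and parameter values are hypothetical.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def exp_cross_entropy_lemma1(lam1, lam2, alpha):
    """Lemma 1 with eta = -lam, T(x) = x, A(eta) = ln(lam), b(x) = 1."""
    lam_h = lam1 + (alpha - 1.0) * lam2  # rate of f_h, i.e. eta_h = eta_1 + (alpha-1) eta_2
    if lam_h <= 0:
        raise ValueError("eta_h lies outside the natural parameter space")
    return (math.log(lam1) - math.log(lam_h)) / (1.0 - alpha) - math.log(lam2)

def exp_cross_entropy_numeric(lam1, lam2, alpha, upper=60.0):
    """(1/(1-alpha)) * ln of the integral of f1 * f2^(alpha-1) over [0, upper]."""
    def integrand(x):
        return lam1 * math.exp(-lam1 * x) * (lam2 * math.exp(-lam2 * x)) ** (alpha - 1.0)
    return math.log(simpson(integrand, 0.0, upper)) / (1.0 - alpha)

closed = exp_cross_entropy_lemma1(2.0, 0.5, 1.5)
numeric = exp_cross_entropy_numeric(2.0, 0.5, 1.5)
print(closed, numeric)
```

The truncation point `upper` is an assumption that works because the integrand decays exponentially for the chosen parameters.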
Table I lists Rényi differential cross-entropy expressions we derived using Lemma 1 for some common distributions in the exponential family (which we describe in Appendix B for convenience). In the table, the subscript $i$ is used to denote that a parameter belongs to pdf $f_i$, $i=1,2$.
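[Table I: Rényi differential cross-entropy $h_\alpha(f_1;f_2)$ for common exponential-family distributions. Row labels: Beta, Exponential, Gamma, Gaussian (further row labels and the closed-form entries are not legible).]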
IV Rényi differential Cross-Entropy between different distributions
Let $p$ and $q$ be pdfs with common support $\mathbb{S}\subseteq\mathbb{R}$. Below are some general formulae for the differential Rényi cross-entropy between one specific (common) distribution and any general distribution. If $\mathbb{S}$ is an interval below, then $|\mathbb{S}|$ denotes its length.
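IV-A Distribution $p$ is uniform
Suppose $p$ is uniformly distributed on $\mathbb{S}$. Then, directly from (3),
$$h_\alpha(p;q) = \frac{1}{1-\alpha}\ln\left(\frac{1}{|\mathbb{S}|}\int_{\mathbb{S}} q(x)^{\alpha-1}\,dx\right).$$
IV-B Distribution $q$ is uniform
Now suppose $q$ is uniformly distributed on $\mathbb{S}$. Then
$$h_\alpha(p;q) = \frac{1}{1-\alpha}\ln\int_{\mathbb{S}} p(x)\,|\mathbb{S}|^{1-\alpha}\,dx = \ln|\mathbb{S}|.$$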
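IV-C Distribution $q$ is exponentially distributed
Suppose that $\mathbb{S}=\mathbb{R}^+$ and $q$ is exponential with parameter $\lambda$. Suppose also that the moment generating function (MGF) of $p$, $M_p(t) := \mathbb{E}_p[e^{tX}]$, exists. We have
$$h_\alpha(p;q) = \frac{1}{1-\alpha}\ln\int_0^\infty p(x)\,\lambda^{\alpha-1}e^{-(\alpha-1)\lambda x}\,dx = -\ln\lambda + \frac{1}{1-\alpha}\ln M_p\big((1-\alpha)\lambda\big).$$
IV-D Distribution $q$ is Gaussian
Now assume that $q$ is a Gaussian (normal) distribution with mean $\mu$ and variance $\sigma^2$ and that the MGF of $Y := (X-\mu)^2$, $M_Y(t)$, exists, where $X$ is a random variable with distribution $p$. Then
$$h_\alpha(p;q) = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{1-\alpha}\ln M_Y\!\left(\frac{1-\alpha}{2\sigma^2}\right).$$
The case where $q$ is a half-normal distribution can be directly derived from the above. Given $q$ is a half-normal distribution, on its support its pdf is the same as that of a normal distribution times 2. Hence if $p$’s support is $\mathbb{R}^+$, then $h_\alpha(p;q) = h_\alpha(p;q') - \ln 2$, where $q'$ denotes that normal pdf.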
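The two MGF-based expressions of this section can be sanity-checked against direct numerical integration. The sketch below assumes the forms $-\ln\lambda+\frac{1}{1-\alpha}\ln M_p((1-\alpha)\lambda)$ for an exponential $q$ and $\frac12\ln(2\pi\sigma^2)+\frac{1}{1-\alpha}\ln M_Y\big(\frac{1-\alpha}{2\sigma^2}\big)$ for a Gaussian $q$. The choices of $p$ (a Gamma pdf with shape 2 and scale 1, and a Gaussian pdf), the integration ranges and all names are hypothetical.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def cross_entropy_numeric(p_pdf, q_pdf, alpha, a, b):
    """(1/(1-alpha)) * ln of the integral of p * q^(alpha-1) over [a, b]."""
    return math.log(simpson(lambda x: p_pdf(x) * q_pdf(x) ** (alpha - 1.0), a, b)) / (1.0 - alpha)

def exp_q_formula(mgf_p, lam, alpha):
    """q exponential with rate lam; mgf_p is the MGF of p."""
    return -math.log(lam) + math.log(mgf_p((1.0 - alpha) * lam)) / (1.0 - alpha)

def gauss_q_formula(mgf_y, sigma2, alpha):
    """q Gaussian with variance sigma2; mgf_y is the MGF of Y = (X - mu)^2."""
    return (0.5 * math.log(2 * math.pi * sigma2)
            + math.log(mgf_y((1.0 - alpha) / (2.0 * sigma2))) / (1.0 - alpha))

def normal_pdf(mean, var):
    return lambda x: math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

alpha = 2.0

# Exponential q (rate 0.5); p is Gamma(shape 2, scale 1) with MGF (1 - t)^(-2) for t < 1.
lam = 0.5
exp_closed = exp_q_formula(lambda t: (1.0 - t) ** -2, lam, alpha)
exp_numeric = cross_entropy_numeric(lambda x: x * math.exp(-x),
                                    lambda x: lam * math.exp(-lam * x), alpha, 0.0, 80.0)

# Gaussian q = N(0, 2); p = N(1, 0.5), so X - mu ~ N(d, s2) with d = 1 and s2 = 0.5.
d, s2, sigma2 = 1.0, 0.5, 2.0
mgf_y = lambda t: math.exp(d * d * t / (1 - 2 * s2 * t)) / math.sqrt(1 - 2 * s2 * t)
gauss_closed = gauss_q_formula(mgf_y, sigma2, alpha)
gauss_numeric = cross_entropy_numeric(normal_pdf(1.0, s2), normal_pdf(0.0, sigma2), alpha, -30.0, 30.0)

print(exp_closed, exp_numeric)
print(gauss_closed, gauss_numeric)
```

The MGF used for $Y$ is the standard one for the square of a Gaussian variable with mean `d` and variance `s2`, valid when $1-2 s_2 t>0$.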
V Rényi Differential Cross-Entropy Rate for Stationary Gaussian Processes
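Lemma 2.
The Rényi differential cross-entropy between two zero-mean multivariate dimension-$n$ Gaussian distributions $f_1$ and $f_2$ with invertible covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively, is given by
$$h_\alpha(f_1;f_2) = \frac{1}{2}\ln\big((2\pi)^n|\Sigma_2|\big) + \frac{1}{2(\alpha-1)}\ln|A|, \tag{9}$$
where $A := I_n + (\alpha-1)\Sigma_1\Sigma_2^{-1}$ and $\Sigma_1^{-1} + (\alpha-1)\Sigma_2^{-1}$ is assumed positive definite (which always holds for $\alpha>1$).
Proof.
Recall that the pdf of a multivariate Gaussian with mean $\mu$ and invertible covariance matrix $\Sigma$ is given by:
$$f(x) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
for $x\in\mathbb{R}^n$. Note that this distribution is a member of the exponential family, where, in the zero-mean case, $\eta = -\frac{1}{2}\Sigma^{-1}$, $T(x) = xx^T$, $b(x) = 1$, and the log-normalizer is $A(\eta) = -\frac{1}{2}\ln\big((2\pi)^n|\Sigma|\big)$ (with $\eta\cdot T(x)$ the trace inner product). Hence, by Lemma 1 with $\eta_h = -\frac{1}{2}\big(\Sigma_1^{-1} + (\alpha-1)\Sigma_2^{-1}\big)$, the Rényi differential cross-entropy between two zero-mean multivariate Gaussian distributions with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively, is
$$h_\alpha(f_1;f_2) = \frac{A(\eta_1) - A(\eta_h)}{1-\alpha} - A(\eta_2) = \frac{-\frac{1}{2}\ln\big(|\Sigma_1|\,|\Sigma_1^{-1} + (\alpha-1)\Sigma_2^{-1}|\big)}{1-\alpha} + \frac{1}{2}\ln\big((2\pi)^n|\Sigma_2|\big) = \frac{1}{2}\ln\big((2\pi)^n|\Sigma_2|\big) + \frac{1}{2(\alpha-1)}\ln\big|I_n + (\alpha-1)\Sigma_1\Sigma_2^{-1}\big|.$$
∎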
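The following sketch evaluates the zero-mean multivariate Gaussian cross-entropy in the form $\frac12\ln((2\pi)^n|\Sigma_2|)+\frac{1}{2(\alpha-1)}\ln|A|$ with $A=I_n+(\alpha-1)\Sigma_1\Sigma_2^{-1}$, using the identity $|A| = |\Sigma_2+(\alpha-1)\Sigma_1|/|\Sigma_2|$ so that only log-determinants are needed. This form is our reading of Lemma 2, and the helper names and covariance matrices are hypothetical. The code assumes $\Sigma_2+(\alpha-1)\Sigma_1$ is positive definite.

```python
import math

def logdet_spd(m):
    """ln det of a symmetric positive-definite matrix via Cholesky factorisation."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)  # raises ValueError if m is not positive definite
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def gaussian_cross_entropy(cov1, cov2, alpha):
    """Zero-mean Gaussian Renyi differential cross-entropy, using |A| = |cov_h| / |cov2|."""
    n = len(cov1)
    cov_h = [[cov2[i][j] + (alpha - 1.0) * cov1[i][j] for j in range(n)] for i in range(n)]
    ld2, ldh = logdet_spd(cov2), logdet_spd(cov_h)
    return 0.5 * n * math.log(2 * math.pi) + 0.5 * ld2 + (ldh - ld2) / (2.0 * (alpha - 1.0))

# Illustrative covariance matrices (assumed values).
cov_a = [[2.0, 0.5], [0.5, 1.0]]
cov_b = [[1.0, -0.2], [-0.2, 3.0]]
print(gaussian_cross_entropy(cov_a, cov_b, 2.0))

# With identical covariances the value should equal the Gaussian Renyi differential entropy
# 0.5 * ln((2 pi)^n |cov|) + n * ln(alpha) / (2 (alpha - 1)).
alpha = 3.0
same = gaussian_cross_entropy(cov_a, cov_a, alpha)
entropy = 0.5 * (2 * math.log(2 * math.pi) + logdet_spd(cov_a)) + 2 * math.log(alpha) / (2 * (alpha - 1))
print(same, entropy)
```

Working with log-determinants rather than forming $\Sigma_1\Sigma_2^{-1}$ explicitly is a design choice that avoids a matrix inversion.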
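Let $\{X_i\}$ and $\{Y_i\}$ be stationary zero-mean Gaussian processes. For a given $n$, $X^n = (X_1,\dots,X_n)$ and $Y^n = (Y_1,\dots,Y_n)$ are multivariate Gaussian random variables with mean 0 and covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. Since $\{X_i\}$ and $\{Y_i\}$ are stationary, their covariance matrices are Toeplitz. Furthermore, $\Sigma_h := \Sigma_2 + (\alpha-1)\Sigma_1$ is Toeplitz.
Lemma 3.
Let $f(\lambda)$, $g(\lambda)$ and $h(\lambda)$ be the power spectral densities of $\{X_i\}$, $\{Y_i\}$ and the zero-mean Gaussian process with covariance matrix $\Sigma_h$, respectively.
Then the Rényi differential cross-entropy rate between $\{X_i\}$ and $\{Y_i\}$, $\lim_{n\to\infty}\frac{1}{n}h_\alpha(X^n;Y^n)$, is given by
$$\frac{1}{2}\ln 2\pi + \frac{1}{4\pi(\alpha-1)}\int_0^{2\pi}\big[\ln h(\lambda) + (\alpha-2)\ln g(\lambda)\big]\,d\lambda.$$
Proof.
From Lemma 2, we first note that the matrix $A = I_n + (\alpha-1)\Sigma_1\Sigma_2^{-1}$ in (9) satisfies $A = \Sigma_h\Sigma_2^{-1}$, so that $|A| = |\Sigma_h|/|\Sigma_2|$. With this in mind the Rényi differential cross-entropy can be rewritten using (9) as
$$h_\alpha(X^n;Y^n) = \frac{n}{2}\ln 2\pi + \frac{1}{2(\alpha-1)}\ln|\Sigma_h| + \frac{\alpha-2}{2(\alpha-1)}\ln|\Sigma_2|.$$
It was proven in [7] that for a sequence of Toeplitz matrices $T_n$ with spectral density $f(\lambda)$ such that $\ln f(\lambda)$ is Riemann integrable, one has
$$\lim_{n\to\infty}\frac{1}{n}\ln|T_n| = \frac{1}{2\pi}\int_0^{2\pi}\ln f(\lambda)\,d\lambda.$$
We therefore obtain that the Rényi differential cross-entropy rate is given by
$$\lim_{n\to\infty}\frac{1}{n}h_\alpha(X^n;Y^n) = \frac{1}{2}\ln 2\pi + \frac{1}{2(\alpha-1)}\cdot\frac{1}{2\pi}\int_0^{2\pi}\ln h(\lambda)\,d\lambda + \frac{\alpha-2}{2(\alpha-1)}\cdot\frac{1}{2\pi}\int_0^{2\pi}\ln g(\lambda)\,d\lambda.$$
Note that $h(\lambda) = g(\lambda) + (\alpha-1)f(\lambda)$. ∎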
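The convergence of the normalised cross-entropy to a spectral expression can be illustrated numerically. The sketch below assumes the rate has the form $\frac12\ln 2\pi+\frac{1}{2(\alpha-1)}\overline{\ln h}+\frac{\alpha-2}{2(\alpha-1)}\overline{\ln g}$, where the bar denotes the average over $[0,2\pi]$ and $h=g+(\alpha-1)f$. The two processes are taken to have autocovariances $\rho^{|k|}$ (an assumed example), whose spectral density is $(1-\rho^2)/(1-2\rho\cos\lambda+\rho^2)$. All names are hypothetical.

```python
import math

def logdet_spd(m):
    """ln det of a symmetric positive-definite matrix via Cholesky factorisation."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def toeplitz(first_row):
    """Symmetric Toeplitz matrix with the given first row."""
    n = len(first_row)
    return [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]

def mean_log(spectrum, points=4096):
    """(1/(2 pi)) * integral over [0, 2 pi] of ln spectrum, by a uniform Riemann sum."""
    return sum(math.log(spectrum(2 * math.pi * k / points)) for k in range(points)) / points

def psd(rho):
    """Spectral density of the autocovariance sequence rho^|k|."""
    return lambda lam: (1 - rho ** 2) / (1 - 2 * rho * math.cos(lam) + rho ** 2)

def rate_spectral(f, g, alpha):
    h = lambda lam: g(lam) + (alpha - 1.0) * f(lam)
    return (0.5 * math.log(2 * math.pi) + mean_log(h) / (2.0 * (alpha - 1.0))
            + (alpha - 2.0) * mean_log(g) / (2.0 * (alpha - 1.0)))

def rate_finite(rho1, rho2, alpha, n):
    """(1/n) times the n-dimensional Gaussian cross-entropy, via log-determinants."""
    cov1 = toeplitz([rho1 ** k for k in range(n)])
    cov2 = toeplitz([rho2 ** k for k in range(n)])
    cov_h = [[cov2[i][j] + (alpha - 1.0) * cov1[i][j] for j in range(n)] for i in range(n)]
    ld2, ldh = logdet_spd(cov2), logdet_spd(cov_h)
    h_n = (0.5 * n * math.log(2 * math.pi) + ldh / (2.0 * (alpha - 1.0))
           + (alpha - 2.0) * ld2 / (2.0 * (alpha - 1.0)))
    return h_n / n

limit = rate_spectral(psd(0.5), psd(0.8), 3.0)
for n in (10, 40, 120):
    print(n, rate_finite(0.5, 0.8, 3.0, n), limit)
```

The finite-$n$ values approach the spectral expression at a rate of roughly $1/n$ in this example.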
VI Rényi cross-entropy rate for Markov sources
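Consider two time-invariant Markov sources $\{X_i\}$ and $\{Y_i\}$ with common finite alphabet $\mathbb{S}$ and with transition distributions $P(\cdot|\cdot)$ and $Q(\cdot|\cdot)$, respectively. Then for any $n\ge 1$, their $n$-dimensional joint distributions are given by
$$p^{(n)}(x_1,\dots,x_n) = p_1(x_1)\prod_{i=2}^{n}P(x_i|x_{i-1})$$
and
$$q^{(n)}(x_1,\dots,x_n) = q_1(x_1)\prod_{i=2}^{n}Q(x_i|x_{i-1}),$$
respectively, with arbitrary initial distributions $p_1(x)$ and $q_1(x)$, $x\in\mathbb{S}$. Define the Rényi cross-entropy rate between $\{X_i\}$ and $\{Y_i\}$ as
$$\lim_{n\to\infty}\frac{1}{n}H_\alpha(X^n;Y^n) := \lim_{n\to\infty}\frac{1}{n}H_\alpha\big(p^{(n)};q^{(n)}\big).$$
Note that by defining the matrix $R = (R_{ij})$ using the formula
$$R_{ij} = P(j|i)\,Q(j|i)^{\alpha-1}$$
and the row vector $\mathbf{s}$ as having components $s_i = p_1(i)\,q_1(i)^{\alpha-1}$, the Rényi cross-entropy rate can be written as
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\ln \mathbf{s}R^{n-1}\mathbf{1}, \tag{10}$$
where $\mathbf{1}$ is a column vector whose dimension is the cardinality of the alphabet $\mathbb{S}$ and with all its entries equal to 1.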
A result derived in [10] for the Rényi divergence between Markov sources can thus be used to find the Rényi cross-entropy rate for Markov sources.
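The matrix form behind (10) can be verified by brute force for short sequences. The sketch below assumes $R_{ij}=P(j|i)Q(j|i)^{\alpha-1}$ and $s_i=p_1(i)q_1(i)^{\alpha-1}$, and compares the sum of $p^{(n)}(x^n)\,q^{(n)}(x^n)^{\alpha-1}$ over all length-$n$ sequences with the product of $\mathbf{s}$, $R^{n-1}$ and the all-ones vector. The transition matrices, initial distributions and function names are hypothetical.

```python
import itertools

def joint_sum_bruteforce(p1, q1, P, Q, alpha, n):
    """Sum over all length-n sequences of p^(n)(x) * q^(n)(x)^(alpha-1)."""
    m = len(p1)
    total = 0.0
    for seq in itertools.product(range(m), repeat=n):
        pn, qn = p1[seq[0]], q1[seq[0]]
        for a, b in zip(seq, seq[1:]):
            pn *= P[a][b]  # P[a][b] is the probability of moving from state a to state b
            qn *= Q[a][b]
        total += pn * qn ** (alpha - 1.0)
    return total

def joint_sum_matrix(p1, q1, P, Q, alpha, n):
    """The same sum written as s R^(n-1) 1."""
    m = len(p1)
    R = [[P[i][j] * Q[i][j] ** (alpha - 1.0) for j in range(m)] for i in range(m)]
    row = [p1[i] * q1[i] ** (alpha - 1.0) for i in range(m)]  # row vector s
    for _ in range(n - 1):
        row = [sum(row[i] * R[i][j] for i in range(m)) for j in range(m)]
    return sum(row)  # multiplication by the all-ones column vector

# Illustrative two-state sources (assumed values).
P = [[0.9, 0.1], [0.4, 0.6]]
Q = [[0.7, 0.3], [0.2, 0.8]]
p1 = [0.5, 0.5]
q1 = [0.3, 0.7]

print(joint_sum_bruteforce(p1, q1, P, Q, 2.0, 6), joint_sum_matrix(p1, q1, P, Q, 2.0, 6))
```

The brute-force enumeration grows exponentially in $n$, whereas the matrix form needs only $n-1$ vector-matrix products, which is what makes the rate computation tractable.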
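Lemma 4.
Let $P$, $Q$, $\mathbf{s}$ and $R$ be defined as above. If $R$ is irreducible, then
$$\lim_{n\to\infty}\frac{1}{n}H_\alpha(X^n;Y^n) = \frac{\ln\lambda}{1-\alpha}, \tag{11}$$
where $\lambda$ is the largest positive eigenvalue of $R$.
Proof.
Since the non-negative matrix $R$ is irreducible, by the Frobenius theorem (e.g., cf. [13, 4]), it has a largest positive eigenvalue $\lambda$ with associated positive eigenvector $\mathbf{b}$. Let $b_m$ and $b_M$ be the minimum and maximum elements, respectively, of $\mathbf{b}$. Then due to the non-negativity of $\mathbf{s}$,
$$\lambda^{n-1}\langle\mathbf{s},\mathbf{b}\rangle = \mathbf{s}R^{n-1}\mathbf{b} \le b_M\,\mathbf{s}R^{n-1}\mathbf{1},$$
where $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product. Similarly, $\lambda^{n-1}\langle\mathbf{s},\mathbf{b}\rangle \ge b_m\,\mathbf{s}R^{n-1}\mathbf{1}$. As a result,
$$\frac{\lambda^{n-1}\langle\mathbf{s},\mathbf{b}\rangle}{b_M} \le \mathbf{s}R^{n-1}\mathbf{1} \le \frac{\lambda^{n-1}\langle\mathbf{s},\mathbf{b}\rangle}{b_m}.$$
Note that for all $n$, $\langle\mathbf{s},\mathbf{b}\rangle/b_M$ is a constant. Thus
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\ln\frac{\lambda^{n-1}\langle\mathbf{s},\mathbf{b}\rangle}{b_M} = \lim_{n\to\infty}\left[\frac{(n-1)\ln\lambda}{n(1-\alpha)} + \frac{\ln\big(\langle\mathbf{s},\mathbf{b}\rangle/b_M\big)}{n(1-\alpha)}\right] = \frac{\ln\lambda}{1-\alpha}.$$
Similarly, we have
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\ln\frac{\lambda^{n-1}\langle\mathbf{s},\mathbf{b}\rangle}{b_m} = \frac{\ln\lambda}{1-\alpha}.$$
Hence, by (10) and the two bounds above,
$$\lim_{n\to\infty}\frac{1}{n}H_\alpha(X^n;Y^n) = \frac{\ln\lambda}{1-\alpha}.$$
∎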
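Lemma 4 can be illustrated numerically on a small example. The sketch below assumes $R_{ij}=P(j|i)Q(j|i)^{\alpha-1}$, estimates the largest positive eigenvalue by power iteration (which presumes $R$ is primitive, a stronger condition than irreducibility that holds for the strictly positive matrix used here), and compares $\ln\lambda/(1-\alpha)$ with the finite-$n$ quantity $\frac{1}{n(1-\alpha)}\ln\mathbf{s}R^{n-1}\mathbf{1}$. All names and numerical values are hypothetical.

```python
import math

def build_R(P, Q, alpha):
    m = len(P)
    return [[P[i][j] * Q[i][j] ** (alpha - 1.0) for j in range(m)] for i in range(m)]

def perron_eigenvalue(R, iterations=500):
    """Largest positive eigenvalue of a primitive non-negative matrix by power iteration."""
    m = len(R)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(w)  # v sums to one, so sum(R v) estimates the eigenvalue
        v = [x / lam for x in w]
    return lam

def finite_rate(p1, q1, P, Q, alpha, n):
    """(1/(n(1-alpha))) * ln(s R^(n-1) 1), rescaling at each step to avoid underflow."""
    m = len(p1)
    R = build_R(P, Q, alpha)
    row = [p1[i] * q1[i] ** (alpha - 1.0) for i in range(m)]
    log_scale = 0.0
    for _ in range(n - 1):
        row = [sum(row[i] * R[i][j] for i in range(m)) for j in range(m)]
        total = sum(row)
        log_scale += math.log(total)
        row = [x / total for x in row]
    return (log_scale + math.log(sum(row))) / (n * (1.0 - alpha))

# Illustrative two-state sources (assumed values).
P = [[0.9, 0.1], [0.4, 0.6]]
Q = [[0.7, 0.3], [0.2, 0.8]]
p1 = [0.5, 0.5]
q1 = [0.3, 0.7]
alpha = 2.0

lam = perron_eigenvalue(build_R(P, Q, alpha))
rate = math.log(lam) / (1.0 - alpha)
for n in (10, 100, 5000):
    print(n, finite_rate(p1, q1, P, Q, alpha, n), rate)
```

The initial distributions affect the finite-$n$ values but not the limit, in agreement with the role of the constants $\langle\mathbf{s},\mathbf{b}\rangle/b_M$ and $\langle\mathbf{s},\mathbf{b}\rangle/b_m$ in the proof.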
Another technique can be borrowed from [10] to generalize Lemma 4 to the case where $R$ is reducible. First, $R$ is rewritten in the canonical form detailed in Proposition 1 of [10]. One then considers the largest positive eigenvalue of each self-communicating submatrix of $R$ and, for each inessential class, the largest positive eigenvalue of each class that is reachable from it. With $\lambda$ defined from these eigenvalues as in [10], (11) holds.
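Appendix A: Shannon-type information measures
| Name | Definition |
|---|---|
| Shannon entropy | $H(p) = -\sum_{x\in\mathbb{S}} p(x)\ln p(x)$ |
| Shannon differential entropy | $h(p) = -\int_{\mathbb{S}} p(x)\ln p(x)\,dx$ |
| Shannon cross-entropy | $H(p;q) = -\sum_{x\in\mathbb{S}} p(x)\ln q(x)$ |
| Shannon differential cross-entropy | $h(p;q) = -\int_{\mathbb{S}} p(x)\ln q(x)\,dx$ |
| KL divergence | $D_{\mathrm{KL}}(p\Vert q) = \sum_{x\in\mathbb{S}} p(x)\ln\frac{p(x)}{q(x)}$ |
| Differential KL divergence | $D_{\mathrm{KL}}(p\Vert q) = \int_{\mathbb{S}} p(x)\ln\frac{p(x)}{q(x)}\,dx$ |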
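Appendix B: Distributions listed in Table I
[Table: name, pdf, (parameters) and (support) of each distribution. Rows: Beta (two parameters), a one-parameter distribution whose name is not legible, Exponential (one parameter), Gamma (two parameters), Gaussian (two parameters), Laplace (two parameters).]
Notes
$B(\cdot,\cdot)$ is the Beta function.
$\Gamma(\cdot)$ is the Gamma function.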
References
 [1] (2020) Rényi Generative Adversarial Networks. arXiv:2006.02479v1.
 [2] (2021-08) Least kth-order and Rényi generative adversarial networks. Neural Computation 33 (9), pp. 2473–2510.
 [3] (2002) Statistical inference. Cengage Learning.
 [4] (1996) Discrete stochastic processes. Springer.
 [5] (2013) Rényi divergence measures for commonly used univariate continuous distributions. Information Sciences 249, pp. 124–131.
 [6] (2014) Generative adversarial nets. In Proceedings of Advances in Neural Information Processing Systems, Vol. 27, pp. 2672–2680.
 [7] (2001-10) Toeplitz and circulant matrices: a review. Foundations and Trends® in Communications and Information Theory 2.
 [8] (2019) On Jensen-Rényi and Jeffreys-Rényi type divergences induced by convex functions. Physica A: Statistical Mechanics and its Applications.
 [9] (1991) Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 31, pp. 145–151.
 [10] (2001) Rényi’s divergence and entropy rates for finite alphabet Markov sources. IEEE Transactions on Information Theory 47 (4), pp. 1553–1561.
 [11] (1961) On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 547–561.
 [12] (2021) RGAN: Rényi generative adversarial network. SN Computer Science 2 (1), pp. 17.
 [13] (2006) Non-negative matrices and Markov chains. Springer Science & Business Media.
 [14] (2001-02) Rényi information, loglikelihood and an intrinsic distribution measure. J. Statistical Planning and Inference 93.
 [15] (2019) The case for shifting the Rényi entropy. Entropy 21, pp. 1–21.
 [16] (2015) $\alpha$-mutual information. In Proceedings of the IEEE Information Theory and Applications Workshop, pp. 1–6.