I Introduction
The Rényi entropy [11] of order $\alpha$ of a discrete distribution (probability mass function) $p$ with finite support $\mathcal{X}$, defined as
$$H_\alpha(p) := \frac{1}{1-\alpha}\log\Bigl(\sum_{x\in\mathcal{X}} p(x)^{\alpha}\Bigr)$$
for $\alpha > 0$, $\alpha \neq 1$, is a generalization of the Shannon entropy $H(p)$ (for ease of reference, a table summarising the Shannon entropy and cross-entropy measures as well as the Kullback-Leibler (KL) divergence is provided in Appendix A), in that $\lim_{\alpha\to 1} H_\alpha(p) = H(p)$. Similarly, the Rényi divergence (of order $\alpha$) between two discrete distributions $p$ and $q$ with common finite support $\mathcal{X}$, given by
$$D_\alpha(p\|q) := \frac{1}{\alpha-1}\log\Bigl(\sum_{x\in\mathcal{X}} p(x)^{\alpha}\,q(x)^{1-\alpha}\Bigr),$$
reduces to the KL divergence, $D(p\|q)$, as $\alpha \to 1$.
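As a quick numerical illustration of these definitions (a sketch of ours, with illustrative variable names and natural logarithms), the following Python snippet evaluates $H_\alpha(p)$ and $D_\alpha(p\|q)$ for a small pair of pmfs and checks that the values at $\alpha$ near one approach the Shannon entropy and the KL divergence.

```python
import numpy as np

def renyi_entropy(p, alpha):
    # H_alpha(p) = log(sum_x p(x)^alpha) / (1 - alpha), natural logarithm
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def renyi_divergence(p, q, alpha):
    # D_alpha(p||q) = log(sum_x p(x)^alpha q(x)^(1-alpha)) / (alpha - 1)
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

shannon_entropy = -np.sum(p * np.log(p))   # H(p)
kl_divergence = np.sum(p * np.log(p / q))  # D(p||q)

for alpha in (0.5, 0.999, 1.001, 2.0):
    print(alpha, renyi_entropy(p, alpha), renyi_divergence(p, q, alpha))
print("alpha -> 1 targets:", shannon_entropy, kl_divergence)
```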
Since the introduction of these measures, several other Rényi-type information measures have been put forward, each obeying the condition that their limit as $\alpha$ goes to one reduces to a Shannon-type information measure (e.g., see [16] and the references therein for three different order-$\alpha$ extensions of Shannon's mutual information due to Sibson, Arimoto and Csiszár).
Many of these definitions admit natural counterparts in the (absolutely) continuous case (i.e., when the involved distributions have a probability density function (pdf)), giving rise to information measures such as the Rényi differential entropy for a pdf $f$ with support $\mathcal{S}$,
$$h_\alpha(f) := \frac{1}{1-\alpha}\log\int_{\mathcal{S}} f(x)^{\alpha}\,dx,$$
and the Rényi (differential) divergence between pdfs $f$ and $g$ with common support $\mathcal{S}$,
$$D_\alpha(f\|g) := \frac{1}{\alpha-1}\log\int_{\mathcal{S}} f(x)^{\alpha}\,g(x)^{1-\alpha}\,dx.$$
The Rényi cross-entropy between distributions $p$ and $q$ is an analogous generalization of the Shannon cross-entropy $H(p;q) := -\sum_{x\in\mathcal{X}} p(x)\log q(x)$. Two definitions for this measure have been suggested. In [12], mirroring the fact that Shannon's cross-entropy satisfies $H(p;q) = D(p\|q) + H(p)$, the authors define the Rényi cross-entropy as
$$\tilde{H}_\alpha(p;q) := D_\alpha(p\|q) + H_\alpha(p). \tag{1}$$
In contrast, prior to [12], the authors of [15] introduced the Rényi cross-entropy in their study of the so-called shifted Rényi measures (expressed as the logarithm of weighted generalized power means). Specifically, upon simplifying Definition 6 in [15], their expression for the Rényi cross-entropy between distributions $p$ and $q$ is given by
$$H_\alpha(p;q) := \frac{1}{1-\alpha}\log\Bigl(\sum_{x\in\mathcal{X}} p(x)\,q(x)^{\alpha-1}\Bigr). \tag{2}$$
For the continuous case, the definition in (2) can be readily converted to yield the Rényi differential cross-entropy between pdfs $f$ and $g$ with common support $\mathcal{S}$:
$$h_\alpha(f;g) := \frac{1}{1-\alpha}\log\int_{\mathcal{S}} f(x)\,g(x)^{\alpha-1}\,dx. \tag{3}$$
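The contrast between (1) and (2) is easy to see numerically. The sketch below (ours) evaluates both definitions on a pair of pmfs; as $\alpha \to 1$ they both approach the Shannon cross-entropy $-\sum_x p(x)\log q(x)$, while for $\alpha \neq 1$ they generally differ.

```python
import numpy as np

def renyi_entropy(p, alpha):
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def renyi_divergence(p, q, alpha):
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

def cross_entropy_def1(p, q, alpha):
    # Definition (1): D_alpha(p||q) + H_alpha(p)
    return renyi_divergence(p, q, alpha) + renyi_entropy(p, alpha)

def cross_entropy_def2(p, q, alpha):
    # Definition (2): log(sum_x p(x) q(x)^(alpha-1)) / (1 - alpha)
    return np.log(np.sum(p * q ** (alpha - 1.0))) / (1.0 - alpha)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
shannon_cross_entropy = -np.sum(p * np.log(q))

for alpha in (0.5, 0.999, 1.001, 2.0):
    print(alpha, cross_entropy_def1(p, q, alpha), cross_entropy_def2(p, q, alpha))
print("alpha -> 1 target:", shannon_cross_entropy)
```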
As the Rényi differential divergence and entropy were already calculated for numerous distributions in [5] and [14], respectively, determining the Rényi differential cross-entropy using the definition in (1) is straightforward. As such, this paper’s focus is to establish closed-form expressions of the Rényi differential cross-entropy as defined in (3) for various distributions, as well as to derive the Rényi cross-entropy rate for two important classes of sources with memory, Gaussian and Markov sources.
Motivation for determining formulae for the Rényi cross-entropy extends beyond idle curiosity. The Shannon differential cross-entropy was used as a loss function for the design of deep learning generative adversarial networks (GANs) [6]. Recently, the Rényi differential cross-entropy measures in (3) and (1) were used in [1, 2] and [12], respectively, to generalize the original GAN loss function. It is shown in [1] and [2] that the resulting Rényi-centric generalized loss function preserves the equilibrium point satisfied by the original GAN based on the Jensen-Rényi divergence [8], a natural extension of the Jensen-Shannon divergence [9]. In [12], a different Rényi-type generalized loss function is obtained and is shown to benefit from stability properties. Improved stability and system performance are shown in [1, 2] and [12] by virtue of the parameter $\alpha$, which can be judiciously tuned in the adopted generalized loss functions; these recover the original GAN loss function as $\alpha \to 1$.
The rest of this paper is organised as follows. In Section II, basic properties of the Rényi cross-entropy are examined. In Section III, the Rényi differential cross-entropy for members of the exponential family is calculated. In Section IV, the Rényi differential cross-entropy between two different distributions is obtained. In Section V, the Rényi differential cross-entropy rate is derived for stationary Gaussian sources. Finally in Section VI, the Rényi cross-entropy rate is established for finite-alphabet time-invariant Markov sources.
II Basic Properties of the Rényi Cross-Entropy and Differential Cross-Entropy
For the Rényi cross-entropy to deserve its name, it would be preferable that it satisfies at least two key properties: it reduces to the Rényi entropy when $q = p$, and its limit as $\alpha$ goes to one is the Shannon cross-entropy. Similarly, it is desirable that the Rényi differential cross-entropy reduces to the Rényi differential entropy when $g = f$ and that its limit as $\alpha$ tends to one yields the Shannon differential cross-entropy. In both cases, the former property is trivial, and the latter property was proven in [2] for the continuous case under some finiteness conditions (in the discrete case, the result holds directly via L'Hôpital's rule).
It is also proven in [2] that the Rényi differential cross-entropy is non-increasing in $\alpha$ by showing that its derivative with respect to $\alpha$ is non-positive. The same monotonicity property holds in the discrete case.
Like its Shannon counterpart, the Rényi cross-entropy is non-negative ($H_\alpha(p;q) \ge 0$), while the Rényi differential cross-entropy can be negative. This is easily verified when, for example, $f$ and $g$ are both Gaussian (normal) distributions with zero mean and a common, sufficiently small variance, and it parallels the same lack of non-negativity of the Shannon differential cross-entropy. We close this section by deriving the cross-entropy limit $H_\infty(p;q) := \lim_{\alpha\to\infty} H_\alpha(p;q)$. To begin with, for any non-zero constant $c$, we have
$$\lim_{\alpha\to\infty}\frac{1}{1-\alpha}\log\Bigl(c\sum_{x\in\mathcal{X}_p} q(x)^{\alpha-1}\Bigr) = -\log\max_{x\in\mathcal{X}_p} q(x), \tag{4}$$
where $\mathcal{X}_p := \{x\in\mathcal{X}: p(x) > 0\}$ denotes the support of $p$, and where we have used the fact that for the Rényi entropy, $\lim_{\alpha\to\infty} H_\alpha(q) = -\log\max_{x\in\mathcal{X}} q(x)$. Now, denoting the minimum and maximum values of $p(x)$ over $\mathcal{X}_p$ by $p_{\min}$ and $p_{\max}$, respectively, we have that for $\alpha > 1$,
$$\sum_{x\in\mathcal{X}} p(x)\,q(x)^{\alpha-1} \;\ge\; p_{\min}\sum_{x\in\mathcal{X}_p} q(x)^{\alpha-1}$$
and
$$\sum_{x\in\mathcal{X}} p(x)\,q(x)^{\alpha-1} \;\le\; p_{\max}\sum_{x\in\mathcal{X}_p} q(x)^{\alpha-1},$$
and hence by (4) we obtain
$$H_\infty(p;q) = -\log\max_{x\in\mathcal{X}_p} q(x). \tag{5}$$
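The limit in (5) can be checked numerically; in the sketch below (ours), $H_\alpha(p;q)$ computed from (2) at a large value of $\alpha$ is compared with $-\log\max_{x\in\mathcal{X}_p} q(x)$.

```python
import numpy as np

def renyi_cross_entropy(p, q, alpha):
    # Definition (2), with the sum restricted to the support of p
    m = p > 0
    return np.log(np.sum(p[m] * q[m] ** (alpha - 1.0))) / (1.0 - alpha)

p = np.array([0.5, 0.5, 0.0])        # the third symbol lies outside supp(p)
q = np.array([0.2, 0.3, 0.5])

limit = -np.log(np.max(q[p > 0]))    # -log max of q over supp(p), as in (5)
print(renyi_cross_entropy(p, q, 200.0), limit)
```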
III Rényi Differential Cross-Entropy for Exponential Family Distributions
A pdf $f$ with support $\mathcal{S}$ and parameter (vector) $\theta$ belongs to the exponential family if it can be written as
$$f(x) = h(x)\,g(\theta)\exp\bigl(\eta(\theta)\cdot T(x)\bigr), \tag{6}$$
for some real-valued (measurable) functions $h$, $g$, $\eta$ and $T$ (note that $\eta(\theta)$, and consequently $T(x)$, can be vectors in cases where the distribution admits multiple parameters).
The pdf in (6) can also be written in natural (canonical) form as
$$f(x) = h(x)\exp\bigl(\eta\cdot T(x) - A(\eta)\bigr), \tag{7}$$
where $\eta := \eta(\theta)$ is the natural parameter and $A(\eta) := -\log g(\theta)$ is the log-normalizer. Examples of distributions in the exponential family include the Gaussian, Beta, and exponential distributions.
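For instance, the exponential pdf $f(x) = \lambda e^{-\lambda x}$ on $[0,\infty)$ has the form (7) with $h(x) = 1$, $T(x) = x$, natural parameter $\eta = -\lambda$ and $A(\eta) = -\log(-\eta) = -\log\lambda$, while the zero-mean Gaussian pdf with variance $\sigma^2$ has $h(x) = (2\pi)^{-1/2}$, $T(x) = x^2$, $\eta = -\frac{1}{2\sigma^2}$ and $A(\eta) = \frac{1}{2}\log\sigma^2 = -\frac{1}{2}\log(-2\eta)$.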
Lemma 1.
Let $f_1$ and $f_2$ be pdfs of the same type in the exponential family (i.e., sharing the same functions $h$ and $T$ in (7)) with natural parameters $\eta_1$ and $\eta_2$, respectively. Define $f_3$ as being of the same type as $f_1$ and $f_2$ but with natural parameter $\eta_3 := \eta_1 + (\alpha-1)\eta_2$, assumed to be admissible. Then
$$h_\alpha(f_1;f_2) = \frac{A(\eta_3) - A(\eta_1)}{1-\alpha} + A(\eta_2) + \frac{1}{1-\alpha}\log E_{f_3}\bigl[h(X)^{\alpha-1}\bigr], \tag{8}$$
where $X$ is a random variable with pdf $f_3$.
Proof.
By (7), $f_1(x)\,f_2(x)^{\alpha-1} = h(x)^{\alpha}\exp\bigl(\eta_3\cdot T(x) - A(\eta_1) - (\alpha-1)A(\eta_2)\bigr)$. Since $\exp\bigl(\eta_3\cdot T(x)\bigr) = f_3(x)\,e^{A(\eta_3)}/h(x)$, integrating over $\mathcal{S}$ gives
$$\int_{\mathcal{S}} f_1(x)\,f_2(x)^{\alpha-1}\,dx = e^{A(\eta_3) - A(\eta_1) - (\alpha-1)A(\eta_2)}\,E_{f_3}\bigl[h(X)^{\alpha-1}\bigr],$$
and applying $\frac{1}{1-\alpha}\log(\cdot)$ yields (8). ∎
Remark.
If $h(x) = h$ is a constant for all $x \in \mathcal{S}$, then the last term in (8) reduces to $-\log h$.
In many cases, we have that $h(x) = 1$ on $\mathcal{S}$, and thus this term disappears in (8).
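As a simple sanity check of Lemma 1 (a sketch of ours), take $f_1$ and $f_2$ to be exponential pdfs with rates $\lambda_1$ and $\lambda_2$, so that $h \equiv 1$, $\eta_i = -\lambda_i$ and $A(\eta) = -\log(-\eta)$; the closed form from (8) can then be compared with a direct numerical evaluation of (3).

```python
import numpy as np
from scipy.integrate import quad

alpha, lam1, lam2 = 1.7, 2.0, 0.8    # rates of f1 and f2; requires lam1 + (alpha-1)*lam2 > 0

# Direct numerical evaluation of definition (3)
f1 = lambda x: lam1 * np.exp(-lam1 * x)
f2 = lambda x: lam2 * np.exp(-lam2 * x)
integral, _ = quad(lambda x: f1(x) * f2(x) ** (alpha - 1.0), 0, np.inf)
h_numeric = np.log(integral) / (1.0 - alpha)

# Lemma 1 with h(x) = 1: eta_i = -lam_i, A(eta) = -log(-eta), eta3 = eta1 + (alpha-1)*eta2
A = lambda eta: -np.log(-eta)
eta1, eta2 = -lam1, -lam2
eta3 = eta1 + (alpha - 1.0) * eta2
h_lemma = (A(eta3) - A(eta1)) / (1.0 - alpha) + A(eta2)

print(h_numeric, h_lemma)            # the two values should agree
```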
Table I lists Rényi differential cross-entropy expressions we derived using Lemma 1 for some common distributions in the exponential family (which we describe in Appendix B for convenience). In the table, the subscript $i$ on a parameter is used to denote that the parameter belongs to pdf $f_i$, $i = 1, 2$.
TABLE I: Rényi differential cross-entropy $h_\alpha(f_1;f_2)$ for the Beta, exponential, Gamma, and Gaussian distributions, together with the parameter conditions under which each expression is valid.
IV Rényi Differential Cross-Entropy Between Different Distributions
Let $f$ and $g$ be pdfs with common support $\mathcal{S}$. Below are some general formulae for the Rényi differential cross-entropy between one specific (common) distribution and an arbitrary distribution. If $\mathcal{S}$ is an interval below, then $|\mathcal{S}|$ denotes its length.
IV-A Distribution $f$ is uniform
Suppose $f$ is uniformly distributed on $\mathcal{S}$. Then
$$h_\alpha(f;g) = \frac{1}{1-\alpha}\log\Bigl(\frac{1}{|\mathcal{S}|}\int_{\mathcal{S}} g(x)^{\alpha-1}\,dx\Bigr) = \frac{\log|\mathcal{S}|}{\alpha-1} + \frac{1}{1-\alpha}\log\int_{\mathcal{S}} g(x)^{\alpha-1}\,dx.$$
IV-B Distribution $g$ is uniform
Now suppose $g$ is uniformly distributed on $\mathcal{S}$. Then
$$h_\alpha(f;g) = \frac{1}{1-\alpha}\log\int_{\mathcal{S}} f(x)\,|\mathcal{S}|^{1-\alpha}\,dx = \log|\mathcal{S}|.$$
IV-C Distribution $g$ is exponentially distributed
Suppose the common support is $\mathcal{S} = [0,\infty)$ and $g$ is exponential with parameter (rate) $\lambda > 0$. Suppose also that the moment generating function (MGF) of $X \sim f$, $M_X(t) := E_f[e^{tX}]$, exists at $t = (1-\alpha)\lambda$. We have
$$h_\alpha(f;g) = \frac{1}{1-\alpha}\log\Bigl(\lambda^{\alpha-1} E_f\bigl[e^{(1-\alpha)\lambda X}\bigr]\Bigr) = -\log\lambda + \frac{1}{1-\alpha}\log M_X\bigl((1-\alpha)\lambda\bigr).$$
IV-D Distribution $g$ is Gaussian
Now assume that $g$ is a (normal) Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and that the MGF of $(X-\mu)^2$, $M_{(X-\mu)^2}(t) := E_f\bigl[e^{t(X-\mu)^2}\bigr]$, exists at $t = \frac{1-\alpha}{2\sigma^2}$, where $X$ is a random variable with distribution $f$. Then
$$h_\alpha(f;g) = \frac{1}{2}\log\bigl(2\pi\sigma^2\bigr) + \frac{1}{1-\alpha}\log M_{(X-\mu)^2}\Bigl(\frac{1-\alpha}{2\sigma^2}\Bigr).$$
The case where $g$ is a half-normal distribution can be directly derived from the above. Given $g$ is a half-normal distribution, on its support its pdf is the same as that of a normal distribution times 2. Hence if $f$'s support is $[0,\infty)$, then $h_\alpha(f;g_{\text{half-normal}}) = h_\alpha(f;g_{\text{normal}}) - \log 2$.
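These MGF-based expressions are easy to verify numerically. In the sketch below (ours), $f$ is a Gamma pdf with shape $k$ and rate $\theta$, whose MGF $M_X(t) = (1 - t/\theta)^{-k}$ (for $t < \theta$) is available in closed form, and $g$ is exponential with rate $\lambda$; the MGF formula is compared against direct numerical integration of (3).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

alpha, lam = 1.6, 1.2        # order alpha and rate of the exponential pdf g
k, theta = 2.5, 3.0          # shape and rate of the Gamma pdf f

f = lambda x: gamma.pdf(x, a=k, scale=1.0 / theta)
g = lambda x: lam * np.exp(-lam * x)

# Direct numerical evaluation of definition (3)
integral, _ = quad(lambda x: f(x) * g(x) ** (alpha - 1.0), 0, np.inf)
h_numeric = np.log(integral) / (1.0 - alpha)

# MGF-based expression: -log(lam) + log(M_X((1-alpha)*lam)) / (1-alpha),
# where M_X(t) = (1 - t/theta)^(-k) is the Gamma MGF (valid for t < theta)
t = (1.0 - alpha) * lam
h_mgf = -np.log(lam) + np.log((1.0 - t / theta) ** (-k)) / (1.0 - alpha)

print(h_numeric, h_mgf)      # should agree
```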
V Rényi Differential Cross-Entropy Rate for Stationary Gaussian Processes
Lemma 2.
The Rényi differential cross-entropy between two zero-mean $n$-dimensional multivariate Gaussian distributions with invertible covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively, is given by
$$h_\alpha(f_1;f_2) = \frac{1}{2}\log\bigl((2\pi)^n|\Sigma_2|\bigr) + \frac{1}{2(\alpha-1)}\log\frac{|A|}{|\Sigma_2|}, \tag{9}$$
where $A := \Sigma_2 + (\alpha-1)\Sigma_1$ is assumed to be positive definite.
Proof.
Recall that the pdf of a multivariate Gaussian with mean $\mu$ and invertible covariance matrix $\Sigma$ is given by
$$f(x) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp\Bigl(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\Bigr)$$
for $x\in\mathbb{R}^n$. Note that in the zero-mean case this distribution is a member of the exponential family, with $h(x) = (2\pi)^{-n/2}$, $T(x) = xx^T$, natural parameter $\eta = -\frac{1}{2}\Sigma^{-1}$, and $A(\eta) = \frac{1}{2}\log|\Sigma|$. Hence, by Lemma 1 and the remark following it, the Rényi differential cross-entropy between two zero-mean multivariate Gaussian distributions with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively, is
$$h_\alpha(f_1;f_2) = \frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma_2| + \frac{1}{2(\alpha-1)}\log\bigl(|\Sigma_1|\,\bigl|\Sigma_1^{-1} + (\alpha-1)\Sigma_2^{-1}\bigr|\bigr),$$
which equals (9) since $|\Sigma_1|\,\bigl|\Sigma_1^{-1} + (\alpha-1)\Sigma_2^{-1}\bigr| = |\Sigma_2 + (\alpha-1)\Sigma_1|/|\Sigma_2|$.
∎
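A small Monte Carlo check of (9) (our sketch, with $A = \Sigma_2 + (\alpha-1)\Sigma_1$ as above): draw samples from $f_1 = \mathcal{N}(\mathbf{0},\Sigma_1)$, estimate $E_{f_1}\bigl[f_2(X)^{\alpha-1}\bigr]$ for $f_2 = \mathcal{N}(\mathbf{0},\Sigma_2)$, and compare with the closed form.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
alpha, n = 1.8, 2
Sigma1 = np.array([[1.0, 0.3], [0.3, 0.8]])
Sigma2 = np.array([[1.5, -0.2], [-0.2, 1.0]])

# Monte Carlo estimate: h_alpha(f1;f2) = log(E_{f1}[ f2(X)^(alpha-1) ]) / (1 - alpha)
X = rng.multivariate_normal(np.zeros(n), Sigma1, size=200000)
vals = multivariate_normal.pdf(X, mean=np.zeros(n), cov=Sigma2) ** (alpha - 1.0)
h_mc = np.log(np.mean(vals)) / (1.0 - alpha)

# Closed form (9), with A = Sigma2 + (alpha-1)*Sigma1 assumed positive definite
A = Sigma2 + (alpha - 1.0) * Sigma1
h_closed = 0.5 * np.log((2 * np.pi) ** n * np.linalg.det(Sigma2)) \
    + np.log(np.linalg.det(A) / np.linalg.det(Sigma2)) / (2.0 * (alpha - 1.0))

print(h_mc, h_closed)   # should be close, up to Monte Carlo error
```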
Let $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ be stationary zero-mean Gaussian processes. For a given $n$, $X^n := (X_1,\dots,X_n)$ and $Y^n := (Y_1,\dots,Y_n)$ are multivariate Gaussian random vectors with mean $\mathbf{0}$ and covariance matrices $\Sigma_1^{(n)}$ and $\Sigma_2^{(n)}$, respectively. Since $\{X_i\}$ and $\{Y_i\}$ are stationary, their covariance matrices are Toeplitz. Furthermore, $A_n := \Sigma_2^{(n)} + (\alpha-1)\Sigma_1^{(n)}$ is Toeplitz.
Lemma 3.
Let $\phi_1$, $\phi_2$ and $\phi_A$ be the power spectral densities of $\{X_i\}$, $\{Y_i\}$ and the zero-mean Gaussian process with covariance matrix $A_n$, respectively.
Then the Rényi differential cross-entropy rate between $\{X_i\}$ and $\{Y_i\}$, $\lim_{n\to\infty}\frac{1}{n} h_\alpha(X^n;Y^n)$, is given by
$$\lim_{n\to\infty}\frac{1}{n} h_\alpha(X^n;Y^n) = \frac{1}{2}\log(2\pi) + \frac{1}{4\pi}\int_{-\pi}^{\pi}\log\phi_2(\lambda)\,d\lambda + \frac{1}{4\pi(\alpha-1)}\int_{-\pi}^{\pi}\log\frac{\phi_A(\lambda)}{\phi_2(\lambda)}\,d\lambda.$$
Proof.
From Lemma 2, we first note that the matrix $A$ in (9) becomes $A_n = \Sigma_2^{(n)} + (\alpha-1)\Sigma_1^{(n)}$. With this in mind, the Rényi differential cross-entropy can be rewritten using (9) as
$$\frac{1}{n} h_\alpha(X^n;Y^n) = \frac{1}{2}\log(2\pi) + \frac{1}{2n}\log\bigl|\Sigma_2^{(n)}\bigr| + \frac{1}{2n(\alpha-1)}\log\frac{|A_n|}{\bigl|\Sigma_2^{(n)}\bigr|}.$$
It was proven in [7] that for a sequence of Toeplitz matrices $\{T_n(\phi)\}$ with spectral density $\phi$ such that $\log\phi$ is Riemann integrable, one has
$$\lim_{n\to\infty}\frac{1}{n}\log\det T_n(\phi) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\phi(\lambda)\,d\lambda.$$
We therefore obtain that the Rényi differential cross-entropy rate is given by
$$\lim_{n\to\infty}\frac{1}{n} h_\alpha(X^n;Y^n) = \frac{1}{2}\log(2\pi) + \frac{1}{4\pi}\int_{-\pi}^{\pi}\log\phi_2(\lambda)\,d\lambda + \frac{1}{4\pi(\alpha-1)}\int_{-\pi}^{\pi}\log\frac{\phi_A(\lambda)}{\phi_2(\lambda)}\,d\lambda.$$
Note that $\phi_A(\lambda) = \phi_2(\lambda) + (\alpha-1)\phi_1(\lambda)$. ∎
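For concreteness (a sketch of ours), take $\{X_i\}$ and $\{Y_i\}$ to be unit-innovation AR(1) processes with poles $a_1$ and $a_2$, so that $\phi_j(\lambda) = 1/|1 - a_j e^{-i\lambda}|^2$; the spectral formula of Lemma 3 can then be compared with the finite-$n$ value $\frac{1}{n}h_\alpha(X^n;Y^n)$ obtained from (9).

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.integrate import quad

alpha, a1, a2, n = 1.5, 0.5, -0.3, 400

def ar1_cov(a, n):
    # Toeplitz covariance of a unit-innovation AR(1) process: R(k) = a^|k| / (1 - a^2)
    return toeplitz(a ** np.arange(n) / (1.0 - a ** 2))

def psd(a, lam):
    # Power spectral density of the same AR(1) process
    return 1.0 / np.abs(1.0 - a * np.exp(-1j * lam)) ** 2

S1, S2 = ar1_cov(a1, n), ar1_cov(a2, n)
A = S2 + (alpha - 1.0) * S1

# Finite-n cross-entropy per sample, from (9) (log-determinants via slogdet for stability)
ld = lambda M: np.linalg.slogdet(M)[1]
finite_n = 0.5 * np.log(2 * np.pi) + ld(S2) / (2 * n) + (ld(A) - ld(S2)) / (2 * n * (alpha - 1.0))

# Spectral-density formula of Lemma 3, with phi_A = phi_2 + (alpha-1)*phi_1
I2, _ = quad(lambda lam: np.log(psd(a2, lam)), -np.pi, np.pi)
IA, _ = quad(lambda lam: np.log((psd(a2, lam) + (alpha - 1.0) * psd(a1, lam)) / psd(a2, lam)), -np.pi, np.pi)
rate = 0.5 * np.log(2 * np.pi) + I2 / (4 * np.pi) + IA / (4 * np.pi * (alpha - 1.0))

print(finite_n, rate)   # should be close for large n
```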
VI Rényi Cross-Entropy Rate for Markov Sources
Consider two time-invariant Markov sources $\{X_i\}$ and $\{\hat{X}_i\}$ with common finite alphabet $\mathcal{X}$ and with transition distributions $p(\cdot|\cdot)$ and $q(\cdot|\cdot)$, respectively. Then for any $n \ge 1$, their $n$-dimensional joint distributions are given by
$$p^{(n)}(x^n) = p_0(x_1)\prod_{i=2}^{n} p(x_i|x_{i-1})$$
and
$$q^{(n)}(x^n) = q_0(x_1)\prod_{i=2}^{n} q(x_i|x_{i-1}),$$
respectively, with arbitrary initial distributions $p_0$ and $q_0$, respectively. Define the Rényi cross-entropy rate between $\{X_i\}$ and $\{\hat{X}_i\}$ as
$$\lim_{n\to\infty}\frac{1}{n} H_\alpha\bigl(p^{(n)};q^{(n)}\bigr).$$
Note that by defining the $|\mathcal{X}|\times|\mathcal{X}|$ matrix $R$ using the formula
$$R(x,\hat{x}) := p(\hat{x}|x)\,q(\hat{x}|x)^{\alpha-1}, \qquad x,\hat{x}\in\mathcal{X},$$
and the row vector $s$ as having components $s(x) := p_0(x)\,q_0(x)^{\alpha-1}$, $x\in\mathcal{X}$, the Rényi cross-entropy rate can be written as
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\log\bigl(s R^{n-1}\mathbf{1}\bigr), \tag{10}$$
where $\mathbf{1}$ is a column vector whose dimension is the cardinality of the alphabet and with all its entries equal to 1.
A result derived in [10] for the Rényi divergence between Markov sources can thus be used to find the Rényi cross-entropy rate for Markov sources.
Lemma 4.
Let $\{X_i\}$, $\{\hat{X}_i\}$, $s$ and $R$ be defined as above. If $R$ is irreducible, then
$$\lim_{n\to\infty}\frac{1}{n} H_\alpha\bigl(p^{(n)};q^{(n)}\bigr) = \frac{1}{1-\alpha}\log\lambda_R, \tag{11}$$
where $\lambda_R$ is the largest positive eigenvalue of $R$.
Proof.
Since the non-negative matrix $R$ is irreducible, by the Frobenius theorem (e.g., cf. [13, 4]), it has a largest positive eigenvalue $\lambda_R$ with associated positive eigenvector $b$. Let $b_{\min}$ and $b_{\max}$ be the minimum and maximum elements, respectively, of $b$. Then due to the non-negativity of $s$,
$$s R^{n-1}\mathbf{1} \;\ge\; \frac{1}{b_{\max}}\, s R^{n-1} b \;=\; \frac{\lambda_R^{n-1}}{b_{\max}}\,\langle s, b\rangle,$$
where $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product. Similarly,
$$s R^{n-1}\mathbf{1} \;\le\; \frac{1}{b_{\min}}\, s R^{n-1} b \;=\; \frac{\lambda_R^{n-1}}{b_{\min}}\,\langle s, b\rangle.$$
As a result,
$$\frac{\lambda_R^{n-1}}{b_{\max}}\,\langle s, b\rangle \;\le\; s R^{n-1}\mathbf{1} \;\le\; \frac{\lambda_R^{n-1}}{b_{\min}}\,\langle s, b\rangle.$$
Note that for all $n$, $\langle s, b\rangle$ is a constant. Thus
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\log\Bigl(\frac{\lambda_R^{n-1}}{b_{\max}}\,\langle s, b\rangle\Bigr) = \frac{1}{1-\alpha}\log\lambda_R.$$
Similarly, we have
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\log\Bigl(\frac{\lambda_R^{n-1}}{b_{\min}}\,\langle s, b\rangle\Bigr) = \frac{1}{1-\alpha}\log\lambda_R.$$
Hence,
$$\lim_{n\to\infty}\frac{1}{n(1-\alpha)}\log\bigl(s R^{n-1}\mathbf{1}\bigr) = \frac{1}{1-\alpha}\log\lambda_R.$$
∎
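A numerical illustration of (10) and (11) (ours): pick two transition matrices $P$ and $Q$ on a three-letter alphabet such that $R$ is irreducible, form $R$ and $s$ as above, and compare the finite-$n$ expression in (10) with the eigenvalue formula in (11).

```python
import numpy as np

alpha = 2.0
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])     # transition matrix of the first source
Q = np.array([[0.5, 0.25, 0.25],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])     # transition matrix of the second source
p0 = np.array([1/3, 1/3, 1/3])      # initial distributions
q0 = np.array([0.2, 0.5, 0.3])

R = P * Q ** (alpha - 1.0)          # R(x, xhat) = p(xhat|x) * q(xhat|x)^(alpha-1)
s = p0 * q0 ** (alpha - 1.0)        # s(x) = p0(x) * q0(x)^(alpha-1)
one = np.ones(3)

# Finite-n value from (10)
n = 200
finite_n = np.log(s @ np.linalg.matrix_power(R, n - 1) @ one) / (n * (1.0 - alpha))

# Eigenvalue formula (11)
lam_R = np.max(np.real(np.linalg.eigvals(R)))
rate = np.log(lam_R) / (1.0 - alpha)

print(finite_n, rate)               # should be close for large n
```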
Another technique can be borrowed from [10] to generalize Lemma 4 to the case where $R$ is reducible. First, $R$ is rewritten in the canonical form detailed in Proposition 1 of [10]. Let $\lambda_i$ be the largest positive eigenvalue of the $i$-th self-communicating sub-matrix of $R$. For each inessential class $j$, let $\tilde{\lambda}_j$ be the largest among the largest positive eigenvalues of the classes that are reachable from class $j$. Define $\lambda_R$ as the maximum of all the $\lambda_i$ and $\tilde{\lambda}_j$. Then (11) holds.
Appendix A: Shannon-type information measures
Name | Definition
---|---
Shannon entropy | $H(p) := -\sum_{x\in\mathcal{X}} p(x)\log p(x)$
Shannon cross-entropy | $H(p;q) := -\sum_{x\in\mathcal{X}} p(x)\log q(x)$
Kullback-Leibler (KL) divergence | $D(p\Vert q) := \sum_{x\in\mathcal{X}} p(x)\log\frac{p(x)}{q(x)}$
Shannon differential entropy | $h(f) := -\int_{\mathcal{S}} f(x)\log f(x)\,dx$
Shannon differential cross-entropy | $h(f;g) := -\int_{\mathcal{S}} f(x)\log g(x)\,dx$
Appendix B: Distributions listed in Table I
Name (Parameters) | pdf $f(x)$ (Support)
---|---
Beta ($a$, $b$) | $\frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}$ ($0 < x < 1$)
Exponential ($\lambda$) | $\lambda e^{-\lambda x}$ ($x \ge 0$)
Gamma ($k$, $\lambda$) | $\frac{\lambda^{k} x^{k-1} e^{-\lambda x}}{\Gamma(k)}$ ($x > 0$)
Gaussian ($\mu$, $\sigma^2$) | $\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ ($x \in \mathbb{R}$)
Laplace ($\mu$, $b$) | $\frac{1}{2b}\, e^{-\lvert x-\mu\rvert/b}$ ($x \in \mathbb{R}$)
Notes
-
$B(\cdot,\cdot)$ is the Beta function.
-
$\Gamma(\cdot)$ is the Gamma function.
References
- [1] (2020) Rényi Generative Adversarial Networks. ArXiv:2006.02479v1. Cited by: §I.
- [2] (2021-08) Least kth-order and Rényi generative adversarial networks. Neural Computation 33 (9), pp. 2473–2510. External Links: ISSN 1530-888X, Link, Document Cited by: §I, §II, §II.
- [3] (2002) Statistical inference. Cengage Learning. Cited by: §III.
- [4] (1996) Discrete stochastic processes. Springer. Cited by: §VI.
- [5] (2013) Rényi divergence measures for commonly used univariate continuous distributions. Information Sciences 249, pp. 124–131. External Links: ISSN 0020-0255, Document, Link Cited by: §I.
- [6] (2014) Generative adversarial nets. In Proceedings of Advances in Neural Information Processing Systems, Vol. 27, pp. 2672–2680. Cited by: §I.
- [7] (2006) Toeplitz and circulant matrices: a review. Foundations and Trends in Communications and Information Theory 2 (3), pp. 155–239. External Links: Document Cited by: §V.
- [8] (2019) On Jensen-Rényi and Jeffreys-Rényi type $f$-divergences induced by convex functions. Physica A: Statistical Mechanics and its Applications. External Links: Document Cited by: §I.
- [9] (1991) Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37 (1), pp. 145–151. Cited by: §I.
- [10] (2001) Rényi’s divergence and entropy rates for finite alphabet Markov sources. IEEE Transactions on Information Theory 47 (4), pp. 1553–1561. Cited by: §VI, §VI.
- [11] (1961) On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 547–561. Cited by: §I.
- [12] (2021) RGAN: Rényi generative adversarial network. SN Computer Science 2 (1), pp. 17. External Links: Document Cited by: §I, §I.
- [13] (2006) Non-negative matrices and Markov chains. Springer Science & Business Media. Cited by: §VI.
- [14] (2001-02) Rényi information, loglikelihood and an intrinsic distribution measure. J. Statistical Planning and Inference 93. External Links: Document Cited by: §I.
- [15] (2019) The case for shifting the Rényi entropy. Entropy 21, pp. 1–21. External Links: Link Cited by: §I.
- [16] (2015) $\alpha$-mutual information. In Proceedings of the IEEE Information Theory and Applications Workshop, pp. 1–6. Cited by: §I.