# Simulation of Random Variables under Rényi Divergence Measures of All Orders

The random variable simulation problem consists in using a k-dimensional i.i.d. random vector X^k with distribution P_X^k to simulate an n-dimensional i.i.d. random vector Y^n so that its distribution is approximately Q_Y^n. In contrast to previous works, in this paper we consider the standard Rényi divergence and two variants of all orders to measure the level of approximation. These two variants are the max-Rényi divergence D_α^max(P,Q) and the sum-Rényi divergence D_α^+(P,Q). When α=∞, these two measures are strong because for any ϵ>0, D_∞^max(P,Q)≤ϵ or D_∞^+(P,Q)≤ϵ implies e^-ϵ≤P(x)/Q(x)≤ e^ϵ for all x. Under these Rényi divergence measures, we characterize the asymptotics of normalized divergences as well as the Rényi conversion rates. The latter is defined as the supremum of n/k such that the Rényi divergences vanish asymptotically. In addition, when the Rényi parameter is in the interval (0,1), the Rényi conversion rates equal the ratio of the Shannon entropies H(P_X)/H(Q_Y), which is consistent with traditional results in which the total variation measure was adopted. When the Rényi parameter is in the interval (1,∞], the Rényi conversion rates are, in general, larger than H(P_X)/H(Q_Y). When specialized to the case in which either P_X or Q_Y is uniform, the simulation problem reduces to the source resolvability and intrinsic randomness problems. The preceding results are used to characterize the asymptotics of Rényi divergences and the Rényi conversion rates for these two cases.


## I Introduction

How can we use a k-dimensional i.i.d. random vector X^k with distribution P_X^k to simulate an n-dimensional i.i.d. random vector Y^n so that its distribution is approximately Q_Y^n? This is the so-called random variable simulation problem or distribution approximation problem [1]. In [1] and [2], the total variation (TV) distance and the Bhattacharyya coefficient (equivalently, the Rényi divergence of order 1/2) were respectively used to measure the level of approximation. In these works, the asymptotic conversion rate was studied. This rate is defined as the supremum of n/k such that the employed measure vanishes asymptotically as the dimensions n and k tend to infinity. For both the TV distance and the Bhattacharyya coefficient, the asymptotic (first-order) conversion rates are the same, and both equal the ratio of the Shannon entropies H(P_X)/H(Q_Y). Furthermore, in [2], Kumagai and Hayashi also investigated the asymptotic second-order conversion rate. Note that by Pinsker's inequality [3], the Bhattacharyya coefficient (the Rényi divergence of order 1/2) is stronger than the TV distance, i.e., if the Bhattacharyya coefficient tends to 1 (equivalently, the Rényi divergence of order 1/2 tends to 0), then the TV distance tends to 0. In this paper, we strengthen the TV distance and the Bhattacharyya coefficient by considering Rényi divergences of all orders.

As two important special cases of the distribution approximation problem, the source resolvability and intrinsic randomness problems have been extensively studied in the literature, e.g., [4, 5, 6, 7, 8, 9, 1].

• Resolvability: When P_X is set to the Bernoulli distribution Bern(1/2), the distribution approximation problem reduces to the source resolvability problem, i.e., determining how much information is needed to simulate a random process so that it approximates a target output distribution. If the simulation is realized through a given channel, and we require that the channel output approximates a target output distribution, then we obtain the channel resolvability problem. These resolvability problems were first studied by Han and Verdú [4]. In [4], the total variation (TV) distance and the normalized relative entropy (Kullback-Leibler divergence) were used to measure the level of approximation. The resolvability problems with the unnormalized relative entropy were studied by Hayashi [5, 6]. Recently, Liu, Cuff, and Verdú [7] and Yu and Tan [8] extended the theory of resolvability by respectively using the so-called E_γ metric with γ ≥ 1 and various Rényi divergences to measure the level of approximation. In this paper, we extend the results in [8] to Rényi divergences of all orders.

• Intrinsic randomness: When Q_Y is set to the Bernoulli distribution Bern(1/2), the distribution approximation problem reduces to the intrinsic randomness problem, i.e., determining the amount of randomness contained in a source [9]. Given an arbitrary general source X = {X^n}, we approximate, by using X, a uniform random number with as large a rate as possible. Vembu and Verdú [9] and Han [1] determined the supremum of achievable uniform random number generation rates by invoking the information spectrum method. In this paper, we extend the results in [9] to the family of Rényi divergence measures.

### I-A Main Contributions

Our main contributions are as follows:

• For the distribution approximation problem, we use the standard Rényi divergences D_α(P_{Y^n}∥Q_Y^n) and D_α(Q_Y^n∥P_{Y^n}), as well as two variants, namely the max-Rényi divergence D_α^max(P_{Y^n}, Q_Y^n) and the sum-Rényi divergence D_α^+(P_{Y^n}, Q_Y^n), to measure the distance between the simulated and target output distributions. For these measures, we consider all orders α. We characterize the asymptotics of these Rényi divergences, as well as the Rényi conversion rates, which are defined as the supremum of n/k such that the Rényi divergences vanish asymptotically. Interestingly, when the Rényi parameter is in the interval (0, 1), the Rényi conversion rates are simply equal to the ratio of the Shannon entropies H(P_X)/H(Q_Y). This is consistent with the existing results in [2], where the Rényi parameter is 1/2. In contrast, when the Rényi parameter is in the interval (1, ∞], the Rényi conversion rates are, in general, larger than H(P_X)/H(Q_Y). It is worth noting that the obtained expressions for the asymptotics of Rényi divergences and the Rényi conversion rates involve Rényi entropies of all real orders, even including negative orders. To the best of our knowledge, this is the first time that an explicit operational interpretation of the Rényi entropies of negative orders is provided.

• When specialized to the cases in which either P_X or Q_Y is uniform, the preceding results are used to derive results for the source resolvability and intrinsic randomness problems. These results extend the existing results in [4, 9, 1, 8], where the TV distance, the relative entropy, and various Rényi divergences were used to measure the level of approximation.

### I-B Paper Outline

The rest of this paper is organized as follows. In Subsections I-C and I-D, we introduce several Rényi information quantities and use them to formulate the random variable simulation problem. In Section II, we present our main results on characterizing asymptotics of Rényi divergences and Rényi conversion rates. As consequences, in Sections III and IV, we apply our main results to the problems of Rényi source resolvability and Rényi intrinsic randomness. Finally, we conclude the paper in Section V. For seamless presentation of results, the proofs of all theorems and the notations involved in these proofs are deferred to the appendices.

### I-C Notations and Information Distance Measures

The set of probability measures on X is denoted as P(X), and the set of conditional probability measures on Y given a variable in X is denoted as P(Y|X) := {P_{Y|X} : P_{Y|X}(·|x) ∈ P(Y) for every x ∈ X}. For a distribution P_X ∈ P(X), the support of P_X is defined as supp(P_X) := {x ∈ X : P_X(x) > 0}.

We use T_{x^n} to denote the type (empirical distribution) of a sequence x^n, and T_X and V_{Y|X} to respectively denote a type of sequences in X^n and a conditional type of sequences in Y^n (given a sequence x^n). For a type T_X, the type class (set of sequences having the same type T_X) is denoted by T_{T_X}. For a conditional type V_{Y|X} and a sequence x^n, the V-shell of x^n (the set of y^n sequences having the same conditional type V_{Y|X} given x^n) is denoted by T_{V_{Y|X}}(x^n). The set of types of sequences in X^n is denoted as

$$ \mathcal{P}^{(n)}(\mathcal{X}) := \{T_{x^n} : x^n \in \mathcal{X}^n\}. \tag{1} $$

The set of conditional types of sequences in Y^n given a sequence in X^n with type T_X is denoted as

$$ \mathcal{P}^{(n)}(\mathcal{Y}|T_X) := \{V_{Y|X} : \mathcal{T}_{V_{Y|X}}(x^n) \neq \emptyset \text{ for some } x^n \in \mathcal{T}_{T_X}\}. \tag{2} $$

For brevity, we sometimes use a single symbol such as T_{XY} to denote the joint distribution induced by a type and a conditional type, i.e., T_X × V_{Y|X} or T_Y × V_{X|Y}.
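As a quick sanity check on the method of types, the following self-contained Python sketch (the brute-force enumeration is our illustration, not part of the paper) verifies two standard counting facts: a binary alphabet admits exactly n + 1 types of length-n sequences, and a type class with k occurrences of a given symbol contains C(n, k) sequences.

```python
from collections import Counter
from itertools import product
from math import comb

def type_of(seq, alphabet):
    """Type (empirical distribution) T_{x^n} of a sequence, as in (1)."""
    counts = Counter(seq)
    n = len(seq)
    return tuple(counts[a] / n for a in alphabet)

n, alphabet = 4, ('a', 'b')
# Enumerate the set of types P^(n)(X) by brute force over all sequences.
types = {type_of(x, alphabet) for x in product(alphabet, repeat=n)}
assert len(types) == n + 1          # a binary alphabet has n + 1 types

# The type class of the type with k occurrences of 'a' has C(n, k) elements.
k = 3
t = (k / n, (n - k) / n)
type_class = [x for x in product(alphabet, repeat=n) if type_of(x, alphabet) == t]
assert len(type_class) == comb(n, k)
```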

The ϵ-typical set of Q_X is denoted as

$$ \mathcal{T}_\epsilon^n(Q_X) := \{x^n \in \mathcal{X}^n : |T_{x^n}(x) - Q_X(x)| \le \epsilon\, Q_X(x),\ \forall x \in \mathcal{X}\}. \tag{3} $$

The conditionally ϵ-typical set of Q_{XY} given x^n is denoted as

$$ \mathcal{T}_\epsilon^n(Q_{XY}|x^n) := \{y^n \in \mathcal{Y}^n : (x^n, y^n) \in \mathcal{T}_\epsilon^n(Q_{XY})\}. \tag{4} $$

For brevity, we sometimes write T_ϵ^n(Q_X) and T_ϵ^n(Q_{XY}|x^n) as T_ϵ^n and T_ϵ^n(x^n), respectively.

For a distribution P_X, the Rényi entropy of order¹ α ∈ ℝ∖{1} is defined as

$$ H_\alpha(P_X) := \frac{1}{1-\alpha} \log \sum_{x \in \mathrm{supp}(P_X)} P_X(x)^\alpha, \tag{5} $$

and the Rényi entropies of orders 1 and ±∞ are defined as the limits of H_α as α → 1 and α → ±∞, respectively. It is known that

$$ H_{-\infty}(P_X) = -\log \inf_{x \in \mathrm{supp}(P_X)} P_X(x); \tag{6} $$
$$ H_1(P_X) = H(P_X) \tag{7} $$
$$ \qquad\qquad := -\sum_{x \in \mathrm{supp}(P_X)} P_X(x) \log P_X(x); \tag{8} $$
$$ H_{+\infty}(P_X) = -\log \sup_{x \in \mathrm{supp}(P_X)} P_X(x). \tag{9} $$

¹In the literature, the Rényi entropy is usually defined only for orders α ∈ [0, ∞] [10], except in a recent work [11], but here we define it for orders α ∈ [−∞, ∞]. This is due to the fact that our results involve Rényi entropies of all real orders, even including negative orders. Indeed, in the axiomatic definitions of the Rényi entropy and the Rényi divergence, Rényi restricted the parameter to α > 0 [10]. However, it is easy to verify that in [10], the postulates 1, 2, 3, 4, and 5' in the definition of the Rényi entropy and the postulates 6, 7, 8, 9, and 10 in the definition of the Rényi divergence (with the same weight function) are also satisfied when α < 0. It is worth noting that the Rényi entropy of negative orders is always non-negative, but the Rényi divergence of negative orders is always non-positive. The Rényi divergence of negative orders was studied in [3]. Observe that D_α(P∥Q) = (α/(1−α)) D_{1−α}(Q∥P) holds for α < 0. Hence we only need to consider the divergences D_α(P∥Q) and D_α(Q∥P) with α > 0, since these divergences completely characterize the divergences with α < 0. Furthermore, it is also worth noting that the Rényi entropy H_α is non-increasing and the Rényi divergence D_α is non-decreasing in the order α [11, 3].

Hence the usual Shannon entropy is a special (limiting) case of the Rényi entropy. Some properties of Rényi entropies of all real orders (including negative orders) can be found in the recent work [11]; e.g., H_α(P_X) is monotonically non-increasing in α throughout the real line, and (α−1)/α · H_α(P_X) is monotonically non-decreasing in α on (−∞, 0) and on (0, +∞).
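The definitions (5)-(9) and the monotonicity property above can be checked numerically. The sketch below is our illustration (natural logarithms assumed): it evaluates H_α on a small distribution, covering negative orders, and verifies the limiting orders and the non-increasing behavior in α.

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy (in nats) of a distribution p for a real order alpha.

    Supports negative orders; zero-probability atoms are excluded,
    matching the sum over supp(P_X) in (5)."""
    support = [v for v in p if v > 0]
    if alpha == 1:                      # Shannon entropy, the limit (7)-(8)
        return -sum(v * math.log(v) for v in support)
    return math.log(sum(v ** alpha for v in support)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
# Limiting orders (6) and (9): determined by the largest/smallest atom.
h_inf = -math.log(max(p))               # H_{+infinity}
h_neg_inf = -math.log(min(p))           # H_{-infinity}
assert abs(renyi_entropy(p, 1000) - h_inf) < 1e-2
assert abs(renyi_entropy(p, -400) - h_neg_inf) < 1e-2

# H_alpha is non-increasing in alpha across the whole real line.
orders = [-5, -1, 0, 0.5, 1, 2, 5]
values = [renyi_entropy(p, a) for a in orders]
assert all(x >= y - 1e-12 for x, y in zip(values, values[1:]))
```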

For a distribution P_X, the mode entropy² is defined as

$$ H_u(P_X) := -\frac{1}{|\mathrm{supp}(P_X)|} \sum_{x \in \mathrm{supp}(P_X)} \log P_X(x). \tag{10} $$

²Here the concept of "mode entropy" is consistent with the concept of "mode" in statistics. This is because, in statistics, the mode of a set of data values is the value that appears most often. On the other hand, for a product set supp(P_X)^n, the type class with the uniform type has more elements than any other type class, and under the product distribution P_X^n, the probability value of each sequence in this type class is e^{−n H_u(P_X)}. Hence, under the product distribution P_X^n, the probability value e^{−n H_u(P_X)} is the mode of the data values {P_X^n(x^n)}.

The mode entropy is also known as the cross (Shannon) entropy between the uniform distribution on supp(P_X) and P_X. For a distribution P_X and α ∈ ℝ, the α-tilted distribution P_X^{(α)} is defined as

$$ P_X^{(\alpha)}(\cdot) := \frac{P_X(\cdot)^\alpha}{\sum_{x' \in \mathrm{supp}(P_X)} P_X(x')^\alpha}, \tag{11} $$

and the α-tilted cross entropy is defined as

$$ H_\alpha^u(P_X) := -\sum_{x \in \mathrm{supp}(P_X)} P_X^{(\alpha)}(x) \log P_X(x). \tag{12} $$

Obviously, H_0^u(P_X) = H_u(P_X), and H_1^u(P_X) = H(P_X).
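The tilted-distribution definitions can likewise be checked numerically. The sketch below is our illustration (natural logarithms): it computes P^(α) and H^u_α per (11)-(12) and verifies that α = 0 recovers the mode entropy while α = 1 recovers the Shannon entropy.

```python
import math

def tilted(p, alpha):
    """alpha-tilted distribution P^(alpha) from (11), over the support of p."""
    support = [v for v in p if v > 0]
    z = sum(v ** alpha for v in support)
    return [v ** alpha / z for v in support]

def tilted_cross_entropy(p, alpha):
    """alpha-tilted cross entropy H^u_alpha from (12)."""
    support = [v for v in p if v > 0]
    return -sum(t * math.log(v) for t, v in zip(tilted(p, alpha), support))

p = [0.5, 0.3, 0.2]
# alpha = 0 tilts p to the uniform distribution on its support: H^u_0 = H_u.
h_u = -sum(math.log(v) for v in p) / len(p)
assert abs(tilted_cross_entropy(p, 0) - h_u) < 1e-12
# alpha = 1 recovers p itself: H^u_1 = H (Shannon entropy).
h = -sum(v * math.log(v) for v in p)
assert abs(tilted_cross_entropy(p, 1) - h) < 1e-12
```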

Fix distributions P_X, Q_X ∈ P(X). Then the Rényi divergence of order α ∈ ℝ∖{1} is defined as

$$ D_\alpha(P_X\|Q_X) := \frac{1}{\alpha-1} \log \sum_{x \in \mathrm{supp}(P_X)} P_X(x)^\alpha Q_X(x)^{1-\alpha}, \tag{13} $$

and the Rényi divergences of orders 1 and ±∞ are defined as the limits of D_α as α → 1 and α → ±∞, respectively. It is known that

$$ D_0(P_X\|Q_X) = -\log Q_X(\mathrm{supp}(P_X)); \tag{14} $$
$$ D_1(P_X\|Q_X) = D(P_X\|Q_X) \tag{15} $$
$$ \qquad\qquad := \sum_{x \in \mathrm{supp}(P_X)} P_X(x) \log \frac{P_X(x)}{Q_X(x)}; \tag{16} $$
$$ D_\infty(P_X\|Q_X) = \log \sup_{x \in \mathrm{supp}(P_X)} \frac{P_X(x)}{Q_X(x)}. \tag{17} $$

Hence the usual relative entropy is a special case of the Rényi divergence.
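The following sketch is our illustration (natural logarithms): it implements (13) and verifies the limiting cases (14) and (17) as well as the monotonicity of D_α in α.

```python
import math

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P||Q) in nats, per (13), with the
    alpha -> 1 limit (relative entropy) handled separately."""
    pairs = [(a, b) for a, b in zip(p, q) if a > 0]   # sum over supp(P)
    if alpha == 1:
        return sum(a * math.log(a / b) for a, b in pairs)
    return math.log(sum(a ** alpha * b ** (1 - alpha) for a, b in pairs)) / (alpha - 1)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
# Limiting orders (14) and (17).
d0 = -math.log(sum(b for a, b in zip(p, q) if a > 0))   # -log Q(supp(P))
d_inf = max(math.log(a / b) for a, b in zip(p, q) if a > 0)
assert abs(renyi_divergence(p, q, 0) - d0) < 1e-12
assert abs(renyi_divergence(p, q, 200) - d_inf) < 1e-2

# D_alpha is non-decreasing in alpha.
vals = [renyi_divergence(p, q, a) for a in (0, 0.5, 1, 2, 10)]
assert all(x <= y + 1e-12 for x, y in zip(vals, vals[1:]))
```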

We define the max-Rényi divergence as

$$ D_\alpha^{\max}(P,Q) := \max\{D_\alpha(P\|Q),\, D_\alpha(Q\|P)\}, \tag{18} $$

and the sum-Rényi divergence as

$$ D_\alpha^{+}(P,Q) := D_\alpha(P\|Q) + D_\alpha(Q\|P). \tag{19} $$

The sum-Rényi divergence reduces to Jeffrey's divergence [12] when the parameter α is set to 1. Observe that D_α^max(P,Q) ≤ D_α^+(P,Q) ≤ 2 D_α^max(P,Q). Hence D_α^+ is "equivalent" to D_α^max in the sense that, for any sequence of distribution pairs {(P^(n), Q^(n))}, D_α^+(P^(n), Q^(n)) → 0 if and only if D_α^max(P^(n), Q^(n)) → 0. Hence in this paper, we only consider the max-Rényi divergence. For α = ∞,

$$ D_\infty^{\max}(P,Q) = \sup_{x \in \mathcal{X}} |\log P(x) - \log Q(x)| \tag{20} $$
$$ \qquad\qquad\quad = \sup_{A \subseteq \mathcal{X}} |\log P(A) - \log Q(A)|. \tag{21} $$

This expression is similar to the definition of the TV distance; hence we term D_∞^max(P,Q) the logarithmic variation distance.³

³In [13], D_∞^max is termed the ϵ-closeness.
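A small numerical check (our illustration, natural logarithms) of (17)-(21): for full-support distributions, D^max_∞ coincides with the logarithmic variation distance, and the max- and sum-variants sandwich each other within a factor of two.

```python
import math

def d_inf(p, q):
    """D_infinity(P||Q) = log sup_{x in supp(P)} P(x)/Q(x), per (17)."""
    return max(math.log(a / b) for a, b in zip(p, q) if a > 0)

def d_max_inf(p, q):
    """Max-Rényi divergence of order infinity, per (18)."""
    return max(d_inf(p, q), d_inf(q, p))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
# (20): for full-support P, Q this equals the logarithmic variation distance.
lv = max(abs(math.log(a) - math.log(b)) for a, b in zip(p, q))
assert abs(d_max_inf(p, q) - lv) < 1e-12

# Sandwich: D^max_inf <= D^+_inf <= 2 D^max_inf (both one-sided terms are >= 0).
d_plus = d_inf(p, q) + d_inf(q, p)
assert d_max_inf(p, q) <= d_plus <= 2 * d_max_inf(p, q)
```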

###### Lemma 1.

The following properties hold.

1. D_∞^max(P, Q) is a metric. Similarly, D_∞^+(P, Q) is also a metric.

2. For any 0 ≤ α ≤ β ≤ ∞, D_α^max(P, Q) ≤ D_β^max(P, Q), hence

3. D_α^max(P, Q) ≤ D_∞^max(P, Q) for all α ∈ [0, ∞].

The proof of this lemma is omitted.

### I-D Problem Formulation and Result Summary

We consider the distribution approximation problem, which can be described as follows. We are given a target "output" distribution Q_Y that we would like to simulate. At the same time, we are given a k-length sequence X^k of a memoryless source with distribution P_X. We would like to design a function f_n : X^k → Y^n such that the distance, according to some divergence measure, between the distribution P_{Y^n} of the simulated sequence Y^n := f_n(X^k) and n independent copies of the target distribution, Q_Y^n, is minimized. Here we let k = ⌈n/R⌉, where R is a fixed positive number known as the rate. We assume the alphabets X and Y are finite. We also assume P_X(x) > 0 for all x ∈ X and Q_Y(y) > 0 for all y ∈ Y, i.e., X and Y are the supports of P_X and Q_Y, respectively. There are now two fundamental questions associated with this simulation task: (i) As n → ∞, what is the asymptotic level of approximation as a function of R? (ii) As n → ∞, what is the maximum rate R such that the discrepancy between the distributions P_{Y^n} and Q_Y^n tends to zero? In contrast to previous works on this problem [1, 2], here we employ the Rényi divergences D_α(P_{Y^n}∥Q_Y^n), D_α(Q_Y^n∥P_{Y^n}), and D_α^max(P_{Y^n}, Q_Y^n) of all orders to measure the discrepancy between P_{Y^n} and Q_Y^n.

Furthermore, our results are summarized in Table I.

### I-E Mappings

The following two fundamental mappings, illustrated in Fig. 1, will be used in our constructions of the functions described in Subsection I-D.

Consider two (possibly unnormalized) nonnegative measures P_X on X and Q_Y on Y. Sort the elements in X as x_1, x_2, … such that P_X(x_1) ≥ P_X(x_2) ≥ ⋯. Similarly, sort the elements in Y as y_1, y_2, … such that Q_Y(y_1) ≥ Q_Y(y_2) ≥ ⋯. Consider the following two mappings from X to Y.

• Mapping 1 (Inverse-Transform): If P_X and/or Q_Y are unnormalized, then normalize them first. Define F(x_0) := 0 and F(x_i) := Σ_{i′≤i} P_X(x_{i′}). Similarly, for Q_Y, define G(y_0) := 0 and G(y_j) := Σ_{j′≤j} Q_Y(y_{j′}). Consider the following mapping: each x_i is mapped to y_j = f(x_i), where j is the unique index such that F(x_i) ∈ (G(y_{j−1}), G(y_j)]. The resulting distribution is denoted as P_Y. This mapping is illustrated in Fig. 1(a). For such a mapping, the following properties hold:

1. If no x_i is mapped to y_j, then the interval (G(y_{j−1}), G(y_j)] is contained in (F(x_{i−1}), F(x_i)) for some i, so that Q_Y(y_j) < P_X(x_i). Hence, P_Y(y_j) = 0.

2. If y_j = f(x_i), where i is the smallest index such that f(x_i) = y_j, then P_Y(y_j) ≥ P_X(x_i) and

$$ \max\{\tfrac{1}{2}Q_Y(y_j),\, Q_Y(y_j) - P_X(x_i)\} \le P_Y(y_j) \le Q_Y(y_j) + P_X(x_i). \tag{22} $$

• Mapping 2: Denote {i_m}_{m≥0}, with i_0 := 0, as a sequence of integers such that for 1 ≤ m < M, i_m is the smallest integer satisfying Σ_{i=i_{m−1}+1}^{i_m} P_X(x_i) ≥ Q_Y(y_m), and i_M := |X|, where M is the first index for which the remaining mass of P_X is smaller than Q_Y(y_M) (all remaining elements are mapped to y_M). Obviously, M ≤ |Y|. For each m, map x_i with i ∈ (i_{m−1}, i_m] to y_m. The resulting distribution is denoted as P_Y. This mapping is illustrated in Fig. 1(b). For such a mapping, we have Q_Y(y_m) ≤ P_Y(y_m) for 1 ≤ m ≤ M − 1, P_Y(y_m) < Q_Y(y_m) for m = M, and P_Y(y_m) = 0 for m > M.
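The inverse-transform mapping can be sketched in a few lines of Python. This is our minimal reconstruction of Mapping 1 (sort both measures, form cumulative sums, and bin source atoms into target intervals); the function name is ours, and the final assertion checks a relaxed form of property (22), with max_i P_X(x_i) in place of the specific atom P_X(x_i).

```python
import bisect
import itertools

def inverse_transform_map(p, q):
    """Sketch of Mapping 1: sort both distributions in decreasing order,
    form the cumulative sums F and G, and send each source atom x_i to
    the target atom y_j whose interval (G_{j-1}, G_j] contains F_i.
    Returns the simulated distribution P_Y aligned with sorted q."""
    p = sorted(p, reverse=True)
    q = sorted(q, reverse=True)
    F = list(itertools.accumulate(p))
    G = list(itertools.accumulate(q))
    py = [0.0] * len(q)
    for i, mass in enumerate(p):
        # Smallest j with G_j >= F_i, i.e. F_i in (G_{j-1}, G_j];
        # the min() guards against floating-point overshoot at the top.
        j = bisect.bisect_left(G, min(F[i], G[-1]))
        py[j] += mass
    return py, q

# Simulating a coarse target Q_Y from a finer source distribution P_X.
py, q = inverse_transform_map([0.15, 0.2, 0.25, 0.1, 0.1, 0.2], [0.5, 0.3, 0.2])
assert abs(sum(py) - 1.0) < 1e-12
# Relaxed (22): |P_Y(y_j) - Q_Y(y_j)| <= max_i P_X(x_i) = 0.25 here.
assert all(b - 0.25 - 1e-12 <= a <= b + 0.25 + 1e-12 for a, b in zip(py, q))
```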

## II Rényi Distribution Approximation

### II-A Asymptotics of Rényi Divergences

We first characterize the asymptotics of the Rényi divergences D_α(P_{Y^n}∥Q_Y^n), D_α(Q_Y^n∥P_{Y^n}), and D_α^max(P_{Y^n}, Q_Y^n), as shown by the following theorems.

###### Theorem 1 (Asymptotics of \frac{1}{n} D_α(P_{Y^n}∥Q_Y^n)).

For any α, we have

$$ \lim_{n\to\infty} \frac{1}{n} \inf_f D_\alpha(P_{Y^n}\|Q_Y^n) = \sup_{t\in[0,1)} \Big\{ t H_{\frac{1}{1-t}}(Q_Y) - tR\, H_{\frac{1}{1-\frac{\alpha-1}{\alpha}t}}(P_X) \Big\}. \tag{25} $$
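The right-hand side of (25) is a one-dimensional variational problem and can be evaluated by a grid search over t. The helper below is our illustration (natural logarithms; the function name `theorem1_rhs` is ours): it shows that the supremum is 0 at high rates, i.e., the normalized divergence vanishes, and strictly positive at low rates.

```python
import math

def renyi_entropy(p, order):
    """Rényi entropy in nats; Shannon entropy at order 1."""
    if abs(order - 1) < 1e-12:
        return -sum(v * math.log(v) for v in p if v > 0)
    return math.log(sum(v ** order for v in p if v > 0)) / (1 - order)

def theorem1_rhs(p, q, alpha, rate, grid=1000):
    """Grid-search evaluation of the right-hand side of (25):
    sup_{t in [0,1)} { t H_{1/(1-t)}(Q_Y) - t R H_{1/(1-((alpha-1)/alpha)t)}(P_X) }."""
    s = (alpha - 1) / alpha
    best = 0.0                          # the t = 0 term of the supremum
    for k in range(1, grid):
        t = k / grid                    # t in (0, 1)
        val = t * renyi_entropy(q, 1 / (1 - t)) \
            - t * rate * renyi_entropy(p, 1 / (1 - s * t))
        best = max(best, val)
    return best

p, q = [0.7, 0.3], [0.5, 0.5]
# At a high enough rate R the supremum is attained at t = 0:
# the normalized divergence vanishes, so simulation succeeds asymptotically.
assert theorem1_rhs(p, q, alpha=2, rate=2.0) == 0.0
# At a low rate the supremum is strictly positive.
assert theorem1_rhs(p, q, alpha=2, rate=0.5) > 0.0
```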
###### Theorem 2 (Asymptotics of \frac{1}{n} D_α(Q_Y^n∥P_{Y^n})).

For any α ∈ [1, ∞], we have

$$ \lim_{n\to\infty} \frac{1}{n} \inf_f D_\alpha(Q_Y^n\|P_{Y^n}) = \begin{cases} \infty, & R > \frac{H_0(P_X)}{H_0(Q_Y)}; \\[4pt] \sup_{t\in(0,\infty)} \Big\{ t H_{\frac{1}{1+\frac{\alpha-1}{\alpha}t}}(Q_Y) - tR\, H_{\frac{1}{1+t}}(P_X) \Big\}, & R \le \frac{H_0(P_X)}{H_0(Q_Y)}. \end{cases} \tag{26} $$
###### Theorem 3 (Asymptotics of \frac{1}{n} D_α^max(P_{Y^n}, Q_Y^n)).

For any α, we have (27) (given on the next page), where

$$ a(t') = \Big(\frac{\alpha}{1-\alpha} - 1\Big)t' + 1, \tag{28} $$
$$ b(t') = \Big(1 - \frac{\alpha}{1-\alpha}\Big)t' + \frac{\alpha}{1-\alpha}. \tag{29} $$
###### Remark 1.

For certain values of α and R, the asymptotic behaviors of the divergences above depend on how fast n/k converges to R. In this paper, we set k = ⌈n/R⌉, i.e., the fastest case.