# Playing Games with Bounded Entropy: Convergence Rate and Approximate Equilibria

We consider zero-sum repeated games in which the players are restricted to strategies that require only a limited amount of randomness. Let $v_n$ be the max-min value of the $n$-stage game; previous works have characterized $\lim_{n\to\infty}v_n$, i.e., the long-run max-min value. Our first contribution is to study the convergence rate of $v_n$ to its limit. To this end, we provide a new tool for simulating one source (the target source) from another source (the coin source). With the total variation distance as the measure of precision, this tool offers an upper bound on the simulation error that vanishes exponentially in the difference of the Rényi entropies of the coin and target sources. In the second part of the paper, we characterize the set of all approximate Nash equilibria achieved in the long run. It turns out that this set is closely related to the long-run max-min value.

## 1 Introduction

Nash (1950) showed that all one-shot games have at least one equilibrium in mixed strategies. Private randomness is required to implement mixed strategies, and consequently a Nash equilibrium may not exist if insufficient random bits are available to the players (see Hubáček et al. (2016) and Budinich and Fortnow (2011)).

Limited randomness in repeated zero-sum games was originally studied by Neyman and Okada (2000) and Gossner and Vieille (2002). Gossner and Vieille (2002) studied a repeated zero-sum game between Alice (the maximizer) and Bob (the minimizer). At the beginning of each stage of the game, Alice observed an independent drawing of a random source $X$ with a commonly known distribution. Next, each player played an action, which was monitored by the other player. The only source of randomization available to Alice was the outcomes of the random source $X$. Thus, Alice had to choose the action of each stage as a deterministic function of the history of her observations, i.e., the random sources revealed up to that stage and the previous actions. However, Bob could freely randomize his actions, and hence, at each stage, he chose his action as a random function of the actions played previously.

Generalizing the model of Gossner and Vieille (2002), Valizadeh and Gohari (2017) considered the possibility of leakage of Alice’s random source sequence to Bob; thus, they called it the repeated game with leaked randomness source. In other words, Bob monitored the random source of Alice through a noisy channel. Specifically, let $(X_1,Y_1),(X_2,Y_2),\dots$ be a sequence of independent and identically distributed (i.i.d.) random variables distributed according to a given distribution $p_{XY}$. At an arbitrary stage $t$, before choosing the actions for that stage, Alice observed $X_t$, and Bob observed $Y_t$. In this model, Alice and Bob could randomize their actions at each stage only by conditioning them on the history of their observations up to that stage.

In this paper, we study two different aspects of the repeated game with leaked randomness sources. Our first contribution is to study the max-min payoff that Alice can secure in a repeated game with a finite number of stages. Note that Valizadeh and Gohari (2017) characterized the long-run max-min value, i.e., the maximum payoff that Alice can secure regardless of what strategy Bob chooses when the number of stages tends to infinity. More precisely, let $v_n$ be the max-min value of the $n$-stage repeated game with leaked randomness source. Valizadeh and Gohari (2017) characterized $\lim_{n\to\infty}v_n$. In this paper, we investigate how $v_n$ converges to its limit. To do so, we develop and utilize a new tool for simulation of a source from another source, which we introduce in Section 1.1.

Our second contribution is to study the set of equilibria that is implementable by Alice and Bob in the repeated game with leaked randomness sources. As stated above, implementable Nash equilibria do not necessarily exist. However, a relaxed version of Nash equilibria, called approximate Nash equilibria, may exist. Given two arbitrary positive tolerances, one for each player, we say that a strategy profile forms an approximate Nash equilibrium if neither Alice nor Bob can gain more than the corresponding tolerance by unilaterally changing strategy. We characterize the set of approximate Nash equilibria of the repeated game when the number of stages of the game tends to infinity. This set is characterized in terms of the maximum payoffs that Alice and Bob can secure in the long run (the long-run max-min and min-max values).

Note that in previous works (Neyman and Okada (2000); Gossner and Vieille (2002); Valizadeh and Gohari (2017)), the max-min (or min-max) value of the zero-sum repeated game was achieved by autonomous strategies, i.e., strategies that ignore the actions played by the opponent in past stages. Therefore, we address the question of whether autonomous strategies are sufficient for achieving all implementable approximate Nash equilibria. To do this, we also characterize the set of all approximate Nash equilibria achieved by autonomous strategies in the long run. It will turn out that the set of approximate equilibria achieved by autonomous strategies is strictly smaller than the set of approximate equilibria achieved by arbitrary strategies.

### 1.1 A new tool

A key step in the proofs of Gossner and Vieille (2002) and Valizadeh and Gohari (2017) is to divide the stages of the repeated game into blocks such that the actions of the first player in each block (excluding the first block) are generated as a function of the randomness source observed during the previous block. (This strategy is known as the block Markov strategy in information theory and is used in multi-hop communication settings.) In other words, the actions of the first player in each block are simulated from the randomness source observed in the previous block. Since we are interested in the non-asymptotic regime where the number of stages $n$ is given and fixed, we need to carefully optimize over the length of the blocks and also prove a fine estimate on the accuracy of simulating the actions of each block from the observations of the previous block. Thus, in order to study the repeated game with $n$ stages, we provide a new tool for simulation of a source from another source, which is of independent interest.

More precisely, in abstract terms, let $X$ and $Y$ be arbitrary discrete random variables distributed according to some probability mass function $p_{XY}$, and let $A$ be a target random variable distributed according to $p_A$. We would like to simulate $A$ from $X$ (by using a deterministic function $f$) in such a way that the resulting random variable, $f(X)$, is almost independent of $Y$, and its distribution is close to $p_A$. Intuitively, if the amount of uncertainty of $X$ given $Y$ is much more than the amount of uncertainty of $A$, then one might find a simulator satisfying the above conditions. We take the Rényi entropy as our measure of uncertainty, and the total variation distance as our measure of similarity. We prove that there exists a mapping $f$ such that for arbitrary $\alpha\in[1,2]$,

$$\left\|p_{f(X)Y}-p_Ap_Y\right\|_{TV}\le 2^{-\left(1-\frac{1}{\alpha}\right)\left(H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)+2\right)}, \tag{1}$$

where $\|\cdot\|_{TV}$ denotes the total variation distance, $H_\alpha(X|Y)$ denotes the conditional Rényi entropy (with parameter $\alpha$) of $X$ given $Y$, and $H_{\frac{1}{\alpha}}(A)$ is the Rényi entropy of $A$ with parameter $\frac{1}{\alpha}$. The main idea to prove Equation (1) is to relate it to norms of linear maps and then utilize the Riesz-Thorin Interpolation Theorem.

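To better understand Equation (1), let us apply it to a sequence of random variables. Assume that $(X_1,Y_1)$, $(X_2,Y_2)$, …, $(X_n,Y_n)$ are i.i.d. repetitions according to $p_{XY}$. Our goal is to simulate $A^n$, which is an i.i.d. sequence according to $p_A$. Applying Equation (1) to $X^n$, $Y^n$, and $A^n$, we obtain that there exists a mapping $f$ such that for arbitrary $\alpha\in[1,2]$,

$$\left\|p_{f(X^n)Y^n}-p_{A^n}p_{Y^n}\right\|_{TV}\le 2^{-\left(1-\frac{1}{\alpha}\right)\left(nH_\alpha(X|Y)-nH_{\frac{1}{\alpha}}(A)+2\right)}\le 2^{-n\left(1-\frac{1}{\alpha}\right)\left(H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)\right)}, \tag{2}$$

where we used the fact that $H_\alpha(X^n|Y^n)=nH_\alpha(X|Y)$ and $H_{\frac{1}{\alpha}}(A^n)=nH_{\frac{1}{\alpha}}(A)$. Equation (2) shows that the accuracy of simulation improves exponentially fast in the product of three terms: the block length $n$, the term $1-\frac{1}{\alpha}$, and the entropy difference $H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)$.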
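The scaling in Equation (2) can be illustrated with a minimal sketch that evaluates the first bound in Equation (2) for given entropy values. The function name and the numeric entropy values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def iid_simulation_bound(n, alpha, h_cond, h_target):
    """First bound in Equation (2).

    h_cond   : assumed value of H_alpha(X|Y) in bits (hypothetical input)
    h_target : assumed value of H_{1/alpha}(A) in bits (hypothetical input)
    """
    gap = h_cond - h_target
    return 2.0 ** (-(1.0 - 1.0 / alpha) * (n * gap + 2.0))


# Illustrative numbers only: a 1-bit coin source and a target of about 0.678 bits.
for n in (1, 10, 100):
    print(n, iid_simulation_bound(n, alpha=2.0, h_cond=1.0, h_target=0.678))
```

Each additional stage multiplies the bound by the fixed factor $2^{-(1-1/\alpha)\,\text{gap}}$, which is the exponential decay described above.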

Moreover, Equation (1) can be interpreted in a different way. We say that a functional is a measure of randomness if it assigns a non-negative real number to every discrete random variable; this number quantifies the amount of uncertainty in the random variable. Such a functional is a reasonable measure of randomness only if it is non-increasing under mappings. In other words, if the random variable $A$ is a deterministic function of the random variable $X$, we expect the randomness of $A$ to be at most the randomness of $X$. The question then arises whether the converse to this statement can also be true:

Question: Is there a suitable measure of randomness such that the randomness of $A$ is at most the randomness of $X$ if and only if there is a function $f$ such that $f(X)$ is distributed according to $p_A$?

While the answer to this question is negative, our tool shows that an approximate version of it holds. To see why the answer is negative, let $X$ be a binary random variable. Then $f(X)$ has the same amount of randomness as $X$ if $f$ is a one-to-one function, and $f(X)$ is deterministic if $f$ is constant. Therefore, the randomness of $f(X)$ cannot take values lying strictly between that of a deterministic random variable and that of $X$. However, if we only require $f(X)$ to have a distribution that is “approximately” equal to $p_A$, the above question can be revisited. In fact, our tool shows that Rényi entropy is an answer for the approximate version of the above question.

Relation of Equation (1) to previous works: The problem of simulation of a source from another source dates back to the work of Von Neumann (1951), who considered the problem of generating a sequence of i.i.d. fair bits from a given sequence of i.i.d. unfair bits. The algorithm presented by Von Neumann (1951) is universal in the sense that it does not need the knowledge of the distribution of the input bits, and it is exact in the sense that the output bits are exactly fair. Von Neumann (1951) also offered a non-universal exact algorithm for simulation of a desired continuous distribution from a given continuous random variable with known distribution. A generalization of the algorithm of Von Neumann (1951) for arbitrary Markov inputs can be found in Elias (1972) and Bernardini and Rinaldo (2018). There are other works that have considered non-exact simulation of a source. Considering the total variation distance as the measure of accuracy, Yassaee et al. (2014) studied non-universal generation of independent fair bits from an i.i.d. sequence of random variables with side information, and Han (2003) considered the simulation of a general sequence from a general input sequence with known distribution. Fundamental limits for generation of an arbitrary random sequence from a general sequence of random variables under different measures of accuracy have been studied by Vembu and Verdú (1995) and Yu and Tan (2019).
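For concreteness, the classical pairing procedure usually attributed to Von Neumann (1951) can be sketched as follows. This is a textbook rendering, not code from the paper; the function name and the convention of outputting the first bit of each unequal pair are choices made here for illustration.

```python
def von_neumann_extract(bits):
    """Turn i.i.d. biased bits into exactly fair bits.

    Input bits are read in non-overlapping pairs. The pairs (0, 1) and (1, 0)
    are equally likely for any bias, so the first bit of such a pair is fair;
    the pairs (0, 0) and (1, 1) are discarded. A trailing unpaired bit is ignored.
    """
    output = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            output.append(first)
    return output


print(von_neumann_extract([0, 1, 1, 0, 1, 1, 0, 0]))
```

The procedure never uses the bias of the input, which is what "universal" means in the paragraph above, at the cost of discarding part of the input.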

The above works considered the simulation of an intended long sequence from a long input sequence. In contrast, a different approach for generating random bits (randomness extraction) is to provide results for arbitrary single-letter sources and then derive results for sequences; the works of Renner (2008), Hayashi (2011) and Mojahedian et al. (2018) on randomness extraction and privacy amplification lie in this category. The tool we present in this paper generalizes the results of Renner (2008), Hayashi (2011) and Mojahedian et al. (2018); in fact, they considered the special case of simulating a random variable $A$ having a uniform distribution over a set $\mathcal{A}$ (when $A$ is uniform, simulating $A$ can be interpreted as extracting $\log|\mathcal{A}|$ bits of randomness). Furthermore, in this paper, we adopt the total variation distance as the measure of accuracy, which has a close relation with the expected payoff in games. We also use concentration inequalities to provide further refinements (Proposition 15).

The rest of this paper is organized as follows: In Section 2, we introduce the notation of this paper and present a brief discussion of Shannon and Rényi entropy. The repeated game with leaked randomness source is defined in Section 3, where we also provide our results on the convergence rate of the max-min payoff of games with a finite number of stages. In Section 3.2, we introduce our tool for simulation of a source from another source. In Section 4, we characterize the set of approximate Nash equilibria achievable in the long run. Some of the proofs are presented in the appendices.

## 2 Preliminaries

### 2.1 Notations

In this paper, we use the notation $x^n$ to represent a sequence of variables $(x_1,x_2,\dots,x_n)$. The same notation is used to represent sequences of random variables, i.e., $X^n=(X_1,X_2,\dots,X_n)$. This notation is used in the same way for sequences whose elements carry two subscripts. Calligraphic letters such as $\mathcal{X}$ represent finite sets, and $|\mathcal{X}|$ denotes the cardinality of the finite set $\mathcal{X}$. The Cartesian product of two sets $\mathcal{X}$ and $\mathcal{Y}$ is denoted by $\mathcal{X}\times\mathcal{Y}$, and $\mathcal{X}^n$ stands for the $n$-fold Cartesian product of $\mathcal{X}$. The set of natural numbers is represented by $\mathbb{N}$, and $\mathbb{R}$ denotes the set of real numbers. For a real number $x$, $\lfloor x\rfloor$ is the largest integer less than or equal to $x$, and $\lceil x\rceil$ is the smallest integer greater than or equal to $x$. Furthermore, let $f$ and $g$ be two real functions on the set of real numbers; we write $f(x)=O(g(x))$ if and only if there exists a real constant $c$ such that for all $x$, we have $|f(x)|\le c\,|g(x)|$. We use the notation $a_n=O(b_n)$ for real sequences $\{a_n\}$ and $\{b_n\}$ in the same manner.

The probability mass function (pmf) of a random variable $X$ is represented by $p_X$. When it is obvious from the context, we drop the subscript and use $p(x)$ instead of $p_X(x)$. We say that $X^n$ is drawn i.i.d. from $p(x)$ if

$$p(x^n)=\prod_{i=1}^{n}p(x_i).$$

We use $\Delta(\mathcal{X})$ to denote the probability simplex on alphabet $\mathcal{X}$, i.e., the set of all probability distributions on the finite set $\mathcal{X}$. The total variation distance between pmfs $p_X$ and $q_X$ is denoted by $\|p_X-q_X\|_{TV}$ and is defined as:

$$\|p_X-q_X\|_{TV}\triangleq\frac{1}{2}\sum_{x\in\mathcal{X}}\left|p_X(x)-q_X(x)\right|.$$
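The definition above translates directly into a few lines of code. The following is a minimal sketch in which pmfs are represented as dictionaries from symbols to probabilities; this representation and the function name are choices made here, not part of the paper.

```python
def total_variation(p, q):
    """Total variation distance: half the L1 distance between two pmfs.

    p, q: dicts mapping symbols to probabilities; missing symbols have mass 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


fair = {"a": 0.5, "b": 0.5}
biased = {"a": 0.75, "b": 0.25}
print(total_variation(fair, biased))
```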

Some of the properties of the total variation distance are summarized in the following lemma.

###### Lemma 1.

The following properties hold for the total variation distance:

Property 1:

;

Property 2:

;

Property 3:

.

### 2.2 Shannon Entropy

Let $X$ and $Y$ be two random variables with joint probability distribution $p_{XY}$ and respective marginal distributions $p_X$ and $p_Y$. The Shannon entropy of the random variable $X$ is defined to be:

$$H(X)=\sum_{x\in\mathcal{X}}-p_X(x)\log\left(p_X(x)\right),$$

where $0\log 0=0$ by continuity, and all logarithms in this paper are in base two. Since the Shannon entropy is a function of the pmf $p_X$, we sometimes write $H(p_X)$ instead of $H(X)$.

The conditional Shannon entropy of $X$ given $Y$ is defined as:

$$H(X|Y)=\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}}-p_{XY}(x,y)\log\left(p_{X|Y}(x|y)\right)=\sum_{y\in\mathcal{Y}}p_Y(y)H(X|Y=y),$$

where $H(X|Y=y)=\sum_{x\in\mathcal{X}}-p_{X|Y}(x|y)\log\left(p_{X|Y}(x|y)\right)$.

The following properties hold for the entropy function:

• For an arbitrary deterministic function $f$, we have $H(f(X))\le H(X)$.

### 2.3 Rényi Entropy

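Let $X$ and $Y$ be two random variables with joint probability distribution $p_{XY}$ and respective marginal distributions $p_X$ and $p_Y$. For arbitrary $\alpha\in(0,1)\cup(1,\infty)$, the Rényi entropy of the random variable $X$ with parameter $\alpha$ is defined as follows:

$$H_\alpha(X)=\frac{\alpha}{1-\alpha}\log\left(\Big(\sum_{x\in\mathcal{X}}p_X(x)^\alpha\Big)^{\frac{1}{\alpha}}\right)=\frac{\alpha}{1-\alpha}\log\|p_X\|_\alpha,$$

where $\|p_X\|_\alpha$ is the $\alpha$-norm of $p_X$. Since the Rényi entropy is a function of the pmf $p_X$, we sometimes write $H_\alpha(p_X)$ instead of $H_\alpha(X)$.

The conditional Rényi entropy of $X$ given $Y$ with parameter $\alpha$ is defined as:

$$H_\alpha(X|Y)=\frac{\alpha}{1-\alpha}\log\left(\sum_{y\in\mathcal{Y}}p_Y(y)\left\|p_{X|Y=y}\right\|_\alpha\right),$$

where $p_{X|Y=y}$ is the conditional distribution of $X$ given $Y=y$.

Rényi entropy is related to Shannon entropy by the following relations:

$$\lim_{\alpha\to1}H_\alpha(X)=H(X),\qquad\lim_{\alpha\to1}H_\alpha(X|Y)=H(X|Y).$$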
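The definitions and the limit relation can be checked numerically with a short sketch. The function names, the data layout (a list for a pmf, a dictionary keyed by `(x, y)` for a joint pmf) and the example distributions are assumptions made here for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a pmf given as a list of probabilities."""
    return -sum(px * math.log2(px) for px in p if px > 0)


def renyi_entropy(p, alpha):
    """H_alpha(X) = alpha/(1-alpha) * log2 ||p||_alpha, for alpha > 0, alpha != 1."""
    norm = sum(px ** alpha for px in p if px > 0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(norm)


def conditional_renyi_entropy(p_xy, alpha):
    """H_alpha(X|Y) for a joint pmf given as a dict {(x, y): probability}."""
    p_y = {}
    for (_, y), prob in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + prob
    total = 0.0
    for y, py in p_y.items():
        if py == 0:
            continue
        cond = [prob / py for (_, yy), prob in p_xy.items() if yy == y and prob > 0]
        total += py * sum(c ** alpha for c in cond) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(total)


p = [0.5, 0.25, 0.25]
for alpha in (0.5, 0.999, 1.001, 2.0):
    print(alpha, renyi_entropy(p, alpha))
print("Shannon:", shannon_entropy(p))

# X uniform on {0, 1, 2, 3} and Y = X mod 2: one bit of X remains unknown given Y.
p_xy = {(0, 0): 0.25, (2, 0): 0.25, (1, 1): 0.25, (3, 1): 0.25}
print(conditional_renyi_entropy(p_xy, 2.0))
```

The values for $\alpha$ close to 1 approach the Shannon entropy from both sides, and the Rényi entropy is non-increasing in $\alpha$ in this example.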

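Let us fix $p_X$ and consider $H_\alpha(X)$ as a function of $\alpha$. $H_\alpha(X)$ is analytic in $\alpha$, and hence differentiable of all orders. In this paper, we are interested in the values of Rényi entropy for $\alpha\in[1/2,2]$. Particularly, for $\alpha=1$ we have:

$$d_1(X)\triangleq-\frac{d}{d\alpha}H_\alpha(X)\Big|_{\alpha=1}=\frac{1}{2\log e}\left(\sum_{x\in\mathcal{X}}p(x)\left(\log p(x)\right)^2-H(X)^2\right).$$

Note that $H(X)=\sum_{x\in\mathcal{X}}p(x)\left(-\log p(x)\right)$, and the function $z\mapsto z^2$ is convex. Therefore, Jensen’s inequality implies that $d_1(X)$ is non-negative. Using the Taylor expansion, for $\alpha\in[1/2,2]$, we have:

$$H_\alpha(X)=H(X)-d_1(X)(\alpha-1)+R_X(\alpha), \tag{3}$$

where the remainder $R_X(\alpha)$ is bounded as

$$|R_X(\alpha)|\le d_2(X)(\alpha-1)^2, \tag{4}$$

where

$$d_2(X)=\frac{1}{2}\max_{1/2\le\alpha'\le2}\left|\frac{d^2H_\alpha(X)}{d\alpha^2}\Big|_{\alpha=\alpha'}\right|.$$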
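As a sanity check on the first-order coefficient, the closed form for $d_1(X)$ (with base-two logarithms, so that the prefactor is $1/(2\log e)$) can be compared with a central finite difference of $H_\alpha(X)$ at $\alpha=1$. The function names, the step size and the example pmf below are assumptions made for this sketch.

```python
import math


def renyi_entropy(p, alpha):
    """H_alpha(X) in bits for alpha > 0, alpha != 1."""
    norm = sum(px ** alpha for px in p if px > 0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(norm)


def d1_closed_form(p):
    """(1 / (2 log2 e)) * (sum_x p(x) log2(p(x))^2 - H(X)^2)."""
    h = -sum(px * math.log2(px) for px in p if px > 0)
    second_moment = sum(px * math.log2(px) ** 2 for px in p if px > 0)
    return (second_moment - h * h) / (2.0 * math.log2(math.e))


def d1_numeric(p, step=1e-4):
    """Minus the central finite-difference derivative of H_alpha at alpha = 1."""
    return -(renyi_entropy(p, 1 + step) - renyi_entropy(p, 1 - step)) / (2 * step)


p = [0.5, 0.25, 0.25]
print(d1_closed_form(p), d1_numeric(p))

# First-order Taylor approximation of Equation (3) at alpha = 1.1 (H(X) = 1.5 bits here).
approx = 1.5 - d1_closed_form(p) * (1.1 - 1.0)
print(renyi_entropy(p, 1.1), approx)
```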

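Since $d_1(X)$ and $d_2(X)$ are functions of $p_X$, we will sometimes write $d_1(p_X)$ and $d_2(p_X)$ instead, respectively. Similarly, for the conditional Rényi entropy and for $\alpha\in[1/2,2]$, we have

$$H_\alpha(X|Y)=H(X|Y)-d_1(X|Y)(\alpha-1)+R_{X|Y}(\alpha), \tag{5}$$

where $R_{X|Y}(\alpha)$ is the remainder term, and

$$d_1(X|Y)=-\frac{d}{d\alpha}H_\alpha(X|Y)\Big|_{\alpha=1}=\sum_{y\in\mathcal{Y}}p_Y(y)\,d_1\!\left(p_{X|Y=y}\right)+\frac{1}{2\log e}\left(\sum_{y\in\mathcal{Y}}p_Y(y)H(X|Y=y)^2-H(X|Y)^2\right).$$

Again, Jensen’s inequality implies that $d_1(X|Y)$ is non-negative. Moreover, the remainder is bounded as

$$|R_{X|Y}(\alpha)|\le d_2(X|Y)(\alpha-1)^2, \tag{6}$$

where

$$d_2(X|Y)=\frac{1}{2}\max_{1/2\le\alpha'\le2}\left|\frac{d^2H_\alpha(X|Y)}{d\alpha^2}\Big|_{\alpha=\alpha'}\right|.$$

A more detailed analysis of the Rényi entropy with respect to the parameter $\alpha$ can be found in (Beck and Schögl, 1995, Section 5).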

## 3 Repeated games with leaked randomness source: convergence rate

In this section, we revisit the repeated game of Gossner and Vieille (2002). Here, we focus on its general version with a leaked randomness source studied by Valizadeh and Gohari (2017), who characterized the max-min value of the repeated game when the number of stages tends to infinity. In contrast, we fix the number of stages to $n$ and investigate the rate at which the max-min value of the $n$-stage game converges to the long-run max-min value.

### 3.1 Problem statement and results

Consider an $n$-stage repeated zero-sum game between players Alice ($A$) and Bob ($B$) with respective pure action sets $\mathcal{A}$ and $\mathcal{B}$. Let $\mathcal{X}$ and $\mathcal{Y}$ be the alphabets of the randomness sources of Alice and Bob, respectively, and let $p_{XY}$ be a publicly known pmf on $\mathcal{X}\times\mathcal{Y}$. At each stage $t\in\{1,\dots,n\}$, random variables $X_t$ and $Y_t$ are drawn independently of previous drawings according to $p_{XY}$, where $X_t$ is observed by Alice and $Y_t$ is observed by Bob. Then, Alice and Bob choose respective actions $A_t\in\mathcal{A}$ and $B_t\in\mathcal{B}$. At the end of stage $t$, the players monitor the chosen actions $A_t$ and $B_t$, and Alice gets the stage payoff $u_{A_tB_t}$ from Bob. In order to choose $A_t$ and $B_t$, the players use the history of their observations until stage $t$: Alice's history consists of $(X^t,A^{t-1},B^{t-1})$, and Bob's history consists of $(Y^t,A^{t-1},B^{t-1})$. Then, $A_t=\sigma_t(X^t,A^{t-1},B^{t-1})$ and $B_t=\tau_t(Y^t,A^{t-1},B^{t-1})$, where $\sigma_t$ and $\tau_t$ are deterministic functions by which Alice and Bob map their observations into their actions at stage $t$. Notice that the mappings $\sigma_t$ and $\tau_t$ are deterministic, which means that the only sources of randomization are $X^t$ (for Alice) and $Y^t$ (for Bob). We call the $n$-tuples $\sigma^n=(\sigma_1,\dots,\sigma_n)$ and $\tau^n=(\tau_1,\dots,\tau_n)$ the strategies of Alice and Bob, respectively. The expected average payoff for Alice up to stage $n$ induced by strategies $\sigma^n$ and $\tau^n$ is denoted by $\lambda(\sigma^n,\tau^n)$:

$$\lambda(\sigma^n,\tau^n)=\mathbb{E}_{\sigma^n,\tau^n}\left[\frac{1}{n}\sum_{t=1}^{n}u_{A_tB_t}\right], \tag{7}$$

where $\mathbb{E}_{\sigma^n,\tau^n}$ denotes the expectation with respect to the distribution induced by i.i.d. repetitions of $p_{XY}$ and strategies $\sigma^n$ and $\tau^n$. Alice wishes to maximize $\lambda(\sigma^n,\tau^n)$ and Bob’s goal is to minimize it.

We will refer to the above game as “the repeated game with leaked randomness source”. Another variant of this game, called “the repeated game with non-causal leaked randomness source”, is defined in the following remark.

###### Remark 2.

In the definition of the repeated game with leaked randomness source, we assumed that the randomness sources $X^n$ and $Y^n$ are revealed to Alice and Bob causally as the game is played out. However, we can also consider the non-causal case in which the sources $X^n$ and $Y^n$ are observed by Alice and Bob (respectively) before the game starts. In this case we have $A_t=\sigma_t(X^n,A^{t-1},B^{t-1})$ and $B_t=\tau_t(Y^n,A^{t-1},B^{t-1})$. In order to distinguish the above two cases, we name the non-causal game “the repeated game with non-causal leaked randomness source”.

###### Definition 3.

Let $v$ be an arbitrary real value:

• Alice can secure $v$ in the $n$-stage repeated game if there exists a strategy $\sigma^n$ for Alice such that for all strategies $\tau^n$ of Bob we have $\lambda(\sigma^n,\tau^n)\ge v$. The maximum of the set of payoffs that Alice can secure in the $n$-stage repeated game is called the max-min value of the $n$-stage game.

• Alice can secure $v$ in the long run if there exists a sequence of strategies $\{\sigma^n\}$ for Alice such that for all sequences of strategies $\{\tau^n\}$ of Bob we have $\liminf_{n\to\infty}\lambda(\sigma^n,\tau^n)\ge v$. The supremum of the set of payoffs that Alice can secure in the long run is called the long-run max-min value of the game.

The set of all payoffs that can be secured in long run in the repeated game with leaked randomness source is characterized by Valizadeh and Gohari (2017) and restated here as Theorem 5. Before presenting Theorem 5, we need the following definition.

###### Definition 4.

In a stage game, the security level of mixed action $p_A\in\Delta(\mathcal{A})$ for Alice is denoted by $U^{(A)}(p_A)$, and is defined as follows:

$$U^{(A)}(p_A)=\min_{b\in\mathcal{B}}\sum_{a\in\mathcal{A}}p_A(a)u_{ab}. \tag{8}$$

Furthermore, the maximum payoff that Alice can secure in a stage game, by playing mixed actions of entropy at most $h$, is denoted by $J^{(A)}(h)$, and is defined as:

$$J^{(A)}(h)=\max_{p_A\in\Delta(\mathcal{A}),\,H(p_A)\le h}U^{(A)}(p_A). \tag{9}$$

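Equations (8) and (9) can be made concrete with a small sketch for a game in which Alice has two pure actions. The payoff matrix (a matching-pennies variant), the grid search and all function names are assumptions made here for illustration; the sketch approximates $J^{(A)}(h)$ itself and does not compute its upper concave envelope.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a pmf given as a tuple of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def security_level(p_a, u):
    """Equation (8): worst case over Bob's pure actions b of sum_a p_A(a) * u[a][b]."""
    return min(
        sum(p_a[a] * u[a][b] for a in range(len(p_a)))
        for b in range(len(u[0]))
    )


def j_value(h, u, grid=1000):
    """Grid-search approximation of Equation (9) when Alice has two actions."""
    best = -math.inf
    for k in range(grid + 1):
        p_a = (k / grid, 1 - k / grid)
        if shannon_entropy(p_a) <= h + 1e-12:  # entropy budget constraint
            best = max(best, security_level(p_a, u))
    return best


# Hypothetical stage game: Alice receives 1 if the actions match and 0 otherwise.
u = [[1, 0], [0, 1]]
for h in (0.0, 0.5, 1.0):
    print(h, j_value(h, u))
```

With no entropy budget Alice must play a pure action and secures nothing in this game, while one full bit per stage lets her play the uniform mixed action.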
###### Theorem 5 (Valizadeh and Gohari (2017)).

Let $J^{(A)}_{cav}$ be the upper concave envelope of $J^{(A)}$ defined in Definition 4. In the repeated game with leaked randomness source, Alice can secure $v$ in the long run if and only if $v\le J^{(A)}_{cav}(H(X|Y))$. Furthermore, in the $n$-stage game, Alice can secure $v$ only if $v\le J^{(A)}_{cav}(H(X|Y))$.

Theorem 5 implies that the long-run max-min value of the repeated game with leaked randomness source is $J^{(A)}_{cav}(H(X|Y))$. Moreover, the max-min value of the $n$-stage game is at most $J^{(A)}_{cav}(H(X|Y))$. In the following theorems we discuss how the max-min value of the $n$-stage game converges to $J^{(A)}_{cav}(H(X|Y))$ as $n$ increases.

###### Theorem 6.

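In the repeated game with leaked randomness source, there exist real numbers $\mu$, $\beta$, and $\gamma$, such that the following property holds: for arbitrary sequences $\{f_n\}$, $\{g_n\}$, and $\{h_n\}$ satisfying , , and , one can find a sequence of strategies $\{\sigma^n\}$ such that for all sequences of strategies $\{\tau^n\}$ of Bob and for all $n$ we have

$$\lambda(\sigma^n,\tau^n)\ge J^{(A)}_{cav}(H(X|Y))-\mu\left(\frac{1}{n}+\frac{1}{f_n}+\frac{f_n}{n}+g_n+2^{-\frac{1}{2}\left(\frac{n}{f_n}-1\right)h_n\left(\beta g_n-\gamma h_n\right)}\right). \tag{10}$$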

We give an intuitive description of the terms in Equation (10) in Discussion 8 below. The formal proof of Theorem 6 is presented in Section 3.3.

###### Corollary 7.

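In the repeated game with leaked randomness source, for each $n$, let $v_n$ denote the max-min value of the $n$-stage game. Then $v_n$ converges to $J^{(A)}_{cav}(H(X|Y))$ with a rate of at least $O\left(\frac{\sqrt{\log n}}{\sqrt[4]{n}}\right)$. To see this, let $\mu$, $\beta$, and $\gamma$ be the values in the statement of Theorem 6, and let $r$ be an arbitrary positive number such that and . Define , , and . Then, Theorem 6 implies that there exists a sequence of strategies $\{\sigma^n\}$ such that for all sequences of strategies $\{\tau^n\}$ of Bob, and for all $n$, we have

$$\lambda(\sigma^n,\tau^n)\ge J^{(A)}_{cav}(H(X|Y))-O(g_n)=J^{(A)}_{cav}(H(X|Y))-O\left(\frac{\sqrt{\log n}}{\sqrt[4]{n}}\right).$$

To see this, observe that the remaining non-exponential terms in Equation (10) decay faster than $g_n$, and

$$2^{-\frac{1}{2}\left(\frac{n}{f_n}-1\right)h_n\left(\beta g_n-\gamma h_n\right)}=O\left(2^{-\frac{\sqrt{n}\,g_n^2}{2r^2}}\right)=O\left(\frac{1}{\sqrt{n}}\right).$$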
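The last equality can be checked numerically under one assumption that is not stated in the surviving text: that $g_n=r\sqrt{\log n}/\sqrt[4]{n}$, which is the choice consistent with the displayed $O(g_n)$ rate. The block count `f_n = ceil(sqrt(n))` used below is likewise a hypothetical choice made only to illustrate that the non-exponential terms of Equation (10) can be of smaller order than $g_n$.

```python
import math


def g(n, r):
    """Assumed form of g_n (hypothetical): r * sqrt(log2 n) / n^(1/4)."""
    return r * math.sqrt(math.log2(n)) / n ** 0.25


def exponential_term(n, r):
    """2^(-sqrt(n) * g_n^2 / (2 r^2)), the middle expression in the display above."""
    return 2.0 ** (-(math.sqrt(n) * g(n, r) ** 2) / (2.0 * r ** 2))


def block_terms(n):
    """1/n + 1/f_n + f_n/n with the hypothetical choice f_n = ceil(sqrt(n))."""
    f_n = math.ceil(math.sqrt(n))
    return 1.0 / n + 1.0 / f_n + f_n / n


for n in (16, 10**4, 10**6):
    print(n, exponential_term(n, r=3.0), 1.0 / math.sqrt(n), block_terms(n), g(n, 1.0))
```

Under this assumed $g_n$, the exponent equals $\tfrac12\log n$, so the exponential term is exactly $1/\sqrt{n}$ regardless of $r$.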
###### Discussion 8.

We explain Equation (10) at an intuitive level. To generate the strategies of Theorem 6, we divide the total $n$ stages almost uniformly into $f_n$ blocks such that the actions of each block (besides the first block) are generated as a function of the randomness source observed during the previous block, and in all stages of the first block, an arbitrary action is played. Therefore, some payoff is lost during the first block; the term $\frac{1}{f_n}$ in Equation (10) corresponds to this loss. On the other hand, by dividing the total $n$ stages into $f_n$ blocks we get blocks of length at least $\frac{n}{f_n}-1$. This affects the precision of the simulation of the intended distribution of actions from the randomness source observed in the previous block, which is reflected in the term

$$2^{-\frac{1}{2}\left(\frac{n}{f_n}-1\right)h_n\left(\beta g_n-\gamma h_n\right)}. \tag{11}$$

This expression should be compared with (2), where the exponent of the simulation error is expressed as the product of three terms: the block length, a term $1-\frac{1}{\alpha}$, and the entropy difference $H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)$. The term $\frac{n}{f_n}-1$ appears in Equation (11) as the block length (the length of each of the blocks is at least $\frac{n}{f_n}-1$). The sequence $h_n$ is a proxy for the term $1-\frac{1}{\alpha}$. Finally, considering the last term $\beta g_n-\gamma h_n$, we see that a larger entropy difference yields better simulation performance. On the other hand, requiring a large entropy difference restricts the set of action distributions and results in a payoff loss. The sequence $g_n$ is responsible for this trade-off. A larger $g_n$ results in more loss in payoff (the term $g_n$ in Equation 10) but a more accurate simulation (the term $\beta g_n$ in the exponent of the exponential term in Equation 10).

Next, consider the repeated game with non-causal leaked randomness source (see Remark 2), where the players observe the whole sequence of their corresponding randomness sources before the game starts. We claim the following result:

###### Theorem 9.

In the repeated game with non-causal leaked randomness source (as described in Remark 2), there exist real numbers $\mu$, $\beta$, and $\gamma$ with the following property: for arbitrary sequences of positive numbers $\{g_n\}$ and $\{h_n\}$ satisfying and , there exists a sequence of strategies $\{\sigma^n\}$ such that for all sequences of strategies $\{\tau^n\}$ of Bob and for all $n$ we have

$$\lambda(\sigma^n,\tau^n)\ge J^{(A)}_{cav}(H(X|Y))-\mu\left(\frac{1}{n}+g_n+2^{-\frac{1}{2}nh_n\left(\beta g_n-\gamma h_n\right)}\right). \tag{12}$$

Proof of Theorem 9 is given in Section 3.4.

###### Corollary 10.

In the repeated game with non-causal leaked randomness source, for each $n$, let $v_n$ denote the max-min value of the $n$-stage game. Then $v_n$ converges to $J^{(A)}_{cav}(H(X|Y))$ with a rate of at least $O\left(\frac{\sqrt{\log n}}{\sqrt{n}}\right)$. To see this, let $\mu$, $\beta$, and $\gamma$ be the values in the statement of Theorem 9, and let $r$ be an arbitrary positive number such that and . Define , and . Then, using similar calculations as in Corollary 7, Theorem 9 implies that there exists a sequence of strategies $\{\sigma^n\}$ such that for all sequences of strategies $\{\tau^n\}$ of Bob, and for all $n$, we have

$$\lambda(\sigma^n,\tau^n)\ge J^{(A)}_{cav}(H(X|Y))-O\left(\frac{\sqrt{\log n}}{\sqrt{n}}\right).$$

Theorem 6 and Theorem 9 provide a convergence rate for general games. However, in some special cases we can derive faster convergence rates for the max-min value of the game. The following theorem provides a special case in which an exponential convergence is obtained.

###### Theorem 11.

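Let $q_A$ be an equilibrium strategy for Alice in the one stage game, i.e.,

$$q_A\in\arg\max_{p_A\in\Delta(\mathcal{A})}\ \min_{b\in\mathcal{B}}\sum_{a\in\mathcal{A}}p_A(a)u_{ab}.$$

If , then, in the repeated game with non-causal leaked randomness source, there exist real numbers $\beta$ and $\gamma$, and a sequence of strategies $\{\sigma^n\}$ such that for all sequences of strategies $\{\tau^n\}$ of Bob and for all $n$, we have

$$\lambda(\sigma^n,\tau^n)\ge J^{(A)}_{cav}(H(X|Y))-\gamma2^{-\beta n}. \tag{13}$$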

The proof of Theorem 11 is provided in Section 3.5.

### 3.2 A technical tool: simulation of a source from another source

To prove the results of Section 3.1, we need a technical tool provided in this section. Here, we study the simulation of a desired single-letter source $A$ from a given single-letter source $X$. We assume that $X$ is correlated with a side information $Y$, and we would like the generated source to be almost independent of the side information $Y$. More precisely, we have the following definition:

###### Definition 12.

Let $(X,Y)$ be distributed according to $p_{XY}$, and $A$ be distributed according to $p_A$. We say that the deterministic mapping $f:\mathcal{X}\to\mathcal{A}$ simulates $A$ from $X$ with precision $\epsilon$ if we have

$$\left\|p_{f(X)Y}-p_Ap_Y\right\|_{TV}\le\epsilon,$$

where $p_{f(X)Y}$ is the joint distribution of $f(X)$ and $Y$.

According to the above definition, we are interested in a deterministic mapping that simulates $A$ from $X$. However, we utilize the probabilistic method and random mappings as a tool to ultimately prove the existence of a suitable deterministic mapping. Therefore, we now define a random mapping and proceed by proving some properties for it. These properties will then lead to the construction of the desired deterministic mapping.

To specify a deterministic mapping $f:\mathcal{X}\to\mathcal{A}$, we need to specify the value of $f(x)$ for all $x\in\mathcal{X}$. To specify a random mapping $F:\mathcal{X}\to\mathcal{A}$, we need to specify the joint distribution of the random variables $F(x)$ for $x\in\mathcal{X}$.

###### Definition 13.

$F:\mathcal{X}\to\mathcal{A}$ is a random mapping constructed as follows: assume that $F(x)$ for different values of $x\in\mathcal{X}$ are i.i.d. according to $p_A$. In other words, given a string of symbols $a_x\in\mathcal{A}$ for all $x\in\mathcal{X}$,

$$\Pr\left[F(x)=a_x,\ \forall x\in\mathcal{X}\right]=\prod_{x\in\mathcal{X}}\Pr\left[F(x)=a_x\right]=\prod_{x\in\mathcal{X}}p_A(a_x).$$

The above construction of the random mapping defines a probability measure $p_F$ on the set of all mappings $f:\mathcal{X}\to\mathcal{A}$, which is denoted by $\mathcal{F}$.
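The construction in Definition 13 is straightforward to sample. The sketch below draws one realization of the random mapping and evaluates the probability that the construction assigns to a given deterministic mapping; the function names, the dictionary representation and the example alphabet are assumptions made here.

```python
import random


def sample_random_mapping(x_alphabet, p_a, rng):
    """Draw F(x) independently from p_A for every x in the source alphabet."""
    symbols = list(p_a)
    weights = [p_a[a] for a in symbols]
    return {x: rng.choices(symbols, weights=weights, k=1)[0] for x in x_alphabet}


def mapping_probability(f, p_a):
    """Probability of the deterministic mapping f: the product of p_A(f(x)) over x."""
    prob = 1.0
    for a in f.values():
        prob *= p_a[a]
    return prob


rng = random.Random(0)  # fixed seed so the sketch is reproducible
p_a = {"heads": 0.5, "tails": 0.5}
f = sample_random_mapping(range(4), p_a, rng)
print(f, mapping_probability(f, p_a))
```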

###### Lemma 14.

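Let $(X,Y)$ be distributed according to $p_{XY}$ and $A$ according to $p_A$. Furthermore, let $F$ be the random mapping defined in Definition 13. Then,

$$\sum_{f\in\mathcal{F}}p_F(f)\left\|p_{f(X)Y}-p_Ap_Y\right\|_{TV}\le\min_{\alpha\in[1,2]}\left(2^{-\left(1-\frac{1}{\alpha}\right)\left(H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)+2\right)}\right), \tag{14}$$

where $p_{f(X)Y}$ is the joint distribution of $f(X)$ and $Y$. Consequently, there exists a deterministic mapping $f$ such that for all $\alpha\in[1,2]$, we have

$$\left\|p_{f(X)Y}-p_Ap_Y\right\|_{TV}\le2^{-\left(1-\frac{1}{\alpha}\right)\left(H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)+2\right)}. \tag{15}$$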

Proof of Lemma 14 is provided in Appendix A.
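For very small alphabets, the left-hand side of the averaged bound in Lemma 14 can be computed exactly by enumerating all mappings, and the right-hand side can be approximated on a grid of $\alpha$ values. The sketch below does this for a toy example; the example distributions, the grid resolution and all function names are assumptions made here, and the bound is evaluated as read from the displayed formula.

```python
import itertools
import math


def renyi(p, alpha):
    """H_alpha in bits of a pmf given as a list (alpha > 0, alpha != 1)."""
    norm = sum(q ** alpha for q in p if q > 0) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(norm)


def cond_renyi(p_xy, alpha):
    """H_alpha(X|Y) in bits for a joint pmf {(x, y): probability}."""
    p_y = {}
    for (_, y), q in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + q
    total = 0.0
    for y, py in p_y.items():
        cond = [q / py for (_, yy), q in p_xy.items() if yy == y and q > 0]
        total += py * sum(c ** alpha for c in cond) ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(total)


def tv_to_product(f, p_xy, p_a):
    """|| p_{f(X)Y} - p_A p_Y ||_TV for a mapping f given as {x: index into p_a}."""
    p_y, joint = {}, {}
    for (x, y), q in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + q
        joint[(f[x], y)] = joint.get((f[x], y), 0.0) + q
    return 0.5 * sum(
        abs(joint.get((a, y), 0.0) - p_a[a] * py)
        for a in range(len(p_a))
        for y, py in p_y.items()
    )


def expected_tv(p_xy, p_a):
    """Exact average of the distance over the random mapping of Definition 13."""
    xs = sorted({x for (x, _) in p_xy})
    total = 0.0
    for values in itertools.product(range(len(p_a)), repeat=len(xs)):
        prob = math.prod(p_a[a] for a in values)
        total += prob * tv_to_product(dict(zip(xs, values)), p_xy, p_a)
    return total


def averaged_bound(p_xy, p_a, steps=200):
    """Grid approximation of the minimum over alpha in [1, 2]; the value at alpha = 1 is 1."""
    best = 1.0
    for k in range(1, steps + 1):
        alpha = 1.0 + k / steps
        gap = cond_renyi(p_xy, alpha) - renyi(p_a, 1.0 / alpha)
        best = min(best, 2.0 ** (-(1.0 - 1.0 / alpha) * (gap + 2.0)))
    return best


# Toy example: X uniform on {0, 1, 2, 3}, Y = X mod 2, and a fair-bit target A.
p_xy = {(0, 0): 0.25, (2, 0): 0.25, (1, 1): 0.25, (3, 1): 0.25}
p_a = [0.5, 0.5]
avg = expected_tv(p_xy, p_a)
bound = averaged_bound(p_xy, p_a)
print(avg, bound)
```

Because the average over mappings is at most the bound, at least one deterministic mapping achieves it; in this toy example the mapping $x\mapsto\lfloor x/2\rfloor$ even attains distance zero.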

While the above inequality ensures the existence of a deterministic mapping $f$ for which (15) holds, it does not provide an explicit mapping $f$. An explicit construction is desirable from an algorithmic perspective. In the following, we address this issue by showing that a randomly chosen mapping almost satisfies (15) with very high probability.

Let $D_{TV}=\left\|p_{F(X)Y}-p_Ap_Y\right\|_{TV}$. The quantity $D_{TV}$ is random because $F$ is random. Thus, the random variable $D_{TV}$ is a function of the random variable $F$, i.e., $D_{TV}$ takes the value $\left\|p_{f(X)Y}-p_Ap_Y\right\|_{TV}$ with probability $p_F(f)$. Hence, Lemma 14 implies that for all $\alpha\in[1,2]$,

$$\mathbb{E}\left[D_{TV}\right]\le2^{-\left(1-\frac{1}{\alpha}\right)\left(H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)+2\right)}.$$

We claim the following bound on how $D_{TV}$ concentrates around its expected value.

###### Proposition 15.

For the random variable $D_{TV}$, we have

$$\Pr\left[\left|D_{TV}-\mathbb{E}\left[D_{TV}\right]\right|>t\right]\le2e^{-2t^2\,2^{H_2(X)}}.$$

Proof of Proposition 15 is presented in Appendix B.

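One application of Proposition 15 is for simulation of i.i.d. sequences. Let $(X^n,Y^n)$ be i.i.d. according to $p_{XY}$, and let $A^n$ be i.i.d. according to $p_A$. Assume that $H(A)<H(X|Y)$ so that simulation of $A^n$ with arbitrary precision is possible. Let $F$ be the random mapping of Definition 13, where $(X,A)$ is replaced with $(X^n,A^n)$. Let us choose $\alpha\in(1,2]$ such that $H_\alpha(X|Y)>H_{\frac{1}{\alpha}}(A)$ (note that such a real number exists since $H(A)<H(X|Y)$, and Rényi entropy converges to Shannon entropy as $\alpha$ tends to $1$). Let $\epsilon$ be a positive number such that

$$\epsilon\le\left(1-\frac{1}{\alpha}\right)\left(H_\alpha(X|Y)-H_{\frac{1}{\alpha}}(A)\right),\qquad\epsilon<\frac{1}{2}H_2(X).$$

Then, Lemma 14 implies

$$\mathbb{E}\left[D_{TV}\right]\le2^{-\epsilon n}, \tag{16}$$

where $D_{TV}=\left\|p_{F(X^n)Y^n}-p_{A^n}p_{Y^n}\right\|_{TV}$. Furthermore, from Proposition 15, for $t=2^{-\epsilon n}$, we have

$$\Pr\left[\left|D_{TV}-\mathbb{E}\left[D_{TV}\right]\right|>2^{-\epsilon n}\right]\le2e^{-2^{\left(H_2(X)-2\epsilon\right)n}}.$$

The above equation, along with Equation (16) and the definition $\delta=H_2(X)-2\epsilon$, implies

$$\Pr\left[D_{TV}\ge2\times2^{-\epsilon n}\right]\le2e^{-2^{\delta n}}.$$

In other words, the outcome of the random mapping $F$, with probability at least $1-2e^{-2^{\delta n}}$ (converging double exponentially to $1$), will simulate $A^n$ with precision at most $2\times2^{-\epsilon n}$ (decaying exponentially in $n$).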

### 3.3 Proof of Theorem 6

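Let us divide the total stages, $n$, into $f_n$ blocks, where $\{f_n\}$ is the arbitrary sequence of natural numbers in the statement of the theorem. Let $k_n$ be the remainder of $n$ divided by $f_n$, i.e., $k_n=n-f_n\lfloor n/f_n\rfloor$. Then, the number of stages in each block, $N_{n,i}$, is computed as follows:

$$N_{n,i}=\begin{cases}\lfloor n/f_n\rfloor+1, & i=1,\dots,k_n,\\ \lfloor n/f_n\rfloor, & i=k_n+1,\dots,f_n.\end{cases} \tag{17}$$

In other words, first, all $f_n$ blocks get $\lfloor n/f_n\rfloor$ stages, then, the remaining $k_n$ stages are assigned to the first $k_n$ blocks.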
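The block partition of Equation (17) can be written as a short helper. The function name is hypothetical; the computation follows the description above.

```python
def block_lengths(n, f_n):
    """Split n stages into f_n blocks as in Equation (17).

    Every block gets floor(n / f_n) stages, and the k_n = n mod f_n leftover
    stages are assigned one each to the first k_n blocks.
    """
    base, k_n = divmod(n, f_n)
    return [base + 1] * k_n + [base] * (f_n - k_n)


lengths = block_lengths(10, 3)
print(lengths, sum(lengths))
```

Every block has length at least $n/f_n-1$, which is the block length that enters the exponent in Equation (10).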

Let and denote the sequence of actions played in block by Alice and Bob, respectively. Similarly, let