
# Strong Asymptotic Composition Theorems for Sibson Mutual Information

We characterize the growth of the Sibson mutual information, of any order that is at least unity, between a random variable and an increasing set of noisy, conditionally independent observations of the random variable. The Sibson mutual information increases to an order-dependent limit exponentially fast, with an exponent that is order-independent. The result is contrasted with composition theorems in differential privacy.


## 1 Introduction

In the context of information leakage, composition theorems characterize how leakage increases as a result of multiple, independent, noisy observations of the sensitive data. Equivalently, they characterize how security (or privacy) degrades under the “composition” of multiple observations (or queries). In practice, attacks are often sequential in nature, whether the application is side channels in computer security [8, 15, 16] or database privacy [7, 2, 10]. Thus composition theorems are practically useful. They also raise theoretical questions that are interesting in their own right.

Various composition theorems for differential privacy and its variants have been established [7, 2, 10]. For the information-theoretic metrics of mutual information and maximal leakage [6, 3, 5, 4] (throughout we assume discrete alphabets and base-2 logarithms)

$$I(X;Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)} \tag{1}$$

$$L(X \to Y) = \log \sum_{y} \max_{x : P(x) > 0} P(y|x) \tag{2}$$

and $\alpha$-maximal leakage [9], less is known. While similar theorems have been studied in the case that the distribution of the observations given $X$ is not known [13], we assume it is known. For the metrics in (1)-(2) it is straightforward to show the “weak” composition theorem that if $Y_1, \ldots, Y_n$ are conditionally independent given $X$, then

$$I(X;Y^n) \le \sum_{i=1}^{n} I(X;Y_i), \qquad L(X \to Y^n) \le \sum_{i=1}^{n} L(X \to Y_i).$$

These bounds are indeed weak in that if $Y_1, Y_2, \ldots$ are conditionally i.i.d. given $X$, then as $n \to \infty$, the right-hand sides tend to infinity while the left-hand sides remain bounded. A “strong” (asymptotic) composition theorem would identify the limit and characterize the speed of convergence.
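The gap between the weak bound and the true leakage can be seen numerically. The following is a minimal sketch, assuming binary observations that are i.i.d. Bernoulli given $X$; the function name `mi_n_looks`, the prior `px`, and the flip probabilities `qs` are hypothetical choices made for illustration and do not come from the paper.

```python
import math

def mi_n_looks(px, qs, n):
    """I(X; Y^n) in bits when Y_1..Y_n are i.i.d. Bernoulli(qs[x]) given X = x."""
    total = 0.0
    for k in range(n + 1):  # the number of ones k is a sufficient statistic
        # col[x] = probability of seeing exactly k ones given X = x
        col = [math.comb(n, k) * q**k * (1 - q) ** (n - k) for q in qs]
        marg = sum(p * c for p, c in zip(px, col))
        total += sum(p * c * math.log2(c / marg)
                     for p, c in zip(px, col) if p > 0 and c > 0)
    return total

px, qs = [0.5, 0.5], [0.1, 0.9]  # hypothetical prior and Bernoulli parameters
single = mi_n_looks(px, qs, 1)
for n in (1, 5, 20, 50):
    # columns: n, exact I(X;Y^n), weak bound n * I(X;Y_1)
    print(n, round(mi_n_looks(px, qs, n), 6), round(n * single, 6))
```

The second column can never exceed $H(X) = 1$ bit for this prior, while the third column grows linearly in $n$.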

We prove such a result for both mutual information and maximal leakage. The limits are readily identified as the entropy and log-support size, respectively, of the minimal sufficient statistic of $Y$ given $X$. In both cases, the speed of convergence to the limit is exponential, and the exponent turns out to be the same. Specifically, it is the minimum Chernoff information among all pairs of conditional distributions of $Y$ given $X = x$ and given $X = x'$, where $x$ and $x'$ are distinct realizations of $X$.

Mutual information and maximal leakage are both instances of Sibson mutual information [12, 14, 4], the former being order $1$ and the latter being order $\infty$. The striking fact that the exponents governing the convergence to the limit are the same at these two extreme points suggests that Sibson mutual information of all orders satisfies a strong asymptotic composition theorem, with the convergence rate (but not the limit) being independent of the order. We show that this is indeed the case.

The composition theorems proven here are different in nature from those in the differential privacy literature. Here we assume that the relevant probability distributions are known, and characterize the growth of leakage with repeated looks in terms of those distributions. We also assume that $Y_1, \ldots, Y_n$ are conditionally i.i.d. given $X$. Composition theorems in differential privacy consider the worst-case distributions given leakage levels for each of $Y_1, \ldots, Y_n$ individually, assuming only conditional independence.

Although our motivation is averaging attacks in side channels, the results may have some use in capacity studies of channels with multiple conditionally i.i.d. outputs given the input [1, Prob. 7.20].

## 2 Sibson, Rényi, and Chernoff

The central quantity of this study is the Sibson mutual information.

###### Definition 1 ([12, 14]).

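The Sibson mutual information of order $\alpha$ between random variables $X$ and $Y$ is defined by

$$I^S_\alpha(X;Y) = \frac{\alpha}{\alpha-1} \log \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P(x) P(y|x)^\alpha \right)^{1/\alpha} \tag{3}$$

for $\alpha \in (0,1) \cup (1,\infty)$, and for $\alpha = 1$ and $\alpha = \infty$ by its continuous extensions. These are

$$I^S_1(X;Y) = I(X;Y), \qquad I^S_\infty(X;Y) = L(X \to Y),$$

defined in (1)-(2) above.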
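As an illustration of Definition 1, the sketch below evaluates (3) directly and handles the two endpoint orders through (1) and (2). The function name `sibson_mi` and the channel-as-nested-list layout are assumptions made for this example only.

```python
import math

def sibson_mi(px, channel, alpha):
    """Sibson mutual information of order alpha, in bits.

    px[x] is P(x); channel[x][y] is P(y|x). alpha may be 1 or math.inf.
    """
    nx, ny = len(px), len(channel[0])
    if alpha == 1:  # Shannon mutual information, eq. (1)
        py = [sum(px[x] * channel[x][y] for x in range(nx)) for y in range(ny)]
        return sum(px[x] * channel[x][y] * math.log2(channel[x][y] / py[y])
                   for x in range(nx) for y in range(ny)
                   if px[x] > 0 and channel[x][y] > 0)
    if alpha == math.inf:  # maximal leakage, eq. (2)
        return math.log2(sum(max(channel[x][y] for x in range(nx) if px[x] > 0)
                             for y in range(ny)))
    # generic order, eq. (3)
    outer = sum(sum(px[x] * channel[x][y] ** alpha for x in range(nx)) ** (1 / alpha)
                for y in range(ny))
    return alpha / (alpha - 1) * math.log2(outer)

# Hypothetical example: binary symmetric channel with crossover 0.1, uniform input.
px = [0.5, 0.5]
bsc = [[0.9, 0.1], [0.1, 0.9]]
for a in (1, 1.5, 2, 10, math.inf):
    print(a, sibson_mi(px, bsc, a))
```

Evaluating the generic branch at orders close to 1 or at very large orders gives values close to the two endpoint branches, which is what "continuous extension" means here.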

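We are interested in how $I^S_\alpha(X;Y^n)$ grows with $n$ when $Y_1, \ldots, Y_n$ are conditionally i.i.d. given $X$, for $\alpha \in [1,\infty]$. The question for $\alpha < 1$ is meaningful but is not considered here. For $\alpha \in [1,\infty]$, we shall see that the limit is given by a Rényi entropy.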

###### Definition 2.

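The Rényi entropy of order $\alpha$ of a random variable $X$ is given by:

$$H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} P(x)^\alpha \tag{4}$$

for $\alpha \in (0,1) \cup (1,\infty)$, and for $\alpha = 0$ and $\alpha = 1$ by its continuous extensions. These are

$$H_0(X) = \log |\{x : P(x) > 0\}| \tag{5}$$

$$H_1(X) = H(X), \tag{6}$$

where $H(X)$ is the regular Shannon entropy.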
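A short sketch of Definition 2, with the two extension orders handled as special cases. The function name `renyi_entropy` and the example distribution are hypothetical.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (bits) of the pmf p, per eqs. (4)-(6)."""
    support = [v for v in p if v > 0]
    if alpha == 0:  # log of the support size, eq. (5)
        return math.log2(len(support))
    if alpha == 1:  # Shannon entropy, eq. (6)
        return -sum(v * math.log2(v) for v in support)
    return math.log2(sum(v ** alpha for v in support)) / (1 - alpha)

p = [0.5, 0.25, 0.25, 0.0]  # hypothetical pmf with one zero-probability symbol
for order in (0, 0.5, 1, 2):
    print(order, renyi_entropy(p, order))
```

Orders of the form $1/\alpha$ with $\alpha \in [1,\infty]$ sweep from the Shannon entropy (order 1) down to the log-support size (order 0); in Python, `1 / math.inf` evaluates to `0.0`, so the order-0 branch is reached for $\alpha = \infty$.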

The speed of convergence of $I^S_\alpha(X;Y^n)$ to its limit will turn out to be governed by a Chernoff information.

###### Definition 3 ([1]).

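The Chernoff information between two probability mass functions, $P_1$ and $P_2$, over the same alphabet $\mathcal{X}$ is given as follows. First, for all $x \in \mathcal{X}$ and $\lambda \in [0,1]$, let:

$$P_\lambda(x) = P_\lambda(P_1, P_2, x) = \frac{P_1(x)^\lambda P_2(x)^{1-\lambda}}{\sum_{x' \in \mathcal{X}} P_1(x')^\lambda P_2(x')^{1-\lambda}} \tag{7}$$

Then, the Chernoff information is given by:

$$C(P_1 \| P_2) = D(P_{\lambda^*} \| P_1) = D(P_{\lambda^*} \| P_2) \tag{8}$$

where $\lambda^*$ is the value of $\lambda$ such that the above two relative entropies are equal.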
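Definition 3 can be evaluated numerically by searching for the balancing value of $\lambda$. The sketch below uses bisection and assumes both distributions are strictly positive on a common finite alphabet; the function names `kl`, `tilted`, and `chernoff_information` are hypothetical, and the bisection is one possible method rather than anything prescribed by the paper.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def tilted(p1, p2, lam):
    """The distribution P_lambda of eq. (7)."""
    w = [a ** lam * b ** (1 - lam) for a, b in zip(p1, p2)]
    z = sum(w)
    return [v / z for v in w]

def chernoff_information(p1, p2, iters=100):
    """Bisection on lambda until D(P_lam || p1) = D(P_lam || p2), eq. (8).

    Assumes p1 and p2 are strictly positive on the same alphabet.
    """
    lo, hi = 0.0, 1.0  # lambda = 0 gives p2, lambda = 1 gives p1
    for _ in range(iters):
        mid = (lo + hi) / 2
        p = tilted(p1, p2, mid)
        if kl(p, p1) > kl(p, p2):
            lo = mid  # P_lam is still closer to p2, so increase lambda
        else:
            hi = mid
    return kl(tilted(p1, p2, (lo + hi) / 2), p1)

print(chernoff_information([0.8, 0.2], [0.2, 0.8]))
```

For this symmetric pair the balancing point is $\lambda^* = 1/2$ and $P_{\lambda^*}$ is uniform, which gives a quick sanity check of the routine.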

## 3 Main Result

Let $X$ be a random variable with finite alphabet $\mathcal{X}$. Let $Y^n = (Y_1, \ldots, Y_n)$ be a vector of discrete random variables with a shared alphabet $\mathcal{Y}$. We assume that $Y_1, \ldots, Y_n$ are conditionally i.i.d. given $X$. We assume, without loss of generality, that $X$ and $Y$ have full support. We may also assume, without loss of generality, that the conditional distributions of $Y$ given $X = x$ are unique over $x \in \mathcal{X}$, which we call the unique row assumption. For if this is not the case, we can divide $\mathcal{X}$ into equivalence classes based on their respective conditional distributions and define $\tilde{X}$ to be the equivalence class of $X$. Then both Markov chains $X - \tilde{X} - Y^n$ and $\tilde{X} - X - Y^n$ hold, so

$$I^S_\alpha(X;Y^n) = I^S_\alpha(\tilde{X};Y^n)$$

by the data processing inequality for Sibson mutual information [11]. We may then work with $\tilde{X}$ in place of $X$. Thus the unique row assumption is without loss of generality.

Note that, again by the data processing inequality, we have

$$I^S_\alpha(X;Y^n) \le I^S_\alpha(X;X) = H_{1/\alpha}(X)$$

for all $\alpha \in [1,\infty]$ and all $n$. Our main result is the following.

###### Theorem 1.

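Under the unique row assumption,

$$\lim_{n \to \infty} I^S_\alpha(X;Y^n) = H_{1/\alpha}(X) \tag{9}$$

for any $\alpha \in [1,\infty]$, and the speed of convergence is independent of $\alpha$ in the sense that for all $\alpha \in [1,\infty]$,

$$\lim_{n \to \infty} -\frac{1}{n} \log \left( H_{1/\alpha}(X) - I^S_\alpha(X;Y^n) \right) = \min_{x \ne x'} C(Q_x \| Q_{x'}),$$

where $Q_x$ denotes the conditional distribution of $Y$ given $X = x$.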
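The statement can be explored numerically on a toy example. The sketch below is an illustration under stated assumptions, not part of the proof: observations are binary and i.i.d. Bernoulli given $X$, so the number of ones is a sufficient statistic and the Sibson mutual information can be computed exactly by summing over that count. The names `type_probs`, `sibson_mi_n`, `renyi_entropy`, `gap`, and the parameters `px` and `qs` are all hypothetical.

```python
import math

def type_probs(n, q):
    """Probability of exactly k ones in n Bernoulli(q) draws, for k = 0..n."""
    return [math.comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]

def sibson_mi_n(px, qs, n, alpha):
    """Sibson mutual information (bits) between X and n i.i.d. Bernoulli(qs[x]) looks."""
    rows = [type_probs(n, q) for q in qs]
    cols = list(zip(*rows))  # cols[k][x] = P(k ones | X = x)
    if alpha == 1:
        total = 0.0
        for col in cols:
            marg = sum(p * c for p, c in zip(px, col))
            total += sum(p * c * math.log2(c / marg)
                         for p, c in zip(px, col) if c > 0)
        return total
    if alpha == math.inf:
        return math.log2(sum(max(col) for col in cols))
    s = sum(sum(p * c**alpha for p, c in zip(px, col)) ** (1 / alpha) for col in cols)
    return alpha / (alpha - 1) * math.log2(s)

def renyi_entropy(p, order):
    if order == 0:
        return math.log2(sum(1 for v in p if v > 0))
    if order == 1:
        return -sum(v * math.log2(v) for v in p if v > 0)
    return math.log2(sum(v**order for v in p if v > 0)) / (1 - order)

def gap(px, qs, n, alpha):
    """H_{1/alpha}(X) - I^S_alpha(X; Y^n), the quantity whose decay is studied."""
    return renyi_entropy(px, 1 / alpha) - sibson_mi_n(px, qs, n, alpha)

px = [0.5, 0.3, 0.2]   # hypothetical full-support prior
qs = [0.2, 0.5, 0.8]   # hypothetical distinct rows (unique row assumption)
for alpha in (1, 2, math.inf):
    for n in (10, 20, 40, 80):
        g = gap(px, qs, n, alpha)
        print(alpha, n, g, -math.log2(g) / n)  # last column: empirical exponent
```

The last printed column is an empirical exponent that can be compared across orders and against the smallest pairwise Chernoff information of the rows. Because of the polynomial prefactors, agreement at small $n$ should not be expected to be tight.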

We prove the result separately for the cases $\alpha = 1$, $\alpha = \infty$, and $\alpha \in (1,\infty)$ in the next three sections. For this, the following alternate characterization of the exponent is useful. Let $Q_x$ denote the distribution of $Y$ given $X = x$ for a given $x$, and let $\mathcal{P}$ denote the set of all possible probability distributions over $\mathcal{Y}$. For any $P \in \mathcal{P}$, let $x_k(P)$ denote the $x \in \mathcal{X}$ such that $D(P \| Q_x)$ is the $k$th smallest relative entropy across all elements of $\mathcal{X}$. Ties can be broken by the ordering of $\mathcal{X}$.

###### Lemma 2.

$$\inf_{P \in \mathcal{P}} D(P \| Q_{x_2(P)}) = \min_{x \ne x'} C(Q_x \| Q_{x'}). \tag{10}$$

###### Proof.

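We will prove that:

$$\inf_{P \in \mathcal{P}} D(P \| Q_{x_2(P)}) \le \min_{x \ne x'} C(Q_x \| Q_{x'}) \tag{11}$$

$$\inf_{P \in \mathcal{P}} D(P \| Q_{x_2(P)}) \ge \min_{x \ne x'} C(Q_x \| Q_{x'}) \tag{12}$$

To prove the upper bound, fix $x \ne x'$, consider $Q_x$ and $Q_{x'}$, and define $\lambda^*$ such that $D(P_{\lambda^*} \| Q_x) = D(P_{\lambda^*} \| Q_{x'})$, where $P_{\lambda^*}$ is the distribution in (7) with $P_1 = Q_x$ and $P_2 = Q_{x'}$. Then, certainly

$$D(P_{\lambda^*} \| Q_{x_2(P_{\lambda^*})}) \le C(Q_x \| Q_{x'}) \tag{13}$$

since we know of two $x$-values whose corresponding distributions are equidistant to $P_{\lambda^*}$, so the second-smallest relative entropy from $P_{\lambda^*}$ cannot exceed this common value. Note that $P_{\lambda^*}$ only depends on $x$ and $x'$, and this inequality holds for any $x \ne x'$. Hence, choosing the pair that attains the minimum Chernoff information,

$$D(P_{\lambda^*} \| Q_{x_2(P_{\lambda^*})}) \le \min_{x \ne x'} C(Q_x \| Q_{x'}) \tag{14}$$

Furthermore, since we know of at least one $P \in \mathcal{P}$ such that $D(P \| Q_{x_2(P)}) \le \min_{x \ne x'} C(Q_x \| Q_{x'})$, it must also be true that

$$\inf_{P \in \mathcal{P}} D(P \| Q_{x_2(P)}) \le \min_{x \ne x'} C(Q_x \| Q_{x'}) \tag{15}$$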

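For the lower bound, fix $x \ne x'$. We first define subsets of $\mathcal{P}$:

$$E_x = \{ P \in \mathcal{P} \mid D(P \| Q_x) \le C(Q_x \| Q_{x'}) \} \tag{16}$$

$$E_{x'} = \{ P \in \mathcal{P} \mid D(P \| Q_{x'}) \le C(Q_x \| Q_{x'}) \} \tag{17}$$

Note that $E_x$ and $E_{x'}$ are convex sets since $D(\cdot \| \cdot)$ is convex, and that $P_{\lambda^*}$ achieves the minimum distance to $Q_{x'}$ in $E_x$ and the minimum distance to $Q_x$ in $E_{x'}$ (Cover and Thomas, Section 11.8).

Choose any $P \in \mathcal{P}$. There are three cases to consider, depending on the location of $P$ in $\mathcal{P}$.

Case 1: $P \notin E_x$ and $P \notin E_{x'}$. By construction, $D(P \| Q_x) > C(Q_x \| Q_{x'})$ and $D(P \| Q_{x'}) > C(Q_x \| Q_{x'})$.

Case 2: $P \in E_x$. Using the Pythagorean theorem for relative entropy (Cover and Thomas, Thm. 11.6.1),

$$D(P \| Q_{x'}) \ge D(P \| P_{\lambda^*}) + D(P_{\lambda^*} \| Q_{x'}) \tag{18}$$

and the last term equals $C(Q_x \| Q_{x'})$.

Case 3: $P \in E_{x'}$. By the same argument,

$$D(P \| Q_x) \ge D(P \| P_{\lambda^*}) + D(P_{\lambda^*} \| Q_x) \tag{19}$$

Hence, for any $P \in \mathcal{P}$ and any $x \ne x'$,

$$\max\{ D(P \| Q_x), D(P \| Q_{x'}) \} \ge C(Q_x \| Q_{x'}) \tag{20}$$

Since $D(P \| Q_{x_2(P)}) = \min_{x \ne x'} \max\{ D(P \| Q_x), D(P \| Q_{x'}) \}$,

$$\inf_{P \in \mathcal{P}} D(P \| Q_{x_2(P)}) \ge \min_{x \ne x'} C(Q_x \| Q_{x'}) \tag{21}$$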
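Lemma 2 can be sanity-checked numerically for a binary output alphabet, where $\mathcal{P}$ is a one-parameter family and a grid search is feasible. The sketch below is a hedged illustration: the names `kl`, `chernoff`, `second_smallest_kl`, and `lemma2_sides` are hypothetical, the rows are assumed strictly positive, and the Chernoff information is computed with the standard identity $C(P_1 \| P_2) = -\min_{\lambda \in [0,1]} \log \sum_x P_1(x)^\lambda P_2(x)^{1-\lambda}$ (Cover and Thomas, Section 11.9) via ternary search on the convex objective.

```python
import math
from itertools import combinations

def kl(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def chernoff(p1, p2, iters=200):
    """Chernoff information via ternary search on the convex function of lambda."""
    f = lambda lam: math.log2(sum(a**lam * b ** (1 - lam) for a, b in zip(p1, p2)))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return -f((lo + hi) / 2)

def second_smallest_kl(p, qs):
    """D(P || Q_{x_2(P)}): the second-smallest relative entropy from P to the rows."""
    return sorted(kl(p, q) for q in qs)[1]

def lemma2_sides(q_params, grid=20000):
    """Left and right sides of (10) for Bernoulli rows, using a grid over P."""
    qs = [[1 - q, q] for q in q_params]
    lhs = min(second_smallest_kl([1 - i / grid, i / grid], qs)
              for i in range(grid + 1))
    rhs = min(chernoff(a, b) for a, b in combinations(qs, 2))
    return lhs, rhs

print(lemma2_sides([0.2, 0.5, 0.8]))  # hypothetical rows
```

The grid minimum can only overestimate the infimum, so the first value should sit slightly above the second, with a gap that shrinks as `grid` grows.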

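Other Notation: We use $\mathcal{P}_n$ to denote the set of all possible empirical distributions of $Y^n$. For any $P \in \mathcal{P}$, let

$$T(P) = \{ y^n \in \mathcal{Y}^n \mid P_{y^n} = P \}$$

where $P_{y^n}$ is the empirical distribution of $y^n$. Note that $T(P)$ may be empty if $P \notin \mathcal{P}_n$. We use $Q$ to denote true distributions of $X$ and $Y^n$.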

## 4 Proof for Mutual Information (α=1)

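We derive separate upper and lower bounds for mutual information. Since $I(X;Y^n) = H(X) - H(X|Y^n)$, we can equivalently upper and lower bound $-H(X|Y^n)$. For the lower bound,

$$-H(X|Y^n) \equiv \sum_{y^n \in \mathcal{Y}^n} Q(y^n) \sum_{x \in \mathcal{X}} Q(x|y^n) \log Q(x|y^n) \tag{22}$$

$$= \sum_{P \in \mathcal{P}_n} \sum_{y^n \in T(P)} Q(y^n) \sum_{x \in \mathcal{X}} \frac{Q(y^n|x) Q(x)}{Q(y^n)} \log \frac{Q(y^n|x) Q(x)}{Q(y^n)} \tag{23}$$

$$\begin{aligned} = \sum_{P \in \mathcal{P}_n} \sum_{y^n \in T(P)} \sum_{x \in \mathcal{X}} & \frac{1}{|T(P)|} Q(T(P)|x) Q(x) \\ & \cdot \log \frac{\frac{1}{|T(P)|} Q(T(P)|x) Q(x)}{\sum_{x' \in \mathcal{X}} \frac{1}{|T(P)|} Q(T(P)|x') Q(x')} \end{aligned} \tag{24}$$

$$= \sum_{P \in \mathcal{P}_n} \sum_{x \in \mathcal{X}} Q(T(P)|x) Q(x) \log \frac{Q(T(P)|x) Q(x)}{\sum_{x' \in \mathcal{X}} Q(T(P)|x') Q(x')} \tag{25}$$

$$\begin{aligned} = - \sum_{P \in \mathcal{P}_n : Q(T(P)) > 0} \Bigg[ & Q(T(P)|x_1(P)) Q(x_1(P)) \cdot \log \frac{\sum_{x' \in \mathcal{X}} Q(T(P)|x') Q(x')}{Q(T(P)|x_1(P)) Q(x_1(P))} \\ & + \sum_{x \ne x_1(P) : Q(T(P)|x) > 0} Q(T(P)|x) Q(x) \cdot \log \frac{\sum_{x' \in \mathcal{X}} Q(T(P)|x') Q(x')}{Q(T(P)|x) Q(x)} \Bigg], \end{aligned} \tag{26}$$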

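due to the convention that $0 \log 0 = 0$. Then, replacing weighted sums over $\mathcal{X}$ with their largest summand gives

$$\begin{aligned} \ge - \sum_{P \in \mathcal{P}_n : Q(T(P)) > 0} \Bigg[ & Q(T(P)|x_1(P)) Q(x_1(P)) \cdot \log \left( 1 + \frac{\sum_{x' \ne x_1(P)} Q(T(P)|x') Q(x')}{Q(T(P)|x_1(P)) Q(x_1(P))} \right) \\ & + \max_{x \ne x_1(P) : Q(T(P)|x) > 0} \left\{ Q(T(P)|x) \log \frac{\max_{x' \in \mathcal{X}} Q(T(P)|x')}{Q(T(P)|x) Q(x)} \right\} \Bigg]. \end{aligned} \tag{27}$$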

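Note that the entire expression inside the summation over $P$ is 0 if $Q(T(P)) = 0$. Letting $Q_{\min}(X) = \min_{x \in \mathcal{X}} Q(x)$ and using $\ln(1+t) \le t$ for the first $\log$ term,

$$\begin{aligned} \ge - \sum_{P \in \mathcal{P}_n : Q(T(P)) > 0} \Bigg[ & \frac{1}{\ln 2} \sum_{x' \ne x_1(P)} Q(T(P)|x') Q(x') \\ & + \max_{x \ne x_1(P) : Q(T(P)|x) > 0} \{ Q(T(P)|x) \} \cdot \log \frac{1}{\min_{x \ne x_1(P) : Q(T(P)|x) > 0} Q(T(P)|x) \cdot Q_{\min}(X)} \Bigg] \end{aligned} \tag{28}$$

$$\ge - \sum_{P \in \mathcal{P}_n : Q(T(P)) > 0} \left[ \frac{1}{\ln 2} 2^{-n D(P \| Q_{x_2(P)})} + 2^{-n D(P \| Q_{x_2(P)})} \cdot \left[ n D_{\sup} + \log \frac{(n+1)^{|\mathcal{X}|}}{Q_{\min}(X)} \right] \right] \tag{29}$$

where

$$D_{\sup} \equiv \sup_{x,\, P' \in \mathcal{P} : D(P' \| Q_x) < \infty} D(P' \| Q_x) \tag{30}$$

$$= \sup_{x,\, P' \in \mathcal{P} : D(P' \| Q_x) < \infty} \sum_{y \in \mathcal{Y}} P'(y) \log \frac{P'(y)}{Q(y|x)} \tag{31}$$

$$= \sup_{x,\, P' \in \mathcal{P} : D(P' \| Q_x) < \infty} \sum_{y \in \mathcal{Y}} P'(y) \log \frac{1}{Q(y|x)} - H(P') \tag{32}$$

$$\le \sup_{x} \log \frac{1}{\min_{y : Q(y|x) > 0} Q(y|x)} < \infty. \tag{33}$$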

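Hence,

$$-H(X|Y^n) \ge -(n+1)^{|\mathcal{X}|}\, 2^{-n D^*_n} \left[ \frac{1}{\ln 2} + \log \frac{(n+1)^{|\mathcal{X}|}}{Q_{\min}(X)} + n D_{\sup} \right] \tag{34}$$

where

$$D^*_n = \min_{P \in \mathcal{P}_n} D(P \| Q_{x_2(P)}) \tag{35}$$

and $P^*_n$ is its minimizer.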

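For the upper bound, every summand over $P$ below is nonpositive, so we may keep only the term for $P^*_n$ and then only the term for $x_1(P^*_n)$:

$$-H(X|Y^n) = \sum_{P \in \mathcal{P}_n} \sum_{x \in \mathcal{X}} Q(T(P)|x) Q(x) \log \frac{Q(T(P)|x) Q(x)}{\sum_{x' \in \mathcal{X}} Q(T(P)|x') Q(x')} \tag{36}$$

$$\le \sum_{x \in \mathcal{X}} Q(T(P^*_n)|x) Q(x) \log \frac{Q(T(P^*_n)|x) Q(x)}{\sum_{x' \in \mathcal{X}} Q(T(P^*_n)|x') Q(x')} \tag{37}$$

$$\le Q(T(P^*_n)|x_1(P^*_n)) Q(x_1(P^*_n)) \cdot \log \frac{Q(T(P^*_n)|x_1(P^*_n)) Q(x_1(P^*_n))}{\sum_{x' \in \mathcal{X}} Q(T(P^*_n)|x') Q(x')} \tag{38}$$

$$= Q(T(P^*_n)|x_1(P^*_n)) Q(x_1(P^*_n)) \cdot \log \left[ 1 - \frac{\sum_{x' \ne x_1(P^*_n)} Q(T(P^*_n)|x') Q(x')}{\sum_{x' \in \mathcal{X}} Q(T(P^*_n)|x') Q(x')} \right] \tag{39}$$

and, recalling that $-\ln(1-t) \ge t$,

$$\le -Q(T(P^*_n)|x_1(P^*_n)) Q(x_1(P^*_n)) \cdot \frac{\sum_{x' \ne x_1(P^*_n)} Q(T(P^*_n)|x') Q(x')}{\sum_{x' \in \mathcal{X}} Q(T(P^*_n)|x') Q(x')} \cdot \frac{1}{\ln 2} \tag{40}$$

$$\le -Q(T(P^*_n)|x_1(P^*_n)) Q(x_1(P^*_n)) \cdot \frac{Q(T(P^*_n)|x_2(P^*_n)) Q(x_2(P^*_n))}{\max_{x' \in \mathcal{X}} Q(T(P^*_n)|x')} \cdot \frac{1}{\ln 2} \tag{41}$$

$$\le -\frac{1}{(n+1)^{|\mathcal{X}|}} 2^{-n D(P^*_n \| Q_{x_1(P^*_n)})} Q(x_1(P^*_n)) \cdot \frac{2^{-n D^*_n} Q(x_2(P^*_n))}{(n+1)^{|\mathcal{X}|}\, 2^{-n D(P^*_n \| Q_{x_1(P^*_n)})}} \cdot \frac{1}{\ln 2} \tag{42}$$

$$= -\frac{Q(x_1(P^*_n))\, Q(x_2(P^*_n))}{(n+1)^{2|\mathcal{X}|} \ln 2}\, 2^{-n D^*_n}. \tag{43}$$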

We have now shown that $H(X) - I(X;Y^n) = H(X|Y^n)$ is upper and lower bounded by expressions of the form $f(n)\, 2^{-n D^*_n}$ for some subexponential sequence $f(n)$. It remains to be shown that the exponent $D^*_n$ approaches the minimum Chernoff information as $n \to \infty$.

First, it can be shown using standard continuity arguments that

$$\lim_{n \to \infty} \inf_{P \in \mathcal{P}_n} D(P \| Q_{x_2(P)}) = \inf_{P \in \mathcal{P}} D(P \| Q_{x_2(P)}) \tag{44}$$

since $D(P \| Q_{x_2(P)})$ is a continuous function of $P$. Finally, we arrive at the desired result using Lemma 2.

## 5 Proof for Maximal Leakage (α=∞)

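While the lower bound on $I^S_\infty(X;Y^n)$ can be proven directly, due to space constraints we will instead note that the desired bound can be obtained from (73) to follow by letting $\alpha \to \infty$. For the upper bound, for fixed $x$, let

$$D_x = \{ P \in \mathcal{P} \mid Q(T(P)|x) > Q(T(P)|x') \ \ \forall x' \ne x \} \tag{45}$$

$$\bar{D}_x = \{ P \in \mathcal{P} \mid Q(T(P)|x) \ge Q(T(P)|x') \ \ \forall x' \in \mathcal{X} \} \tag{46}$$

Note that for any $x$ and $x'$ and any $P \in \mathcal{P}_n$, we have $Q(T(P)|x) \ge Q(T(P)|x')$ if and only if $D(P \| Q_x) \le D(P \| Q_{x'})$, since

$$Q(y^n|x) = 2^{-n (D(P \| Q_x) + H(P))} \quad \forall y^n \in T(P). \tag{47}$$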
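Identity (47) is the standard method-of-types formula, and it is easy to confirm on a concrete sequence. The sketch below compares the product of per-symbol probabilities with the closed form; the names `sequence_prob` and `type_formula`, the sequence `seq`, and the row `q` are hypothetical.

```python
import math
from collections import Counter

def sequence_prob(seq, q):
    """Probability of the i.i.d. sequence seq when each symbol y has probability q[y]."""
    return math.prod(q[y] for y in seq)

def type_formula(seq, q):
    """Right-hand side of (47): 2^{-n (D(P || q) + H(P))}, P = empirical distribution."""
    n = len(seq)
    p = {y: c / n for y, c in Counter(seq).items()}
    d = sum(v * math.log2(v / q[y]) for y, v in p.items())  # D(P || q)
    h = -sum(v * math.log2(v) for v in p.values())           # H(P)
    return 2.0 ** (-n * (d + h))

seq = [0, 1, 1, 2, 1, 0, 1, 1]  # hypothetical observation sequence over {0, 1, 2}
q = [0.5, 0.3, 0.2]             # hypothetical row Q_x
print(sequence_prob(seq, q), type_formula(seq, q))
```

Because the right-hand side depends on the sequence only through its empirical distribution, every sequence in the same type class $T(P)$ has the same probability, which is what allows the proofs to sum over types instead of sequences.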

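Fix $x_a \in \mathcal{X}$ and a $P \in \mathcal{P} \setminus \bar{D}_{x_a}$, and let $P_n$ be a sequence such that $P_n \in \mathcal{P}_n$ for each $n$ and $P_n \to P$. Then eventually $P_n \notin \bar{D}_{x_a}$ and

$$I^S_\infty(X;Y^n) \le \log \sum_{x \in \mathcal{X}} \sum_{P \in \bar{D}_x \cap \mathcal{P}_n} Q(T(P)|x) \tag{48}$$

$$= \log \left[ |\mathcal{X}| - \sum_{x \in \mathcal{X}} \sum_{P \in \mathcal{P}_n \setminus \bar{D}_x} Q(T(P)|x) \right] \tag{49}$$

$$\le \log \left[ |\mathcal{X}| - \sum_{P \in \mathcal{P}_n \setminus \bar{D}_{x_a}} Q(T(P)|x_a) \right] \tag{50}$$

$$\le \log \left[ |\mathcal{X}| - Q(T(P_n)|x_a) \right], \tag{51}$$

eventually. Thus for sufficiently large $n$,

$$I^S_\infty(X;Y^n) \le \log \left[ |\mathcal{X}| - \frac{1}{(n+1)^{|\mathcal{X}|}} 2^{-n D(P_n \| Q_{x_a})} \right] \tag{52}$$

$$\le \log |\mathcal{X}| - \frac{1}{|\mathcal{X}| (n+1)^{|\mathcal{X}|}} 2^{-n D(P_n \| Q_{x_a})} \tag{53}$$

Thus, since $H_0(X) = \log |\mathcal{X}|$ under the full support assumption,

$$\limsup_{n \to \infty} -\frac{1}{n} \log \left( \log |\mathcal{X}| - I^S_\infty(X;Y^n) \right) \le \lim_{n \to \infty} D(P_n \| Q_{x_a}) = D(P \| Q_{x_a}). \tag{54}$$

Since $x_a$ and $P$ were arbitrary, the result follows by Lemma 2.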

## 6 Proof for (α∈(1,∞))

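To lower bound $I^S_\alpha(X;Y^n)$, we use the sets $D_x$ defined in the previous proof:

$$I^S_\alpha(X;Y^n) = \frac{\alpha}{\alpha-1} \log \sum_{y^n \in \mathcal{Y}^n} \left( \sum_{x \in \mathcal{X}} Q(x) Q(y^n|x)^\alpha \right)^{1/\alpha} \tag{55}$$

$$= \frac{\alpha}{\alpha-1} \log \sum_{P \in \mathcal{P}_n} \left( \sum_{x \in \mathcal{X}} Q(x) Q(T(P)|x)^\alpha \right)^{1/\alpha} \tag{56}$$

$$\ge \frac{\alpha}{\alpha-1} \log \sum_{x \in \mathcal{X}} \sum_{P \in D_x \cap \mathcal{P}_n} \left( \sum_{x' \in \mathcal{X}} Q(x') Q(T(P)|x')^\alpha \right)^{1/\alpha} \tag{57}$$

$$\ge \frac{\alpha}{\alpha-1} \log \sum_{x \in \mathcal{X}} Q(x)^{1/\alpha} \sum_{P \in D_x \cap \mathcal{P}_n} Q(T(P)|x) \tag{58}$$

$$= \frac{\alpha}{\alpha-1} \log \sum_{x \in \mathcal{X}} Q(x)^{1/\alpha} \left( 1 - \sum_{P \in \mathcal{P}_n \setminus D_x} Q(T(P)|x) \right) \tag{59}$$

$$= \frac{\alpha}{\alpha-1} \log \left( \sum_{x \in \mathcal{X}} Q(x)^{1/\alpha} - \sum_{x \in \mathcal{X}} \sum_{P \in \mathcal{P}_n \setminus D_x} Q(x)^{1/\alpha} Q(T(P)|x) \right) \tag{60}$$

Letting

$$R = \frac{\sum_{x \in \mathcal{X}} \sum_{P \in \mathcal{P}_n \setminus D_x} Q(x)^{1/\alpha} Q(T(P)|x)}{\sum_{x \in \mathcal{X}} Q(x)^{1/\alpha}}, \tag{61}$$

we have

$$I^S_\alpha(X;Y^n) \ge \frac{\alpha}{\alpha-1} \log \left\{ \left( \sum_{x \in \mathcal{X}} Q(x)^{1/\alpha} \right) (1 - R) \right\} \tag{62}$$

$$= H_{1/\alpha}(X) + \frac{\alpha}{\alpha-1} \log (1 - R). \tag{63}$$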

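Note that

$$\ln(1 - \epsilon) = -\sum_{i=1}^{\infty} \frac{\epsilon^i}{i} \tag{64}$$

$$\ge -\epsilon - \frac{\epsilon^2}{2} \left( \sum_{i=0}^{\infty} \epsilon^i \right) = -\epsilon - \frac{\epsilon^2}{2(1 - \epsilon)} \tag{65}$$

for $\epsilon \in [0,1)$. Hence,

$$I^S_\alpha(X;Y^n) \ge H_{1/\alpha}(X) + \frac{\alpha}{(\alpha-1) \ln 2} \left( -R - \frac{R^2}{2(1 - R)} \right). \tag{66}$$

Next we derive an upper bound for $R$.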

 (67) ≤∑x∈XQ(x)1/α(n+1)|X|⋅maxx′∈XmaxP∈Pn∖Dx′Q(T(P)|x′)∑x∈XQ(x)1/α (68) =(n+1)|X|⋅maxx∈XmaxP∈Pn∖DxQ(T(P)|x) (69) ≤(n+1)|X|⋅2−n(minx∈XminP∈Pn∖DxD(P||Qx)) (70) ≤(n+1)|X|⋅2−n(minx≠x′infP∈¯Dx′D(P||Qx)) (71) =(n+1)|X|⋅2−n⋅minx≠x′