# Strong Converse for Hypothesis Testing Against Independence over a Two-Hop Network

By proving a strong converse, we strengthen the weak converse result by Salehkalaibar, Wigger and Wang (2017) concerning hypothesis testing against independence over a two-hop network with communication constraints. Our proof follows by judiciously combining two recently proposed techniques for proving strong converse theorems, namely the strong converse technique via reverse hypercontractivity by Liu, van Handel, and Verdú (2017) and the strong converse technique by Tyagi and Watanabe (2018), in which the authors used a change-of-measure technique and replaced hard Markov constraints with soft information costs. The techniques used in our paper can also be applied to prove strong converse theorems for other multiterminal hypothesis testing against independence problems.


## I Introduction

Motivated by situations where the source sequence is not available directly and can only be obtained through limited communication with the data collector, Ahlswede and Csiszár [1] proposed the problem of hypothesis testing with a communication constraint. In the setting of [1], there is one encoder and one decoder. The encoder has access to one source sequence and transmits a compressed version of it to the decoder at a limited rate. Given the compressed version and the available source sequence (side information), the decoder knows that the pair of sequences is generated i.i.d. from one of two distributions and needs to determine which distribution the pair of sequences is generated from. The goal in this problem is to study the tradeoff between the compression rate and the exponent of the type-II error probability under the constraint that the type-I error probability is either vanishing or non-vanishing. For the special case of testing against independence, Ahlswede and Csiszár provided an exact characterization of the rate-exponent tradeoff. They also derived the so-called strong converse theorem for the problem. This states that the rate-exponent tradeoff cannot be improved even when one is allowed a non-vanishing type-I error probability. However, the characterization of the rate-exponent tradeoff for the general case (even in the absence of a strong converse) remains open to date.

Subsequently, the work of Ahlswede and Csiszár was generalized to the distributed setting by Han in [2] who considered hypothesis testing over a Slepian-Wolf network. In this setting, there are two encoders, each of which observes one source sequence and transmits a compressed version of the source to the decoder. The decoder then performs a hypothesis test given these two compression indices. The goal in this problem is to study the tradeoff between the coding rates and the exponent of type-II error probability, under the constraint that the type-I error probability is either vanishing or non-vanishing. Han derived an inner bound to the rate-exponent region. For the special case of zero-rate communication, Shalaby and Papamarcou [3] applied the blowing-up lemma [4] judiciously to derive the exact rate-exponent region and a strong converse theorem. Further generalizations of the work of Ahlswede and Csiszár can be categorized into two classes: non-interactive models where encoders do not communicate with one another [5, 6, 7, 8] and the interactive models where encoders do communicate [9, 10].

We revisit one such interactive model as shown in Figure 1. This problem was considered by Salehkalaibar, Wigger and Wang in [11] and we term it hypothesis testing over a two-hop network. The main task in this problem is to construct two hypothesis tests between two joint distributions. One of these two distributions governs the law of $(X^n,Y^n,Z^n)$, where each copy $(X_i,Y_i,Z_i)$ is generated independently from it. As shown in Figure 1, the first terminal has knowledge of a source sequence $X^n$ and sends an index $M_1$ to the second terminal, which we call the relay; the relay, given side information $Y^n$ and compressed index $M_1$, makes a guess of the hypothesis and sends another index $M_2$ to the third terminal; the third terminal makes another guess of the hypothesis based on $M_2$ and its own side information $Z^n$. The authors in [11] derived an inner bound for the rate-exponent region and showed that the bound is tight for several special cases, including the case of testing against independence, in which the alternative distribution is the product of the marginals $P_XP_YP_Z$. However, even in this simpler case of testing against independence, which is our main concern in this paper, the authors in [11] only established a weak converse.

In this paper, we strengthen the result by Salehkalaibar, Wigger and Wang in [11] by deriving a strong converse for the case of testing against independence. Our proof follows by judiciously combining two recently proposed strong converse techniques by Liu et al. in [12] and by Tyagi and Watanabe in [13]. In [12], the authors proposed a framework to prove strong converse theorems based on functional inequalities and reverse hypercontractivity of Markov semigroups. In particular, they applied their framework to derive strong converse theorems for a collection of problems including the hypothesis testing with communication constraints problem in [1]. In [13], the authors proposed another framework for strong converse proofs, where they used a change-of-measure technique and replaced hard Markov constraints with soft information costs. They also leveraged variational formulas for various information-theoretic quantities; these formulas were introduced by Oohama in [14, 15].

#### Notation

Random variables and their realizations are in upper case (e.g., $X$) and lower case (e.g., $x$) respectively. All sets are denoted in calligraphic font (e.g., $\mathcal{X}$). We use $\mathcal{X}^c$ to denote the complement of $\mathcal{X}$. Let $X^n:=(X_1,\ldots,X_n)$ be a random vector of length $n$ and $x^n$ its realization. Given any $x^n\in\mathcal{X}^n$, we use $\hat{P}_{x^n}$ to denote its type (empirical distribution). All logarithms are base . We use $\mathbb{R}_+$ and $\mathbb{N}$ to denote the set of non-negative real numbers and natural numbers respectively. Given any positive integer $a$, we use $[a]$ to denote $\{1,\ldots,a\}$. We use $\mathbb{1}\{\cdot\}$ to denote the indicator function and use standard asymptotic notation such as $\Theta(\cdot)$. The set of all probability distributions on a finite set $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$. Given any two random variables and any realization of , we use to denote the conditional distribution . Given a distribution and a function , we use to denote . For information-theoretic quantities, we follow [16]. In particular, when the joint distribution of is , we use and interchangeably. Throughout the paper, for ease of notation, we drop the subscript for distributions when there is no confusion. For example, when the joint distribution of is , we use and interchangeably. For ease of notation, for any $(p,q)\in[0,1]^2$, let $D_b(p\|q)$ denote the binary divergence function, i.e., $D_b(p\|q):=p\log\frac{p}{q}+(1-p)\log\frac{1-p}{1-q}$.

## II Problem Formulation and Existing Results

### II-A Problem Formulation

Fix a joint distribution $P_{XYZ}\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}\times\mathcal{Z})$ satisfying the Markov chain $X-Y-Z$, i.e.,

\begin{align}
P_{XYZ}(x,y,z)=P_{XY}(x,y)P_{Z|Y}(z|y). \tag{1}
\end{align}

Let $P_X$, $P_Y$ and $P_Z$ be the induced marginal distributions of $P_{XYZ}$. As shown in Figure 1, we consider a two-hop hypothesis testing problem with three terminals. The first terminal, which we term the transmitter, observes a source sequence $X^n$ and sends a compression index $M_1$ to the second terminal, which we term the relay. Given $M_1$ and side information $Y^n$, the relay sends another compression index $M_2$ to the third terminal, which we term the receiver. The main task in this problem is to construct hypothesis tests at both the relay and the receiver to distinguish between

\begin{align}
\mathrm{H}_0 &: (X^n,Y^n,Z^n)\sim P^n_{XYZ}=P^n_{XY}P^n_{Z|Y}, \tag{2}\\
\mathrm{H}_1 &: (X^n,Y^n,Z^n)\sim P^n_{X}P^n_{Y}P^n_{Z}. \tag{3}
\end{align}

For subsequent analyses, we formally define a code for hypothesis testing over a two-hop network as follows.

###### Definition 1.

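An $(n,N_1,N_2)$-code for hypothesis testing over a two-hop network consists of

• Two encoders:

\begin{align}
f_1 &: \mathcal{X}^n\to\mathcal{M}_1:=\{1,\ldots,N_1\}, \tag{4}\\
f_2 &: \mathcal{M}_1\times\mathcal{Y}^n\to\mathcal{M}_2:=\{1,\ldots,N_2\}, \text{ and} \tag{5}
\end{align}

• Two decoders:

\begin{align}
g_1 &: \mathcal{M}_1\times\mathcal{Y}^n\to\{\mathrm{H}_0,\mathrm{H}_1\}, \tag{6}\\
g_2 &: \mathcal{M}_2\times\mathcal{Z}^n\to\{\mathrm{H}_0,\mathrm{H}_1\}. \tag{7}
\end{align}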

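Given an $(n,N_1,N_2)$-code with encoding and decoding functions $(f_1,f_2,g_1,g_2)$, we define acceptance regions for the null hypothesis $\mathrm{H}_0$ at the relay and the receiver as

\begin{align}
\mathcal{A}_{Y,n} &:=\{(m_1,y^n): g_1(m_1,y^n)=\mathrm{H}_0\}, \tag{8}\\
\mathcal{A}_{Z,n} &:=\{(m_2,z^n): g_2(m_2,z^n)=\mathrm{H}_0\} \tag{9}
\end{align}

respectively. We also define conditional distributions

\begin{align}
P_{M_1|X^n}(m_1|x^n) &:=\mathbb{1}\{f_1(x^n)=m_1\}, \tag{10}\\
P_{M_2|Y^nM_1}(m_2|y^n,m_1) &:=\mathbb{1}\{f_2(m_1,y^n)=m_2\}. \tag{11}
\end{align}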

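Thus, for an $(n,N_1,N_2)$-code characterized by $(f_1,f_2,g_1,g_2)$, the joint distribution of random variables $(X^n,Y^n,Z^n,M_1,M_2)$ under the null hypothesis $\mathrm{H}_0$ is given by

\begin{align}
P_{X^nY^nZ^nM_1M_2}(x^n,y^n,z^n,m_1,m_2)=P^n_{XYZ}(x^n,y^n,z^n)P_{M_1|X^n}(m_1|x^n)P_{M_2|Y^nM_1}(m_2|y^n,m_1), \tag{12}
\end{align}

and under the alternative hypothesis $\mathrm{H}_1$ is given by

\begin{align}
\bar{P}_{X^nY^nZ^nM_1M_2}(x^n,y^n,z^n,m_1,m_2)=P^n_X(x^n)P^n_Y(y^n)P^n_Z(z^n)P_{M_1|X^n}(m_1|x^n)P_{M_2|Y^nM_1}(m_2|y^n,m_1). \tag{13}
\end{align}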

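Now, let $P_{M_1Y^n}$ and $P_{M_2Z^n}$ be marginal distributions induced by $P_{X^nY^nZ^nM_1M_2}$ and let $\bar{P}_{M_1Y^n}$ and $\bar{P}_{M_2Z^n}$ be marginal distributions induced by $\bar{P}_{X^nY^nZ^nM_1M_2}$. Then, we can define the type-I and type-II error probabilities at the relay as

\begin{align}
\beta_1 &:=P_{M_1Y^n}(\mathcal{A}^c_{Y,n}), \tag{14}\\
\beta_2 &:=\bar{P}_{M_1Y^n}(\mathcal{A}_{Y,n}) \tag{15}
\end{align}

respectively and at the receiver as

\begin{align}
\eta_1 &:=P_{M_2Z^n}(\mathcal{A}^c_{Z,n}), \tag{16}\\
\eta_2 &:=\bar{P}_{M_2Z^n}(\mathcal{A}_{Z,n}) \tag{17}
\end{align}

respectively. Clearly, $\beta_1$, $\beta_2$, $\eta_1$ and $\eta_2$ are functions of $n$ but we suppress these dependencies for brevity.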
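As an illustration of definitions (12)-(17), the following is a minimal sketch that computes the four error probabilities of a toy code by brute-force enumeration. The function name `two_hop_errors`, the convention that a decoder returns `True` for "decide $\mathrm{H}_0$", and the example source and code are all assumptions made for this sketch; they are not taken from the paper, and the enumeration is only feasible for very small $n$ and alphabets.

```python
import itertools
import math

def two_hop_errors(P_xy, P_z_given_y, n, f1, f2, g1, g2):
    """Return (beta1, beta2, eta1, eta2) as in (14)-(17) by enumeration.

    P_xy[x][y] and P_z_given_y[y][z] are nested lists. The decoders g1 and g2
    return True for "decide H0" (an illustrative convention).
    """
    nx, ny, nz = len(P_xy), len(P_xy[0]), len(P_z_given_y[0])
    P_x = [sum(row) for row in P_xy]
    P_y = [sum(P_xy[x][y] for x in range(nx)) for y in range(ny)]
    P_z = [sum(P_y[y] * P_z_given_y[y][z] for y in range(ny)) for z in range(nz)]
    beta1 = beta2 = eta1 = eta2 = 0.0
    for xn in itertools.product(range(nx), repeat=n):
        m1 = f1(xn)
        for yn in itertools.product(range(ny), repeat=n):
            p0_xy = math.prod(P_xy[x][y] for x, y in zip(xn, yn))         # under H0
            p1_xy = math.prod(P_x[x] * P_y[y] for x, y in zip(xn, yn))    # under H1
            if g1(m1, yn):
                beta2 += p1_xy   # relay accepts H0 although H1 holds
            else:
                beta1 += p0_xy   # relay rejects H0 although H0 holds
            m2 = f2(m1, yn)
            for zn in itertools.product(range(nz), repeat=n):
                p0_z = math.prod(P_z_given_y[y][z] for y, z in zip(yn, zn))
                p1_z = math.prod(P_z[z] for z in zn)
                if g2(m2, zn):
                    eta2 += p1_xy * p1_z
                else:
                    eta1 += p0_xy * p0_z
    return beta1, beta2, eta1, eta2

# Hypothetical example: binary symmetric source, uncompressed indices,
# and decoders that accept H0 only on exact agreement.
P_xy = [[0.45, 0.05], [0.05, 0.45]]
P_z_given_y = [[0.9, 0.1], [0.1, 0.9]]
errs = two_hop_errors(
    P_xy, P_z_given_y, 2,
    f1=lambda xn: xn, f2=lambda m1, yn: yn,
    g1=lambda m1, yn: m1 == yn, g2=lambda m2, zn: m2 == zn,
)
```

The example deliberately uses no compression ($N_1=|\mathcal{X}|^n$), so it only exercises the error-probability definitions, not the rate constraints in Definition 2.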

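Given the above definitions, the achievable rate-exponent region for the hypothesis testing problem in a two-hop network is defined as follows.

###### Definition 2.

Given any $(\varepsilon_1,\varepsilon_2)\in[0,1)^2$, a tuple $(R_1,R_2,E_1,E_2)$ is said to be $(\varepsilon_1,\varepsilon_2)$-achievable if there exists a sequence of $(n,N_1,N_2)$-codes such that

\begin{align}
\limsup_{n\to\infty}\frac{1}{n}\log N_i &\le R_i,\quad\forall\, i\in\{1,2\}, \tag{18}\\
\limsup_{n\to\infty}\beta_1 &\le\varepsilon_1, \tag{19}\\
\limsup_{n\to\infty}\eta_1 &\le\varepsilon_2, \tag{20}\\
\liminf_{n\to\infty}-\frac{1}{n}\log\beta_2 &\ge E_1, \tag{21}\\
\liminf_{n\to\infty}-\frac{1}{n}\log\eta_2 &\ge E_2. \tag{22}
\end{align}

The closure of the set of all $(\varepsilon_1,\varepsilon_2)$-achievable rate-exponent tuples is called the $(\varepsilon_1,\varepsilon_2)$-rate-exponent region and is denoted as $\mathcal{R}(\varepsilon_1,\varepsilon_2)$. Furthermore, define the rate-exponent region as

\begin{align}
\mathcal{R}:=\mathcal{R}(0,0). \tag{23}
\end{align}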

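### II-B Existing Results

In the following, we recall the exact characterization of $\mathcal{R}$ given by Salehkalaibar, Wigger and Wang [11, Prop. 2]. For this purpose, define the following set of joint distributions

\begin{align}
\mathcal{Q}:=\{Q_{XYZUV}\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\times\mathcal{U}\times\mathcal{V}):\ Q_{XYZ}=P_{XYZ},\ U-X-Y,\ V-Y-Z\}. \tag{24}
\end{align}

Given $Q_{XYZUV}\in\mathcal{Q}$, define the following set

\begin{align}
\mathcal{R}(Q_{XYZUV}):=\{(R_1,R_2,E_1,E_2):\ &R_1\ge I_Q(U;X),\ R_2\ge I_Q(V;Y),\nonumber\\
&E_1\le I_Q(U;Y),\ E_2\le I_Q(U;Y)+I_Q(V;Z)\}. \tag{25}
\end{align}

Finally, let

\begin{align}
\mathcal{R}^*:=\bigcup_{Q_{XYZUV}\in\mathcal{Q}}\mathcal{R}(Q_{XYZUV}). \tag{26}
\end{align}

###### Theorem 1.

The rate-exponent region for the hypothesis testing over a two-hop network problem satisfies

\begin{align}
\mathcal{R}=\mathcal{R}^*. \tag{27}
\end{align}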
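To make the quantities in (25) concrete, here is a small sketch that evaluates the corner point $(I_Q(U;X),I_Q(V;Y),I_Q(U;Y),I_Q(U;Y)+I_Q(V;Z))$ for a given pair of test channels. The names `mutual_information` and `region_corner`, the nested-list representation, and the use of natural logarithms are assumptions of this sketch. Only the pairwise marginals of $Q_{XYZUV}$ are needed, since the Markov chains $U-X-Y$ and $V-Y-Z$ in (24) determine them from $Q_{U|X}$ and $Q_{V|Y}$.

```python
import math

def mutual_information(P_ab):
    """I(A;B) in nats for a joint pmf given as a nested list P_ab[a][b]."""
    P_a = [sum(row) for row in P_ab]
    P_b = [sum(col) for col in zip(*P_ab)]
    return sum(
        p * math.log(p / (P_a[a] * P_b[b]))
        for a, row in enumerate(P_ab)
        for b, p in enumerate(row)
        if p > 0
    )

def region_corner(P_xy, P_z_given_y, Q_u_given_x, Q_v_given_y):
    """Corner point (R1, R2, E1, E2) of the set in (25).

    Q_u_given_x[x][u] and Q_v_given_y[y][v] are hypothetical test channels
    satisfying U - X - Y and V - Y - Z.
    """
    nx, ny, nz = len(P_xy), len(P_xy[0]), len(P_z_given_y[0])
    nu, nv = len(Q_u_given_x[0]), len(Q_v_given_y[0])
    P_x = [sum(row) for row in P_xy]
    P_y = [sum(P_xy[x][y] for x in range(nx)) for y in range(ny)]
    Q_ux = [[P_x[x] * Q_u_given_x[x][u] for x in range(nx)] for u in range(nu)]
    Q_uy = [[sum(P_xy[x][y] * Q_u_given_x[x][u] for x in range(nx))
             for y in range(ny)] for u in range(nu)]
    Q_vy = [[P_y[y] * Q_v_given_y[y][v] for y in range(ny)] for v in range(nv)]
    Q_vz = [[sum(P_y[y] * Q_v_given_y[y][v] * P_z_given_y[y][z] for y in range(ny))
             for z in range(nz)] for v in range(nv)]
    i_ux, i_uy = mutual_information(Q_ux), mutual_information(Q_uy)
    i_vy, i_vz = mutual_information(Q_vy), mutual_information(Q_vz)
    return i_ux, i_vy, i_uy, i_uy + i_vz

# Hypothetical example: U = X and V = Y (identity test channels).
P_xy = [[0.45, 0.05], [0.05, 0.45]]
P_z_given_y = [[0.9, 0.1], [0.1, 0.9]]
identity = [[1.0, 0.0], [0.0, 1.0]]
R1, R2, E1, E2 = region_corner(P_xy, P_z_given_y, identity, identity)
```

With identity test channels the rates equal the source entropies and the exponents equal $I(X;Y)$ and $I(X;Y)+I(Y;Z)$, which is the uncompressed extreme of the tradeoff.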

In the following, inspired by Oohama’s variational characterization of rate regions for multiuser information theory [15, 14], we provide an alternative characterization of $\mathcal{R}^*$. For this purpose, given any $(b,c,d)\in\mathbb{R}_+^3$ and any $Q_{XYZUV}\in\mathcal{Q}$, let

\begin{align}
R_{b,c,d}(Q_{XYZUV}):=-I_Q(U;Y)+bI_Q(U;X)-c\big(I_Q(U;Y)+I_Q(V;Z)\big)+dI_Q(V;Y) \tag{28}
\end{align}

be a linear combination of the mutual information terms in (25). Furthermore, define

\begin{align}
R_{b,c,d}:=\min_{Q_{XYZUV}\in\mathcal{Q}}R_{b,c,d}(Q_{XYZUV}). \tag{29}
\end{align}

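An alternative characterization of $\mathcal{R}^*$ is given by

\begin{align}
\mathcal{R}^*=\bigcap_{(b,c,d)\in\mathbb{R}_+^3}\big\{(R_1,R_2,E_1,E_2):\ -E_1+bR_1-cE_2+dR_2\ge R_{b,c,d}\big\}. \tag{30}
\end{align}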

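## III Strong Converse Theorem

### III-A The case $\varepsilon_1+\varepsilon_2<1$

###### Theorem 2.

Given any $(\varepsilon_1,\varepsilon_2)\in(0,1)^2$ such that $\varepsilon_1+\varepsilon_2<1$ and any $(b,c,d)\in\mathbb{R}_+^3$, for any $(n,N_1,N_2)$-code such that $\beta_1\le\varepsilon_1$, $\eta_1\le\varepsilon_2$, we have

\begin{align}
\log\beta_2+b\log N_1+c\log\eta_2+d\log N_2\ge nR_{b,c,d}+\Theta(n^{3/4}\log n). \tag{31}
\end{align}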

The proof of Theorem 2 is given in Section IV. Several remarks are in order.

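First, using the alternative expression of the rate-exponent region in (30), we conclude that for any $(\varepsilon_1,\varepsilon_2)\in(0,1)^2$ such that $\varepsilon_1+\varepsilon_2<1$, we have $\mathcal{R}(\varepsilon_1,\varepsilon_2)=\mathcal{R}^*$. This result significantly strengthens the weak converse result in [11, Prop. 2] in which it was shown that $\mathcal{R}(0,0)=\mathcal{R}^*$.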

Second, it appears difficult to establish the strong converse result in Theorem 2 using existing classical techniques including image-size characterizations (a consequence of the blowing-up lemma) [4, 6] and the perturbation approach [17]. In Section IV, we judiciously combine two recently proposed strong converse techniques by Liu, van Handel, and Verdú [12] and by Tyagi and Watanabe [13]. In particular, we use the strong converse technique based on reverse hypercontractivity in [12] to bound the exponent of the type-II error probability at the receiver and the strong converse technique in [13], which leverages an appropriate change-of-measure technique and replaces hard Markov constraints with soft information costs, to analyze the exponent of type-II error probability at the relay. Finally, inspired by the single-letterization steps in [18, Lemma C.2] and [13], we single-letterize the derived multi-letter bounds from the previous steps to obtain the desired result in Theorem 2.

Third, we briefly comment on the apparent necessity of combining the two techniques in [12] and [13] instead of applying just one of them to obtain Theorem 2. The first step to apply the technique in [13] is to construct a “truncated source distribution” which is supported on a smaller set (often defined in terms of the decoding region) and is not too far away from the true source distribution in terms of the relative entropy. For our problem, the source satisfies the Markov chain $X^n-Y^n-Z^n$. If we naïvely apply the techniques in [13], the Markovian property would not hold for the truncated source $(\tilde{X}^n,\tilde{Y}^n,\tilde{Z}^n)$. On the other hand, it appears rather challenging to extend the techniques in [12] to the hypothesis testing over a multi-hop network problem since the techniques therein rely heavily on constructing semi-groups and it is difficult to devise appropriate forms of such semi-groups to be used and analyzed in this multi-hop setting. Therefore, we carefully combine the two techniques in [12] and [13] to ameliorate the aforementioned problems. In particular, we first use the technique in [13] to construct a truncated source $(\tilde{X}^n,\tilde{Y}^n)$ and then let the conditional distribution of $\tilde{Z}^n$ given $(\tilde{X}^n,\tilde{Y}^n)$ be given by the true conditional source distribution $P^n_{Z|Y}$ to maintain the Markovian property of the source (see (56)). Subsequently, in the analysis of error exponents, we use the technique in [12] to analyze the exponent of type-II error probability at the receiver to circumvent the need to construct new semi-groups.

Finally, we remark that the techniques (or a subset of the techniques) used to prove Theorem 2 can also be used to establish a strong converse result for other multiterminal hypothesis testing against independence problems, e.g., hypothesis testing over the Gray-Wyner network [7], the interactive hypothesis testing problem [9] and the cascaded hypothesis testing problem [10]. In particular, for the testing against independence case in [10] (as shown in Figure 2), a strong converse result was established by a subset of the present authors in [19].

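### III-B The case $\varepsilon_1+\varepsilon_2>1$

In this subsection, we consider the case where the sum of type-I error probabilities at the relay and the receiver is upper bounded by a quantity strictly greater than one. For ease of presentation of our results, let

\begin{align}
\mathcal{Q}_2:=\{Q_{XYZU_1U_2V}\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\times\mathcal{U}_1\times\mathcal{U}_2\times\mathcal{V}):\ Q_{XYZ}=P_{XYZ},\ U_1-X-Y,\ U_2-X-Y,\ V-Y-Z\}. \tag{32}
\end{align}

Given any $Q_{XYZU_1U_2V}\in\mathcal{Q}_2$, define the following set of rate-exponent tuples

\begin{align}
\tilde{\mathcal{R}}(Q_{XYZU_1U_2V}):=\{(R_1,R_2,E_1,E_2):\ &R_1\ge\max\{I_Q(U_1;X),I_Q(U_2;X)\},\ R_2\ge I_Q(V;Y),\nonumber\\
&E_1\le I_Q(U_1;Y),\ E_2\le I_Q(U_2;Y)+I_Q(V;Z)\}. \tag{33}
\end{align}

Furthermore, define

\begin{align}
\tilde{\mathcal{R}}:=\bigcup_{Q_{XYZU_1U_2V}\in\mathcal{Q}_2}\tilde{\mathcal{R}}(Q_{XYZU_1U_2V}). \tag{34}
\end{align}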

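Given any $(b_1,b_2,c,d)\in\mathbb{R}_+^4$ and $Q_{XYZU_1U_2V}\in\mathcal{Q}_2$, define the following linear combination of the mutual information terms

\begin{align}
\tilde{R}_{b_1,b_2,c,d}(Q_{XYZU_1U_2V}):=-I_Q(U_1;Y)+b_1I_Q(U_1;X)+b_2I_Q(U_2;X)-c\big(I_Q(U_2;Y)+I_Q(V;Z)\big)+dI_Q(V;Y), \tag{35}
\end{align}

and let

\begin{align}
\tilde{R}_{b_1,b_2,c,d}:=\min_{Q_{XYZU_1U_2V}\in\mathcal{Q}_2}\tilde{R}_{b_1,b_2,c,d}(Q_{XYZU_1U_2V}). \tag{36}
\end{align}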

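Then, based on [15, 14], an alternative characterization of $\tilde{\mathcal{R}}$ is given by

\begin{align}
\tilde{\mathcal{R}}=\bigcap_{(b_1,b_2,c,d)\in\mathbb{R}_+^4}\big\{(R_1,R_2,E_1,E_2):\ -E_1+b_1R_1+b_2R_1-cE_2+dR_2\ge\tilde{R}_{b_1,b_2,c,d}\big\}. \tag{37}
\end{align}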

Analogously to Theorem 2, we obtain the following result.

###### Theorem 3.

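Given any $(\varepsilon_1,\varepsilon_2)\in(0,1)^2$ such that $\varepsilon_1+\varepsilon_2>1$ and any $(b_1,b_2,c,d)\in\mathbb{R}_+^4$, for any $(n,N_1,N_2)$-code such that $\beta_1\le\varepsilon_1$, $\eta_1\le\varepsilon_2$, we have

\begin{align}
\log\beta_2+b_1\log N_1+b_2\log N_1+c\log\eta_2+d\log N_2\ge n\tilde{R}_{b_1,b_2,c,d}+\Theta(n^{3/4}\log n). \tag{38}
\end{align}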

The proof of Theorem 3 involves applying the proof of Theorem 2 to two special cases of the problem in Figure 1: i) hypothesis testing with communication constraint where the receiver does not exist, and ii) the relay is not required to output a decision. Thus, the proof of Theorem 3 is omitted for brevity.

Using Theorem 3, we obtain the following proposition, which together with the first remark of Theorem 2 provides a strong converse theorem for the problem of hypothesis testing against independence over a two-hop network when $\varepsilon_1+\varepsilon_2\neq 1$.

###### Proposition 4.

For any $(\varepsilon_1,\varepsilon_2)\in(0,1)^2$ such that $\varepsilon_1+\varepsilon_2>1$, we have

\begin{align}
\mathcal{R}(\varepsilon_1,\varepsilon_2)=\tilde{\mathcal{R}}. \tag{39}
\end{align}

The converse proof of Proposition 4 follows from Theorem 3 and the alternative characterization of $\tilde{\mathcal{R}}$ in (37). The achievability proof is inspired by [6, Theorem 5] and is provided in Appendix A. The main idea is that we can time-share between two close-to-optimal coding schemes, each of which corresponds to one special case of the current problem as mentioned after Theorem 3.

Finally, we remark that the case when $\varepsilon_1+\varepsilon_2=1$ is not included. See [6, Sec. III.D] for a discussion of this subtle case.

## IV Proof of Theorem 2

We present the proof of the strong converse theorem for hypothesis testing over the two-hop network in this section. The proof follows by judiciously combining the techniques in [12] and [13] and is separated into three main steps. First, we construct a truncated source distribution $P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}$ and show that this truncated distribution is not too different from $P^n_{XYZ}$ in terms of the relative entropy. Subsequently, we analyze the exponents of type-II error probabilities at the relay and the receiver under the constraint that their type-I error probabilities are non-vanishing. Finally, we single-letterize the constraints on rate and error exponents to obtain the desired result in Theorem 2.

To begin with, let us fix an $(n,N_1,N_2)$-code with functions $(f_1,f_2,g_1,g_2)$ such that the type-I error probabilities are bounded above by $\varepsilon_1$ and $\varepsilon_2$ respectively, i.e., $\beta_1\le\varepsilon_1$ and $\eta_1\le\varepsilon_2$. (Footnote 1: We note from (19) and (20) that $\beta_1\le\varepsilon_1+o(1)$ and $\eta_1\le\varepsilon_2+o(1)$. Since the $o(1)$ terms are immaterial in the subsequent analyses, they are omitted for brevity.)

### IV-A Construction of a Truncated Distribution

Paralleling the definitions of acceptance regions in (8) and (9), we define the following acceptance regions at the relay and the receiver as

\begin{align}
\mathcal{D}_{Y,n}&:=\{(x^n,y^n):\ g_1(f_1(x^n),y^n)=\mathrm{H}_0\}, \tag{40}\\
\mathcal{D}_{Z,n}&:=\{(x^n,y^n,z^n):\ g_2(f_2(f_1(x^n),y^n),z^n)=\mathrm{H}_0\}, \tag{41}
\end{align}

respectively. Note that the only difference between $\mathcal{A}_{Y,n}$ and $\mathcal{D}_{Y,n}$ lies in whether we consider the compression index $m_1$ or the original source sequence $x^n$. Recalling the definitions of the type-I error probabilities for the relay denoted by $\beta_1$ in (14) and for the receiver denoted by $\eta_1$ in (16), and using (40) and (41), we conclude that

\begin{align}
P^n_{XY}(\mathcal{D}_{Y,n})&=1-\beta_1, \tag{42}\\
P^n_{XYZ}(\mathcal{D}_{Z,n})&=1-\eta_1. \tag{43}
\end{align}

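For further analysis, given any $m_2\in\mathcal{M}_2$, define a conditional acceptance region at the receiver (conditioned on $m_2$) as

\begin{align}
\mathcal{G}(m_2):=\{z^n:\ g_2(m_2,z^n)=\mathrm{H}_0\}. \tag{44}
\end{align}

For ease of notation, given any $(x^n,y^n)$, we use $\mathcal{G}(x^n,y^n)$ and $\mathcal{G}(f_2(f_1(x^n),y^n))$ (here $f_2(f_1(x^n),y^n)$ plays the role of $m_2$ in (44)) interchangeably and define the following set

\begin{align}
\mathcal{B}_n:=\Big\{(x^n,y^n):\ P^n_{Z|Y}(\mathcal{G}(x^n,y^n)|y^n)\ge\frac{1-\varepsilon_1-\varepsilon_2}{1+3\varepsilon_2-\varepsilon_1}\Big\}. \tag{45}
\end{align}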

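Combining (41), (43) and (44), we obtain

\begin{align}
1-\varepsilon_2&\le P^n_{XYZ}(\mathcal{D}_{Z,n}) \tag{46}\\
&=\sum_{(x^n,y^n)\in\mathcal{B}_n}P^n_{XY}(x^n,y^n)P^n_{Z|Y}(\mathcal{G}(x^n,y^n)|y^n)+\sum_{(x^n,y^n)\notin\mathcal{B}_n}P^n_{XY}(x^n,y^n)P^n_{Z|Y}(\mathcal{G}(x^n,y^n)|y^n) \tag{47}\\
&\le P^n_{XY}(\mathcal{B}_n)+\big(1-P^n_{XY}(\mathcal{B}_n)\big)\frac{1-\varepsilon_1-\varepsilon_2}{1+3\varepsilon_2-\varepsilon_1}. \tag{48}
\end{align}

Thus, we have

\begin{align}
P^n_{XY}(\mathcal{B}_n)\ge\frac{3-3\varepsilon_2+\varepsilon_1}{4}. \tag{49}
\end{align}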
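The step from (48) to (49) is a short rearrangement. Spelled out below, with the shorthand $p$ and $t$ introduced only for this worked step (they do not appear in the original):

```latex
% Shorthand for this step only:
%   p := P^n_{XY}(B_n),   t := (1 - eps_1 - eps_2) / (1 + 3 eps_2 - eps_1).
\begin{align*}
1-\varepsilon_2\le p+(1-p)\,t
  \quad&\Longleftrightarrow\quad p\,(1-t)\ge 1-\varepsilon_2-t,\\
1-t&=\frac{4\varepsilon_2}{1+3\varepsilon_2-\varepsilon_1},\\
1-\varepsilon_2-t&=\frac{(1-\varepsilon_2)(1+3\varepsilon_2-\varepsilon_1)-(1-\varepsilon_1-\varepsilon_2)}{1+3\varepsilon_2-\varepsilon_1}
  =\frac{\varepsilon_2\,(3-3\varepsilon_2+\varepsilon_1)}{1+3\varepsilon_2-\varepsilon_1},\\
\Longrightarrow\quad p&\ge\frac{3-3\varepsilon_2+\varepsilon_1}{4}.
\end{align*}
```

The division by $1-t$ is valid because $\varepsilon_2>0$ and $1+3\varepsilon_2-\varepsilon_1>0$ whenever $\varepsilon_1<1$.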

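For subsequent analyses, let

\begin{align}
\mu&:=\Big(\min_{y:\,P_Y(y)>0}P_Y(y)\Big)^{-1}, \tag{50}\\
\theta_n&:=\sqrt{\frac{3\mu}{n}\log\frac{8|\mathcal{Y}|}{1-\varepsilon_1-\varepsilon_2}}, \tag{51}
\end{align}

and define the typical set as

\begin{align}
\mathcal{T}_n(P_Y):=\big\{y^n:\ |\hat{P}_{y^n}(y)-P_Y(y)|\le\theta_nP_Y(y)\ \forall\, y\in\mathcal{Y}\big\}. \tag{52}
\end{align}

Using the Chernoff bound, we conclude that when $n$ is sufficiently large,

\begin{align}
P^n_Y(\mathcal{T}_n(P_Y))\ge 1-\frac{1-\varepsilon_1-\varepsilon_2}{4}. \tag{53}
\end{align}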
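The choice of $\theta_n$ in (51) can be checked numerically. The sketch below assumes natural logarithms and one standard form of the multiplicative Chernoff bound, $\Pr\{|\hat{P}_{Y^n}(y)-P_Y(y)|>\theta P_Y(y)\}\le 2\exp(-n\theta^2P_Y(y)/3)$ for $\theta\in(0,1]$, combined with a union bound over $\mathcal{Y}$. The source only says "the Chernoff bound", so this specific form and the function names are assumptions.

```python
import math

def theta_n(n, P_y, eps1, eps2):
    """theta_n from (51), with mu from (50); natural logarithms are assumed."""
    mu = 1.0 / min(p for p in P_y if p > 0)
    return math.sqrt(3.0 * mu / n * math.log(8 * len(P_y) / (1 - eps1 - eps2)))

def atypical_union_bound(n, P_y, eps1, eps2):
    """Union bound 2|Y| exp(-n theta_n^2 / (3 mu)) on the probability of the
    complement of T_n(P_Y); meaningful once n is large enough that theta_n <= 1."""
    mu = 1.0 / min(p for p in P_y if p > 0)
    th = theta_n(n, P_y, eps1, eps2)
    return 2 * len(P_y) * math.exp(-n * th * th / (3.0 * mu))

# Hypothetical parameters for illustration.
th = theta_n(1000, [0.5, 0.5], 0.2, 0.3)
bound = atypical_union_bound(1000, [0.5, 0.5], 0.2, 0.3)
```

Under these assumptions the bound collapses to $(1-\varepsilon_1-\varepsilon_2)/4$ for every $n$, which is exactly the slack appearing in (53); the "sufficiently large $n$" requirement only ensures $\theta_n\le 1$.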

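Now, define the following set

\begin{align}
\mathcal{C}_n:=\mathcal{B}_n\cap\mathcal{D}_{Y,n}\cap(\mathcal{X}^n\times\mathcal{T}_n(P_Y)). \tag{54}
\end{align}

Then, combining (42), (49) and (53), we conclude that when $n$ is sufficiently large,

\begin{align}
P^n_{XY}(\mathcal{C}_n)\ge 1-P^n_{XY}(\mathcal{B}^c_n)-P^n_{XY}(\mathcal{D}^c_{Y,n})-P^n_Y(\mathcal{T}^c_n(P_Y))\ge\frac{1-\varepsilon_1-\varepsilon_2}{2}. \tag{55}
\end{align}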
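For completeness, the arithmetic behind the second inequality in (55) is as follows, using the bounds from (49), (42) together with $\beta_1\le\varepsilon_1$, and (53):

```latex
\begin{align*}
P^n_{XY}(\mathcal{B}_n^c)&\le 1-\frac{3-3\varepsilon_2+\varepsilon_1}{4}=\frac{1+3\varepsilon_2-\varepsilon_1}{4},\qquad
P^n_{XY}(\mathcal{D}_{Y,n}^c)=\beta_1\le\varepsilon_1,\qquad
P^n_Y(\mathcal{T}_n^c(P_Y))\le\frac{1-\varepsilon_1-\varepsilon_2}{4},\\
P^n_{XY}(\mathcal{C}_n)&\ge 1-\frac{1+3\varepsilon_2-\varepsilon_1}{4}-\varepsilon_1-\frac{1-\varepsilon_1-\varepsilon_2}{4}
=\frac{4-(1+3\varepsilon_2-\varepsilon_1)-4\varepsilon_1-(1-\varepsilon_1-\varepsilon_2)}{4}
=\frac{1-\varepsilon_1-\varepsilon_2}{2}.
\end{align*}
```

The right-hand side is strictly positive exactly when $\varepsilon_1+\varepsilon_2<1$, which is where the assumption of Theorem 2 enters the construction.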

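Let the truncated distribution $P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}$ be defined as

\begin{align}
P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}(x^n,y^n,z^n):=\frac{P^n_{XY}(x^n,y^n)\mathbb{1}\{(x^n,y^n)\in\mathcal{C}_n\}}{P^n_{XY}(\mathcal{C}_n)}P^n_{Z|Y}(z^n|y^n). \tag{56}
\end{align}

Using the result in (55), we have that the marginal distribution $P_{\tilde{X}^n}$ satisfies that for any $x^n\in\mathcal{X}^n$,

\begin{align}
P_{\tilde{X}^n}(x^n)&=\sum_{y^n,z^n}P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}(x^n,y^n,z^n) \tag{57}\\
&\le\frac{P^n_X(x^n)}{P^n_{XY}(\mathcal{C}_n)}\le\frac{2P^n_X(x^n)}{1-\varepsilon_1-\varepsilon_2}. \tag{58}
\end{align}

Analogously to (58), we obtain that

\begin{align}
P_{\tilde{Y}^n}(y^n)&\le\frac{2P^n_Y(y^n)}{1-\varepsilon_1-\varepsilon_2},\quad\forall\, y^n\in\mathcal{Y}^n, \tag{59}\\
P_{\tilde{Z}^n}(z^n)&\le\frac{2P^n_Z(z^n)}{1-\varepsilon_1-\varepsilon_2},\quad\forall\, z^n\in\mathcal{Z}^n. \tag{60}
\end{align}

Finally, note that

\begin{align}
D(P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}\|P^n_{XYZ})&=D(P_{\tilde{X}^n\tilde{Y}^n}\|P^n_{XY}) \tag{61}\\
&=\log\frac{1}{P^n_{XY}(\mathcal{C}_n)} \tag{62}\\
&\le\log\frac{2}{1-\varepsilon_1-\varepsilon_2}. \tag{63}
\end{align}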
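The properties (56)-(62) can be checked on a toy example. The following sketch is a single-letter ($n=1$) analogue in which the set `C` is an arbitrary subset of $\mathcal{X}\times\mathcal{Y}$ chosen for illustration, not the set $\mathcal{C}_n$ of (54); the function names and the use of natural logarithms are likewise assumptions of this sketch.

```python
import math

def truncate(P_xy, P_z_given_y, C):
    """Single-letter analogue of (56): restrict P_XY to C, renormalize,
    then re-attach the true channel P_{Z|Y}. C is a set of (x, y) pairs."""
    mass = sum(P_xy[x][y] for (x, y) in C)
    nx, ny, nz = len(P_xy), len(P_xy[0]), len(P_z_given_y[0])
    P_t = [[[(P_xy[x][y] / mass if (x, y) in C else 0.0) * P_z_given_y[y][z]
             for z in range(nz)] for y in range(ny)] for x in range(nx)]
    return P_t, mass

def kl(P_t, P_xy, P_z_given_y):
    """D(P_tilde || P_XYZ) in nats, with P_XYZ = P_XY * P_{Z|Y} as in (1)."""
    total = 0.0
    for x, plane in enumerate(P_t):
        for y, row in enumerate(plane):
            for z, p in enumerate(row):
                if p > 0:
                    total += p * math.log(p / (P_xy[x][y] * P_z_given_y[y][z]))
    return total

# Hypothetical source and truncation set.
P_xy = [[0.4, 0.1], [0.1, 0.4]]
P_z_given_y = [[0.8, 0.2], [0.3, 0.7]]
C = {(0, 0), (0, 1), (1, 1)}
P_t, mass = truncate(P_xy, P_z_given_y, C)
divergence = kl(P_t, P_xy, P_z_given_y)
```

Because the channel $P_{Z|Y}$ is left untouched, the divergence depends only on the mass of `C`, mirroring (61)-(62), and the conditional law of the third coordinate given the second remains $P_{Z|Y}$, which is the Markov property the construction is designed to preserve.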

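### IV-B Analyses of the Error Exponents of Type-II Error Probabilities

#### IV-B1 Type-II error probability $\beta_2$ at the relay

Let $\tilde{M}_1$ and $\tilde{M}_2$ be the outputs of encoders $f_1$ and $f_2$ respectively when the tuple of source sequences $(\tilde{X}^n,\tilde{Y}^n,\tilde{Z}^n)$ is distributed according to $P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}$ defined in (56). Thus, recalling the definitions in (10), (11) and (56), we find that the joint distribution of $(\tilde{X}^n,\tilde{Y}^n,\tilde{Z}^n,\tilde{M}_1,\tilde{M}_2)$ is given by

\begin{align}
P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n\tilde{M}_1\tilde{M}_2}(x^n,y^n,z^n,m_1,m_2)=P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n}(x^n,y^n,z^n)P_{M_1|X^n}(m_1|x^n)P_{M_2|Y^nM_1}(m_2|y^n,m_1). \tag{64}
\end{align}

Let $P_{\tilde{M}_1\tilde{Y}^n}$ be induced by $P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n\tilde{M}_1\tilde{M}_2}$. Combining (8) and (56), we conclude that

\begin{align}
P_{\tilde{M}_1\tilde{Y}^n}(\mathcal{A}_{Y,n})&=\sum_{x^n,y^n,z^n,m_1,m_2:\ g_1(m_1,y^n)=\mathrm{H}_0}P_{\tilde{X}^n\tilde{Y}^n\tilde{Z}^n\tilde{M}_1\tilde{M}_2}(x^n,y^n,z^n,m_1,m_2) \tag{65}\\
&=\sum_{x^n,y^n:\ g_1(f_1(x^n),y^n)=\mathrm{H}_0}\frac{P^n_{XY}(x^n,y^n)\mathbb{1}\{(x^n,y^n)\in\mathcal{C}_n\}}{P^n_{XY}(\mathcal{C}_n)} \tag{66}\\
&=\sum_{x^n,y^n}\frac{P^n_{XY}(x^n,y^n)\mathbb{1}\{(x^n,y^n)\in\mathcal{C}_n\}}{P^n_{XY}(\mathcal{C}_n)} \tag{67}\\
&=1, \tag{68}
\end{align}

where (67) follows from the definition of $\mathcal{D}_{Y,n}$ in (40) and the fact that $\mathcal{C}_n\subseteq\mathcal{D}_{Y,n}$.

Thus, using the data processing inequality for the relative entropy and the definition of $\beta_2$ in (15), we obtain that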

 D(P~M1~Yn∥PM1PnY) ≥Db(P~M1