# Strong Converse for Testing Against Independence over a Noisy Channel

A distributed binary hypothesis testing (HT) problem over a noisy channel studied previously by the authors is investigated from the perspective of the strong converse property. It was shown by Ahlswede and Csiszár that a strong converse holds in the above setting when the channel is rate-limited and noiseless. Motivated by this observation, we show that the strong converse continues to hold in the noisy channel setting for a special case of HT known as testing against independence (TAI). The proof utilizes the blowing up lemma and the recent change of measure technique of Tyagi and Watanabe as the key tools.


## I Introduction

This work is supported in part by the European Research Council (ERC) through Starting Grant BEACON (agreement #677854).

In their seminal paper [1], Ahlswede and Csiszár studied a distributed binary hypothesis testing (HT) problem for the joint probability distribution of two correlated discrete memoryless sources. In their setting, one of the sources, denoted by $V^n$, is observed directly at the detector, which performs the test, and the other, denoted by $U^n$, needs to be communicated to the detector from a remote node, referred to as the observer, over a noiseless channel with a transmission rate constraint. Given that $n$ independently drawn samples are available at the respective nodes, the two hypotheses are represented using the following null and alternate hypotheses:

$$H_0:\ (U^n, V^n) \sim \prod_{i=1}^{n} P_{UV}, \qquad \text{(1a)}$$
$$H_1:\ (U^n, V^n) \sim \prod_{i=1}^{n} Q_{UV}. \qquad \text{(1b)}$$

The objective is to study the trade-off between the transmission rate and the type I and type II error probabilities in HT. This problem has been extensively studied thereafter [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. Also, several interesting variants of the basic problem have been considered, which include extensions to multi-terminal settings [15, 16, 17, 18, 19], HT under security or privacy constraints [20, 21, 22, 23], HT with lossy compression [24], HT in interactive settings [25, 26, 27], and HT with successive refinement [28], to name a few.

In this work, we revisit the setting shown in Fig. 1, which has been considered previously in [11]. Here, the communication from the observer to the detector happens over a discrete memoryless channel (DMC). Denoting the transition probability matrix of the DMC by $P_{Y|X}$, the channel output $Y^n$ given the input $X^n$ is generated according to the probability law $\prod_{i=1}^{n} P_{Y|X}(y_i|x_i)$. The observer encodes its observations $U^n$ according to the stochastic map $f^{(n)}: \mathcal{U}^n \to \mathcal{P}(\mathcal{X}^n)$, where $\mathcal{P}(\mathcal{X}^n)$ denotes the set of all probability distributions over $\mathcal{X}^n$. (In [11], we allow bandwidth mismatch, i.e., the encoder map is given by $f^{(n)}: \mathcal{U}^n \to \mathcal{P}(\mathcal{X}^m)$, where $n$ and $m$ are positive integers with a fixed bandwidth ratio; here, we consider the special case $m = n$ for simplicity of notation. However, our results extend to any fixed bandwidth ratio straightforwardly.) The detector outputs the decision $\hat{H}$ according to the stochastic map $g^{(n)}: \mathcal{Y}^n \times \mathcal{V}^n \to \mathcal{P}(\{0,1\})$, where $\mathcal{P}(\{0,1\})$ denotes the set of all probability distributions over the support $\{0, 1\}$.

Denoting the true hypothesis by the random variable (r.v.) $H \in \{0, 1\}$, the type I and type II error probabilities for a given encoder-decoder pair $(f^{(n)}, g^{(n)})$ are given by

$$\alpha_n(f^{(n)}, g^{(n)}) = P(\hat{H} = 1 \,|\, H = 0) = P\left(g^{(n)}(Y^n, V^n) = 1 \,|\, H = 0\right), \qquad (2)$$

and

$$\beta_n(f^{(n)}, g^{(n)}) = P(\hat{H} = 0 \,|\, H = 1) = P\left(g^{(n)}(Y^n, V^n) = 0 \,|\, H = 1\right), \qquad (3)$$
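To make these quantities concrete, the following Monte Carlo sketch estimates $(\alpha_n, \beta_n)$ for a hypothetical toy instance of the setting: a doubly symmetric binary source with crossover `rho`, uncoded transmission of $U^n$ over a binary symmetric channel with flip probability `delta`, and a detector that declares $H_0$ when the empirical disagreement between $Y^n$ and $V^n$ is below `thresh`. All parameter names and values are illustrative assumptions, not quantities from the paper.

```python
import random

def simulate_errors(n=200, trials=2000, rho=0.1, delta=0.1, thresh=0.35, seed=0):
    """Estimate the type I/II error probabilities (2)-(3) for a toy scheme:
    U^n sent uncoded over a BSC(delta); the detector thresholds the empirical
    disagreement between the channel output Y^n and side information V^n."""
    rng = random.Random(seed)
    alpha_cnt = beta_cnt = 0
    for _ in range(trials):
        u = [rng.randint(0, 1) for _ in range(n)]
        y = [ui ^ (rng.random() < delta) for ui in u]   # BSC output
        v0 = [ui ^ (rng.random() < rho) for ui in u]    # H0: V correlated with U
        v1 = [rng.randint(0, 1) for _ in range(n)]      # H1 (TAI): V independent
        d0 = sum(a != b for a, b in zip(y, v0)) / n
        d1 = sum(a != b for a, b in zip(y, v1)) / n
        alpha_cnt += d0 > thresh    # reject H0 although H0 is true
        beta_cnt += d1 <= thresh    # accept H0 although H1 is true
    return alpha_cnt / trials, beta_cnt / trials

alpha, beta = simulate_errors()
```

For these parameters the disagreement rate concentrates near $0.18$ under $H_0$ and near $0.5$ under $H_1$, so both error estimates are small; the trade-off in (4) below emerges as `thresh` is moved between these two values.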

respectively. In [1] and [11], the goal is to obtain a computable characterization of the optimal type II error exponent (henceforth referred to as the error-exponent), i.e., the maximum asymptotic value of the exponent of the type II error probability, for a fixed non-zero constraint $\epsilon$ on the type I error probability. We next define the trade-off studied in [11] more precisely.

###### Definition 1.

An error-exponent $\kappa$ is $\epsilon$-achievable if there exists a sequence of encoding functions $f^{(n)}$ and decision rules $g^{(n)}$ such that

$$\liminf_{n \to \infty} -\frac{1}{n} \log\left(\beta_n(f^{(n)}, g^{(n)})\right) \ge \kappa, \qquad \text{(4a)}$$

and

$$\alpha_n(f^{(n)}, g^{(n)}) \le \epsilon. \qquad \text{(4b)}$$

For $\epsilon \in (0, 1)$, let

$$\kappa(\epsilon) := \sup\{\kappa' : \kappa' \text{ is } \epsilon\text{-achievable}\}. \qquad (5)$$

It is well known that since the quantity of interest is the type II error-exponent, $g^{(n)}$ can be restricted to be a deterministic map without any loss of generality (see [22, Lemma 3]). The decision rule can then be represented as $g^{(n)}(y^n, v^n) = \mathbb{1}\left((y^n, v^n) \in A_n\right)$ for some $A_n \subseteq \mathcal{Y}^n \times \mathcal{V}^n$, where $\mathbb{1}(\cdot)$ denotes the indicator function.

It is shown in [11] that $\kappa(\epsilon)$ has an exact single-letter characterization for the special case known as testing against independence (TAI), in which $Q_{UV}$ factors as a product of the marginals of $P_{UV}$, i.e., $Q_{UV} = P_U P_V$. To state the result, let $C$ denote the capacity of the channel $P_{Y|X}$, and let

$$\theta(P_{UV}, C) := \sup\left\{I(V; W) : \exists\, W \text{ s.t. } I(U; W) \le C,\ V - U - W\right\}. \qquad (6)$$

It is proved in [11, Proposition 7] that

$$\lim_{\epsilon \to 0} \kappa(\epsilon) = \theta(P_{UV}, C). \qquad (7)$$
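Once the auxiliary alphabet is bounded, the optimization in (6) is finite-dimensional and can be approximated numerically. The sketch below grid-searches $\theta(P_{UV}, C)$ for a doubly symmetric binary source; restricting $W$ to a binary alphabet and the particular source and capacity values are simplifying assumptions for illustration only.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a nested list joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return entropy(pa) + entropy(pb) - entropy([p for row in joint for p in row])

def theta_tai(p_uv, capacity, steps=120):
    """Approximate sup{ I(V;W) : I(U;W) <= C, V - U - W } of (6) by a grid
    over binary test channels P_{W|U} (binary W is a simplification)."""
    best = 0.0
    p_u = [sum(row) for row in p_uv]
    for i in range(steps + 1):
        for j in range(steps + 1):
            a, b = i / steps, j / steps      # P_{W|U}(0|0), P_{W|U}(0|1)
            pw_u = [[a, 1 - a], [b, 1 - b]]
            p_uw = [[p_u[u] * pw_u[u][w] for w in range(2)] for u in range(2)]
            # Markov chain V - U - W: P_VW(v,w) = sum_u P_UV(u,v) P_{W|U}(w|u)
            p_vw = [[sum(p_uv[u][v] * pw_u[u][w] for u in range(2))
                     for w in range(2)] for v in range(2)]
            if mutual_information(p_uw) <= capacity:
                best = max(best, mutual_information(p_vw))
    return best

# Doubly symmetric binary source with crossover 0.1, channel capacity C = 0.5 bit.
p_uv = [[0.45, 0.05], [0.05, 0.45]]
theta = theta_tai(p_uv, 0.5)
```

Since $V - U - W$ implies $I(V; W) \le I(U; W) \le C$ by data processing, the returned value never exceeds the capacity constraint, and it approaches $I(U; V)$ as $C$ grows.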

In this paper, we show the strong converse for the above result, namely, that

$$\kappa(\epsilon) = \theta(P_{UV}, C), \quad \forall\, \epsilon \in (0, 1). \qquad (8)$$

This result completes the characterization of $\kappa(\epsilon)$ in terms of $\theta(P_{UV}, C)$ for all values of $\epsilon$, and extends the strong converse result proved in [1, Proposition 2] for the special case of rate-limited noiseless channels. However, it is to be noted that while the strong converse proved in [1] holds for all hypothesis tests given in (1), our result is limited to TAI.

Before delving into the proof, we briefly describe the technique and tools used in [1] to prove the strong converse, and highlight the challenges of extending their proof to the noisy channel setting. The key tools used to prove [1, Proposition 2] are the so-called blowing-up lemma [29] and a covering lemma [1]. However, it can be seen from the proof therein that the application of the covering lemma relies crucially on the fact that the channel from the encoder to the detector is noiseless (i.e., deterministic). Thus, it is not possible to directly follow their technique in our noisy channel setting and arrive at the strong converse result. Instead, we use a change of measure technique introduced in [30], in conjunction with the blowing-up lemma, to arrive at our desired result.

The change of measure technique by itself does not appear sufficient for proving a strong converse in our setting. This is so because, for the technique to work, one must find a (decoding) set $\bar{B}_n$ of non-vanishing probability under the null hypothesis such that, for a given decision rule satisfying the type I error probability constraint and each tuple in $\bar{B}_n$, the detector accepts the null hypothesis with probability one (or tending to one with $n$). Note that in the noiseless channel case, such a set can be obtained by simply taking the preimage of the acceptance region, as is done in [18] for a deterministic channel. However, this is no longer possible when the channel is noisy. To tackle this issue, we first obtain a set of sufficiently large probability under the null hypothesis such that, for each tuple in this set, the channel output falls in the acceptance region with a probability bounded away from zero. The blowing-up lemma then guarantees that it is possible to obtain a modified decision region such that, uniformly for each such tuple, the channel output falls in the modified region with an overwhelmingly large probability. This enables us to prove the strong converse in our setting via the technique in [30].

We next state a non-asymptotic version of the blowing-up lemma from [31], which will be used in the proof of Theorem 4 below. For any set $D \subseteq \mathcal{Z}^n$, let $\Gamma^l(D)$ denote the Hamming $l$-neighbourhood of $D$, i.e.,

$$\Gamma^l(D) := \left\{\tilde{z}^n \in \mathcal{Z}^n : d_H(z^n, \tilde{z}^n) \le l \text{ for some } z^n \in D\right\}, \qquad (9)$$

where

$$d_H(z^n, \tilde{z}^n) = \sum_{i=1}^{n} \mathbb{1}(z_i \neq \tilde{z}_i). \qquad (10)$$
###### Lemma 2.

[31] Let $Z_1, \ldots, Z_n$ be independent r.v.'s taking values in a finite set $\mathcal{Z}$. Then, for any set $D \subseteq \mathcal{Z}^n$ with $P_{Z^n}(D) > 0$,

$$P_{Z^n}\left(\Gamma^l(D)\right) \ge 1 - \exp\left[-\frac{2}{n}\left(l - \sqrt{\frac{n}{2}\log\frac{1}{P_{Z^n}(D)}}\right)^2\right], \quad \forall\, l > \sqrt{\frac{n}{2}\log\frac{1}{P_{Z^n}(D)}}. \qquad (11)$$
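For small $n$, the bound (11) can be checked against the exact neighbourhood probability by brute force. The sketch below uses an i.i.d. Bernoulli product measure and a Hamming ball as $D$; all parameter choices are illustrative assumptions.

```python
import math
from itertools import product

def hamming(a, b):
    """d_H of (10): number of positions in which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def blow_up(d_set, n, l):
    """Hamming l-neighbourhood Gamma^l(D) of (9) over the binary alphabet."""
    return {z for z in product((0, 1), repeat=n)
            if any(hamming(z, d) <= l for d in d_set)}

def prob(seqs, p):
    """Probability of a set of binary sequences under i.i.d. Bernoulli(p)."""
    return sum(p ** sum(z) * (1 - p) ** (len(z) - sum(z)) for z in seqs)

n, p = 10, 0.3
D = {z for z in product((0, 1), repeat=n) if sum(z) <= 2}   # a Hamming ball
pd = prob(D, p)
thresh = math.sqrt(0.5 * n * math.log(1 / pd))
l = math.ceil(thresh) + 1                 # exceeds the threshold required in (11)
lower = 1 - math.exp(-(2 / n) * (l - thresh) ** 2)
actual = prob(blow_up(D, n, l), p)
```

Here `actual` is the exact value of $P_{Z^n}(\Gamma^l(D))$ and `lower` is the right-hand side of (11); the lemma guarantees `actual >= lower`, with the bound approaching one as $l$ grows beyond the threshold.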

Lemma 3 stated below is a characterization of $\theta(P_{UV}, C)$ in terms of supporting hyperplanes of the error exponent-capacity region.

###### Lemma 3.
$$\theta(P_{UV}, C) = \inf_{\mu > 0} \theta_\mu(P_{UV}, C), \qquad (12)$$

where

$$\theta_\mu(P_{UV}, C) := \sup_{P_{W|U}:\, V - U - W} I(V; W) + \mu\left(C - I(U; W)\right). \qquad (13)$$
###### Proof:

Let

$$\mathcal{R} := \left\{(\theta, C) \in \mathbb{R}^2 :\ \exists\, W \text{ s.t. } V - U - W,\ \theta \le I(V; W) \text{ and } I(U; W) \le C\right\}.$$

By the Fenchel-Eggleston-Carathéodory theorem [32], it is sufficient to consider auxiliary r.v.'s $W$ with bounded alphabet in the definition of $\mathcal{R}$. Hence, the set $\mathcal{R}$ is a closed convex set, and can be represented via the intersection of half-spaces as

$$\mathcal{R} = \bigcap_{\mu > 0} \left\{(\theta, C) : \theta - \mu C \le R_\mu\right\}, \qquad (14)$$

where

$$R_\mu := \max_{P_{W|U}:\, V - U - W} I(V; W) - \mu I(U; W). \qquad (15)$$

This implies that

$$\theta(P_{UV}, C) = \sup\left\{\theta : (\theta, C) \in \mathcal{R}\right\} \qquad (16)$$
$$= \inf_{\mu > 0} R_\mu + \mu C \qquad (17)$$
$$= \inf_{\mu > 0} \theta_\mu(P_{UV}, C). \qquad (18)$$
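Lemma 3 can be sanity-checked numerically: for a fixed source and capacity, the constrained supremum (6) and the Lagrangian form (12)-(13) should (approximately) coincide. The sketch below does this for a doubly symmetric binary source with $W$ restricted to be binary and both optimizations done over coarse grids; these restrictions, and the chosen values of $P_{UV}$, $C$, and the $\mu$ sweep, are simplifying assumptions.

```python
import math

def mi(joint):
    """Mutual information in bits of a joint pmf given as a nested list."""
    h = lambda d: -sum(p * math.log2(p) for p in d if p > 0)
    pa = [sum(r) for r in joint]
    pb = [sum(c) for c in zip(*joint)]
    return h(pa) + h(pb) - h([p for r in joint for p in r])

def iu_iv_pairs(p_uv, grid=120):
    """All pairs (I(U;W), I(V;W)) induced by binary test channels P_{W|U},
    with the Markov chain V - U - W."""
    p_u = [sum(r) for r in p_uv]
    pts = []
    for i in range(grid + 1):
        for j in range(grid + 1):
            a, b = i / grid, j / grid
            pw_u = [[a, 1 - a], [b, 1 - b]]
            p_uw = [[p_u[u] * pw_u[u][w] for w in range(2)] for u in range(2)]
            p_vw = [[sum(p_uv[u][v] * pw_u[u][w] for u in range(2))
                     for w in range(2)] for v in range(2)]
            pts.append((mi(p_uw), mi(p_vw)))
    return pts

p_uv = [[0.45, 0.05], [0.05, 0.45]]
C = 0.3
pts = iu_iv_pairs(p_uv)
primal = max(ivw for iuw, ivw in pts if iuw <= C)              # theta of (6)
dual = min(max(ivw + mu * (C - iuw) for iuw, ivw in pts)       # inf_mu theta_mu,
           for mu in (k / 20 for k in range(1, 61)))           # cf. (12)-(13)
```

Weak duality gives `primal <= dual` for any grids; the convexity of the region $\mathcal{R}$ is what closes the gap in the exact problem, as the proof above shows.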

## II Main result

The main result of the paper is stated next. We will assume that the channel transition matrix $P_{Y|X}$ has non-zero entries, i.e.,

$$\underline{p} := \min_{(x, y) \in \mathcal{X} \times \mathcal{Y}} P_{Y|X}(y|x) > 0. \qquad (19)$$
###### Theorem 4.
$$\kappa(\epsilon) = \theta(P_{UV}, C), \quad \forall\, \epsilon \in (0, 1). \qquad (20)$$
###### Proof:

Let $f^{(n)}$ and $g^{(n)}$ denote an encoder-decoder pair specified by the stochastic map $P_{X^n|U^n}$ and an acceptance region $A_n \subseteq \mathcal{Y}^n \times \mathcal{V}^n$ (for $H_0$), respectively, that satisfies (4b).
Constructing reliable decision regions $\bar{A}_n$ and $\bar{B}_n$:
Note that $A_n$ can be written in the form

$$A_n = \bigcup_{v^n \in \mathcal{V}^n} A(v^n) \times \{v^n\}, \qquad (21)$$

where $A(v^n) := \{y^n \in \mathcal{Y}^n : (y^n, v^n) \in A_n\}$.
Let

$$B_n(\gamma) := \left\{(u^n, v^n, x^n) :\ P_{X^n|U^n}(x^n|u^n) > 0 \text{ and } P_{Y^n|X^n}\left(A(v^n)\,|\,x^n\right) \ge \gamma\right\}. \qquad (22)$$

Then, it follows from (4b) that, for sufficiently large $n$ and any $\gamma \in (0, 1 - \epsilon)$,

$$P_{U^nV^nX^n}\left(B_n(\gamma)\right) \ge \frac{1 - \epsilon - \gamma}{1 - \gamma}.$$

Selecting $\gamma = \frac{1 - \epsilon}{2}$ yields

$$P_{U^nV^nX^n}\left(B_n\left(\tfrac{1 - \epsilon}{2}\right)\right) \ge \frac{1 - \epsilon}{1 + \epsilon}. \qquad (23)$$

Let

$$\bar{B}_n := B_n\left(\tfrac{1 - \epsilon}{2}\right),$$
$$B_{v^n} := \left\{(u^n, x^n) : (u^n, v^n, x^n) \in \bar{B}_n\right\},$$
$$\hat{B}_n := \left\{(v^n, x^n) :\ (u^n, v^n, x^n) \in \bar{B}_n \text{ for some } u^n \in \mathcal{U}^n\right\},$$
$$l_n := \left\lceil \frac{1}{\sqrt{2}}\left(\sqrt{n\, b(n)} + \sqrt{n \log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right)}\right) \right\rceil,$$
$$\bar{A}(v^n) := \Gamma^{l_n}\left(A(v^n)\right),$$

where $b(n)$ is a function (that will be optimized later) such that $b(n) \to \infty$ as $n \to \infty$. It follows from Lemma 2 that

$$P_{Y^n|X^n}\left(\bar{A}(v^n)\,|\,x^n\right) \ge \epsilon'_n := 1 - e^{-b(n)} \xrightarrow{\ n \to \infty\ } 1, \qquad (24)$$

for every $(u^n, v^n, x^n) \in \bar{B}_n$, since

$$P_{Y^n|X^n}\left(A(v^n)\,|\,x^n\right) \ge 0.5\,(1 - \epsilon). \qquad (25)$$

Also, for any $(u^n, v^n, x^n) \in \bar{B}_n$, using (9) we can write that

$$P_{Y^n|X^n}\left(\bar{A}(v^n)\,|\,x^n\right) \le \sum_{y^n \in A(v^n)}\ \sum_{\tilde{y}^n \in \Gamma^{l_n}(y^n)} P_{Y^n|X^n}(\tilde{y}^n|x^n)$$
$$\le \sum_{y^n \in A(v^n)}\ \sum_{\tilde{y}^n \in \Gamma^{l_n}(y^n)} P_{Y^n|X^n}(y^n|x^n)\, \underline{p}^{-l_n} \qquad (26)$$
$$\le \sum_{y^n \in A(v^n)} P_{Y^n|X^n}(y^n|x^n) \binom{n}{l_n} |\mathcal{Y}|^{l_n}\, \underline{p}^{-l_n}$$
$$\le \left(\frac{|\mathcal{Y}|\, n e}{\underline{p}\, l_n}\right)^{l_n} P_{Y^n|X^n}\left(A(v^n)\,|\,x^n\right), \qquad (27)$$

where (26) follows since, for each $y^n \in A(v^n)$ and $\tilde{y}^n \in \Gamma^{l_n}(y^n)$,

$$P_{Y^n|X^n}(\tilde{y}^n|x^n)\, \underline{p}^{\,l_n} \le P_{Y^n|X^n}(y^n|x^n),$$

and (27) is due to the inequality $\binom{n}{l_n} \le \left(\frac{n e}{l_n}\right)^{l_n}$.

Let the new decision rule be given by $\bar{g}^{(n)}(y^n, v^n) = \mathbb{1}\left((y^n, v^n) \in \bar{A}_n\right)$, where

$$\bar{A}_n := \bigcup_{v^n \in \mathcal{V}^n} \bar{A}(v^n) \times \{v^n\}. \qquad (28)$$

Note that it follows from (27) that

$$\beta_n(f^{(n)}, \bar{g}^{(n)}) \le \beta_n(f^{(n)}, g^{(n)}) \left(\frac{|\mathcal{Y}|\, n e}{\underline{p}\, l_n}\right)^{l_n}. \qquad (29)$$
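The multiplicative penalty in (29) is harmless on the exponential scale: with $l_n$ of order $\sqrt{n \log n}$, the per-sample cost $\frac{l_n}{n} \log\frac{|\mathcal{Y}|\, n e}{\underline{p}\, l_n}$ vanishes. The sketch below evaluates this normalized cost for growing $n$; the alphabet size, the value of $\underline{p}$, and the specific choice $l_n = \lceil\sqrt{n \log n}\rceil$ are illustrative assumptions.

```python
import math

def normalized_penalty(n, y_size=2, p_min=0.1):
    """Per-sample exponent cost (l_n / n) * log(|Y| n e / (p_min l_n)) of
    passing to the blown-up decision region, cf. (29)."""
    l_n = math.ceil(math.sqrt(n * math.log(n)))   # l_n = O(sqrt(n log n)), so l_n/n -> 0
    return (l_n / n) * math.log(y_size * n * math.e / (p_min * l_n))

vals = [normalized_penalty(10 ** k) for k in (3, 4, 5, 6)]
```

Consequently the exponents of $\beta_n(f^{(n)}, g^{(n)})$ and $\beta_n(f^{(n)}, \bar{g}^{(n)})$ coincide asymptotically, which is what allows the proof to work with the enlarged region $\bar{A}_n$.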

Change of measure via construction of a truncated distribution:
We now use the change of measure technique in [30], applied with the new decision rule $\bar{g}^{(n)}$ (with acceptance region $\bar{A}_n$ for $H_0$), to prove the strong converse. To that purpose, define a new truncated distribution

$$P_{\tilde{U}^n\tilde{V}^n\tilde{X}^n\tilde{Y}^n}(u^n, v^n, x^n, y^n) := \frac{P_{U^nV^n}(u^n, v^n)\, P_{X^n|U^n}(x^n|u^n)}{P_{U^nV^nX^n}(\bar{B}_n)}\, \mathbb{1}\left((u^n, v^n, x^n) \in \bar{B}_n\right) P_{Y^n|X^n}(y^n|x^n). \qquad (30)$$

Bounding the type II error-exponent via the weak converse:
From (24) and (30), note that the type I error probability for the hypothesis test between the distributions induced under the null and alternate hypotheses, with channel input generated according to (30) and decision rule $\bar{g}^{(n)}$, is at most $1 - \epsilon'_n$, and hence tends to zero asymptotically. Then, by the weak converse for HT based on the data processing inequality for KL divergence (see [1], [11]), it follows that

$$-\log\left(\beta_n(f^{(n)}, \bar{g}^{(n)})\right) \le \frac{1}{\epsilon'_n}\left(D\left(P_{\tilde{V}^n\tilde{Y}^n}\,\middle\|\, P_{V^n} \times P_{Y^n}\right) + \log 2\right). \qquad (31)$$

Next, note that for $v^n$ such that $B_{v^n} \neq \emptyset$, we have

$$P_{\tilde{V}^n}(v^n) = \sum_{(u^n, x^n) \in B_{v^n}} P_{\tilde{U}^n\tilde{V}^n\tilde{X}^n}(u^n, v^n, x^n) = \frac{1}{P_{U^nV^nX^n}(\bar{B}_n)} \sum_{(u^n, x^n) \in B_{v^n}} P_{U^nV^nX^n}(u^n, v^n, x^n) \le \frac{P_{V^n}(v^n)}{P_{U^nV^nX^n}(\bar{B}_n)} \le \frac{1 + \epsilon}{1 - \epsilon}\, P_{V^n}(v^n). \qquad (32)$$

Similarly, for all $y^n \in \mathcal{Y}^n$, we have

$$P_{\tilde{Y}^n}(y^n) \le \frac{P_{Y^n}(y^n)}{P_{U^nV^nX^n}(\bar{B}_n)} \le \frac{1 + \epsilon}{1 - \epsilon}\, P_{Y^n}(y^n). \qquad (33)$$

Substituting (32) and (33) in (31) yields

$$-\log\left(\beta_n(f^{(n)}, \bar{g}^{(n)})\right) \le \frac{1}{\epsilon'_n}\left(D\left(P_{\tilde{V}^n\tilde{Y}^n}\,\middle\|\, P_{\tilde{V}^n} \times P_{\tilde{Y}^n}\right) + 2\log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right) + \log 2\right) = \frac{1}{\epsilon'_n}\left(I(\tilde{V}^n; \tilde{Y}^n) + 2\log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right) + \log 2\right). \qquad (34)$$

Combining (34) with (29), we obtain that

$$-\log\left(\beta_n(f^{(n)}, g^{(n)})\right) \le \frac{1}{\epsilon'_n}\left(I(\tilde{V}^n; \tilde{Y}^n) + 2\log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right) + \log 2\right) + l_n \log\left(\frac{|\mathcal{Y}|\, n e}{\underline{p}\, l_n}\right) =: \zeta_n + \frac{1}{\epsilon'_n}\, I(\tilde{V}^n; \tilde{Y}^n). \qquad (35)$$

Now, notice from (23) and (30) that

$$D\left(P_{\tilde{U}^n}\,\middle\|\, P_{U^n}\right) \le D\left(P_{\tilde{U}^n\tilde{V}^n\tilde{X}^n\tilde{Y}^n}\,\middle\|\, P_{U^nV^nX^nY^n}\right) \qquad (36)$$
$$= D\left(P_{\tilde{U}^n\tilde{V}^n\tilde{X}^n}\,\middle\|\, P_{U^nV^nX^n}\right) = \log\left(\frac{1}{P_{U^nV^nX^n}(\bar{B}_n)}\right) \le \log\left(\frac{1 + \epsilon}{1 - \epsilon}\right), \qquad (37)$$

where (36) follows from the log-sum inequality [33]. Also, observe from (30) that the Markov chain $\tilde{V}^n - \tilde{U}^n - \tilde{X}^n - \tilde{Y}^n$ holds under $P_{\tilde{U}^n\tilde{V}^n\tilde{X}^n\tilde{Y}^n}$, and that $P_{\tilde{Y}^n|\tilde{X}^n} = P_{Y^n|X^n}$. From this, it follows via the data processing inequality that

$$I(\tilde{U}^n; \tilde{Y}^n) \le I(\tilde{X}^n; \tilde{Y}^n) \le nC. \qquad (38)$$

Thus, we have for any $\mu > 0$ and $\nu > 0$ that

$$-\epsilon'_n \log\left(\beta_n(f^{(n)}, g^{(n)})\right) \le I(\tilde{V}^n; \tilde{Y}^n) + n\mu C - \mu I(\tilde{U}^n; \tilde{Y}^n) + \epsilon'_n \zeta_n$$
$$\le I(\tilde{V}^n; \tilde{Y}^n) + n\mu C - \mu I(\tilde{U}^n; \tilde{Y}^n) + \epsilon'_n \zeta_n - \nu I(\tilde{V}^n; \tilde{Y}^n | \tilde{U}^n) - \nu D\left(P_{\tilde{U}^n\tilde{V}^n}\,\middle\|\, P_{U^nV^n}\right) - \mu D\left(P_{\tilde{U}^n}\,\middle\|\, P_{U^n}\right) + (\nu + \mu) \log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right) \qquad (39)$$
$$= R^{(n)}_{\mu,\nu} + (\nu + \mu) \log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right) + \epsilon'_n \zeta_n, \qquad (40)$$

where

$$R^{(n)}_{\mu,\nu} := I(\tilde{V}^n; \tilde{Y}^n) + n\mu C - \mu\left(I(\tilde{U}^n; \tilde{Y}^n) + D\left(P_{\tilde{U}^n}\,\middle\|\, P_{U^n}\right)\right) - \nu\left(I(\tilde{V}^n; \tilde{Y}^n | \tilde{U}^n) + D\left(P_{\tilde{U}^n\tilde{V}^n}\,\middle\|\, P_{U^nV^n}\right)\right). \qquad (41)$$

Equation (39) follows from (37) and the fact that $I(\tilde{V}^n; \tilde{Y}^n | \tilde{U}^n) + D\left(P_{\tilde{U}^n\tilde{V}^n} \| P_{U^nV^n}\right) \le \log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right)$ (which in turn holds due to the Markov chain $\tilde{V}^n - \tilde{U}^n - \tilde{Y}^n$ under the distribution in (30)).
Single-letterization of $R^{(n)}_{\mu,\nu}$ and applying Lemma 3:
We will show in Appendix A that $R^{(n)}_{\mu,\nu}$ single-letterizes, i.e.,

$$R^{(n)}_{\mu,\nu} \le n\, R^s_{\mu,\nu}(P_{UV}, C), \qquad (42)$$

where

$$R^s_{\mu,\nu}(P_{UV}, C) := \sup_{P_{\bar{U}\bar{V}W}}\ I(\bar{V}; W) + \mu C - \mu I(\bar{U}; W) - \nu I(\bar{V}; W | \bar{U}) - (\nu + \mu)\, D\left(P_{\bar{U}\bar{V}}\,\middle\|\, P_{UV}\right). \qquad (43)$$

By the Fenchel-Eggleston-Carathéodory theorem [32], the alphabet of $W$ can be restricted to be finite (with cardinality a function of $|\mathcal{U}|$ and $|\mathcal{V}|$) in the maximization in (43). Thus, the supremum in (43) is actually a maximum. Assuming (42) holds, we can write from (40) that

$$-\epsilon'_n \log\left(\beta_n(f^{(n)}, g^{(n)})\right) \le n\, R^s_{\mu,\nu}(P_{UV}, C) + (\nu + \mu) \log\left(\tfrac{1 + \epsilon}{1 - \epsilon}\right) + \epsilon'_n \zeta_n. \qquad (44)$$

For given $\mu > 0$ and $\nu > 0$, let $(U_{\mu,\nu}, V_{\mu,\nu}, W_{\mu,\nu})$ achieve the maximum in (43). Then, we can write that

$$R^s_{\mu,\nu}(P_{UV}, C) = I(V_{\mu,\nu}; W_{\mu,\nu}) + \mu C - \mu I(U_{\mu,\nu}; W_{\mu,\nu}) - \nu I(V_{\mu,\nu}; W_{\mu,\nu} | U_{\mu,\nu}) - (\nu + \mu)\, D\left(P_{U_{\mu,\nu}V_{\mu,\nu}}\,\middle\|\, P_{UV}\right) \qquad (46)$$
$$\le I(V_{\mu,\nu}; W_{\mu,\nu}) + \mu C - \mu I(U_{\mu,\nu}; W_{\mu,\nu}) \qquad (47)$$
$$\le I(V; W_{\mu,\nu}) + \mu C - \mu I(U; W_{\mu,\nu}) + \left|I(V_{\mu,\nu}; W_{\mu,\nu}) - I(V; W_{\mu,\nu})\right| + \mu\left|I(U_{\mu,\nu}; W_{\mu,\nu}) - I(U; W_{\mu,\nu})\right| \qquad (50)$$
$$\le \theta_\mu(P_{UV}, C) + \left|I(V_{\mu,\nu}; W_{\mu,\nu}) - I(V; W_{\mu,\nu})\right| + \mu\left|I(U_{\mu,\nu}; W_{\mu,\nu}) - I(U; W_{\mu,\nu})\right|. \qquad (52)$$

We next upper bound the second and third terms in (52), similar in spirit to [18]. Note that

$$R^s_{\mu,\nu}(P_{UV}, C) \ge \inf_{\mu > 0,\, \nu > 0} R^s_{\mu,\nu}(P_{UV}, C) \ge \theta(P_{UV}, C) \ge I(V; W_{\mu,\nu}) + \mu C - \mu I(U; W_{\mu,\nu}). \qquad (53)$$

Then, we can write that

$$\nu\, D\left(P_{U_{\mu,\nu}V_{\mu,\nu}W_{\mu,\nu}}\,\middle\|\, P_{UVW_{\mu,\nu}}\right) = \nu\left(I(V_{\mu,\nu}; W_{\mu,\nu} | U_{\mu,\nu}) + D\left(P_{U_{\mu,\nu}V_{\mu,\nu}}\,\middle\|\, P_{UV}\right)\right)$$
$$\le \left|I(V_{\mu,\nu}; W_{\mu,\nu}) - I(V; W_{\mu,\nu})\right| + \mu\left|I(U_{\mu,\nu}; W_{\mu,\nu}) - I(U; W_{\mu,\nu})\right| \qquad (54)$$
$$\le \chi(\mu), \qquad (55)$$

where we used (46) and (53) to obtain (54). Thus, we have

$$D\left(P_{U_{\mu,\nu}V_{\mu,\nu}W_{\mu,\nu}}\,\middle\|\, P_{UVW_{\mu,\nu}}\right) \le \frac{\chi(\mu)}{\nu}. \qquad (56)$$

Denoting the total variation distance between the distributions $P_{V_{\mu,\nu}W_{\mu,\nu}}$ and $P_{VW_{\mu,\nu}}$ by

$$d\left(P_{V_{\mu,\nu}W_{\mu,\nu}}, P_{VW_{\mu,\nu}}\right) := \frac{1}{2} \sum_{(v, w') \in \mathcal{V} \times \mathcal{W}_{\mu,\nu}} \left|P_{V_{\mu,\nu}W_{\mu,\nu}}(v, w') - P_{VW_{\mu,\nu}}(v, w')\right|$$