# On the Reliability Function of Distributed Hypothesis Testing Under Optimal Detection

The distributed hypothesis-testing problem with full side-information is studied. The trade-off (reliability function) between the type 1 and type 2 error exponents under a rate constraint is studied in the following way. First, the problem of determining the reliability function of distributed hypothesis-testing is reduced to the problem of determining the reliability function of channel-detection codes (in analogy to a similar result which connects the reliability of distributed compression and ordinary channel codes). Second, a single-letter random-coding bound based on a hierarchical ensemble, as well as a single-letter expurgated bound, are derived for the reliability of channel-detection codes. Both bounds are derived for the optimal detection rule. We believe that the resulting bounds are ensemble-tight, and hence optimal within the class of quantization-and-binning schemes.


## I Introduction

In the hypothesis-testing (HT) problem, a detector needs to decide between two hypotheses regarding the underlying distribution of some observed data; the hypotheses are commonly known as the null hypothesis and the alternative hypothesis. Two types of error probability are defined: the type 1 error probability of deciding on the alternative hypothesis when the null hypothesis prevails, and the type 2 error probability for the opposite event. The celebrated Neyman-Pearson lemma (e.g., [1, Prop. II.D.1]) states that the detector which achieves the optimal trade-off between the two error probabilities takes the form of comparing the likelihood ratio to a threshold.

In the context of this work, we only consider data which is a sequence of independent and identically distributed observations, and therefore the hypotheses correspond to the distribution of a random variable. In the information-theoretic literature (e.g., [2], [3, Ch. 11], [4, Ch. 1], [5, Sec. 2]), large-deviations theory, most notably Sanov's theorem, is usually applied to this problem. In particular, Stein's theorem provides the largest exponential decrease rate of the type 2 error probability when the type 1 error probability is bounded away from one. More generally, the reliability function, i.e., the optimal trade-off between the two types of exponents when both are strictly positive, is also known.

As a special case of the HT problem, one may consider a pair of random variables, instances of which are fully observed by the detector. If, however, the detector does not directly observe the data sequence, the characterization of the optimal performance is much more challenging. A common model for such a scenario is known as distributed hypothesis-testing (DHT), and its reliability function is the subject of this paper.

The DHT model was introduced by Berger [6] as an example of a problem at the intersection of multi-terminal information theory and statistical inference. In this model, one encoder observes the source sequence, while the other observes the corresponding side-information sequence; they produce codewords to be sent over limited-rate noiseless links to a common detector. The goal is to characterize the optimal detection performance under a given pair of encoding rates. As a starting point, an asymmetric model (sometimes referred to as the side-information case) is usually studied, in which the side-information observations are fully available to the detector, and thus there is only a single rate constraint.

The first major breakthrough on this problem came in a notable paper by Ahlswede and Csiszár [7], who addressed a special scenario termed testing against independence. In this case, the null hypothesis states that the two sources have a given joint distribution, whereas the alternative hypothesis states that they are independent, but with the same marginals as in the null hypothesis. This case is special since the Kullback–Leibler divergence, which is usually associated with Stein's exponent of HT, can be identified as the mutual information between the source and the side information under the null hypothesis, which is naturally related to compression rates. This allowed the authors to use distributed compression, specifically quantization-based encoding techniques from [8, 9], to derive Stein's exponent for the side-information testing-against-independence problem [7, Th. 2]. Quantization-based encoding was also used for an achievable Stein's exponent for a general pair of memoryless hypotheses [7, Th. 5], but without a converse bound.

As summarized in [10, Sec. IV], later progress on this problem for a pair of general hypotheses was intermittent, with contributions made by several groups of researchers. First, in [11], the achievable bound on Stein's exponent from [7, Th. 5] was improved, and also generalized beyond the side-information case. Then, in [12], achievable bounds on the full trade-off between the two types of exponents were derived. In [13], Stein's exponent for side-information cases was further significantly improved using binning, as will be described in the sequel.

Interestingly, when either the X-marginal or the Y-marginal of the hypotheses is different, positive exponents can be obtained even for zero-rate encoders [10, Sec. V] (there is also a variant of one-bit encoding, see [10, Sec. V] and [14]). For this case, achievable Stein's exponents and exponential trade-offs were derived in [10, Th. 5.5] and [11, 12, 15], along with matching converse bounds under certain assumptions.

In the last decade, a renewed interest in the problem arose, aimed both at tackling more elaborate models and at improving the results on the basic model. As for the former, Stein's exponents under positive rates were explored in successive refinement models [16], for multiple encoders [17], for interactive models [18, 19], under privacy constraints [20], combined with lossy compression [21], over noisy channels [22, 23], for multiple decision centers [24], as well as over multi-hop networks [25]. Exponents for the zero-rate problem were studied under restricted detector structure [14] and for multiple encoders [26]. The finite blocklength and second-order regimes were addressed in [27].

As for the basic model, which this work also investigates, the encoding approach used in [13] is currently the best known. It is based on quantization and binning, just as used, e.g., for distributed lossy compression (the Wyner-Ziv problem [28, Ch. 11], [29]). First, the encoding rate is reduced by quantization of the source vector to a reproduction vector chosen from a codebook. Second, the rate is further reduced by binning of the reproduction vectors. As the detector is equipped with side information, it can identify the true reproduction vector with high reliability. In the context of DHT, it can then decide on one of the hypotheses using this reproduction and the side information. In [17] it was shown that a quantization-and-binning scheme achieves the optimal Stein's exponent in a testing-against-conditional-independence problem (as well as in a model inspired by the Gel'fand-Pinsker problem [30], and in a Gaussian model). In [31], the quantization-and-binning scheme was shown to be necessary for the case of DHT with degraded hypotheses. In [32, 33], improved exponents were derived (for a doubly symmetric binary source) by refining the analysis of the effect of binning errors. In addition, a full achievable exponent trade-off was presented, and Körner-Marton coding [34] was used in order to extend the analysis to the symmetric-rate case (for the symmetric source). Finally, [35] suggested an improved detection rule, in which the reproduction vectors in the bin are exhausted one by one, and the null hypothesis is declared if a single vector is jointly typical with the side-information vector.

It is evident that the detectors above are all sub-optimal, as they are based on a two-stage process of reproduction and then testing. However, decoding the source vector (or its quantized version) is entirely superfluous for the DHT system, since the latter is only required to distinguish between the hypotheses. In fact, this detection procedure is still sub-optimal even if quantization is used without binning, and only the second stage of the detector is employed. While [35] offered an improved detector, it too is sub-optimal (recently, in the zero-rate regime, [27] considered the use of an optimal Neyman-Pearson-like detector, rather than the possibly sub-optimal Hoeffding-like detector [36] that was used in [12]). In this work, we investigate the performance of the optimal detector. In fact, the optimal detector directly follows from the standard Neyman-Pearson lemma (see Section III). Nonetheless, analyzing its performance is highly non-trivial for DHT.

Specifically, we study the reliability function in the side-information case, guided by the following methodology. Recall that for distributed lossless compression systems (the Slepian-Wolf problem [28, Ch. 10], [37]), the side information helps to fully reproduce the source. The concept of binning the source vectors was originally conceived for this problem, and common wisdom states that the source vectors which belong to the same bin should constitute a good channel code for the memoryless channel induced by the conditional distribution of the source given the side information. This intuition was made precise in [38, Th. 1], [39, 40], which showed that the reliability function of distributed lossless compression (the optimal exponential decrease of the error probability as a function of the compression rate) is directly related to the reliability of channel coding. The idea is to use structured binning (also mentioned in [17] for the DHT problem, though recognized as inessential there), using a sequence of channel codes which achieves the channel reliability function. At a given blocklength, such a channel code corresponds to one bin of the distributed compression system; more precisely, this is done type-by-type, i.e., per the subsets of source sequences that share the same empirical distribution. All other bins of the system are generated by permuting the source vectors of the first bin (this permutation technique, originally developed in [38, 41], will be useful here too, and will be reviewed in more detail in what follows). Due to the memoryless nature of the problem, all bins generated this way are essentially as good as the original one, and this allows one to directly link the reliability function of distributed compression to that of channel coding. While the reliability of channel codes is itself only known above the critical rate [4, Corollary 10.4], this characterization has two advantages nonetheless. First, analyzing channel codes is simpler than analyzing a distributed compression system. Second, any bound on the reliability of channel codes immediately translates into a bound on the reliability of distributed compression systems. Specifically, the expurgated bound [4, Problem 10.18] and the sphere-packing bound [4, Th. 10.3] can be immediately used, rather than just a random-coding bound [4, Th. 10.2]. Noting the similarity between the distributed compression problem and the DHT problem, it is natural to ask whether structured binning is useful for the DHT problem, and what the properties of a "good" bin for the DHT problem are.

To address these questions, we introduce the concept of channel-detection (CD) codes. Such codes are not required to carry information, but are rather designed for the task of distinguishing between two possible channel distributions (in [42], a somewhat different channel-detection setting is considered, where the code is required to simultaneously be a good channel code, in the ordinary sense, as well as a good CD code). In this work, it will only be required that the codewords of the CD code are different from one another. Namely, a codeword from the code is chosen with a uniform probability over the codewords, and the detector should decide on the prevailing channel, based only on the output vector (and its knowledge of the codebook). It will be evident that this is the same problem encountered by the detector of a DHT system, given the encoded message. It will be shown that optimal DHT systems (in the exponential sense) can be generated by optimal CD codes, just as distributed compression systems are generated from ordinary channel codes. From this observation, the close relation between the reliability of DHT systems and that of CD codes will be determined. An illustration of the analogy between the distributed compression/channel coding relation and the DHT/CD relation appears in Fig. 1 (all quantities there will be formally defined in the sequel).


This intimate connection allows us to derive bounds on the reliability function of DHT using bounds on the reliability of CD codes. Concretely, we will derive both random-coding bounds and expurgated bounds on the reliability of CD codes under the optimal Neyman-Pearson detector. The analysis goes beyond that of [42] in two senses: first, it is based on a Chernoff-distance characterization of the optimal exponents, which leads to simpler single-letter bounds; and second, the analysis is performed for a hierarchical ensemble, yielding superposition codes [43] such as the ones used for the broadcast channel (see, e.g., [28, Ch. 5]), corresponding to quantization-and-binning schemes.

The outline of the rest of the paper is as follows. The system model and preliminaries, such as notation conventions and background on ordinary HT, will be given in Section II. The main result of the paper, namely an achievable bound on the reliability function of DHT under optimal detection, will be stated in Section III, along with some consequences. For the sake of proving these bounds, the reduction of the DHT reliability problem to the CD reliability problem will be considered in Section IV. While only achievability bounds will ultimately be derived, the reduction to CD codes has both an achievability part as well as a converse part. Derivation of single-letter achievable bounds on the reliability of CD codes will be considered in Section V. From this, the achievability bounds on the DHT reliability will immediately follow. Afterwards, a discussion on computational aspects along with a numerical example is given in Section VI. Several directions for further research are highlighted in Section VII.

## II System Model

### II-A Notation Conventions

Throughout the paper, random variables will be denoted by capital letters, specific values they may take will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors and their realizations will be superscripted by their dimension. For example, the random vector $X^n$ ($n$ a positive integer) may take a specific vector value $x^n \in \mathcal{X}^n$, the $n$th order Cartesian power of $\mathcal{X}$, which is the alphabet of each component of this vector. The Cartesian product of the finite alphabets $\mathcal{X}$ and $\mathcal{Y}$ will be denoted by $\mathcal{X} \times \mathcal{Y}$.

We will follow the standard notation conventions for probability distributions, e.g., $P_X(x)$ will denote the probability of the letter $x \in \mathcal{X}$ under the distribution $P_X$. The arguments will be omitted when we address the entire distribution, e.g., $P_X$. Similarly, generic distributions will be denoted by $Q$, $\bar{Q}$, and in other similar forms, subscripted by the relevant random variables/vectors/conditionings, e.g., $Q_X$, $Q_{Y|X}$. The joint distribution induced by $Q_X$ and $Q_{Y|X}$ will be denoted by $Q_X \times Q_{Y|X}$.

In what follows, we will extensively utilize the method of types [4, 44] and use the following notations. The type class of a type at blocklength , i.e., the set of all with empirical distribution , will be denoted by . The set of all type classes of vectors of length from will be denoted by , and the set of all possible types over will be denoted by . Similar notations will be used for pairs of random variables (and larger collections), e.g., , and . The conditional type class of for a conditional type , namely, the subset of such that the joint type of is , will be denoted by (sometimes called the Q-shell of [4, Definition 2.4]). For a given , the conditional type classes such that is not empty when will be denoted by . The probability simplex for an alphabet will be denoted by .
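For concreteness, the basic type-class notions above can be computed directly; the following sketch (function names are ours, binary alphabet assumed for illustration) evaluates the empirical type of a sequence and the exact size of its type class as a multinomial coefficient:

```python
from collections import Counter
from math import comb

def empirical_type(xn, alphabet):
    """Empirical distribution (type) of a sequence over a finite alphabet."""
    n = len(xn)
    counts = Counter(xn)
    return {a: counts.get(a, 0) / n for a in alphabet}

def type_class_size(xn, alphabet):
    """Number of sequences sharing the type of xn (a multinomial coefficient)."""
    counts = Counter(xn)
    size, remaining = 1, len(xn)
    for a in alphabet:
        size *= comb(remaining, counts.get(a, 0))
        remaining -= counts.get(a, 0)
    return size

xn = [0, 1, 1, 0, 1, 0, 0, 0]
print(empirical_type(xn, [0, 1]))   # {0: 0.625, 1: 0.375}
print(type_class_size(xn, [0, 1]))  # C(8,3) = 56
```

The exponential growth of such multinomial coefficients, $|T(Q_X)| \doteq e^{nH(Q_X)}$, is what drives the entropy terms in the bounds below.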

For two distributions $P_X, Q_X$ over the same finite alphabet $\mathcal{X}$, the variational distance ($L_1$ norm) will be denoted by

$$\|P_X - Q_X\| \overset{\mathrm{def}}{=} \sum_{x\in\mathcal{X}} |P_X(x) - Q_X(x)|.$$

When optimizing a function of a distribution over a probability simplex, the explicit display of the constraint will be omitted; for example, for a function $f(Q_X)$, $\min_{Q_X} f(Q_X)$ will be used instead of $\min_{Q_X \in \mathcal{P}(\mathcal{X})} f(Q_X)$.

The expectation operator with respect to a given distribution, e.g., $Q_X$, will be denoted by $\mathbb{E}_Q[\cdot]$, where the subscript will be omitted if the underlying probability distribution is clear from the context. In general, information-theoretic quantities will be denoted by the standard notation [3], with a subscript indicating the distribution of the relevant random variables, e.g., $I_Q(X;Y)$ under $Q_{XY}$. As an exception, the entropy of $X$ under $Q_X$ will be denoted by $H(Q_X)$. The binary entropy function will be denoted by $h_b(q)$ for $q \in [0,1]$. The conditional Kullback–Leibler divergence between two conditional distributions, e.g., $Q_{Y|X}$ and $P_{Y|X}$, averaged with the distribution $Q_X$, will be denoted by $D(Q_{Y|X}\|P_{Y|X}|Q_X)$, and in case $Q_X$ is degenerate, the notation will be simplified to $D(Q_{Y|X}\|P_{Y|X})$.
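The conditional divergence just introduced appears repeatedly in the bounds of Section III, and is straightforward to evaluate on finite alphabets. A minimal sketch (the helper name `cond_kl` and the toy channels are our own):

```python
import math

def cond_kl(QY_X, PY_X, QX):
    """D(Q_{Y|X} || P_{Y|X} | Q_X): KL divergence between channel rows,
    averaged over the input distribution Q_X.

    QY_X, PY_X: dicts mapping x -> {y: probability}; QX: dict over x.
    """
    total = 0.0
    for x, qx in QX.items():
        for y, q in QY_X[x].items():
            if q > 0:
                total += qx * q * math.log(q / PY_X[x][y])
    return total

# Toy channels (assumed for illustration only).
Q = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
QX = {0: 0.5, 1: 0.5}
print(cond_kl(Q, P, QX))  # positive, and zero iff the channels agree on supp(QX)
```

Note that the divergence vanishes exactly when the two conditional distributions coincide on the support of $Q_X$.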

The Hamming distance between two vectors $x^n$ and $\tilde{x}^n$ will be denoted by $d_{\mathrm{H}}(x^n, \tilde{x}^n)$. The complement of a set $\mathcal{A}$ will be denoted by $\mathcal{A}^c$. For a finite multiset $\mathcal{A}$, the number of distinct elements will be denoted by $|\mathcal{A}|$. The probability of the event $\mathcal{A}$ will be denoted by $\mathbb{P}[\mathcal{A}]$, and its indicator function will be denoted by $\mathbb{1}[\mathcal{A}]$.

For two positive sequences $a_n$ and $b_n$, the notation $a_n \doteq b_n$ will mean asymptotic equivalence in the exponential scale, that is, $\lim_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n} = 0$. Similarly, $a_n \overset{\cdot}{\le} b_n$ will mean $\limsup_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n} \le 0$, and so on. The notation $a_n = o(1)$ will mean $\lim_{n\to\infty} a_n = 0$. The ceiling function will be denoted by $\lceil \cdot \rceil$. The notation $|t|_+$ will stand for $\max\{t, 0\}$. Logarithms and exponents will be understood to be taken to the natural base. Throughout, for the sake of brevity, we will ignore integer constraints on large numbers. For example, $\lceil e^{nR} \rceil$ will be written as $e^{nR}$. For an integer $m$, the set $\{1, \dots, m\}$ will be denoted by $[m]$.

### II-B Ordinary Hypothesis-Testing

Before getting into the distributed scenario, we will shortly review in this section ordinary HT between a pair of hypotheses. Consider a random variable $Z$ over a finite alphabet $\mathcal{Z}$, whose distribution according to the hypothesis $H$ (respectively, $\bar{H}$) is given by $P$ (respectively, $\bar{P}$). It is common in the literature to refer to $H$ (respectively, $\bar{H}$) as the null hypothesis (respectively, the alternative hypothesis). However, we will refrain from using such terminology, and the two hypotheses will be considered to have an equal stature.

Given $n$ i.i.d. observations $Z^n = (Z_1, \dots, Z_n)$, a (possibly randomized) detector

$$\phi \colon \mathcal{Z}^n \to \{H, \bar{H}\},$$

has type 1 and type 2 error probabilities (also called the false-alarm probability and misdetection probability in engineering applications) given by

$$p_1(\phi) \overset{\mathrm{def}}{=} \mathbb{P}\big[\phi(Z^n) = \bar{H}\big], \tag{1}$$

and

$$p_2(\phi) \overset{\mathrm{def}}{=} \bar{\mathbb{P}}\big[\phi(Z^n) = H\big]. \tag{2}$$

The family of detectors which optimally trades off between the two types of error probabilities is given by the Neyman-Pearson lemma [1, Prop. II.D.1], [3, Th. 11.7.1] as (since the two hypotheses are assumed to be discrete, randomized tie-breaking should be used if a given constraint on one of the error probabilities must be matched exactly; nonetheless, this randomization has no effect on the exponential behavior, which is the focus of this paper in the distributed scenario, and thus we will not dwell on randomized detectors in what follows)

$$\phi^*_{n,T}(z^n) = \begin{cases} H, & \frac{1}{n}\log\frac{P(z^n)}{\bar{P}(z^n)} \ge T \\ \bar{H}, & \text{otherwise,} \end{cases} \tag{3}$$

where $T$ is a threshold parameter. This parameter controls the trade-off between the two types of error probabilities: the larger $T$ is, the larger the type 1 error probability and the smaller the type 2 error probability, and vice versa.
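The threshold behavior can be checked numerically. The following sketch (toy binary distributions of our own) implements the test in (3) and estimates both error probabilities by Monte Carlo:

```python
import math
import random

def np_detector(zn, P, P_bar, T):
    """Eq. (3): decide H iff the normalized log-likelihood ratio is at least T."""
    llr = sum(math.log(P[z] / P_bar[z]) for z in zn) / len(zn)
    return 'H' if llr >= T else 'H_bar'

# Toy binary hypotheses (assumed for illustration only).
P, P_bar = {0: 0.8, 1: 0.2}, {0: 0.4, 1: 0.6}
rng = random.Random(0)
n, trials = 50, 2000

def draw(Q):
    return [0 if rng.random() < Q[0] else 1 for _ in range(n)]

# With T = 0, both error probabilities decay exponentially in n.
p1 = sum(np_detector(draw(P), P, P_bar, 0.0) == 'H_bar' for _ in range(trials)) / trials
p2 = sum(np_detector(draw(P_bar), P, P_bar, 0.0) == 'H' for _ in range(trials)) / trials
print(p1, p2)  # both close to zero at this blocklength
```

Raising $T$ in this sketch increases the first estimate and decreases the second, exactly the trade-off described above.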

To describe bounds on the error probabilities of the optimal detector, let us define the hypothesis-testing reliability function [2, Section II] as

$$D_2(D_1; P, \bar{P}) \overset{\mathrm{def}}{=} \min_{Q \colon D(Q\|P) \le D_1} D(Q\|\bar{P}). \tag{4}$$

For brevity, we shall omit the dependence on $(P, \bar{P})$ as they remain fixed and can be understood from context. As is well known [2, Th. 3], for a given $D_1$, there exists a proper choice of $T$ such that

$$p_1(\phi^*_{n,T}) \le \exp(-n \cdot D_1), \tag{5}$$
$$p_2(\phi^*_{n,T}) \le \exp[-n \cdot D_2(D_1)]. \tag{6}$$

Furthermore, it is also known that this exponential behavior is optimal [2, Corollary 2], in the sense that if

$$\liminf_{n\to\infty} -\frac{1}{n}\log p_1(\phi^*_{n,T}) \ge D_1$$

then

$$\limsup_{n\to\infty} -\frac{1}{n}\log p_2(\phi^*_{n,T}) \le D_2(D_1).$$

It should be noted, however, that the detector (3) and the bounds on its error probabilities (5)-(6) are exactly optimal for any given blocklength. In fact, these relations will be used in what follows.

The function $D_2(D_1)$ is known to be a convex function of $D_1$, continuous on its domain and strictly decreasing prior to any interval on which it is constant [2, Th. 3]. Furthermore, it is known [2, Th. 7] that it can be represented as

$$D_2(D_1) = \sup_{\tau \ge 0}\big\{ -\tau \cdot D_1 + (\tau+1) \cdot d_\tau \big\}, \tag{7}$$

where

$$d_\tau \overset{\mathrm{def}}{=} -\log\Bigg[\sum_{z\in\mathcal{Z}} P^{\tau/(\tau+1)}(z)\, \bar{P}^{1/(\tau+1)}(z)\Bigg] \tag{8}$$

is the Chernoff distance between the distributions. The representation (7) will be used in the sequel to derive bounds on the reliability of DHT systems.
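The dual representation (7)-(8) is also convenient numerically: $d_\tau$ is a single finite sum, and the supremum over $\tau$ can be approximated on a grid. A sketch under assumed toy distributions (the grid parameters are arbitrary choices of ours):

```python
import math

def chernoff_distance(P, P_bar, tau):
    """Eq. (8): d_tau = -log sum_z P(z)^{tau/(tau+1)} * P_bar(z)^{1/(tau+1)}."""
    s = sum(P[z] ** (tau / (tau + 1)) * P_bar[z] ** (1.0 / (tau + 1)) for z in P)
    return -math.log(s)

def D2(D1, P, P_bar, tau_max=50.0, steps=5000):
    """Eq. (7): grid approximation of sup_{tau >= 0} {-tau*D1 + (tau+1)*d_tau}."""
    best = -float('inf')
    for k in range(steps + 1):
        tau = tau_max * k / steps
        best = max(best, -tau * D1 + (tau + 1) * chernoff_distance(P, P_bar, tau))
    return best

P, P_bar = {0: 0.8, 1: 0.2}, {0: 0.4, 1: 0.6}
# D2 is decreasing in D1; on a finite grid, D2(0) approaches D(P||P_bar) from below.
print(D2(0.0, P, P_bar), D2(0.05, P, P_bar))
```

Note that $(\tau+1) d_\tau$ is increasing in $\tau$ and tends to $D(P\|\bar{P})$, so the grid must extend far enough in $\tau$ when $D_1$ is small.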

### II-C Distributed Hypothesis-Testing

Let $(X^n, Y^n)$ be $n$ independent copies of a pair of random variables $(X, Y)$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets. Under $H$, the joint distribution of $(X, Y)$ is given by $P_{XY}$, whereas under $\bar{H}$, this distribution is given by $\bar{P}_{XY}$. We assume that both probability measures are absolutely continuous with respect to one another, i.e., $P_{XY} \ll \bar{P}_{XY}$ as well as $\bar{P}_{XY} \ll P_{XY}$ (this implies that both types of Stein's exponent for this problem are finite), and thus it can be assumed without loss of generality (w.l.o.g.) that the two distributions share a common support. For brevity, we denote the probability of an event under $P_{XY}$ (respectively, $\bar{P}_{XY}$) by $\mathbb{P}[\cdot]$ [respectively, $\bar{\mathbb{P}}[\cdot]$].

A DHT system $\mathcal{H}_n$, as depicted in Fig. 2, is defined by an encoder

$$f_n \colon \mathcal{X}^n \to [m_n],$$

which maps a source vector $x^n$ into an index $i \in [m_n]$, and a detector (possibly randomized; randomized encoding can also be defined, in which case the encoder maps $x^n$ to a probability vector over $[m_n]$ whose $i$th entry is the probability of mapping $x^n$ to the index $i$. In the sequel, we will also use a rather simple form of randomized encoding, which does not require this general definition: there, the source vector will be used to randomly generate a new source vector, and the latter will be encoded by a deterministic encoder, see the proof of the achievability part of Theorem 6 in Appendix B-A)

$$\varphi_n \colon [m_n] \times \mathcal{Y}^n \to \{H, \bar{H}\}.$$

The inverse image of $i \in [m_n]$ under $f_n$, i.e.,

$$f_n^{-1}(i) \overset{\mathrm{def}}{=} \{x^n \in \mathcal{X}^n \colon f_n(x^n) = i\},$$

is called the bin associated with the index $i$. The rate of $\mathcal{H}_n$ is defined as $\frac{1}{n}\log m_n$, the type 1 error probability of $\mathcal{H}_n$ is defined as

$$p_1(\mathcal{H}_n) \overset{\mathrm{def}}{=} \mathbb{P}\big[\varphi_n(f_n(X^n), Y^n) = \bar{H}\big],$$

and the type 2 error probability is defined as

$$p_2(\mathcal{H}_n) \overset{\mathrm{def}}{=} \bar{\mathbb{P}}\big[\varphi_n(f_n(X^n), Y^n) = H\big].$$

In the sequel, conditional error probabilities given an event $\mathcal{A}$ will be abbreviated as, e.g.,

$$p_1(\mathcal{H}_n | \mathcal{A}) \overset{\mathrm{def}}{=} \mathbb{P}\big[\varphi_n(f_n(X^n), Y^n) = \bar{H} \,\big|\, \mathcal{A}\big].$$

A sequence of DHT systems will be denoted by $\mathcal{H} = \{\mathcal{H}_n\}$. The sequence is associated with two different exponents for each of the two error probabilities defined above. The infimum type 1 exponent of a sequence of systems is defined by

$$\liminf_{n\to\infty} -\frac{1}{n}\log p_1(\mathcal{H}_n), \tag{9}$$

and the supremum type 1 exponent is defined by

$$\limsup_{n\to\infty} -\frac{1}{n}\log p_1(\mathcal{H}_n). \tag{10}$$

Analogous exponents can be defined for the type 2 error probability.

The reliability function of a DHT system is the optimal trade-off between the two types of exponents achieved by any encoder-detector pair of a given rate $R$. Specifically, the infimum DHT reliability function is defined by

$$E_2^-(R, E_1; P_{XY}, \bar{P}_{XY}) \overset{\mathrm{def}}{=} \sup_{\mathcal{H}} \left\{ \liminf_{n\to\infty} -\frac{1}{n}\log p_2(\mathcal{H}_n) \colon \forall n,\; m_n \le e^{nR},\; p_1(\mathcal{H}_n) \le e^{-n\cdot E_1} \right\},$$

and the supremum DHT reliability function $E_2^+$ is analogously defined, albeit with a $\limsup$. For brevity, the dependence on $(P_{XY}, \bar{P}_{XY})$ will be omitted henceforth whenever it is understood from context.

## III Main Result: Bounds on the Reliability Function of DHT

To begin the discussion on the reliability function of DHT systems, we note that for a given encoder the form of the optimal detector is just a comparison of likelihoods, as in ordinary HT. Indeed, it readily follows from (3) that the optimal detector has the form

$$\varphi^*_n(i, y^n) = \begin{cases} H, & \frac{1}{n}\log\frac{\mathbb{P}[f_n(X^n) = i,\, Y^n = y^n]}{\bar{\mathbb{P}}[f_n(X^n) = i,\, Y^n = y^n]} \ge T \\ \bar{H}, & \text{otherwise,} \end{cases} \tag{11}$$

for some threshold $T$ which sets the trade-off between the two error probabilities.
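At small blocklengths, this detector can be evaluated by brute force: the likelihood of a (bin index, side-information) pair is a sum over the bin. The sketch below (a hypothetical parity-based encoder and toy distributions, all of our own choosing) makes this concrete:

```python
import math
from itertools import product

def bin_likelihood(i, yn, f, PXY):
    """P[f(X^n) = i, Y^n = y^n]: sum over all x^n in bin i of prod_t P(x_t, y_t)."""
    total = 0.0
    for xn, bin_idx in f.items():
        if bin_idx != i:
            continue
        p = 1.0
        for x, y in zip(xn, yn):
            p *= PXY[(x, y)]
        total += p
    return total

def dht_detector(i, yn, f, PXY, PXY_bar, T=0.0):
    """Likelihood-ratio test on the pair (bin index, side information)."""
    llr = math.log(bin_likelihood(i, yn, f, PXY) /
                   bin_likelihood(i, yn, f, PXY_bar)) / len(yn)
    return 'H' if llr >= T else 'H_bar'

# Toy setup: n = 3 binary sources, two bins assigned by parity (a hypothetical encoder).
n = 3
f = {xn: sum(xn) % 2 for xn in product([0, 1], repeat=n)}
PXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}          # correlated under H
PXY_bar = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # independent under H-bar
print(dht_detector(f[(0, 0, 1)], (0, 0, 1), f, PXY, PXY_bar))  # → H
```

The exponential cost of this brute-force sum is exactly what the single-letter bounds below circumvent.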

Hence, the characterization of the DHT reliability function is reduced to finding optimal encoders, to wit, a partition of $\mathcal{X}^n$ into bins, and then, for a given sequence of optimal encoders, finding single-letter expressions for the resulting error exponents under optimal detection (11). These problems are much more challenging than the characterization of the optimal detector. Nonetheless, just as in the distributed compression and channel coding problems mentioned in the introduction, achievability bounds can be derived using random-coding ensembles. Specifically, the main result of this paper, which we next describe, is a random-coding bound and an expurgated bound, obtained under an optimal Neyman-Pearson detector. Before that, we state the trivial converse bound, obtained when $X^n$ is not compressed, or alternatively, when $R = \infty$ (immediately deduced from the discussion in Section II-B).

###### Proposition 1.

The supremum DHT reliability function is bounded as

$$E_2^+(R, E_1; P_{XY}, \bar{P}_{XY}) \le \min_{Q_{XY} \colon D(Q_{XY}\|P_{XY}) \le E_1} D(Q_{XY}\|\bar{P}_{XY}).$$

To state our achievability bound, we will need several additional notations. We denote the Chernoff distance between symbols by

$$d_\tau(x, \tilde{x}) \overset{\mathrm{def}}{=} -\log \sum_{y\in\mathcal{Y}} P_{Y|X}^{\tau/(\tau+1)}(y|x)\, \bar{P}_{Y|X}^{1/(\tau+1)}(y|\tilde{x}), \tag{12}$$

and between vectors by

$$d_\tau(x^n, \tilde{x}^n) \overset{\mathrm{def}}{=} \frac{1}{n}\sum_{i=1}^n d_\tau(x_i, \tilde{x}_i). \tag{13}$$

Further, when $(X, \tilde{X})$ are distributed according to $Q_{X\tilde{X}}$ we define the average Chernoff distance as

$$d_\tau(Q_{X\tilde{X}}) \overset{\mathrm{def}}{=} \mathbb{E}_Q[d_\tau(X, \tilde{X})], \tag{14}$$

and when $X$ is distributed according to $Q_X$, we denote, for brevity,

$$d_\tau(Q_X) \overset{\mathrm{def}}{=} \mathbb{E}_Q[d_\tau(X, X)]. \tag{15}$$
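Definitions (12)-(15) are straightforward to evaluate on finite alphabets; the following sketch (helper names and the pair of binary channels are assumptions of ours) computes the symbol-wise distance (12) and the average (15):

```python
import math

def d_tau_symbol(x, x_t, PY_X, PY_X_bar, tau):
    """Eq. (12): Chernoff distance between the rows P_{Y|X}(.|x) and bar-P_{Y|X}(.|x~)."""
    s = sum(PY_X[x][y] ** (tau / (tau + 1)) * PY_X_bar[x_t][y] ** (1.0 / (tau + 1))
            for y in PY_X[x])
    return -math.log(s)

def d_tau_avg(QX, PY_X, PY_X_bar, tau):
    """Eq. (15): E_Q[d_tau(X, X)], the average self Chernoff distance."""
    return sum(QX[x] * d_tau_symbol(x, x, PY_X, PY_X_bar, tau) for x in QX)

PY_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}      # channel under H
PY_X_bar = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}}  # channel under H-bar
print(d_tau_avg({0: 0.5, 1: 0.5}, PY_X, PY_X_bar, 1.0))
```

As a sanity check, the distance vanishes whenever the two channel rows being compared coincide.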

Next, we denote

$$\begin{aligned} B'_{\mathrm{rc}}(R, R_b, Q_{UX}, \tau) \overset{\mathrm{def}}{=} \min_{\substack{(Q_{UXY}, \bar{Q}_{UXY}) \colon Q_{UX} = \bar{Q}_{UX}, \\ Q_{UY} = \bar{Q}_{UY}}} \Big\{ & \tau \cdot D(Q_{Y|UX}\|P_{Y|X}|Q_{UX}) + D(\bar{Q}_{Y|UX}\|\bar{P}_{Y|X}|\bar{Q}_{UX}) \\ & + \max\big\{ |I_Q(U;Y) - R_b|_+,\; I_Q(U,X;Y) - H(Q_X) + R \big\} \\ & + \tau \cdot \max\big\{ |I_{\bar{Q}}(U;Y) - R_b|_+,\; I_{\bar{Q}}(U,X;Y) - H(\bar{Q}_X) + R \big\} \Big\}, \end{aligned} \tag{16}$$

and

$$\begin{aligned} B''_{\mathrm{rc}}(R, R_b, Q_{UX}, \tau) \overset{\mathrm{def}}{=} \min_{\substack{(Q_{UXY}, \bar{Q}_{UXY}) \colon Q_{UX} = \bar{Q}_{UX}, \\ Q_{UY} = \bar{Q}_{UY},\; I_Q(U;Y) > R_b}} \Big\{ & \tau \cdot D(Q_{Y|UX}\|P_{Y|X}|Q_{UX}) + D(\bar{Q}_{Y|UX}\|\bar{P}_{Y|X}|\bar{Q}_{UX}) \\ & + |I_Q(X;Y|U) - H(Q_X) + R + R_b|_+ \\ & + \tau \cdot |I_{\bar{Q}}(X;Y|U) - H(\bar{Q}_X) + R + R_b|_+ \Big\}, \end{aligned} \tag{17}$$

as well as

$$B_{\mathrm{rc}}(R, R_b, Q_{UX}, \tau) \overset{\mathrm{def}}{=} \min\big\{ B'_{\mathrm{rc}}(R, R_b, Q_{UX}, \tau),\; B''_{\mathrm{rc}}(R, R_b, Q_{UX}, \tau) \big\}, \tag{18}$$

which are all related to a random-coding based bound on the reliability function. We also denote

$$B_{\mathrm{ex}}(R, Q_X, \tau) \overset{\mathrm{def}}{=} (\tau+1) \cdot \min_{\substack{Q_{X\tilde{X}} \colon Q_X = Q_{\tilde{X}}, \\ H_Q(X|\tilde{X}) \ge R}} \big\{ d_\tau(Q_{X\tilde{X}}) + R - H_Q(X|\tilde{X}) \big\}, \tag{19}$$

which is related to an expurgated-based bound on the reliability function. Finally, we also denote

$$B(R, Q_X, \tau) \overset{\mathrm{def}}{=} \max\left\{ \sup_{Q_{U|X}} \sup_{R_b \colon R_b \ge |I_Q(U;X) - R|_+} B_{\mathrm{rc}}(R, R_b, Q_{UX}, \tau),\; B_{\mathrm{ex}}(R, Q_X, \tau) \right\}. \tag{20}$$

For brevity, such arguments will sometimes be omitted henceforth.

###### Theorem 2.

The infimum DHT reliability function is bounded as

$$E_2^-(R, E_1; P_{XY}, \bar{P}_{XY}) \ge \min_{Q_X} \sup_{\tau \ge 0} \Big[ -\tau \cdot E_1 + D(Q_X\|\bar{P}_X) + \tau \cdot D(Q_X\|P_X) + \min\big\{ (\tau+1) \cdot d_\tau(Q_X),\; B(R, Q_X, \tau) \big\} \Big]. \tag{21}$$

As hinted by (20), the better of a random-coding bound and an expurgated bound can be chosen for any given input type $Q_X$. In the case of the random-coding bound, the achieving scheme is based on quantization and binning. In this respect, for a given $Q_{UX}$ (with $X$-marginal $Q_X$), the conditional type $Q_{U|X}$ is the test channel for quantizing source vectors into one of the possible reproduction vectors, where the quantization rate satisfies $R_q \ge I_Q(U;X)$. Then, these reproduction vectors are grouped into bins of size (at most) $e^{nR_b}$ each, where the binning rate satisfies $R_b \ge |I_Q(U;X) - R|_+$. Both $Q_{U|X}$ and $R_b$ may be optimized, separately for any given $Q_X$, to obtain the best type 2 exponent. In case the expurgated bound is better than the random-coding bound for $Q_X$, the scheme which achieves it is based on binning at rate $R$, without quantization.

We next discuss several implications of Theorem 2. First, simpler bounds, perhaps at the cost of worse exponents, can be obtained by considering two extremal choices. To obtain a binning-based scheme, without quantization, we choose $U$ to be a degenerate (deterministic) random variable and $R_b = H(Q_X) - R$. We then get that a single term dominates the minimization in (18), and

$$B_{\mathrm{rc}}(R, H(Q_X) - R, Q_{UX}, \tau) = B_{\mathrm{rc,b}}(R, Q_X, \tau) \tag{22}$$

$$\begin{aligned} \overset{\mathrm{def}}{=} \min_{\substack{(Q_{XY}, \bar{Q}_{XY}) \colon Q_X = \bar{Q}_X, \\ Q_Y = \bar{Q}_Y}} \Big\{ & \tau \cdot D(Q_{Y|X}\|P_{Y|X}|Q_X) + D(\bar{Q}_{Y|X}\|\bar{P}_{Y|X}|\bar{Q}_X) \\ & + |R - H_Q(X|Y)|_+ + \tau \cdot |R - H_{\bar{Q}}(X|Y)|_+ \Big\}. \end{aligned} \tag{23}$$

To obtain a quantization-based scheme, without binning, we choose $R_b = 0$, and limit $Q_{U|X}$ to satisfy $I_Q(U;X) \le R$.

Second, if the rate is large enough, then no loss is expected in the reliability function of DHT. We can deduce from Theorem 2 an upper bound on this no-loss rate, as follows.

###### Corollary 3.

Suppose that $R$ is sufficiently large such that

$$B(R, Q_X, \tau) \ge d_\tau(Q_X) \tag{24}$$

for all $Q_X$ and $\tau \ge 0$. Then,

$$E_2^-(R, E_1) = E_2^-(\infty, E_1) \tag{25}$$
$$\qquad\qquad\;\; = D_2(E_1). \tag{26}$$

The proof of this corollary appears in Appendix A.

Third, by setting $E_1 = 0$, Theorem 2 yields an achievable bound on Stein's exponent, as follows.

###### Corollary 4.

Stein's exponent is bounded as

$$E_2^-(R, 0) \ge D(P_X\|\bar{P}_X) + \sup_{\tau \ge 0} \min\big\{ (\tau+1) \cdot d_\tau(P_X),\; B(R, P_X, \tau) \big\} \tag{27}$$

$$\ge \min\left\{ D(P_{XY}\|P_X \times \bar{P}_{Y|X}),\; D(P_X\|\bar{P}_X) + \sup_{Q_{U|X}} \sup_{R_b \colon R_b \ge |I_{P_X \times Q_{U|X}}(U;X) - R|_+} \lim_{\tau\to\infty} B_{\mathrm{rc}}(R, R_b, P_X \times Q_{U|X}, \tau) \right\}. \tag{28}$$

The first term in (28) can be identified as Stein's exponent when the rate is not constrained at all. The proof of this corollary also appears in Appendix A, and it seems that no further significant simplifications are possible. It is worth noting, however, that the resulting bound is quite different from the best known bound by Shimokawa, Han and Amari [10, Th. 4.3], [13] (and its refinement in [32, 33]).

Fourth, it is interesting to examine the case $R = 0$. Using analysis similar to the proof of Corollary 4, it is easy to verify that a binning-based scheme [i.e., substituting (23) in (21)] achieves the lower bound

$$E_2^-(R = 0, E_1) \ge \min_{\substack{(Q_{XY}, \bar{Q}_{XY}) \colon Q_X = \bar{Q}_X, \\ Q_Y = \bar{Q}_Y,\; D(Q_{XY}\|P_{XY}) \le E_1}} D(\bar{Q}_{XY}\|\bar{P}_{XY}).$$

As expected, this is the same type 2 error exponent obtained when $Y^n$ is also encoded at zero rate, as obtained in [10, Th. 5.4], [11, Th. 6]. For this bound, a matching converse is known [10, Th. 5.5]. When $E_1 = 0$, the first constraint reduces to $Q_{XY} = P_{XY}$, and then Stein's exponent is given by

$$E_2^-(R = 0, E_1 = 0) \ge \min_{\substack{\bar{Q}_{XY} \colon \bar{Q}_X = P_X, \\ \bar{Q}_Y = P_Y}} D(\bar{Q}_{XY}\|\bar{P}_{XY}).$$

In [15, Th. 2] it was determined that this exponent is optimal (even when $X^n$ is not encoded but rather given as side information to the detector).

The rest of the paper is mainly devoted to the proof of Theorem 2, and is based on the following methodology, comprised of two steps. We will introduce CD codes, which, in a nutshell, correspond to a single bin of a DHT system. The first step of the proof is the reduction of the DHT reliability problem to the CD reliability problem, which will be considered in Section IV. The second step is the derivation of single-letter achievable bounds on the reliability of CD codes, which will be considered in Section V. The bound of Theorem 2 on the DHT reliability function then follows as an easy corollary of these results.

## IV From Distributed Hypothesis-Testing to Channel-Detection Codes

In this section, we formulate CD codes, and then use them to characterize the reliability of DHT systems. CD codes were considered in [42] for the problem of joint detection and decoding. For that purpose, the code has to be chosen to allow the receiver to detect the channel conditional probability, as well as to transmit messages, just like an ordinary channel code. In this paper, each quantization cell of a DHT system will be considered and analyzed as a CD code. Since a DHT system is only required to decide on the hypothesis but not on the actual source vector, the error probability of CD codes (for transmitting messages) is irrelevant in this paper. However, this does not imply that all the codewords of a CD code can be chosen to be identical (which is optimal in terms of detection performance), since by definition, the members of a quantization cell are different from one another. Hence, in what follows we will define CD codes of a given cardinality, and enforce their codewords to be different from one another. Here too, for brevity, we will denote the probability of an event under $H$ and $\bar{H}$ by $P[\cdot]$ and $\bar{P}[\cdot]$, respectively. The required definitions for CD codes are quite similar to the ones required for DHT systems, but as some differences do exist, we explicitly outline them in what follows.

A CD code for a type class $\mathcal{T}_n(Q_X)$ is given by $\mathcal{C}_n\subseteq\mathcal{T}_n(Q_X)$ (where all codewords must be different). An input $X^n$ to the channel is chosen with a uniform distribution over $\mathcal{C}_n$, and sent over $n$ uses of a DMC, which may be either $P_{Y|X}$ when $H$ is active or $\bar{P}_{Y|X}$ when $\bar{H}$ is. The random channel output is given by $Y^n$. The detector has to decide, based on $Y^n$, whether the DMC conditional probability distribution is $P_{Y|X}$ or $\bar{P}_{Y|X}$. As for the DHT problem, we impose on $P_{Y|X}$ and $\bar{P}_{Y|X}$ the analogous support assumptions. A detector (possibly randomized) for $\mathcal{C}_n$ is given by

$$\phi_n:\mathcal{Y}^n\to\{H,\bar{H}\}.$$

In accordance, two error probabilities can be defined, to wit, the type 1 error probability

$$p_1(\mathcal{C}_n,\phi_n)\stackrel{\mathrm{def}}{=}P\bigl[\phi_n(Y^n)=\bar{H}\bigr], \tag{29}$$

and the type 2 error probability

$$p_2(\mathcal{C}_n,\phi_n)\stackrel{\mathrm{def}}{=}\bar{P}\bigl[\phi_n(Y^n)=H\bigr]. \tag{30}$$

As for the DHT problem, the Neyman-Pearson lemma implies that the optimal detector is given by

$$\phi_n(y^n)=\begin{cases}\bar{H}, & \bar{P}[Y^n=y^n]\ge e^{nT}\cdot P[Y^n=y^n]\\ H, & \text{otherwise}\end{cases} \tag{31}$$

for some threshold $T$.
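As an illustration, the likelihood-ratio test (31) can be sketched numerically for a small CD code. The transition matrices `W` and `Wbar`, the two-codeword code, and the threshold value below are illustrative assumptions; the mixture likelihood reflects the uniform input distribution over the code.

```python
import numpy as np

def log_mixture_likelihood(y, code, W):
    """log-probability of output sequence y when the channel input is
    uniform over the CD code and the DMC has transition matrix W[x, y]."""
    logs = [sum(np.log(W[xi, yi]) for xi, yi in zip(x, y)) for x in code]
    return np.logaddexp.reduce(logs) - np.log(len(code))

def np_detector(y, code, W, Wbar, T):
    """Threshold test: decide Hbar iff the log-likelihood ratio
    log Pbar[y] - log P[y] is at least n*T."""
    llr = log_mixture_likelihood(y, code, Wbar) - log_mixture_likelihood(y, code, W)
    return "Hbar" if llr >= len(y) * T else "H"

# hypothetical binary symmetric channels: crossover 0.1 under H, 0.4 under Hbar
W    = np.array([[0.9, 0.1], [0.1, 0.9]])
Wbar = np.array([[0.6, 0.4], [0.4, 0.6]])
code = [(0, 0, 0, 0), (1, 1, 1, 1)]  # a toy two-codeword CD code

print(np_detector((0, 0, 0, 0), code, W, Wbar, T=0.0))  # prints "H"
print(np_detector((0, 0, 0, 1), code, W, Wbar, T=0.0))  # prints "Hbar"
```

Note that both likelihoods average over the code, so the detector need not know which codeword was transmitted; this is precisely the ambiguity that distinguishes CD codes from ordinary hypothesis testing.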

Let $Q_X$ be a given type, and let $\{n_l\}_{l=1}^{\infty}$ be the subsequence of blocklengths such that $\mathcal{T}_{n_l}(Q_X)$ is not empty. As for a sequence of DHT systems, a sequence of CD codes $\{\mathcal{C}_{n_l}\}_{l=1}^{\infty}$ is associated with two exponents. The infimum type 1 exponent of a sequence of codes $\{\mathcal{C}_{n_l}\}_{l=1}^{\infty}$ and detectors $\{\phi_{n_l}\}_{l=1}^{\infty}$ is defined as

$$\liminf_{l\to\infty}-\frac{1}{n_l}\log p_1(\mathcal{C}_{n_l},\phi_{n_l}), \tag{32}$$

and the supremum type 1 exponent is similarly defined, albeit with a limit superior. Analogous exponents are defined for the type 2 error probability. In the sequel, we will construct DHT systems whose bins are good CD codes, for each type $Q_X$. Since obtaining an achievability bound for a DHT system requires good performance of the CD codes of all types of the source vectors simultaneously, the blocklengths of the component CD codes must match. Thus, the limit-inferior definition of the exponents must be used, as it assures convergence for all sufficiently large blocklengths. For the converse bound, we will use the limit-superior definition.

For a given type $Q_X$, rate $\rho$, and type 1 exponent constraint $F_1$, we define the infimum CD reliability function as

$$F_2^-(\rho,Q_X,F_1;P_{Y|X},\bar{P}_{Y|X})\stackrel{\mathrm{def}}{=}\sup_{\mathcal{C},\{\phi_{n_l}\}_{l=1}^{\infty}}\left\{\liminf_{l\to\infty}-\frac{1}{n_l}\log p_2(\mathcal{C}_{n_l},\phi_{n_l})\,:\ \forall l,\ \mathcal{C}_{n_l}\subseteq\mathcal{T}_{n_l}(Q_X),\ |\mathcal{C}_{n_l}|\ge e^{n_l\rho},\ p_1(\mathcal{C}_{n_l},\phi_{n_l})\le e^{-n_l\cdot F_1}\right\}, \tag{33}$$

and the supremum CD reliability function $F_2^+$ is analogously defined, albeit with a limit superior. For brevity, the dependence on $(P_{Y|X},\bar{P}_{Y|X})$ will be omitted whenever it is understood from context. Thus, the only difference between the reliability function of CD codes and that of ordinary HT is that in CD codes the codebook is to be optimally designed under the rate constraint $\rho$. Indeed, for $\rho=0$, symmetry implies that any single-codeword code $\mathcal{C}_n=\{x\}$ with $x\in\mathcal{T}_n(Q_X)$ is an optimal CD code. The detector in this case has no ambiguity regarding the transmitted symbol at any given time point. This, however, does not hold when $\rho>0$, where ambiguity exists at at least a single time point. This additional uncertainty complicates the operation of the detector, and reduces the reliability function. Basic properties of $F_2^{\pm}$ are given as follows.

###### Proposition 5.

As a function of $\rho$, both $F_2^-$ and $F_2^+$ are non-increasing and have both a limit from the right and a limit from the left at every point. They have no discontinuities of the second kind, and the set of discontinuities of the first kind (i.e., jump discontinuity points) is at most countable. Similar properties hold as a function of $F_1$.

###### Proof:

It follows from their definition that $F_2^{\pm}$ are non-increasing in $\rho$ and in $F_1$. The continuity statements follow from properties of monotonic functions [45, Th. 4.29 and its Corollary, Th. 4.30] (Darboux-Froda’s theorem). ∎

With the above, we can state the main result of this section, which is a characterization of the reliability of DHT systems using the reliability of CD codes.

###### Theorem 6.

The DHT reliability functions satisfy:

• Achievability part:

$$E_2^-(R,E_1)\ \ge\ \lim_{\delta\downarrow 0}\inf_{Q_X\in\mathcal{P}(\mathcal{X})}\Bigl\{D(Q_X\|\bar{P}_X)+F_2^-\bigl(H(Q_X)-R,\,Q_X,\,E_1-D(Q_X\|P_X)+\delta\bigr)\Bigr\}.$$
• Converse part:

$$E_2^+(R,E_1)\ \le\ \lim_{\delta\downarrow 0}\inf_{Q_X\in\mathcal{P}(\mathcal{X})}\Bigl\{D(Q_X\|\bar{P}_X)+F_2^+\bigl(H(Q_X)-R+\delta,\,Q_X,\,E_1-D(Q_X\|P_X)-\delta\bigr)\Bigr\}.$$

The achievability and converse parts match up to two discrepancies. First, in the achievability (respectively, converse) part the infimum (supremum) reliability function appears. This seems unavoidable, as it is not known whether the infimum and supremum reliability functions are equal even for ordinary channel codes [4, Problem 10.7]. Second, the bounds include left and right limits of $F_2^{\pm}$ at the rate $H(Q_X)-R$ and the exponent $E_1-D(Q_X\|P_X)$. Nonetheless, due to monotonicity, $F_2^{\pm}$ is a continuous function of the rate and the exponent at all points, perhaps excluding a countable set (Proposition 5). Thus, for any given $(R,E_1)$ there exists an arbitrarily close pair $(R',E_1')$ such that Theorem 6 holds with $\delta=0$.

The proof of Theorem 6 appears in Appendix B. The achievability part (Appendix B-A) is proved by first constructing a DHT system for source vectors from a single type class of the source. The bins are generated by permutations of a “good” CD code. The fact that the two hypotheses are memoryless implies that bins generated this way have approximately the same exponents as the original CD code. Furthermore, since type classes are closed under permutations, they can be covered using permutations of a CD code. A covering lemma by Ahlswede [41, Section 6, Covering Lemma 2] shows that such a covering method can in fact be effective, in the sense that the required number of permutations is close to the minimal number possible (up to the first order in the exponent). This allows one to prove that the encoding rate is as required. Ideas in this spirit were used for the DHT problem in [7], as well as for distributed compression [39, 40] and secure lossy compression [46]. Afterwards, all types of the source are considered simultaneously to generate a DHT system for any possible type of the source vector. This requires proving that the error probabilities converge to their asymptotic exponential behavior uniformly over all possible types.
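Two structural facts drive the permutation argument above: type classes are closed under coordinate permutations, and permutations preserve the distinctness of codewords. They can be checked on a toy binary code (the code and blocklength below are hypothetical), including the covering of the whole type class by the union of all permuted copies of the code:

```python
import itertools
import random
from collections import Counter

def coord_permute(code, perm):
    """Apply one coordinate permutation to every codeword of the code."""
    return [tuple(x[i] for i in perm) for x in code]

# a tiny fixed-composition CD code: all codewords share the type (2 zeros, 2 ones)
code = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 0, 1)]
n = 4
perm = random.sample(range(n), n)  # a random coordinate permutation
pcode = coord_permute(code, perm)

# the type class is closed under permutations ...
assert all(Counter(x) == Counter(code[0]) for x in pcode)
# ... and distinct codewords stay distinct
assert len(set(pcode)) == len(code)

# covering check: the union of all 4! permuted copies of the code covers T(Q_X)
type_class = {x for x in itertools.product((0, 1), repeat=n)
              if Counter(x) == Counter(code[0])}
covered = set()
for p in itertools.permutations(range(n)):
    covered.update(coord_permute(code, p))
assert covered == type_class
```

The covering lemma's content is quantitative, of course: it bounds how few permuted copies suffice, which this brute-force check does not address.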

The proof of the converse part (Appendix B-B) is based upon identifying a sequence of bins whose size and conditional exponents are close to typical values of the entire DHT system. Such a bin corresponds to a CD code, and thus clearly cannot have better exponents than the ones dictated by the reliability function of CD codes. This restriction is then translated back to bound the reliability of DHT systems.

This theorem parallels a similar result of [39, 40] for the reliability functions of distributed compression. By analogy, the reliability function of distributed compression can be characterized by the reliability function of ordinary channel codes (see Fig. 1). The latter can be bounded using well-known classic bounds, such as the random-coding and expurgated achievability bounds, and the sphere-packing, zero-rate, and straight-line converse bounds [4, Ch. 10]. The characterization of Theorem 6 articulates the fact that the reliability of DHT systems is directly connected to the problem of determining the reliability of CD codes. However, even this apparently simpler problem is challenging to solve exactly, and just as in the case of ordinary channel coding, perhaps only lower and upper bounds are within reach. Hence, even though DHT is a source coding problem by nature, for which such reliability functions are known (e.g., Marton’s exponent [47]), Theorem 6 reveals that the reliability of DHT systems depends on the reliability of a channel coding problem, for which such optimal exponents are typically not entirely known, even for the most basic settings. This might manifest the intrinsic difficulty of the DHT problem. Nonetheless, in the next section, we derive concrete bounds on the reliability of CD codes, which, using Theorem 6, lead directly to the bounds on the reliability of the DHT problem of Theorem 2.

## V Bounds on the Reliability of CD Codes

In the previous section, we have shown that the reliability function of DHT can be directly obtained from the reliability of CD codes. In this section, we derive bounds on the latter using random-coding arguments. A naive approach would be to draw the codewords independently from some distribution. However, as discussed in the introduction, the best known achievable bounds for DHT systems are obtained using quantization-and-binning schemes. As CD codes correspond to a single bin of a DHT system (strictly speaking, this is true only when the type of the source vectors is constant within each bin; since sending the type class of the source vector requires negligible rate, it can be assumed w.l.o.g. that this is indeed the case, as otherwise one can further partition the bin into type classes), it follows that CD codes can also benefit from adding dependence between the randomly drawn codewords. In more detail, a bin of a DHT system constructed by the quantization-and-binning method corresponds to a superposition code. To wit, it contains a set of reproduction vectors which are sufficiently different from one another, so that the side-information vector enables the detector to decide on the true reproduction vector with high reliability, and each reproduction vector is surrounded by a quantization cell, which corresponds to all source vectors that are mapped to it in the quantization process. The quantization cell should be sufficiently “small”, so that as long as the detector correctly decodes the reproduction vector, the detection reliability given the reproduction vector is close to the detection reliability given the true vector. A CD code which corresponds to a binning-based scheme will contain source vectors which form a good channel code for the channels $P_{Y|X}$ and $\bar{P}_{Y|X}$, so that the side-information vector can be used to decode the true vector with high reliability. A CD code which corresponds to a quantization-based scheme will look like the quantization cell of a single reproduction vector. The various types of CD codes are depicted in Fig. 3.

Since a single bin of a quantization-and-binning DHT system corresponds to a superposition code, it should be drawn from an hierarchical ensemble, which we next define:

###### Definition 7.

A fixed-composition hierarchical ensemble for a type $Q_X$ and rate $\rho$ is defined by a conditional type $Q_{U|X}$, where $U$ is an auxiliary random variable taking values in a finite alphabet $\mathcal{U}$, a cloud-center rate $\rho_{\mathrm{c}}$, and a satellite rate $\rho_{\mathrm{s}}$, such that $\rho_{\mathrm{c}}+\rho_{\mathrm{s}}=\rho$. A codebook from this ensemble is drawn in two stages. First, $e^{n\rho_{\mathrm{c}}}$ cloud centers are drawn, independently and uniformly over the type class of $U$. Second, for each of the cloud centers $u$, $e^{n\rho_{\mathrm{s}}}$ satellites are drawn independently and uniformly over the conditional type class of $u$. When considered a random entity, the codebook will be denoted by $\mathbf{C}_n$.
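The two-stage draw can be sketched schematically. The compositions and rates below are hypothetical bookkeeping (uniform draws from type classes are implemented via random shuffles of a fixed-composition template), and, for simplicity, distinctness of satellites, which the formal definition requires, is not enforced here.

```python
import random
from collections import Counter

def draw_fixed_composition(base):
    """Uniform draw from the type class of `base` via a random permutation."""
    seq = list(base)
    random.shuffle(seq)
    return tuple(seq)

def draw_conditional(u, comp):
    """Draw x uniformly from the conditional type class given cloud center u:
    comp[s] is the fixed composition of x on the positions where u equals s."""
    x = [None] * len(u)
    for s, symbols in comp.items():
        pos = [i for i, ui in enumerate(u) if ui == s]
        syms = list(symbols)
        random.shuffle(syms)
        for i, sym in zip(pos, syms):
            x[i] = sym
    return tuple(x)

def hierarchical_codebook(u_base, comp, n_clouds, n_satellites):
    """Two-stage draw: cloud centers uniform over the type class of u_base,
    then i.i.d. satellites per cloud from the conditional type class."""
    codebook = []
    for _ in range(n_clouds):
        u = draw_fixed_composition(u_base)
        codebook.append((u, [draw_conditional(u, comp) for _ in range(n_satellites)]))
    return codebook

# toy example: binary U and X, blocklength 8
u_base = (0, 0, 0, 0, 1, 1, 1, 1)          # type Q_U = (1/2, 1/2)
comp = {0: (0, 0, 0, 1), 1: (0, 1, 1, 1)}  # conditional compositions of X given U
book = hierarchical_codebook(u_base, comp, n_clouds=3, n_satellites=4)
```

Every satellite then has the same overall composition (here four zeros and four ones), so the whole codebook is fixed-composition, while satellites of a common cloud are statistically dependent through their shared center.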

Evidently, codewords which pertain to the same cloud are dependent, whereas codewords from different clouds are independent. Further, the ordinary ensemble, in which each codeword is drawn uniformly over $\mathcal{T}_n(Q_X)$, independently of all other codewords, can be obtained as a special case by choosing $U$ to be degenerate and $\rho_{\mathrm{s}}=\rho$. On the other hand, when $\rho_{\mathrm{c}}=0$, then a single cloud center (or a sub-exponential number of cloud centers) is drawn, and all codewords are satellites of this center. This corresponds to a bin of a DHT system which is based only on quantization, without binning. More generally, for source vectors of type $Q_X$, a quantization-and-binning scheme of rate $R$, binning rate $R_b$, and a given quantization rate, leads to CD codes of rate