I Introduction
In the hypothesis-testing (HT) problem, a detector needs to decide between two hypotheses regarding the underlying distribution of some observed data; the hypotheses are commonly known as the null hypothesis and the alternative hypothesis. Two types of error probability are defined: the type 1 error probability of deciding on the alternative hypothesis when the null hypothesis prevails, and the type 2 error probability for the opposite event. The celebrated Neyman-Pearson lemma (e.g., [1, Prop. II.D.1]) states that the detector which achieves the optimal tradeoff between the two error probabilities compares the likelihood ratio to a threshold. In the context of this work, we only consider data which is a sequence of independent and identically distributed observations, and therefore the hypotheses correspond to the distribution of a single random variable. In the information-theoretic literature (e.g., [2], [3, Ch. 11], [4, Ch. 1], [5, Sec. 2]), large-deviations theory, most notably Sanov’s theorem, is usually applied to this problem. In particular, Stein’s theorem provides the largest exponential decrease rate of the type 2 error probability when the type 1 error probability is bounded away from one. More generally, the reliability function, i.e., the optimal tradeoff between the two error exponents when both are strictly positive, is also known.
As a special case of the HT problem, one may consider a pair of random variables $(X,Y)$, instances of which are fully observed by the detector. If, however, the detector does not directly observe the data sequence, the characterization of the optimal performance is much more challenging. A common model for such a scenario is known as distributed hypothesis-testing (DHT), and its reliability function is the subject of this paper.
The DHT model was introduced by Berger [6] as an example of a problem at the intersection of multiterminal information theory and statistical inference. In this model, one encoder observes $X^n$, while the other observes the corresponding $Y^n$; they produce codewords to be sent over limited-rate noiseless links to a common detector. The goal is to characterize the optimal detection performance under a given pair of encoding rates. As a starting point, an asymmetric model (sometimes referred to as the side-information case) is usually studied, in which the observations $Y^n$ are fully available to the detector, and thus there is only a single rate constraint.
The first major breakthrough on this problem was a notable paper by Ahlswede and Csiszár [7], who addressed a special scenario termed testing against independence. In this case, the null hypothesis states that $(X,Y)$ have a given joint distribution, whereas the alternative hypothesis states that they are independent, but with the same marginals as in the null hypothesis. This case is special since the Kullback–Leibler divergence, which is usually associated with Stein’s exponent of HT, can be identified as the mutual information between $X$ and $Y$ under the null hypothesis, which is naturally related to compression rates. This allowed the authors to use distributed compression, specifically quantization-based encoding techniques from [8, 9], to derive Stein’s exponent for the side-information testing-against-independence problem [7, Th. 2]. Quantization-based encoding was also used to obtain an achievable Stein’s exponent for a general pair of memoryless hypotheses [7, Th. 5], but without a converse bound.
As summarized in [10, Sec. IV], later progress on this problem for a pair of general hypotheses was non-consecutive, and contributions were made by several groups of researchers. First, in [11], the achievable bound on Stein’s exponent from [7, Th. 5] was improved, and also generalized beyond the side-information case. Then, in [12], achievable bounds on the full tradeoff between the two types of exponents were derived. In [13], Stein’s exponent for side-information cases was further significantly improved using binning, as will be described in the sequel.
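As a quick numerical illustration of the identity mentioned above (in testing against independence, the divergence associated with Stein’s exponent equals the mutual information), the following Python sketch computes the divergence between a hypothetical joint pmf and the product of its marginals, and compares it with the mutual information computed from entropies. The pmf is an arbitrary choice for illustration, and no encoding is involved.

```python
import math

def kl_divergence(p, q):
    """D(p||q) in nats; p and q are dicts over the same finite alphabet."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)

def marginals(p_xy):
    """Marginal pmfs of X and Y from a joint pmf keyed by (x, y)."""
    p_x, p_y = {}, {}
    for (x, y), pr in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + pr
        p_y[y] = p_y.get(y, 0.0) + pr
    return p_x, p_y

def entropy(dist):
    """Entropy in nats of a pmf given as a dict."""
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) in nats."""
    p_x, p_y = marginals(p_xy)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

# Hypothetical joint pmf of (X, Y) under the null hypothesis.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Alternative hypothesis: independence with the same marginals.
p_x, p_y = marginals(p_xy)
q_xy = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

stein_exponent = kl_divergence(p_xy, q_xy)
print(f"D(P||Q) = {stein_exponent:.4f}, I(X;Y) = {mutual_information(p_xy):.4f}")
```

The two printed values coincide, since both are computed for the same pair of distributions through different formulas.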
Interestingly, when either the $X$-marginal or the $Y$-marginal of the two hypotheses differs, positive exponents can be obtained even with zero-rate encoders [10, Sec. V]. [Footnote 1: There is also a variant of one-bit encoding; see [10, Sec. V] and [14].] For this case, achievable Stein’s exponents and exponential tradeoffs were derived in [10, Th. 5.5] and [11, 12, 15], along with matching converse bounds under some additional assumptions.
In the last decade, a renewed interest in the problem arose, aimed both at tackling more elaborate models and at improving the results on the basic model. As for the former, Stein’s exponents under positive rates were explored in successive-refinement models [16], for multiple encoders [17], for interactive models [18, 19], under privacy constraints [20], combined with lossy compression [21], over noisy channels [22, 23], for multiple decision centers [24], and over multi-hop networks [25]. Exponents for the zero-rate problem were studied under a restricted detector structure [14] and for multiple encoders [26]. The finite-blocklength and second-order regimes were addressed in [27].
As for the basic model, which this work also investigates, the encoding approach used in [13] is currently the best known. It is based on quantization and binning, just as used, e.g., for distributed lossy compression (the Wyner-Ziv problem [28, Ch. 11], [29]). First, the encoding rate is reduced by quantization of the source vector to a reproduction vector chosen from a codebook. Second, the rate is further reduced by binning of the reproduction vectors. As the detector is equipped with side information, it can identify the true reproduction vector with high reliability. In the context of DHT, it can then decide on one of the hypotheses using this reproduction and the side information. In [17] it was shown that a quantization-and-binning scheme achieves the optimal Stein’s exponent in a testing-against-conditional-independence problem (as well as in a model inspired by the Gel’fand-Pinsker problem [30], and also in a Gaussian model). In [31], the quantization-and-binning scheme was shown to be necessary for the case of DHT with degraded hypotheses. In [32, 33], improved exponents were derived (for a doubly-symmetric binary source) by refining the analysis of the effect of binning errors. In addition, a full achievable exponent tradeoff was presented, and Körner-Marton coding [34] was used in order to extend the analysis to the symmetric-rate case (for the symmetric source). Finally, [35] suggested an improved detection rule, in which the reproduction vectors in the bin are examined one by one, and the null hypothesis is declared if a single vector is jointly typical with the side-information vector.
It is evident that the detectors above are all suboptimal, as they are based upon the two-stage process of reproduction and then testing. However, the decoding of the source vector (or its quantized version) is superfluous for the DHT system, as the only requirement from the latter is to distinguish between the hypotheses. In fact, this detection procedure is suboptimal even if quantization is used without binning, and only the second stage of the detector is employed. While [35] offered an improved detector, it is still suboptimal. [Footnote 2: Recently, in the zero-rate regime, [27] considered the use of an optimal Neyman-Pearson-like detector, rather than the possibly suboptimal Hoeffding-like detector [36] that was used in [12].]
In this work, we investigate the performance of the optimal detector. In fact, the form of the optimal detector directly follows from the standard Neyman-Pearson lemma (see Section III). Nonetheless, analyzing its performance is highly nontrivial for DHT.
Specifically, we study the reliability function in the side-information case, guided by the following methodology. Recall that for distributed lossless compression systems (the Slepian-Wolf problem [28, Ch. 10], [37]), the side information helps to fully reproduce the source. The concept of binning the source vectors was originally conceived for this problem, and a common wisdom states that the source vectors which belong to the same bin should constitute a good channel code for the memoryless channel induced by the conditional distribution of $Y$ given $X$. This intuition was made precise in [38, Th. 1], [39, 40], which showed that the reliability function [Footnote 3: The reliability function of distributed lossless compression is the optimal exponential decrease rate of the error probability as a function of the compression rate.] of distributed compression is directly related to the reliability of channel coding. The idea is to use structured binning [Footnote 4: Also mentioned in [17] for the DHT problem, though recognized as inessential.], using a sequence of channel codes which achieves the channel reliability function. At a given blocklength, such a channel code corresponds to one bin of the distributed compression system. [Footnote 5: More precisely, this is done type-by-type, i.e., per the subsets of source sequences that share the same empirical distribution.] All other bins of the system are generated by permuting [Footnote 6: This permutation technique, originally developed in [38, 41], will be useful here too, and will be reviewed in more detail in what follows.] the source vectors of the first bin. Due to the memoryless nature of the problem, all bins generated this way are essentially as good as the original one, and this allows one to directly link the reliability function of distributed compression to that of channel coding.
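The invariance that makes permuted bins essentially as good as the original one can be sanity-checked on a toy example. The Python sketch below, with a hypothetical first bin and a hypothetical binary symmetric channel, verifies that a coordinate permutation keeps every vector in its type class, preserves all pairwise Hamming distances within the bin, and leaves memoryless likelihoods unchanged when the output is permuted in the same way. It only illustrates the invariance; it is not the construction of [38, 41].

```python
import itertools
import math
import random

def permute(vec, perm):
    """Coordinate permutation: the i-th output symbol is vec[perm[i]]."""
    return tuple(vec[j] for j in perm)

def symbol_counts(vec):
    """Unnormalized type of a vector (number of occurrences of each symbol)."""
    counts = {}
    for a in vec:
        counts[a] = counts.get(a, 0) + 1
    return counts

def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def likelihood(y, x, channel):
    """W^n(y|x) = prod_i W(y_i|x_i) for a memoryless channel, channel[(y, x)] = W(y|x)."""
    return math.prod(channel[(b, a)] for b, a in zip(y, x))

# Hypothetical first bin: binary vectors of length 6 sharing one type (three ones).
first_bin = [(1, 1, 1, 0, 0, 0), (1, 0, 1, 0, 1, 0), (0, 0, 1, 1, 1, 0)]
rng = random.Random(0)
perm = list(range(6))
rng.shuffle(perm)
second_bin = [permute(x, perm) for x in first_bin]

bsc = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9}  # hypothetical BSC(0.1)
y = (0, 1, 1, 0, 1, 0)                                       # hypothetical output vector

same_types = all(symbol_counts(a) == symbol_counts(b) for a, b in zip(first_bin, second_bin))
same_distances = all(
    hamming(a, b) == hamming(c, d)
    for (a, b), (c, d) in zip(itertools.combinations(first_bin, 2),
                              itertools.combinations(second_bin, 2)))
same_likelihoods = all(
    math.isclose(likelihood(y, x, bsc), likelihood(permute(y, perm), xp, bsc))
    for x, xp in zip(first_bin, second_bin))
print(same_types, same_distances, same_likelihoods)
```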
While the reliability function of channel codes is itself only known above the critical rate [4, Corollary 10.4], this characterization nonetheless has two advantages. First, analyzing channel codes is simpler than analyzing distributed compression systems. Second, any bound on the reliability of channel codes immediately translates into a bound on the reliability of distributed compression systems. Specifically, the expurgated bound [4, Problem 10.18] and the sphere-packing bound [4, Th. 10.3] can be immediately used, rather than just a random-coding bound [4, Th. 10.2]. Noting the similarity between the distributed compression problem and the DHT problem, it is natural to ask whether structured binning is useful for the DHT problem, and what the properties of a “good” bin for the DHT problem are.
To address these questions, we introduce the concept of channel-detection (CD) codes. Such codes are not required to carry information, but are rather designed for the task of distinguishing between two possible channel distributions. [Footnote 7: In [42], a somewhat different channel-detection setting is considered, where the code is required to simultaneously be a good channel code (in the ordinary sense) and a good CD code. In this work, it will only be required that the codewords of the CD code are different from one another.] Namely, a codeword is chosen from the code uniformly at random, and the detector should decide on the prevailing channel based only on the output vector (and its knowledge of the codebook). It will be evident that this is the same problem encountered by the detector of a DHT system, given the encoded message. It will be shown that optimal DHT systems (in the exponential sense) can be generated by optimal CD codes, just as distributed compression systems are generated from ordinary channel codes. From this observation, the close relation between the reliability of DHT systems and that of CD codes will be established. An illustration of the analogy between the distributed compression/channel coding relation and the DHT/CD code relation appears in Fig. 1 (all quantities there will be formally defined in the sequel).
This intimate connection allows us to derive bounds on the reliability function of DHT using bounds on the reliability of CD codes. Concretely, we will derive both random-coding bounds and expurgated bounds on the reliability of CD codes under the optimal Neyman-Pearson detector. The analysis goes beyond that of [42] in two respects: first, it is based on a Chernoff-distance characterization of the optimal exponents, which leads to simpler single-letter bounds; and second, the analysis is performed for a hierarchical ensemble [Footnote 8: Yielding superposition codes [43], such as the ones used for the broadcast channel; see, e.g., [28, Ch. 5].], corresponding to quantization-and-binning schemes.
The outline of the rest of the paper is as follows. The system model and preliminaries, such as notation conventions and background on ordinary HT, will be given in Section II. The main result of the paper, namely an achievable bound on the reliability function of DHT under optimal detection, will be stated in Section III, along with some consequences. For the sake of proving these bounds, the reduction of the DHT reliability problem to the CD reliability problem will be considered in Section IV. While only achievability bounds will ultimately be derived, the reduction to CD codes has both an achievability part and a converse part. The derivation of single-letter achievable bounds on the reliability of CD codes will be considered in Section V, and the achievability bounds on the DHT reliability will immediately follow. Afterwards, computational aspects are discussed and a numerical example is given in Section VI. Several directions for further research are highlighted in Section VII.
II System Model
II-A Notation Conventions
Throughout the paper, random variables will be denoted by capital letters, specific values they may take will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors and their realizations will be superscripted by their dimension. For example, the random vector $X^n=(X_1,\ldots,X_n)$ ($n$ a positive integer) may take a specific vector value $x^n=(x_1,\ldots,x_n)\in\mathcal{X}^n$, the $n$th order Cartesian power of $\mathcal{X}$, which is the alphabet of each component of this vector. The Cartesian product of $\mathcal{X}$ and $\mathcal{Y}$ (finite alphabets) will be denoted by $\mathcal{X}\times\mathcal{Y}$.
We will follow the standard notation conventions for probability distributions, e.g., will denote the probability of the letter under the distribution . The arguments will be omitted when we address the entire distribution, e.g., . Similarly, generic distributions will be denoted by , , and in other similar forms, subscripted by the relevant random variables/vectors/conditionings, e.g., , . The joint distribution induced by and will be denoted by .
In what follows, we will extensively utilize the method of types [4, 44] and use the following notation. The type class of a type at blocklength , i.e., the set of all with empirical distribution , will be denoted by . The set of all type classes of vectors of length from will be denoted by , and the set of all possible types over will be denoted by . Similar notation will be used for pairs of random variables (and larger collections), e.g., , and . The conditional type class of for a conditional type , namely, the subset of such that the joint type of is , will be denoted by (sometimes called the Q-shell of [4, Definition 2.4]). For a given , the conditional type classes such that is not empty when will be denoted by . The probability simplex for an alphabet will be denoted by .
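The type-class notation can be made concrete with a short Python sketch (the alphabet and symbol counts are hypothetical). It computes the size of a type class as a multinomial coefficient, checks it against brute-force enumeration, and shows the standard exponential-scale behavior of the method of types [4, 44]: the normalized log-size of the type class approaches the entropy of the type, within the well-known bounds $(n+1)^{-|\mathcal{A}|}e^{nH(P)}\le|\mathcal{T}(P)|\le e^{nH(P)}$.

```python
import math
from collections import Counter
from itertools import product

def type_class_size(counts):
    """|T(P)| = n! / prod_a (n P(a))!, where counts[a] = n P(a) are the symbol counts."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)  # each intermediate quotient is an exact integer
    return size

def entropy(p):
    """Entropy of a pmf in nats."""
    return -sum(pa * math.log(pa) for pa in p if pa > 0)

# Brute-force check on a small case: ternary alphabet, n = 6, counts (3, 2, 1).
alphabet, counts = (0, 1, 2), (3, 2, 1)
brute = sum(1 for seq in product(alphabet, repeat=sum(counts))
            if tuple(Counter(seq)[a] for a in alphabet) == counts)
assert brute == type_class_size(counts)

# Exponential scale: (1/n) log |T(P)| approaches H(P) as n grows, for a fixed type P.
p = (1 / 2, 1 / 3, 1 / 6)
rates = {}
for n in (6, 60, 600):
    c = tuple(round(n * pa) for pa in p)
    rates[n] = math.log(type_class_size(c)) / n
    print(n, round(rates[n], 4), "vs H(P) =", round(entropy(p), 4))
```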
For two distributions over the same finite alphabet , the variational distance ($\ell_1$ norm) will be denoted by
When optimizing a function of a distribution over a probability simplex , the explicit display of the constraint will be omitted. For example, for a function , will be used instead of .
The expectation operator with respect to a given distribution, e.g., , will be denoted by , where the subscript will be omitted if the underlying probability distribution is clear from the context. In general, information-theoretic quantities will be denoted by the standard notation [3], with a subscript indicating the distribution of the relevant random variables, e.g., , under . As an exception, the entropy of under will be denoted by . The binary entropy function will be denoted by for . The conditional Kullback–Leibler divergence between two conditional distributions, e.g., and , when averaged with respect to the distribution , will be denoted by , and in case that is degenerate, the notation will be simplified to .
The Hamming distance between two vectors and will be denoted by . The complement of a set will be denoted by . For a finite multiset , the number of distinct elements will be denoted by . The probability of the event will be denoted by , and its indicator function will be denoted by .
For two positive sequences $\{a_n\}$ and $\{b_n\}$, the notation $a_n\doteq b_n$ will mean asymptotic equivalence on the exponential scale, that is, $\lim_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n}=0$. Similarly, $a_n\,\dot{\le}\,b_n$ will mean $\limsup_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n}\le 0$, and so on. The notation will mean that . The ceiling function will be denoted by . The notation will stand for . Logarithms and exponents will be understood to be taken to the natural base. Throughout, for the sake of brevity, we will ignore integer constraints on large numbers. For example, will be written as . For , the set will be denoted by .
II-B Ordinary Hypothesis-Testing
Before getting into the distributed scenario, we briefly review in this section ordinary HT between a pair of hypotheses. Consider a random variable over a finite alphabet , whose distribution under the hypothesis (respectively, ) is given by (respectively, ). It is common in the literature to refer to (respectively, ) as the null hypothesis (respectively, the alternative hypothesis). However, we will refrain from using such terminology, and the two hypotheses will be considered to have equal stature.
Given i.i.d. observations , a (possibly randomized) detector
has type 1 and type 2 error probabilities [Footnote 9: Also called the false-alarm probability and the misdetection probability in engineering applications.] given by
(1) 
and
(2) 
The family of detectors which optimally trades off the two types of error probabilities is given by the Neyman-Pearson lemma [1, Prop. II.D.1], [3, Th. 11.7.1] as [Footnote 10: Since the two distributions are assumed to be discrete, randomized tie-breaking should be used if a given constraint on one of the error probabilities is to be met exactly. Nonetheless, this randomization has no effect on the exponential behavior, which is the focus of this paper (in the distributed scenario). Thus, we will not dwell on randomized detectors in what follows.]
(3) 
where is a threshold parameter. This parameter controls the tradeoff between the probabilities of the two types of error: the larger it is, the larger the type 1 error probability and the smaller the type 2 error probability, and vice versa.
To describe bounds on the error probabilities of the optimal detector, let us define the hypothesis-testing reliability function [2, Section II] as
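To make the role of the threshold concrete, the following Python sketch computes the exact error probabilities of a likelihood-ratio test between two hypothetical Bernoulli distributions from $n$ i.i.d. samples, using the number of ones as a sufficient statistic. The test decides on the first hypothesis whenever the log-likelihood ratio is at least $n$ times the threshold; this normalization is our assumption and may differ from the exact form of (3). Randomized tie-breaking is omitted (ties are resolved in favor of the first hypothesis), which, as noted above, does not affect the exponential behavior.

```python
import math

def np_error_probabilities(p0, p1, n, threshold):
    """Exact (type 1, type 2) error probabilities of a likelihood-ratio test between
    Bernoulli(p0) (first hypothesis) and Bernoulli(p1) (second hypothesis).
    Decide on the first hypothesis iff log[P0(x^n)/P1(x^n)] >= n * threshold."""
    type1 = type2 = 0.0
    for k in range(n + 1):  # k = number of ones, a sufficient statistic
        llr = k * math.log(p0 / p1) + (n - k) * math.log((1 - p0) / (1 - p1))
        prob0 = math.comb(n, k) * p0**k * (1 - p0)**(n - k)
        prob1 = math.comb(n, k) * p1**k * (1 - p1)**(n - k)
        if llr >= n * threshold:
            type2 += prob1  # decided on the first hypothesis while the second holds
        else:
            type1 += prob0  # decided on the second hypothesis while the first holds
    return type1, type2

# Hypothetical parameters: Bernoulli(0.2) versus Bernoulli(0.5), n = 50.
for t in (-0.2, 0.0, 0.2):
    t1, t2 = np_error_probabilities(0.2, 0.5, 50, t)
    print(f"threshold={t:+.1f}: type 1 = {t1:.3e}, type 2 = {t2:.3e}")
```

Sweeping the threshold upward shrinks the acceptance region of the first hypothesis, so the type 1 error probability grows and the type 2 error probability decreases, in line with the discussion above.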
(4) 
For brevity, we shall omit the dependence on as they remain fixed and can be understood from context. As is well known [2, Th. 3], for a given , there exists a proper choice of such that
(5) 
(6) 
Furthermore, it is also known that this exponential behavior is optimal [2, Corollary 2], in the sense that if
then
It should be noted, however, that the detector (3) and the bounds on its error probability (5)-(6) are exactly optimal for any given . In fact, in what follows, we will use these relations for .
The function is known to be a convex function of , continuous on and strictly decreasing prior to any interval on which it is constant [2, Th. 3]. Furthermore, it is known [2, Th. 7] that it can be represented as
(7) 
where
(8) 
is the Chernoff distance between distributions. The representation (7) will be used in the sequel to derive bounds on the reliability of DHT systems.
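The following Python sketch illustrates a Chernoff-type representation numerically. Since the exact parametrization of (7)-(8) is not reproduced here, it uses one standard form, stated as an assumption: the Chernoff distance $d_s(P,Q)=-\log\sum_a P(a)^{1-s}Q(a)^s$ for $s\in[0,1]$, the tradeoff $E(\alpha)=\min_{\tilde{P}:D(\tilde{P}\|P)\le\alpha}D(\tilde{P}\|Q)$, and its dual form $E(\alpha)=\sup_{0<s\le1}[d_s(P,Q)-(1-s)\alpha]/s$. The check uses the tilted distributions $P_\lambda\propto P^{1-\lambda}Q^{\lambda}$, which trace the optimal tradeoff curve. The two distributions are hypothetical.

```python
import math

def chernoff_distance(p, q, s):
    """d_s(P,Q) = -log sum_a P(a)^(1-s) Q(a)^s, for 0 <= s <= 1 (natural log)."""
    return -math.log(sum(pa ** (1 - s) * qa ** s for pa, qa in zip(p, q)))

def kl(p, q):
    """D(P||Q) in nats."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)

def tilted(p, q, lam):
    """Tilted distribution P_lam(a), proportional to P(a)^(1-lam) Q(a)^lam."""
    w = [pa ** (1 - lam) * qa ** lam for pa, qa in zip(p, q)]
    z = sum(w)
    return [wa / z for wa in w]

def reliability_via_chernoff(p, q, alpha, grid=2000):
    """Grid approximation of sup_{0<s<=1} [d_s(P,Q) - (1-s)*alpha] / s."""
    return max((chernoff_distance(p, q, s) - (1 - s) * alpha) / s
               for s in (k / grid for k in range(1, grid + 1)))

p = [0.6, 0.3, 0.1]  # hypothetical distribution under the first hypothesis
q = [0.2, 0.3, 0.5]  # hypothetical distribution under the second hypothesis

for lam in (0.25, 0.5, 0.75):
    p_lam = tilted(p, q, lam)
    alpha = kl(p_lam, p)   # type 1 exponent of the tilted point
    direct = kl(p_lam, q)  # type 2 exponent of the tilted point
    dual = reliability_via_chernoff(p, q, alpha)
    print(f"alpha={alpha:.4f}  D(P_lam||Q)={direct:.6f}  Chernoff form={dual:.6f}")

# At alpha = 0 the Chernoff form approaches Stein's exponent D(P||Q) (up to grid error).
print(reliability_via_chernoff(p, q, 0.0), kl(p, q))
```

For $\alpha=D(P_\lambda\|P)$, the supremum in the dual form is attained at $s=\lambda$, which is why the grid (containing $\lambda$) reproduces $D(P_\lambda\|Q)$ essentially exactly.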
II-C Distributed Hypothesis-Testing
Let $\{(X_i,Y_i)\}_{i=1}^n$ be independent copies of a pair of random variables $(X,Y)\in\mathcal{X}\times\mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets. Under , the joint distribution of $(X,Y)$ is given by , whereas under , this distribution is given by . We assume that both probability measures are absolutely continuous with respect to one another, i.e., , as well as , [Footnote 11: This implies that both types of Stein’s exponent for this problem are finite.] and thus it can be assumed without loss of generality (w.l.o.g.) that and . For brevity, we denote the probability of an event under (respectively, ) by [respectively, ].
A DHT system , as depicted in Fig. 2, is defined by an encoder
which maps a source vector into an index , and a detector (possibly randomized) [Footnote 12: Randomized encoding can also be defined. In this case, the encoder takes the form , where is a probability vector whose th entry is the probability of mapping to the index . In the sequel, we will also use a rather simple form of randomized encoding, which does not require this general definition. There, the source vector will be used to randomly generate a new source vector , and the latter will be encoded by a deterministic encoder (see the proof of the achievability part of Theorem 6 in Appendix B-A).]
The inverse image of for , i.e.,
is called the bin associated with index . The rate of is defined as , the type 1 error probability of is defined as
and the type 2 error probability is defined as
In the sequel, conditional error probabilities given an event will be abbreviated as, e.g.,
A sequence of DHT systems will be denoted by . The sequence is associated with two different exponents for each of the two probabilities defined above. The infimum type 1 exponent of a sequence of systems is defined by
(9) 
and the supremum type 1 exponent is defined by
(10) 
Analogous exponents can be defined for the type 2 error probability.
The reliability function of a DHT system is the optimal tradeoff between the two types of exponents achieved by any encoder-detector pair of a given rate . Specifically, the infimum DHT reliability function is defined by
and the supremum DHT reliability function is analogously defined, albeit with a . For brevity, the dependence on will be omitted henceforth whenever it is understood from context.
III Main Result: Bounds on the Reliability Function of DHT
To begin the discussion on the reliability function of DHT systems, we note that for a given encoder, the optimal detector is just a comparison of likelihoods, as in ordinary HT. Indeed, it readily follows from (3) that the optimal detector has the form
(11) 
for some which sets the tradeoff between the two error probabilities.
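A toy, exact computation may help to see what (11) does. For a given encoder, the detector observes the bin index together with $Y^n$, and the likelihood of this pair under each hypothesis is a sum, over the source vectors in the bin, of the joint probabilities. The Python sketch below enumerates all sequences for a small blocklength and compares three hypothetical encoders (identity, parity, and constant) on a hypothetical source pair. The threshold parametrization (decide on the first hypothesis iff its likelihood is at least $e^{n\cdot\text{threshold}}$ times the other) is our assumption. With threshold zero, the detector minimizes the sum of the two error probabilities, so a coarser encoder can never do better by that measure.

```python
import math
from itertools import product

def dht_errors(p_xy, q_xy, n, encoder, threshold=0.0):
    """Exact (type 1, type 2) error probabilities of the likelihood-ratio detector
    for a given encoder f: X^n -> index, with Y^n observed directly by the detector."""
    xs = sorted({x for x, _ in p_xy})
    ys = sorted({y for _, y in p_xy})
    lik0, lik1 = {}, {}  # likelihoods of (bin index, y^n) under each hypothesis
    for xn in product(xs, repeat=n):
        i = encoder(xn)
        for yn in product(ys, repeat=n):
            key = (i, yn)
            lik0[key] = lik0.get(key, 0.0) + math.prod(p_xy[(a, b)] for a, b in zip(xn, yn))
            lik1[key] = lik1.get(key, 0.0) + math.prod(q_xy[(a, b)] for a, b in zip(xn, yn))
    type1 = type2 = 0.0
    for key, l0 in lik0.items():
        if l0 >= math.exp(n * threshold) * lik1[key]:
            type2 += lik1[key]  # decided on the first hypothesis while the second holds
        else:
            type1 += l0         # decided on the second hypothesis while the first holds
    return type1, type2

# Hypothetical hypotheses: a correlated binary pair versus an independent uniform pair.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
q_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

encoders = {  # hypothetical encoders of decreasing rate
    "identity (no compression)": lambda xn: xn,
    "parity (one bit)": lambda xn: sum(xn) % 2,
    "constant (zero rate)": lambda xn: 0,
}
for name, f in encoders.items():
    t1, t2 = dht_errors(p_xy, q_xy, 3, f)
    print(f"{name}: type 1 = {t1:.4f}, type 2 = {t2:.4f}, sum = {t1 + t2:.4f}")
```

In this example, the constant encoder leaves only $Y^n$, whose distribution is the same under both hypotheses, so the sum of the error probabilities equals one.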
Hence, the characterization of the DHT reliability function is reduced to finding optimal encoders, to wit, a partition of the alphabet into bins, and then, for a given sequence of optimal encoders, finding single-letter expressions for the resulting error exponents under optimal detection (11). These problems are much more challenging than the characterization of the optimal detector. Nonetheless, just as in the distributed compression and channel coding problems mentioned in the introduction, achievability bounds can be derived using random-coding ensembles. Specifically, the main result of this paper, which we next describe, is a random-coding bound and an expurgated bound, obtained under an optimal Neyman-Pearson detector. Before that, we state the trivial converse bound, obtained when the source vector is not compressed, or alternatively, when (immediately deduced from the discussion in Section II-B).
Proposition 1.
The supremum DHT reliability function is bounded as
To state our achievability bound, we will need some additional notation. We denote the Chernoff distance between symbols by
(12) 
and between vectors by
(13) 
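Whatever the exact form of (12)-(13), one property that vector-level Chernoff quantities inherit from memoryless channels is additivity over time. The Python sketch below assumes, as a hypothetical per-symbol form, the Chernoff distance between the output distribution of the channel under one hypothesis with input $x$ and the output distribution of the channel under the other hypothesis with input $\tilde{x}$; it verifies by brute force over all output vectors that the corresponding quantity for product (memoryless) channels equals the sum of the per-symbol terms. The channels and input vectors are hypothetical.

```python
import math
from itertools import product

def chernoff_symbols(w0, w1, x, x_tilde, s, ys):
    """-log sum_y W0(y|x)^(1-s) * W1(y|x_tilde)^s (assumed per-symbol form)."""
    return -math.log(sum(w0[(y, x)] ** (1 - s) * w1[(y, x_tilde)] ** s for y in ys))

def chernoff_vectors_bruteforce(w0, w1, xn, xn_tilde, s, ys):
    """Same quantity for the product output distributions, summing over all y^n."""
    total = 0.0
    for yn in product(ys, repeat=len(xn)):
        a = math.prod(w0[(y, x)] for y, x in zip(yn, xn))
        b = math.prod(w1[(y, x)] for y, x in zip(yn, xn_tilde))
        total += a ** (1 - s) * b ** s
    return -math.log(total)

ys = (0, 1)
# Hypothetical channels, w[(y, x)] = W(y|x), under the two hypotheses.
w0 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}
w1 = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 0.5}
xn, xn_tilde, s = (0, 1, 1, 0), (0, 0, 1, 1), 0.4

additive = sum(chernoff_symbols(w0, w1, a, b, s, ys) for a, b in zip(xn, xn_tilde))
brute = chernoff_vectors_bruteforce(w0, w1, xn, xn_tilde, s, ys)
print(additive, brute)
```

The agreement follows because the sum over $y^n$ of a product of per-letter terms factorizes into a product of per-letter sums.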
Further, when are distributed according to we define the average Chernoff distance as
(14) 
and when is distributed according to , we denote, for brevity,
(15) 
Next, we denote
(16) 
and
(17) 
as well as
(18) 
which are all related to a random-coding-based bound on the reliability function. We also denote
(19) 
which is related to an expurgation-based bound on the reliability function. Finally, we also denote
(20) 
For brevity, arguments such as will sometimes be omitted henceforth.
Theorem 2.
The infimum DHT reliability function is bounded as
(21) 
As hinted by (20), the better of a random-coding bound and an expurgated bound can be chosen for any given input type . In the case of a random-coding bound, the achieving scheme is based on quantization and binning. In this respect, for such (with ), the conditional type is the test channel for quantizing source vectors into one of possible reproduction vectors, where the quantization rate satisfies . Then, these reproduction vectors are grouped into bins of size (at most) each, and the binning rate satisfies . Both and may be optimized, separately for any given , to obtain the best type 2 exponent. In case the expurgated bound is better than the random-coding bound for , the scheme which achieves it is based on binning at rate , without quantization.
We next discuss several implications of Theorem 2. First, simpler bounds, perhaps at the cost of worse exponents, can be obtained by considering two extremal choices. To obtain a binning-based scheme, without quantization, we choose to be a degenerate random variable (deterministic, i.e., ) and . We then get that dominates the minimization in (18), and
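The rate bookkeeping of a quantization-and-binning scheme can be illustrated with simple arithmetic. The Python sketch below assumes, as our own reading for illustration only, that the binning rate is the exponent of the bin size, so that roughly $e^{nR_{\mathrm{q}}}$ reproduction vectors split into $e^{nR}$ bins leave about $e^{n(R_{\mathrm{q}}-R)}$ reproduction vectors per bin. The names quant_rate and enc_rate are hypothetical, and the round-robin assignment is a stand-in for the random or structured binning used in the analysis.

```python
import math

def quantize_and_bin(n, quant_rate, enc_rate):
    """Hypothetical bookkeeping: ~exp(n*quant_rate) reproduction indices are split
    into ~exp(n*enc_rate) bins; returns the number of reproductions and the bins."""
    num_reproductions = math.ceil(math.exp(n * quant_rate))
    num_bins = math.ceil(math.exp(n * enc_rate))
    # Round-robin assignment: reproduction index j goes to bin j mod num_bins.
    bins = [list(range(b, num_reproductions, num_bins)) for b in range(num_bins)]
    return num_reproductions, bins

n, quant_rate, enc_rate = 20, 0.5, 0.3  # hypothetical blocklength and rates (nats)
num_reproductions, bins = quantize_and_bin(n, quant_rate, enc_rate)
max_bin = max(len(b) for b in bins)
binning_rate = math.log(max_bin) / n  # normalized log of the largest bin size
print(num_reproductions, len(bins), max_bin, round(binning_rate, 4))
```

The normalized log of the largest bin size comes out close to quant_rate minus enc_rate, the gap shrinking as $n$ grows.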
(22)  
(23) 
To obtain a quantization-based scheme, without binning, we choose , and limit to satisfy .
Second, if the rate is large enough, then no loss is expected in the reliability function of DHT. We can deduce from Theorem 2 an upper bound on this no-loss rate, as follows.
Corollary 3.
Suppose that is sufficiently large such that
(24) 
for all and . Then,
(25)  
(26) 
The proof of this corollary appears in Appendix A.
Third, by setting , Theorem 2 yields an achievable bound on Stein’s exponent, as follows.
Corollary 4.
Stein’s exponent is bounded as
(27)  
(28) 
The first term in (28) can be identified as Stein’s exponent when the rate is not constrained at all. The proof of this corollary also appears in Appendix A, and it seems that no further significant simplifications are possible. It is worth noting, however, that the resulting bound is quite different from the best known bound by Shimokawa, Han and Amari [10, Th. 4.3], [13] (and its refinement in [32, 33]).
Fourth, it is interesting to examine the case . Using an analysis similar to the proof of Corollary 4, it is easy to verify that a binning-based scheme [i.e., substituting (23) in (21) for ] achieves the lower bound
As expected, this is the same type 2 error exponent obtained when is also encoded at zero rate, as derived in [10, Th. 5.4], [11, Th. 6]. For this bound, a matching converse is known [10, Th. 5.5]. When , then , and Stein’s exponent is given by
In [15, Th. 2] it was determined that this exponent is optimal (even when is not encoded and is given as side information to the detector).
The rest of the paper is mainly devoted to the proof of Theorem 2, which is based on the following two-step methodology. We will introduce CD codes, which, in a nutshell, correspond to a single bin of a DHT system. The first step of the proof is the reduction of the DHT reliability problem to the CD reliability problem, which will be considered in Section IV. The second step is the derivation of single-letter achievable bounds on the reliability of CD codes, which will be considered in Section V. The bound of Theorem 2 on the DHT reliability function then follows as an easy corollary of these results.
IV From Distributed Hypothesis-Testing to Channel-Detection Codes
In this section, we formulate CD codes, and then use them to characterize the reliability of DHT systems. CD codes were considered in [42] for the problem of joint detection and decoding. For this purpose, the code has to be chosen to allow the receiver to detect the channel conditional probability, as well as to transmit messages, just like an ordinary channel code. In this paper, each quantization cell of a DHT system will be considered and analyzed as a CD code. Since a DHT system is only required to decide on the hypothesis but not on the actual source vector, the error probability of CD codes (for transmitting messages) is irrelevant in this paper. However, this does not imply that all the codewords of a CD code can be chosen to be identical (which is optimal in terms of detection performance), since by definition, the members of a quantization cell are different from one another. Hence, in what follows we will define CD codes of a given cardinality, and enforce their codewords to be different from one another. Here too, for brevity, we will denote the probability of an event under and by and , respectively. The required definitions for CD codes are quite similar to the ones required for DHT systems, but as some differences do exist, we explicitly outline them in what follows.
A CD code for a type class is given by (where all codewords must be different). An input to the channel is chosen with a uniform distribution over , and sent over $n$ uses of a DMC which may be either when is active or when is. The random channel output is given by . The detector has to decide, based on the channel output, whether the DMC conditional probability distribution is or . As for the DHT problem, we assume that and . A detector (possibly randomized) for is given by . In accordance, two error probabilities can be defined, to wit, the type 1 error probability
(29) 
and the type 2 error probability
(30) 
As for the DHT problem, the Neyman-Pearson lemma implies that the optimal detector is given by
(31) 
for some threshold .
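The following Python sketch computes, by exhaustive enumeration over output vectors, the exact error probabilities of a likelihood-ratio detector of the form (31) for a tiny CD code. It assumes hypothetical non-symmetric binary channels for the two hypotheses and our own threshold normalization (decide on the first hypothesis iff its output likelihood is at least $e^{n\cdot\text{threshold}}$ times the other). Under each hypothesis, the output distribution is the uniform mixture, over the codewords, of the memoryless channel outputs. The sketch also illustrates the symmetry remark in the text: two single-codeword codes from the same type class yield identical error probabilities.

```python
import math
from itertools import product

def cd_errors(code, w0, w1, ys, threshold=0.0):
    """Exact (type 1, type 2) error probabilities of a likelihood-ratio detector for a
    CD code; w0[(y, x)] and w1[(y, x)] are the channel probabilities W(y|x)."""
    assert len(set(code)) == len(code), "codewords must be distinct"
    n = len(code[0])
    type1 = type2 = 0.0
    for yn in product(ys, repeat=n):
        # Under each hypothesis, the output law is a uniform mixture over codewords.
        l0 = sum(math.prod(w0[(y, x)] for y, x in zip(yn, c)) for c in code) / len(code)
        l1 = sum(math.prod(w1[(y, x)] for y, x in zip(yn, c)) for c in code) / len(code)
        if l0 >= math.exp(n * threshold) * l1:
            type2 += l1  # decided on the first hypothesis while the second holds
        else:
            type1 += l0  # decided on the second hypothesis while the first holds
    return type1, type2

ys = (0, 1)
# Hypothetical non-symmetric binary channels under the two hypotheses.
w0 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}
w1 = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 0.5}

single_a = cd_errors([(1, 1, 0, 0)], w0, w1, ys)
single_b = cd_errors([(0, 1, 0, 1)], w0, w1, ys)  # same type, permuted coordinates
two_words = cd_errors([(1, 1, 0, 0), (0, 1, 0, 1)], w0, w1, ys)
print(single_a, single_b, two_words)
```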
Let be a given type, and let be the subsequence of blocklengths such that is not empty. As for a sequence of DHT systems , a sequence of CD codes is associated with two exponents. The infimum type 1 exponent of a sequence of codes and detectors is defined as
(32) 
and the supremum type 1 exponent is similarly defined, albeit with a . Analogous exponents are defined for the type 2 error probability. In the sequel, we will construct DHT systems whose bins are good CD codes, for each type of the source. Since an achievability bound for a DHT system requires good performance of the CD codes of all types of the source vectors simultaneously, the blocklengths of the component CD codes must match. Thus, the limit-inferior definition of exponents must be used, as it assures convergence for all sufficiently large blocklengths. For the converse bound, we will use the limit-superior definition.
For a given type , rate , and type 1 constraint , we define the infimum CD reliability function as
(33) 
and the supremum CD reliability function is analogously defined, albeit with a . For brevity, the dependence on will be omitted whenever it is understood from context. Thus, the only difference between the reliability function of CD codes and that of ordinary HT is that in CD codes the distributions are to be optimally designed under the rate constraint . Indeed, for , symmetry implies that any is an optimal CD code. The detector in this case has no ambiguity regarding the transmitted symbol at any given time point. This, however, does not hold when , and then ambiguity exists in at least a single time point. This additional uncertainty complicates the operation of the detector, and reduces the reliability function. Basic properties of are given as follows.
Proposition 5.
As a function of , the functions are nonincreasing and have both a limit from the right and a limit from the left at every point. They have no discontinuities of the second kind, and the set of discontinuities of the first kind (i.e., jump discontinuity points) is at most countable. Similar properties hold as a function of .
Proof:
It follows from their definition that are nonincreasing in . The continuity statements follow from properties of monotonic functions [45, Th. 4.29 and its Corollary, Th. 4.30] (the Darboux-Froda theorem). ∎
With the above, we can state the main result of this section, which is a characterization of the reliability of DHT systems using the reliability of CD codes.
Theorem 6.
The DHT reliability functions satisfy:

Achievability part:

Converse part:
The achievability and converse parts match up to two discrepancies. First, the infimum (respectively, supremum) reliability function appears in the achievability (converse) part. This seems unavoidable, as it is not known whether the infimum and supremum reliability functions are equal even for ordinary channel codes [4, Problem 10.7]. Second, the bounds include the left and right limits of at rate and exponent . Nonetheless, due to monotonicity, is a continuous function of and for all rates and exponents, except possibly on a countable set (Proposition 5). Thus, for any given there exists an arbitrarily close such that Theorem 6 holds with .
The proof of Theorem 6 appears in Appendix B. The achievability part (Appendix B-A) is proved by first constructing a DHT system for source vectors from a single type class of the source. The bins are generated by permutations of a “good” CD code. Since both hypotheses are memoryless, bins generated this way have approximately the same exponents as the original CD code. Furthermore, since type classes are closed under permutations, they can be covered using permutations of a CD code. A covering lemma by Ahlswede [41, Section 6, Covering Lemma 2] shows that such a covering method is in fact effective. The required number of permutations is close to the minimal possible number, up to the first order in the exponent. This allows one to prove that the encoding rate is as required. Ideas in this spirit were used for the DHT problem in [7], as well as for distributed compression [39, 40] and secure lossy compression [46]. Afterwards, all types of the source are considered simultaneously to generate a DHT system for any possible type of the source vector. This requires proving that the error probabilities converge to their asymptotic exponential behavior uniformly over all possible types.
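The covering step can be illustrated on a toy scale with the Python sketch below. It is not the paper's construction: the code C, the blocklength, and all names are hypothetical. The type class T of binary sequences of length n with k ones is covered by images σ(C) of a small code C ⊂ T under uniformly random coordinate permutations σ, and each image plays the role of one bin.

- At least ⌈|T|/|C|⌉ permutations are always needed.
- Since σ(x) is uniform over T for every x ∈ T, a fixed y ∈ T lies in σ(C) with probability exactly |C|/|T|.
- By a union bound, about (|T|/|C|) ln|T| random permutations therefore suffice with good probability.

The extra factor ln|T| grows only linearly in n and is negligible in the exponent, which is the flavor of the covering lemma cited above.

```python
import itertools
import math
import random


def type_class(n: int, k: int) -> set:
    """All binary sequences of length n with exactly k ones (a type class)."""
    return {tuple(1 if i in ones else 0 for i in range(n))
            for ones in itertools.combinations(range(n), k)}


def permute(x: tuple, sigma: list) -> tuple:
    """Apply the coordinate permutation sigma to the sequence x."""
    return tuple(x[s] for s in sigma)


def cover_by_random_permutations(code: list, n: int, k: int, seed: int = 0) -> list:
    """Draw random permutations until the images of `code` cover the type class.

    Permutations preserve the type, so each image sigma(code) stays inside the
    type class; each image would serve as one bin of the DHT encoder.
    """
    target = type_class(n, k)
    rng = random.Random(seed)
    covered, perms = set(), []
    while covered != target:
        sigma = list(range(n))
        rng.shuffle(sigma)
        perms.append(sigma)
        covered |= {permute(x, sigma) for x in code}
    return perms


n, k = 10, 4
T = type_class(n, k)        # |T| = C(10, 4) = 210
code = sorted(T)[:6]        # hypothetical "CD code" with 6 codewords
perms = cover_by_random_permutations(code, n, k)
lower = math.ceil(len(T) / len(code))                # 35
union_bound = len(T) / len(code) * math.log(len(T))  # about 187
print(len(perms), lower, round(union_bound))
```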
The proof of the converse part (Appendix B-B) is based upon identifying a sequence of bins whose sizes and conditional exponents are close to the typical values of the entire DHT system. Such a bin corresponds to a CD code, and thus cannot have better exponents than those dictated by the reliability function of CD codes. This restriction is then translated back into a bound on the reliability of DHT systems.
This theorem parallels a similar result of [39, 40] for the reliability functions of distributed compression. By analogy, the reliability function of distributed compression can be characterized by the reliability function of ordinary channel codes (see Fig. 1). The latter can be bounded using well-known classic bounds, such as the random-coding and expurgated achievability bounds, and the sphere-packing, zero-rate, and straight-line converse bounds [4, Ch. 10]. The characterization of Theorem 6 articulates the fact that the reliability of DHT systems is directly connected to the problem of determining the reliability of CD codes. However, even this apparently simpler problem is challenging to solve exactly. Just as in the case of ordinary channel coding, perhaps only lower and upper bounds are within reach. DHT is in nature a source coding problem, a class for which such reliability functions are known (e.g., Marton's exponent [47]). Nevertheless, Theorem 6 reveals that the reliability of DHT systems depends on the reliability of a channel coding problem, a class for which the optimal exponents are typically not entirely known, even in the most basic settings. This may reflect the intrinsic difficulty of the DHT problem. Nonetheless, in the next section we derive concrete bounds on the reliability of CD codes which, via Theorem 6, lead directly to bounds on the reliability of the DHT problem of Theorem 2.
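For a concrete instance of the classical channel-coding bounds mentioned above, the Python sketch below evaluates the random-coding exponent of a binary symmetric channel with crossover probability p and uniform inputs. It uses Gallager's form:

- E_r(R) = max over 0 <= ρ <= 1 of [E_0(ρ) - ρR],
- E_0(ρ) = ρ - (1+ρ) log2(p^{1/(1+ρ)} + (1-p)^{1/(1+ρ)}), in bits.

This concerns ordinary channel codes, not CD codes. The grid search over ρ and the function names are choices made only for this illustration. The exponent vanishes at capacity and equals the cutoff rate 1 - log2(1 + 2√(p(1-p))) at R = 0.

```python
import math


def gallager_e0(rho: float, p: float) -> float:
    """Gallager's E0(rho) for a BSC(p) with uniform input distribution, in bits."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** s + (1.0 - p) ** s)


def random_coding_exponent(rate: float, p: float, steps: int = 10_000) -> float:
    """E_r(R) = max_{0 <= rho <= 1} [E0(rho) - rho * R], via a grid over rho."""
    return max(gallager_e0(i / steps, p) - (i / steps) * rate
               for i in range(steps + 1))


def bsc_capacity(p: float) -> float:
    """Capacity C = 1 - h2(p) in bits per channel use."""
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


p = 0.1  # hypothetical crossover probability
cap = bsc_capacity(p)
for rate in (0.0, 0.2, 0.4, cap):
    print(round(rate, 3), round(random_coding_exponent(rate, p), 4))
```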
V Bounds on the Reliability of CD Codes
In the previous section, we have shown that the reliability function of DHT can be directly obtained from the reliability of CD codes. In this section, we derive bounds on the latter using random-coding arguments. A naive approach would be to draw the codewords randomly and independently from some distribution. However, as discussed in the introduction, the best known achievable bounds for DHT systems are obtained using quantization-and-binning-based schemes. CD codes correspond to a single bin of a DHT system. (Strictly speaking, this is true only when the type of the source vectors is constant within each bin. Since sending the type class of the source vector requires negligible rate, this can be assumed w.l.o.g., as otherwise one can further partition the bin into type classes.) It follows that CD codes can also benefit from introducing dependence between the randomly drawn codewords. In more detail, a bin of a DHT system constructed by the quantization-and-binning method corresponds to a superposition code. To wit, it contains a set of reproduction vectors that are sufficiently different from one another, so that the side-information vector enables the detector to decide on the true reproduction vector with high reliability. Each reproduction vector is surrounded by a quantization cell, which corresponds to all source vectors that are mapped to it in the quantization process. The quantization cell should be sufficiently “small”, such that as long as the detector correctly decodes the reproduction vector, the detection reliability given the reproduction vector is close to the detection reliability given the true source vector. A CD code which corresponds to a binning-based scheme will contain source vectors that form a good channel code for the channels and , so that the side-information vector can be used to decode the true vector with high reliability.
A CD code which corresponds to a quantization-based scheme will look like a quantization cell of a single reproduction vector. The various types of CD codes are depicted in Fig. 3.
Since a single bin of a quantization-and-binning DHT system corresponds to a superposition code, it should be drawn from a hierarchical ensemble, which we define next:
Definition 7.
A fixed-composition hierarchical ensemble for a type and rate is defined by a conditional type , where is an auxiliary random variable from a finite alphabet , a cloud-center rate , and a satellite rate such that . A codebook from this ensemble is drawn in two stages. First, cloud centers are drawn, independently and uniformly over . Second, for each of the cloud centers , satellites are drawn independently and uniformly over . When considered as a random entity, the codebook will be denoted by .
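A toy sampler for the two-stage draw of Definition 7 may help fix ideas. The Python sketch below is illustrative only, and all names are hypothetical. Types are passed as integer symbol counts. The numbers of cloud centers and of satellites per cloud stand in for the exponentially many codewords dictated by the cloud-center and satellite rates.

- A uniform draw from a type class is a uniformly random permutation of any fixed sequence of that type.
- A uniform draw from the conditional type class given a cloud center u is obtained by independent shuffles. Within each block of positions where u takes a given value, a fixed sequence with the prescribed conditional counts is shuffled.

```python
import random
from collections import Counter


def sample_type_class(counts: dict, rng: random.Random) -> list:
    """Uniform draw from the type class with the given symbol counts."""
    seq = [sym for sym, c in counts.items() for _ in range(c)]
    rng.shuffle(seq)  # uniform permutation of a fixed sequence of that type
    return seq


def sample_satellite(u: list, cond_counts: dict, rng: random.Random) -> list:
    """Uniform draw from the conditional type class given the cloud center u.

    cond_counts[a] holds the counts of x-symbols on the positions where u == a.
    """
    x = [None] * len(u)
    for a, counts in cond_counts.items():
        positions = [i for i, ui in enumerate(u) if ui == a]
        assert len(positions) == sum(counts.values()), "inconsistent types"
        for i, sym in zip(positions, sample_type_class(counts, rng)):
            x[i] = sym
    return x


def hierarchical_codebook(u_counts, cond_counts, num_clouds, sats_per_cloud, seed=0):
    """Two-stage draw: independent cloud centers, then independent satellites."""
    rng = random.Random(seed)
    book = []
    for _ in range(num_clouds):
        u = sample_type_class(u_counts, rng)
        sats = [sample_satellite(u, cond_counts, rng) for _ in range(sats_per_cloud)]
        book.append((u, sats))
    return book


# Hypothetical example: blocklength 8, binary auxiliary U and binary X.
u_counts = {0: 4, 1: 4}
cond_counts = {0: {0: 3, 1: 1}, 1: {0: 1, 1: 3}}
book = hierarchical_codebook(u_counts, cond_counts, num_clouds=3, sats_per_cloud=5)
print(book[0][0], book[0][1][0])  # first cloud center and its first satellite
```

In this example, every satellite has the same joint type with its cloud center, and its marginal type is always (1/2, 1/2). Setting num_clouds=1 makes all codewords satellites of a single center.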
Evidently, codewords which pertain to the same cloud are dependent, whereas codewords from different clouds are independent. Further, the ordinary ensemble, in which each codeword is drawn uniformly over , independently of all other codewords, can be obtained as a special case by choosing and . On the other hand, when , a single cloud center (or a subexponential number of cloud centers) is drawn, and all codewords are satellites of this center. This corresponds to a bin of a DHT system which is based only on quantization, without binning. More generally, for source vectors of type , a quantization-and-binning scheme of rate , binning rate , and quantization rate , leads to CD codes of rate