A minimum $\eta$-image of a set $\mathcal{A} \subseteq \mathcal{X}^n$ over a discrete memoryless channel (DMC), specified by the conditional distribution $W(y|x)$, is the smallest set $\mathcal{B} \subseteq \mathcal{Y}^n$ such that $W^n(\mathcal{B}\mid x^n) \geq \eta$ for all $x^n \in \mathcal{A}$. For any $\epsilon$-maximum error, $n$-length code over $W$, the decoding subset of $\mathcal{Y}^n$ for a particular message value must constitute an $\eta$-image of the subset of $\mathcal{X}^n$ corresponding to that message value, for some small $\eta$. Noting this, the intuitive sphere packing argument for channel capacity naturally extends by interpreting the minimum $\eta$-image as the "sphere" of the smallest size mapped to from the codewords of an $\epsilon$-error code (see section II-B for more details). Expressing capacity results in terms of minimum image sizes has many advantages, such as allowing the channel capacity to be written as a function of $\epsilon$. Furthermore, because image sizes are not functions of the distribution of the channel input, they are apt for use in joint source-channel problems in which messages may not be uniformly distributed. Unfortunately, there are also significant drawbacks to analysis by minimum image size. For instance, there is no currently known method to calculate the minimum image size of an arbitrary set other than a singleton. This is perhaps why it is common instead to employ "spheres" whose sizes can be expressed in terms of entropies in the sphere packing argument. Entropies allow simple algebraic manipulations and hence lead to simple representations of the capacity of many basic channels. These two types of characterizations are often referred to as image size characterizations and entropy characterizations, and the sets of possible image size characterizations and entropy characterizations are referred to as the achievable exponent and achievable entropy regions, respectively. In order to take advantage of image size characterizations, we need to express minimum image sizes in terms of entropies. As Csiszár and Körner note in [1, p. 339], though:
We shall see in Chapter 16 that the corresponding image size characterizations can be used to prove strong converse results for source networks and also to solve channel network problems. In this respect, it is important that the sets of achievable entropy resp. exponent triples have the same two dimensional projections (see Theorems 15.11 and 15.20). The two sets, however, need not be equal; their relationship is described by Corollary 15.13.
The primary motivation of this work is to rectify this incongruity, and in doing so provide new, stronger necessary conditions for reliable communication that combine the robustness of image size techniques with the algebraic flexibility of entropies.
In a three-terminal setting with a single message, it is well established that the two-dimensional projections of the image size characterization and the entropy characterization are equal [1, Theorem 15.11]. Results beyond three terminals are rare and partial. In addition, multi-terminal settings typically feature multiple receivers, each required to decode only a subset of the messages. In an earlier paper, we showed that every source set may be partitioned into subsets within each of which the entropy and image size characterizations are equal. The first significant contribution of the current paper is to extend this partitioning method to simultaneously account for multiple messages and multiple receivers. Over every partitioning subset, the image size characterization and the entropy characterization are equal in the sense that the exponential orders of the minimum image sizes for nearly all messages equal the same entropy quantity. Furthermore, the partition renders the distribution of the messages nearly uniform over every partitioning subset, while the number of partitioning subsets remains polynomial in the block length $n$.
Our second significant contribution, new necessary conditions for reliable communication over multi-terminal DMCs, then follows. These necessary conditions (see Theorems 19 and 23) are direct consequences of the equal-image-size partitions described above. More specifically, by the blowing up lemma [1, Ch. 5], the exponential order of the minimum $\eta$-image size is effectively invariant to the value of $\eta$. Due to the equality between image size and entropy characterizations established by our partitioning approach, the entropy terms in the sphere packing argument for codes with small error probabilities are nearly equivalent to those for codes with larger error probabilities. This suggests that the necessary conditions for reliable communication expressed in terms of these entropies may be made effectively invariant to the decoding error probabilities. Another way to view these necessary conditions is that they imply codes may only increase their rates by allowing transmissions which have nearly zero probability of being decoded. Errors of this type have previously been considered by Effros et al. in regards to composite channels, where the probability of such an error occurring was deemed the outage probability.
From our new necessary conditions we obtain more traditional, stronger versions of Fano's inequality. The strong inequalities with regard to average probability of error apply to nearly uniform messages (see Corollary 24) and information-stable messages (see Corollary 26), while the maximum-error version (see Corollary 22) applies universally. We call these results strong Fano's inequalities because they can be written in the form of the standard Fano's inequality, except with the error term replaced by a term that almost universally vanishes. Much of the complexity of this paper revolves around crafting necessary conditions which are easy to apply, and which apply directly to many active research problems. To demonstrate the power of the results, we present as an application example a simple solution to the strong converse problem for the discrete memoryless wiretap channel (DM-WTC) with vanishing leakage, which heretofore has been an open problem.
We organize the rest of the paper as follows. Background on the methods used and on similar approaches is discussed in section II. A preview of our main results is provided in section III, with an example applying the strong average-error Fano's inequality to prove the strong converse for the DM-WTC. The mathematical machinery that we employ to establish equal-image-size source partitioning is developed in sections IV and V. The proposed equal-image-size source partition is developed in section VI. The new necessary conditions for reliable communication and the strong Fano's inequalities appear in section VII. Finally, we conclude in section VIII with a brief list of some basic multi-terminal DMCs to which our results immediately apply.
I-A A note on notation
The notation used in this paper mostly follows that of Csiszár and Körner [1], except, for example, that the mutual information between a pair of random variables $X$ and $Y$ is written in the more common notation $I(X;Y)$. Moreover, the notation for conditional entropy will be slightly abused throughout the paper: a quantity such as $H(X \mid \mathcal{E})$, for an event $\mathcal{E}$, will mean $H(X \mid 1_{\mathcal{E}} = 1)$, where $1_{\mathcal{E}}$ is an indicator random variable taking the value $1$ if $\mathcal{E}$ occurs and $0$ if not.
To simplify writing, let $[i:j]$ denote the set of integers starting at $i$ and ending at $j$, inclusive. When we refer to a set as an index set, we restrict it to be discrete. A random index is a random variable distributed over an index set. Let $K_1, K_2, \ldots, K_m$ be random indices jointly distributed over $\mathcal{K}_1 \times \mathcal{K}_2 \times \cdots \times \mathcal{K}_m$. For any $S \subseteq [1:m]$, we write $K_S$ and $\mathcal{K}_S$ as shorthand forms of $(K_i : i \in S)$ and $\prod_{i \in S} \mathcal{K}_i$, respectively.
Consider a pair of discrete random variables $X$ and $Y$ over alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. For any $\mathcal{A} \subseteq \mathcal{X}$ such that $\Pr\{X \in \mathcal{A}\} > 0$, whenever there is no ambiguity we use $\Pr\{\,\cdot \mid \mathcal{A}\}$ to denote $\Pr\{\,\cdot \mid X \in \mathcal{A}\}$ for brevity. For any $\eta \in (0,1]$, a set $\mathcal{B} \subseteq \mathcal{Y}$ is called an $\eta$-image of $\mathcal{A}$ over the DMC $P_{Y|X}$ [1, Ch. 15] if $P_{Y|X}(\mathcal{B} \mid x) \geq \eta$ for every $x \in \mathcal{A}$. On the other hand, $\mathcal{B}$ is called an $\eta$-quasi-image of $\mathcal{A}$ over $P_{Y|X}$ [1, Problem 15.13] if $\Pr\{Y \in \mathcal{B} \mid X \in \mathcal{A}\} \geq \eta$. The minimum size of $\eta$-images of $\mathcal{A}$ over $P_{Y|X}$ will be denoted by $g_{P_{Y|X}}(\mathcal{A}, \eta)$, while the minimum size of $\eta$-quasi-images of $\mathcal{A}$ over $P_{Y|X}$ will be denoted by $\bar{g}_{P_{Y|X}}(\mathcal{A}, \eta)$.
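For small alphabets, the minimum image size can be computed by exhaustive search directly from the definition. The following is a minimal brute-force sketch; the function name and the binary-symmetric-channel example are our own, not from the source:

```python
from itertools import combinations

def min_image_size(W, A, Y, eta):
    """Brute-force the minimum eta-image size: the size of a smallest
    B, a subset of the output alphabet Y, with W(B|x) >= eta for all x in A."""
    for size in range(1, len(Y) + 1):
        for B in combinations(Y, size):
            if all(sum(W[x][y] for y in B) >= eta for x in A):
                return size
    return None  # no eta-image exists (only possible if eta > 1)

# Hypothetical single-letter example: a binary symmetric channel, flip prob 0.1.
W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
assert min_image_size(W, A={0}, Y=[0, 1], eta=0.9) == 1    # {0} alone suffices
assert min_image_size(W, A={0, 1}, Y=[0, 1], eta=0.9) == 2  # need both outputs
```

Note how the minimum image of the larger set $\{0,1\}$ is strictly bigger than that of a singleton, which is exactly the gap the sphere packing argument of section II-B exploits.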
II-A Fano's inequality
Fano's inequality is one of the most widely used inequalities in the field of information theory. First appearing in Fano's class notes, the inequality relates the entropy of a message $M$, distributed over an index set $\mathcal{M}$, conditioned on a reconstruction $\hat{M}$, with the probability of error $P_e \triangleq \Pr\{M \neq \hat{M}\}$ of that reconstruction. The exact inequality
$$H(M \mid \hat{M}) \leq h_b(P_e) + P_e \log\left(|\mathcal{M}| - 1\right),$$
where $h_b$ denotes the binary entropy function, can be tight for specific $M$, $\hat{M}$, and $P_e$. It is most commonly used in proving converses of coding theorems, where, when combined with the data processing inequality [1, Lemma 3.11], it results in
$$H(M) \leq I(X^n; Y^n) + h_b(P_e) + P_e \log\left(|\mathcal{M}| - 1\right).$$
We can then say that if $P_e \to 0$ and $\frac{1}{n}\log|\mathcal{M}|$ is a finite constant, then $\frac{1}{n} I(X^n;Y^n)$ asymptotically upper bounds $\frac{1}{n} H(M)$. In channel coding problems, the message is uniform, and so $\frac{1}{n} I(X^n;Y^n)$ asymptotically upper bounds the code rate $\frac{1}{n}\log|\mathcal{M}|$. Fano's inequality also works in joint source-channel coding problems, as is used in proving the source-channel separation theorem for the two-terminal DMC [5, Pg. 221]. The most general form of Fano's inequality to date is due to Han and Verdú, who removed the constraint that at least one of the random variables involved in the inequality be discrete.
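The exact form of Fano's inequality, $H(M \mid \hat{M}) \leq h_b(P_e) + P_e \log(|\mathcal{M}|-1)$, is easy to verify numerically for any finite joint distribution. A minimal sketch (function and variable names are our own); for the symmetric-error example below the inequality is in fact met with equality:

```python
import math

def fano_bound(p_joint):
    """Return (H(M|Mhat), Fano bound) in bits for a joint pmf
    p_joint[m][mhat] over a common finite alphabet."""
    msgs = sorted(p_joint)
    p_mhat = {mh: sum(p_joint[m].get(mh, 0.0) for m in msgs) for mh in msgs}
    h_cond = 0.0  # conditional entropy H(M | Mhat)
    for mh in msgs:
        if p_mhat[mh] == 0:
            continue
        for m in msgs:
            p = p_joint[m].get(mh, 0.0)
            if p > 0:
                h_cond -= p * math.log2(p / p_mhat[mh])
    pe = sum(p_joint[m].get(mh, 0.0) for m in msgs for mh in msgs if m != mh)
    hb = 0.0 if pe in (0.0, 1.0) else -pe*math.log2(pe) - (1-pe)*math.log2(1-pe)
    bound = hb + (pe * math.log2(len(msgs) - 1) if len(msgs) > 1 else 0.0)
    return h_cond, bound

# Hypothetical example: 3 equiprobable messages, error prob 0.2 spread
# uniformly over the wrong messages -- the case where Fano is tight.
p = {m: {mh: 0.8/3 if m == mh else 0.1/3 for mh in range(3)} for m in range(3)}
h, b = fano_bound(p)
assert h <= b + 1e-12 and abs(h - b) < 1e-9
```

The equality case illustrates why the bound cannot be improved in general: when the error event is uniform over wrong messages, the conditional entropy saturates the bound.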
As Wolfowitz first showed, even with a non-vanishing decoding error probability, the upper bound on the rate of messages that can be transmitted through a two-terminal DMC is asymptotically equal to that with a vanishing error probability. Wolfowitz introduced the concept of capacity dependent upon the error probability $\epsilon$, usually denoted by $C(\epsilon)$. Following the terminology of Csiszár and Körner [1, Pg. 93], a converse result showing $C(\epsilon) = C$ for all $\epsilon \in (0,1)$ is called a strong converse. Verdú and Han showed the stronger assertions that this holds for all two-terminal DMCs and that, at any rate above capacity, the error probability must approach unity.
Clearly, though, the bound in Fano's inequality is influenced by the probability of error $P_e$. This dependence makes Fano's inequality ill-suited for application to channel codes with non-vanishing error probabilities. This in turn has led to other methods of proving strong converses, such as the meta-converse proposed by Polyanskiy et al. The meta-converse leverages the idea that any decoder can be considered as a binary hypothesis test between the correct codeword set and the incorrect codeword set. Bounding the decoding error by the best binary hypothesis test, new bounds, which are relatively tight even for small blocklengths, can be established. In contrast to the original version of Fano's inequality, the stronger versions presented in Corollaries 22, 24, and 26 directly apply to codes with non-vanishing decoding error probabilities over multi-terminal DMCs.
Fano's inequality is also problematic when used to characterize joint source-channel coding (JSCC) problems. Using Fano's inequality for JSCC problems necessitates either the restriction to vanishing error probabilities, or messages (sources) whose probability exponents converge to the sources' entropy rates. Both of these restrictions are limiting, as results by Kostina et al. suggest that allowing non-vanishing error probabilities in conjunction with compression may lead to increased rates. In contrast to the original version of Fano's inequality, the necessary conditions supplied by Theorems 19 and 23 can be used to upper bound such rate gains in JSCC problems over multi-terminal DMCs.
II-B Image size characterizations
Image size characterizations, originally introduced by Gács and Körner and by Ahlswede et al., are of particular importance for DMCs due to the blowing up lemma [1, Ch. 5]. Margulis first introduced the blowing up lemma to study hop distance in hyper-connected graphs. In the context of DMCs, it can be used to show that any $\eta$-image with $\eta$ not decaying too fast is close in size to an $\eta'$-image with $\eta'$ not approaching unity too fast (see [1, Lemma 6.6] or Lemma 10). Ahlswede used the blowing up lemma to prove a local strong converse for maximal error codes over a two-terminal DMC, showing that all bad codes have a good subcode of almost the same rate. Using the same lemma, Körner and Marton developed a general framework for determining the achievable rates of a number of source and channel networks. Moreover, many of the strong converses for some of the most fundamental multi-terminal DMCs studied in the literature were proven using image size characterization techniques. Körner and Marton employed such a technique to prove the strong converse of the discrete memoryless broadcast channel with degraded message sets. Dueck used these methods to prove the strong converse of the discrete memoryless multiple access channel with independent messages.
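The mechanism behind the blowing up lemma can be seen numerically: enlarging a set by even a small Hamming radius raises its probability under a product measure considerably. The following brute-force sketch is our own illustration (all names hypothetical), not the lemma's proof:

```python
import math
from itertools import product

def hamming_blowup(B, l, alphabet, n):
    """The Hamming l-blowup of B: all n-sequences within distance l of B."""
    def dist(u, v):
        return sum(a != b for a, b in zip(u, v))
    return {y for y in product(alphabet, repeat=n)
            if any(dist(y, b) <= l for b in B)}

def measure(S, p):
    """Probability of a set of sequences under the i.i.d. product of pmf p."""
    return sum(math.prod(p[s] for s in y) for y in S)

# Hypothetical check on {0,1}^6 with a fair-coin product measure:
# blowing up a single sequence by radius 1 multiplies its probability by 7.
n = 6
p = {0: 0.5, 1: 0.5}
B = {(0,) * n}
B1 = hamming_blowup(B, 1, [0, 1], n)
assert len(B1) == 1 + n
assert measure(B1, p) == 7 * measure(B, p)
```

The lemma's content is the quantitative version of this effect: a radius that is a vanishing fraction of $n$ already drives the probability of a not-too-small set toward one.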
For a detailed overview of image size characterization techniques, see [1, Chs. 5, 6, 15, 16]. Here we briefly summarize the sphere packing argument in [1, Ch. 6] to motivate the development of the results in this paper. Consider sending a uniform message $M$ from the message set $\mathcal{M}$ over a two-terminal DMC specified by $W$ using an $n$-length, $\epsilon$-maximal error channel code with $\epsilon \in (0,1)$. For the purposes of simple discussion here, assume that the encoder and the decoder are both deterministic. Let $\mathcal{C} = \{x_m : m \in \mathcal{M}\}$ denote the set of codewords used by the code. Pick $\eta$ such that $\epsilon < \eta < 1$ and let $\mathcal{B}$ be a minimum $\eta$-image of $\mathcal{C}$ over $W$. That is, $|\mathcal{B}| = g_{W}(\mathcal{C}, \eta)$. Let $\mathcal{D}_m$ denote the decoding region for the message $m$. The maximum error requirement implies that $W^n(\mathcal{D}_m \mid x_m) \geq 1 - \epsilon$ for all $m \in \mathcal{M}$. Hence we have $W^n(\mathcal{B} \cap \mathcal{D}_m \mid x_m) \geq \eta - \epsilon$. In other words, this means that $\mathcal{B} \cap \mathcal{D}_m$ is an $(\eta-\epsilon)$-image of the singleton $\{x_m\}$, and hence $|\mathcal{B} \cap \mathcal{D}_m| \geq g_{W}(\{x_m\}, \eta - \epsilon)$ for every $m \in \mathcal{M}$. It is clear now that the subsets $\mathcal{B} \cap \mathcal{D}_m$ for $m \in \mathcal{M}$ serve as the "spheres" in the sphere packing argument. More specifically, since the decoding regions are disjoint,
$$\log |\mathcal{M}| \leq \log g_{W}(\mathcal{C}, \eta) - \min_{m \in \mathcal{M}} \log g_{W}(\{x_m\}, \eta - \epsilon). \qquad (1)$$
As a result, we have just obtained an upper bound on the rate of the $\epsilon$-maximal error channel code in terms of minimum image sizes. Moreover, as a consequence of the blowing up lemma (see [1, Lemma 6.6] or Lemma 10), the terms on the right hand side of (1) remain roughly the same regardless of the value of $\eta$ within the range of $(\epsilon, 1)$. Thus, unlike the standard Fano's inequality, this bound may be used to establish the strong converse of the DMC.
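The singleton "spheres" in the bound above are the one case where minimum image sizes are easy to compute: a minimum $\eta$-image of a singleton $\{x\}$ is simply a smallest set of output sequences whose total conditional probability given $x$ reaches $\eta$, obtained greedily by taking outputs in decreasing probability. A small sketch of this observation (names are our own):

```python
from itertools import product

def min_singleton_image_size(out_probs, eta):
    """Minimum eta-image size of a singleton: greedily accumulate output
    sequences in decreasing conditional probability until mass >= eta."""
    total, size = 0.0, 0
    for q in sorted(out_probs, reverse=True):
        total += q
        size += 1
        if total >= eta - 1e-12:
            return size
    return size

# Hypothetical example: codeword 000 over three uses of a BSC with flip 0.1.
flip = 0.1
probs = []
for y in product([0, 1], repeat=3):
    k = sum(y)  # number of positions flipped relative to 000
    probs.append((flip ** k) * ((1 - flip) ** (3 - k)))
# Cumulative mass 0.729, 0.810, 0.891, 0.972: four outputs are needed.
assert min_singleton_image_size(probs, eta=0.9) == 4
```

For i.i.d. channel uses this greedy set is essentially a conditional typical set, which is why singleton image sizes admit clean entropy expressions while those of general sets do not.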
Nevertheless, the usefulness of code rate bounds expressed in terms of minimum image sizes, like (1), depends upon the availability of simple image size characterizations. As mentioned before, while such characterizations exist for the two-terminal DMC (see [1, Ch. 6]) and the three-terminal DMC with a single message (see [1, Ch. 15]), simple image size characterizations for more general channels have been largely missing. This motivates us to develop the proposed tool of equal-image-size source partitioning (see Theorem 18) to solve the more general image size characterization problem and to apply this tool to obtain more general necessary conditions for reliable communication over multi-terminal DMCs (see section VII).
III Preview of main results
The main result of this paper is the proposed (nearly) equal-image-size partitioning of a source simultaneously over a number of DMCs. Consider a set of nearly uniform sequences that is mapped through a collection of DMCs, and suppose the set is already partitioned in one way by a family of indices. Then we may partition it in another way into at most polynomially many subsets, indexed anew, and consider the intersection of this new partition with any old one, denoting each partitioning subset in the intersection accordingly. Fixing any of the DMCs, the minimum image sizes of "most" of the partitioning subsets are approximately of the same exponential order; more specifically, this order is given by the same entropy quantity for "most" subsets. The qualifier "most" above may again be quantified in terms of exponential order. The more precise statement of this source partitioning method will be developed in the following sections, culminating in the results described in Theorem 18.
As mentioned in the previous section, one main application of image size characterizations is to find outer bounds on the capacity regions of multi-terminal DMCs. With the aid of equal-image-size source partitioning, we are able to develop strong versions of Fano's inequality for multi-terminal DMCs that do not require the decoding error probabilities to vanish. These stronger versions of Fano's inequality provide us an easy-to-use tool to find outer bounds on capacity regions for codes with non-vanishing error probabilities. Consider the multi-terminal communication scenario in which a set of messages, ranging over their respective message sets, are to be sent to a number of receivers through a set of DMCs. (Because only marginal decoding errors made at individual receivers are of concern, the marginal conditional distributions are sufficient in specifying all such error events. As a result, we speak of "a set of DMCs" rather than "the multi-terminal DMC" specified by the joint conditional distribution.) The set of possible codewords is denoted by $\mathcal{C}$, which can be any subset of $\mathcal{X}^n$. Each receiver is to decode a non-empty subset of the messages. Let the encoding function and the decoding function employed by each receiver both be possibly stochastic. Note that distributed encoding is allowed in this model: if there are distributed encoders, each generating a codeword over its own alphabet, we may take the overall codeword alphabet to be the Cartesian product of the individual alphabets and disjointly distribute the messages to the encoders. Then the following two stronger versions of Fano's inequality are some of the main results that we will present in section VII:
Strong maximum-error Fano’s inequality
If the encoder-decoder pairs have maximum errors
for all , then there exists such that
for all .
Strong average-error Fano’s inequality
If the encoder-decoder pairs have average errors for all , then there exist , a random index over an index set with at most elements for some , and satisfying such that and
for all , as long as the messages are uniformly distributed.
The strong converse for the general discrete memoryless wiretap channel (DM-WTC) is a heretofore open problem. The best known results were derived by Tan and Bloch, and independently by Hayashi et al., and pertain only to the case where the wiretap channel is degraded. Such a scenario reduces the complexity by not requiring an auxiliary random variable to characterize the secrecy capacity. In particular, Tan and Bloch accomplish their result using an information spectrum approach, while Hayashi et al. consider the question in terms of active hypothesis testing. (It should be noted that their result applies simultaneously to both secret message transmission and secret key agreement, and allows for arbitrary leakage.) As a simple application example for our results, we employ the strong average-error Fano's inequality to provide a strong converse for the general DM-WTC.
The DM-WTC consists of a sender, a legitimate receiver, and an eavesdropper. A uniformly distributed message over the message set is to be sent reliably from the sender to the legitimate receiver and discreetly against eavesdropping by the eavesdropper. Consider the encoding function employed by the sender and the decoding function employed by the legitimate receiver. A code for the DM-WTC is any code which meets the following two requirements:
As in prior works, we impose a decaying leakage requirement.
Applying the strong average-error Fano's inequality above to the DM-WTC with the reliability requirement, we obtain an index set with for some , a random index over , , and such that and
for all . Let be a random index over defined by the conditional distribution . Since , we also have . From (2), we obtain
On the other hand, from the leakage requirement
Noting that , and following the steps of [20, Section 22.1.2], we may obtain
for some over with such that . This proves the strong converse for the general DM-WTC with decaying leakage.
IV Partitioning Index and Entropy Spectrum Partition
In this section, we describe the notions of partitioning index, entropy spectrum partition (slicing), and nearly uniform distribution. They provide the basic machinery that we will employ in later sections to develop source partitioning results. The entropy spectrum partition method that we use here is a slight variant within the class of information/entropy spectrum slicing methods developed in the information spectrum literature. This class of methods finds many different applications in information theory.
While the definitions and results are stated for the sequence space $\mathcal{X}^n$, they clearly extend to other sequence spaces. When we say $X^n$ is distributed over $\mathcal{A} \subseteq \mathcal{X}^n$, it is assumed with no loss of generality that $\Pr\{X^n = x^n\} > 0$ for all $x^n \in \mathcal{A}$. Otherwise we may just remove the zero-probability sequences from $\mathcal{A}$.
Let $\mathcal{A} \subseteq \mathcal{X}^n$ and let $\mathcal{K}$ be an index set. Let $X^n$ and $K$ be jointly distributed random variables over $\mathcal{A}$ and $\mathcal{K}$, respectively. For each $k \in \mathcal{K}$, define $\mathcal{A}_k \triangleq \{x^n \in \mathcal{A} : \Pr\{K = k \mid X^n = x^n\} > 0\}$. Then $K$ is called a partitioning index of $\mathcal{A}$ with respect to (w.r.t.) $X^n$ if $\bigcup_{k \in \mathcal{K}} \mathcal{A}_k = \mathcal{A}$ and $\mathcal{A}_k \cap \mathcal{A}_{k'} = \emptyset$ for all $k \neq k'$. We may simply say $K$ partitions $\mathcal{A}$ when the underlying distribution of $X^n$ over $\mathcal{A}$ is clear from the context.
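In finite settings, the defining property of a partitioning index can be checked mechanically: the supports of the conditional distribution of the index given each sequence must be disjoint cells that cover the set, i.e., the index must be a deterministic function of the sequence with probability one. A minimal sketch (function and variable names are our own):

```python
def is_partitioning_index(p_k_given_x, domain):
    """Check whether a random index K, given by the conditionals
    p_k_given_x[x][k], partitions `domain`: the supports
    A_k = {x : P(K=k|x) > 0} must be pairwise disjoint and cover domain."""
    support = {}  # k -> set of x with P(K=k|x) > 0
    for x in domain:
        for k, pk in p_k_given_x[x].items():
            if pk > 0:
                support.setdefault(k, set()).add(x)
    cells = list(support.values())
    covered = set().union(*cells) if cells else set()
    disjoint = sum(len(c) for c in cells) == len(covered)
    return disjoint and covered == set(domain)

# Hypothetical example: K = parity of x is deterministic, hence it partitions;
# a fair coin flip independent of x does not.
xs = range(6)
parity = {x: {x % 2: 1.0} for x in xs}
noisy = {x: {0: 0.5, 1: 0.5} for x in xs}
assert is_partitioning_index(parity, xs)
assert not is_partitioning_index(noisy, xs)
```

The failing second example shows why the support condition matters: a genuinely random index attaches every sequence to several cells at once, so the cells overlap.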
Consider any and partitioning indices w.r.t. over .
Suppose that partitions . Then if and only if . In addition, for all such that , . Equivalently, where if .
Suppose that partitions . Then for every non-empty , partitions w.r.t with . As is clear from the context, we may simply say also partitions .
is a partitioning index of if and only if and are both partitioning indices of .
Let and be partitioning indices of . Then . Thus we may write in place of or . Furthermore, for each such that , is a partitioning index of w.r.t. (or equivalently ) with . Hence if the latter event is non-empty.
Let be a partitioning index of w.r.t. . Let be a collection of random indices such that is a partitioning index of w.r.t. , distributed over . Then is a partitioning index w.r.t. .
First, it is obvious from the definition of that if and only if . Consider now for each such that . Then , again due to the very definition of . Hence implies . On the other hand,
where the last equality is due to the fact that is a partitioning index of . Hence implies . Hence by setting if , with probability one.
Note that , and hence . Thus partitions w.r.t. .
First, suppose that is a partitioning index of . Clearly . Hence . In addition, for any , . Therefore is a partitioning index of . The same argument also applies to show that is a partitioning index of .
On the other hand, suppose that both and are partitioning indices of . Clearly we have . For any , . But for every , since , and hence for some . This means that . Therefore is a partitioning index of .
Suppose that and are partitioning indices of w.r.t. . Then by part 3) of the lemma, is also a partitioning index of w.r.t. . Moreover, by part 1) of the lemma, and are conditionally independent given . That is, we have , which implies . As above, we also clearly have . Therefore .
Consider any fixed such that (i.e., ). By part 1) of the lemma, we have . Further, by part 2) of the lemma, we have partitions w.r.t. with . The final assertion then results directly from part 1).
For convenience in notation, write , which is distributed over . Note that . Hence . Clearly
and for all . Therefore is a partitioning index of w.r.t. .
All parts of Lemma 2 will be used repeatedly in the rest of the paper. To avoid prolixity, we will not explicitly refer to each use of the lemma.
Let $\mathcal{A} \subseteq \mathcal{X}^n$. Let $P$ be a distribution on $\mathcal{A}$, and let $\imath(x^n) \triangleq -\log P(x^n)$ be the corresponding entropy spectrum. For any $\Delta > 0$, define the $\Delta$-entropy spectrum partition of $\mathcal{A}$ w.r.t. $P$ as the collection of subsets $\{\mathcal{A}_l\}$, where
$$\mathcal{A}_l \triangleq \left\{x^n \in \mathcal{A} : (l-1)\Delta \leq \imath(x^n) < l\Delta\right\}.$$
Clearly $\mathcal{A} = \bigcup_l \mathcal{A}_l$ because of our convention that $P(x^n) > 0$ for all $x^n \in \mathcal{A}$.
Suppose is the -entropy spectrum partition of w.r.t. , and if . The random variable is clearly a partitioning index of w.r.t. , and is conditionally independent of any other partitioning index of given .
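As a concrete illustration of entropy spectrum slicing, the following sketch (names are our own) partitions the support of a pmf into cells on which $-\log_2 P$ lies in an interval of width $\Delta$; within each cell, probabilities differ by at most a factor of $2^{\Delta}$, so every cell is nearly uniform:

```python
import math

def entropy_spectrum_partition(p, delta):
    """Slice the support of pmf p into cells indexed by l, where cell l
    collects the x with -log2 p(x) in [(l-1)*delta, l*delta).
    Within a cell, probabilities are within a factor 2**delta of each other."""
    cells = {}
    for x, px in p.items():
        if px > 0:
            l = int(-math.log2(px) // delta) + 1
            cells.setdefault(l, []).append(x)
    return cells

# Hypothetical pmf: with delta = 1, symbols are grouped by -log2 p(x),
# i.e., 'a' (1 bit), 'b' (2 bits), and 'c', 'd' (3 bits each).
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
assert entropy_spectrum_partition(p, 1.0) == {2: ['a'], 3: ['b'], 4: ['c', 'd']}
```

Shrinking $\Delta$ makes each cell closer to uniform at the cost of more cells; since $-\log P$ ranges over an interval of length $O(n)$ on a sequence space, the number of cells stays polynomial in $n$ for fixed $\Delta$, matching the polynomial subset count discussed in the introduction.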
Let be the -entropy spectrum partition of w.r.t. . Then for every , ,
In addition if , then
Trivially we have
and therefore . Similarly suppose that . Then
and therefore . Combining both results gives us . ∎
Let be the -entropy spectrum partition of w.r.t. . Then for each , , satisfying