Equal-image-size source partitioning: Creating strong Fano's inequalities for multi-terminal discrete memoryless channels

by Eric Graves, et al.
University of Florida

This paper introduces equal-image-size source partitioning, a new tool for analyzing channel and joint source-channel coding in a multi-terminal discrete memoryless channel environment. Equal-image-size source partitioning divides the source (combination of messages and codewords) into a sub-exponential number of subsets. Over each of these subsets, the exponential orders of the minimum image sizes of most messages are roughly equal to the same entropy term. This property gives us the strength of minimum image sizes and the flexibility of entropy terms. Using the method of equal-image-size source partitioning, we prove separate necessary conditions for the existence of average-error and maximum-error codes. These necessary conditions are much stronger than the standard Fano's inequality, and can be weakened to render versions of Fano's inequality that apply to codes with non-vanishing error probabilities. To demonstrate the power of this new tool, we employ the stronger average-error version of Fano's inequality to prove the strong converse for the discrete memoryless wiretap channel with decaying leakage, which heretofore has been an open problem.




I Introduction

A minimum η-image of a set A ⊆ X^n over a discrete memoryless channel (DMC), specified by the conditional distribution W(y|x), is the smallest set B ⊆ Y^n such that W^n(B|x) ≥ η for all x ∈ A. For any λ-maximum error, n-length code over W, the decoding subset of Y^n for a particular message value must constitute a (1−λ)-image of the subset of X^n corresponding to the message value, for some small λ. Noting this, the intuitive sphere packing argument for channel capacity naturally extends by interpreting the minimum η-image as the “sphere” of the smallest size mapped to from the codewords of a λ-error code (see section II-B for more details). Expressing capacity results in terms of minimum image sizes has many advantages, such as allowing for expressions of channel capacity as a function of the error probability λ. Furthermore, because image sizes are not functions of the distribution of X^n, they are apt for use in joint source-channel problems for which messages may not be uniformly distributed. Unfortunately there are also significant drawbacks to analysis by minimum image size. For instance, there is no currently known method by which to calculate the minimum image size of any arbitrary set other than a singleton. This is perhaps why it is common to instead employ “spheres” whose sizes can be expressed in terms of entropies in the sphere packing argument. Entropies allow for simple algebraic manipulations and hence lead to simple representations of the capacity of many basic channels. These two different types of characterizations are often referred to as image size characterization and entropy size characterization, and the sets of possible image size characterizations and entropy size characterizations are referred to as the achievable exponent and achievable entropy regions, respectively. In order to take advantage of image size characterizations, we need to express minimum image sizes in terms of entropies. As Csiszár and Körner note in

[1, p. 339] though

We shall see in Chapter 16 that the corresponding image size characterizations can be used to prove strong converse results for source networks and also to solve channel network problems. In this respect, it is important that the sets of achievable entropy resp. exponent triples have the same two dimensional projections (see Theorems 15.11 and 15.20). The two sets, however, need not be equal; their relationship is described by Corollary 15.13.

The primary motivation of this work is to rectify this incongruity, and in doing so provide new stronger necessary conditions for reliable communications that have both the robustness of image size techniques while maintaining the algebraic flexibility of entropies.

In a three-terminal setting with a single message, it has been well established that the two-dimensional projections of the image size characterization and the entropy characterization are equal [1, Theorem 15.11]. Results beyond three terminals are rare and partial. In addition, in multi-terminal settings there typically exist multiple receivers, each of which is only required to decode a subset of the messages. In an earlier paper [2], we showed that every source set may be partitioned into subsets within each of which the entropy and image size characterizations are equal. The first significant contribution of the current paper is to extend this partitioning method to simultaneously account for multiple messages and multiple receivers. Over every partitioning subset, the image size characterization and the entropy characterization are equal in that the exponential orders of the minimum image sizes for nearly all messages are equal to the same entropy quantity. Furthermore, the partition results in the distribution of the messages being nearly uniform over every partitioning subset, while the number of partitioning subsets remains polynomial in the block length n.

Our second significant contribution, new necessary conditions for reliable communications over multi-terminal DMCs, then follows. These necessary conditions (see Theorems 19 and 23) are direct consequences of the equal-image-size partitions described above. More specifically, by the blowing up lemma [1, Ch. 5], the exponential order of the minimum image size is effectively invariant to the value of η. Due to the equality between image size and entropy characterizations by our partitioning approach, the entropy terms in the sphere packing argument for codes with small error probabilities are nearly equivalent to those for codes with larger error probabilities. This suggests that the necessary conditions of reliable communications expressed in terms of these entropies may be made effectively invariant to the decoding error probabilities. Another way to look at these necessary conditions is that they imply all codes may only increase their rates by allowing transmissions which have nearly zero probability of being decoded correctly. Errors of this type have previously been considered by Effros et al. in regard to composite channels, where the probability of such an error occurring was deemed the outage probability [3].

From our new necessary conditions we may obtain more traditional, stronger versions of Fano’s inequality. The strong inequalities with regard to average probability of error work for nearly uniform messages (see Corollary 24) and information-stable messages (see Corollary 26), while the maximum-error version (see Corollary 22) applies universally. We deem these particular results strong Fano’s inequalities because we may write them in the form of the standard Fano’s inequality, except that the error term is replaced by a term which almost universally vanishes. Much of the complexity of this paper revolves around crafting necessary conditions which are easy to apply, and which apply directly to many active research problems. To demonstrate the power of the results, we present as an application example a simple solution to the strong converse problem for the discrete memoryless wiretap channel (DM-WTC) with vanishing leakage, which heretofore has been an open problem.

We organize the rest of the paper as follows. Background on the methods used and similar approaches will be discussed first in section II. A preview of our main results will be provided in section III with an example showing application of the strong average-error Fano’s inequality to prove the strong converse for the DM-WTC. The mathematical machinery that we employ to establish equal-image-size source partitioning will be developed in sections IV and V. The proposed equal-image-size source partition will be developed in section VI. The new necessary conditions for reliable communications and strong Fano’s inequalities will come in section VII. Finally we will conclude this paper in section VIII with a brief list of some basic multi-terminal DMCs to which our results immediately apply.

I-A A note on notation

The notation used in this paper mostly follows that employed in [1], except, for example, that the mutual information between a pair of random variables X and Y is written in the more common notation I(X;Y). Moreover, the notation for conditional entropy will be slightly abused throughout the paper: a quantity such as H(Y | X, X ∈ A) will mean H(Y | X, 1_A = 1), where 1_A is an indicator random variable taking the value 1 if X ∈ A and 0 if not.

To simplify writing, let [a:b] denote the set of integers starting at a and ending at b, inclusive. When we refer to an index set, we restrict it to be discrete. A random index is a random variable distributed over an index set. Let M_1, M_2, …, M_k be random indices jointly distributed over the product of their index sets. For any S ⊆ [1:k], we write M_S and m_S as shorthand forms of (M_i : i ∈ S) and (m_i : i ∈ S), respectively.

Consider a pair of discrete random variables X and Y over alphabets X and Y, respectively. For any A ⊆ X^n such that Pr(X^n ∈ A) > 0, whenever there is no ambiguity we use P(·|A) to denote P(·|X^n ∈ A) for brevity. For any η ∈ (0,1], a set B ⊆ Y^n is called an η-image of A over the DMC W [1, Ch. 15] if W^n(B|x) ≥ η for every x ∈ A. On the other hand, B is called an η-quasi-image of A over W [1, Problem 15.13] if Pr(Y^n ∈ B | X^n ∈ A) ≥ η. The minimum size of η-images of A over W will be denoted by g_W(A, η), while the minimum size of η-quasi-images of A over W will be denoted by ḡ_W(A, η).

II Background

II-A Fano’s inequality

Fano’s inequality is one of the most widely used inequalities in the field of information theory. First appearing in Fano’s class notes [4], the inequality can be used to relate the entropy of a message M, distributed over an index set 𝓜, conditioned on a reconstruction M̂, with the probability of error P_e = Pr(M ≠ M̂) of that reconstruction. The exact inequality

H(M | M̂) ≤ h_b(P_e) + P_e log(|𝓜| − 1),

where h_b denotes the binary entropy function, can be tight for specific M, M̂, and P_e. It is most commonly used in proving converses of coding theorems, where, when combined with the data processing inequality [1, Lemma 3.11], it results in

H(M) ≤ I(X^n; Y^n) + h_b(P_e) + P_e log(|𝓜| − 1).

We then can say that if P_e → 0 and (1/n) log |𝓜| is a finite constant, (1/n) I(X^n; Y^n) asymptotically upper bounds (1/n) H(M). In channel coding problems, the message is uniform and so (1/n) I(X^n; Y^n) asymptotically upper bounds the code rate (1/n) log |𝓜|. Fano’s inequality also works in joint source-channel coding problems, as is used in proving the source-channel separation theorem for the two-terminal DMC [5, p. 221]. The most general form of Fano’s inequality to date is due to Han and Verdú [6], who removed the constraint that at least one of the random variables involved in the inequality be discrete.
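Fano’s inequality can be checked numerically. The Python sketch below (our illustration, not from the paper) evaluates both sides for the symmetric error model, a uniform message over K values whose estimate is correct with probability 1 − P_e and otherwise uniform over the K − 1 wrong values; for this model the inequality is known to hold with equality.

```python
import math

def h_b(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_rhs(p_e, K):
    """Right-hand side of Fano's inequality: h_b(P_e) + P_e * log(K - 1)."""
    return h_b(p_e) + p_e * math.log2(K - 1)

def cond_entropy_symmetric(p_e, K):
    """H(M | M_hat) for uniform M over K values when the estimate is correct
    with probability 1 - p_e and otherwise uniform over the K - 1 wrong
    values. By symmetry, the posterior given M_hat = m puts mass 1 - p_e on
    m and p_e / (K - 1) on every other value."""
    terms = 0.0
    if p_e < 1.0:
        terms -= (1 - p_e) * math.log2(1 - p_e)
    if p_e > 0.0:
        q = p_e / (K - 1)
        terms -= (K - 1) * q * math.log2(q)
    return terms

K, p_e = 16, 0.1
lhs = cond_entropy_symmetric(p_e, K)
rhs = fano_rhs(p_e, K)
assert lhs <= rhs + 1e-12  # Fano's inequality; met with equality here
```

For this symmetric model the bound is achieved with equality, which is exactly the sense in which the inequality “can be tight” for specific M, M̂, and P_e.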

As Wolfowitz first showed, even with a non-vanishing decoding error probability, the upper bound on the rate of messages that can be transmitted through a two-terminal DMC is asymptotically equal to that with a vanishing error probability [7]. Wolfowitz introduced the concept of capacity dependent upon error, usually denoted by C(ε). Following the terminology of Csiszár and Körner [1, p. 93], a converse result showing C(ε) = C for all ε ∈ (0,1) is called a strong converse. Verdú and Han [8] showed that the stronger assertions, that this is true for all finite blocklengths and that all larger rates must have error probability approaching unity, hold for all two-terminal DMCs.

Clearly, though, the bound in Fano’s inequality is influenced by the probability of error P_e. This dependence makes Fano’s inequality ill-suited for application to channel codes with non-vanishing error probabilities. This in turn has led to other methods of proving strong converses, such as the meta-converse proposed by Polyanskiy et al. [9]. The meta-converse leverages the idea that any decoder can be considered as a binary hypothesis test between the correct codeword set and the incorrect codeword set. By bounding the decoding error by the best binary hypothesis test, new bounds, which are relatively tight even for small values of n, can be established. In contrast to the original version of Fano’s inequality, the stronger versions presented in Corollaries 22, 24, and 26 directly apply to codes with non-vanishing decoding error probabilities over multi-terminal DMCs.

Fano’s inequality is also problematic when used to characterize joint source-channel coding (JSCC) problems. Using Fano’s inequality for JSCC problems necessitates either the restriction to vanishing error probabilities, or the restriction to messages (sources) whose probability exponents converge to the sources’ entropy rates. Both of these restrictions are limiting, as results by Kostina et al. [10] suggest that allowing non-vanishing error probabilities in conjunction with compression may lead to increased rates. In contrast to the original version of Fano’s inequality, the necessary conditions supplied by Theorems 19 and 23 can be used to upper bound such rate gains in JSCC problems over multi-terminal DMCs.

II-B Image size characterizations

Image size characterizations, originally introduced in Gács and Körner [11] and Ahlswede et al. [12], are of particular importance for DMCs due to the blowing up lemma [1, Ch. 5]. Margulis [13] first introduced the blowing up lemma to study hop distance in hyper-connected graphs. In the context of DMCs, it can be used to show that any η-image with η not decaying too fast is close in size to an η′-image with η′ not approaching unity too fast (see [1, Lemma 6.6] or Lemma 10). Ahlswede [14] used the blowing up lemma to prove a local strong converse for maximal error codes over a two-terminal DMC, showing that all bad codes have a good subcode of almost the same rate. Using the same lemma, Körner and Marton [15] developed a general framework for determining the achievable rates of a number of source and channel networks. On the other hand, many of the strong converses for some of the most fundamental multi-terminal DMCs studied in the literature were proven using image size characterization techniques. Körner and Marton [16] employed such a technique to prove the strong converse of the discrete memoryless broadcast channel with degraded message sets. Dueck [17] used these methods to prove the strong converse of the discrete memoryless multiple access channel with independent messages.

For a detailed overview of image size characterization techniques, see [1, Chs. 5, 6, 15, 16]. Here we briefly summarize the sphere packing argument in [1, Ch. 6] to motivate the development of the results in this paper. Consider sending a uniform message M from the message set 𝓜 over a two-terminal DMC specified by W using an (n, λ)-maximal error channel code with λ ∈ (0,1). For the purposes of simple discussion here, assume that the encoder f and the decoder are both deterministic. Let C = f(𝓜) denote the set of codewords used by f. Pick η > 0 such that λ + η < 1 and let B be a minimum η-image of C over W^n. That is, |B| = g_{W^n}(C, η). Let D_m denote the decoding region for the message m ∈ 𝓜. The maximum error requirement implies that W^n(D_m | f(m)) ≥ 1 − λ for all m ∈ 𝓜. Hence we have W^n(B ∩ D_m | f(m)) ≥ 1 − λ − η. In other words, this means that B ∩ D_m is a (1 − λ − η)-image of the singleton {f(m)}, and hence |B ∩ D_m| ≥ g_{W^n}({f(m)}, 1 − λ − η) for every m ∈ 𝓜. It is clear now that the subsets B ∩ D_m for m ∈ 𝓜 serve as the “spheres” in the sphere packing argument. More specifically,

|𝓜| · min_{m ∈ 𝓜} g_{W^n}({f(m)}, 1 − λ − η) ≤ Σ_{m ∈ 𝓜} |B ∩ D_m| ≤ |B| = g_{W^n}(C, η),

which implies

(1/n) log |𝓜| ≤ (1/n) log g_{W^n}(C, η) − (1/n) log min_{m ∈ 𝓜} g_{W^n}({f(m)}, 1 − λ − η).   (1)

As a result, we have just obtained an upper bound on the rate of the (n, λ)-maximal error channel code in terms of minimum image sizes. Moreover, as a consequence of the blowing up lemma (see [1, Lemma 6.6] or Lemma 10), the terms on the right hand side of (1) remain roughly the same regardless of the value of η within the range of (0,1). Thus, unlike the standard Fano’s inequality, this bound may be used to establish the strong converse of the DMC.
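To see what the two image-size exponents in the bound buy us, consider the binary symmetric channel: the minimum image of the whole codeword set has exponential order roughly H(Y), each singleton “sphere” has order roughly H(Y|X) = h_b(p), and the difference is the familiar mutual information bound. A minimal Python sketch of this arithmetic (our illustration, assuming a BSC with crossover probability p and Bernoulli(q) input):

```python
import math

def h_b(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sphere_packing_exponents(p, q=0.5):
    """For a BSC(p) with Bernoulli(q) input: the exponential order of the
    minimum image of the codeword set is roughly H(Y), that of a singleton
    'sphere' is H(Y|X) = h_b(p), and their difference is the rate bound."""
    p_y1 = q * (1 - p) + (1 - q) * p   # output distribution P(Y = 1)
    image_exponent = h_b(p_y1)         # ~ (1/n) log of the image size
    sphere_exponent = h_b(p)           # ~ (1/n) log of one sphere's size
    return image_exponent - sphere_exponent

# With a uniform input the bound equals the BSC capacity 1 - h_b(p).
assert abs(sphere_packing_exponents(0.11) - (1 - h_b(0.11))) < 1e-12
```

This is the sense in which the sphere packing bound, once its image sizes are expressed through entropies, recovers capacity.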

Nevertheless, the usefulness of code rate bounds expressed in terms of minimum image sizes, like (1), depends upon the availability of simple image size characterizations. As mentioned before, while such characterizations exist for the two-terminal DMC (see [1, Ch. 6]) and the three-terminal DMC with a single message (see [1, Ch. 15]), simple image size characterizations for more general channels have been largely missing. This motivates us to develop the proposed tool of equal-image-size source partitioning (see Theorem 18) to solve the more general image size characterization problem and to apply this tool to obtain more general necessary conditions of reliable communications over multi-terminal DMCs (see section VII).

III Preview of main results

The main result of this paper is the proposed (nearly) equal-image-size partitioning of a source simultaneously over a number of DMCs. Consider a set of nearly uniform sequences that are mapped through the DMCs. The set may already be partitioned by a collection of indices. Then we may partition it in another way into at most polynomially many subsets, indexed by a new partitioning index. Consider the intersection of this new partition and any old partition. Fixing any subset in the intersection, the minimum image sizes of “most” of the partitioning subsets are approximately of the same exponential order. The qualifier “most” above may again be quantified in terms of exponential order. The more precise statement of this source partitioning method will be developed in the following sections, culminating in the results described in Theorem 18.

As mentioned in the previous section, one main application of image size characterizations is to find outer bounds on the capacity regions of multi-terminal DMCs. With the aid of equal-image-size source partitioning, we are able to develop strong versions of Fano’s inequality for multi-terminal DMCs that do not require the decoding error probabilities to vanish. These stronger versions of Fano’s inequality provide us an easy-to-use tool to find outer bounds of capacity regions for codes with non-vanishing error probabilities. Consider the multi-terminal communication scenario in which a set of messages are to be sent to a number of receivers through a set of DMCs. (Because only marginal decoding errors made at individual receivers are of concern, the marginal conditional distributions are sufficient in specifying all such error events. As a result, we speak of “a set of DMCs” rather than “the multi-terminal DMC” specified by the joint conditional distribution.) The set of possible codewords is denoted by C, which can be any subset of X^n. Each receiver is assigned a non-empty subset of the messages, with the interpretation that the ith receiver is to decode those messages. Let f denote the (possibly stochastic) encoding function and φ_i denote the (possibly stochastic) decoding function employed by the ith receiver. Note that distributed encoding is allowed in this model. For example, if there are distributed encoders, each generating a codeword for its component channel, we may take the overall codeword set to be the product of the component codeword sets and disjointly distribute the messages to the encoders. Then the following two stronger versions of Fano’s inequality are some of the main results that we will present in section VII:

Strong maximum-error Fano’s inequality

If the encoder-decoder pairs have maximum errors

for all , then there exists such that

for all .

Strong average-error Fano’s inequality

If the encoder-decoder pairs have average errors for all , then there exist , a random index over an index set with at most elements for some , and satisfying such that and

for all , as long as is uniformly distributed.

Stronger and more thorough results (Theorem 19–Corollary 26) than the two strong Fano’s inequalities stated above will be developed and presented in section VII.

Application example

The strong converse for the general discrete memoryless wiretap channel (DM-WTC) is a heretofore open problem. The best known results were derived by Tan and Bloch [18] and independently by Hayashi et al. [19], and only pertain to the case where the wiretap channel is degraded. Such a scenario reduces the complexity by not requiring an auxiliary random variable to characterize the secrecy capacity. In particular, Tan and Bloch accomplish their result using an information spectrum approach, while Hayashi et al. consider the question in the context of active hypothesis testing. (It should be noted that their result applies simultaneously to both secret message transmission and secret key agreement, and allows for arbitrary leakage.) As a simple application example for our results, we employ the strong average-error Fano’s inequality to provide a strong converse for the general DM-WTC.

The DM-WTC consists of a sender , a legitimate receiver , and an eavesdropper . For any , a uniformly distributed message over the message set is to be sent reliably from to and discreetly against eavesdropping by . For any , consider the encoding function and the decoding function . For any and , a -code for the DM-WTC is any code which meets the following two requirements:

  • Reliability: ,

  • Leakage: .

Like [18], we impose the decaying leakage requirement of .

Applying the strong average-error Fano’s inequality above to the DM-WTC with the reliability requirement, we obtain an index set with for some , a random index over , , and such that and


for all . Let be a random index over defined by the conditional distribution . Since , we also have . From (2), we obtain


On the other hand, from the leakage requirement



Thus, combining (3) and (4) results in

Noting that , and following the steps of [20, Section 22.1.2], we may obtain

for some over with such that . This proves the strong converse for the general DM-WTC with decaying leakage.
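The single-letterization referenced above follows the standard pattern of [20, Section 22.1.2]. As a hedged sketch (our reconstruction, not the paper’s exact chain), with ε_n and δ_n collecting the vanishing reliability and leakage slack and V an auxiliary variable obtained by the usual identification, the argument reads:

```latex
\begin{align*}
nR &\le I(M;Y^n) + n\epsilon_n
      && \text{(strong average-error Fano)}\\
   &\le I(M;Y^n) - I(M;Z^n) + n(\epsilon_n + \delta_n)
      && \text{(decaying leakage: } I(M;Z^n) \le n\delta_n\text{)}\\
   &\le \sum_{i=1}^{n} \bigl[ I(V_i;Y_i) - I(V_i;Z_i) \bigr] + n(\epsilon_n + \delta_n)
      && \text{(Csisz\'ar sum identity)}\\
   &\le n \max_{V - X - (Y,Z)} \bigl[ I(V;Y) - I(V;Z) \bigr] + n(\epsilon_n + \delta_n).
\end{align*}
```

Dividing by n and letting n grow, with ε_n, δ_n → 0, recovers the secrecy capacity max_{V−X−(Y,Z)} [I(V;Y) − I(V;Z)] without requiring the decoding error probability to vanish.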

IV Partitioning Index and Entropy Spectrum Partition

In this section, we describe the notions of partitioning index, entropy spectrum partition (slicing) [21], and nearly uniform distribution. They provide the basic machinery that we will employ in later sections to develop the source partitioning results. The entropy spectrum partition method that we use here is a slight variant within the class of information/entropy spectrum slicing methods developed in [21]. This class of methods finds many different applications in information theory (see [21] for more detailed discussions).

While the definitions and results are stated for the sequence space , they clearly extend to other sequence spaces. When we say is distributed over , it is assumed with no loss of generality that for all . Otherwise we may just remove the zero-probability sequences from .

Definition 1.

Let and be an index set. Let and be jointly distributed random variables over and , respectively. For each , define . Then is called a partitioning index of with respect to (w.r.t.) if and for all . We may simply say partitions when the underlying distribution of over is clear from the context.
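In a finite toy setting, a deterministic partitioning index is just a labeling function whose level sets tile the support. The Python sketch below (hypothetical, not from the paper) builds the subsets A_k = {x : index_fn(x) = k} and checks the two defining properties of a partition, disjointness and coverage.

```python
from collections import defaultdict

def build_partition(support, index_fn):
    """Group a finite support by a deterministic index function,
    mimicking a partitioning index: A_k = {x : index_fn(x) = k}."""
    parts = defaultdict(set)
    for x in support:
        parts[index_fn(x)].add(x)
    return dict(parts)

def is_partition(support, parts):
    """Check that the subsets are pairwise disjoint and cover the support."""
    seen = set()
    for subset in parts.values():
        if seen & subset:           # any overlap violates disjointness
            return False
        seen |= subset
    return seen == set(support)

support = range(16)
parts = build_partition(support, lambda x: x % 4)  # four residue classes
assert is_partition(support, parts)
assert sorted(map(len, parts.values())) == [4, 4, 4, 4]
```

Read this way, parts of Lemma 2 below amount to closure properties of such labelings: restricting to one subset, pairing two partitioning indices, or composing labelings again yields a partition.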

Lemma 2.

Consider any and partitioning indices w.r.t. over .

  1. Suppose that partitions . Then if and only if . In addition, for all such that , . Equivalently, where if .

  2. Suppose that partitions . Then for every non-empty , partitions w.r.t. with . As is clear from the context, we may simply say also partitions .

  3. is a partitioning index of if and only if and are both partitioning indices of .

  4. Let and be partitioning indices of . Then . Thus we may write in place of or . Furthermore, for each such that , is a partitioning index of w.r.t. (or equivalently ) with . Hence if the latter event is non-empty.

  5. Let be a partitioning index of w.r.t. . Let be a collection of random indices such that is a partitioning index of w.r.t. , distributed over . Then is a partitioning index w.r.t. .

  1. First, it is obvious from the definition of that if and only if . Consider now for each such that . Then , again due to the very definition of . Hence implies . On the other hand,

    where the last equality is due to the fact that is a partitioning index of . Hence implies . Hence by setting if , with probability one.

  2. Note that , and hence . Thus partitions w.r.t. .

  3. First, suppose that is a partitioning index of . Clearly . Hence . In addition, for any , . Therefore is a partitioning index of . The same argument also applies to show that is a partitioning index of .

    On the other hand, suppose that both and are partitioning indices of . Clearly we have . For any , . But for every , since , and hence for some . This means that . Therefore is a partitioning index of .

  4. Suppose that and are partitioning indices of w.r.t. . Then by part 3) of the lemma, is also a partitioning index of w.r.t. . Moreover, by part 1) of the lemma, and are conditionally independent given . That is, we have , which implies . As above, we also clearly have . Therefore .

    Consider any fixed such that (i.e., ). By part 1) of the lemma, we have . Further, by part 2) of the lemma, we have partitions w.r.t. with . The final assertion then results directly from part 1).

  5. For convenience in notation, write , which distributes over . Note that . Hence . Clearly

    and for all . Therefore is a partitioning index of w.r.t. .

All parts of Lemma 2 will be used repeatedly in the rest of the paper. To avoid prolixity, we will not explicitly refer to each use of the lemma.

Definition 3.

Let . Let be a distribution on , and be the corresponding entropy spectrum. For any , define and the -entropy spectrum partition of w.r.t. as , where

Clearly because of our convention that for all .

Suppose is the -entropy spectrum partition of w.r.t. , and if . The random variable is clearly a partitioning index of w.r.t. , and is conditionally independent of any other partitioning index of given .
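The slicing is easy to carry out explicitly for a small product distribution. The Python sketch below (our illustration; the paper’s Δ and indexing conventions may differ) sends a sequence x^n to slice j when −log2 P^n(x^n) falls in [jΔ, (j+1)Δ), and checks the size/probability sandwich that Lemma 4 formalizes.

```python
import math
from collections import defaultdict
from itertools import product

def entropy_spectrum_partition(pmf, n, delta):
    """Partition the length-n product alphabet by entropy spectrum value:
    x^n lands in slice j when -log2 P^n(x^n) lies in [j*delta, (j+1)*delta)."""
    slices = defaultdict(list)
    for xn in product(range(len(pmf)), repeat=n):
        logp = sum(math.log2(pmf[x]) for x in xn)
        slices[math.floor(-logp / delta)].append(xn)
    return slices

def prob(pmf, xn):
    """Product-distribution probability of the sequence xn."""
    p = 1.0
    for x in xn:
        p *= pmf[x]
    return p

pmf, n, delta = [0.7, 0.3], 8, 1.0
slices = entropy_spectrum_partition(pmf, n, delta)
assert sum(len(s) for s in slices.values()) == 2 ** n  # a true partition

# Every sequence in slice j has probability in (2^{-(j+1)d}, 2^{-j d}], so
# the slice's size and probability sandwich each other:
#     P(T_j) * 2^{j d} <= |T_j| < P(T_j) * 2^{(j+1) d}.
for j, T in slices.items():
    p_T = sum(prob(pmf, xn) for xn in T)
    assert p_T * 2 ** (j * delta) <= len(T) + 1e-9
    assert len(T) < p_T * 2 ** ((j + 1) * delta) + 1e-9
```

Because the entropy spectrum −(1/n) log2 P^n ranges over an interval of width O(1) per letter, the number of non-empty slices grows only linearly in n for fixed Δ, which is the “polynomial number of subsets” phenomenon used throughout the paper.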

Lemma 4.

Let be the -entropy spectrum partition of w.r.t. . Then for every , ,

In addition if , then


Trivially we have

For ,

and therefore . Similarly suppose that . Then

and therefore . Combining both results gives us . ∎

Lemma 5.

Let be the -entropy spectrum partition of w.r.t. . Then for each , , satisfying