
# Exact Error and Erasure Exponents for the Asymmetric Broadcast Channel

We derive exact (ensemble-tight) error and erasure exponents for the asymmetric broadcast channel given a random superposition codebook. We consider Forney's optimal decoders for the individual messages and for the message pair at the receiver that decodes both messages. We prove that the optimal decoder designed to decode the pair of messages achieves the optimal trade-off between the total and undetected exponents associated with the optimal decoder for the private message. We propose convex optimization-based procedures to evaluate the exponents efficiently. Numerical examples are presented to illustrate the results.

11/28/2017


## I Introduction

### I-A Background and Related Works

The broadcast channel [2] has been extensively studied in multi-user information theory. Although the capacity region is still unknown, some special cases have been solved. One example is the broadcast channel with degraded message sets, also known as the asymmetric broadcast channel (ABC). For this channel, one receiver desires to decode both the private message and the common message, while the other receiver desires to decode only the common message.

The capacity region for the ABC was derived by Körner and Marton and is well known [3]. The earliest work on error exponents for the ABC is that by Körner and Sgarro [4], who used a constant composition ensemble for deriving an achievable error exponent. Later, Kaspi and Merhav [5] improved this work by deriving a tighter lower bound for the error exponent by analyzing the ensemble of i.i.d. random codes. Most recently, Averbuch et al. derived the exact random coding error exponents and expurgated exponents for the ensemble of constant composition codes in [6] and [7], respectively.

In this paper, we are interested in decoders with an erasure option. In this setting, the decoders may, instead of declaring that a particular message or set of messages is sent, output an erasure symbol. For the discrete memoryless channel (DMC), Forney [8] found the optimal decoder and derived lower bounds on the total and undetected error exponents using Gallager-style bounding techniques. Csiszár and Körner [9, Thm. 10.11] derived universally attainable erasure and error exponents using a generalization of the maximum mutual information (MMI) decoder. Telatar [10] also analyzed an erasure decoding rule with a general decoding metric. Moulin [11] generalized this family of decoders and proposed a new decoder parameterized by a weighting function. Recently, Merhav [12] derived lower bounds to these exponents by using a novel type-class enumerator method. In a breakthrough, Somekh-Baruch and Merhav [13] derived the exact random coding exponents for erasure decoding. Recently, Huleihel et al. [14] showed that the random coding exponent for erasure decoding is not universally achievable and established a simple relation between the total and undetected error exponents. Weinberger and Merhav [15] analyzed a simplified decoder for erasure decoding. Hayashi and Tan [16] derived ensemble-tight moderate deviations and second-order results for erasure decoding over additive DMCs. For the ABC, Tan [17] derived lower bounds on the total and undetected error exponents of an extended version of the universal decoder in Csiszár and Körner [9, Thm. 10.11]. Moreover, Merhav in [18] analyzed a random coding scheme with a binning (superposition coding) structure and showed that a potentially suboptimal bin index decoder achieves the random coding error exponent for decoding only the bin index.

### I-B Main Contributions

In this paper, we consider erasure decoding for the ABC with a superposition codebook structure. In this problem, for the decoder that aims to decode both messages, there are six exponents of interest: the total and undetected exponents corresponding to each of the two individual messages and to the message pair $(m_1, m_2)$. We show that the optimal decoder for the pair of messages achieves the optimal trade-off between the total and undetected exponents pertaining to the private message $m_1$. Our main technical contribution is to handle statistical dependencies between codewords that share the same cloud center. Lemmas 6 and 7 are two technical lemmas that are then used to establish the equality between the total random coding error exponents pertaining to the first message (i.e., the private message $m_1$) and the message pair, which partially ameliorates this problem.

Finally, we show that the minimizations required to evaluate these error exponents can be cast as convex optimization problems, and thus, can be solved efficiently. We also present numerical examples to illustrate these exponents and the tradeoffs involved in the erasure decoding problem for the ABC.

## II Problem Formulation

### II-A Notation

Throughout this paper, random variables (RVs) will be denoted by upper case letters, their specific values will be denoted by the respective lower case letters, and their alphabets will be denoted by calligraphic letters. A similar convention will apply to random vectors of dimension $n$ and their realizations. For example, the random vector $X^n = (X_1, \ldots, X_n)$ may take on a certain realization $x^n \in \mathcal{X}^n$, the $n$-th order Cartesian power of $\mathcal{X}$, which is the alphabet of each component of this vector.

The distributions associated with random variables will be denoted by the letters $P$ or $Q$, with subscripts being the names of the random variables, e.g., $Q_{UXY}$ stands for a joint distribution of a triple of random variables $(U, X, Y)$ on $\mathcal{U} \times \mathcal{X} \times \mathcal{Y}$, the Cartesian product of the alphabets $\mathcal{U}$, $\mathcal{X}$, and $\mathcal{Y}$. In accordance with these notations, the joint distribution induced by $Q_{UX|Y}$ and $\hat{Q}_Y$ will be denoted by $Q_{UX|Y}\hat{Q}_Y$. Information measures induced by the joint distribution $Q_{UXY}$ (or $Q$ for short) will be subscripted by $Q$. For example, $I_Q(X;Y)$ denotes the mutual information of the random variables $X$ and $Y$ with joint distribution $Q_{XY}$.

For a sequence $x^n \in \mathcal{X}^n$, let $\hat{P}_{x^n}$ denote its empirical distribution or type. The type class of $x^n$ is the set of all $\tilde{x}^n \in \mathcal{X}^n$ whose empirical distribution is $\hat{P}_{x^n}$. For a given conditional probability distribution $Q_{X|U}$ and sequence $u^n$, $\mathcal{T}(Q_{X|U}|u^n)$ denotes the conditional type class of ($Q_{X|U}$-shell) given $u^n$, namely, the set of sequences $x^n$ whose joint empirical distribution with $u^n$ is given by $\hat{P}_{u^n} \times Q_{X|U}$.

The probability of an event $\mathcal{E}$ will be denoted by $\Pr[\mathcal{E}]$, and the expectation operator with respect to a joint distribution $Q$ will be denoted by $\mathbb{E}_Q[\cdot]$. For two positive sequences $\{a_n\}$ and $\{b_n\}$, the notation $a_n \doteq b_n$ means that $a_n$ and $b_n$ are of the same exponential order, i.e., $\lim_{n\to\infty} \frac{1}{n} \ln \frac{a_n}{b_n} = 0$. Similarly, $a_n \,\dot{\le}\, b_n$ means that $\limsup_{n\to\infty} \frac{1}{n} \ln \frac{a_n}{b_n} \le 0$. The indicator function of an event $\mathcal{E}$ will be denoted by $\mathbb{1}\{\mathcal{E}\}$. The notation $|t|^+$ will stand for $\max\{t, 0\}$ and $[k]$ stands for the set $\{1, \ldots, k\}$. Finally, logarithms and exponents will be understood to be taken to the natural base.
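To make the method-of-types notation above concrete, the following minimal sketch (with our own helper names, not the paper's) computes the type of a sequence and tests membership in its type class:

```python
from collections import Counter

def empirical_distribution(x, alphabet):
    """Return the type (empirical distribution) of sequence x over the given alphabet."""
    n = len(x)
    counts = Counter(x)
    return {a: counts.get(a, 0) / n for a in alphabet}

def same_type(x, y, alphabet):
    """Check whether y lies in the type class of x, i.e., both share the same type."""
    return empirical_distribution(x, alphabet) == empirical_distribution(y, alphabet)

# The sequence (0,1,1,0) has type (1/2, 1/2); any permutation lies in its type class.
P_hat = empirical_distribution([0, 1, 1, 0], [0, 1])
```

The type class of a binary sequence of length $n$ with $k$ ones simply collects all $\binom{n}{k}$ rearrangements, which is why type classes have polynomially many "shapes" but exponentially many members.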

### II-B System Model

We consider a discrete memoryless ABC with a finite input alphabet $\mathcal{X}$, finite output alphabets $\mathcal{Y}$ and $\mathcal{Z}$, and a transition probability matrix $W(y, z|x)$. Let $W_Y$ and $W_Z$ be respectively the $\mathcal{Y}$- and $\mathcal{Z}$-marginals of $W$.

Assume there is a random codebook with superposition structure for this ABC, where the message pair $(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2$ is destined for user $\mathcal{Y}$ and the common message $m_2 \in \mathcal{M}_2$ is destined for user $\mathcal{Z}$. In this paper, we consider i.i.d. random codes and constant composition random codes.

• For i.i.d. random codes, fix a distribution $P_{UX}$ and randomly generate $M_2$ “cloud centers” $u^n(m_2)$, $m_2 \in \mathcal{M}_2$, according to the distribution

$$P(u^n) := \prod_{i=1}^{n} P_U(u_i). \tag{1}$$

For each cloud center $u^n(m_2)$, randomly generate $M_1$ “satellite” codewords $x^n(m_1, m_2)$, $m_1 \in \mathcal{M}_1$, according to the conditional probability distribution

$$P(x^n|u^n) := \prod_{i=1}^{n} P_{X|U}(x_i|u_i). \tag{2}$$
• For constant composition random codes, we fix a joint type $P_{UX}$ and randomly and independently generate $M_2$ “cloud centers” $u^n(m_2)$ under the uniform distribution on the type class of $P_U$. For each cloud center $u^n(m_2)$, randomly and independently generate $M_1$ “satellite” codewords $x^n(m_1, m_2)$ under the uniform distribution on the conditional type class $\mathcal{T}(P_{X|U}|u^n(m_2))$.

The two decoders with erasure options are given by $\varphi_Y : \mathcal{Y}^n \to (\mathcal{M}_1 \times \mathcal{M}_2) \cup \{\mathrm{e}\}$ and $\varphi_Z : \mathcal{Z}^n \to \mathcal{M}_2 \cup \{\mathrm{e}\}$, where $\mathrm{e}$ is the erasure symbol.
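The two-step superposition construction above can be sketched in code. This is a toy illustration with hypothetical function and variable names: in the paper $M_1$ and $M_2$ grow exponentially in $n$, whereas here all sizes are small.

```python
import numpy as np

def generate_iid_superposition_codebook(P_U, P_X_given_U, n, M1, M2, seed=0):
    """Generate M2 cloud centers u^n(m2) i.i.d. from P_U and, for each cloud
    center, M1 satellite codewords x^n(m1, m2) whose i-th symbol is drawn
    from P_{X|U}(. | u_i).  P_U: (|U|,) pmf; P_X_given_U: (|U|, |X|) rows."""
    rng = np.random.default_rng(seed)
    U = len(P_U)
    clouds = rng.choice(U, size=(M2, n), p=P_U)        # u^n(m2)
    satellites = np.empty((M1, M2, n), dtype=int)      # x^n(m1, m2)
    for m2 in range(M2):
        for i in range(n):
            probs = P_X_given_U[clouds[m2, i]]
            satellites[:, m2, i] = rng.choice(len(probs), size=M1, p=probs)
    return clouds, satellites

# Toy binary example: two cloud centers, four satellites per cloud, n = 8.
P_U = np.array([0.5, 0.5])
P_XU = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
clouds, sats = generate_iid_superposition_codebook(P_U, P_XU, n=8, M1=4, M2=2)
```

Note the key statistical feature the paper must handle: satellites sharing a cloud center are conditionally independent given $u^n(m_2)$, but not mutually independent.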

### II-C Definitions of Error Probabilities and Error Exponents

In this paper, we focus on six different error probabilities associated to terminal $\mathcal{Y}$. We do not derive the total and undetected error probabilities at terminal $\mathcal{Z}$ since the analysis is completely analogous to the analysis of the error and erasure probabilities of the “cloud centers” at terminal $\mathcal{Y}$ by replacing $W_Y$ with $W_Z$. Define the disjoint decoding regions according to the decoder $\varphi_Y$ as $\mathcal{D}_{m_1 m_2} := \{y^n : \varphi_Y(y^n) = (m_1, m_2)\}$. Moreover, let $\mathcal{D}_{m_1} := \bigcup_{m_2 \in \mathcal{M}_2} \mathcal{D}_{m_1 m_2}$ and $\mathcal{D}_{m_2} := \bigcup_{m_1 \in \mathcal{M}_1} \mathcal{D}_{m_1 m_2}$ be the disjoint decoding regions associated to messages $m_1$ and $m_2$ respectively. For terminal $\mathcal{Y}$, define for each message $m_j$, $j \in \{1, 2\}$, and the message pair $(m_1, m_2)$, the conditional total error and undetected error probabilities as

$$e^{\mathrm{t}}_j(m_1, m_2) := W^n_Y\big(\mathcal{D}^c_{m_j} \,\big|\, x^n(m_1, m_2)\big) \tag{3}$$
$$e^{\mathrm{u}}_j(m_1, m_2) := W^n_Y\Big(\bigcup_{\hat{m}_j \in \mathcal{M}_j \setminus \{m_j\}} \mathcal{D}_{\hat{m}_j} \,\Big|\, x^n(m_1, m_2)\Big) \tag{4}$$
$$e^{\mathrm{t}}_Y(m_1, m_2) := W^n_Y\big(\mathcal{D}^c_{m_1 m_2} \,\big|\, x^n(m_1, m_2)\big) \tag{5}$$
$$e^{\mathrm{u}}_Y(m_1, m_2) := W^n_Y\Big(\bigcup_{(\hat{m}_1, \hat{m}_2) \neq (m_1, m_2)} \mathcal{D}_{\hat{m}_1 \hat{m}_2} \,\Big|\, x^n(m_1, m_2)\Big). \tag{6}$$

Then we may define the average total and undetected error probabilities at terminal $\mathcal{Y}$ as follows:

$$e^{\mathrm{t}}_j := \frac{1}{M_1 M_2} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} e^{\mathrm{t}}_j(m_1, m_2) \tag{7}$$
$$e^{\mathrm{u}}_j := \frac{1}{M_1 M_2} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} e^{\mathrm{u}}_j(m_1, m_2) \tag{8}$$
$$e^{\mathrm{t}}_Y := \frac{1}{M_1 M_2} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} e^{\mathrm{t}}_Y(m_1, m_2) \tag{9}$$
$$e^{\mathrm{u}}_Y := \frac{1}{M_1 M_2} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} e^{\mathrm{u}}_Y(m_1, m_2). \tag{10}$$

Using the Neyman-Pearson theorem, Forney [8] obtained the optimal trade-off between the average total and undetected error probabilities for discrete memoryless channels. By following his idea and using a similar argument, we can show that the optimal trade-off between the average total and undetected error probabilities for the ABC is attained by the following decoding regions (in the following, the threshold $T$ may take different values depending on whether we are decoding individual messages or the message pair):

$$\mathcal{D}^*_{m_j} := \bigg\{y^n : \frac{\Pr(y^n \,|\, \mathcal{C}_j(m_j))}{\sum_{m'_j \neq m_j} \Pr(y^n \,|\, \mathcal{C}_j(m'_j))} \ge e^{nT}\bigg\}, \tag{11}$$
$$\mathcal{D}^*_{m_1 m_2} := \bigg\{y^n : \frac{W^n_Y(y^n \,|\, x^n(m_1, m_2))}{\sum_{(m'_1, m'_2) \neq (m_1, m_2)} W^n_Y(y^n \,|\, x^n(m'_1, m'_2))} \ge e^{nT}\bigg\}, \tag{12}$$

where the distribution of the output conditioned on the sub-codebook $\mathcal{C}_1(m_1) := \{x^n(m_1, m_2) : m_2 \in \mathcal{M}_2\}$ is

$$\Pr(y^n \,|\, \mathcal{C}_1(m_1)) := \frac{1}{M_2} \sum_{m_2 \in \mathcal{M}_2} W^n_Y(y^n \,|\, x^n(m_1, m_2)) \tag{13}$$

and similarly for $\Pr(y^n \,|\, \mathcal{C}_2(m_2))$.
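A toy brute-force version of the pair decoder (12) may help make the threshold rule concrete. The sketch below uses our own names (it is an illustration, not the paper's analysis); for $T \ge 0$ at most one pair can pass the likelihood-ratio test, so it suffices to test the likelihood-maximizing pair.

```python
import numpy as np

def forney_pair_decoder(yn, codebook, W, T):
    """Threshold decoder of (12): output the pair (m1, m2) whose likelihood
    W^n(y^n | x^n(m1, m2)) is at least e^{nT} times the total likelihood of
    all competing pairs; otherwise output the erasure symbol 'e'.
    codebook: (M1, M2, n) integer array; W: (|X|, |Y|) channel matrix."""
    M1, M2, n = codebook.shape
    lik = np.empty((M1, M2))
    for m1 in range(M1):
        for m2 in range(M2):
            # Memoryless channel: product of per-symbol transition probabilities.
            lik[m1, m2] = np.prod(W[codebook[m1, m2], yn])
    m1s, m2s = np.unravel_index(np.argmax(lik), lik.shape)
    competitors = lik.sum() - lik[m1s, m2s]
    if competitors == 0 or lik[m1s, m2s] / competitors >= np.exp(n * T):
        return (m1s, m2s)
    return 'e'

# Toy binary channel and a tiny codebook (M1 = 2, M2 = 1, n = 4).
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
codebook = np.array([[[0, 0, 0, 0]],
                     [[1, 1, 1, 1]]])
yn = np.array([0, 0, 0, 0])
```

Raising $T$ enlarges the erasure region: a clean observation that is decoded at $T = 0$ can be erased at a large threshold, which is exactly the total-versus-undetected trade-off the exponents quantify.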

We would like to find the exact error exponents $E^{\mathrm{t}}_j$, $E^{\mathrm{u}}_j$ for $j \in \{1, 2\}$, and $E^{\mathrm{t}}_Y$, $E^{\mathrm{u}}_Y$, with the erasure option, i.e., $T \ge 0$ (we do not consider the list decoding mode, i.e., $T < 0$, in this paper). These are the exponents associated to the expectation of the error probabilities, where the expectation is taken with respect to the randomness of the codebook, which possesses the superposition structure described in Section II-B. In other words,

$$E^{\mathrm{t}}_1(R_1, R_2, T) := \limsup_{n \to \infty} \Big[-\frac{1}{n} \ln \mathbb{E}_{\mathcal{C}}\big[e^{\mathrm{t}}_1\big]\Big], \tag{14}$$

and similarly for the other exponents $E^{\mathrm{u}}_1$, $E^{\mathrm{t}}_2$, $E^{\mathrm{u}}_2$, $E^{\mathrm{t}}_Y$ and $E^{\mathrm{u}}_Y$. We show, in fact, that the $\limsup$ in (14) is a limit. These exponents are also called random coding error exponents. If these exponents are known exactly, we say that ensemble-tight results are established.

## III Main Results and Discussions

The main results of this paper are stated below in Theorems 1 and 2, establishing exact random coding error exponents for the messages $m_1$, $m_2$, and the message pair $(m_1, m_2)$ at terminal $\mathcal{Y}$, i.e., the random coding exponents corresponding to the probabilities in (7)–(10).

Before stating our results, we provide a few additional definitions. For a given probability distribution $Q_{UXY}$ on $\mathcal{U} \times \mathcal{X} \times \mathcal{Y}$, rates $R_1$ and $R_2$, and the fixed random coding distribution $P_{UX}$, define

$$\beta(Q, R_1) := D(Q_{X|U} \,\|\, P_{X|U} \,|\, Q_U) + I_Q(X; Y|U) - R_1 \tag{15}$$
$$\gamma(Q, R_2) := D(Q_U \,\|\, P_U) + I_Q(U; Y) - R_2 \tag{16}$$
$$\Phi(Q, R_1, R_2) := \Big|\gamma(Q, R_2) + |\beta(Q, R_1)|^+\Big|^+ \tag{17}$$
$$\Delta(Q, R_1, R_2) := \Big||{-\gamma(Q, R_2)}|^+ - \beta(Q, R_1)\Big|^+. \tag{18}$$
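For a fixed joint distribution over $(U, X, Y)$ on finite alphabets, the functionals (15)–(18) are straightforward to evaluate numerically. The sketch below (a didactic helper with our own names; the paper optimizes these quantities over $Q$ rather than evaluating them at a single point) computes all four from an array representation of $Q$:

```python
import numpy as np

def pos(t):
    """|t|^+ = max(t, 0)."""
    return max(t, 0.0)

def _xlogy(a, b):
    """Elementwise a*ln(b) with the convention 0*ln(anything) = 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.where(a > 0, a * np.log(np.where(a > 0, b, 1.0)), 0.0)

def exponent_functionals(Q, P_U, P_XU, R1, R2):
    """Evaluate beta, gamma, Phi, Delta of (15)-(18).
    Q: joint pmf over (U, X, Y) as a (|U|, |X|, |Y|) array;
    P_U: (|U|,) pmf; P_XU: (|U|, |X|) rows of P_{X|U}."""
    eps = 1e-300  # guards against 0/0 in zero-mass cells
    Q_U = Q.sum(axis=(1, 2))
    Q_UX = Q.sum(axis=2)
    Q_UY = Q.sum(axis=1)
    Q_Y = Q.sum(axis=(0, 1))
    # D(Q_{X|U} || P_{X|U} | Q_U)
    D_cond = _xlogy(Q_UX, Q_UX / (Q_U[:, None] * P_XU + eps)).sum()
    # I_Q(X; Y | U)
    I_XYgU = _xlogy(Q, Q * Q_U[:, None, None]
                    / (Q_UX[:, :, None] * Q_UY[:, None, :] + eps)).sum()
    beta = D_cond + I_XYgU - R1
    # D(Q_U || P_U) and I_Q(U; Y)
    D_U = _xlogy(Q_U, Q_U / (P_U + eps)).sum()
    I_UY = _xlogy(Q_UY, Q_UY / (Q_U[:, None] * Q_Y[None, :] + eps)).sum()
    gamma = D_U + I_UY - R2
    Phi = pos(gamma + pos(beta))
    Delta = pos(pos(-gamma) - beta)
    return beta, gamma, Phi, Delta
```

As a sanity check, if $Q = P_U \times P_{X|U} \times Q_Y$ with $Y$ independent of $(U, X)$, all divergence and mutual-information terms vanish, so $\beta = -R_1$ and $\gamma = -R_2$.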

### III-A Main Results

###### Theorem 1.

For i.i.d. random codes, the error exponents $E^{\mathrm{t}}_1$, $E^{\mathrm{u}}_1$, $E^{\mathrm{t}}_Y$ and $E^{\mathrm{u}}_Y$ are given by (in the following analyses and derivations, for ease of notation, we sometimes drop the dependencies of the error exponents, including those in Theorem 2, on the parameters $R_1$, $R_2$ and $T$)

$$E^{\mathrm{t}}_1(R_1, R_2, T) = E^{\mathrm{t}}_Y(R_1, R_2, T) = \min\{\Psi_a(R_1, R_2, T), \Psi_b(R_1, T)\} \tag{19}$$
$$E^{\mathrm{u}}_1(R_1, R_2, T) = E^{\mathrm{u}}_Y(R_1, R_2, T) = E^{\mathrm{t}}_1(R_1, R_2, T) + T \tag{20}$$

where

$$\Psi_a(R_1, R_2, T) := \min_{\hat{Q}_{UXY}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{UX|Y} \in \mathcal{L}_1(\hat{Q}_{XY}, R_1, R_2, T)} \Phi(Q_{UX|Y}\hat{Q}_Y, R_1, R_2)\Big] \tag{21}$$
$$\Psi_b(R_1, T) := \min_{\hat{Q}_{UXY}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{X|UY} \in \mathcal{L}_2(\hat{Q}_{UXY}, R_1, T)} |\beta(Q_{X|UY}\hat{Q}_{UY}, R_1)|^+\Big] \tag{22}$$

with $P_{UXY} := P_U \times P_{X|U} \times W_Y$, and the sets $\mathcal{L}_1$ and $\mathcal{L}_2$ are defined as

$$\mathcal{L}_1(\hat{Q}_{XY}, R_1, R_2, T) := \Big\{Q_{UX|Y} : \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T \le \Delta(Q, R_1, R_2)\Big\} \tag{23}$$
$$\mathcal{L}_2(\hat{Q}_{UXY}, R_1, T) := \Big\{Q_{X|UY} : \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T \le |{-\beta(Q, R_1)}|^+\Big\} \tag{24}$$

where $Q$ in (23) is equal to $Q_{UX|Y}\hat{Q}_Y$, $Q$ in (24) is equal to $Q_{X|UY}\hat{Q}_{UY}$, and the expectation $\mathbb{E}_Q \ln\frac{1}{W_Y}$ can be explicitly written as $\sum_{u,x,y} Q(u, x, y) \ln\frac{1}{W_Y(y|x)}$.

For constant composition random codes, the corresponding error exponents $\tilde{E}^{\mathrm{t}}_1$, $\tilde{E}^{\mathrm{u}}_1$, $\tilde{E}^{\mathrm{t}}_Y$ and $\tilde{E}^{\mathrm{u}}_Y$ can be obtained by adding additional constraints to the optimization problems that define the i.i.d. random coding error exponents above. In particular, all joint distributions $\hat{Q}_{UXY}$ and $Q_{UXY}$ that appear in (21)–(24) should satisfy the marginal constraint that their $UX$-marginals equal $P_{UX}$. For example, the exponent $\tilde{\Psi}_a$ for constant composition random codes is given by

$$\tilde{\Psi}_a(R_1, R_2, T) := \min_{\hat{Q}_{UXY} : \hat{Q}_{UX} = P_{UX}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{UX|Y} \in \tilde{\mathcal{L}}_1(\hat{Q}_{XY}, R_1, R_2, T)} \Phi(Q_{UX|Y}\hat{Q}_Y, R_1, R_2)\Big] \tag{25}$$

and the set $\tilde{\mathcal{L}}_1$ is defined as

$$\tilde{\mathcal{L}}_1(\hat{Q}_{XY}, R_1, R_2, T) := \Big\{Q_{UX|Y} : Q_{UX} = P_{UX},\ \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T \le \Delta(Q, R_1, R_2)\Big\} \tag{26}$$

where $Q$ in (26) is equal to $Q_{UX|Y}\hat{Q}_Y$ and $Q_{UX}$ in (26) is the $UX$-marginal distribution of $Q_{UX|Y}\hat{Q}_Y$.

The proof of Theorem 1 is provided in Section VI. By using Markov’s inequality, it can be shown that there exists a sequence of (deterministic) codebooks that simultaneously achieves the exponents in Theorems 1 and 2 (cf. [16, Proof of Theorem 1]).

###### Theorem 2.

For i.i.d. random codes, the error exponents $E^{\mathrm{t}}_2$ and $E^{\mathrm{u}}_2$ are given by

$$E^{\mathrm{t}}_2(R_1, R_2, T) = \max\{\Psi_a(R_1, R_2, T), \Psi_c(R_1, R_2, T)\}, \quad \text{and} \quad E^{\mathrm{u}}_2(R_1, R_2, T) = E^{\mathrm{t}}_2(R_1, R_2, T) + T, \tag{27}$$

where

$$\Psi_c(R_1, R_2, T) := \min_{\hat{Q}_{UXY}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{UX|Y} \in \mathcal{L}_3(\hat{Q}_{UXY}, R_1, R_2, T)} \Phi(Q_{UX|Y}\hat{Q}_Y, R_1, R_2)\Big] \tag{28}$$

with

$$\mathcal{L}_3(\hat{Q}_{UXY}, R_1, R_2, T) := \Big\{Q_{UX|Y} : \mathbb{E}_Q \ln\tfrac{1}{W_Y} + s_0(\hat{Q}_{UY}, R_1) - T \le \Delta(Q, R_1, R_2)\Big\} \tag{29}$$

where $Q$ in (29) is equal to $Q_{UX|Y}\hat{Q}_Y$, and

$$s_0(\hat{Q}_{UY}, R_1) := -\min_{\tilde{Q}_{X|UY} : \beta(\tilde{Q}, R_1) \le 0} \big[\beta(\tilde{Q}, R_1) - \mathbb{E}_{\tilde{Q}} \ln W_Y\big] \tag{30}$$

and where $\tilde{Q}$ in (30) is equal to $\tilde{Q}_{X|UY}\hat{Q}_{UY}$.

For constant composition random codes, the error exponents $\tilde{E}^{\mathrm{t}}_2$ and $\tilde{E}^{\mathrm{u}}_2$ are given by

$$\tilde{E}^{\mathrm{t}}_2(R_1, R_2, T) = \max\{\tilde{\Psi}_a(R_1, R_2, T), \tilde{\Psi}_c(R_1, R_2, T)\}, \quad \text{and} \quad \tilde{E}^{\mathrm{u}}_2(R_1, R_2, T) = \tilde{E}^{\mathrm{t}}_2(R_1, R_2, T) + T, \tag{31}$$

where

$$\tilde{\Psi}_c(R_1, R_2, T) := \min_{\hat{Q}_{UXY} : \hat{Q}_{UX} = P_{UX}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{UX|Y} \in \tilde{\mathcal{L}}_3(\hat{Q}_{UXY}, R_1, R_2, T)} \Phi(Q_{UX|Y}\hat{Q}_Y, R_1, R_2)\Big] \tag{32}$$

with

$$\tilde{\mathcal{L}}_3(\hat{Q}_{UXY}, R_1, R_2, T) := \Big\{Q_{UX|Y} : Q_{UX} = P_{UX},\ \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \tilde{s}_0(\hat{Q}_{UY}, R_1) - T \le \Delta(Q, R_1, R_2)\Big\} \tag{33}$$

where $Q$ in (33) is equal to $Q_{UX|Y}\hat{Q}_Y$, $Q_{UX}$ in (33) is the $UX$-marginal distribution of $Q_{UX|Y}\hat{Q}_Y$, and

$$\tilde{s}_0(\hat{Q}_{UY}, R_1) := -\min_{\tilde{Q}_{X|UY} : \beta(\tilde{Q}, R_1) \le 0,\ \tilde{Q}_{UX} = P_{UX}} \big[\beta(\tilde{Q}, R_1) - \mathbb{E}_{\tilde{Q}} \ln W_Y\big] \tag{34}$$

and where $\tilde{Q}$ in (34) is equal to $\tilde{Q}_{X|UY}\hat{Q}_{UY}$ and $\tilde{Q}_{UX}$ in (34) is the $UX$-marginal distribution of $\tilde{Q}_{X|UY}\hat{Q}_{UY}$.

The proof of Theorem 2 is provided in Section VII.

### III-B Discussion of Main Results

A few remarks on the theorems above are in order.

Firstly, Eqn. (19) in Theorem 1 implies that the optimal decoder for the pair of messages (i.e., $\mathcal{D}^*_{m_1 m_2}$ defined in (12)) achieves the optimal trade-off between the total and undetected error exponents pertaining to the private message $m_1$. This observation is non-trivial and not immediately obvious. When $\mathcal{Y}$ wishes to decode only the private message $m_1$, the optimal decoder for the pair of messages, called the joint decoder, declares that the $m_1$-component of the decoded message pair is the final output. It is not clear a priori that this decoding strategy is optimal error exponent-wise. The main difference between the error events for these two decoders is that the user can decode the correct private message $m_1$ but the wrong common message $m_2$. This is an error event for the joint decoder (but not for the one that focuses only on $m_1$). However, Lemma 7 implies that on the exponential scale, the exponents of the two decoders are the same, i.e., there is no loss in optimality in using the joint decoder for decoding only message $m_1$.

Secondly, one of our key technical contributions is Lemma 7 (to follow). This lemma allows us to simplify the calculation of the exponents by disentangling the statistical dependencies between “satellite” codewords that share the same cloud center. In particular, when we take into account the fact that the “cloud centers” (of which there are exponentially many) are random, this lemma allows us to decouple the dependence between two key random variables that appear on different sides of a fundamental error probability bound (see (61) and (93)). In contrast, for the analysis of the interference channel in [19] and [20], only an upper bound on the error probability is sought. This upper bound is not necessarily exponentially tight. On the other hand, the use of Lemma 7 incurs no loss in optimality on the exponential scale when appropriately combined with Lemma 6.

Thirdly, in an elegant work [18], Merhav showed that for ordinary channel coding, independent random selection of codewords within a given type class together with suboptimal bin index decoding (which is based on ordinary maximum likelihood decoding) performs as well as optimal bin index decoding in terms of the error exponent achieved. Furthermore, Merhav showed that for constant composition random codes with superposition coding and optimal decoding, the conclusion above no longer holds in general. In this paper, we show that for i.i.d. and constant composition random codes with superposition coding and erasure decoding, the conclusion holds for the case of decoding the “satellite” codewords. That is, the (in general) suboptimal decoding of the “satellite” codewords achieves the same random coding error exponent as the optimal decoding of the “satellite” codewords (see Theorem 1).

Fourthly, in Theorem 1, the total error exponent for the private message $m_1$ is the minimum of two exponents $\Psi_a$ and $\Psi_b$. The first exponent intuitively means that user $\mathcal{Y}$ is in a regime where it decodes the pair of messages $(m_1, m_2)$. Loosely speaking, the second exponent means that user $\mathcal{Y}$ knows the true common message $m_2$ (given by a genie), and then decodes the “satellite” codeword $x^n(m_1, m_2)$. In contrast to the single-user DMC case, now every codeword is generated according to a conditional probability distribution $P_{X|U}$. Thus all codewords are conditioned on a particular $u^n$ sequence rather than being generated according to a marginal distribution $P_X$. This is also reflected in the expression of the inner optimization in (22), which is averaged over the random variable $U$ (see the definition of $\beta$ in (15)).

Finally, for the case in which user $\mathcal{Y}$ wishes to decode the common message $m_2$, the intuition gleaned from Theorem 2 is that user $\mathcal{Y}$ has two options, and uses the better one depending on the rates. Either it can decode the true transmitted codeword $x^n(m_1, m_2)$ to identify $m_2$ (this corresponds to the exponent $\Psi_a$), or the entire sub-codebook for the common message to identify $m_2$ (this corresponds to $\Psi_c$). This explains the maximization in the first expression in (27). When $R_1$ is large, the term $s_0$ in the constraint set of (28) in Theorem 2 implies that $\mathcal{Y}$ is more likely than not to decode the “cloud center” according to the “test channel” $P_{Y|U}$. This corresponds to the second decoding strategy, i.e., decoding the entire sub-codebook indexed by $m_2$. Also see Remark 1 to follow.

## IV Evaluating the Exponents via Convex Optimization

In this section, we first consider i.i.d. random codes. To evaluate $E^{\mathrm{t}}_1$ in Theorem 1, we need to devise an efficient numerical procedure to solve the minimization problems $\Psi_a$ and $\Psi_b$. As will be shown below, these problems can be solved efficiently even though they are not convex.

For the second term in (22), we can split the feasible region $\mathcal{L}_2$ of the inner minimization (see (24)) into two closed sets, namely $\mathcal{B}_1$ and $\mathcal{B}_2$, where

$$\mathcal{B}_1(\hat{Q}_{UY}, R_1) := \{Q_{X|UY} : \beta(Q_{X|UY}\hat{Q}_{UY}, R_1) \ge 0\} \tag{35}$$
$$\mathcal{B}_2(\hat{Q}_{UY}, R_1) := \{Q_{X|UY} : \beta(Q_{X|UY}\hat{Q}_{UY}, R_1) \le 0\}. \tag{36}$$

We denote the corresponding minimization problems pertaining to $\Psi_b$ in (22) (and (24)), in which the clipping function $|\cdot|^+$ is inactive or active, as $\Psi_{b1}$ and $\Psi_{b2}$, respectively, i.e.,

$$\Psi_{b1} := \min_{\hat{Q}_{UXY}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{X|UY} \in \mathcal{L}_{21}(\hat{Q}_{UXY})} \beta(Q_{X|UY}\hat{Q}_{UY}, R_1)\Big] \tag{37}$$
$$\Psi_{b2} := \min_{\hat{Q}_{UXY} : \mathcal{L}_{22}(\hat{Q}_{UXY}) \neq \emptyset} D(\hat{Q}_{UXY} \,\|\, P_{UXY}), \tag{38}$$

where the sets $\mathcal{L}_{21}$ and $\mathcal{L}_{22}$ are defined as

$$\mathcal{L}_{21}(\hat{Q}_{UXY}) := \Big\{Q_{X|UY} : \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T \le 0,\ \beta(Q, R_1) \ge 0\Big\} \tag{39}$$
$$\mathcal{L}_{22}(\hat{Q}_{UXY}) := \Big\{Q_{X|UY} : \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T + \beta(Q, R_1) \le 0,\ \beta(Q, R_1) \le 0\Big\}, \tag{40}$$

and where $Q$ in (39) and (40) is equal to $Q_{X|UY}\hat{Q}_{UY}$. We thus have $\Psi_b = \min\{\Psi_{b1}, \Psi_{b2}\}$.
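The subproblems just defined share a generic convex shape: minimize a KL divergence over distributions subject to linear (in $Q$) constraints, i.e., an I-projection. As a hedged illustration of this shape (not the paper's actual solver; `i_projection`, `c` and `t` are our own toy names for a generic linear-statistic constraint), such a problem can be handed to an off-the-shelf NLP routine:

```python
import numpy as np
from scipy.optimize import minimize

def i_projection(P, c, t):
    """Minimize D(Q || P) over pmfs Q subject to E_Q[c] <= t.
    P: reference pmf; c: per-symbol costs; t: constraint level."""
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    k = len(P)

    def kl(q):
        q = np.clip(q, 1e-12, None)  # keep the log well-defined on the boundary
        return float(np.sum(q * np.log(q / P)))

    cons = [
        {'type': 'eq', 'fun': lambda q: np.sum(q) - 1.0},   # pmf normalization
        {'type': 'ineq', 'fun': lambda q: t - np.dot(c, q)},  # E_Q[c] <= t
    ]
    res = minimize(kl, P.copy(), bounds=[(0.0, 1.0)] * k,
                   constraints=cons, method='SLSQP')
    return res.x, res.fun

# If P already satisfies the constraint, the projection is P itself and D = 0.
q_star, val = i_projection([0.25, 0.25, 0.5], c=[1.0, 2.0, 3.0], t=3.0)
```

When the constraint is binding, the optimizer returns an exponentially tilted version of $P$, which is the familiar structure of I-projections onto half-spaces.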

As the minimization problem $\Psi_{b2}$ is convex, it can be solved efficiently. However, $\Psi_{b1}$ is non-convex due to the non-convex constraint $\beta(Q, R_1) \ge 0$ in the inner optimization (in this section, for brevity, we drop the dependences of $\beta$ and the related quantities on the rates $R_1$ and $R_2$). For the inner optimization, if we remove this constraint in $\mathcal{L}_{21}$, the modified problem is

$$\Psi'_{b1} := \min_{\hat{Q}_{UXY}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{Q_{X|UY} \in \mathcal{L}'_{21}(\hat{Q}_{UXY})} \beta(Q_{X|UY}\hat{Q}_{UY})\Big], \tag{41}$$

where

$$\mathcal{L}'_{21}(\hat{Q}_{UXY}) := \Big\{Q_{X|UY} : \mathbb{E}_Q \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T \le 0\Big\}, \tag{42}$$

is convex and can be solved efficiently. Furthermore, we have the following proposition.

###### Proposition 3.

For the optimization problem $\Psi_{b1}$, if the optimal solution to the inner optimization of the modified problem $\Psi'_{b1}$ is not feasible for the original problem $\Psi_{b1}$, i.e., $\beta(Q^*_{X|UY}\hat{Q}^*_{UY}) < 0$, then there exists an optimal solution to the original inner optimization problem that satisfies $\beta(Q_{X|UY}\hat{Q}_{UY}) = 0$. Moreover, in this case, the optimal value of $\Psi_{b1}$ is equal to that of $\Psi_{b2}$ (i.e., $\Psi_{b2}$ is active in the minimum that defines $\Psi_b$).

###### Proof:

See Appendix A. ∎

In summary, we can solve the non-convex optimization problem $\Psi_b$ by solving two convex problems $\Psi'_{b1}$ and $\Psi_{b2}$, i.e.,

$$\Psi_b = \min\{(\Psi'_{b1})^+, \Psi_{b2}\}, \tag{43}$$

where the superscript “$+$” of $(\Psi'_{b1})^+$ means that the value of $\Psi'_{b1}$ is active in the minimization only if its optimal solution is also feasible for the original optimization $\Psi_{b1}$, i.e., $\beta(Q^*_{X|UY}\hat{Q}^*_{UY}) \ge 0$. In other words,

$$\Psi_b = \begin{cases} \min\{\Psi'_{b1}, \Psi_{b2}\} & \beta(Q^*_{X|UY}\hat{Q}^*_{UY}) \ge 0 \\ \Psi_{b2} & \text{else.} \end{cases} \tag{44}$$

Consequently, $\Psi_b$ can be computed efficiently.
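The selection rule (44) is simple enough to state in code. The sketch below (with hypothetical helper names) combines the two convex subproblem values, given the value of $\beta$ at the optimizer of the relaxed problem:

```python
def combine_psi_b(psi_b1_prime, psi_b2, beta_at_optimum):
    """Combine the two convex subproblem values into Psi_b per (43)-(44):
    the relaxed value psi_b1_prime counts only when its optimizer is feasible
    for the original problem, i.e., when beta at the optimizer is >= 0."""
    if beta_at_optimum >= 0:
        return min(psi_b1_prime, psi_b2)
    return psi_b2
```

Here `psi_b1_prime` and `psi_b2` would be the outputs of the two convex solves, so the overall procedure requires no non-convex optimization.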

For $\Psi_a$ in (21), let

$$\Omega(Q_{UX|Y}\hat{Q}_Y) := \mathbb{E}_{Q_{UX|Y}\hat{Q}_Y} \ln\tfrac{1}{W_Y} + \mathbb{E}_{\hat{Q}} \ln W_Y - T, \tag{45}$$

then, similarly, we can partition the feasible region $\mathcal{L}_1$ of the inner minimization into four parts and denote the corresponding inner optimization problems as follows:

1. If $\gamma(Q) \ge 0$ and $\beta(Q) \ge 0$, then

$$\Phi^*_{a1} := \min_{Q}\ \gamma(Q) + \beta(Q), \quad \text{such that} \quad \Omega(Q) \le 0. \tag{46}$$

2. If $\gamma(Q) \ge 0$ and $\beta(Q) \le 0$, then

$$\Phi^*_{a2} := \min_{Q}\ \gamma(Q) \quad \text{such that} \quad \Omega(Q) + \beta(Q) \le 0. \tag{47}$$

3. If $\gamma(Q) \le 0$ and $\gamma(Q) + \beta(Q) \ge 0$, then

$$\Phi^*_{a3} := \min_{Q}\ \gamma(Q) + \beta(Q), \quad \text{such that} \quad \Omega(Q) \le 0. \tag{48}$$

4. If $\gamma(Q) \le 0$ and $\gamma(Q) + \beta(Q) \le 0$, then

$$\Phi^*_{a4} := 0 \quad \text{such that} \quad \Omega(Q) + \gamma(Q) + \beta(Q) \le 0. \tag{49}$$

where $Q$ in the above definitions is equal to $Q_{UX|Y}\hat{Q}_Y$ (compare the above to the definition of the optimization problem $\Psi_a$ in (21)). Thus we have

$$\Psi_a = \min_{\hat{Q}_{UXY}} \Big[D(\hat{Q}_{UXY} \,\|\, P_{UXY}) + \min_{i \in [4]} \{\Phi^*_{ai}(\hat{Q}_{XY})\}\Big]. \tag{50}$$

We can rewrite the objective functions of the optimization problems above as follows

 minQUX|Yγ(QUX|Y^Q