I Introduction
I-A Background and Related Works
The broadcast channel [2] has been extensively studied in multiuser information theory. Although the capacity region is still unknown in general, some special cases have been solved. One example is the broadcast channel with degraded message sets, also known as the asymmetric broadcast channel (ABC). For this channel, one receiver desires to decode both the private message and the common message, while the other receiver desires to decode only the common message.
The capacity region for the ABC was derived by Körner and Marton and is well known [3]. The earliest work on error exponents for the ABC is that by Körner and Sgarro [4], who used a constant composition ensemble for deriving an achievable error exponent. Later, Kaspi and Merhav [5] improved this work by deriving a tighter lower bound for the error exponent by analyzing the ensemble of i.i.d. random codes. Most recently, Averbuch et al. derived the exact random coding error exponents and expurgated exponents for the ensemble of constant composition codes in [6] and [7], respectively.
In this paper, we are interested in decoders with an erasure option. In this setting, the decoders may, instead of declaring that a particular message or set of messages is sent, output an erasure symbol. For the discrete memoryless channel (DMC), Forney [8] found the optimal decoder and derived a lower bound on the total and undetected error exponents using Gallager-style bounding techniques. Csiszár and Körner [9, Thm. 10.11] derived universally attainable erasure and error exponents using a generalization of the maximum mutual information (MMI) decoder. Telatar [10] also analyzed an erasure decoding rule with a general decoding metric. Moulin [11] generalized this family of decoders and proposed a new decoder parameterized by a weighting function. Recently, Merhav [12] derived lower bounds to these exponents by using a novel type-class enumerator method. In a breakthrough, Somekh-Baruch and Merhav [13] derived the exact random coding exponents for erasure decoding. Recently, Huleihel et al. [14] showed that the random coding exponent for erasure decoding is not universally achievable and established a simple relation between the total and undetected error exponents. Weinberger and Merhav [15] analyzed a simplified decoder for erasure decoding. Hayashi and Tan [16] derived ensemble-tight moderate deviations and second-order results for erasure decoding over additive DMCs. For the ABC, Tan [17] derived lower bounds on the total and undetected error exponents of an extended version of the universal decoder in Csiszár and Körner [9, Thm. 10.11]. Moreover, Merhav in [18] analyzed a random coding scheme with a binning (superposition coding) structure and showed that a potentially suboptimal bin index decoder achieves the random coding error exponent for decoding only the bin index.
I-B Main Contributions
In this paper, we consider erasure decoding for the ABC with a superposition codebook structure. For the decoder that aims to decode both messages, there are six exponents of interest: the total and undetected exponents corresponding to each of the two individual messages and to the message pair. We show that the optimal decoder for the message pair achieves the optimal tradeoff between the total and undetected exponents pertaining to the private message. Our main technical contribution is to handle statistical dependencies between codewords that share the same cloud center. Lemmas 6 and 7 are then used to establish the equality between the total random coding error exponents pertaining to the first (private) message and the message pair, which partially ameliorates this problem.
Finally, we show that the minimizations required to evaluate these error exponents can be cast as convex optimization problems, and thus, can be solved efficiently. We also present numerical examples to illustrate these exponents and the tradeoffs involved in the erasure decoding problem for the ABC.
II Problem Formulation
II-A Notation
Throughout this paper, random variables (RVs) will be denoted by upper case letters, their specific values will be denoted by the respective lower case letters, and their alphabets will be denoted by calligraphic letters. A similar convention will apply to random vectors of dimension and their realizations. For example, the random vector may take on a certain realization in , the th order Cartesian power of , which is the alphabet of each component of this vector. The distributions associated with random variables will be denoted by the letters or , with subscripts being the names of the random variables; e.g., stands for a joint distribution of a triple of random variables on , the Cartesian product of the alphabets of , and . In accordance with these notations, the joint distribution induced by and will be denoted by . Information measures induced by the joint distribution (or for short) will be subscripted by . For example, denotes the mutual information of the random variables and with joint distribution . For a sequence , let denote its empirical distribution or type. The type class of is the set of all whose empirical distribution is . For a given conditional probability distribution and sequence , denotes the conditional type class (shell) of given , namely, the set of sequences whose joint empirical distribution with is given by . The probability of an event will be denoted by , and the expectation operator with respect to a joint distribution will be denoted by . For two positive sequences and , the notation means that and are of the same exponential order, i.e., . Similarly, means that . The indicator function of an event will be denoted by . The notation will stand for , and the notation stands for . Finally, logarithms and exponents will be understood to be taken to the natural base.
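As a small numerical illustration of the "same exponential order" notation defined above (a sketch of ours; the function name is not from the paper), note that subexponential factors such as polynomials in the blocklength are absorbed on the exponential scale:

```python
import numpy as np

# a_n and b_n are of the same exponential order when
# (1/n) * log(a_n / b_n) -> 0 as n -> infinity.
# We work with logarithms throughout to avoid numerical underflow.
def normalized_log_ratio(log_a, log_b, n):
    """(1/n) * log(a_n / b_n), given log a_n and log b_n."""
    return (log_a - log_b) / n

# Example: a_n = n^2 * exp(-n) and b_n = exp(-n) have the same
# exponential order, since (1/n) * log(a_n / b_n) = 2 * log(n) / n -> 0.
for n in (10, 1_000, 100_000):
    print(n, normalized_log_ratio(2 * np.log(n) - n, -n, n))
```

The normalized log-ratio shrinks toward zero as the blocklength grows, which is exactly why polynomial prefactors are ignored in all exponent calculations in this paper.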
II-B System Model
We consider a discrete memoryless ABC with a finite input alphabet , finite output alphabets and , and a transition probability matrix . Let and be, respectively, the - and -marginals of .
Assume there is a random codebook with superposition structure for this ABC, where the message pair is destined for user and the common message is destined for user . In this paper, we consider i.i.d. random codes and constant composition random codes.

For i.i.d. random codes, fix a distribution and randomly generate “cloud centers” according to the distribution
(1) For each cloud center , randomly generate “satellite” codewords according to the conditional probability distribution
(2) 
For constant composition random codes, we fix a joint type and randomly and independently generate “cloud centers” under the uniform distribution on the type class . For each cloud center , randomly and independently generate “satellite” codewords under the uniform distribution on the conditional type class .
The two decoders with erasure options are given by and where is the erasure symbol.
II-C Definitions of Error Probabilities and Error Exponents
In this paper, we focus on six different error probabilities associated with terminal . We do not derive the total and undetected error probabilities at terminal , since the analysis is completely analogous to the analysis of the error and erasure probabilities of the “cloud centers” at terminal after replacing with . Define the disjoint decoding regions according to the decoder as . Moreover, let and be the disjoint decoding regions associated with messages and , respectively. For terminal , define, for message and the message pair , the conditional total error and undetected error probabilities as
(3)  
(4)  
(5)  
(6) 
Then we may define the average total and undetected error probabilities at terminal as follows:
(7)  
(8)  
(9)  
(10) 
Using the Neyman-Pearson theorem, Forney [8] obtained the optimal tradeoff between the average total and undetected error probabilities for discrete memoryless channels. By following his idea and using a similar argument, we can show that the optimal tradeoff between the average total and undetected error probabilities for the ABC is attained by the following decoding regions (the threshold below may take different values depending on whether we are decoding individual messages or the message pair):
(11)  
(12) 
where the distribution of the output conditioned on the subcodebook is
(13) 
and similarly for .
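Forney's threshold test underlying the decoding regions in (11)–(12) can be sketched generically: declare a message only if its likelihood exceeds the combined likelihood of all competitors by a factor exponential in the blocklength, and erase otherwise. The interface below is our own (hypothetical names, single-user form, log-domain for stability), not the paper's:

```python
import numpy as np

def forney_erasure_decode(log_likelihoods, n, T):
    """Forney-style erasure rule (illustrative sketch).

    log_likelihoods[m] = log P(y | codeword m).  Decode m_hat iff
        P(y | x_{m_hat}) >= exp(n * T) * sum_{m != m_hat} P(y | x_m),
    otherwise output None (the erasure symbol).
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    m_hat = int(np.argmax(ll))
    competitors = np.delete(ll, m_hat)
    # Stable log-sum-exp of the competing likelihoods.
    c = competitors.max()
    log_denominator = c + np.log(np.exp(competitors - c).sum())
    if ll[m_hat] - log_denominator >= n * T:
        return m_hat
    return None  # erasure: no message is reliable enough

# A confident output is decoded; a stringent threshold forces an erasure.
ll = [-1.0, -15.0, -20.0]
print(forney_erasure_decode(ll, n=10, T=0.5))   # prints 0
print(forney_erasure_decode(ll, n=10, T=2.0))   # prints None
```

Raising the threshold trades a larger total error (more erasures counted as errors) for a smaller undetected error, which is precisely the tradeoff quantified by the exponent pairs studied in this paper.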
We would like to find the exact error exponents , , and , with the erasure option, i.e., (we do not consider the list decoding mode, i.e., , in this paper). These are the exponents associated with the expectation of the error probabilities, where the expectation is taken with respect to the randomness of the codebook, which possesses the superposition structure described in Section II-B. In other words,
(14) 
and similarly for the other exponents , , and . We show, in fact, that the in (14) is a limit. These exponents are also called random coding error exponents. If these exponents are known exactly, we say that ensemble-tight results are established.
III Main Results and Discussions
The main results of this paper are stated below in Theorems 1 and 2, which establish exact random coding error exponents for the messages , , and the message pair at terminal , i.e., the random coding exponents corresponding to the probabilities in (7)–(10).
Before stating our results, we introduce a few additional definitions. For a given probability distribution on , rates and , and the fixed random coding distribution , define
(15)  
(16)  
(17)  
(18) 
III-A Main Results
Theorem 1.
For i.i.d. random codes, the error exponents , , and are given by (in the following analyses and derivations, for ease of notation, we sometimes drop the dependencies of the error exponents, including those in Theorem 2, on the parameters ):
(19)  
(20) 
where
(21)  
(22) 
with and the sets and are defined as
(23)  
(24) 
where in (23) is equal to , in (24) is equal to , and the expectation can be explicitly written as .
For constant composition random codes, the corresponding error exponents , , and can be obtained by adding additional constraints to the optimization problems that define the i.i.d. random coding error exponents above. In particular, all joint distributions and that appear in (21)–(24) should satisfy the marginal constraint . For example, the corresponding exponent for constant composition random codes is given by
(25) 
and the set is defined as
(26) 
where in (26) is equal to and in (26) is the marginal distribution of .
The proof of Theorem 1 is provided in Section VI. By using Markov’s inequality, it can be shown that there exists a sequence of (deterministic) codebooks that simultaneously achieves the exponents in Theorems 1 and 2 (cf. [16, Proof of Theorem 1]).
III-B Discussion of Main Results
A few remarks on the theorems above are in order.
Firstly, Eqn. (19) in Theorem 1 implies that the optimal decoder for the pair of messages (i.e., defined in (12)) achieves the optimal tradeoff between the total and undetected error exponents pertaining to the private message . This observation is nontrivial and not immediately obvious. When the user wishes to decode only the private message , the optimal decoder for the pair of messages, called the joint decoder, declares the first component of the decoded message pair as its final output. It is not clear a priori that this decoding strategy is optimal in terms of error exponents. The main difference between the error events for these two decoders is that the user can decode the correct private message but the wrong common message ; this is an error event for the joint decoder (but not for the one that focuses only on the private message). However, Lemma 7 implies that on the exponential scale, the exponents of the two decoders are the same, i.e., there is no loss of optimality in using the joint decoder for decoding only the private message.
Secondly, one of our key technical contributions is Lemma 7 (to follow). This lemma allows us to simplify the calculation of the exponents by disentangling the statistical dependencies between “satellite codewords” that share the same cloud center. In particular, when we take into account the fact that the “cloud centers” (of which there are exponentially many) are random, this lemma allows us to decouple the dependence between two key random variables and which are on different sides of a fundamental error probability (see (61) and (93)). In contrast, for the analysis of the interference channel in [19] and [20], only an upper bound of the error probability is sought. This upper bound is not necessarily exponentially tight. On the other hand, the use of Lemma 7 incurs no loss in optimality on the exponential scale when appropriately combined with Lemma 6.
Thirdly, in an elegant work [18], Merhav showed that for ordinary channel coding, independent random selection of codewords within a given type class, together with suboptimal bin index decoding (which is based on ordinary maximum likelihood decoding), performs as well as optimal bin index decoding in terms of the error exponent achieved. Furthermore, Merhav showed that for constant composition random codes with superposition coding and optimal decoding, the conclusion above no longer holds in general. In this paper, we show that for i.i.d. and constant composition random codes with superposition coding and erasure decoding, the conclusion holds for the case of decoding the “satellite” codewords. That is, the (in general) suboptimal decoding of the “satellite” codewords achieves the same random coding error exponent as the optimal decoding of the “satellite” codewords (see Theorem 1).
Fourthly, in Theorem 1, the total error exponent for the private message is the minimum of two exponents and . The first exponent intuitively means that the user is in a regime where it decodes the pair of messages . Loosely speaking, the second exponent means that user knows the true common message (given by a genie), then decodes the “satellite” codeword . In contrast to the singleuser DMC case, now every codeword is generated according to a conditional probability distribution . Thus all codewords are conditioned on a particular sequence rather than being generated according to a marginal distribution . This is also reflected in the expression of the inner optimization in (22) which is averaged over the random variable (see definition of in (15)).
Finally, for the case in which user wishes to decode the common message , the intuition gleaned from Theorem 2 is that user has two options and uses the better one depending on the rates. Either it can decode the true transmitted codeword to identify (this corresponds to the exponent ) or the entire subcodebook for the common message to identify (this corresponds to ). This explains the maximization in the first expression in (27). When is large, the term in (28) of Theorem 2 implies that is more likely than not to decode the “cloud center” according to the “test channel” . This corresponds to the second decoding strategy, i.e., decoding the entire subcodebook indexed by . Also see Remark 1 to follow.
IV Evaluating the Exponents via Convex Optimization
In this section, we first consider i.i.d. random codes. To evaluate in Theorem 1, we need to devise an efficient numerical procedure to solve the minimization problems and . As will be shown below, these problems can be solved efficiently even though they are not convex.
For the second term in (22), we can split the feasible region of the inner minimization, i.e., (see (24)), into two closed sets, namely and , where
(35)  
(36) 
We denote the corresponding minimization problems pertaining to in (22) (and (24)) in which the function is inactive or active as and , respectively, i.e.,
(37)  
(38) 
where the sets and are defined as
(39)  
(40) 
As the minimization problem is convex, it can be solved efficiently. However, is nonconvex due to the nonconvex constraint in the inner optimization (in this section, we drop the dependences of and on the rates and ). For the inner optimization, if we remove this constraint in , the modified problem is
(41) 
where
(42) 
is convex and can be solved efficiently. Furthermore, we have the following proposition.
Proposition 3.
For the optimization problem , if the optimal solution to the inner optimization of the modified problem is not feasible for the original problem , i.e., , then there exists an optimal solution to the original inner optimization problem that satisfies . Moreover, in this case, the optimal value of is equal to that of (i.e., is active in the minimum that defines ).
Proof:
See Appendix A. ∎
In summary, we can solve the nonconvex optimization problem by solving two convex problems and , i.e.,
(43) 
where the superscript “” of indicates that the value of is active in the minimization if the optimal solution is also feasible for the original optimization problem , i.e., . In other words,
(44) 
Consequently, can be solved efficiently.
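The splitting device used above, covering a nonconvex feasible set with closed convex pieces and taking the smaller of the convex optima, can be illustrated on a one-dimensional toy problem. This is not the paper's exponent optimization, only the same reduction in miniature:

```python
import numpy as np
from scipy.optimize import minimize

# Minimize the convex objective f(x) = (x - 0.2)^2 over the nonconvex
# feasible set {x : |x| >= 1}.  The set is the union of two convex pieces,
# (-inf, -1] and [1, inf), so the global minimum equals the smaller of the
# two convex optima, mirroring how the nonconvex exponent problem above is
# reduced to two efficiently solvable convex programs.
def f(x):
    return (x[0] - 0.2) ** 2

res_left = minimize(f, x0=[-2.0], bounds=[(None, -1.0)])   # piece x <= -1
res_right = minimize(f, x0=[2.0], bounds=[(1.0, None)])    # piece x >= 1
best = min(res_left.fun, res_right.fun)
print(best)  # approximately 0.64 = (1 - 0.2)^2, attained at the boundary x = 1
```

As in the exponent evaluation, the unconstrained minimizer (x = 0.2) is infeasible, so the optimum of each piece lands on the boundary of the removed constraint, which is the content of Proposition 3.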
For in (21), let
(45) 
then similarly, we can partition the feasible region of the inner minimization into four parts and denote the corresponding inner optimization problems as follows:

If and , then
(46) 
If and , then
(47) 
If and , then
(48) 
If and , then
(49)
where in the above definitions is equal to (compare the above to the definition of the optimization problem in (21)). Thus we have,
(50) 
We can rewrite the objective functions of and as follows