# Time-Division Transmission is Optimal for Covert Communication over Broadcast Channels

We consider a covert communication scenario where a legitimate transmitter wishes to communicate simultaneously to two legitimate receivers while ensuring that the communication is not detected by an adversary, also called the warden. The legitimate receivers and the adversary observe the transmission from the legitimate transmitter via a three-user discrete or Gaussian memoryless broadcast channel. We focus on the case where the "no-input" symbol is not redundant, i.e., the output distribution at the warden induced by the no-input symbol is not a mixture of the output distributions induced by other input symbols, so that the covert communication is governed by the square root law, i.e., at most Θ(√(n)) bits can be transmitted over n channel uses. We show that for such a setting of covert communication over broadcast channels, a simple time-division strategy achieves the optimal throughputs. Our result implies that a code that uses two separate optimal point-to-point codes each designed for the constituent channels and each used for a fraction of the time is optimal in the sense that it achieves the best constants of the √(n)-scaling for the throughputs. Our proof strategy combines several elements in the network information theory literature, including concave envelope representations of the capacity regions of broadcast channels and El Gamal's outer bound for more capable broadcast channels.

08/28/2018

## I Introduction

There has been a recent surge of research interest in reliable communications in the presence of an adversary, or a warden, who must be kept incognizant of the presence of communication between the transmitters and receivers. The lack of communication is modelled in discrete channels by sending a specially-designed “no-input” symbol $0 \in \mathcal{X}$ (where $\mathcal{X}$, a finite set, is the input alphabet of the discrete channel); in Gaussian channels, it is also modelled as sending $0$. This line of research, known synonymously as covert communications, communication with low probability of detection (LPD) [1, 2, 3], deniability [4, 5], or undetectable communication [6], seeks to establish fundamental limits on the throughputs to communicate to the legitimate receiver(s) while ensuring that the signals observed by the warden are statistically close to the signals that would be observed if communication were not present. It was shown by Bash et al. [1] that in the point-to-point setting, if the legitimate user’s channel and the adversary’s channel are perfectly known, the number of bits that can be reliably and covertly transmitted over $n$ channel uses scales at most as $\Theta(\sqrt{n})$ for additive white Gaussian noise (AWGN) channels. This is colloquially known as the “square root law”. For discrete memoryless channels, covert communication is also governed by the square root law if the no-input symbol is not redundant, i.e., the output distribution at the warden induced by the no-input symbol is not a mixture of the output distributions induced by other input symbols. Recently, the optimal pre-constant in the $\sqrt{n}$ term has also been established in a couple of elegant papers by Bloch [7] and Wang, Wornell and Zheng [8].

In this paper, we are interested in extending the above model and results to a multi-user (or network) scenario [9] in which there is one transmitter, two legitimate receivers and, as usual, one warden. See Fig. 1. We are interested in communicating reliably and simultaneously to the two receivers over the same medium while ensuring that the warden remains incognizant of the presence of any communication. We call our model a two-user discrete memoryless broadcast channel (BC) with a warden. This communication model mimics the scenario of a military general delivering commands to her/his multiple subordinates while, at the same time, ensuring that the probability of the communication being detected by a furtive enemy, the warden, is vanishingly small. Note that the enemy is not interested in the precise commands per se but in whether or not communication between the general and her/his subordinates is actually happening in order to pre-empt a possible attack. We establish the fundamental performance limits for communicating in this scenario when the no-input symbol is not redundant. Somewhat surprisingly, we show that the most basic multi-user communication scheme of time-division [9, Sec. 5.2] is optimal for a wide class of BCs. This implies that a code designed for such BCs that uses two separate point-to-point codes, each designed for one of the constituent channels and each used for a fraction of the time (blocklength), is optimal in the sense that it achieves the best constants of the $\sqrt{n}$-scaling of the throughput.

### I-A Related Work

Covert communication is related to steganography in the sense that both aim to hide information from a warden. In many instances of steganography, the square root law has been observed [10, 11, 12, 13]. In steganography, a cover text is given to the encoder and a message is embedded into the cover text when the communication is active. The message-embedded text is called the stegotext and the warden should not be able to distinguish the stegotext from the cover text. In particular, a setup closely related to our problem is the scenario in which the stegotext is generated through a memoryless channel as in [11, 12]. It is interesting to observe that for steganography, a source sequence (cover text) is given to the encoder and the encoder controls the channel (regarding the cover text as the input and the stegotext as the output), while in covert communications, the encoder generates a source sequence (codeword) for each message and then the source sequence is sent through a given communication channel. Since the given condition and the control function are swapped, the two classes of problems require different analyses.

This paper focuses on the information-theoretic aspects of covert communications for which there has been a flurry of recent work. In particular, refined asymptotics on the fundamental limits of covert communications over memoryless channels from the second-order [14] and error exponent [15] perspectives have been studied. In addition, the fundamental limits of covert communications for channels with random state known at the transmitter [16] and classical-quantum channels [17] have also been established. In contrast, work on multi-user extensions of covert communications is relatively sparse. Of note are the works by Arumugam and Bloch [18, 19]. In [18], the authors derived the fundamental limits of covert communications over a multiple-access channel (MAC). The authors showed that if the MAC to the legitimate receiver is “better”, in a precise sense, than the one to the adversary, then the legitimate users can reliably communicate on the order of $\sqrt{n}$ bits per $n$ channel uses with arbitrarily LPD without using a secret key. The authors also quantified the pre-constant terms exactly. In [19], the authors considered a BC communication model. However, the model in [19] is significantly different from that in the present work. In [19], the authors were interested in transmitting two messages, one common and another covert, over a BC where there are two receivers, one legitimate and the other the warden. The common message is to be communicated to both parties, while the covert message has LPD from the perspective of the warden. This models the scenario of embedding covert messages in an innocuous codebook and generalizes existing works on covert communications in which innocent behavior corresponds to lack of communication between the legitimate receivers. In our work, there are three receivers: two legitimate receivers and the warden. Both messages that are communicated can be considered as covert messages from the perspective of the warden. See Fig. 1 for our model.

### I-B Main Contribution

Our main contribution is to establish the covert capacity region (the set of all achievable pre-constants of the throughputs which scale as $\sqrt{n}$) of some two-user memoryless BCs with a warden when the no-input symbol is not redundant. While the (usual) capacity region of the two-user discrete memoryless BC is a long-standing open problem in network information theory, we show that under the covert communications constraint, the covert capacity region admits a particularly simple expression for a class of BCs we explicitly identify (see Condition 1). This region implies that time-division transmission is optimal for this class of two-user BCs. We emphasize that the BC does not have to be degraded, less noisy or more capable [9, Chapter 5]. Our main result is somewhat analogous to that of Lapidoth, Telatar and Urbanke [20] who showed that time-division is optimal for wide-band broadcast communication over Gaussian, Poisson, “very noisy” channels, and average-power limited fading channels. However, the analysis in [20] is restricted to stochastically degraded BCs.

The first analytical tool that is used to prove our main theorem is a converse bound derived by El Gamal [21] initially designed for more capable BCs. However, it turns out to be a useful starting point for our setting. We manipulate this bound into a form that is reminiscent of an outer bound for degraded BCs. Subsequently, we express the $\lambda$-sum throughput of this outer bound, for appropriately chosen $\lambda$, in terms of upper concave envelopes [22]. This circumvents the need to identify the optimal auxiliary random variable for specific BCs. The identification of optimal auxiliaries is typically possible only if the BC possesses some special structure. For example, for the binary symmetric BC, Mrs. Gerber’s lemma [23] is a key ingredient in simplifying the converse bound. Similarly, for the Gaussian BC, a highly non-trivial result known as the entropy power inequality [24, 25] is needed to obtain an explicit expression for the capacity region. Note that both these channels are degraded. Our main result applies to some classes of two-user discrete memoryless BCs (that neither subsume nor are subsumed by degraded BCs). We employ concave envelopes and take into consideration that under the covertness constraint, the weight of the input (codeword) is necessarily vanishingly small [8, 7]. Augmenting these existing techniques with some basic analytical arguments (e.g., [7, Lemma 1]) allows us to simplify the outer bound to conclude that the time-division inner bound [9, Sec. 5.2] is optimal for some BCs.

### I-C Intuition Behind The Main Result

We now provide intuition as to why time-division transmission is optimal for some BCs with covertness constraints. Since symmetric channels satisfy our condition (point 3 in the discussion following Condition 1), we use the binary symmetric BC [9, Example 5.3], degraded in favor of $Y_1$ (say), as an example to illustrate this. Let $Y_j = X \oplus \Psi_j$ where $\Psi_j$ is a Bernoulli random variable with parameter $p_j$ for $j \in \{1,2\}$, i.e., the channel from $X$ to $Y_j$ is a binary symmetric channel (BSC). It is well known that in the absence of the covertness constraint, superposition coding [9, Sec. 5.3] [26] is optimal, and time-division transmission is strictly suboptimal (unless the transmitter communicates with only a single receiver). The coding scheme is illustrated in Fig. 2 where $U_1^n$ and $U_2^n$ are generated independently, $U_2^n$ denotes the cloud center and $X^n = U_1^n \oplus U_2^n$ denotes the satellite codeword. Because in covert communications, $X^n$ is required to have low (Hamming) weight [7, 8], either both $U_1^n$ and $U_2^n$ have low weight or both have high weight. Let us assume it is the former without loss of generality. Then, the set of locations of the $1$’s in the satellite codeword $X^n$ is likely to be a disjoint union of the sets of $1$’s in $U_1^n$ and $U_2^n$. Thus the relative weight of $X^n$, denoted as $\alpha_n$, can be decomposed into the relative weights of $U_1^n$ and $U_2^n$, denoted as $\rho\alpha_n$ and $(1-\rho)\alpha_n$ respectively for some $\rho \in [0,1]$. Thus, the superposition coding inner bound for the degraded BC [9, Theorem 5.2] [26] with $X = U_1 \oplus U_2$ reads

$$R_1 \le I(X;Y_1|U_2) = I(U_1; U_1 \oplus \Psi_1) \approx \rho\,\alpha_n L_1^*, \tag{1}$$

$$R_2 \le I(U_2;Y_2) = I(U_2; U_2 \oplus \tilde{\Psi}_2) \approx (1-\rho)\,\alpha_n L_2^*, \tag{2}$$

where $L_1^*$ and $L_2^*$ are the “covert capacities” [8, 7] of the constituent point-to-point binary symmetric channels (see Theorem 1) and $\tilde{\Psi}_2 = U_1 \oplus \Psi_2$ is a Bernoulli random variable with parameter close to $p_2$ due to the low weight constraint on $U_1^n$ (its relative weight is $\rho\alpha_n$). Thus, (1) and (2) suggest that time-division is optimal, at least for symmetric BCs, but we show the same is true for a larger class of BCs.

### I-D Paper Outline

This paper is structured as follows. In Section II, we formulate the problem and define relevant quantities of interest precisely. In Section III, we state our assumptions and main results. We also provide some qualitative interpretations. The main result, corollaries, and bounds on the required length of the secret key are proved in Sections IV, V, and VI respectively. We conclude our discussion in Section VII where we also suggest avenues for future research.

### I-E Notation

We adopt standard information-theoretic notation, following the text by El Gamal and Kim [9] and on occasion, the book by Csiszár and Körner [27]. We use $D(\cdot\|\cdot)$ and $V(\cdot,\cdot)$ to denote the relative entropy and the total variation distance respectively. We use $I(X;Y)$ and $I(P_X, P_{Y|X})$ interchangeably to denote the mutual information of $(X,Y) \sim P_X \times P_{Y|X}$. Throughout, $\log$ is taken to an arbitrary base. We also make use of the binary entropy function, the differential entropy, and the binary convolution operator. For two discrete distributions $P$ and $Q$ defined on the same alphabet $\mathcal{X}$, we say that $P$ is absolutely continuous with respect to $Q$, denoted as $P \ll Q$, if for all $x \in \mathcal{X}$, $Q(x) = 0$ implies that $P(x) = 0$. Other notation will be introduced as needed in the sequel.

## II Problem Formulation

A discrete memoryless two-user BC with a warden consists of a channel input alphabet $\mathcal{X}$, three channel output alphabets $\mathcal{Y}_1$, $\mathcal{Y}_2$, and $\mathcal{Z}$, and a transition matrix $P_{Y_1,Y_2,Z|X}$. The output alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$ correspond to the two legitimate receivers and $\mathcal{Z}$ corresponds to that of the warden. Without loss of generality, the input symbols are labelled by integers starting from $0$. We let $0 \in \mathcal{X}$ be the “no input” symbol that is sent when no communication takes place and define $Q_x := P_{Z|X}(\cdot|x)$ for each $x \in \mathcal{X}$. The BC is used $n$ times in a memoryless manner. If no communication takes place, the warden at receiver $Z$ observes $Z^n$, which is distributed according to $Q_0^{\times n}$, the $n$-fold product distribution of $Q_0$. If communication occurs, the warden observes $\hat{Q}_{Z^n}$, the output distribution induced by the code. For convenience, in the sequel, we often denote the two marginal channels corresponding to the two legitimate receivers as $W := P_{Y_1|X}$ and $V := P_{Y_2|X}$ respectively.

(Footnote 1: We omit the qualifier “discrete memoryless” for brevity in the sequel. We will mostly discuss discrete memoryless BCs in this paper. However, in Corollary 2, we present results for the BC with a warden when the constituent channels are Gaussian.)

The transmitter and the receivers are assumed to share a secret key that is uniformly distributed over a key set. We will, for the most part, assume that the key is sufficiently long, i.e., the key set is sufficiently large. However, we will bound the length of the key in Section III-E. The transmitter and the receivers aim to construct a code that is both reliable and covert. Let the messages to be sent be $W_1$ and $W_2$. These messages are assumed to be independent and also independent of the secret key. Also let their reconstructions at the receivers be $\hat{W}_j$ for $j \in \{1,2\}$. As usual, a code is said to be reliable if the probability of error vanishes as $n \to \infty$. The code is covert if it is difficult for the warden to determine whether the transmitter is sending a message (hypothesis $H_1$) or not (hypothesis $H_0$). Let $\pi_{1|0}$ and $\pi_{0|1}$ denote the probabilities of false alarm (accepting $H_1$ when the transmitter is not sending a message) and missed detection (accepting $H_0$ when the transmitter is sending a message), respectively. Note that a blind test (one with no side information) satisfies $\pi_{1|0} + \pi_{0|1} = 1$. The warden’s optimal hypothesis test satisfies

$$\pi_{1|0} + \pi_{0|1} = 1 - V\big(\hat{Q}_{Z^n}, Q_0^{\times n}\big) \tag{3}$$

$$\ge 1 - \sqrt{D\big(\hat{Q}_{Z^n} \,\|\, Q_0^{\times n}\big)}, \tag{4}$$

where (4) follows by Pinsker’s inequality; see [28, 7]. Hence, covertness is guaranteed if the relative entropy between the observed distribution and the product of the no-communication distribution is bounded by a small $\delta > 0$.
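To make the relation between (3) and (4) concrete, the following minimal sketch evaluates both right-hand sides for a single channel use. The two distributions `q0` and `q_hat` are hypothetical values chosen only for illustration, and relative entropy is computed in nats (an assumption, since the text leaves the base of the logarithm arbitrary).

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; assumes p is absolutely continuous w.r.t. q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def total_variation(p, q):
    """Total variation distance V(p, q) on a common finite alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical warden output distributions (illustrative values only).
q0 = [0.9, 0.1]       # induced by the no-input symbol
q_hat = [0.88, 0.12]  # induced by a low-weight input distribution

sum_of_errors = 1.0 - total_variation(q_hat, q0)           # right side of (3)
pinsker_bound = 1.0 - math.sqrt(kl_divergence(q_hat, q0))  # right side of (4)
print(sum_of_errors, pinsker_bound)
```

A small relative entropy keeps the lower bound close to 1, i.e., close to the performance of a blind test.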

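Note that if $Q_x \not\ll Q_0$ for some $x \in \mathcal{X}$, such $x$ should not be transmitted, otherwise it is not possible for $D(\hat{Q}_{Z^n} \| Q_0^{\times n})$ to vanish [8]. Hence, by dropping all such input symbols as well as all output symbols not included in $\mathrm{supp}(Q_0)$, we assume throughout that $\mathrm{supp}(Q_0) = \mathcal{Z}$. In addition, we assume that the no-input symbol is not redundant, i.e., $Q_0 \notin \mathrm{conv}\{Q_x : x \in \mathcal{X}, x \neq 0\}$ where $\mathrm{conv}$ denotes the convex hull. If the symbol $0$ is redundant, there exists a sequence of codes for which $D(\hat{Q}_{Z^n} \| Q_0^{\times n}) = 0$ for all $n$ [8] so $\pi_{1|0} + \pi_{0|1} = 1$ (i.e., the warden’s test is always blind) and transmitting at positive rates is possible; this is a regime we do not consider in this paper.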

A code for the BC with a covertness constraint and the covert capacity region are defined formally as follows.

###### Definition 1.

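An $(n, M_{1n}, M_{2n}, K_n, \varepsilon, \delta)$-code for the BC with a warden and with a covertness constraint consists of

• Two message sets $\{1, \ldots, M_{jn}\}$ for $j \in \{1,2\}$;

• Two independent messages uniformly distributed over their respective message sets, i.e., $W_j$ is uniform on $\{1, \ldots, M_{jn}\}$ for $j \in \{1,2\}$;

• One secret key set of size $K_n$;

• One encoder that maps the message pair and the secret key to a length-$n$ channel input sequence;

• Two decoders, where decoder $j$ maps the length-$n$ output at receiver $j$ and the secret key to an estimate $\hat{W}_j$, for $j \in \{1,2\}$;

such that the following constraints hold:

$$\Pr\Big(\bigcup_{j=1}^{2} \{\hat{W}_j \neq W_j\}\Big) \le \varepsilon, \tag{5}$$

$$D\big(\hat{Q}_{Z^n} \,\|\, Q_0^{\times n}\big) \le \delta. \tag{6}$$

For most of our discussion, we ignore the secret key set (i.e., we assume that the secret key is sufficiently long) for the sake of simplicity and refer to the family of codes above with secret key sets of arbitrary sizes as $(n, M_{1n}, M_{2n}, \varepsilon, \delta)$-codes. We will revisit the effect of the key size in Section III-E.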

###### Definition 2.

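We say that the pair $(L_1, L_2)$ is $(\varepsilon, \delta)$-achievable for the BC with a warden and with a covertness constraint if there exists a sequence of $(n, M_{1n}, M_{2n}, \varepsilon_n, \delta)$-codes such that

$$\liminf_{n \to \infty} \frac{1}{\sqrt{n\delta}} \log M_{jn} \ge L_j, \quad j \in \{1,2\}, \tag{7}$$

$$\limsup_{n \to \infty} \varepsilon_n \le \varepsilon. \tag{8}$$

Define the $(\varepsilon, \delta)$-covert capacity region $\mathcal{L}_{\varepsilon,\delta}$ to be the closure of all $(\varepsilon, \delta)$-achievable pairs $(L_1, L_2)$. We are interested in the $\delta$-covert capacity region

$$\mathcal{L}_\delta := \bigcap_{\varepsilon \in (0,1)} \mathcal{L}_{\varepsilon,\delta} = \lim_{\varepsilon \to 0} \mathcal{L}_{\varepsilon,\delta}. \tag{9}$$

Note that $L_1$ and $L_2$ are measured in bits (or information units) per square root channel use.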

We will also need the notion of covert capacities for point-to-point discrete memoryless channels (DMCs) with a warden [8, 7]. This scenario corresponds to the above definitions specialized to a single legitimate receiver and a single message. Recall that the chi-squared distance between two distributions $Q_1$ and $Q_0$ supported on the same alphabet $\mathcal{Z}$ is defined as

$$\chi_2(Q_1 \| Q_0) := \sum_{z \in \mathcal{Z}} \frac{(Q_1(z) - Q_0(z))^2}{Q_0(z)}. \tag{10}$$
###### Theorem 1 (Bloch [7] and Wang, Wornell, Zheng [8]).

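Let $P_{Y,Z|X}$ be a DMC with a warden in which $W := P_{Y|X}$ and $Q_x := P_{Z|X}(\cdot|x)$ for $x \in \mathcal{X}$. We assume that it satisfies $Q_x \ll Q_0$ for all $x \in \mathcal{X}$, and $0$ is not redundant. Then its covert capacity is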

 (11)

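where the maximization extends over all probability vectors on the non-zero input symbols (i.e., vectors with non-negative entries that sum to one). If $\mathcal{X} = \{0,1\}$, i.e., $P_{Y,Z|X}$ has a binary input, then the maximization in (11) is unnecessary and

$$L^*(P_{Y,Z|X}) := \sqrt{\frac{2\, D\big(W(\cdot|1) \,\|\, W(\cdot|0)\big)^2}{\chi_2(Q_1 \| Q_0)}}. \tag{12}$$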

As previously mentioned, we assume that $W(\cdot|1) \ll W(\cdot|0)$ in the binary input case (or more generally, the analogous absolute continuity condition for any maximizer of (11)). Otherwise, covert communication is impossible [7, Appendix G]. With this assumption and the fact that $0$ is not redundant, $L^*(P_{Y,Z|X})$ as defined in (11) and (12) is finite.
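As a numerical illustration of (12), the sketch below evaluates the binary-input covert capacity when both the legitimate channel and the warden's channel are BSCs. The helper names (`bsc_rows`, `covert_capacity_binary`) and the crossover probabilities are hypothetical, and natural logarithms are assumed.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; assumes p is absolutely continuous w.r.t. q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chi_squared(q1, q0):
    """Chi-squared distance of (10)."""
    return sum((a - b) ** 2 / b for a, b in zip(q1, q0))

def bsc_rows(p):
    """Rows W(.|0) and W(.|1) of a BSC with crossover probability p."""
    return [1.0 - p, p], [p, 1.0 - p]

def covert_capacity_binary(w0, w1, q0, q1):
    """Binary-input covert capacity of (12): sqrt(2 / chi2(Q1||Q0)) * D(W1||W0)."""
    return math.sqrt(2.0 / chi_squared(q1, q0)) * kl_divergence(w1, w0)

# Hypothetical example: BSC(0.1) to the legitimate receiver, BSC(0.2) to the warden.
w0, w1 = bsc_rows(0.1)
q0, q1 = bsc_rows(0.2)
L_star = covert_capacity_binary(w0, w1, q0, q1)
print(L_star)
```

For a BSC with crossover probability $p$, the divergence term reduces to $(1-2p)\ln\frac{1-p}{p}$, which is the structure that appears in (23).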

## III Main Results

In this section, we present our main results. In Section III-A, we state a condition on BCs, which we call Condition 1. In Section III-B, we state our main result assuming the BC satisfies Condition 1 and the key is sufficiently long. We interpret our result in Section III-C by placing it in context via several remarks. In Section III-D, we specialize our result to two degraded BCs and show that standard techniques apply for such models. In Section III-E, we extend our main result to the case where the key size is also a parameter of interest.

### III-A A Condition on BCs

We now state a condition on BCs under which we can show that time-division is optimal.

###### Condition 1.

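Fix a BC with a warden $P_{Y_1,Y_2,Z|X}$. Let the covert capacities of $P_{Y_1,Z|X}$ and $P_{Y_2,Z|X}$ be $L_1^*$ and $L_2^*$ respectively.

• If $L_1^* \ge L_2^*$, we assume that

$$\max_{P_X} \frac{I(X;Y_1)}{I(X;Y_2)} \le \frac{L_1^*}{L_2^*}. \tag{13}$$

• Otherwise, if $L_1^* < L_2^*$, we assume that

$$\max_{P_X} \frac{I(X;Y_2)}{I(X;Y_1)} \le \frac{L_2^*}{L_1^*}. \tag{14}$$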

A few remarks concerning Condition 1 are in order.

1. Condition 1, which is easy to check numerically as the optimizations over $P_X$ are over compact sets, neither subsumes nor is subsumed by degradedness or any other ordering of $W$ and $V$. That is, there exist degraded BCs that do not satisfy Condition 1, and there are also non-degraded BCs that satisfy Condition 1.

2. Condition 1 is significantly simplified in the binary-input case. Let $W_0 := W(\cdot|0)$, $W_1 := W(\cdot|1)$, and $W_\gamma := \sum_{x} P_\gamma(x) W_x$ where $P_\gamma$ is the Bernoulli distribution with probability of $1$ being $\gamma$, i.e.,

$$P_\gamma(x) := \begin{cases} 1 - \gamma & x = 0 \\ \gamma & x = 1. \end{cases} \tag{15}$$

Similarly define $V_0$, $V_1$ and $V_\gamma$. Then, one can use (12) and Lemma 1 (to follow) to show that (13) is equivalent to

$$\frac{D(W_1 \| W_0)}{D(V_1 \| V_0)} \le \min_{\gamma \in [0,1]} \frac{D(W_\gamma \| W_0)}{D(V_\gamma \| V_0)}. \tag{16}$$

Thus the verification of Condition 1 for binary-input BCs reduces to a line search over $\gamma \in [0,1]$.

3. To illustrate this condition, we consider the scenario in which $W$ is a BSC and $V$ is a (generally) asymmetric binary-input, binary-output channel such that the transition matrix reads

$$V = \begin{bmatrix} 1 - q_0 & q_0 \\ q_1 & 1 - q_1 \end{bmatrix}, \tag{17}$$

for crossover probabilities $q_0$ and $q_1$. (Footnote 2: Here, a BSC is a binary-input, binary-output channel whose two crossover probabilities are equal.) In Fig. 3, we show the range of values of $(q_0, q_1)$ such that Condition 1 is satisfied (indicated in gray). Also we note that the “diagonal” values of $(q_0, q_1)$ in which $q_0 = q_1$ satisfy Condition 1. Hence, if $V$ is a BSC, Condition 1 is satisfied. More generally, we can verify numerically that if $W$ and $V$ are both BSCs, then Condition 1, or equivalently (16) for the case $L_1^* \ge L_2^*$, is satisfied.
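The line search in (16) can be sketched numerically as follows. This is a minimal illustration rather than the authors' procedure: the grid resolution, the tolerance, and the function names are assumptions, and the point $\gamma = 0$ (where the ratio is $0/0$) is skipped.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; assumes p is absolutely continuous w.r.t. q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mix(p0, p1, gamma):
    """Output distribution induced by the Bernoulli(gamma) input of (15)."""
    return [(1.0 - gamma) * a + gamma * b for a, b in zip(p0, p1)]

def bsc_rows(p):
    """Rows of a BSC with crossover probability p."""
    return [1.0 - p, p], [p, 1.0 - p]

def condition_16_holds(w0, w1, v0, v1, grid=2000, tol=1e-9):
    """Grid-based check of (16) over gamma in (0, 1]."""
    lhs = kl_divergence(w1, w0) / kl_divergence(v1, v0)
    for k in range(1, grid + 1):
        gamma = k / grid
        ratio = (kl_divergence(mix(w0, w1, gamma), w0)
                 / kl_divergence(mix(v0, v1, gamma), v0))
        if ratio < lhs - tol:
            return False
    return True

# Hypothetical example with two BSCs, the first being the less noisy one.
print(condition_16_holds(*bsc_rows(0.1), *bsc_rows(0.2)))
```

If the roles of the two channels are swapped, inequality (16) as written fails for this pair; that is the situation in which (14), rather than (13), is the relevant half of Condition 1.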

### III-B Time-Division is Optimal for Some BCs

Our main result is a complete characterization of the $\delta$-covert capacity region for all BCs satisfying Condition 1 and certain absolute continuity conditions.

###### Theorem 2.

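Assume that a BC with a warden $P_{Y_1,Y_2,Z|X}$ is such that Condition 1 is satisfied and the constituent DMCs $W$ and $V$ satisfy $W(\cdot|x) \ll W(\cdot|0)$ and $V(\cdot|x) \ll V(\cdot|0)$ for all $x \in \mathcal{X}$. Also assume that the secret key is sufficiently long. Then, for all $\delta > 0$, the $\delta$-covert capacity region is

$$\mathcal{L}_\delta = \Big\{ (L_1, L_2) \in \mathbb{R}_+^2 : \frac{L_1}{L_1^*} + \frac{L_2}{L_2^*} \le 1 \Big\}. \tag{18}$$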

Theorem 2 is proved in Section IV.

### III-C Remarks on the Main Theorem

A few remarks are in order.

1. First, note that (18) implies that under the covert communication constraints, time-division transmission is optimal for all BCs satisfying Condition 1. The achievability part simply involves two optimal covert communication codes, one for each DMC with a warden. The first code, designed for $P_{Y_1,Z|X}$, is employed over $\rho n$ channel uses where $\rho \in [0,1]$. The second code, designed for $P_{Y_2,Z|X}$, is employed over the remaining $(1-\rho)n$ channel uses. However, because the normalization of $\log M_{jn}$ is $\sqrt{n\delta}$, we need a slightly more subtle time-division argument. To do so, fix $\delta' \in [0, \delta]$. Then from Theorem 1, we know that there exist codes transmitting

$$\log M_{1n} \cong \sqrt{\rho n \delta'}\, L_1^* \tag{19}$$

bits for user $1$ over $\rho n$ channel uses and with covertness constraint $\delta'$ (upper bound of the divergence in (6)) and

$$\log M_{2n} \cong \sqrt{(1-\rho) n (\delta - \delta')}\, L_2^* \tag{20}$$

bits for user $2$ over $(1-\rho)n$ channel uses and with covertness constraint $\delta - \delta'$. Choosing $\delta' = \rho\delta$ and combining these two codes achieves the covertness constraint $\delta$ (relative entropies of product distributions add) and the rate point $(\rho L_1^*, (1-\rho) L_2^*)$, which is on the boundary of $\mathcal{L}_\delta$. By varying $\rho \in [0,1]$, we achieve the whole boundary and hence the entire region in (18).

Note that time-division is strictly suboptimal for the vast majority of BCs in the absence of the covert communication constraint. Indeed, one needs to perform superposition coding [26] to achieve all points in the capacity region for degraded, less noisy and more capable BCs. Thus, the covert communication constraint significantly simplifies the optimal coding scheme for BCs satisfying Condition 1.

2. The converse of Theorem 2 thus constitutes the main contribution of this paper. To obtain an explicit outer bound for the capacity region for BCs that satisfy some ordering (such as degraded, less noisy or more capable BCs [9, Chapter 5]), one often has to resort to the identification of the optimal auxiliary random variable-channel input pair in the capacity region of these classes of BCs. However, this is only possible for specific BCs; see Corollaries 1 and 2 to follow. For general (or even arbitrary degraded) BCs, this is, in general, not possible. Our workaround involves first starting with an outer bound of the capacity region for general memoryless BCs by El Gamal [21]. We combine the inequalities in the outer bounds and use this to upper bound a linear combination of the two throughputs in terms of the concave envelope of a linear combination of mutual information terms [22]. This allows us to circumvent the need to explicitly characterize the auxiliary random variable since it is no longer present in this concave envelope characterization. We then exploit Condition 1 and some approximations to obtain the desired outer bound to (18). The appeal of this approach is that we do not need to find the optimal auxiliary random variable.

3. Let us say that the absolute continuity condition holds for $W$ if $W(\cdot|x) \ll W(\cdot|0)$ for all $x \in \mathcal{X}$. Bloch [7, Appendix G] showed that if $W$ does not satisfy the absolute continuity condition, the number of bits that can be covertly transmitted over $n$ channel uses follows a different scaling from the square root law. In our setting, the same is true for the constituent channels: if one constituent channel satisfies the absolute continuity condition but the other does not, the throughput to the former receiver obeys the square root law while the throughput to the latter follows the scaling in [7, Appendix G].

4. Finally, suppose that the covertness condition in (6) is replaced by a total variation constraint of the form

$$V\big(\hat{Q}_{Z^n}, Q_0^{\times n}\big) \le \delta, \quad \text{for some } \delta \in (0,1). \tag{21}$$

Then, by using the techniques to prove [29, Theorem 2], one can easily see that the covert capacity region is the same as (18) in Theorem 2 except that in the formulation, (7) is replaced by

 liminfn→∞1√nΓδlogMjn≥Lj,j∈{1,2}, (22)

where $\Gamma_\delta$ is defined in terms of the complementary cumulative distribution function of a standard Gaussian. Hence, only the normalization (or scaling) is different. Since the justification of this is completely analogous to that of (the first-order result of) [29, Theorem 2], we omit the proof for the sake of brevity.
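The time-division argument around (19) and (20) can be checked with a few lines of arithmetic. The sketch below treats the approximations in (19) and (20) as equalities and uses hypothetical values for $L_1^*$, $L_2^*$, $n$ and $\delta$; it confirms that the choice $\delta' = \rho\delta$ lands on the boundary of the region in (18).

```python
import math

def time_division_point(rho, l1_star, l2_star, n, delta):
    """Normalized throughputs (L1, L2) obtained by splitting time and covertness budget."""
    delta_1 = rho * delta                      # budget delta' for user 1
    delta_2 = delta - delta_1                  # remaining budget for user 2
    log_m1 = math.sqrt(rho * n * delta_1) * l1_star          # (19)
    log_m2 = math.sqrt((1.0 - rho) * n * delta_2) * l2_star  # (20)
    norm = math.sqrt(n * delta)                # normalization used in (7)
    return log_m1 / norm, log_m2 / norm

# Hypothetical parameters (illustrative values only).
l1_star, l2_star, n, delta = 1.4, 0.9, 10**6, 0.05
for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    l1, l2 = time_division_point(rho, l1_star, l2_star, n, delta)
    print(rho, l1, l2, l1 / l1_star + l2 / l2_star)
```

The last printed column equals 1 (up to rounding) for every $\rho$, so sweeping $\rho$ traces the boundary of (18).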

We now consider two specific classes of degraded BCs and show that modifications of standard techniques are applicable in establishing the outer bound to $\mathcal{L}_\delta$.

###### Corollary 1.

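Suppose that $W$ and $V$ are BSCs with crossover probabilities $p_1$ and $p_2$ respectively, with $p_1 \le p_2$ (without loss of generality) [9, Example 5.3]. Then $\mathcal{L}_\delta$ is as in (18) with

$$L_j^* = (\log e)(1 - 2p_j) \log\frac{1 - p_j}{p_j} \cdot \sqrt{\frac{2}{\chi_2(Q_1 \| Q_0)}}, \quad j \in \{1,2\}. \tag{23}$$

A converse proof of this result follows from Mrs. Gerber’s lemma [23] and is presented in Section V-A. This result generalizes [8, Example 3].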

We now consider the scenario in which the BC consists of three additive white Gaussian noise (AWGN) channels [9, Sec. 5.5], i.e.,

$$Y_j = X + \Psi_j, \quad j \in \{1,2\}, \tag{24}$$

$$Z = X + \Psi_3, \tag{25}$$

where $\Psi_1$, $\Psi_2$ and $\Psi_3$ are independent zero-mean Gaussian noises with variances $N_1$, $N_2$ and $\sigma^2$ respectively. There is no (peak, average, or long-term) power constraint on the codewords [8, Sec. V]. Let the “no communication” input symbol be $0$, so by (25), $Z$ is distributed as a zero-mean Gaussian with variance $\sigma^2$.

###### Corollary 2.

Suppose that $W$ and $V$ are AWGN channels as in (24) with noise variances satisfying $N_1 \le N_2$ (without loss of generality). Then $\mathcal{L}_\delta$ is as in (18) with

$$L_j^* = \frac{\sigma^2 \log e}{N_j}, \quad j \in \{1,2\}. \tag{26}$$

The converse proof follows from the entropy power inequality [24, 25] and is presented in Section V-B. This result generalizes [7, Theorem 6] and [8, Theorem 5]. It is also analogous to [20, Prop. 1] in which it was shown that time-division is optimal for Gaussian BCs in the low-power limit.

We note that Corollaries 1 and 2 apply to an arbitrary but finite number of successively degraded legitimate receivers [9, Sec. 5.7], say $N$. This means that $X \to Y_1 \to Y_2 \to \cdots \to Y_N$ forms a Markov chain. The corresponding $\delta$-covert capacity region is

$$\mathcal{L}_\delta = \Big\{ (L_1, \ldots, L_N) \in \mathbb{R}_+^N : \sum_{j=1}^{N} \frac{L_j}{L_j^*} \le 1 \Big\}, \tag{27}$$

where $L_j^*$ is given as in (23) or (26). We omit the proofs as they are straightforward generalizations of the corollaries.

### III-E On the Length of the Secret Key

In the preceding derivations, we have assumed that the secret key that the transmitter and legitimate receivers share is arbitrarily long. In other words, the key set is sufficiently large. In this section, we derive fundamental limits on the length of the key so that covert communication remains successful.

To formalize this, we will need to augment Definition 2. We say that $(L_1, L_2, L_{\mathrm{key}})$ is $\delta$-achievable or simply achievable if in addition to (7) and (8) (with $\varepsilon = 0$),

$$\limsup_{n \to \infty} \frac{1}{\sqrt{n\delta}} \log K_n \le L_{\mathrm{key}}. \tag{28}$$

Finally, we set . The generalization of Theorem 2 is as follows.

###### Theorem 3.

Under the conditions of Theorem 2, the tuple $(L_1, L_2, L_{\mathrm{key}})$ is achievable if and only if

$$\frac{L_1}{L_1^*} + \frac{L_2}{L_2^*} \le 1 \tag{29}$$

and

$$L_{\mathrm{key}} \ge \Big( \frac{L_1}{L_1^*} + \frac{L_2}{L_2^*} \Big) L_Z^* - L_1 - L_2. \tag{30}$$

Note that if the throughputs of the code are such that we operate on the boundary of the covert capacity region (i.e., (29) holds with equality), the optimum (minimum) key length is

$$L_{\mathrm{key}}^* = L_Z^* - L_1 - L_2. \tag{31}$$

Also note that if $L_{\mathrm{key}}$ is sufficiently large, then (30) is satisfied so, based on (29), Theorem 3 reverts to Theorem 2. The proof of this enhanced theorem follows largely along the same lines as that for Theorem 2. However, we need to carefully bound the length of the secret key. The additional arguments to complete the proof of Theorem 3 are provided in Section VI.

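## IV Proof of Theorem 2

### IV-A Preliminaries for the Proof of Theorem 2

Before we commence, we recap some basic notions in convex analysis. The (upper) concave envelope of $f : \mathcal{D} \to \mathbb{R}$, denoted as $C[f]$, is the smallest concave function lying above $f$. If $\mathcal{D}$ is a subset of $\mathbb{R}^d$, then by Carathéodory’s theorem [22],

$$g = C[f] \iff g(x) = \sup_{\{(x_i, p_i)\}_{i=1}^{d+1}} \sum_{i=1}^{d+1} p_i f(x_i), \tag{32}$$

where $x_i \in \mathcal{D}$ and $\{p_i\}_{i=1}^{d+1}$ is a probability distribution such that $\sum_{i=1}^{d+1} p_i x_i = x$. Here, we record a basic fact:

$$f_2 \ge f_1 \text{ on } \mathcal{D} \implies g_2 \ge g_1 \text{ on } \mathcal{D}, \tag{33}$$

where $g_j = C[f_j]$ for $j \in \{1,2\}$. This can be shown by means of the representation of the concave envelope in (32). Indeed,

$$g_2(x) \ge \sum_{i=1}^{d+1} p_i f_2(x_i) \ge \sum_{i=1}^{d+1} p_i f_1(x_i), \tag{34}$$

where the first inequality holds for any $\{(x_i, p_i)\}_{i=1}^{d+1}$ such that $\sum_{i=1}^{d+1} p_i x_i = x$ and the second inequality holds because $f_2 \ge f_1$ on $\mathcal{D}$. Since the inequality holds for all $\{(x_i, p_i)\}_{i=1}^{d+1}$ such that $\sum_{i=1}^{d+1} p_i x_i = x$, we can take the supremum of the right-hand side of (34) over all such $\{(x_i, p_i)\}_{i=1}^{d+1}$ to conclude that $g_2 \ge g_1$ on $\mathcal{D}$.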
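As a one-dimensional illustration of (32) and (33), the sketch below computes the upper concave envelope of a function sampled on a grid using an upper-hull scan, and checks the monotonicity fact on two example functions. The grid, the example functions, and the function name are assumptions made for illustration; the envelope is only evaluated at the sample points.

```python
def upper_concave_envelope(xs, fs):
    """Upper concave envelope of the points (xs[i], fs[i]); xs must be strictly increasing."""
    hull = []  # indices of the vertices of the upper hull, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # b is dropped when it lies on or below the chord from a to i.
            cross = (xs[b] - xs[a]) * (fs[i] - fs[a]) - (fs[b] - fs[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    env = list(fs)
    # Linear interpolation between consecutive hull vertices.
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            t = (xs[i] - xs[a]) / (xs[b] - xs[a])
            env[i] = (1 - t) * fs[a] + t * fs[b]
    return env

xs = [i / 10 for i in range(11)]
f1 = [x * x for x in xs]     # convex, so its envelope is the chord
f2 = [x ** 0.5 for x in xs]  # concave and f2 >= f1 on [0, 1]
g1 = upper_concave_envelope(xs, f1)
g2 = upper_concave_envelope(xs, f2)
print(g1)
print(g2)
```

In one dimension the supremum in (32) is attained by mixtures of at most two points, which is why interpolating along the upper hull suffices here.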

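### IV-B Converse Proof of Theorem 2: Binary-Input BCs

We first prove the converse to Theorem 2 for the case when $\mathcal{X} = \{0,1\}$. This is done for the sake of clarity and simplicity. We subsequently show how to extend the analysis to the multiple symbol case (i.e., $|\mathcal{X}| > 2$) in Section IV-C. Fix a sequence of $(n, M_{1n}, M_{2n}, \varepsilon_n, \delta)$-codes for the BC with a warden satisfying the $\varepsilon$-reliability constraint in (5) and (8) and the covertness constraint in (6). In the proof, we use the following result by Bloch [7, Lemma 1, Remark 1]:

###### Lemma 1.

Let $Q_\gamma := \sum_{x} P_\gamma(x) Q_x$ for $\gamma \in [0,1]$ where $P_\gamma$ is defined in (15). Then, it follows that

$$I(P_\gamma, P_{Z|X}) = \gamma D(Q_1 \| Q_0) - D(Q_\gamma \| Q_0). \tag{35}$$

Furthermore, for any sequence $\{\gamma_n\}$ such that $\gamma_n \to 0$ as $n \to \infty$, for all $n$ sufficiently large,

$$\frac{\gamma_n^2}{2} \chi_2(Q_1 \| Q_0)\big(1 - \sqrt{\gamma_n}\big) \le D(Q_{\gamma_n} \| Q_0) \le \frac{\gamma_n^2}{2} \chi_2(Q_1 \| Q_0)\big(1 + \sqrt{\gamma_n}\big). \tag{36}$$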
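Both parts of Lemma 1 are easy to sanity-check numerically. The sketch below does so for one hypothetical binary-input warden channel (the rows `q0` and `q1` are illustrative values, and natural logarithms are assumed): it compares the two sides of (35) and verifies that the relative entropy falls inside the sandwich (36) for a small $\gamma$.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; assumes p is absolutely continuous w.r.t. q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chi_squared(q1, q0):
    """Chi-squared distance of (10)."""
    return sum((a - b) ** 2 / b for a, b in zip(q1, q0))

def mix(q0, q1, gamma):
    """Q_gamma, the output distribution under the Bernoulli(gamma) input."""
    return [(1.0 - gamma) * a + gamma * b for a, b in zip(q0, q1)]

def mutual_information(gamma, q0, q1):
    """I(P_gamma, P_{Z|X}) computed directly as an average divergence to the output."""
    out = mix(q0, q1, gamma)
    return (1.0 - gamma) * kl_divergence(q0, out) + gamma * kl_divergence(q1, out)

# Hypothetical warden channel rows Q_0 and Q_1 (illustrative values only).
q0, q1 = [0.8, 0.2], [0.3, 0.7]

gamma = 0.3
lhs_35 = mutual_information(gamma, q0, q1)
rhs_35 = gamma * kl_divergence(q1, q0) - kl_divergence(mix(q0, q1, gamma), q0)

small = 1e-3
d_small = kl_divergence(mix(q0, q1, small), q0)
centre = 0.5 * small ** 2 * chi_squared(q1, q0)
print(lhs_35, rhs_35, d_small, centre)
```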

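#### IV-B1 Covertness Constraint

We first discuss the covertness constraint in (6). Let $X^n$ (resp. $Z^n$) have distribution $\hat{P}_{X^n}$ (resp. $\hat{Q}_{Z^n}$) and let $\hat{P}_{X_i}$ (resp. $\hat{Q}_{Z_i}$) be the marginal of $\hat{P}_{X^n}$ (resp. $\hat{Q}_{Z^n}$) on the $i$-th element. Additionally, let $\bar{Q}_n$ be the average output distribution on $\mathcal{Z}$, i.e., $\bar{Q}_n := \frac{1}{n}\sum_{i=1}^{n} \hat{Q}_{Z_i}$. Similarly we define the average input distribution on $\mathcal{X}$ as $\bar{P}_n := \frac{1}{n}\sum_{i=1}^{n} \hat{P}_{X_i}$. Then mimicking the steps in the proof of [8, Theorem 1], we have

$$D\big(\hat{Q}_{Z^n} \,\|\, Q_0^{\times n}\big) \ge n D(\bar{Q}_n \| Q_0). \tag{37}$$

Thus, by the covertness constraint in (6) and (37), we have

$$D(\bar{Q}_n \| Q_0) \le \frac{\delta}{n}. \tag{38}$$

Since $D(\bar{Q}_n \| Q_0) \to 0$ and symbol $0$ is not redundant, it follows that

$$\bar{P}_n = P_{\alpha_n} \text{ for some } \alpha_n \to 0. \tag{39}$$

(Footnote 3: Indeed, if $0$ were redundant (e.g., $Q_1 = Q_0$ for binary $\mathcal{X}$), there exists an input distribution that places no mass on $0$ and whose corresponding output distribution is $Q_0$ [8, Eqn. (5)]. The distribution $Q_0$ (taking the role of $\bar{Q}_n$) satisfies $D(Q_0 \| Q_0) = 0$ so (38) is trivially satisfied. However, this input distribution (taking the role of $\bar{P}_n$) clearly does not satisfy (39).)

Because $\bar{Q}_n = Q_{\alpha_n}$ for all $n$, by Lemma 1,

$$\frac{\alpha_n^2}{2} \chi_2(Q_1 \| Q_0)\big(1 - \sqrt{\alpha_n}\big) \le D(\bar{Q}_n \| Q_0) \le \frac{\alpha_n^2}{2} \chi_2(Q_1 \| Q_0)\big(1 + \sqrt{\alpha_n}\big). \tag{40}$$

From (38) and (40), we conclude that $\alpha_n$ has to satisfy the weight constraint

$$\alpha_n^2\big(1 - \sqrt{\alpha_n}\big) \le \frac{2\delta}{\chi_2(Q_1 \| Q_0)\, n} =: \bar{\alpha}_n^2. \tag{41}$$

From this relation, we see that $\alpha_n = O(1/\sqrt{n})$.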
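The weight budget in (41) already hints at the form of the covert capacity in (12). As a heuristic sketch (not part of the proof), a codebook of relative weight $\bar{\alpha}_n$ carries roughly $n\bar{\alpha}_n D(W_1\|W_0)$ nats, and normalizing by $\sqrt{n\delta}$ recovers the expression in (12). The channel rows and parameters below are hypothetical, and natural logarithms are assumed.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; assumes p is absolutely continuous w.r.t. q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chi_squared(q1, q0):
    """Chi-squared distance of (10)."""
    return sum((a - b) ** 2 / b for a, b in zip(q1, q0))

def weight_budget(q0, q1, n, delta):
    """alpha_bar_n from (41): sqrt(2 * delta / (chi2(Q1||Q0) * n))."""
    return math.sqrt(2.0 * delta / (chi_squared(q1, q0) * n))

# Hypothetical channels and parameters (illustrative values only).
w0, w1 = [0.9, 0.1], [0.1, 0.9]   # legitimate receiver
q0, q1 = [0.8, 0.2], [0.3, 0.7]   # warden
n, delta = 10**6, 0.05

alpha_bar = weight_budget(q0, q1, n, delta)
heuristic = n * alpha_bar * kl_divergence(w1, w0) / math.sqrt(n * delta)
formula_12 = math.sqrt(2.0 / chi_squared(q1, q0)) * kl_divergence(w1, w0)
print(alpha_bar, heuristic, formula_12)
```

The agreement is exact because the factors of $n$ and $\delta$ cancel; the actual proof must additionally control the lower-order terms hidden in (36).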

#### IV-B2 Upper Bound on Linear Combination of Code Sizes

We now proceed to consider upper bounds on the code sizes subject to the reliability and covertness constraints. Without loss of generality, we assume $L_1^* \ge L_2^*$. We start with a lemma that is a direct consequence of the converse proof for more capable BCs by El Gamal [21]. This lemma is stated in a slightly different manner in [9, Theorem 8.5].

###### Lemma 2.

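Every code for any BC with message set sizes $M_{1n}$ and $M_{2n}$ and error probability $\varepsilon_n$ satisfies

$$(\log M_{1n})(1 - \varepsilon_n) - 1 \le \sum_{i=1}^{n} I(U_{1i}; Y_{1i}), \tag{42}$$

$$(\log M_{2n})(1 - \varepsilon_n) - 1 \le \sum_{i=1}^{n} I(U_{2i}; Y_{2i}), \tag{43}$$

$$(\log M_{1n} + \log M_{2n})(1 - \varepsilon_n) - 2 \le \sum_{i=1}^{n} \big[ I(X_i; Y_{1i} | U_{2i}) + I(U_{2i}; Y_{2i}) \big], \tag{44}$$

$$(\log M_{1n} + \log M_{2n})(1 - \varepsilon_n) - 2 \le \sum_{i=1}^{n} \big[ I(U_{1i}; Y_{1i}) + I(X_i; Y_{2i} | U_{1i}) \big], \tag{45}$$

where $U_{1i}$ and $U_{2i}$ are the auxiliary random variables identified in the proof. In addition, if a secret key is available to the encoder and decoders, then the auxiliary random variables $U_{1i}$ and $U_{2i}$ also include the secret key.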

For completeness, the proof of Lemma 2, with the effect of the secret key, is provided in Appendix A. Note that no assumption (e.g., degradedness, less noisy or more capable conditions) is made on the BC in Lemma 2.

Now fix a constant $\lambda \ge 1$. By adding $\lambda - 1$ copies of (43) to one copy of (44) and writing for all , we obtain

 (logM1n+λlogM2n)(1−εn)−(1+λ) ≤n∑i=1[I(Xi;Y1i|U