I Introduction
Fig. 1: (a) The wiretap channel with transition probability $q_{Y,Z|X}$, where $X$ is the channel input and $Y$ and $Z$ are the channel outputs observed by the legitimate receiver and the eavesdropper, respectively; (b) The Gelfand-Pinsker channel with state distribution $W_S$ and channel transition probability $q_{Y|X,S}$, where $X$ is the input and $Y$ is the output.

Two fundamental, yet seemingly unrelated, information-theoretic models are the wiretap channel (WTC) and the state-dependent point-to-point channel with noncausal encoder channel state information (CSI). The discrete and memoryless (DM) WTC (Fig. 1(a)) was introduced by Wyner in 1975 [1] and initiated the study of physical layer security. Csiszár and Körner characterized the secrecy-capacity of the WTC as
$C_{\mathrm{WT}} = \max_{p_{U,X}} \big[ I(U;Y) - I(U;Z) \big],$   (1)
where $q_{Y,Z|X}$ is the WTC's transition matrix and the underlying distribution is $p_{U,X}\, q_{Y,Z|X}$. The state-dependent channel with noncausal encoder CSI is due to Gelfand and Pinsker (GP) [2]; we henceforth refer to it as the GP channel (GPC). The capacity of a GPC with state distribution $W_S$ and transition probability $q_{Y|X,S}$ is:
$C_{\mathrm{GP}} = \max_{p_{U,X|S}} \big[ I(U;Y) - I(U;S) \big],$   (2)
where the joint distribution is $W_S\, p_{U,X|S}\, q_{Y|X,S}$. An interesting question is whether the resemblance of (1) and (2) is coincidental or whether there is an inherent relation between these problems.

This paper shows that an inherent relation is indeed the case, by proposing a rigorous framework that links the WTC and the GPC, establishing these two problems as analogous to one another. Specifically, we prove that any good (reliable and secure) sequence of codes for the WTC induces a good (reliable) sequence of codes of the same rate for a corresponding GPC. This observation enables exploiting known upper bounds on the GPC capacity to upper-bound the secrecy-capacity of an analogous WTC. While the solutions to the base cases from Fig. 1 have been known for decades, many multiuser extensions of these models remain open problems. Through the analogy we derive converse proofs for several multiuser wiretap settings that were not available until now, thus establishing several new secrecy-capacity results.
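To make the structural parallel between (1) and (2) concrete, the following small sketch (our own illustration; the alphabets, the toy distributions and all variable names are arbitrary choices, not taken from this paper) evaluates the two objectives $I(U;Y)-I(U;Z)$ and $I(U;Y)-I(U;S)$ for fixed joint distributions rather than maximizing over them:

```python
# Illustration only: both objectives are differences of mutual informations,
# computed from a joint PMF of (U, X, Y, Z) for the WTC and of (S, U, X, Y) for the GPC.
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats for a joint PMF given as a 2-D array p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

# --- WTC objective: a toy p_{U,X} and a wiretap kernel q_{Y,Z|X}. ---
p_ux = np.array([[0.3, 0.2],            # p_{U,X}(u, x), binary U and X
                 [0.1, 0.4]])
q_yz_x = np.zeros((2, 2, 2))            # q_yz_x[y, z, x] = q_{Y,Z|X}(y, z | x)
q_yz_x[:, :, 0] = np.outer([0.9, 0.1], [0.7, 0.3])
q_yz_x[:, :, 1] = np.outer([0.1, 0.9], [0.3, 0.7])

p_uxyz = p_ux[:, :, None, None] * q_yz_x.transpose(2, 0, 1)[None, :, :, :]
p_uy = p_uxyz.sum(axis=(1, 3))          # marginal p_{U,Y}
p_uz = p_uxyz.sum(axis=(1, 2))          # marginal p_{U,Z}
wtc_objective = mutual_information(p_uy) - mutual_information(p_uz)

# --- GP objective: a toy state PMF W_S, some p_{U,X|S} and q_{Y|X,S}. ---
w_s = np.array([0.5, 0.5])
p_ux_s = np.array([[[0.6, 0.1], [0.1, 0.2]],    # p_{U,X|S}(u, x | s), indexed [s, u, x]
                   [[0.2, 0.1], [0.1, 0.6]]])
q_y_xs = np.array([[[0.9, 0.1], [0.2, 0.8]],    # q_{Y|X,S}(y | x, s), indexed [s, x, y]
                   [[0.8, 0.2], [0.1, 0.9]]])

p_suxy = w_s[:, None, None, None] * p_ux_s[:, :, :, None] * q_y_xs[:, None, :, :]
p_uy = p_suxy.sum(axis=(0, 2))          # marginal p_{U,Y}
p_su = p_suxy.sum(axis=(2, 3))          # marginal p_{S,U}
gp_objective = mutual_information(p_uy) - mutual_information(p_su.T)

print(f"WTC objective I(U;Y)-I(U;Z): {wtc_objective:.4f} nats")
print(f"GP  objective I(U;Y)-I(U;S): {gp_objective:.4f} nats")
```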
To this end we extend the wiretap-GP analogy to multiuser broadcasting scenarios. Given a wiretap broadcast channel (WTBC) (Fig. 2(a)), with two legitimate receivers observing $Y_1$ and $Y_2$ and one eavesdropper that intercepts $Z$, an analogous GP broadcast channel (GPBC), shown in Fig. 2(b), is constructed by:

Converting the eavesdropper's observation $Z$ to an independently and identically distributed (i.i.d.) state sequence $\mathbf{S}$ with some appropriate distribution $W_S$;

Noncausally revealing the state sequence to the encoder;

Setting $q_{Y_1,Y_2|X,S}$ (the conditional marginal of the WTBC's transition probability $q_{Y_1,Y_2,Z|X}$, with $Z$ in the role of the state $S$) as the GPBC's transition kernel.
The aforementioned relation between good sequences of codes for analogous WTBCs and GPBCs remains valid. This allows capitalizing on known GPBC capacity results to derive converse bounds for their analogous WTBCs.
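As a concrete illustration of the three-step construction above, the following sketch (ours; the toy kernel, the choice of $W_S$ and all names are assumptions made for illustration) computes the GPBC transition kernel as the conditional marginal $q_{Y_1,Y_2|X,Z}$ of a given WTBC law, with $Z$ relabeled as $S$. Step 2 (noncausally revealing $\mathbf{S}$ to the encoder) is a change of the communication protocol and is therefore not reflected in the computation:

```python
import numpy as np

def gpbc_kernel_from_wtbc(q_y1y2z_x):
    """q_y1y2z_x[y1, y2, z, x] = q_{Y1,Y2,Z|X}.  Returns the conditional marginal
    q_{Y1,Y2|X,Z}(y1, y2 | x, z), which serves as the analogous GPBC kernel with
    Z relabeled as the state S (step 3 above)."""
    q_z_x = q_y1y2z_x.sum(axis=(0, 1))                 # q_{Z|X}(z | x), shape [z, x]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(q_z_x > 0, q_y1y2z_x / q_z_x, 0.0)

# Toy binary WTBC law, normalized so that q(.|x) is a PMF for each input x.
rng = np.random.default_rng(0)
q_wtbc = rng.random((2, 2, 2, 2))
q_wtbc /= q_wtbc.sum(axis=(0, 1, 2), keepdims=True)

w_s = np.array([0.5, 0.5])      # step 1: an i.i.d. state law (a placeholder choice here)
kernel = gpbc_kernel_from_wtbc(q_wtbc)
# Sanity check: for every (x, s) the kernel is a PMF over (y1, y2).
print(np.allclose(kernel.sum(axis=(0, 1)), 1.0))
```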
Fig. 2: (a) The WTBC with transition probability $q_{Y_1,Y_2,Z|X}$, where $Y_1$, $Y_2$ and $Z$ are the channel outputs observed by the legitimate receivers and the eavesdropper, respectively; (b) An analogous GPBC is constructed by replacing the eavesdropper's observation $Z$ with a state random variable $S$, revealing $\mathbf{S}$ in a noncausal manner to the encoder and setting the state-dependent BC's transition kernel to $q_{Y_1,Y_2|X,S}$, i.e., the conditional marginal distribution of the WTBC's law $q_{Y_1,Y_2,Z|X}$.

The GPBC has been widely studied in the literature and the capacity region is known for various cases. Notably, the semi-deterministic (SD) GPBC was solved in [3], the physically degraded (PD) GPBC with an informed receiver (i.e., when the stronger receiver also observes the state sequence) was treated in [4], and the cooperative version of this PD-GPBC, where the receivers are connected via a unidirectional noiseless link, was the focus of [5]. The corresponding (SD and PD) WTBCs have also received considerable attention in the literature [6, 7, 8, 9]; however, solutions are known only for some special cases. To the best of our knowledge, the widest framework of DM-WTBCs for which tight secrecy-capacity results are available is due to [9], where the regions for the SD-WTBC and the PD-WTBC were derived under the further assumption that the stochastic receiver is less noisy than the eavesdropper. The coding scheme therein does not rely on this less-noisy property; the converse proofs, however, do.
GPBCs typically lend themselves to easier converse proofs. This is due to the particular statistical relations between the random variables in these problems. Specifically, the i.i.d. nature of the state sequence and its independence of the messages are used multiple times in the converse proofs from [3, 4, 5]. The WTBC, on the other hand, does not share these properties. When $\mathbf{Z}$ is the channel output sequence observed by the eavesdropper, it is generally neither i.i.d. nor independent of the message pair. Consequently, one cannot simply repeat the steps from the converse proof of the analogous GPBC when upper-bounding the transmission rates achievable over a WTBC. The less-noisy assumption imposed on the WTBCs studied in [9] circumvents this difficulty. Since no corresponding assumption was imposed while deriving the above GPBC results, our analogy-based proof method characterizes the SD- and the PD-WTBC secrecy-capacity regions without assuming this ordering between the subchannels. As a natural extension to the analogy for the base case (WTCs versus GPCs), the obtained secrecy-capacity regions are described by the same rate bounds as their GPBC counterparts.
An important ingredient in proving the analogy is to adopt the definition of WTC achievability from, e.g., [10, 8, 11], which merges the reliability and security requirements into a single demand on the joint distribution induced by a wiretap code. Specifically, we require that a good sequence of wiretap codes induces a sequence of joint distributions (on the message, its estimate and the eavesdropper's observation) that is asymptotically indistinguishable in total variation from a target measure under which:

The message and its estimate are almost surely equal (a reliability requirement);

The eavesdropper's observation is independent of the message and is distributed according to some product measure, say $Q_Z^{\times n}$ (a security requirement).
Denoting by $P^{(c_n)}_{M,\hat{M},\mathbf{Z}}$ the joint distribution of the message $M$, its estimate $\hat{M}$ and the eavesdropper's observation $\mathbf{Z}$ induced by a wiretap code $c_n$, the above requirements mean that for large block lengths $n$
$P^{(c_n)}_{M,\hat{M},\mathbf{Z}} \approx p^{(U)}_{\mathcal{M}_n}\, \mathbb{1}_{\{\hat{M}=M\}}\, Q_Z^{\times n},$   (3)
where the approximation is in total variation, $p^{(U)}_{\mathcal{M}_n}$ denotes the uniform PMF over the message set $\mathcal{M}_n$, and $Q_Z^{\times n}$ is the aforementioned product measure.
With this notion of achievability, we show that such a sequence of wiretap codes induces a sequence of reliable codes for the analogous GPC. The GP encoder and decoder(s) are distilled from the joint distribution induced by the wiretap code by inverting it. Under this inversion, the asymptotic i.i.d. distribution of the eavesdropper's observation becomes the state distribution in the corresponding GPC. The asymptotic independence of $\mathbf{Z}$ and the message(s) in the WTC's target distribution corresponds to the independence of the message(s) and the state in a GP coding scenario. The performance metric described above is strongly related to the more standard notion of achievability used in [12], where the performance of a wiretap code was measured via the error probability and the effective secrecy metric. We show that under mild conditions (namely, a superlinear decay of the involved quantities), our definition of achievability and the one from [12] are equivalent.
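To make the inversion step concrete, the following schematic sketch is our own illustration of the idea stated above, not the construction as formally carried out later in the paper; the array names, shapes and the choice of treating whole sequences as single indices are assumptions made purely for readability. It extracts a candidate GP encoder as the conditional distribution of the channel input given the message and the eavesdropper's observation, with the latter relabeled as the state:

```python
# Schematic illustration (assumptions: p_mxz[m, x, z] is the induced joint PMF over the
# message, an index of the input sequence and an index of the eavesdropper's sequence).
import numpy as np

def invert_to_gp_encoder(p_mxz):
    """Returns f[x, m, s] = P(X = x | M = m, Z = s), used as a stochastic GP encoder
    once the eavesdropper's sequence index is relabeled as the state index s."""
    p_mz = p_mxz.sum(axis=1)                                    # P(m, z)
    with np.errstate(divide="ignore", invalid="ignore"):
        f = np.where(p_mz[:, None, :] > 0, p_mxz / p_mz[:, None, :], 0.0)
    return np.transpose(f, (1, 0, 2))                           # index as [x, m, s]

# Tiny synthetic induced distribution (uniform message, arbitrary code behavior).
rng = np.random.default_rng(1)
p = rng.random((4, 8, 3))        # 4 messages, 8 input sequences, 3 eavesdropper sequences
p /= p.sum()
f_gp = invert_to_gp_encoder(p)
print(np.allclose(f_gp.sum(axis=0), 1.0))    # a PMF over inputs for every (m, s) pair
```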
Connecting seemingly unrelated information-theoretic problems dates back to Shannon's landmark 1959 paper [13], where he observed the similarity between channel transmission and lossy source compression. Broadly speaking, Shannon noticed that two lossy source and channel coding problems that are obtainable from one another by interchanging the roles of their encoder(s) and decoder(s) have dual solutions. Channel and source duality has been widely studied ever since, with multiple extensions to multiuser scenarios (a partial list of references is [14, 15, 16, 17, 18, 19, 20, 21, 22]); numerous examples were found that support Shannon's observation. However, a formal definition of this duality remains elusive and, in the absence of a definition, a proof is out of reach. Since there is no mechanism of formal inference in place, channel-source duality is mainly used as a tool to produce educated guesses of the solution of one problem, provided the solution of the other.
An alternative and proven form of duality was established between the Gaussian BC and multiple-access channels (MACs) [23]. It was shown that if the channels have corresponding gain and noise parameters, then the capacity region of one problem can be expressed in terms of the region of the other. This duality was extended to multiple-antenna setups in the follow-up work [24] and was used in [24, 25] to find the sum capacity of the multiple-antenna BC. However, the validity of this relation in broader (not necessarily Gaussian) frameworks remains an open question. Analogical relations between different problems were also observed before. In particular, an analogy between Kelly gambling [26] and work extraction in statistical physics was the focus of [27]. However, to the best of our knowledge, the analogy paradigm established in this work is the first proven relation between wiretap and GP coding scenarios. Like the aforementioned proven duality and analogy relations, it constitutes a powerful research tool that enables inference of results from one problem to another.
The remainder of this paper is organized as follows. Section II provides notation, basic definitions and properties. In Section III, we define the WTBC and the GPBC, and discuss the definition of achievability used in this work. Section IV explains the analogy between WTCs and GPCs, and illustrates how to use it to prove the converse of the WTC secrecy-capacity theorem based on the GPC result. The same section also extends the analogy to multiuser broadcasting setups. In Section V we state the WTBC secrecy-capacity results derived via the analogy, and cite the past results for the corresponding GPBCs. Proofs are provided in Section VI, while Section VII summarizes the main achievements and insights of this work.
II Notations and Preliminaries
We use the following notations. As is customary, $\mathbb{N}$ is the set of natural numbers (which does not include 0), while $\mathbb{R}$ denotes the reals. We further define $\mathbb{R}_+$ and $\mathbb{R}_{++}$ as the sets of nonnegative and positive reals, respectively. Given two real numbers $a < b$, we denote by $[a:b]$ the set of integers $\big\{\lceil a\rceil, \lceil a\rceil + 1, \ldots, \lfloor b\rfloor\big\}$. Calligraphic letters denote sets, e.g., $\mathcal{X}$; the complement of $\mathcal{X}$ is denoted by $\mathcal{X}^c$, while $|\mathcal{X}|$ stands for its cardinality. $\mathcal{X}^n$ denotes the $n$-fold Cartesian product of $\mathcal{X}$. An element of $\mathcal{X}^n$ is denoted by $x^n = (x_1, x_2, \ldots, x_n)$; whenever the dimension $n$ is clear from the context, vectors (or sequences) are denoted by boldface letters, e.g., $\mathbf{x}$. For any $1 \le i \le j \le n$, we use $x_i^j$ to denote the substring of $x^n$ defined by $(x_i, x_{i+1}, \ldots, x_j)$, with respect to the natural ordering of $[1:n]$. For instance, for $i=1$ and $j=n$, we have $x_1^n = x^n$.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, where $\Omega$ is the sample space, $\mathcal{F}$ is the $\sigma$-algebra and $\mathbb{P}$ is the probability measure. Random variables over $(\Omega, \mathcal{F}, \mathbb{P})$ are denoted by uppercase letters, e.g., $X$, with conventions for random vectors similar to those for deterministic sequences. The probability of an event $\mathcal{A} \in \mathcal{F}$ is denoted by $\mathbb{P}(\mathcal{A})$, while $\mathbb{P}(\mathcal{A}|\mathcal{B})$ denotes the conditional probability of $\mathcal{A}$ given $\mathcal{B}$. We use $\mathbb{1}_{\mathcal{A}}$ to denote the indicator function of $\mathcal{A}$. The set of all probability mass functions (PMFs) on a finite set $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$, i.e.,
$\mathcal{P}(\mathcal{X}) := \Big\{ p : \mathcal{X} \to [0,1] \ \Big|\ \sum_{x\in\mathcal{X}} p(x) = 1 \Big\}.$   (4)
PMFs are denoted by lowercase letters, such as $p$ or $q$, with a subscript that identifies the random variable and its possible conditioning. For example, for a discrete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and two random variables $X$ and $Y$ over that space, we use $p_X$, $p_{X,Y}$ and $p_{X|Y}$ to denote, respectively, the marginal PMF of $X$, the joint PMF of $(X,Y)$ and the conditional PMF of $X$ given $Y$. In particular, $p_{X|Y}$ represents a row stochastic matrix whose elements are given by $p_{X|Y}(x|y) = \mathbb{P}(X=x|Y=y)$. Expressions such as $p_{X,Y} = p_X p_{Y|X}$ are to be understood as $p_{X,Y}(x,y) = p_X(x)\, p_{Y|X}(y|x)$, for all $(x,y) \in \mathcal{X}\times\mathcal{Y}$. Accordingly, when three random variables $X$, $Y$ and $Z$ satisfy $p_{X,Y,Z} = p_X\, p_{Y|X}\, p_{Z|Y}$, they form a Markov chain, which we denote by $X - Y - Z$. We omit subscripts if the arguments of a PMF are lowercase versions of the random variables.

For a discrete measurable space $(\Omega, \mathcal{F})$, a PMF $p \in \mathcal{P}(\Omega)$ gives rise to a probability measure on $(\Omega, \mathcal{F})$, which we denote by $\mathbb{P}_p$; accordingly, $\mathbb{P}_p(\mathcal{A}) = \sum_{\omega\in\mathcal{A}} p(\omega)$ for every $\mathcal{A} \in \mathcal{F}$. We use $\mathbb{E}_p$ to denote an expectation taken with respect to $\mathbb{P}_p$. Similarly, we use $H_p$ and $I_p$ to indicate that an entropy or a mutual information term is calculated with respect to the PMF $p$. For a sequence of random variables $X^n$, if the entries of $X^n$ are drawn in an i.i.d. manner according to $p_X$, then for every $\mathbf{x} \in \mathcal{X}^n$ we have $p_{X^n}(\mathbf{x}) = \prod_{i=1}^n p_X(x_i)$ and we write $p_{X^n} = p_X^{\times n}$. Similarly, if for every $(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n\times\mathcal{Y}^n$ we have $p_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)$, then we write $p_{Y^n|X^n} = p_{Y|X}^{\times n}$. The conditional product PMF $p_{Y|X}^{\times n}$ given a specific sequence $\mathbf{x} \in \mathcal{X}^n$ is denoted by $p_{Y|X}^{\times n}(\cdot|\mathbf{x})$.
The empirical PMF $\nu_{\mathbf{x}}$ of a sequence $\mathbf{x} \in \mathcal{X}^n$ is
$\nu_{\mathbf{x}}(x) := \frac{N(x|\mathbf{x})}{n},$   (5)
where $N(x|\mathbf{x}) = \sum_{i=1}^n \mathbb{1}_{\{x_i = x\}}$. We use $\mathcal{T}_\epsilon^{(n)}(p)$ to denote the set of letter-typical sequences of length $n$ with respect to the PMF $p \in \mathcal{P}(\mathcal{X})$ and the nonnegative number $\epsilon$ [28, Chapter 3], i.e., we have
$\mathcal{T}_\epsilon^{(n)}(p) := \Big\{ \mathbf{x} \in \mathcal{X}^n \ \Big|\ \big| \nu_{\mathbf{x}}(x) - p(x) \big| \le \epsilon\, p(x),\ \forall x \in \mathcal{X} \Big\}.$   (6)
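As a quick illustration of (5) and (6), the following self-contained snippet (ours) computes an empirical PMF and tests letter-typicality with the $\epsilon$-proportional criterion above; the alphabet and the numbers are arbitrary:

```python
from collections import Counter

def empirical_pmf(seq, alphabet):
    """nu_x(a) = N(a|x) / n for every letter a of the alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts.get(a, 0) / n for a in alphabet}

def is_letter_typical(seq, p, eps):
    """Membership test: |nu_x(a) - p(a)| <= eps * p(a) for all letters a."""
    nu = empirical_pmf(seq, p.keys())
    return all(abs(nu[a] - p[a]) <= eps * p[a] for a in p)

p = {0: 0.5, 1: 0.3, 2: 0.2}
x = [0, 1, 0, 2, 0, 1, 0, 2, 1, 0]       # n = 10, empirical PMF (0.5, 0.3, 0.2)
print(empirical_pmf(x, p.keys()))         # matches p exactly here
print(is_letter_typical(x, p, eps=0.1))   # True
```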
For a countable sample space $\Omega$ and $p, q \in \mathcal{P}(\Omega)$, the relative entropy between $p$ and $q$ is
$D(p\|q) := \sum_{\omega\in\Omega:\, p(\omega)>0} p(\omega) \log\frac{p(\omega)}{q(\omega)},$   (7)
and the total variation between them is
$\|p - q\|_{\mathrm{TV}} := \frac{1}{2}\sum_{\omega\in\Omega} \big| p(\omega) - q(\omega) \big|.$   (8)
Total variation is a distance between probability measures that satisfies the following properties (see, e.g., [29, Property 1]).
Lemma 1 (Properties of Total Variation)
Let $\Omega$ be a countable sample space and $p, q \in \mathcal{P}(\Omega)$. We have:

Triangle inequality: for any $r \in \mathcal{P}(\Omega)$, $\|p - q\|_{\mathrm{TV}} \le \|p - r\|_{\mathrm{TV}} + \|r - q\|_{\mathrm{TV}}$.

Total Variation Bound on Difference of Expectations: If $g : \Omega \to \mathbb{R}$ is a bounded function with $\|g\|_\infty := \sup_{\omega\in\Omega} |g(\omega)| < \infty$, then $\big| \mathbb{E}_p g - \mathbb{E}_q g \big| \le 2 \|g\|_\infty \, \|p - q\|_{\mathrm{TV}}$.

Joint / Marginal / Conditional Distributions and Total Variation: If $p_{X,Y}, q_{X,Y} \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$, where $p_{X,Y} = p_X p_{Y|X}$ and $q_{X,Y} = q_X q_{Y|X}$, then

$\|p_X - q_X\|_{\mathrm{TV}} \le \|p_{X,Y} - q_{X,Y}\|_{\mathrm{TV}}$;

if $p_{Y|X} = q_{Y|X}$, then $\|p_{X,Y} - q_{X,Y}\|_{\mathrm{TV}} = \|p_X - q_X\|_{\mathrm{TV}}$.
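The following short numerical check (ours; arbitrary PMFs and an arbitrary bounded function, with the constants matching the normalization in (8)) illustrates the expectation bound and the joint-versus-marginal property from Lemma 1:

```python
import numpy as np

def tv(p, q):
    """Total variation distance as in (8): half the L1 distance between PMFs."""
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(2)
p = rng.random(6); p /= p.sum()
q = rng.random(6); q /= q.sum()
g = rng.uniform(-1.0, 1.0, size=6)           # a bounded function

lhs = abs(np.dot(p, g) - np.dot(q, g))       # |E_p g - E_q g|
rhs = 2 * np.max(np.abs(g)) * tv(p, q)       # 2 * ||g||_inf * ||p - q||_TV
print(lhs <= rhs + 1e-12)                     # True

# TV between marginals never exceeds TV between the corresponding joints.
pj = rng.random((4, 5)); pj /= pj.sum()
qj = rng.random((4, 5)); qj /= qj.sum()
print(tv(pj.sum(axis=1), qj.sum(axis=1)) <= tv(pj.ravel(), qj.ravel()) + 1e-12)  # True
```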

Relative entropy dominates total variation through Pinsker's inequality [30, Theorem 4.1], which states that for any $p, q \in \mathcal{P}(\Omega)$
$\|p - q\|_{\mathrm{TV}} \le \sqrt{\tfrac{1}{2} D(p\|q)}.$   (9)
Pinsker’s inequality shows that convergence to zero of relative entropy implies the same for total variation. While no reverse Pinsker’s inequality is known in general, a reverse asymptotic relation is sometimes valid (see [31, Remark 1]).
Lemma 2 (Asymptotic Relations between Total Variation and Relative Entropy)
Let $\mathcal{X}$ be a finite set and let $\{p_n\}_{n\in\mathbb{N}}$ and $\{q_n\}_{n\in\mathbb{N}}$, with $p_n, q_n \in \mathcal{P}(\mathcal{X}^n)$, be sequences of distributions satisfying $\|p_n - q_n\|_{\mathrm{TV}} \to 0$ and $\mathrm{supp}(p_n) \subseteq \mathrm{supp}(q_n)$ for every $n$. Then^{1}

^{1} Here $a_n \le_a b_n$ means that there exists $n_0 \in \mathbb{N}$ such that $a_n \le b_n$, for any sufficiently large $n \ge n_0$.
(10) 
In particular, (10) implies that an exponential decay of the total variation in $n$ produces an (almost, up to a term that grows at most polynomially in $n$) exponential decay of the relative entropy with the same exponent.
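The next snippet (ours) numerically sanity-checks Pinsker's inequality (9) on randomly drawn PMF pairs, with the relative entropy in nats; it does not attempt to illustrate the asymptotic converse relation of Lemma 2:

```python
import numpy as np

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

def kl(p, q):
    """Relative entropy D(p||q) in nats, assuming supp(p) is contained in supp(q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(3)
for _ in range(5):
    p = rng.random(8); p /= p.sum()
    q = rng.random(8); q /= q.sum()
    assert tv(p, q) <= np.sqrt(0.5 * kl(p, q)) + 1e-12
print("Pinsker's inequality holds on all sampled pairs.")
```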
III Problem Setup
This work establishes an analogy between two fundamental information-theoretic models: the WTC and the GPC. Via the analogy, one may derive upper bounds on the fundamental limit of (reliable and secure) communication over a given WTC based on existing bounds for a corresponding GPC. We use the analogy to derive the secrecy-capacity region of the SD-WTBC, which was an open problem until this work. Namely, we show that the inner bound from [8, Theorem 3] is tight by providing an analogy-based converse proof that leverages the SD-GPBC capacity result from [3, Theorem 1]. Similarly, the secrecy-capacity region of a certain class of PD-WTBCs (without and with cooperative receivers) is established based on the corresponding PD-GPBC converse proofs from [4, 5].
For simplicity of presentation, we first explain the analogy between the basic models of the WTC and the GPC, and then use it to derive a converse proof for the WTC. These ideas naturally extend to WTBC and GPBC scenarios and give rise to our new secrecy-capacity region characterizations. We start by defining the problem settings of the WTBC and the GPBC, which is the content of this section.
III-A Wiretap Broadcast Channels
Let $\mathcal{X}$, $\mathcal{Y}_1$, $\mathcal{Y}_2$ and $\mathcal{Z}$ be finite sets and let $q_{Y_1,Y_2,Z|X}$ be a transition probability distribution from $\mathcal{X}$ to $\mathcal{Y}_1\times\mathcal{Y}_2\times\mathcal{Z}$. The DM-WTBC is illustrated in Fig. 3. The sender chooses a pair of messages $(m_1,m_2)$ uniformly at random from the product set $\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$ and maps it onto a sequence $\mathbf{x}\in\mathcal{X}^n$ (the mapping may be random). The sequence $\mathbf{x}$ is transmitted over the DM-WTBC with transition probability $q_{Y_1,Y_2,Z|X}^{\times n}$. The output sequences $\mathbf{y}_1$, $\mathbf{y}_2$ and $\mathbf{z}$ are observed by Receiver 1, Receiver 2 and the eavesdropper, respectively. Based on $\mathbf{y}_j$, $j=1,2$, Receiver $j$ produces an estimate $\hat{m}_j$ of $m_j$. The eavesdropper tries to glean whatever it can about the transmitted messages from $\mathbf{z}$.

Definition 1 (Classes of WTBCs)
Consider a WTBC $q_{Y_1,Y_2,Z|X}$:

The channel is called SD if its channel transition distribution factors as , where and .

The channel is called PD if its channel transition distribution factors as , where and .

The channel is said to have an informed receiver if the eavesdropper's output sequence is available to Receiver 1. Formally, replacing the output random variable $Y_1$ with $(Y_1,Z)$ reduces the WTBC to one with an informed receiver. With a slight abuse of notation, we refer to this setup as the PD-WTBC with an informed receiver.
We proceed with some definitions for general WTBCs and will specialize to the particular instances described above when needed.
Definition 2 (WTBC Code)
An $(n, R_1, R_2)$ code $c_n$ for the WTBC with a product message set $\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$, where for $j=1,2$ we set $\mathcal{M}_j^{(n)} := \big[1 : 2^{nR_j}\big]$, is a triple of functions $\big(f_n, \phi_1^{(n)}, \phi_2^{(n)}\big)$ such that

$f_n : \mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)} \to \mathcal{P}(\mathcal{X}^n)$ is a stochastic encoder;

$\phi_j^{(n)} : \mathcal{Y}_j^n \to \mathcal{M}_j^{(n)}$ is the decoding function for Receiver $j$, for $j=1,2$.
For any $(n, R_1, R_2)$ code $c_n$, the induced joint distribution on $\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}\times\mathcal{X}^n\times\mathcal{Y}_1^n\times\mathcal{Y}_2^n\times\mathcal{Z}^n\times\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$ is:
$P^{(c_n)}(m_1,m_2,\mathbf{x},\mathbf{y}_1,\mathbf{y}_2,\mathbf{z},\hat{m}_1,\hat{m}_2) = \frac{1}{\big|\mathcal{M}_1^{(n)}\big|\big|\mathcal{M}_2^{(n)}\big|}\, f_n(\mathbf{x}|m_1,m_2)\, q_{Y_1,Y_2,Z|X}^{\times n}(\mathbf{y}_1,\mathbf{y}_2,\mathbf{z}|\mathbf{x})\, \mathbb{1}_{\{\hat{m}_1=\phi_1^{(n)}(\mathbf{y}_1)\}}\, \mathbb{1}_{\{\hat{m}_2=\phi_2^{(n)}(\mathbf{y}_2)\}}.$   (11)
Our analogy relies on developing a unified perspective on wiretap and GP channels. We arrive at the desired unification by defining achievability in a manner that is slightly different from typical definitions. Common definitions require the existence of a sequence of codes that achieves reliability (i.e., a vanishing error probability) and security (e.g., a vanishing information leakage). These are two separate requirements that do not project on one another. Namely, a code for the WTBC can be reliable but not secure, secure but not reliable, neither, or both.
We take a different approach and merge the reliability and security requirements into a single requirement on the induced distribution from (11), phrased in terms of total variation. Such definitions have been more frequently used in recent years (see, e.g., [11, 8]) and seem to originate from [10]. Section III-B expounds upon the definition of achievability used herein and shows that it is closely related to the, so to speak, classic definition.
Definition 3 (WTBC Achievability)
A rate pair $(R_1, R_2)$ is called achievable if there exist a $\gamma > 0$, a probability distribution $Q_Z \in \mathcal{P}(\mathcal{Z})$ and a sequence of $(n, R_1, R_2)$ codes $\{c_n\}_{n\in\mathbb{N}}$ such that for any sufficiently large $n$
$\Big\| P^{(c_n)}_{M_1,M_2,\hat{M}_1,\hat{M}_2,\mathbf{Z}} - p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}\, \mathbb{1}_{\{\hat{M}_1=M_1,\,\hat{M}_2=M_2\}}\, Q_Z^{\times n} \Big\|_{\mathrm{TV}} \le e^{-n\gamma},$   (12)
where $P^{(c_n)}_{M_1,M_2,\hat{M}_1,\hat{M}_2,\mathbf{Z}}$ is a marginal of the induced joint distribution from (11) and $p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}$ denotes the uniform distribution over $\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$.

Remark 1 (Rate of Convergence)
The exponential rate of convergence in (12) is not necessary. Any superlinear convergence rate is sufficient for the purposes of this work.
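To see Definition 3 in action on a minimal example, the following toy computation is our own illustration (blocklength 1, a single message index, arbitrary numbers; it is not a construction from the paper): it builds the induced distribution of the message, its estimate and the eavesdropper's observation and evaluates its total variation distance to the corresponding target measure.

```python
import numpy as np

M, X, Y, Z = 2, 2, 2, 2                     # alphabet sizes (n = 1 for readability)
enc = np.array([[0.9, 0.1],                 # enc[m, x] = f(x | m), stochastic encoder
                [0.1, 0.9]])
q_yz_x = np.zeros((X, Y, Z))                # q_yz_x[x, y, z] = q_{Y,Z|X}(y, z | x)
q_yz_x[0] = np.outer([0.85, 0.15], [0.6, 0.4])
q_yz_x[1] = np.outer([0.15, 0.85], [0.4, 0.6])
dec = np.array([0, 1])                      # dec[y] = decoded message

# Induced PMF P(m, m_hat, z) = (1/|M|) sum_{x,y} f(x|m) q(y,z|x) 1{dec(y) = m_hat}.
P = np.zeros((M, M, Z))
for m in range(M):
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                P[m, dec[y], z] += (1.0 / M) * enc[m, x] * q_yz_x[x, y, z]

# Target measure: uniform message, perfect decoding, Z ~ Q_Z independent of the message.
Q_Z = P.sum(axis=(0, 1))                    # here we simply take the induced Z-marginal
T = np.zeros_like(P)
for m in range(M):
    T[m, m, :] = (1.0 / M) * Q_Z

tv = 0.5 * np.abs(P - T).sum()
print(f"||P - target||_TV = {tv:.4f}")      # small when the code is reliable and secure
```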
Definition 4 (WTBC Secrecy-Capacity Region)
The secrecy-capacity region of a DM-WTBC with transition probability $q_{Y_1,Y_2,Z|X}$ is the convex closure of the set of all achievable rate pairs.
III-B Discussing the Definition of Achievability
We discuss the definition of achievability used in this work (Definition 3), interpret Equation (12) and compare our definition to more frequently used notions of achievability.
Equation (12) means that a good sequence of codes induces a joint distribution (see (11)) whose marginal is asymptotically indistinguishable from the target measure on the right-hand side of (12). The following are several things to note:

Both the induced and the target measures share the same (uniform) marginal distribution of the message pair, which means that the total variation in (12) is an averaging of the total variation distances between the corresponding conditional distributions given each message pair $(m_1,m_2) \in \mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$.

Under the target measure, $(\hat{M}_1,\hat{M}_2) = (M_1,M_2)$ almost surely. This corresponds to reliable decoding by both receivers.

Under the target measure, the conditional distribution of $\mathbf{Z}$ given $(M_1,M_2)$ is the product measure $Q_Z^{\times n}$. Thus, as $n$ grows larger, the induced conditional distribution of $\mathbf{Z}$ given the messages approaches the product distribution $Q_Z^{\times n}$. This asymptotic independence corresponds to securing the transmitted messages from the eavesdropper. Furthermore, since the marginal distribution of $\mathbf{Z}$ approximates a product measure (namely, $Q_Z^{\times n}$), (12) not only implies security, but also stealth [12]: on top of securing the transmitted data from the eavesdropper, a good sequence of codes makes it impossible for the eavesdropper to determine whether communication occurred at all (or whether the channel was fed with random noise, i.i.d. according to some $p_X$ with $\sum_{x\in\mathcal{X}} p_X(x)\, q_{Z|X}(z|x) = Q_Z(z)$, for all $z \in \mathcal{Z}$).
While similar definitions of achievability were used in [10, 8, 11], a more standard definition was employed in [12]. In that work, reliability was defined through a vanishing probability of error requirement, while security was measured by the effective secrecy metric. Adapting the definitions from [12] to the WTBC considered here gives rise to the following notion of achievability (to differentiate from Definition 3, we refer to this as EF-achievability, where 'EF' stands for 'effective secrecy').
Definition 5 (EF-Achievability)
A rate pair $(R_1, R_2)$ is called EF-achievable if there exist a probability distribution $Q_Z \in \mathcal{P}(\mathcal{Z})$ and a sequence of $(n, R_1, R_2)$ codes $\{c_n\}_{n\in\mathbb{N}}$ such that
$\mathbb{P}\Big[ \big(\hat{M}_1,\hat{M}_2\big) \neq \big(M_1,M_2\big) \Big] \xrightarrow[n\to\infty]{} 0,$   (13a)
$D\Big( P^{(c_n)}_{M_1,M_2,\mathbf{Z}} \,\Big\|\, p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}\, Q_Z^{\times n} \Big) \xrightarrow[n\to\infty]{} 0.$   (13b)
Convergence to zero of the effective secrecy metric in (13b) guarantees both strong secrecy and stealth. Strong secrecy refers to a vanishing mutual information between the confidential messages and the eavesdropper's observation, i.e., $I(M_1,M_2;\mathbf{Z}) \to 0$ as $n\to\infty$. Stealth is quantified by means of a vanishing relative entropy between the induced distribution of $\mathbf{Z}$, namely $P^{(c_n)}_{\mathbf{Z}}$, and a product distribution $Q_Z^{\times n}$, which represents the distribution that the eavesdropper expects to observe when the source is not communicating useful information. Namely, we say that the protocol achieves stealth if $D\big(P^{(c_n)}_{\mathbf{Z}} \,\big\|\, Q_Z^{\times n}\big) \to 0$ as $n\to\infty$. Noting that the relative entropy from (13b) factors as
$D\Big( P^{(c_n)}_{M_1,M_2,\mathbf{Z}} \,\Big\|\, p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}\, Q_Z^{\times n} \Big) = I\big(M_1,M_2;\mathbf{Z}\big) + D\Big( P^{(c_n)}_{\mathbf{Z}} \,\Big\|\, Q_Z^{\times n} \Big),$   (14)
we see that (13b) implies both strong secrecy and stealth.
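For completeness, the factorization in (14) can be spelled out directly; this is our own elementary derivation, which only uses the fact that the message pair is exactly uniform under any code, so that $P^{(c_n)}_{M_1,M_2} = p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}$:
$$D\Big(P^{(c_n)}_{M_1,M_2,\mathbf{Z}} \,\Big\|\, p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}\, Q_Z^{\times n}\Big) = \mathbb{E}_{P^{(c_n)}}\left[\log\frac{P^{(c_n)}_{\mathbf{Z}|M_1,M_2}\,P^{(c_n)}_{M_1,M_2}}{p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}\,Q_Z^{\times n}}\right] = \mathbb{E}_{P^{(c_n)}}\left[\log\frac{P^{(c_n)}_{\mathbf{Z}|M_1,M_2}}{P^{(c_n)}_{\mathbf{Z}}}\right] + \mathbb{E}_{P^{(c_n)}}\left[\log\frac{P^{(c_n)}_{\mathbf{Z}}}{Q_Z^{\times n}}\right] = I\big(M_1,M_2;\mathbf{Z}\big) + D\big(P^{(c_n)}_{\mathbf{Z}}\,\big\|\,Q_Z^{\times n}\big),$$
where the middle step cancels the uniform message marginal against $p^{(U)}_{\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}}$ and multiplies and divides by $P^{(c_n)}_{\mathbf{Z}}$; the first term is the strong secrecy metric and the second is the stealth metric.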
There is a close relation between the two notions of achievability from Definitions 3 and 5. This is formalized in the following proposition.
The proof of Proposition 1 is relegated to Appendix A. The proposition says that an achievable rate pair (with respect to Definition 3) is always EF-achievable (with respect to Definition 5). Conversely, given an EF-achievable rate pair $(R_1,R_2)$ with a sequence of codes that produces an exponential decay of the error probability from (13a) and the relative entropy from (13b), we have that $(R_1,R_2)$ is also achievable. Consequently, the two definitions are equivalent, provided that the aforementioned quantities converge to 0 exponentially fast with $n$.
We note that the exponential convergence in (15) and (16) is a rather standard feature. The direct proofs of our main secrecy-capacity theorems establish the exponential decay of the total variation from (15). Even more so, most i.i.d. random coding ensembles produce an exponential decay of the expected error probability and effective secrecy metric (over the ensemble). This implies the existence of a specific sequence of codes satisfying (16). More specifically, consider a random coding ensemble $\{\mathsf{C}_n\}_{n\in\mathbb{N}}$, and let $P_e(\mathsf{C}_n)$ and $E(\mathsf{C}_n)$ be the error probability and the effective secrecy metric, respectively, associated with a block-code of length $n$ (the exact communication scenario is of no consequence here). Assume that there exists $\gamma > 0$ such that for sufficiently large $n$ we have
$\mathbb{E}\big[ P_e(\mathsf{C}_n) \big] \le e^{-n\gamma},$   (17a)
$\mathbb{E}\big[ E(\mathsf{C}_n) \big] \le e^{-n\gamma}.$   (17b)
By Markov's inequality, one can extract a specific sequence of codes $\{c_n\}_{n\in\mathbb{N}}$ with $P_e(c_n)$ and $E(c_n)$ decaying exponentially fast in $n$. For every sufficiently large $n$, the union bound and Markov's inequality give
$\mathbb{P}\Big[ \big\{ P_e(\mathsf{C}_n) > e^{-\frac{n\gamma}{2}} \big\} \cup \big\{ E(\mathsf{C}_n) > e^{-\frac{n\gamma}{2}} \big\} \Big] \le \frac{\mathbb{E}\big[P_e(\mathsf{C}_n)\big] + \mathbb{E}\big[E(\mathsf{C}_n)\big]}{e^{-\frac{n\gamma}{2}}} \le 2 e^{-\frac{n\gamma}{2}} \xrightarrow[n\to\infty]{} 0.$   (18)
Consequently, for every large enough $n$ there exists an $n$-length block-code $c_n$ (an outcome of $\mathsf{C}_n$) with exponentially decaying $P_e(c_n)$ and $E(c_n)$.
III-C Gelfand-Pinsker Broadcast Channels
We derive single-letter characterizations of the secrecy-capacity region of some WTBCs based on the results for analogous GPBCs. The latter communication scenario considers the reliable transmission of a pair of messages over a state-dependent BC in which the transmitter has noncausal access to the i.i.d. sequence of channel states. The capacity regions of the SD-GPBC and the PD-GPBC with an informed receiver were found in [3, Theorem 1] and [4, Theorem 3], respectively. These results play a key role in the converse proofs of Theorems 1 and 4. We next formally define the GPBC setup.
Let $\mathcal{S}$, $\mathcal{X}$, $\mathcal{Y}_1$ and $\mathcal{Y}_2$ be finite sets, $W_S \in \mathcal{P}(\mathcal{S})$ and $q_{Y_1,Y_2|X,S}$ a transition probability distribution from $\mathcal{X}\times\mathcal{S}$ to $\mathcal{Y}_1\times\mathcal{Y}_2$. The DM-GPBC is shown in Fig. 4. A state sequence $\mathbf{s} \in \mathcal{S}^n$ is sampled from the $n$-fold product measure $W_S^{\times n}$ and can be represented as an outcome of the random sequence $\mathbf{S}$. The sender observes $\mathbf{s}$ in a noncausal manner and chooses a pair of messages $(m_1,m_2)$ uniformly at random from the product set $\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$. The choice of the message pair is independent of $\mathbf{S}$. The triple $(\mathbf{s},m_1,m_2)$ is mapped by the transmitter onto a sequence of channel inputs $\mathbf{x} \in \mathcal{X}^n$ (while not being necessary to achieve capacity, we allow the mapping to be random). The sequence $\mathbf{x}$ is transmitted over a DM state-dependent BC with transition probability $q_{Y_1,Y_2|X,S}^{\times n}$. The output sequences $\mathbf{y}_1$ and $\mathbf{y}_2$ are observed by Receiver 1 and Receiver 2, respectively. Based on $\mathbf{y}_j$, $j=1,2$, Receiver $j$ produces an estimate $\hat{m}_j$ of $m_j$.
Definition 6 (Classes of GPBCs)
Consider a GPBC $\big(W_S, q_{Y_1,Y_2|X,S}\big)$:

The channel is called SD if its channel transition distribution factors as , where and .

The channel is called PD if its channel transition distribution factors as , where and .

The channel is said to have an informed receiver if the state sequence is available to Receiver 1. Formally, replacing the output random variable $Y_1$ with $(Y_1,S)$ reduces the GPBC to one with an informed receiver. With a slight abuse of notation, we refer to this setup as the PD-GPBC with an informed receiver.
As in Section III-A, we define codes, achievability and capacity for arbitrary (not necessarily SD) DM-GPBCs, and will specialize to the above instances when necessary.
Definition 7 (GPBC Code)
An $(n, R_1, R_2)$ code $c_n$ for the GPBC with a product message set $\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$, where $\mathcal{M}_j^{(n)} := \big[1 : 2^{nR_j}\big]$ for $j=1,2$, is a triple of functions $\big(f_n, \phi_1^{(n)}, \phi_2^{(n)}\big)$ such that

$f_n : \mathcal{S}^n\times\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)} \to \mathcal{P}(\mathcal{X}^n)$ is a stochastic encoder;

$\phi_j^{(n)} : \mathcal{Y}_j^n \to \mathcal{M}_j^{(n)}$ is the decoding function for Receiver $j$, for $j=1,2$.
An $(n, R_1, R_2)$ code $c_n$ induces a joint distribution over $\mathcal{S}^n\times\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}\times\mathcal{X}^n\times\mathcal{Y}_1^n\times\mathcal{Y}_2^n\times\mathcal{M}_1^{(n)}\times\mathcal{M}_2^{(n)}$ that is given by
$P^{(c_n)}(\mathbf{s},m_1,m_2,\mathbf{x},\mathbf{y}_1,\mathbf{y}_2,\hat{m}_1,\hat{m}_2) = W_S^{\times n}(\mathbf{s})\, \frac{1}{\big|\mathcal{M}_1^{(n)}\big|\big|\mathcal{M}_2^{(n)}\big|}\, f_n(\mathbf{x}|\mathbf{s},m_1,m_2)\, q_{Y_1,Y_2|X,S}^{\times n}(\mathbf{y}_1,\mathbf{y}_2|\mathbf{x},\mathbf{s})\, \mathbb{1}_{\{\hat{m}_1=\phi_1^{(n)}(\mathbf{y}_1)\}}\, \mathbb{1}_{\{\hat{m}_2=\phi_2^{(n)}(\mathbf{y}_2)\}}.$   (19)
To enable the use of previous capacity region characterizations, we adhere to the (standard) definition of achievability from [4, 3, 5]. Thus, we define the error probability associated with an $(n, R_1, R_2)$ code $c_n$ as the probability of the event $\big\{(\hat{M}_1,\hat{M}_2) \neq (M_1,M_2)\big\}$ under the measure induced by the PMF from (19). Namely, denoting the probability of error by $P_e(c_n)$, we have
$P_e(c_n) := \mathbb{P}\Big[ \big(\hat{M}_1,\hat{M}_2\big) \neq \big(M_1,M_2\big) \Big].$   (20)
Definition 8 (GPBC Achievability)
A rate pair $(R_1, R_2)$ is called achievable if there exists a sequence of $(n, R_1, R_2)$ codes $\{c_n\}_{n\in\mathbb{N}}$ such that $P_e(c_n) \to 0$ as $n\to\infty$.
Definition 9 (GPBC Capacity Region)
The capacity region of a DM-GPBC with state distribution $W_S$ and transition probability $q_{Y_1,Y_2|X,S}$ is the convex closure of the set of all achievable rate pairs.
IV Wiretap and Gelfand-Pinsker Analogy
The ma