Covert communication, also known as low-probability-of-detection (LPD) communication, has emerged as a new security paradigm in wireless systems. Different from conventional physical-layer security methods, covert communication aims to hide the very existence of legitimate transmission from an adversary while maintaining a certain covert rate at the intended user. It therefore achieves a stronger level of security and privacy, which is highly desired in emerging 5G/IoT systems and advanced military networks [3, 4].
Consider a classic covert communication setup, where Alice wishes to send a message to Bob over a wireless channel while ensuring that the probability of the transmission being detected by a warden, Willie, is small (i.e., a covertness constraint at Willie). For an additive white Gaussian noise (AWGN) channel, Bash et al. established that Alice can send only $\mathcal{O}(\sqrt{n})$ covert bits to Bob over $n$ channel uses. This square-root law was later shown to also hold for a binary symmetric channel and a broader class of discrete memoryless channels. However, this square-root result can be further improved, for instance, by the use of an additional jammer to facilitate Alice’s transmission. Note that such scaling-law results are obtained for sufficiently large $n$.
Yan et al. instead considered a delay-intolerant setup and studied the impact of a finite number of channel uses $n$ on the covert communication performance. The optimality of Gaussian signalling was then analyzed in delay-intolerant covert communications. Various other practical constraints, such as channel uncertainty [10, 11] and noise uncertainty, have also been modeled and investigated in covert communications. In addition, more complicated scenarios that involve artificial noise, multi-antenna nodes [14, 15, 16, 17, 18], full-duplex nodes and relay-assisted transmission [20, 21, 22] have also been considered.
The above works focused on the design of covert transmission over conventional low-frequency bands. Compared to these frequency bands, the millimeter-wave (mmWave) band has much more under-utilized spectrum and has been put forward as an important means to expand the capacity of mobile communications [23, 24]. Due to its unfavorable propagation characteristics (such as high path loss and limited scattering), mmWave communication relies heavily on beamforming transmission to ensure reliable links. The directional nature of the communication link makes it inherently suitable for covert transmission, because it is much more challenging for an adversary to overhear all of the communication.
While the potential of mmWave covert communication is conceivable, fundamental understanding and design guidelines for such systems are still lacking. Related studies are scant except [25, 26], to the best of our knowledge. Reference  introduced a conceptual framework of mmWave soldier-to-soldier covert communications and discussed a few challenges at the physical layer and medium access control layer. However, it neither considered the beamforming design that is crucial for the system nor provided a rigorous quantification of the covertness level that the framework can fundamentally achieve. Reference  considered a covert mmWave communication system where Alice deploys dual independent antenna arrays, one forming a beam towards Bob for covert data transmission and the other forming a beam towards Willie for jamming transmission. The outage probability and optimal covert rate of the Alice-Bob link were characterized. However, this work focused only on the data transmission phase and did not address how the Alice-Bob directional link is established in the first place, or whether the establishment of such a link would require additional communication that might be detected by Willie.
Motivated by these observations, in this work we consider a joint design of link establishment (beam alignment) and data transmission for covert mmWave communication. Specifically, we assume that both Alice and Bob are equipped with antenna arrays, while Willie is equipped with an omni-directional antenna to monitor all possible directions. Within a channel coherence time, Alice and Bob first take a commonly used beam training approach [27, 28] to determine the best transmit-receive beam pair that aligns well with the channel, and then use the selected beam pair for subsequent data transmission. For a fixed coherence time, there is a tradeoff between the beam training duration and the effective throughput of the Alice-Bob link, since increasing the training duration improves the beam alignment performance but reduces the time left for data transmission. In addition, larger training power and data transmission power can also improve the throughput; however, they increase Willie’s chance of detecting the presence of the Alice-Bob communication. Hence, a fundamental question is: how should the beam training duration, training power and data transmission power be jointly optimized to maximize the throughput of the Alice-Bob link while ensuring that the covertness constraint imposed on Willie is met?
In this work, we address this question by assuming that Alice-Bob link is a Line-of-Sight (LOS) single-path channel for analytical tractability. With generalized flat-top beam codebooks and for exhaustive-search beam training, we derive a lower bound on the successful alignment probability as a function of beam training duration and training power. Based on this, we then develop a lower bound on the effective throughput of Alice-Bob link and study the training-throughput tradeoff optimization, subject to a covertness constraint at Willie. The resultant problem is highly nonconvex. To efficiently solve this problem, we exploit its structural properties and propose a Dual-decomposition Successive Convex Approximation (DSCA) algorithm. Numerical results demonstrate an interesting tradeoff among the key design parameters considered and also the necessity of joint beam training and data transmission design for covert mmWave communication. The resultant optimal effective throughput for Alice-Bob link crucially depends on the covertness level targeted by the system.
The remainder of the paper is organized as follows. Section II describes the mmWave covert communication model considered. Section III first characterizes the beam alignment and throughput performance of the Alice-Bob link and the detection performance at Willie, and then formulates the optimized covert communication design. Section IV develops the DSCA algorithm to solve the resulting problem. Numerical results are provided in Section V, while conclusions are drawn in Section VI.
II System Model
II-A General Description of the Communication Setup
We consider a mmWave covert communication scenario, where transmitter Alice wishes to communicate to receiver Bob, subject to the surveillance of warden Willie who attempts to detect the existence of this communication. It is assumed that Alice and Bob are equipped with Uniform Linear Arrays (ULAs) of and antennas, respectively, so that directional transmission is possible between the two parties. In addition, both Alice and Bob deploy a single RF chain to reduce hardware complexity, as is commonly considered in existing works [29, 30, 27, 28]. As to warden Willie, he is always curious and greedy by nature and is thus assumed to deploy an omni-directional antenna to monitor signals from all possible directions.
For the scenario described, we further assume a frame-slotted communication between Alice and Bob. As illustrated in Fig. 1, each frame has symbols in total (e.g., on the order of channel coherence time) and is further divided into a beam alignment (BA) phase that consists of symbols and a data transmission (DT) phase of symbols. In the BA phase, Alice and Bob jointly train transmit/receive beam pairs from pre-designed codebooks so as to determine the best beam pair that is then used for the subsequent DT phase. Both BA and DT phases should be carefully designed so that the directional link between Alice and Bob is sufficiently good and the probability of detection of communication at Willie is kept at the covertness level required.
In what follows, we first elaborate on the signalling model for communication between Alice and Bob and then define the binary hypothesis detection problem at Willie.
II-B Signalling Model for Beam Alignment and Data Transmission Between Alice and Bob
We assume that Alice and Bob adopt an exhaustive-search (ES) strategy for beam training . Specifically, let be a set of unit-norm beams that jointly cover the Angle of Departure (AoD) interval at Alice, while be a set of unit-norm beams that jointly cover the Angle of Arrival (AoA) interval at Bob. The entire training codebook is then formed by considering all possible Alice/Bob beam pairs, i.e., .
For each , Alice sends a pilot sequence via beam , while Bob performs an output measurement via beam . The output signal at Bob is given by
where is the transmit power for beam training at Alice, denotes the pilot sequence with , denotes the channel between Alice and Bob, is the equivalent channel noise vector after received beamforming, whose elements are i.i.d. as . Assuming that the beam pairs in ES are allocated with equal training budget, we thus have .
In particular, we consider a single-path line-of-sight (LOS) channel between Alice and Bob, and thus channel can be specialized to
where is the channel coefficient, while and are the steering vectors corresponding to AoA and AoD that are defined as
respectively, with being the wave-length and being the antenna spacing. Under this model and when beam pair is trained, the effective channel in (1) is specialized to
with beamforming gain
where is the transmit beamforming gain along AoD at Alice side, while is the receive beamforming gain along AoA at Bob side.
We also consider that each of the beams trained has uniform gain within its intended coverage interval (i.e., its mainlobe) and constant small leakage outside the mainlobe (as illustrated in Fig. 2) as in . This slightly generalizes the commonly used flat-top beam model and is useful for capturing the side-lobe leakage of non-ideal beams in practice. Moreover, assuming that all Alice (Bob) beams have equal-size non-overlapping mainlobes that jointly cover the AoD range (resp. AoA ), the Alice (Bob) beamforming gain can be represented as
respectively, where and denote the mainlobe interval (in the sin domain) of beam and , respectively, and , .
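The beam model above can be illustrated with a short sketch. It is not the paper's exact formulation: it assumes half-wavelength antenna spacing, a steering vector normalized to unit norm, and mainlobes that split the sin-domain range [-1, 1) into equal intervals. The names `steering_vector`, `flat_top_gain`, `main_gain` and `side_gain` are hypothetical and introduced only for illustration.

```python
import cmath
import math


def steering_vector(n_ant, sin_angle, spacing_over_wavelength=0.5):
    """Unit-norm ULA steering vector for a direction given in the sin domain.

    Assumption: half-wavelength spacing by default and 1/sqrt(n_ant) scaling.
    """
    phase_step = 2.0 * math.pi * spacing_over_wavelength * sin_angle
    scale = 1.0 / math.sqrt(n_ant)
    return [scale * cmath.exp(1j * phase_step * k) for k in range(n_ant)]


def flat_top_gain(beam_index, n_beams, sin_angle, main_gain, side_gain):
    """Generalized flat-top gain: main_gain inside the mainlobe, side_gain outside.

    Assumption: beam `beam_index` covers the sin-domain interval
    [-1 + beam_index * 2 / n_beams, -1 + (beam_index + 1) * 2 / n_beams).
    """
    width = 2.0 / n_beams
    lower = -1.0 + beam_index * width
    upper = lower + width
    return main_gain if lower <= sin_angle < upper else side_gain


# Example: four beams, a path at sin(angle) = 0.3 falls in the mainlobe of beam 2.
gains = [flat_top_gain(b, 4, 0.3, main_gain=8.0, side_gain=0.1) for b in range(4)]
```

With non-overlapping mainlobes, exactly one beam per side sees the mainlobe gain for a given path direction, which is why the beam-pair gain can take only a few distinct values.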
With the above assumptions, the output signal of (1) at Bob is then specialized to
where is drawn from the set , which depends on how well the th beam pair aligns with the underlying channel.
In addition, Alice and Bob are assumed to share the pilot sequences used for beam training beforehand. Given output measurement and the known pilot sequence , Bob can then further form matched-filter outputs as
The beam pair leading to the strongest matched-filter output is then chosen as the one used for subsequent data transmission, whose index is given by
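The exhaustive-search procedure described above can be sketched as a small Monte Carlo simulation. This is an illustrative sketch only, under assumptions that are not taken from the paper: an all-ones (unit-modulus) pilot, a unit channel coefficient, circularly symmetric complex Gaussian noise, and a list `gains` holding the hypothetical combined beamforming gain of each beam pair.

```python
import math
import random


def complex_noise(rng, variance):
    """One sample of circularly symmetric complex Gaussian noise CN(0, variance)."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def exhaustive_search(gains, pilot_len, train_power, noise_var, rng, h=1.0 + 0.0j):
    """Train every beam pair once; return the index with the strongest matched-filter output."""
    pilot = [1.0 + 0.0j] * pilot_len  # assumed unit-modulus pilot, known to Bob
    best_index, best_energy = -1, -1.0
    for index, gain in enumerate(gains):
        amplitude = math.sqrt(train_power * gain) * h
        received = [amplitude * s + complex_noise(rng, noise_var) for s in pilot]
        # Matched filtering: correlate the received samples with the known pilot.
        matched = sum(s.conjugate() * y for s, y in zip(pilot, received))
        energy = abs(matched) ** 2
        if energy > best_energy:
            best_index, best_energy = index, energy
    return best_index


def alignment_probability(gains, pilot_len, train_power, noise_var, trials=2000, seed=0):
    """Empirical probability that the search selects the beam pair with the largest gain."""
    rng = random.Random(seed)
    target = max(range(len(gains)), key=lambda i: gains[i])
    hits = sum(
        exhaustive_search(gains, pilot_len, train_power, noise_var, rng) == target
        for _ in range(trials)
    )
    return hits / trials
```

Running `alignment_probability` for several values of `pilot_len` and `train_power` gives a feel for how training duration and training power affect the alignment performance.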
In this way, during the DT phase, the Alice-to-Bob channel input-output relationship is represented by
where is the input data vector with i.i.d. elements , is the transmit power for data communication, with as in (12), noise vector with i.i.d. elements and is the output signal at Bob.
It is clear that the beamforming gain in (13) takes only one of , which depends on the beam alignment performance via ES beam training. Motivated by this, we introduce a notion of average effective throughput
to measure the average performance of the Alice-Bob data link, which explicitly takes into account the impact of beam alignment overhead and accuracy on the subsequent data communication. Unless otherwise specified, the logarithm is taken to base 2.
II-C Binary Detection Problem at Willie
In order to determine the presence of Alice-to-Bob covert communication, Willie needs to distinguish the following two hypotheses
where all . Note that $\mathcal{H}_0$ denotes the null hypothesis where Alice has not communicated with Bob and thus only the channel noise vector (with i.i.d. elements ) is observed, while $\mathcal{H}_1$ denotes the alternative hypothesis where Alice has communicated with Bob and thus some information leakage superimposed on channel noise is observed.
Under $\mathcal{H}_1$, to be more specific, considering that the Alice-to-Willie link is also LOS, the resultant channel is represented as
where and are the associated AoD and channel coefficient, respectively. The signal vector i.e., at Willie is thus formed by two parts: The first symbols (i.e., ) are the signals at Willie when Alice and Bob perform beam training
while the rest are the signals at Willie when Alice and Bob perform data transmission:
Given , Willie makes a binary decision ( or ) that infers whether Alice’s transmission is present or not. We assume equal a priori probabilities of $\mathcal{H}_0$ and $\mathcal{H}_1$. To measure the detection performance of Willie, we adopt the total detection error probability , which is defined as
where denotes the false alarm rate and denotes the missed detection rate. Let be the minimum error probability that Willie can achieve by using an optimal detector.
Our ultimate goal is thus to develop appropriate Alice-to-Bob beam training and data transmission design so as to maximize for Alice-Bob link, while enforcing that at Willie for a covertness level required. Towards this end, in what follows, we shall first characterize Alice-Bob and Willie’s detection performance as a function of key system parameters (including training duration , transmit power and for BA and DT), and then propose a joint optimized design of BA and DT for the covert communication studied.
III Joint Optimization of Beam Alignment and Data Transmission for Covert MmWave Communication
III-A Characterization of the Average Effective Throughput for Alice-Bob Link
For the Alice-Bob link, recall from (14) that the average effective throughput crucially depends on the statistical property of beamforming gain after beam training. In particular, when there is perfect beam alignment, while takes a much smaller gain from the set if there is one-sided or two-sided misalignment.
To quantify , we introduce the following probability of successful alignment through ES beam training
where is the index of the optimal beam pair that leads to the largest beamforming gain. Considering the facts that beamforming gain is much larger under perfect alignment and that we ought to achieve high , we can approximate as
by dropping the marginal throughput contribution in the case of misalignment for the sake of tractability.
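The structure of this approximation can be illustrated numerically. The sketch below is a hedged reading rather than the paper's exact expression: it assumes that the pre-log factor equals the fraction of the frame left after training and that only the perfectly aligned case contributes to the rate. The names `n_total`, `n_train`, `p_align` and `snr_aligned` are hypothetical and stand for the frame length, the total number of training symbols, the successful alignment probability and the receive SNR under perfect alignment.

```python
import math


def approx_throughput(n_total, n_train, p_align, snr_aligned):
    """Approximate effective throughput (bits per symbol) under the stated assumptions."""
    if not 0 <= n_train <= n_total:
        raise ValueError("n_train must lie between 0 and n_total")
    data_fraction = 1.0 - n_train / n_total  # share of the frame left for data
    # Only the perfectly aligned case is counted; misalignment is dropped.
    return data_fraction * p_align * math.log2(1.0 + snr_aligned)


# Example: a longer training phase raises p_align but shrinks data_fraction.
short_training = approx_throughput(n_total=100, n_train=10, p_align=0.6, snr_aligned=3.0)
long_training = approx_throughput(n_total=100, n_train=40, p_align=0.95, snr_aligned=3.0)
```

In this toy setting the two effects pull in opposite directions, which is the training-throughput tradeoff that the subsequent optimization addresses.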
We now further analyze . Without loss of optimality and for notational convenience, is assumed. Based on the ES beam training (12), can be represented as
where we have defined normalized statistics
Let and noncentral parameter . To derive useful properties of , we introduce the following lemma.
The normalized statistics defined all follow noncentral chi-squared distributions with DoFs and with noncentrality parameters drawn from the set defined as:
Specifically, , while among , variables follow , variables follow and variables follow .
which implies that
by the definition of noncentral chi-squared distribution, where i) when Alice and Bob’s beams are perfectly aligned, ; ii) when one-sided misalignment occurs at Alice or Bob side, or , respectively; iii) when misalignment occurs at both Alice and Bob, . Moreover, these variables are independent, since they are constructed from training measurements at different time.
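The distributional claim can be checked empirically with a small simulation. The sketch assumes a particular normalization that may differ from the paper's: for a complex Gaussian matched-filter output z with mean `mean_amp` and variance `noise_var`, the statistic 2|z|^2 / `noise_var` is noncentral chi-squared with 2 degrees of freedom and noncentrality parameter 2 * `mean_amp`^2 / `noise_var`, so its mean equals 2 plus that parameter. All function names are hypothetical.

```python
import math
import random


def normalized_statistic(mean_amp, noise_var, rng):
    """Return 2|z|^2 / noise_var for z ~ CN(mean_amp, noise_var)."""
    std = math.sqrt(noise_var / 2.0)  # per real dimension
    z = complex(mean_amp + rng.gauss(0.0, std), rng.gauss(0.0, std))
    return 2.0 * abs(z) ** 2 / noise_var


def noncentrality(mean_amp, noise_var):
    """Noncentrality parameter of the normalized statistic under the assumed scaling."""
    return 2.0 * mean_amp ** 2 / noise_var


def empirical_mean(mean_amp, noise_var, samples=20000, seed=1):
    """Sample mean of the normalized statistic over independent draws."""
    rng = random.Random(seed)
    total = sum(normalized_statistic(mean_amp, noise_var, rng) for _ in range(samples))
    return total / samples
```

A larger beamforming gain raises `mean_amp` and hence the noncentrality parameter, which separates the aligned statistic from the misaligned ones.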
For ease of exposition and without loss of generality, we further assume that variables , and . We now establish a lower bound on as stated in the following proposition.
Based on this proposition and from (21), the approximated is lower bounded by , given by:
III-B Detection Performance at Willie
As for Willie, let and be the probability distributions of its observations under $\mathcal{H}_0$ and $\mathcal{H}_1$ as in (15), respectively. In particular, under $\mathcal{H}_0$, the distribution is given by:
since only noise is observed at Willie. Under $\mathcal{H}_1$, the distribution can be represented by:
corresponds to the joint distribution of received signals at Willie when Alice and Bob are in the BA phase, while corresponds to the joint distribution of received signals at Willie when Alice and Bob are in the DT phase. Considering that Willie has no knowledge of the pilot sequence used for beam training between Alice and Bob, based on (15) and (17), is approximately characterized by
To characterize , we note from (18) that Willie’s received signals would depend on whether it is in the main-lobe or in the side-lobe of Alice’s chosen beam for data transmission. To account for this factor, let be the probability that Willie is in the main-lobe of Alice’s data beam. Then can be characterized by a mixture of and as
where we have
as the joint probability distribution of Willie’s received signals when it is in the main-lobe and in the side-lobe of Alice’s data beam, respectively.
Based on the and computed, Willie performs a binary hypothesis test. It is known that the error probability that Willie can achieve is lower bounded by:
where is the total variation distance between and . However, a closed-form expression of this total variation distance is hard to obtain for the given and in our case. Alternatively, as in many existing works [2, 6, 8], we consider the following upper bound on the total variation distance obtained from Pinsker’s inequality:
where is the relative entropy of to defined by:
As a result, a sufficient condition to ensure the covertness constraint at Willie is that
We further note that in our case contains a mixture of multivariate Gaussian component as in (39), which renders closed-form expression for still difficult. To make the problem more tractable, we further approximate by a joint distribution as given by
whose underlying variables are i.i.d. Gaussian distributed with zero mean and variance with . Namely, we approximate each underlying Gaussian mixture variable in of (39) with a single Gaussian variable of the same mean and variance. Letting , we thus approximate by , which can be derived in closed form as:
with , and
. We note that this approximation is accurate in particular when the signal-to-noise-ratio (SNR) at Willie is relatively small (a typical case in covert communications).
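The moment-matching step and the resulting covertness check can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact expressions: Willie's samples are modeled as i.i.d. zero-mean complex Gaussians, the divergence is taken from the alternative distribution to the null distribution and measured in nats, and covertness is assumed to require a total error probability of at least 1 - `epsilon`, for which Pinsker's inequality gives the sufficient condition that the divergence is at most 2 * `epsilon`^2. All names are hypothetical.

```python
import math


def mixture_variance(p_main, var_main, var_side):
    """Variance of a zero-mean two-component mixture, used for moment matching."""
    return p_main * var_main + (1.0 - p_main) * var_side


def kl_iid_complex_gaussian(n, var_alt, var_null):
    """D(P_alt || P_null) in nats for n i.i.d. samples, CN(0, var_alt) vs CN(0, var_null)."""
    ratio = var_alt / var_null
    return n * (ratio - 1.0 - math.log(ratio))


def satisfies_covertness(n_train, var_train, n_data, var_data, noise_var, epsilon):
    """Sufficient covertness check via Pinsker's inequality (assumed form: D <= 2 * epsilon^2)."""
    # Training and data samples are independent, so their divergences add.
    total_kl = kl_iid_complex_gaussian(n_train, var_train, noise_var)
    total_kl += kl_iid_complex_gaussian(n_data, var_data, noise_var)
    return total_kl <= 2.0 * epsilon ** 2


# Example: Willie sits in the mainlobe with probability 0.25 during data transmission.
var_data = mixture_variance(p_main=0.25, var_main=1.008, var_side=1.0)
covert = satisfies_covertness(100, 1.001, 900, var_data, noise_var=1.0, epsilon=0.1)
```

For small leakage the per-sample divergence scales with the square of the received SNR at Willie, so longer frames or higher transmit powers quickly exhaust the covertness budget.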
III-C Problem Formulation
which aims to optimize the number of symbols allocated to each beam pair trained ( thus reflects the total training overhead), training power and data transmission power in order to maximize , subject to the covertness constraint at Willie.
It is noted that while optimizing the lower bound might not give exactly the same results as optimizing the true effective throughput, this problem still provides valuable insights into the tradeoff between beam training overhead and achievable rate for the Alice-Bob link, and also the tradeoff between the rate performance of the Alice-Bob link and the achievable covertness level against Willie. Specifically, spending more symbols on beam training would improve the beam alignment performance between Alice and Bob, but at the expense of reducing the time left for data transmission. In addition, having larger training power and data transmission power would improve the effective throughput of the Alice-Bob link, but at the risk of violating the covertness constraint imposed on Willie.
Solving problem (46) is quite challenging because the optimization variables are coupled in the nonconvex objective function and constraints. Moreover, the training overhead for each beam pair is a discrete variable, which further complicates the solution of problem (46). Hence, we are faced with a mixed-integer nonlinear programming problem, which is generally NP-hard. In principle, one can attempt an exhaustive search over the variable space to find the optimal solution, but this would require traversing all possible values and proper discretization of , which leads to prohibitively high computational complexity. In the next section, we shall propose a more efficient algorithm to solve this problem.
IV Dual-Decomposition Successive Convex Approximation Algorithm
In this section, we develop an efficient double-loop iterative algorithm named DSCA, which integrates dual decomposition with the successive convex approximation (SCA) method to find a stationary solution of problem (46). Specifically, we first recast problem (46) into a more tractable yet equivalent form by exploiting its structural properties. We then elaborate on the design of the proposed algorithm and prove its convergence to a local stationary point.
IV-A Problem Reformulation
Before proceeding to the derivation of the proposed algorithm, a suitable transformation for problem (46) is necessary, and we provide the following corollary:
Since the function is monotonically nondecreasing and also analytic in the real region, Corollary 1 can be easily proved. It is noteworthy that the above equivalent transformation would facilitate the separation of optimization variables, thereby simplifying the subsequent development of the proposed algorithm.
To make the problem tractable, we subsequently relax the discrete integer constraint (46c) into a closed connected subset of the real axis, i.e., . We remark that the limit point generated by the proposed DSCA algorithm may not satisfy the integer constraint in (46c). To obtain an integer solution for the optimal training overhead required by each beam pair, we use the same method as in  to round to its nearby integer as follows
where is chosen such that constraint (46b) is met. Since is monotonic in