I Introduction
A number of wireless applications involve echo-assisted communication, wherein messages transmitted by the source arrive at the destination as multiple noisy copies. Typical examples include communication over frequency-selective channels [1], relay networks [2], and multiple receive antennas [3]. In such echo-assisted scenarios, it is well known that suitably combining the noisy received copies at the destination increases the effective signal-to-noise ratio, thereby facilitating higher transmission rates.
In this work, we consider attack models on echo-assisted communication wherein a subset of the copies collected at the destination might have been manipulated by an adversary. Such scenarios arise because practical limitations prevent the adversary from manipulating all the copies. For instance, in the case of frequency-selective channels with delay spreads, processing-delay constraints may prevent the adversary from manipulating the symbols of the first copy, but not those of the subsequent ones [1]. We study a specific adversarial attack, referred to as the flipping attack [4], wherein the message bits of the attacked copy are independently flipped with probability 0.5. With such attacks, the dilemma at the destination is whether to combine the multiple copies or to discard the manipulated ones when recovering the messages. To gain insights into the attack model, we focus on the case of two received copies, of which the second copy might have been manipulated by an adversary. Although adversarial models on binary channels have been studied in the literature [4, 5], flipping attacks on echo-assisted communication with binary input and continuous output have not been studied hitherto. Henceforth, throughout the paper, we refer to the source and the destination as Alice and Bob, respectively.
I-A Motivation and Contributions
Consider an echo-assisted communication setting, wherein information bits from Alice to Bob are transmitted as a sequence of fixed-length binary codewords. Specifically, each codeword is received at Bob as two noisy copies, each of which is a scaled version of the codeword corrupted by additive noise, where the two scaling constants are perfectly known to Bob and the two additive noise components are statistically independent. The adversarial model in our setting is that the second copy is vulnerable to the flipping attack but the first one is not. As a result, Bob needs to detect whether the second copy is attacked, and then decide whether to combine the two copies. In this work, we consider a non-persistent attack model, wherein the second copy is subjected to the flipping attack on 50% of the codewords, chosen at random in an i.i.d. fashion. A conservative strategy to handle this adversarial setting is as follows:

Bob discards the second copy of every codeword irrespective of the attack, and uses only the first copy to recover the information bits.

Alice uses a codebook of the same length, designed for Gaussian channels, that achieves the rate supported by the first copy alone.
In view of the above conservative baseline, we are interested in designing a detection strategy at Bob that can assist Alice in transmitting at a higher rate than that of the conservative strategy. Towards that end, Bob needs to detect the attack within a prefix of every codeword (whose length is referred to as the frame length), and then decide whether the second copy can be used to recover the information bits. Subsequently, the decision on the combining strategy has to be fed back to Alice so that any rate modifications can be incorporated into the subsequent coded symbols. Given that a practical detection strategy is typically imperfect, we are interested in quantifying the achievable rates of a detection strategy by incorporating its associated miss-detection and false-positive rates. However, since the attack model is not memoryless and the input alphabet is finite in size [6], we show that computing the achievable rates for arbitrary miss-detection rates is challenging for large blocklengths. To circumvent this issue, we provide a new framework, as depicted in Fig. 1, to approximate the achievable rates of detectors under special conditions on the miss-detection and false-positive rates (see Theorem 1). We also show that the achievable rates offered by our framework are lower bounded by the rate of the conservative strategy, thereby giving rise to interesting questions on code-design criteria at Alice. We propose a code-design criterion to assist Alice in achieving the rates promised by the detector, and show that the criterion is closely coupled with the frame length, i.e., the number of samples after which Bob has to feed back his decision on attack detection. Finally, we showcase the results of attack-detection strategies motivated by both traditional and neural-network ideas, and study their applicability to detecting attacks on codewords of short blocklengths.
Notations: For an dimensional random vector with joint probability distribution function , its differential entropy, denoted by , is represented as , where the expectation is over . A Gaussian random variable with zero mean and variance is denoted by . An identity matrix, an length vector of zeros, and an length vector of ones are denoted by , , and , respectively. For a given length vector, denoted by , the notation for , denotes the length vector containing the first components of . The notation denotes the usual probability operator.
II System Model
Alice transmits a binary codeword whose components are i.i.d. according to the Probability Mass Function (PMF) for some . Meanwhile, Bob receives two copies of the codeword over Additive White Gaussian Noise (AWGN) channels as
(1) 
where and are constants known to Bob, and represent the additive Gaussian noise distributed as . We assume that and are statistically independent. Between the two copies, we assume that the second copy is vulnerable to the flipping attack, whereas the first copy is not. To model the flipping attack on the second copy, we introduce the Hadamard (element-wise) product between the codeword and a flipping vector. With the attack, the components of the flipping vector are i.i.d. according to the PMF , and are unknown to Bob. Without the attack, the flipping vector is the all-ones vector. In this adversarial setting, the attacker executes the flipping attack on each codeword with probability , independently across codewords. By using and to denote the events of attack and no attack, respectively, we have .
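To make model (1) and the non-persistent flipping attack concrete, the following Python sketch draws one codeword together with its two received copies. Because the original symbols were lost, every name here (`x`, `f`, `alpha`, `beta`, `sigma1`, `sigma2`, `p`, `p_attack`, `simulate_codeword`) is hypothetical. It assumes symbols in {+1, -1} with P(+1) = p, and a flipping vector with i.i.d. uniform +1/-1 entries under attack.

```python
import numpy as np


def simulate_codeword(n, p, alpha, beta, sigma1, sigma2, p_attack, rng):
    """Draw one codeword and its two noisy copies under the flipping-attack model.

    Hypothetical notation (not the paper's own symbols):
      x  : codeword in {+1, -1}^n, i.i.d. with P(x_i = +1) = p
      f  : flipping vector; all ones without attack, i.i.d. uniform +-1 under attack
      r1 = alpha * x + n1           (first copy, never attacked)
      r2 = beta * (x * f) + n2      (second copy, Hadamard product with f)
    """
    x = np.where(rng.random(n) < p, 1.0, -1.0)
    attacked = bool(rng.random() < p_attack)  # one attack decision per codeword
    f = rng.choice([-1.0, 1.0], size=n) if attacked else np.ones(n)
    r1 = alpha * x + sigma1 * rng.standard_normal(n)
    r2 = beta * (x * f) + sigma2 * rng.standard_normal(n)
    return x, f, r1, r2, attacked


rng = np.random.default_rng(1)
x, f, r1, r2, attacked = simulate_codeword(8, 0.5, 1.0, 0.8, 0.5, 0.5, 0.5, rng)
print("attacked:", attacked, "flipped positions:", int(np.sum(f < 0)))
```

Drawing the attack decision once per codeword, rather than once per symbol, reflects the statement that entire codewords are attacked or left untouched.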
We compute the mutual information (MI) offered by the channel when the attack status is perfectly known to Bob, namely, under the attack and no-attack events. We refer to this case as the Genie detector. Under the attack, each bit of the codeword is flipped by the attacker with probability 1/2 in an i.i.d. fashion, and as a result, it is straightforward to prove that the second copy conveys no information about the codeword. As a countermeasure, the following proposition shows that discarding the second copy under the attack is the optimal strategy at Bob (we omit the proof due to lack of space).
Proposition 1
If the components of are i.i.d. according to the PMF , for , then we have and , where and are as given in (1).
(2) 
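The claim that a 50% flipping attack leaves the second copy with no information about the codeword can be checked at the level of a single symbol: under the attack, each symbol passes through a binary symmetric channel with crossover probability 1/2. The sketch below assumes symbols in {+1, -1} with P(+1) = p (hypothetical notation; `entropy_bits` and `mi_under_flipping` are illustrative helpers) and computes the discrete MI exactly.

```python
from math import log2


def entropy_bits(pmf):
    """Shannon entropy (in bits) of a finite PMF given as an iterable of probabilities."""
    return -sum(q * log2(q) for q in pmf if q > 0)


def mi_under_flipping(p, flip_prob):
    """I(x; y) in bits for one symbol, where y = x * f, P(x = +1) = p, and
    f = -1 with probability flip_prob independently of x (a binary symmetric channel)."""
    p_y_plus = p * (1 - flip_prob) + (1 - p) * flip_prob
    h_y = entropy_bits([p_y_plus, 1 - p_y_plus])
    h_y_given_x = entropy_bits([flip_prob, 1 - flip_prob])
    return h_y - h_y_given_x


for p in (0.2, 0.5, 0.9):
    # With a 50% flip rate the flipped symbol is uniform regardless of p.
    print(p, mi_under_flipping(p, 0.5))
```

Because the attacked second copy is a noisy function of the flipped symbols, which under the attack are i.i.d. uniform and independent of the codeword, the data-processing inequality carries this zero MI over to the second copy itself.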
Without the flipping attack, i.e., , the mutual information of the channel is given by
Thus, with perfect knowledge of the attack status at Bob, the MI offered by the channel is
where the last equality is obtained by combining the two copies when there is no attack as , such that the additive noise vector is distributed as , where . It is straightforward to verify that , which implies that the combining strategy is optimal without the attack. Furthermore, can be simplified as
(3) 
by using the memoryless nature of the channel, which results from the perfect knowledge of the attack status at Bob. Here, are the scalar channels such that the additive noise is distributed as . Since the input takes values from a finite alphabet, the MI in (3) can be numerically computed as a function of the input PMF, the two channel constants, and the noise variance [6]. Specifically, is given by
(4) 
where such that is as given in (2). The conditional entropy can be computed using the distribution given by
for . Similarly, we can also compute . In the next section, we study the MI of the combining strategy when the attack detector at Bob is not perfect.
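The numerical computation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: symbols in {+1, -1} with P(+1) = p, independent Gaussian noise on the two copies, maximal-ratio combining (MRC) as the combining rule (a standard technique substituted here, since the exact combining expression is not reproduced in this section), and a genie rate taken as the attack-probability-weighted average of the single-copy and combined per-symbol MI, which is our reading of the text above. The names `bi_awgn_mi` and `mrc_gain` are hypothetical.

```python
import numpy as np


def bi_awgn_mi(p, gain, sigma, num_points=20001):
    """I(x; y) in bits for y = gain * x + n, x in {+1, -1} with P(x = +1) = p,
    n ~ N(0, sigma^2). Uses I = h(y) - h(y | x), with h(y) from a Riemann sum."""
    half_width = abs(gain) + 12.0 * sigma
    y = np.linspace(-half_width, half_width, num_points)
    dy = y[1] - y[0]

    def gauss(mean):
        return np.exp(-((y - mean) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

    f_y = p * gauss(gain) + (1 - p) * gauss(-gain)  # two-component Gaussian mixture
    f_y = np.maximum(f_y, 1e-300)                    # guard against log(0) in the tails
    h_y = -np.sum(f_y * np.log2(f_y)) * dy           # differential entropy of y
    h_y_given_x = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)  # Gaussian noise entropy
    return h_y - h_y_given_x


def mrc_gain(alpha, beta, sigma1, sigma2):
    """Effective amplitude after MRC, normalized to unit-variance noise:
    the combined SNR is alpha^2 / sigma1^2 + beta^2 / sigma2^2."""
    return np.sqrt(alpha**2 / sigma1**2 + beta**2 / sigma2**2)


p, alpha, beta, sigma1, sigma2, p_attack = 0.5, 1.0, 0.8, 1.0, 1.0, 0.5
i_first = bi_awgn_mi(p, alpha / sigma1, 1.0)                        # first copy only
i_comb = bi_awgn_mi(p, mrc_gain(alpha, beta, sigma1, sigma2), 1.0)  # both copies combined
i_genie = p_attack * i_first + (1 - p_attack) * i_comb              # assumed genie average
print(f"first copy: {i_first:.3f}, combined: {i_comb:.3f}, genie: {i_genie:.3f} bits/symbol")
```

Normalizing each channel to unit-variance noise does not change the MI, which is why a single scalar gain suffices for both the single-copy and the combined channel.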
III Achievable Rates with Practical Detection Strategy
We consider a practical attack-detection strategy, which uses the received samples to detect the flipping attack on every codeword. Based on the detector's output, represented by the variable , Bob decides either to combine the two copies or to discard the second copy. Note that this detector is typically imperfect, and as a result, it has associated miss-detection and false-positive rates, defined as and , i.e., the probability of declaring no attack when the codeword is attacked and the probability of declaring an attack when it is not, respectively. When the detector declares an attack, Bob drops the samples of the second copy and uses only the samples of the first copy to recover the information bits. On the other hand, when the detector declares no attack, Bob combines the two copies to obtain and then uses it to recover the information bits.
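To illustrate how the miss-detection and false-positive rates of a detector can be estimated, the following Monte Carlo sketch uses a hypothetical correlation detector; it is not one of the detectors studied in this paper. It assumes equiprobable symbols in {+1, -1}, equal noise standard deviation `sigma` on both copies, and a decision based only on the first `frame_len` samples. Without attack E[r1_i r2_i] = alpha * beta, while under a 50% flipping attack it is 0, so the threshold is placed halfway between.

```python
import numpy as np


def correlation_detector(r1, r2, frame_len, alpha, beta):
    """Hypothetical detector: declare an attack when the empirical correlation of
    the first frame_len samples of the two copies falls below alpha * beta / 2."""
    stat = np.mean(r1[:frame_len] * r2[:frame_len])
    return bool(stat < alpha * beta / 2)  # True means "attack declared"


def estimate_md_fp(frame_len, alpha, beta, sigma, trials, seed=0):
    """Monte Carlo estimates of (miss-detection rate, false-positive rate)."""
    rng = np.random.default_rng(seed)
    misses = false_alarms = 0
    for _ in range(trials):
        for attacked in (True, False):
            x = rng.choice([-1.0, 1.0], size=frame_len)
            f = rng.choice([-1.0, 1.0], size=frame_len) if attacked else np.ones(frame_len)
            r1 = alpha * x + sigma * rng.standard_normal(frame_len)
            r2 = beta * x * f + sigma * rng.standard_normal(frame_len)
            declared = correlation_detector(r1, r2, frame_len, alpha, beta)
            misses += attacked and not declared          # attacked but not flagged
            false_alarms += (not attacked) and declared  # clean but flagged
    return misses / trials, false_alarms / trials


for frame_len in (4, 16, 64):
    p_md, p_fp = estimate_md_fp(frame_len, 1.0, 1.0, 0.5, trials=2000)
    print(frame_len, p_md, p_fp)
```

Longer frames generally reduce both error rates for this detector but delay the feedback to Alice, which is one way to see why the frame length matters in this setting.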
In the event of a miss-detection, i.e., when the codeword is attacked but the detector declares no attack, the flipping vector is random and unknown to Bob. Therefore, is denoted as , and is given by
(5) 
However, when the codeword is not attacked and the detector declares no attack, the flipping vector is the all-ones vector, and therefore, is denoted as , and is given by
(6) 
The MI of this detection strategy, denoted by , is
(7)  
where and .
(9)  
To compute , we have to compute for a given blocklength . However, this requires evaluating the differential entropy of the probability distribution function given in (9). Since the input alphabet is finite in size, this differential entropy can only be computed using numerical methods, and as a result, computing is intractable for sufficiently large blocklengths (of the order of hundreds). In a nutshell, this computational issue arises because the equivalent channel when the detector declares no attack is not memoryless. To circumvent this problem, we show that the MI of some detectors can be approximated under special conditions on the miss-detection and false-positive rates.
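The loss of memorylessness can be seen on the flipping vector alone. When Bob combines, he does not know whether a missed attack occurred, so the flipping vector he faces is a mixture of the all-ones vector and an i.i.d. uniform vector. The sketch below, under this simplified reading and with hypothetical helper names, computes the mixing weight via Bayes' rule and shows that the joint PMF of two positions does not factor into the product of its marginals. Over a codeword, the same mixture ranges over all 2^n flip patterns.

```python
from itertools import product


def posterior_attack_given_no_alarm(p_attack, p_md, p_fp):
    """P(attack | detector declares no attack), by Bayes' rule."""
    num = p_attack * p_md
    return num / (num + (1 - p_attack) * (1 - p_fp))


def effective_flip_pmf(n, q):
    """PMF of the flip vector g in {+1,-1}^n faced by Bob when he combines:
    with prob. 1 - q there was no attack (g is all ones); with prob. q a missed
    attack occurred and g has i.i.d. uniform entries."""
    pmf = {}
    for g in product((1, -1), repeat=n):
        pmf[g] = q / 2**n + (1 - q if all(v == 1 for v in g) else 0.0)
    return pmf


def marginal(pmf, i, value):
    """P(g_i = value) under a joint PMF over tuples."""
    return sum(prob for g, prob in pmf.items() if g[i] == value)


q = posterior_attack_given_no_alarm(p_attack=0.5, p_md=0.1, p_fp=0.1)
pmf = effective_flip_pmf(2, q)
joint = pmf[(1, 1)]
product_of_marginals = marginal(pmf, 0, 1) * marginal(pmf, 1, 1)
print(f"q = {q:.3f}, joint = {joint:.4f}, product of marginals = {product_of_marginals:.4f}")
```

Here q = 0.1, the joint probability is 0.925, and the product of the marginals is 0.9025, so the flips at different positions are dependent; the factorization holds only when q is 0 or 1.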
The following definitions and lemmas are useful for presenting our results on the approximation in Theorem 1.
Definition 1
For , let a set , for some negligible , be defined as
Definition 2
For a given attack detector, we define its performance profile as
where and .
Definition 3
For a given , let denote an dimensional discrete constellation in obtained by using over . On , we define



,
where and denotes the squared Euclidean distance.
Lemma 1
If are such that and is a negligible number, then we have
Proof:
The convex combination can be written as . This implies that , where when , and when . Since is negligible, for every .
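The idea behind the proof of Lemma 1, namely that a convex combination placing a negligible weight on one component stays within a small relative error of the dominant component whenever the ratio of the two components is bounded, can be checked numerically. The densities, the weight `eps`, and the region below are hypothetical choices; this sketch is not the lemma's exact statement, which is not recoverable here.

```python
import numpy as np


def gaussian_pdf(y, mean, sigma):
    return np.exp(-((y - mean) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)


eps = 1e-3                        # hypothetical "negligible" mixing weight
y = np.linspace(-3.0, 3.0, 601)   # region where the density ratio stays bounded
f = gaussian_pdf(y, 0.0, 1.0)     # dominant component
g = gaussian_pdf(y, 0.5, 1.0)     # component carrying the negligible weight
mixture = (1 - eps) * f + eps * g

ratio_bound = np.max(g / f)       # K such that g <= K * f on the region
rel_err = np.max(np.abs(mixture / f - 1))
# mixture / f - 1 = eps * (g / f - 1), hence |mixture / f - 1| <= eps * max(1, K - 1)
bound = eps * max(1.0, ratio_bound - 1)
print(f"K = {ratio_bound:.3f}, max relative error = {rel_err:.2e}, bound = {bound:.2e}")
```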
Since the accuracy of the approximation depends on , we henceforth denote by .
Lemma 2
If , and are such that for each then we have
(10)  
(11) 
for every .
Proof:
We only prove (10). Since can be written as a weighted sum of over all , (11) follows from (10). Given , the dimensional distribution of is given by
When evaluated at , we can upper bound the above term as
(12) 
where is as given in Definition 3. Meanwhile, the dimensional distribution of is given by
(13)  
where the first inequality holds since . The second inequality holds because of the triangle inequality. Finally, if for each , then (13) can be further lower bounded as
(14)  
(15) 
where the last inequality is due to the bound in (12). This implies that for each . This completes the proof.
Theorem 1
If , and are such that for each and if the detection strategy is such that , for a fixed small , then we have , where
(16) 
and the notation captures the notion that the approximation on MI is a result of approximating the underlying distributions using .
Proof:
Based on the expression of in (7), it is straightforward to show that . In this proof, we only address the computation of . From first principles, we have
where can be obtained using as
where is as given in (9). When the attack-detection technique operates at , we can show that , where and such that the expectation is over . By applying the results of Lemma 1 and Lemma 2 to (9), we get
The above approximation holds because plays the role of in Lemma 1, and the condition of Lemma 1 is satisfied because of (11) in Lemma 2. As a result, . Furthermore, since each component of is independent across , we have
(17) 
where such that is given by (2). Similarly, the conditional differential entropy is given by
(18) 
where and such that can be written as
(19) 
To arrive at (19), we assume that and are statistically independent. Again, applying the results of Lemma 1 and Lemma 2 to (19), we have the approximation
for every . As a result, we have . Finally, using the above expression in (18), we get
(20)  
where the last equality is due to the i.i.d. nature of . Overall, using (20) and (17) in (7), we get the expression in (16).
Due to the intractability of evaluating , Theorem 1 approximates the achievable rates of a special class of detection strategies that operate in the region on the channel parameters