Learning-Aided Physical Layer Attacks Against Multicarrier Communications in IoT

08/01/2019 ∙ by Alireza Nooraiepour, et al. ∙ Rutgers University

Internet-of-Things (IoT) devices that are limited in power and processing capabilities are susceptible to physical layer (PHY) spoofing attacks owing to their inability to implement a full-blown protocol stack for security. The overwhelming adoption of multicarrier communications for the PHY layer makes IoT devices further vulnerable to PHY spoofing attacks. These attacks, which aim at injecting bogus data into the receiver, involve inferring transmission parameters and finding PHY characteristics of the transmitted signals so as to spoof the received signal. Non-contiguous orthogonal frequency division multiplexing (NC-OFDM) systems have been argued to have low probability of exploitation (LPE) characteristics against classic attacks based on cyclostationary analysis. However, with the advent of machine learning (ML) algorithms, adversaries can devise data-driven attacks to compromise such systems. It is in this vein that the PHY spoofing performance of adversaries equipped with supervised and unsupervised ML tools is investigated in this paper. The supervised ML approach is based on estimation/classification utilizing deep neural networks (DNNs), while the unsupervised one employs variational autoencoders (VAEs). In particular, VAEs are shown to be capable of learning representations from NC-OFDM signals related to their PHY characteristics, such as frequency pattern and modulation scheme, which are useful for PHY spoofing. In addition, a new metric based on the disentanglement principle is proposed to measure the quality of such learned representations. Simulation results demonstrate that the performance of the spoofing adversaries highly depends on the subcarriers' allocation patterns used at the transmitter. Particularly, it is shown that utilizing a random subcarrier occupancy pattern precludes the adversary from spoofing and secures NC-OFDM systems against ML-based attacks.


I Introduction

The broadcast nature of radio signal propagation along with standardized transmission schemes and intermittent communications make wireless communication systems extremely vulnerable to interception and spoofing attacks. Specifically, the broadcast nature of the wireless medium facilitates the reception of radio signals by any illegitimate receiver as long as it is within the coverage radius of the transmitter. Further, standardized transmission and conventional security schemes open up wireless systems to interception and eavesdropping [1]. Additionally, sporadic transmissions of low-cost wireless devices, especially the significantly growing number of IoT devices, provide massive opportunities to adversaries and malicious actors for spoofing attacks. Therefore, it is of paramount importance for wireless communication systems to enhance the security mechanisms meant to combat the adversaries, especially in light of the ongoing adoption of IoT systems in industrial applications.

IoT devices that have limited battery and computational resources may not be able to execute a full-blown protocol stack for security and authentication (non-access stratum (NAS)) purposes [2]. This indeed challenges the implementation of conventional key-based cryptographic techniques, which require appropriate key management procedures to distribute, refresh and revoke digital security keys, for a large number of IoT devices. As a result, physical layer (PHY) security has been put forth as an alternative to higher-layer security mechanisms by exploiting the physical layer characteristics of the communication links.

In this work, we focus on understanding the PHY properties of multicarrier communications that can help secure non-contiguous orthogonal frequency division multiplexing (NC-OFDM) and OFDM systems. To this end, we investigate the robustness of such systems against PHY spoofing attacks aided by machine learning tools that aim at inferring PHY characteristics of OFDM/NC-OFDM transmissions and sending bogus data to the receiver.

I-A Relation to prior work

Current attempts at using PHY characteristics as authentication keys for the message source follow various approaches. One possibility is to assume a pre-shared secret key hidden in the modulation scheme, which is detected by the receiver [3, 4]. In other keyless transmitter-based methods (also known as wireless fingerprinting), device-specific non-ideal transmission parameters are extracted from the received signal. These are identified as characteristics of the claimed source and then compared with those from previous authenticated messages [5]. Channel-based authentication algorithms compare the channel response estimated from the current message with that estimated from the previous ones by the claimed source, thus actually authenticating the position of the transmitter rather than its identity. In order to reliably distinguish channels from two different positions, some source of diversity is exploited in practice, either in the spatial domain by measurements of the received power levels at many receivers [6], [7] or in the frequency domain by wideband channel estimates [8], [9], [10]. However, the attacker in these works is assumed to only use higher-layer identity forging (e.g., spoofing of a MAC address) and does not try to attack the system by exploiting the underlying PHY characteristics of the signals, which we refer to as PHY spoofing.

PHY spoofing is shown to be able to compromise OFDM systems, which are widely used in IoT standards [11]. NC-OFDM systems in which transmissions take place over a subset of subcarriers are shown to be capable of circumventing this impediment [12]. NC-OFDM systems are also capable of efficiently utilizing the fragmented spectrum and improving spectral efficiency. The authors in [13, 14] examined the low probability of exploitation (LPE) characteristics of NC-OFDM transmissions, assuming that an adversary is using cyclostationary analysis [15, 16, 17] to infer transmission parameters. In [13], the authors showed that the cyclostationary analysis is extremely challenging to do for most choices of NC-OFDM transmission parameters. Therefore, NC-OFDM systems may be deemed to be secure against PHY spoofing. However, it is not clear if this is still the case when an adversary utilizes powerful data-driven tools like machine learning (ML) for inferring transmission parameters, followed by spoofing.

Particularly, a wide range of ML tools have recently received a lot of attention among communication researchers for solving analytically intractable problems. In particular, the authors in [18] proposed an end-to-end learning of communication systems based on deep neural networks (DNNs). They optimize transmitter and receiver jointly without considering the classical communication and signal processing blocks, including channel encoder and modulator. In [19], the authors showed that deep learning techniques are very promising in scenarios where the channel is too complex to be described analytically. The authors in [20] demonstrated how a DNN-based system can communicate over-the-air without the need for any conventional signal processing blocks. Moreover, securing a point-to-point communication system against a DNN-based attacker trying to determine the modulation scheme is considered in [21].

I-B Our contributions

We investigate the resilience of OFDM/NC-OFDM systems against PHY spoofing, assuming the adversary is equipped with ML tools. Two general ML algorithms are assumed to be utilized by the adversary: supervised and unsupervised algorithms. The former relies on true labels during training while the latter learns from the structure of raw data itself. For the supervised case, we assume that the adversary trains a deep feed-forward neural network (DNN) to estimate the transmission parameters. For the unsupervised scenario, VAEs are utilized by the adversary to extract representations from the datasets of NC-OFDM/OFDM signals that can be used for PHY spoofing. In the unsupervised scenario, we also develop a new metric based on disentanglement principles [22] to measure the usefulness of the learned representations for PHY spoofing.

In terms of datasets, we assume the adversary receives the signals through an additive white Gaussian noise (AWGN) channel. We note that if a transmission scheme is secure in an AWGN scenario, then one can conclude that it would also be secure in fading scenarios, where the received signals rapidly decorrelate in time and space. Indeed, in a fading channel, unless the adversary is physically collocated with the transmitter, it would find it difficult to mimic the fading channel between the transmitter and receiver [8, 9]. We provide numerical results showing that an adversary equipped with ML tools is able to spoof an NC-OFDM system, which was previously considered to be completely secure against cyclostationary-based attacks. Furthermore, we demonstrate that the representations learned by the VAEs carry significant information about the PHY characteristics of the NC-OFDM signals, including the total number of subcarriers, the amount of power sent in each subcarrier and modulation schemes (e.g., BPSK and QAM), all of which can be used by the adversary for PHY spoofing. Hence, unlike what is suggested by cyclostationary analysis in [13, 14], these results show that NC-OFDM systems are vulnerable to PHY spoofing if the adversary is equipped with ML tools. However, we also establish that the success of spoofing attacks highly depends on the subcarrier occupancy patterns chosen at the transmitter: the more structured the band allocation, the better the adversary performs in estimating the transmission parameters. Therefore, in order to secure NC-OFDM systems against ML-based attacks, the transmitter should employ random subcarrier occupancy patterns, where active subcarriers are chosen in a (pseudo)random fashion to preclude the adversary from correctly inferring the transmission parameters and spoofing the signal.

I-C Notation and organization

Throughout the paper, vectors are denoted with lowercase bold letters while uppercase bold letters are reserved for matrices. The $i$th element of a vector $\mathbf{a}$ is denoted by $a_i$. Non-bold letters are used to denote scalar values and calligraphic letters denote sets. The spaces of real and complex vectors of length $d$ are denoted by $\mathbb{R}^d$ and $\mathbb{C}^d$, respectively. Also, real and imaginary parts of a complex number $a$ are denoted by $\Re(a)$ and $\Im(a)$, respectively. The expectation and probability mass (or density) function of a random variable $\mathbf{x}$ are denoted by $\mathbb{E}[\mathbf{x}]$ and $p(\mathbf{x})$, respectively, while $P(\cdot)$ is used to denote the probability of an event. The notation $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes a Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$.

The rest of the paper is organized as follows. The system model is described in Section II. An introduction to cyclostationary analysis and its limitations for PHY spoofing are presented in Section III. In Sections IV and V, PHY spoofing attacks based on unsupervised and supervised learning algorithms are discussed, respectively. Section IV also describes the significance of the learned representations in VAEs for NC-OFDM/OFDM signals. We introduce a new metric based on the idea of disentanglement in Section VI to measure the usefulness of the learned representations for PHY spoofing. Finally, we present numerical examples that highlight the performance of learning-based PHY spoofing attacks in Section VII.

II System model

Consider a system composed of a transmitter (Tx), a receiver (Rx) and an adversary. An important type of attack in this setting is PHY spoofing, where an adversary disguises itself as a legitimate transmitter and sends spurious data to the receiver (Fig. 1). Specifically, the adversary overhears the signals sent by the Tx to the Rx, and its goal is to send bogus data to the Rx using signals that have similar PHY characteristics to the ones sent by the Tx. In this way, the Rx cannot distinguish the source of the original and the bogus data, and by decoding the latter, it might compromise the underlying system security in different ways. PHY spoofing is indeed achievable by estimating the transmission parameters used by the Tx.

The point-to-point communication link between the Tx and Rx is assumed to operate over a total bandwidth $B$ composed of a set of $N$ subcarriers. The transmitter can either transmit over the whole band in the case of OFDM transmissions, or a subset of subcarriers (known as active subcarriers) in the case of NC-OFDM transmissions. The signal transmitted by the Tx can be written as:

$$s(t) = \sum_{n=-\infty}^{\infty} \sum_{k=0}^{N-1} u_k \, p_{n,k} \, s_{n,k} \, e^{j 2 \pi f_k t} \, g(t - nT) \qquad (1)$$

where $s_{n,k}$ and $p_{n,k}$ are the transmitted symbol and a power factor corresponding to the $k$th subcarrier in the $n$th time slot, respectively. The total duration of one NC-OFDM/OFDM symbol is given by $T = T_u + T_{cp}$, with $T_u$ and $T_{cp}$ being the NC-OFDM symbol duration and the duration of the cyclic prefix, respectively. Further, $\mathbf{u} = [u_0, \dots, u_{N-1}]$ is called the subcarrier occupancy pattern, which is a binary vector of size $N$ whose elements are zero for inactive subcarriers and one for active subcarriers. Particularly, $\mathbf{u}$ for OFDM transmission amounts to an all-one vector of size $N$. We assume $g(t)$ is a rectangular pulse of width $T$ centered at $T/2$. The center frequency of each subcarrier is denoted by $f_k = k \, \Delta f$, where $\Delta f = 1/T_u$ represents the width of each subcarrier.
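As a rough illustration of the signal model in (1), the following sketch builds the samples of a single NC-OFDM symbol with NumPy. It assumes one sample per subcarrier spacing (so the useful part of the symbol has as many samples as there are subcarriers) and a cyclic prefix made of the last few useful samples; the function name `nc_ofdm_symbol` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def nc_ofdm_symbol(occupancy, symbols, power, n_cp):
    """Samples of one NC-OFDM symbol (cyclic prefix followed by the useful part)."""
    occupancy = np.asarray(occupancy, dtype=float)
    n_sub = occupancy.size
    # Inactive subcarriers (occupancy 0) carry nothing; active ones carry
    # the data symbol scaled by its power factor.
    freq = occupancy * np.asarray(power, dtype=float) * np.asarray(symbols, dtype=complex)
    # ifft computes (1/N) * sum_k X_k * exp(j*2*pi*k*i/N), i.e. subcarrier k
    # sits at k times the subcarrier spacing when the useful part spans N samples.
    useful = np.fft.ifft(freq) * n_sub
    # Cyclic prefix: the last n_cp useful samples are repeated at the front.
    return np.concatenate([useful[n_sub - n_cp:], useful])

rng = np.random.default_rng(0)
occupancy = np.array([1, 0, 1, 0, 1, 0, 1, 0])      # interleaved pattern (assumed)
bpsk = rng.choice([-1.0, 1.0], size=occupancy.size)  # BPSK data symbols
x = nc_ofdm_symbol(occupancy, bpsk, np.ones(occupancy.size), n_cp=2)
# Dropping the prefix and taking an FFT recovers the masked symbols.
spectrum = np.fft.fft(x[2:]) / occupancy.size
```

Setting `occupancy` to an all-one vector gives the plain OFDM case described above.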

The system model in Fig. 1 consists of three AWGN channels corresponding to pairs Tx-Rx, Tx-Adversary and Adversary-Rx, where each can have a different signal-to-noise ratio (SNR). Particularly, the spoofing performance highly depends on the SNR of the Tx-Adversary channel, which we call spoofing SNR in the remainder of the paper. For all three channels, the discrete-time signal received by a party is given by,

$$y_i = s_i + w_i, \quad i = 1, \dots, M \qquad (2)$$

where $s_i$ is the $i$th sample of the transmitted signal, $w_i$ is additive white Gaussian noise at time $i$, and $M$ represents the number of (complex) samples. We assume noise samples at different time instances $w_i$'s are independent and identically distributed (i.i.d.) with zero mean and variance $\sigma^2$. Denoting $\mathbf{s} = [s_1, \dots, s_M]$, signal power is computed by $\|\mathbf{s}\|_2^2 / M$ where $\|\cdot\|_2$ is the $\ell_2$-norm. Then, SNR and SNR per bit equal and , respectively. One can verify that equals where denotes the number of active subcarriers, and is the number of bits sent over each subcarrier.
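The AWGN channel model in (2) can be sketched as follows. This is a minimal example under stated assumptions: signal power is taken as the mean squared magnitude of the samples, the noise is circularly symmetric complex Gaussian with its variance split evenly between real and imaginary parts, and SNR is the ratio of signal power to noise variance. The helper name `add_awgn` is hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add complex white Gaussian noise so that the result has the requested SNR."""
    signal = np.asarray(signal, dtype=complex)
    # Signal power: squared l2-norm divided by the number of samples.
    power = np.mean(np.abs(signal) ** 2)
    noise_var = power / 10 ** (snr_db / 10)
    # Zero-mean i.i.d. noise; each of the real and imaginary parts gets half the variance.
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal(signal.shape) + 1j * rng.standard_normal(signal.shape)
    )
    return signal + noise, noise_var

rng = np.random.default_rng(0)
clean = np.exp(2j * np.pi * 0.05 * np.arange(100_000))  # unit-power test tone
noisy, noise_var = add_awgn(clean, snr_db=10.0, rng=rng)
measured_snr_db = 10 * np.log10(
    np.mean(np.abs(clean) ** 2) / np.mean(np.abs(noisy - clean) ** 2)
)
```

Calling the same helper with three different `snr_db` values would model the Tx-Rx, Tx-Adversary and Adversary-Rx links separately.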

Fig. 1: PHY spoofing by an adversary overhearing a transmission between two legitimate parties (Tx and Rx). The depicted signals correspond to an NC-OFDM/OFDM system.

As noted earlier, an AWGN model for the Tx-Adversary channel presents the worst-case scenario from a defense perspective as fading channels result in rapid decorrelation of signals both in time and space [8, 9] which deteriorate the PHY spoofing performance. We also assume that the adversary’s spoofing performance is not affected by time arrival or carrier frequency offsets, i.e., the communication is taking place with perfect synchronization between Tx-adversary and adversary-Rx in both time and frequency. Again, one can conclude that the performance of the adversary only gets worse in the case of time/frequency offset.

The Tx chooses the parameters , and and transmits an OFDM/NC-OFDM symbol to the Rx. The positions of active subcarriers in either follow a certain pattern or are totally random. The adversary seeks to find these transmission parameters in order to generate waveforms similar to (1), inject bogus data in place of , and transmit them to the receiver. We assume the Rx only decodes data which are being sent over the active subcarriers with the correct and chosen by Tx; otherwise, a decoding failure will occur. Therefore, we utilize bit error rate (BER) at Rx as a measure to evaluate the performance of the adversary in terms of spoofing. If this BER is close to that of the baseline transmission (where the parameters are perfectly known at the Rx), it is indicative of the maximum spoofing performance of the adversary. At the other extreme, a BER close to suggests the adversary cannot do much in terms of spoofing, i.e., Tx-Rx transmission is secured against PHY spoofing. We note that as Tx may use a different set of transmission parameters to transmit each NC-OFDM/OFDM symbol (e.g. sending over a different set of subcarriers), these parameters need to be estimated by the Rx as well in order to ensure reliable communication for the legitimate parties. Therefore, we seek a secure scheme which provides reliability for Rx-Tx transmission while preventing PHY spoofing.

We assume the adversary (and the Rx) is able to build up a dataset out of its received signals and perform ML algorithms on it to train deep neural networks (DNNs). Each entry of the dataset consists of the samples of an OFDM/NC-OFDM symbol that may or may not be associated with the corresponding true transmission parameters (, , ) referred to as labels. Depending on the availability of the labels during the training stage, two types of ML algorithms are useful: supervised and unsupervised. The former makes use of the labels for training the DNNs while the latter exploits possible data structure and clustering methods without using labels. Next, we discuss cyclostationary analysis as the classical tool for inferring the transmission parameters.

III PHY spoofing via cyclostationary analysis

While cyclic prefix is useful to mitigate the effect of inter-carrier interference in multicarrier systems, it also enables an adversary to infer basic transmission parameters using cyclostationary analysis [15]. Cyclostationary analysis is based on the auto-correlation function of the transmitted signal, which is calculated as

(3)

where . The periodicity of in allows representing it as a Fourier series sum

(4)

where is the cyclic frequency and is called the cyclic auto-correlation function (CAF). For OFDM transmissions, this function can provide an adversary with [15, 16]. However, for NC-OFDM transmissions this analysis does not always lead to the correct results as illustrated in the following example.

Example 1

Consider an NC-OFDM signal with a total number of subcarriers with occupancy pattern vector illustrated in Fig. 2, where active subcarriers are spaced subcarriers apart (known as interleaved subcarriers). The transmitter chooses two transmission parameters and based on one of the cases listed in Table I and sends (1) over the channel. Then, the adversary receives the noisy signal, samples it at an arbitrary rate above the Nyquist rate to obtain samples and estimates the CAF function using

(5)

where denotes the th obtained sample and belongs to the set of integers. To extract and corresponding to the original transmission, the adversary must look at the locations of the absolute peaks in (5) at as illustrated in Fig. 3. We note that for all three cases, CAF-based analysis results in the same plot. In other words, the adversary cannot decide which set of transmission parameters are used by the transmitter.

Case
TABLE I: Three sets of transmission parameters for NC-OFDM signals
Fig. 2: Interleaved subcarrier occupancy pattern with interleaving factor in an NC-OFDM symbol.
Fig. 3: Estimated CAF (5) at for the three cases considered in Table I at spoofing SNR of dB, where , and
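The kind of ambiguity discussed in Example 1 can be reproduced with a small toy experiment. The sketch below uses a standard sample estimate of the cyclic autocorrelation (which may differ in detail from the estimator in (5)) and evaluates it at zero cyclic frequency. All parameters here (16 subcarriers, a 4-sample cyclic prefix, interleaving factor 2, 10 dB SNR) are assumptions chosen for illustration and are not the cases of Table I.

```python
import numpy as np

def cyclic_autocorr(y, lag, alpha=0.0):
    """Sample estimate of the cyclic autocorrelation at a lag (in samples)
    and a normalized cyclic frequency alpha (cycles per sample)."""
    y = np.asarray(y, dtype=complex)
    m = np.arange(y.size - lag)
    return np.mean(y[m + lag] * np.conj(y[m]) * np.exp(-2j * np.pi * alpha * m))

def make_stream(occupancy, n_cp, n_symbols, rng, snr_db=10.0):
    """Concatenate noisy BPSK NC-OFDM symbols, each preceded by its cyclic prefix."""
    occupancy = np.asarray(occupancy, dtype=float)
    n_fft = occupancy.size
    blocks = []
    for _ in range(n_symbols):
        bpsk = rng.choice([-1.0, 1.0], size=n_fft)
        useful = np.fft.ifft(occupancy * bpsk) * np.sqrt(n_fft)
        blocks.append(np.concatenate([useful[n_fft - n_cp:], useful]))
    s = np.concatenate(blocks)
    noise_var = np.mean(np.abs(s) ** 2) / 10 ** (snr_db / 10)
    w = np.sqrt(noise_var / 2) * (rng.standard_normal(s.size) + 1j * rng.standard_normal(s.size))
    return s + w

def peak_lag(y, max_lag):
    """Lag (excluding zero) at which the magnitude of the estimate is largest."""
    lags = np.arange(1, max_lag + 1)
    caf = np.array([abs(cyclic_autocorr(y, int(l))) for l in lags])
    return int(lags[np.argmax(caf)])

rng = np.random.default_rng(1)
contiguous = make_stream(np.ones(16), n_cp=4, n_symbols=1000, rng=rng)
interleaved = make_stream(np.tile([1, 0], 8), n_cp=4, n_symbols=1000, rng=rng)
lag_contiguous = peak_lag(contiguous, max_lag=24)
lag_interleaved = peak_lag(interleaved, max_lag=24)
```

In this toy setting the contiguous signal peaks at a lag equal to its useful symbol length (16 samples), whereas the interleaved signal peaks at 8 samples, which is what a contiguous system with half as many subcarriers would also produce. The peak location alone therefore does not pin down the transmission parameters.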

In the next sections, we study how the adversary can make use of deep learning models for PHY spoofing. Our main motivation for taking this approach comes from the fact that analytical approaches appear to fail at spoofing even simple NC-OFDM systems, which might give PHY designers the impression that the NC-OFDM PHY is secure. We investigate the PHY spoofing performance of an adversary equipped with deep learning tools, which of course comes at the expense of a higher cost (associated with training), and ask whether and when such an adversary is able to spoof the system and which parameters affect its performance. Towards this goal, we consider two major types of ML algorithms, i.e., supervised and unsupervised, depending on the availability of the labels (true transmission parameters) for training. As mentioned in Section II, these labels correspond to transmission parameters in our scenario that can be obtained at a test facility with similar PHY characteristics to those of the Tx-Adversary channel.

IV PHY spoofing via unsupervised learning

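We assume the adversary utilizes variational autoencoders (VAEs) for PHY spoofing in an unsupervised manner. VAEs are designed based on the idea of variational inference, which can be described via a latent variable model (LVM). An LVM is a generative model for a dataset $\mathcal{X}$ consisting of $n$ i.i.d. samples of a continuous random variable $\mathbf{x}$. The LVM is defined over a joint distribution $p_\theta(\mathbf{x}, \mathbf{z})$ parameterized by $\theta \in \mathbb{R}^{d_\theta}$, where $\mathbf{z}$ is an unobserved continuous random variable known as the latent variable (feature space) and $d_\theta$ denotes the model size. The joint density is denoted by $p_\theta(\mathbf{x}, \mathbf{z}) = p(\mathbf{z}) \, p_\theta(\mathbf{x}|\mathbf{z})$, where $p(\mathbf{z})$ is a fixed prior over the latent space and $p_\theta(\mathbf{x}|\mathbf{z})$ is the conditional generator. Then, the inference problem amounts to finding the posterior of the hidden random variable $\mathbf{z}$, i.e., $p_\theta(\mathbf{z}|\mathbf{x}) = p_\theta(\mathbf{x}|\mathbf{z}) \, p(\mathbf{z}) / \int p_\theta(\mathbf{x}|\mathbf{z}) \, p(\mathbf{z}) \, d\mathbf{z}$. However, this integration is not tractable for complicated datasets $\mathcal{X}$ consisting of high-dimensional data samples $\mathbf{x}$'s, and alternative methods must be utilized in practice to compute it approximately. Variational inference is a promising optimization-based candidate for this purpose, and has enjoyed a lot of attention during the past few years [23, 22, 24, 25]. Particularly, a VAE learns an optimal model ($\theta^*$) which maximizes the probability of data samples by maximizing the expectation of the evidence probabilities, i.e.,

$$\mathbb{E}_{p(\mathbf{x})}\left[\log p_\theta(\mathbf{x})\right] \qquad (6)$$

over $\theta$, where $p(\mathbf{x})$ denotes the underlying true distribution of the data in the dataset $\mathcal{X}$. However, this requires computing $p_\theta(\mathbf{x})$, which is not tractable. Instead, [23] obtains a lower bound called the evidence lower bound (ELBO) for an individual data sample $\mathbf{x}$ in (6), which is given by

$$\mathcal{L}(\theta, \phi; \mathbf{x}) = -\mathrm{KL}\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\right) + \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\left[\log p_\theta(\mathbf{x}|\mathbf{z})\right] \qquad (7)$$

Then, a VAE solves the following optimization problem,

$$\max_{\theta, \phi} \; \mathbb{E}_{p(\mathbf{x})}\left[\mathcal{L}(\theta, \phi; \mathbf{x})\right] \qquad (8)$$

where $\theta$ and $\phi$ are optimization variables of dimension $d_\theta$ and $d_\phi$, respectively. We call $q_\phi(\mathbf{z}|\mathbf{x})$ a probabilistic encoder, since given a data sample $\mathbf{x}$ it produces a distribution (e.g. a Gaussian) over the possible values of the code $\mathbf{z}$ from which the data sample could have been generated. Similarly, we call $p_\theta(\mathbf{x}|\mathbf{z})$ a probabilistic decoder, as given a code $\mathbf{z}$ it produces a distribution over the possible corresponding values of $\mathbf{x}$. We choose both encoder and decoder to be Gaussian distributions with diagonal covariance matrix whose parameters are estimated by deep feed-forward neural networks parameterized by $\phi$ and $\theta$, respectively (see Fig. 4).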

Fig. 4: A schematic of a VAE where encoder and decoder are chosen to be Gaussian distributions with mean vector and diagonal covariance matrix with variances . These are parameterized by two DNNs trained based on (8). and are generated by sampling from the distributions corresponding to the encoder and the decoder, respectively.

The KL divergence term in (7) can be seen as a regularizer that encourages the posterior $q_\phi(\mathbf{z}|\mathbf{x})$ to be close to the prior $p(\mathbf{z})$, and the second term is called the reconstruction loss. Particularly, when ELBO is maximized the KL-divergence term approaches zero. This KL term can be computed analytically as a closed-form expression by choosing specific distributions, e.g., Gaussian and Bernoulli distributions, for $q_\phi(\mathbf{z}|\mathbf{x})$ and $p(\mathbf{z})$ [23]. Afterward, one can efficiently minimize the negative of ELBO using the mini-batch gradient descent algorithm [23]. Mini-batch gradient descent is a variant of the gradient descent algorithm that splits the training dataset into small batches that are then used to calculate model error followed by updating of model coefficients.
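For the common choice of a Gaussian encoder with diagonal covariance and a standard normal prior, the closed-form KL term and a one-sample estimate of the negative ELBO over a mini-batch can be sketched as below. This is only an illustrative NumPy sketch: the function names, the use of log-variances as encoder outputs, and the toy linear decoder are assumptions, and no gradient step is shown.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def gaussian_log_density(x, mean, log_var):
    """Log-density of x under a diagonal Gaussian, one value per sample."""
    return -0.5 * np.sum(
        np.log(2 * np.pi) + log_var + (x - mean) ** 2 / np.exp(log_var), axis=-1
    )

def negative_elbo(x, enc_mu, enc_log_var, decode, rng):
    """Mini-batch average of KL minus a single-sample reconstruction term."""
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(enc_mu.shape)
    z = enc_mu + np.exp(0.5 * enc_log_var) * eps
    dec_mean, dec_log_var = decode(z)
    reconstruction = gaussian_log_density(x, dec_mean, dec_log_var)
    return np.mean(kl_to_standard_normal(enc_mu, enc_log_var) - reconstruction)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 6))                           # toy linear decoder weights
decode = lambda z: (z @ W, np.zeros((z.shape[0], 6)))     # hypothetical decoder: mean and log-variance
x_batch = rng.standard_normal((4, 6))                     # mini-batch of 4 samples
loss = negative_elbo(x_batch, np.zeros((4, 3)), np.zeros((4, 3)), decode, rng)
```

In practice the encoder and decoder would be DNNs trained by automatic differentiation; the toy decoder here only serves to make the sketch runnable.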

One of the long-standing problems in ML literature is learning representations from large datasets in an unsupervised manner that facilitates the downstream learning tasks (e.g., classification). VAEs have shown great potential [26] for learning such so-called “useful” representations. Although it is not clear how to define/measure usefulness in unsupervised settings, researchers have developed the concept of disentangled representation that offers several advantages. Data samples in this concept are assumed to be generated via a finite number of generative factors, each representing notions like position, scale, rotation, etc., in the case of an image sample for instance. Although there is no canonical definition for a disentangled representation, [24] points out that the learned representations are called disentangled if changes in one of the generative factors of the data are mirrored by the encoder in Fig. 4 in exactly one of the latent variables (i.e., one dimension of the latent space). We will discuss this in more depth in Section VI, mention the weaknesses associated with this definition, and propose a new metric to overcome some of them for the datasets of NC-OFDM/OFDM signals. A VAE can encourage learning disentangled representations by choosing $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, which corresponds to different latent variables being uncorrelated. However, this choice is not enough to achieve a disentangled representation from large datasets with high-dimensional data samples since the VAE objective function in (7) usually incurs a trade-off between the reconstruction loss and the KL divergence term that determines the disentanglement level. Hence, researchers have been looking for ways to not sacrifice one for the other by redesigning VAEs. In this paper, we will focus on four of the most effective techniques for learning disentangled representations and investigate their performances in the context of spoofing NC-OFDM/OFDM signals.

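  • $\beta$-VAE: The authors in [26] proposed to weight the KL divergence term in (7) by a real-valued factor $\beta$ to encourage the VAE to learn disentangled representations. Therefore, the ELBO for data sample $\mathbf{x}$ in $\beta$-VAE is

    $$\mathcal{L}_\beta(\theta, \phi; \mathbf{x}) = -\beta \, \mathrm{KL}\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\right) + \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\left[\log p_\theta(\mathbf{x}|\mathbf{z})\right] \qquad (9)$$

    An ELBO with a larger $\beta$ encourages learning disentangled representations, as it penalizes more heavily the dissimilarities between the learned posterior and the prior by weighting the KL term. However, this has been shown to sacrifice the reconstruction ability of the VAE in large datasets [26].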

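  • DIP-VAE: Disentangled inferred prior (DIP) VAE was proposed in [22]. The authors add an extra term $\mathrm{KL}\left(q_\phi(\mathbf{z}) \,\|\, p(\mathbf{z})\right)$ to ELBO (7), where $q_\phi(\mathbf{z}) = \int q_\phi(\mathbf{z}|\mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x}$ is the learned prior. By explicitly minimizing this term one can effectively encourage learning disentangled latent variables without the need for weighting the KL term in ELBO, which could result in higher reconstruction losses. In this vein, DIP-VAE matches the moments of two distributions $q_\phi(\mathbf{z})$ and $p(\mathbf{z})$. In particular, it encourages the covariance of $q_\phi(\mathbf{z})$ to be the same as the covariance of $p(\mathbf{z})$ to minimize $\mathrm{KL}\left(q_\phi(\mathbf{z}) \,\|\, p(\mathbf{z})\right)$. Using the law of total covariance, the covariance of $\mathbf{z} \sim q_\phi(\mathbf{z})$ can be written as

    $$\mathrm{Cov}_{q_\phi(\mathbf{z})}[\mathbf{z}] = \mathbb{E}_{p(\mathbf{x})}\left[\mathrm{Cov}_{q_\phi(\mathbf{z}|\mathbf{x})}[\mathbf{z}]\right] + \mathrm{Cov}_{p(\mathbf{x})}\left[\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}[\mathbf{z}]\right] \qquad (10)$$

    For a dataset with real-valued data samples, $q_\phi(\mathbf{z}|\mathbf{x})$ can be chosen to be $\mathcal{N}\left(\boldsymbol{\mu}_\phi(\mathbf{x}), \boldsymbol{\Sigma}_\phi(\mathbf{x})\right)$ [23], where $\boldsymbol{\mu}_\phi(\mathbf{x})$ and $\boldsymbol{\Sigma}_\phi(\mathbf{x})$ represent mean and covariance of the Gaussian for a given $\mathbf{x}$ as a function of the parameter $\phi$. Then, plugging the mean and covariance of $q_\phi(\mathbf{z}|\mathbf{x})$ in (10), we get $\mathrm{Cov}_{q_\phi(\mathbf{z})}[\mathbf{z}] = \mathbb{E}_{p(\mathbf{x})}\left[\boldsymbol{\Sigma}_\phi(\mathbf{x})\right] + \mathrm{Cov}_{p(\mathbf{x})}\left[\boldsymbol{\mu}_\phi(\mathbf{x})\right]$, which has to be close to the identity matrix for the case when $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$. By using the $\ell_2$-norm as the measure of proximity, DIP-VAE solves the following optimization problem

    $$\max_{\theta, \phi} \; \mathbb{E}_{p(\mathbf{x})}\left[\mathcal{L}(\theta, \phi; \mathbf{x})\right] - \lambda_{od} \sum_{i \neq j} \left[\mathrm{Cov}_{q_\phi(\mathbf{z})}[\mathbf{z}]\right]_{ij}^2 - \lambda_d \sum_{i} \left(\left[\mathrm{Cov}_{q_\phi(\mathbf{z})}[\mathbf{z}]\right]_{ii} - 1\right)^2 \qquad (11)$$

    where $\mathcal{L}(\theta, \phi; \mathbf{x})$ is the same as (7), and $\lambda_d$ and $\lambda_{od}$ are two hyper-parameters controlling penalties induced by the diagonal and the off-diagonal components in $\mathrm{Cov}_{q_\phi(\mathbf{z})}[\mathbf{z}]$, respectively.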

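  • FactorVAE: As mentioned previously, weighting the KL term by a $\beta$ could negatively impact the reconstruction performance of VAE. Mathematically, this can be seen from

    $$\mathbb{E}_{p(\mathbf{x})}\left[\mathrm{KL}\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\right)\right] = I(\mathbf{x}; \mathbf{z}) + \mathrm{KL}\left(q_\phi(\mathbf{z}) \,\|\, p(\mathbf{z})\right) \qquad (12)$$

    where $I(\mathbf{x}; \mathbf{z})$ is the mutual information between $\mathbf{x}$ and $\mathbf{z}$. Therefore, penalizing $\mathbb{E}_{p(\mathbf{x})}\left[\mathrm{KL}\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\right)\right]$ is a double-edged sword. On the one hand, it forces $q_\phi(\mathbf{z})$ to be close to $p(\mathbf{z})$, and encourages learning disentangled representations. On the other hand, by penalizing $I(\mathbf{x}; \mathbf{z})$ it encourages learning a $\mathbf{z}$ independent of $\mathbf{x}$, which would limit the amount of information stored in $\mathbf{z}$ about $\mathbf{x}$. Thus a larger $\beta$ leads to a better disentanglement but reduces the reconstruction quality. Similar to DIP-VAE, FactorVAE avoids this conflict by augmenting $\mathrm{KL}\left(q_\phi(\mathbf{z}) \,\|\, \bar{q}_\phi(\mathbf{z})\right)$ to the ELBO (7) in order to directly encourage independence in the latent variables, which results in the following objective:

    $$\mathbb{E}_{p(\mathbf{x})}\left[\mathcal{L}(\theta, \phi; \mathbf{x})\right] - \gamma \, \mathrm{KL}\left(q_\phi(\mathbf{z}) \,\|\, \bar{q}_\phi(\mathbf{z})\right) \qquad (13)$$

    where $\bar{q}_\phi(\mathbf{z})$ is assumed to be of the form $\prod_{j=1}^{d} q_\phi(z_j)$ and $d$ denotes the dimension of $\mathbf{z}$. Although the idea of directly penalizing a divergence of $q_\phi(\mathbf{z})$ was used in DIP-VAE as well, FactorVAE takes a different approach towards achieving this goal. Specifically, FactorVAE estimates the density ratio,

    $$\frac{q_\phi(\mathbf{z})}{\bar{q}_\phi(\mathbf{z})} \qquad (14)$$

    via the density-ratio trick [27]. For each mini-batch, it generates samples from both $q_\phi(\mathbf{z})$ and $\bar{q}_\phi(\mathbf{z})$, and approximates (14) by a model parameterized by a DNN $D_\psi$, which takes a sample $\mathbf{z}$ as input and outputs the probability $D_\psi(\mathbf{z})$ that $\mathbf{z}$ belongs to $q_\phi(\mathbf{z})$. Then, utilizing

    $$\mathrm{KL}\left(q_\phi(\mathbf{z}) \,\|\, \bar{q}_\phi(\mathbf{z})\right) = \mathbb{E}_{q_\phi(\mathbf{z})}\left[\log \frac{q_\phi(\mathbf{z})}{\bar{q}_\phi(\mathbf{z})}\right] \approx \mathbb{E}_{q_\phi(\mathbf{z})}\left[\log \frac{D_\psi(\mathbf{z})}{1 - D_\psi(\mathbf{z})}\right] \qquad (15)$$

    (13) can be jointly maximized over the set of parameters $(\theta, \phi, \psi)$, each of which is taken to be a DNN in this work.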
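To make the moment-matching idea behind DIP-VAE more concrete, the following sketch estimates the covariance of the learned prior from a mini-batch through the law of total covariance and evaluates a squared-error penalty against the identity matrix. It assumes a Gaussian encoder with diagonal covariance; the function names and the weights `lambda_od` and `lambda_d` are illustrative choices, not values from the paper.

```python
import numpy as np

def aggregate_covariance(enc_mu, enc_var):
    """Mini-batch estimate of the latent covariance under the learned prior:
    mean of the encoder covariances plus the covariance of the encoder means."""
    enc_mu = np.asarray(enc_mu, dtype=float)     # shape (batch, d): encoder means
    enc_var = np.asarray(enc_var, dtype=float)   # shape (batch, d): diagonal variances
    expected_cov = np.diag(enc_var.mean(axis=0))
    centered = enc_mu - enc_mu.mean(axis=0)
    cov_of_mean = centered.T @ centered / enc_mu.shape[0]
    return expected_cov + cov_of_mean

def dip_penalty(cov, lambda_od, lambda_d):
    """Squared distance of cov from the identity, with separate weights for
    off-diagonal entries and for diagonal entries."""
    off_diag = cov - np.diag(np.diag(cov))
    return lambda_od * np.sum(off_diag ** 2) + lambda_d * np.sum((np.diag(cov) - 1.0) ** 2)

# Two perfectly correlated latent dimensions: only off-diagonal terms are penalized.
correlated = aggregate_covariance([[1.0, 1.0], [-1.0, -1.0]], np.zeros((2, 2)))
penalty_correlated = dip_penalty(correlated, lambda_od=10.0, lambda_d=5.0)

# Encoder already matching a standard normal prior: no penalty.
matched = aggregate_covariance(np.zeros((2, 2)), np.ones((2, 2)))
penalty_matched = dip_penalty(matched, lambda_od=10.0, lambda_d=5.0)
```

The penalty would be subtracted from the mini-batch ELBO during training; FactorVAE replaces it with a total-correlation term estimated by a separate discriminator network, which is not sketched here.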

IV-A Learning useful representations from NC-OFDM signals

In this section, we describe training of VAEs on a dataset of NC-OFDM signals whose entries consist of samples of an NC-OFDM signal in (1) obtained by

(16)

where denotes the sampling rate. We concatenate the real and imaginary parts of the samples in (16) to build a single sample of size in the form for . We have chosen encoder and decoder to be Gaussian with diagonal covariance matrix having means and , and variances , , respectively. We note that the learned representations are distributed around the mean of , denoted by , and they approach when goes to zero. Encoder (resp., decoder) is modeled with a fully connected feed-forward DNN whose outputs are and (resp., and ). These DNNs have hidden layers with , , , , and number of neurons in each layer. We also have trained DNNs of larger parameter spaces, but the performance of the networks did not improve noticeably. The prior $p(\mathbf{z})$ is also chosen to be a Gaussian distribution with zero mean and identity covariance matrix, which makes it possible to compute the KL term in (7) analytically [23]. We build two NC-OFDM datasets whose signals are generated by two different subcarrier occupancy patterns, and investigate the properties of the learned representations by the FactorVAE (13) with , which has been shown to be able to find disentangled representations effectively (see Section VII). Training is done via mini-batches of size with a learning rate of over a dataset of size . In order to study what information has been encoded to each dimension of the latent space, we use a common technique called latent traversal. After training a VAE, latent traversal obtains the representation corresponding to a data sample . To study what information the th latent variable () represents about , latent traversal changes the value of (e.g. between ) while fixing the other latent variables and studies the corresponding changes induced by the decoder in the fast Fourier transform (FFT) of the reconstructed sample .
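The latent traversal procedure can be sketched as follows. The decoder used here is a hypothetical stand-in (each latent variable simply sets the amplitude of one subcarrier), not a trained VAE, and the layout of the real-valued sample (all real parts followed by all imaginary parts) is an assumption about how the concatenation is ordered.

```python
import numpy as np

def latent_traversal(decode, z, index, values):
    """Sweep one latent variable over `values`, keep the others fixed, and
    return the FFT magnitude of each reconstructed sample (one row per value)."""
    spectra = []
    for v in values:
        z_mod = np.array(z, dtype=float, copy=True)
        z_mod[index] = v
        x_hat = decode(z_mod)                    # real vector: [real parts, imaginary parts]
        half = x_hat.size // 2
        samples = x_hat[:half] + 1j * x_hat[half:]
        spectra.append(np.abs(np.fft.fft(samples)))
    return np.stack(spectra)

def toy_decode(z, n_fft=8):
    """Hypothetical decoder: latent variable j sets the amplitude of subcarrier j."""
    freq = np.zeros(n_fft, dtype=complex)
    freq[:z.size] = z
    x = np.fft.ifft(freq)
    return np.concatenate([x.real, x.imag])

z0 = np.array([1.0, 0.0, 1.0, 0.0])
sweep = np.linspace(-2.0, 2.0, 5)
spectra = latent_traversal(toy_decode, z0, index=2, values=sweep)
```

With this toy decoder, only FFT bin 2 changes along the sweep, which mimics the behavior of an informative latent variable; an uninformative one would leave every row of `spectra` unchanged.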

First, we consider a dataset of NC-OFDM signals with a structured band allocation based on different occupancy pattern vectors depicted in Fig. 5. We note that this is given as a toy example, which enables us to fully describe the properties of the learned latent space by a VAE through choosing a small number of subcarriers with only distinct occupancy patterns. The number of latent variables in FactorVAE is set to , , , where a larger indicates a higher SNR for a fixed noise variance . Binary phase-shift keying (BPSK) is utilized as the modulation technique at the transmitter. Also, complex samples are collected from each signal (16) for building the dataset. We have done latent traversal on a trained VAE for latent variables, six of which are depicted for instance in Fig. 7, where the input signal to the VAE is a signal from Case depicted in Fig. 5. We observe that the VAE only encodes information in distinct latent variables , , , and , which are called informative latent variables and control the amount of power in distinct active subcarriers depicted in Fig. 5. The other variables are uninformative (e.g., ) and carry no information about , as changing them does not have any effect on the reconstructed sample. As we chose , we note that the learned representations lie within a continuous space whose dimensions capture the amount of power in different subcarriers as suggested by Fig. 7. Specifically, changing an informative latent variable could result in generating an output by the decoder that belongs to a different case than that of . For example, by changing (and fixing the other variables), a signal from Case can be generated while the input signal belongs to Case . One can interpret this as the decoder changing the amount of power in subcarriers and through changing . Similarly, by changing a signal from Case can be generated. This is illustrated in Fig. 6 for the whole space of , which shows how the VAE exploits the subcarrier occupancy pattern for finding a continuous representation space that covers all signals in the training dataset.

Fig. 5: Three different band allocations based on which we have generated the NC-OFDM signals in the dataset. An inactive subcarrier is denoted by dashed lines.
Fig. 6: VAE maps the NC-OFDM signals to a continuous space . Latent traversal for is shown by an arrow where and belong to the cases denoted by the arrow’s tail and tip, respectively.

Next, we consider NC-OFDM signals with random occupancy pattern vectors where active subcarriers are chosen in a random fashion, i.e., each element in is or with probability . Similar to the structured case, and BPSK is used as the modulation technique. We consider the total number of subcarriers to be ( different subcarrier occupancy patterns), number of latent variables and . Latent traversal shows that, in this case, each informative latent variable controls the amount of power in exactly one subcarrier (Fig. 8). As an example, one can see that changing results in changing the power in the th subcarrier of . In other words, represents the relative amount of power in the th subcarrier. A graph similar to Fig. 6 can be sketched for this case as well where there are different band allocations. We note that out of latent variables correspond to different subcarriers, and the remaining variables are uninformative. Therefore, one can see that a VAE is capable of learning a representation where each latent variable corresponds to the relative amount of power in a unique subcarrier.

Fig. 7: Latent traversal for a dataset of NC-OFDM signals based on Fig. 5 where belongs to Case . FFTs ( points) of the reconstructed signals are depicted. X-axis denotes FFT bin index, and y-axis is the magnitude of FFT.
Fig. 8: Latent traversal for a dataset of NC-OFDM signals with random band allocation. FFTs (16 points) of the reconstructed signals are depicted. X-axis denotes FFT bin index, and y-axis is the magnitude of FFT.

IV-B VAEs for PHY spoofing

We now study how VAEs can be leveraged to compromise the PHY security in NC-OFDM/OFDM systems. As shown in Section IV-A, VAEs can learn important information about the NC-OFDM transmission parameters in an unsupervised manner. Specifically, a VAE extracts the following information about the NC-OFDM signals:

  • Subcarrier occupancy pattern: The learned representation corresponding to a subcarrier (correspondence can be found by latent traversal) that is inactive in a data sample is close to zero as the power sent on that subcarrier is zero. Therefore, subcarrier occupancy pattern can be inferred by applying a threshold test to the learned representations (threshold value depends on the SNR of the received signals and is estimated by cross-validation).

  • Modulation scheme: Through utilizing different modulation schemes in NC-OFDM signals, we have observed that the learned VAEs allocate one (two) latent variable(s) to each subcarrier when real (complex) symbols are being sent through each subcarrier. This can be justified by noting that VAEs treat the real and imaginary parts of a symbol transmitted through each subcarrier as distinct generative factors of the data and allocate separate latent variables to capture each one. This fact can be utilized to distinguish between modulation schemes that use real versus complex symbols (e.g., BPSK versus QAM). Also, as higher-order modulations send more power through each subcarrier, latent traversal can be utilized to distinguish them from lower-order ones. However, identifying the modulation scheme, in general, may need resorting to specific classifiers as discussed in [21].

  • Total number of active subcarriers: As VAEs treat the real and imaginary powers in active subcarriers as generative factors of the dataset, the total number of active subcarriers in the whole dataset amounts to the number of informative latent variables when real symbols are being sent through each subcarrier (like in BPSK modulation), and is half this number when complex symbols (like in QAM or PSK modulation) are being used.

For example, for the random allocation case considered in Fig. 8, we have observed that one latent variable encodes each subcarrier, which means BPSK is used as the modulation scheme, and there are informative latent variables, which indicates the number of subcarriers is . It is shown in Section VII that VAEs are agnostic to the true signal bandwidth in these inferences. In other words, as long as signals are sampled above the Nyquist rate and stored in the dataset, the rate at which an adversary samples the received signals does not matter. This is particularly important from an adversary’s point of view because, as pointed out in [13], estimating the bandwidth is a major hurdle in the case of NC-OFDM signals.
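The threshold test and the count of informative latent variables described above can be sketched as follows. This is a minimal illustration only: the function name `infer_occupancy`, the latent-to-subcarrier map, the latent means, and the threshold value are all hypothetical, and in practice the map would come from latent traversal and the threshold from cross-validation.

```python
def infer_occupancy(latent_means, subcarrier_of_latent, num_subcarriers, threshold):
    """Return a 0/1 occupancy vector from the latent means of one received signal.

    subcarrier_of_latent maps each informative latent index to the subcarrier
    it controls (assumed to be known from latent traversal).
    """
    occupancy = [0] * num_subcarriers
    for latent_idx, sub_idx in subcarrier_of_latent.items():
        # A representation close to zero means no power on that subcarrier.
        if abs(latent_means[latent_idx]) > threshold:
            occupancy[sub_idx] = 1
    return occupancy


# Hypothetical example: 6 latent variables, 4 of them informative (real symbols).
latent_means = [0.91, -0.02, -1.10, 0.05, 0.01, 0.87]
subcarrier_of_latent = {0: 0, 2: 1, 3: 2, 5: 3}
occupancy = infer_occupancy(latent_means, subcarrier_of_latent, 4, threshold=0.3)

# With real symbols, the number of informative latents equals the number of
# subcarriers used across the dataset; with complex symbols it would be halved.
num_subcarriers_real = len(subcarrier_of_latent)
print(occupancy, num_subcarriers_real)
```

Taking the absolute value reflects the assumption that an active BPSK subcarrier can push its latent variable in either direction away from zero.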

We now incorporate these findings with the system model described in Section II, which assumes an AWGN channel between the Tx and the adversary. Here, the adversary only has access to a corrupted version of , denoted by . As the training data samples are noisy in this case, we propose the following change to the original VAE objective function (7):

(17)

where we have weighted the reconstruction loss with a constant which is inversely proportional to the SNR of the received signals and is obtained using cross-validation. This would lessen the reconstruction penalty and mitigate the effect of noise in the reconstructed signals by allowing VAEs to generate samples that are different from the noisy input ones. We note that the resulting bound remains a lower bound to the evidence . We have seen through our experiments in Section VII that this greatly improves the spoofing performance (which depends on the accuracy of the learned representations) in the case of noisy samples. After training a VAE on the dataset of received signals, during the test stage, the adversary inputs a signal to the trained VAE and estimates the aforementioned transmission parameters for PHY spoofing.
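The idea of down-weighting the reconstruction term can be sketched for a plain Gaussian VAE as below. This is not necessarily the exact form of (17): the squared-error reconstruction term, the standard-normal prior, and the name `recon_weight` are assumptions made for illustration, and the numbers are made up.

```python
import math


def gaussian_kl(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def weighted_vae_loss(x, x_hat, mu, log_var, recon_weight):
    """Loss to minimize: weighted squared reconstruction error plus KL term.

    recon_weight is the hypothetical constant applied to the reconstruction
    term; the text selects it by cross-validation.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon_weight * recon + gaussian_kl(mu, log_var)


# Made-up noisy input, reconstruction, and encoder outputs.
x = [1.0, -1.0, 0.0, 1.0]
x_hat = [0.9, -0.8, 0.1, 1.0]
mu, log_var = [0.5, -0.2], [0.0, 0.0]

full = weighted_vae_loss(x, x_hat, mu, log_var, recon_weight=1.0)
half = weighted_vae_loss(x, x_hat, mu, log_var, recon_weight=0.5)
print(full, half)
```

A smaller weight makes deviations from the noisy input cheaper relative to the KL term, which is the mechanism the paragraph above relies on.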

V PHY spoofing via supervised learning

In this section, we assume the adversary has access to true labels for each data sample (i.e., a noisy received NC-OFDM/OFDM signal) in the dataset and makes use of them in the training stage. The adversary trains fully connected DNNs to estimate transmission parameters. Similar to the unsupervised spoofing, we assume that the adversary builds up a dataset out of the samples of the received noisy signals in (2) to extract (complex) samples using (16). Fig. 9 illustrates two DNNs utilized for estimating transmission parameters, where the specifications of the DNN corresponding to the estimation of and are:

Fig. 9: Block diagrams of two DNNs used for estimating transmission parameters by the adversary.
  • Input: .

  • Output: Estimated total number of subcarriers and estimated subcarrier width, i.e., .

  • Architecture: Six fully connected layers, four of which are hidden layers with , , and neurons, respectively. The activation function for all the layers is chosen to be the rectified linear unit (ReLU) function.

  • Training: We minimize the -loss , where and are the true parameters. Learning rate is set to .

The properties of the DNN utilized to infer are described as follows:

  • Input: The same input vector described for the above DNN.

  • Output: Estimated subcarrier occupancy pattern vector .

  • Architecture: There are six fully connected layers, four of which are hidden layers. The numbers of neurons in the hidden layers are , , , and , respectively. The activation function for all the layers is chosen to be the ReLU function except for the output layer, where the sigmoid function is used. The adversary further converts the output values to ’s and ’s using a hard-thresholding function defined by

  • Training: The DNN is trained by minimizing where is the true subcarrier occupancy pattern.
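The output stage of the occupancy-pattern DNN (sigmoid followed by hard thresholding) can be sketched as follows. The cutoff of 0.5 is an assumption, since the defining expression is not reproduced here, and the pre-activation values are made up.

```python
import math


def sigmoid(v):
    """Logistic function applied at the output layer."""
    return 1.0 / (1.0 + math.exp(-v))


def hard_threshold(probabilities, cutoff=0.5):
    """Map soft outputs in (0, 1) to a binary occupancy vector (cutoff is assumed)."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Made-up output-layer pre-activations for four subcarriers.
pre_activations = [2.3, -1.7, 0.4, -0.2]
soft_outputs = [sigmoid(v) for v in pre_activations]
occupancy_estimate = hard_threshold(soft_outputs)
print(occupancy_estimate)
```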

We investigate the performance of the adversary when Tx is transmitting signals through different types of subcarrier occupancy patterns: 1) A single contiguous block of active subcarriers (OFDM signal). 2) NC-OFDM signal whose band allocation is illustrated in Fig. 10, where integer denotes the number of inactive subcarriers between active ones and integer denotes the length of a block of contiguous active subcarriers. We refer to it as Pattern 1 in the following. For training and test purposes, we generate signals of this type with in range and in range where the location of in the band is considered to be random. 3) NC-OFDM signal whose band allocation is illustrated in Fig. 11 and is referred to as Pattern 2. Here, there are two blocks of contiguous active subcarriers of length , which is in range , and three different interleaved factors , and , all belonging to the range . 4) NC-OFDM signal where the bands are allocated in a random fashion without any specific pattern. In other words, we assume the transmitter flips a coin to decide whether a subcarrier is active or inactive.

The adversary receives signals of one of the above types at a certain SNR (spoofing SNR) during the training stage and trains the aforementioned DNNs on the corresponding dataset (one of the four datasets of signals). Then, it utilizes the trained model in the test stage to estimate the transmission parameters and PHY spoofs an unknown signal that has the same occupancy pattern type as in the training dataset. In this way, we are able to study how the choices of occupancy patterns affect spoofing performance of the adversary. The number of signals in the training and test datasets for each case is set to and , respectively. As mentioned in Section II, since the Tx may change the transmission parameters (particularly ) while transmitting an NC-OFDM symbol, the Rx also needs to infer these parameters, which can be done via DNNs as described above for the adversary. This is a fair model as the only advantage Rx has over the adversary is a better receiving channel. As justified in Section II, BER at Rx (corresponding to decoding the bogus data) is used as the metric to measure the spoofing performance of the adversary. Next, we discuss how the idea of supervised learning can be utilized to solve the problem introduced in Example 1.

Fig. 10: Subcarrier occupancy pattern ; location of the contiguous block () is random.
Fig. 11: Subcarrier occupancy pattern ; locations of the contiguous blocks are random.

V-A Utilizing supervised learning to solve Example 1

We discussed in Section III how CAF-based analysis fails to infer the parameters of a simple NC-OFDM signal. Here, we revisit the problem described in Example 1 and utilize a DNN to solve it in a supervised manner. As the underlying dataset is simpler (there are only three different transmission cases) than that of the problem described in the previous section, we are able to solve the problem with a much simpler model. Specifically, we consider two architectures for the DNN. The first one has a hidden layer with neurons, and the second one has hidden layers of , and neurons per layer. The number of neurons at the input layer in both DNNs is set to . The properties of the DNN which estimates the set of parameters are as follows.

  • Input: Samples of the received signal (similar to the previous DNNs).

  • Output: Estimated parameters .

  • Training: We minimize the -loss , where are the true parameters.

As another approach, we consider a DNN which classifies between the three different transmission cases introduced in Table I. The specifications for this DNN are as follows.

  • Input: Samples of the received signal (similar to the previous DNNs).

  • Output: where denotes the estimated probability that the inputs correspond to case , and .

  • Training: We minimize the cross-entropy loss between and , i.e., where represents the true probability that the signal belongs to case .
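The cross-entropy criterion used by the three-case classifier can be sketched as follows. The softmax output layer is an assumption (the text only states that estimated probabilities are produced), and the logits are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(true_probs, est_probs, eps=1e-12):
    """Cross-entropy between the true and the estimated class distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(true_probs, est_probs))


# Made-up network scores for the three transmission cases; the true case is the first.
est = softmax([2.0, 0.5, -1.0])
true = [1.0, 0.0, 0.0]
loss = cross_entropy(true, est)
predicted_case = max(range(3), key=lambda c: est[c])
print(est, loss, predicted_case)
```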

For solving this problem, we consider training and test sets of size and , respectively, which consist of the signals corresponding to the three transmission cases in Table I at spoofing SNR of dB. The performances of these DNNs are presented in Section VII, showing that they are indeed capable of estimating the transmission parameters that CAF had failed to infer.

VI Disentanglement metrics and useful representations

Despite its importance, measuring the usefulness of the learned representation in a VAE model is not yet a well-studied subject. This is partly due to the context-dependent nature of the problem, which makes it challenging for researchers to reach a consensus on this matter. As mentioned in Section IV, disentanglement is one of the features that is particularly desirable for the learned representations in a variety of applications. In this section, we characterize the properties of a disentangled representation by utilizing/extending the ideas from ML literature, describe why they are useful, and devise appropriate metrics to measure the disentanglement performance in the case of NC-OFDM datasets. Recall that [24] assumes a finite number of generative factors are used to generate the data samples in the dataset, where each factor could represent notions like position, scale, rotation, etc., in the case of image samples, or the amount of power sent over distinct subcarriers for NC-OFDM/OFDM signals (see Section IV-A). Then, [24] calls the learned representations disentangled if changes in one of the generative factors of the data are mirrored by the encoder in Fig. 4 in exactly one of the latent variables (i.e., one dimension of the latent space). We say that such an encoder functions in a disentangled manner. By extending this definition, we call the learned representations disentangled if the decoder functions in a disentangled manner as well, i.e., it employs only one latent variable for each generative factor during the reconstruction process. In the remainder of this section, we discuss how to measure the disentanglement performance of the encoder and decoder in VAEs utilizing existing and newly proposed metrics.

In order to measure the performance of the encoder in achieving the disentangled representations, we will utilize the metrics proposed by Higgins et al. in [26] and by Kim et al. in [24]. The authors in [26] introduce a quantitative metric for measuring disentanglement assuming the generative factors of the dataset are known. This metric works as follows. Choose a data sample and obtain its corresponding representation via the encoder of a trained VAE. Then, generate data samples by fixing the value of a generative factor () while changing the values of the other factors uniformly at random in an interval (e.g. ). Next, find their corresponding learned representations , and use and the fixed factor’s index () as a training sample to train a linear classifier where is treated as label. Then, the correct classification rate achieved by this classifier on a test set (generated in the same way as the training set) is reported as the metric. A more stringent metric for disentanglement is proposed in [24], which works as follows. Choose a generative factor (), generate data with this factor fixed while the others’ values are changed randomly, and obtain their corresponding learned representations . Then, normalize each dimension of , by its empirical standard deviation computed over to obtain normalized representations . The empirical variance for an arbitrary real-valued vector of length is defined as

(18)

Then, compute the empirical variances of each dimension in , which are denoted by . Afterward, the index of the dimension with the lowest variance () and the fixed factor index provide one training sample for the classifier. The performance of this classifier (measured between and ) on the test set is reported as the final metric.
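The variance-based procedure from [24] described above can be sketched on a toy problem as follows. Everything concrete here is an assumption for illustration: the encoder is a hypothetical one whose latent variable d simply mirrors factor d, the population variance from Python's `statistics` module stands in for the empirical variance in (18), the classifier is a majority vote, and the accuracy is computed on the votes used to build it rather than on a separate test set.

```python
import random
import statistics
from collections import Counter, defaultdict

NUM_FACTORS = 3


def toy_encoder(factors):
    """Hypothetical perfectly disentangled encoder: latent d mirrors factor d."""
    return list(factors)


def lowest_variance_dim(representations, global_std):
    """Index of the normalized latent dimension that varies the least."""
    variances = []
    for d in range(len(global_std)):
        normalized = [r[d] / global_std[d] for r in representations]
        variances.append(statistics.pvariance(normalized))
    return min(range(len(variances)), key=lambda d: variances[d])


def majority_vote_accuracy(votes):
    """votes holds (lowest-variance dimension, fixed factor index) pairs."""
    table = defaultdict(Counter)
    for dim, factor in votes:
        table[dim][factor] += 1
    correct = sum(counter.most_common(1)[0][1] for counter in table.values())
    return correct / len(votes)


random.seed(0)
# Empirical standard deviation of each latent dimension over a reference set.
reference = [toy_encoder([random.uniform(-1, 1) for _ in range(NUM_FACTORS)])
             for _ in range(500)]
global_std = [statistics.pstdev([r[d] for r in reference]) for d in range(NUM_FACTORS)]

votes = []
for _ in range(60):
    fixed = random.randrange(NUM_FACTORS)
    value = random.uniform(-1, 1)
    batch = []
    for _ in range(20):
        factors = [random.uniform(-1, 1) for _ in range(NUM_FACTORS)]
        factors[fixed] = value  # this factor stays fixed within the batch
        batch.append(toy_encoder(factors))
    votes.append((lowest_variance_dim(batch, global_std), fixed))

accuracy = majority_vote_accuracy(votes)
print(accuracy)
```

For this idealized encoder the fixed factor always yields a zero-variance dimension, so every vote is correct; a real encoder would be scored on held-out votes.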

Next, we evaluate the performance of the VAE’s decoder in achieving disentangled representations utilizing the latent traversal method described in Section IV-A. In the case of disentangled representations, each latent variable governs a specific generative factor in the reconstructed sample through the decoder. Algorithm 1 shows the pseudo-code for our proposed metric, which outputs a decimal value between and . Specifically, this algorithm changes the value of the -th latent variable (corresponding to the input data sample ) in range with steps of while fixing the others, obtains the element-wise difference between FFT of the reconstructed samples and FFT of , and counts the number of subcarriers in which the magnitude of the difference is more than a predefined threshold . If there is only one subcarrier, it represents a disentangled latent variable and gets a perfect score (). If there is more than one, then gets . This is done for all the latent variables and different data samples . Then, the final metric equals the score averaged over all the data samples and the informative latent variables (whose number is denoted by in Alg. 1). We have used (larger values did not change the result noticeably) and two different values of , i.e., and in our experiments.

We consider a VAE to have learned disentangled representations if it achieves score for both Kim’s metric [24] and the proposed latent traversal metric (Alg. 1). Such a VAE is well-suited for a dataset of NC-OFDM signals where the amount of power is independent in different subcarriers. More specifically, as VAEs encode these powers in the latent space , the corresponding learned representation is expected to be disentangled. Therefore, designing a VAE model that captures this important property of the dataset enables one to find accurate representations and improve the performance of the unsupervised PHY spoofing described in Section IV.

Output: Disentanglement metric (out of ) and , the number of informative latent variables.
Input: A VAE model (, ), dataset , number of latent variables , traversing limit , traversing step , averaging factor , precision factor .
for  to  do
        Pick data point (), ,
        while  do
               mean of zero-vector,
               for  to  do
                      , zero-vector
                      mean of
                      Element-wise magnitude of , length of
                      if for
                     
                     
               end for
               Number of non-zero elements in
               if  then
                     
              else if  then
                      ,
                     
              
        end while
       
end for
Algorithm 1 Latent traversal for a dataset of NC-OFDM signals
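The latent traversal metric described in this section can be sketched in Python as follows. This is an illustrative reading of the prose description, not a transcription of Algorithm 1: the toy linear decoders, the parameter values, and the score of 0 assigned when more than one subcarrier changes are all assumptions, and the averaging over repeated traversals that the algorithm appears to perform is omitted.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (sufficient for short toy signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def make_toy_decoder(weights):
    """Hypothetical decoder: weights[k][i] is how latent i feeds subcarrier k."""
    num_bins = len(weights)

    def decode(z):
        power = [sum(w * v for w, v in zip(row, z)) for row in weights]
        return [sum(power[k] * cmath.exp(2j * cmath.pi * k * t / num_bins)
                    for k in range(num_bins)) / num_bins
                for t in range(num_bins)]

    return decode


def traversal_metric(decode, latent_samples, limit, step, threshold):
    """Return (average score, number of informative latent variables)."""
    num_steps = int(round(2 * limit / step))
    scores, informative = [], set()
    for z0 in latent_samples:
        reference = dft(decode(z0))
        for i in range(len(z0)):
            affected = set()
            for s in range(num_steps + 1):
                z = list(z0)
                z[i] = -limit + s * step  # traverse latent i, keep the others fixed
                spectrum = dft(decode(z))
                affected.update(k for k, (a, b) in enumerate(zip(spectrum, reference))
                                if abs(a - b) > threshold)
            if affected:  # latents that change nothing count as uninformative
                informative.add(i)
                scores.append(1.0 if len(affected) == 1 else 0.0)
    metric = sum(scores) / len(scores) if scores else 0.0
    return metric, len(informative)


samples = [[0.5, -0.5, 0.0], [1.0, 0.2, -0.3]]
disentangled = make_toy_decoder([[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]])
entangled = make_toy_decoder([[1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 0, 0]])

metric_dis, info_dis = traversal_metric(disentangled, samples, 2.0, 0.5, 0.1)
metric_ent, info_ent = traversal_metric(entangled, samples, 2.0, 0.5, 0.1)
print(metric_dis, info_dis, metric_ent, info_ent)
```

In the entangled toy decoder the first latent variable drives two subcarriers, so it scores zero while the second latent variable scores one, which halves the metric.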

VII Numerical examples

In this section, we first present the results of numerical simulations that characterize the performance of an adversary utilizing unsupervised and supervised learning algorithms for PHY spoofing as discussed in Sections IV and V. Then, we provide our results and insights on the disentanglement metrics presented in Section VI. We begin with investigating the performance of the supervised learning based on DNNs described in Section V-A for spoofing the system introduced in Example 1. This is demonstrated in Fig. 12 where two different DNN structures are considered for each classification and estimation scenario. Note that the -axis represents correct classification probability for classification curves, and -loss for the estimation problems on the test set. Specifically, it is shown that for the classification problem, even using one hidden layer enables us to identify the true class of the test signals after around training steps with very high probability. In each training step, mini-batches of size are chosen from the training set and the SGD algorithm with a learning rate of is applied to train the DNNs. Furthermore, if the adversary wishes to estimate the true parameters of the signal (i.e., ), average -losses as low as are achievable using a DNN estimator with only one hidden layer. We also observe that a higher number of layers results in better performance in both cases.

Fig. 12: Performance of DNNs described in Section V-A in solving Example 1.

Now, we consider the system model described in Section II where the corresponding PHY parameters between different communication parties follow the narrowband (NB)-IoT [11, 28] standard that is widely used in the existing IoT solutions [11]. Specifically, the transmission is assumed to take place over an NC-OFDM scheme with kHz and utilizing BPSK modulation with random subcarrier occupancy pattern using the total number of subcarriers and , which give rise to and distinct subcarrier occupancy patterns, respectively (see Section IV-A). The adversary overhears the transmissions at a certain spoofing SNR and builds up a dataset out of the received noisy signals where complex samples are collected from each signal. The sizes of the training and test datasets are set to and , respectively. Fig. 13 illustrates the spoofing performance for several supervised and unsupervised learning algorithms. For the unsupervised cases, we assume the adversary is utilizing the FactorVAE model with . Then, it infers the total number of subcarriers and the corresponding latent variable for each subcarrier via latent traversal as described in Section IV. During the test stage, it obtains the corresponding learned representation (mean of ) for a test signal and decides whether a subcarrier is active or inactive as discussed in Section IV. For the supervised cases, the specifications of the DNNs are presented in Section V. Fig. 13 shows that the supervised algorithm offers better spoofing performance in general as it relies on the ground truth labels. Note also that the higher number of subcarriers ( vs. ) negatively affects the performance of the adversary in both cases. Also, as the spoofing SNR increases, spoofing performance improves as well. Specifically, if the spoofing SNR is dB, the supervised spoofing performance can get very close to the baseline transmission (the best spoofing performance). Furthermore, it is shown that the adversary can spoof the signal to a certain degree via VAEs if the spoofing SNR is high (e.g., dB). We note that the VAE’s performance does not rely on any supervision or labels about the signals in the dataset, and it can be improved upon by having partial availability of the labels and further processing on the learned representations, which calls for a semi-supervised learning algorithm. This needs further investigation and is left for future studies. We note that, as mentioned in Section II, Rx also needs to estimate these parameters in order to be able to decode the legitimate data sent by Tx, which is done utilizing the supervised DNNs described above for the adversary. The SNR of the Tx-Rx channel is assumed to be higher than the spoofing SNRs in order to ensure Rx is able to estimate the parameters before the adversary sends bogus data to Rx. Specifically, the SNR of the Tx-Rx channel is assumed to be and dB for and in Fig. 13, respectively, which enables the Rx to successfully estimate transmission parameters using DNNs.

Fig. 13: The BER at the receiver while decoding the bogus data sent by the adversary which spoofs the signals with at different SNRs via unsupervised (VAE) or supervised learning. The baseline transmission represents the performance of the legitimate Tx and Rx.
Fig. 14: The BER at the receiver while decoding the bogus data sent by the adversary which utilizes supervised learning for spoofing the NC-OFDM signals with and at different SNRs.

Next, we consider an NC-OFDM transmission with a higher number of subcarriers where -QAM modulation is utilized by the Tx. We investigate the performance of the adversary utilizing a supervised learning algorithm for spoofing in such a system, and assume it collects samples from each received signal based on (16) at different spoofing SNRs. As described in Section V, we consider different occupancy patterns (OFDM, Pattern 1 in Fig. 10, Pattern 2 in Fig. 11 and random allocation) to investigate their resilience against DNN PHY spoofing. Fig. 14 demonstrates the performance of the adversary in different scenarios based on different spoofing SNRs and occupancy pattern vectors. For the OFDM scenario, the adversary is able to infer the occupancy pattern without error and can achieve the same performance as the baseline transmission. For the Pattern 1 NC-OFDM signal, the DNNs are trained and tested at different spoofing SNRs of , and dB, where, as expected, the higher spoofing SNRs result in better spoofing performance. Compared to the baseline transmission, there is a gap in the performance since the parameters are estimated with error. The baseline transmission corresponds to the case where the parameters are fully known at the Rx and can be achieved in this case when the Tx-Rx SNR is dB. For Pattern 2, one can see that the performance is worse in comparison to Pattern 1 as the Tx is using a more complex subcarrier occupancy pattern with a higher number of distinct band allocations. Also, the Tx-Rx SNR is dB in this case. Therefore, one can conclude that Pattern 2 NC-OFDM signals are more difficult to spoof. For the random occupancy case, it is shown that BER is at spoofing SNR of dB for a wide range of ’s corresponding to the adversary-Rx channel, which indicates that the adversary is unable to spoof such signals at SNR dB (or lower) because of the high estimation errors at the output of the DNNs. As a result, the PHY is considered to be secure when the spoofing SNR is lower than dB while the Tx-Rx SNR is set to dB in order to ensure reliable communication in the meantime. Fig. 14 also demonstrates the spoofing performance for the case of random occupancy patterns with . Due to the larger number of possible subcarrier occupancy patterns, the Rx is unable to estimate the transmission parameters via the Tx-Rx channel correctly even when the Tx-Rx SNR dB for this case, which precludes achieving reliable communication. In other words, although this scheme prevents the adversary from PHY spoofing, it also fails to ensure reliable communication for legitimate Tx and Rx pairs.

Next, we have evaluated the metrics described in Section VI in Table II for different VAE models trained on a dataset of NC-OFDM signals with and random subcarrier occupancy patterns where BPSK modulation with is used. The number of latent variables is set to and the encoder/decoder is modeled with feed-forward DNNs described in Section IV-A. Table II shows that the performance of the encoder and decoder in terms of disentanglement may vary greatly. In other words, the metrics in [26, 24] do not guarantee a disentangled representation in the sense of latent traversal. For example, for the DIP-VAE models described in Section IV (with parameters and ), one can see that although they achieve perfect scores () for the metrics proposed in [26, 24], they perform poorly on the metric based on the latent traversal. This is illustrated in Figs. 15 and 16. It is clear that the learned representation cannot be considered disentangled since changing one latent variable affects the values of the others. Reconstruction error is also reported in Table II, which represents the ability of the VAE model to reconstruct the input signal , and is defined as (element-wise subtraction) where denotes the dimension of .

Fig. 15: FFT of the reconstructed signals in DIP-VAE () obtained under latent traversal. denotes the fixed latent variable. X-axis denotes FFT bin index, and y-axis is the magnitude of FFT.
Fig. 16: FFT of the reconstructed signals in DIP-VAE model () obtained under latent traversal. X-axis denotes FFT bin index, and y-axis is the magnitude of FFT.
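Method | Parameters | Higgins’ metric [26] | Kim’s metric [24] | Alg. 1 metric () | Alg. 1 metric () | Reconst. error
β-VAE [26] | | 100 | 100 | 98.25 | 100 | 6.5e-2
β-VAE | | 99 | 92 | 99.75 | 100 | 8.3e-2
DIP-VAE [22] | | | | | |
DIP-VAE | | | | | |
DIP-VAE | | | | | |
Factor-VAE [24] | | | | | |
Factor-VAE | | | | | |
Factor-VAE | | | | | |
InfoVAE [25] | , | | | | |
InfoVAE | , | | | | |
InfoVAE | , | | | | |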
TABLE II: Disentanglement metrics and reconstruction loss computed for different VAE models

As discussed in Section VI, we say a VAE model results in perfectly disentangled representations if it achieves perfect scores for both Kim’s metric [24] and the latent traversal metric (Alg. 1). There, we also argued how such a VAE is particularly relevant for PHY spoofing NC-OFDM/OFDM signals where the amount of power is independent in different subcarriers. Specifically, Kim’s metric ensures that a specific learned representation (by the encoder) corresponding to a generative factor is not affected by the changes in other generative factors. The latent traversal metric implies that the learned representations corresponding to different generative factors are disjoint. This can be seen by comparing the learned representations for two different VAEs: DIP-VAE with () and FactorVAE with whose latent traversal performance is demonstrated in Figs. 16 and 8, respectively. First, it should be noted that only in the FactorVAE model, which achieves perfectly disentangled representations, does a latent variable correspond to exactly one subcarrier via the decoder. For these VAEs, we have also illustrated the learned representations corresponding to two different latent variables via a 2-D space in Figs. 17 and 18. The VAE’s decoder maps each point in this space to a signal whose corresponding power in two different subcarriers follows the values of these two latent variables. Both models achieve perfect scores on Kim’s metric [24]. However, one can see that the learned representations for the case of FactorVAE in Fig. 17 are disjoint while they overlap for DIP-VAE in Fig. 18. In fact, this is the reason why DIP-VAE in Fig. 16 gets a low score on the latent traversal metric. By changing the value of one of the latent variables (e.g. ) at a time, a signal can be generated whose power changes in more than one subcarrier ( and ). On the other hand, in a perfectly disentangled latent space, the learned representations that get mapped to signals with different power levels in each subcarrier form a disjoint region as illustrated in Fig. 17. Specifically, one can see that the representations corresponding to a signal with specific power levels in different subcarriers form a region in space. Having disjoint representations facilitates downstream learning tasks like classification as the data samples generated by different generative factors get mapped to distinct regions in the latent space.

Fig. 17: The learned representations and for FactorVAE model () which control the amount of power in subcarriers and ( and ) of the reconstructed signal, respectively, .
Fig. 18: The learned representations and for DIP-VAE model () which control the amount of power in subcarriers and ( and ) of the reconstructed signal, respectively,

VIII Conclusions

IoT devices have a limited amount of power and processing capabilities, which precludes implementing a full-blown security protocol stack for them. This makes IoT devices, which widely rely on multicarrier communications in the PHY layer, susceptible to PHY spoofing attacks. We have investigated the PHY robustness of NC-OFDM/OFDM systems against an adversary equipped with machine learning tools. Specifically, we have assumed the adversary employs supervised and unsupervised learning algorithms to infer some of the NC-OFDM transmission parameters and physically spoof the system. The proposed unsupervised algorithm utilizes VAEs for spoofing, which can infer important spectral information about the NC-OFDM/OFDM signals. Furthermore, we have characterized the properties of the representations learned by the VAEs and proposed a new metric to evaluate their performance. Numerical results demonstrate that the PHY spoofing performance highly depends on the subcarrier occupancy pattern used by the transmitter. Specifically, the results suggest that the transmitter should randomize the selection of the active subcarriers in order to impede PHY spoofing attacks utilizing DNNs.

References

  • [1] Y. Zou, J. Zhu, X. Wang, and L. Hanzo, “A survey on wireless security: Technical challenges, recent advances, and future trends,” Proceedings of the IEEE, vol. 104, no. 9, pp. 1727–1765, Sep. 2016.
  • [2] J. Russell and R. Cohn, Non-Access Stratum.   Book on Demand, 2012. [Online]. Available: https://books.google.com/books?id=9XDGMgEACAAJ
  • [3] P. L. Yu, J. S. Baras, and B. M. Sadler, “Physical-layer authentication,” IEEE Transactions on Information Forensics and Security, vol. 3, no. 1, pp. 38–51, March 2008.
  • [4] S. Rezaei Aghdam, A. Nooraiepour, and T. M. Duman, “An overview of physical layer security with finite-alphabet signaling,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1829–1850, Second Quarter 2019.
  • [5] T. Daniels, M. Mina, and S. F. Russell, “A signal fingerprinting paradigm for general physical layer and sensor network security and assurance,” in Proc. First International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM’05), Sep. 2005, pp. 219–221.
  • [6] S. Misra, A. Ghosh, A. P. S. P., and M. S. Obaidat, “Detection of identity-based attacks in wireless sensor networks using signalprints,” in Proc. 2010 IEEE Int’l Conference on Green Computing and Communications, Dec 2010, pp. 35–41.
  • [7] Y. Chen, W. Trappe, and R. P. Martin, “Detecting and localizing wireless spoofing attacks,” in Proc. 2007 4th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, June 2007, pp. 193–202.
  • [8] L. Xiao, L. Greenstein, N. Mandayam, and W. Trappe, “Fingerprints in the ether: Using the physical layer for wireless authentication,” in Proc. 2007 IEEE International Conference on Communications, June 2007, pp. 4646–4651.
  • [9] L. Xiao, L. J. Greenstein, N. B. Mandayam, and W. Trappe, “Channel-based spoofing detection in frequency-selective Rayleigh channels,” IEEE Transactions on Wireless Communications, vol. 8, no. 12, pp. 5948–5956, December 2009.
  • [10] L. Xiao, A. Reznik, W. Trappe, C. Ye, Y. Shah, and N. Mandayam, “PHY-authentication protocol for spoofing detection in wireless networks,” in Proc. 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, Dec 2010, pp. 1–6.
  • [11] R. S. Sinha, Y. Wei, and S.-H. Hwang, “A survey on LPWA technology: LoRa and NB-IoT,” ICT Express, pp. 14–21, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2405959517300061
  • [12] R. Rajbanshi, A. M. Wyglinski, and G. J. Minden, “An efficient implementation of NC-OFDM transceivers for cognitive radios,” in Proc. 2006 1st International Conference on Cognitive Radio Oriented Wireless Networks and Communications, June 2006, pp. 1–5.
  • [13] G. Sridharan, R. Kumbhkar, N. B. Mandayam, I. Seskar, and S. Kompella, “Physical-layer security of NC-OFDM-based systems,” in Proc. 2016 IEEE Military Communications Conference, Nov 2016, pp. 1101–1106.
  • [14] A. Nooraiepour, K. Hamidouche, W. U. Bajwa, and N. Mandayam, “How secure are multicarrier communication systems against signal exploitation attacks?” in Proc. 2018 IEEE Military Communications Conference (MILCOM), Oct 2018, pp. 201–206.
  • [15] A. Punchihewa, Q. Zhang, O. A. Dobre, C. Spooner, S. Rajan, and R. Inkol, “On the cyclostationarity of OFDM and single carrier linearly digitally modulated signals in time dispersive channels: Theoretical developments and application,” IEEE Transactions on Wireless Communications, vol. 9, no. 8, pp. 2588–2599, August 2010.
  • [16] M. Bouanen, F. Gagnon, G. Kaddoum, D. Couillard, and C. Thibeault, “An LPI design for secure OFDM systems,” in Proc. 2012 IEEE Military Communications Conference, Oct 2012, pp. 1–6.
  • [17] Z. E. Ankaralı, M. Karabacak, and H. Arslan, “Cyclic feature concealing CP selection for physical layer security,” in Proc. 2014 IEEE Military Communications Conference, Oct 2014, pp. 485–489.
  • [18] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, Dec 2017.
  • [19] H. Ye, G. Y. Li, and B. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, Feb 2018.
  • [20] S. Dörner, S. Cammerer, J. Hoydis, and S. T. Brink, “Deep learning based communication over the air,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 132–143, Feb 2018.
  • [21] M. Z. Hameed, A. Gyorgy, and D. Gunduz, “Communication without interception: Defense against deep-learning-based modulation detection,” Feb. 2019, Available: https://arxiv.org/abs/1902.10674/.
  • [22] A. Kumar, P. Sattigeri, and A. Balakrishnan, “Variational inference of disentangled latent concepts from unlabeled observations,” 2018. [Online]. Available: https://openreview.net/forum?id=H1kG7GZAW
  • [23] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in ICLR, 2014.
  • [24] H. Kim and A. Mnih, “Disentangling by factorising,” in Proc. of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80.   Stockholmsmässan, Stockholm Sweden: PMLR, 10–15 Jul 2018, pp. 2649–2658. [Online]. Available: http://proceedings.mlr.press/v80/kim18b.html
  • [25] S. Zhao, J. Song, and S. Ermon, “InfoVAE: Information maximizing variational autoencoders,” arXiv:1706.02262, 2017.
  • [26] C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner, “Understanding disentangling in β-VAE,” CoRR, vol. abs/1804.03599, 2018.
  • [27] X. Nguyen, M. J. Wainwright, and M. I. Jordan, “Estimating divergence functionals and the likelihood ratio by convex risk minimization,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5847–5861, Nov 2010.
  • [28] J. S. D. Rohde, “Narrowband internet of things,” Aug. 2016, Available: https://www.rohde-schwarz.com/us/applications/narrowband-internet-of-things-application-note_56280-314242.html/.