Analysis of detected radio signals enables classification of the communication technology and modulation scheme employed by the emitting source; this information helps optimize spectrum allocation, mitigate radio interference, support wireless environment analysis, and improve communication efficiency. However, the growing number of emitter types and interference sources, as well as temporal variations in the effects of the wireless environment on transmitted signals, make accurate inference of communication schemes and emitter types computationally challenging.
Existing methods for modulation and technology classification can be organized into two sub-groups: likelihood-based and feature-based. Likelihood-based methods make a decision by evaluating a likelihood function of the received signal and comparing the likelihood ratio with a pre-defined threshold. Although likelihood-based classifiers are optimal in the sense that they minimize the probability of false classification, they suffer from high computational complexity.
Feature-based approaches, on the other hand, are relatively simple to implement and may achieve near-optimal performance, but their features and decision criteria need to be carefully designed. Such methods rely on expert features including cyclic moments and their variations, and spectral correlation functions of analog and digitally modulated signals [7, 6]; other work describes novel decision criteria that utilize pre-existing expert features.
Another approach facilitates classification via a multilayer perceptron that relies on spectral correlation functions. Expert systems have been shown to achieve high accuracy on certain specialized tasks but may be challenging to apply in general settings since the crafted features may not fully capture all real-world channel effects. As an alternative, deep learning based methods that learn directly from the received signals have recently been proposed. In particular,
one approach utilizes a convolutional neural network (CNN) that operates on in-phase and quadrature-phase (IQ) data and outperforms methods based on expert features.
Another combines a CNN and long short-term memory (LSTM) [11, 8] to further improve classification accuracy. A related method utilizes an LSTM operating on amplitude and phase data obtained by transforming the IQ data, outperforming earlier models. Further work proposes two classification models, one adapting the Visual Geometry Group (VGG) architecture principles to a 1D CNN and the other utilizing the ideas of deep residual networks (RNs). Note that while spectrum monitoring devices are capable of acquiring detailed IQ components of a wireless signal, storing such data on distributed sensing devices or transmitting it to a cloud or edge device for processing is often infeasible due to resource constraints. To this end, distributed spectrum monitoring systems such as Electrosense formulate the technology detection task as classification that uses more compact power spectral density (PSD) data as features. Note, however, that the aforementioned deep learning architectures are infeasible for use in distributed settings and on low-cost computational platforms. More details on practical aspects of RF acquisition can be found in the literature.
In this paper, we propose a new learning framework for both the modulation and technology classification problems based on an LSTM denoising auto-encoder. The framework aims to estimate the posterior probabilities of the modulation or technology types using the time-domain amplitude and phase of a radio signal. Auto-encoders learn a low-dimensional representation of data in an unsupervised manner; more specifically, they attempt to perform dimensionality reduction while robustly capturing the essential content of high-dimensional data. Typically, auto-encoders consist of two blocks: an encoder and a decoder. The encoder converts input data into so-called codes while the decoder reconstructs the input from the codes. The act of copying the input data to the output would be of little interest without an important additional constraint, namely that the dimension of the codes is smaller than the dimension of the input. This constraint enables auto-encoders to extract salient features of the input data. A denoising auto-encoder (DAE) can help extract stable and robust features by introducing noise corruption to the input signal. In our proposed framework, the received radio signals are first partially corrupted; the framework then recovers the destroyed signals, simultaneously learning stable and robust low-dimensional signal representations and classifying the signals based on the learned features.
Our main contributions are summarized as follows:
We propose a new learning framework which uses amplitude and phase data for modulation classification; the framework is based on an LSTM denoising auto-encoder and achieves state-of-the-art modulation classification accuracy.
We extend the proposed framework to technology classification using power spectral density data.
The proposed framework achieves significantly higher top-1 classification accuracy while having a much simpler structure than the existing models. This enables real-time modulation and/or technology classification on compact and affordable computational devices, as we demonstrate using Raspberry Pi platforms.
II-A Problem Formulation
Let x = (x_1, …, x_N) denote a sequence of d-dimensional features characterizing N samples of the received radio signal, sampled starting at time t_0. The goal of modulation (technology) classification is to identify the modulation (technology) type of the radio signal among K classes by estimating the posterior probability P(c_k | x), where c_k denotes the k-th class and one of the K classes is the true class of the signal.
For modulation classification, the features are the IQ components of the sampled signal. Figure 1
shows examples of the IQ components for 11 different modulation types found in the RadioML2016.10A dataset at a fixed signal-to-noise ratio (SNR). Although there are differences between the IQ components, it is challenging even for a domain expert to distinguish between them due to pulse shaping, distortion and other channel effects.
For technology classification, the spectrum of interest is scanned by selecting a candidate carrier frequency
in discrete increments, and for each such frequency a fast Fourier transform (FFT) of the received signal demodulated into baseband is computed. The average values of the FFT coefficients computed for each candidate frequency are then concatenated to form the sequence of features used to perform the classification task. For the Electrosense data that we analyze in this paper, the scanning is performed with a fixed resolution in MHz. Figure 2 shows an example of wireless magnitude spectrum data from one of the Electrosense sensors.
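The feature-construction step above can be sketched in a few lines of numpy. This is an illustrative sketch of our understanding, not the Electrosense pipeline itself; the function name, the FFT size, and the framing scheme are all assumptions.

```python
import numpy as np

def psd_features(iq_segments, fft_size=256):
    """Sketch: average FFT magnitudes per candidate carrier frequency,
    then concatenate into one feature vector. `iq_segments` is a list of
    complex baseband arrays, one per scanned carrier frequency
    (names and fft_size are illustrative assumptions)."""
    features = []
    for seg in iq_segments:
        # Split the segment into FFT-sized frames and average |FFT| over frames.
        n_frames = len(seg) // fft_size
        frames = seg[: n_frames * fft_size].reshape(n_frames, fft_size)
        avg_mag = np.abs(np.fft.fft(frames, axis=1)).mean(axis=0)
        features.append(avg_mag)
    # Concatenate the per-frequency averages into the PSD feature sequence.
    return np.concatenate(features)

rng = np.random.default_rng(0)
segments = [rng.normal(size=1024) + 1j * rng.normal(size=1024) for _ in range(4)]
feats = psd_features(segments)
print(feats.shape)  # (1024,) = 4 carriers x 256 FFT bins
```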
To characterize the performance of modulation and technology classification methods, we rely on the top-1 classification accuracy as a function of SNR, the confusion matrix, the time and space complexity in terms of the number of trainable parameters and model size, and the testing time on a Raspberry Pi.
II-B An LSTM Denoising Auto-Encoder
In this section, we describe the design of our proposed classifier based on a denoising auto-encoder and recurrent neural networks. Instead of using the IQ components directly, for modulation classification we rely on the L2-normalized amplitude and the normalized phase (in radians, scaled to fall between -1 and 1); such normalization benefits the learning of temporal dependencies.
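The normalization just described can be sketched as follows (a sketch of our understanding of the preprocessing, not the authors' exact code; the function name is ours).

```python
import numpy as np

def amplitude_phase(iq):
    """Convert complex IQ samples into the normalized amplitude/phase
    representation: L2-normalized amplitude and phase scaled to [-1, 1]."""
    amp = np.abs(iq)
    amp = amp / np.linalg.norm(amp)          # L2-normalize the amplitude
    phase = np.angle(iq) / np.pi             # map phase from [-pi, pi] to [-1, 1]
    return np.stack([amp, phase], axis=-1)   # shape: (N, 2) time series

rng = np.random.default_rng(1)
x = rng.normal(size=128) + 1j * rng.normal(size=128)
ap = amplitude_phase(x)
assert np.isclose(np.linalg.norm(ap[:, 0]), 1.0)
assert ap[:, 1].min() >= -1.0 and ap[:, 1].max() <= 1.0
```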
A sampled radio signal results in a time series, and an LSTM is utilized to efficiently capture the temporal structure of such a series. Figure 3 shows the structure of an LSTM cell with a forget gate. The input gate i_t, output gate o_t and forget gate f_t can be expressed respectively as

i_t = σ(W_i x_t + U_i h_{t-1} + b_i),  o_t = σ(W_o x_t + U_o h_{t-1} + b_o),  f_t = σ(W_f x_t + U_f h_{t-1} + b_f),

while the cell state vector c_t and hidden state vector h_t are defined as

c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c),  h_t = o_t ⊙ tanh(c_t),

where σ(·) denotes the sigmoid function (i.e., σ(z) = 1/(1 + e^{-z})), ⊙ denotes the element-wise product, W denotes a weight matrix for the input time series, U is a weight matrix for the hidden state vector, and b represents a bias vector.
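The gate equations above can be sketched in plain numpy. This is a didactic implementation of a standard LSTM cell with a forget gate; the stacked-gate parameter layout and the dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell with a forget gate. W, U, b hold
    stacked parameters for the input, forget, output and candidate
    transformations (a didactic sketch)."""
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                # all four pre-activations at once
    i = sigmoid(z[0:d])                         # input gate
    f = sigmoid(z[d:2 * d])                     # forget gate
    o = sigmoid(z[2 * d:3 * d])                 # output gate
    g = np.tanh(z[3 * d:4 * d])                 # candidate cell update
    c = f * c_prev + i * g                      # new cell state
    h = o * np.tanh(c)                          # new hidden state
    return h, c

rng = np.random.default_rng(2)
d_in, d_h = 2, 8
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(5):                              # run a few steps on random input
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
print(h.shape)  # (8,)
```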
The denoising auto-encoder corrupts the signal by randomly setting a portion of the samples of x to zero, thus obtaining a partially destroyed signal x̃. The partially destroyed signal is fed to the auto-encoder for training while the original signal is utilized for testing.
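The corruption step can be sketched as follows. Masking whole time steps with zeros at a 10% rate matches our reading of the experimental setup; the function name and signature are ours.

```python
import numpy as np

def corrupt(x, mask_fraction=0.1, rng=None):
    """Zero out a random fraction of the input time steps, as in a
    denoising auto-encoder (zero-masking at a 10% rate is assumed)."""
    rng = rng or np.random.default_rng()
    x_tilde = x.copy()
    n_mask = int(mask_fraction * x.shape[0])
    idx = rng.choice(x.shape[0], size=n_mask, replace=False)
    x_tilde[idx] = 0.0                          # destroy the selected samples
    return x_tilde

rng = np.random.default_rng(3)
x = rng.normal(size=(128, 2))                   # amplitude/phase time series
x_tilde = corrupt(x, 0.1, rng)
print(int((x_tilde == 0).all(axis=1).sum()))    # 12 fully masked time steps
```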
Figure 4 shows the LSTM denoising auto-encoder classifier. The classifier is connected to the last hidden state vector and consists of 3 fully connected layers followed by a softmax function; the softmax converts the output of the final fully connected layer into estimates of the class probabilities, with the k-th output interpreted as the probability of predicting the k-th class. The LSTM encodes the corrupted input signal into hidden state vectors, while a shared fully connected layer operates as a decoder, i.e.,
x̂_t = W_d h_t + b_d,

where x̂_t is the recovered sample, W_d denotes the weight matrix of the decoder and b_d is the bias vector of the shared fully connected layer. Note that we break the symmetry of the architecture by using a shared fully connected layer for the decoder since doing so reduces computational complexity.
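The two heads described above can be sketched in numpy. This is a didactic sketch under stated assumptions: the ReLU activation, the use of two hidden classifier layers instead of the paper's three, and all layer widths are ours.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical dimensions: d_h hidden units, K classes, d_x features per sample.
rng = np.random.default_rng(4)
d_h, K, d_x, N = 32, 11, 2, 128
H = rng.normal(size=(N, d_h))                    # LSTM hidden states h_1..h_N

# Classifier: fully connected layers on the LAST hidden state, then softmax.
W1, b1 = rng.normal(size=(64, d_h)), np.zeros(64)
W2, b2 = rng.normal(size=(K, 64)), np.zeros(K)
logits = W2 @ np.maximum(W1 @ H[-1] + b1, 0.0) + b2   # ReLU is an assumption
p = softmax(logits)                               # class posterior estimates

# Decoder: ONE shared fully connected layer applied at every time step,
# mapping each hidden state back to a reconstructed sample.
Wd, bd = rng.normal(size=(d_x, d_h)), np.zeros(d_x)
X_hat = H @ Wd.T + bd                             # shape (N, d_x)

assert np.isclose(p.sum(), 1.0) and X_hat.shape == (N, d_x)
```

Sharing a single decoder matrix Wd across time steps is what makes the decoder so cheap relative to a symmetric LSTM decoder.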
Therefore, the loss function of the network consists of the reconstruction loss, L_R, and the classification loss, L_C. The final loss function is a weighted combination of these two terms, i.e.,

L = L_R + λ L_C,

where λ is a hyperparameter balancing L_R and L_C. It is worth pointing out that a small λ eliminates the effects of the classification layers while a large λ distorts the learned representation of the data. We set the value of λ to 0.1 to promote extraction of reliable low-dimensional representations of the original signals and thus enable efficient classification with a reduced dimensionality of the hidden state of the LSTM cell. This allows the proposed model to achieve higher classification accuracy at significantly reduced computational complexity. The reconstruction loss is defined to be the mean-squared error (MSE) and can be expressed as
L_R = (1/N) Σ_{t=1}^{N} (x_t − x̂_t)²,

while the classification loss is defined to be the categorical cross entropy

L_C = − Σ_{k=1}^{K} y_k log p_k,

where p_k is the predicted probability of the k-th class, and y_k = 1 if x belongs to the k-th class and y_k = 0 otherwise.
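The combined loss can be computed as follows (a sketch; the exact weighting convention L = L_R + λ·L_C and the function name are our assumptions, with λ = 0.1 taken from the text).

```python
import numpy as np

def total_loss(x, x_hat, p, true_class, lam=0.1):
    """Weighted combination of reconstruction MSE and categorical cross
    entropy, with lambda = 0.1 as in the text."""
    rec = np.mean((x - x_hat) ** 2)              # reconstruction loss L_R
    cls = -np.log(p[true_class] + 1e-12)         # cross entropy L_C (one-hot target)
    return rec + lam * cls

x = np.zeros((4, 2))
x_hat = np.ones((4, 2))
p = np.array([0.7, 0.2, 0.1])
loss = total_loss(x, x_hat, p, true_class=0)
print(round(loss, 4))  # 1.0357 = 1.0 (MSE) + 0.1 * (-log 0.7)
```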
II-C Model Parameters
For both tasks, we rely on the Adam optimizer
since it helps avoid local optima. The dimensionality of the hidden states of the LSTM in our denoising auto-encoder is set to a small value; note that prior LSTM-based methods require more than 128 hidden states to achieve the desired level of accuracy, and their classification accuracy otherwise deteriorates significantly. The numbers of nodes in the dense layer of the decoder and in the fully connected layers of the classifier are set accordingly. The learning rate, the number of epochs, and the dropout rate for the LSTMs and fully connected layers are fixed across experiments; a randomly selected 10% of the entries of the input signal in the training data are masked by zeros. The models are implemented on a computer with a 3.70GHz Intel i7-8700K processor, two NVIDIA GeForce GTX 1080Ti graphics cards and 32GB of RAM. A minibatch size of 128 is utilized. The parameter λ controlling how the reconstruction loss and classification loss are combined is set to 0.1.
III-A Performance Comparison on RadioML2016.10A
We first evaluate performance of the proposed model for modulation classification on the realistic RadioML2016.10A dataset. The RadioML data (https://www.deepsig.io/datasets) includes a series of synthetic and over-the-air modulation classification sets created by DeepSig Inc. Among them, RadioML2016.10A has been particularly widely used for benchmark testing [17, 22, 19]. Radio channel effects including time delay, time scaling, phase rotation, frequency offset and additive thermal noise are accounted for to emulate practical radio communications. The set contains data for 11 modulation schemes (8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK and WBFM). The SNR ranges from -20dB to 18dB in 2dB steps; samples are evenly distributed across SNRs, resulting in a total of 220k samples. The sample length is 128. We used 50%, 25% and 25% of the dataset for training, validation and testing, respectively.
The two-layer LSTM denoising auto-encoder is trained on SNRs ranging from -20dB to 18dB. The top-1 classification accuracy of CNN, CLDNN, LSTM and our proposed model is shown in Figure 5 and Table I. The classification accuracy computed over all SNRs achieved by CNN, CLDNN, LSTM and our proposed model is 51.29%, 56.78%, 60.49% and 61.72%, respectively. It is worth pointing out that training with noise increases the classification accuracy (computed across all SNRs) by 1.1% as compared to training with the original signal. The auto-encoder enables extraction of stable low-dimensional features with a significantly reduced dimension of the hidden LSTM states, and thus contributes both to the improvement in classification accuracy and to the reduction of computational complexity. The average classification accuracy for SNRs ranging from 0dB to 18dB achieved by CNN, CLDNN, LSTM and our proposed model is 73.9%, 83.31%, 90.26% and 91.55%, respectively. The proposed model outperforms the selected models for almost all SNRs. The competing methods were executed with their default settings, i.e., we used the same hyperparameters as their authors did when running them on the RadioML2016.10A dataset. The results in Figure 5 are averaged over 10 experiments, with model parameters initialized at the beginning of each experiment. Note that the benchmarking results we obtained for the pre-existing methods closely match those reported previously. It is worth pointing out that our proposed model significantly outperforms CLDNN and CNN in high-SNR regimes, while marginally outperforming LSTM in terms of top-1 classification accuracy. Figures 6-8 illustrate the confusion matrices, in the experiment that achieved the highest overall top-1 classification accuracy, for the proposed model and LSTM at SNRs of 18dB, 0dB and -4dB.
For the SNR of 18dB, the diagonal is much sharper, though there is some confusion in separating AM-DSB from WBFM signals, mainly due to the silence periods of the audio. Similarly to Figure 7, there are some difficulties in separating AM-DSB and WBFM at the SNR of 0dB. In addition, there is some confusion between QAM16 and QAM64 since the QAM16 constellation is a subset of the QAM64 constellation. It is also worth mentioning that the proposed model performs better on AM-SSB signals and at distinguishing QAM64 from QAM16 signals at high SNRs. As shown in Figure 8, it becomes much more difficult to distinguish the signals at low SNRs.
[Table II: number of trainable parameters, FLOPs and memory footprint of each model. Table III: mean classifications per second, Raspberry Pi 4: 241, 42, 45, 127; Raspberry Pi 3: 119, 19, 20, 45.]
Next, we compare the considered models in terms of the number of trainable parameters, the number of floating point operations (FLOPs), the memory cost, and the number of classifications per second on different computational platforms. As shown in Table II, the proposed model has the smallest number of trainable parameters and requires the fewest FLOPs and the least memory. Table III shows that, in terms of the number of classifications per second, the proposed model is on average considerably faster than LSTM, CLDNN and CNN on Raspberry Pi 4, and likewise on Raspberry Pi 3. The mean and standard deviation of the number of classifications per second are averaged over 10 experiments. Note that the complexity of the existing methods cannot be reduced without causing severe deterioration of their classification accuracy.
III-B Performance Comparison on RadioML2018.01A
We next evaluate performance of the proposed model on the modulation classification task using the realistic over-the-air RadioML2018.01A data, which accounts for radio channel effects including carrier frequency offset, symbol rate offset, delay spread and thermal noise. Signals from the so-called Normal Classes, commonly seen in impaired environments and comprising OOK, 4ASK, BPSK, QPSK, 8PSK, 16QAM, AM-SSB-SC, AM-DSB-SC, FM, GMSK and OQPSK, are utilized. The data thus contains 11 modulations; the SNR ranges from -20dB to 30dB in 2dB steps. For each SNR and modulation scheme there are 4096 samples, leading to about 1.17M samples in total. The sample length is 1024 and each sample is composed of IQ components. 50%, 25% and 25% of the entire dataset are used for training, validation and testing, respectively.
Table IV shows the mean and standard deviation of the top-1 classification accuracy over the range of SNRs for a number of models, computed over 10 experiments. The classification accuracy computed over all SNRs achieved by VGG, RN and our proposed model is 64.03%, 66.00% and 67.30%, respectively. Note that training with noise increases the classification accuracy (computed across all SNRs) by 0.9% as compared to training with the original signal. As before, the auto-encoder enables extraction of stable low-dimensional features with a significantly reduced dimension of the hidden LSTM states, hence contributing to the improvement in classification accuracy and the reduction of computational complexity. The classification accuracy over the high-SNR range achieved by VGG, RN and our proposed model is 92.16%, 94.89% and 96.56%, respectively. The proposed model outperforms the state-of-the-art models for almost all SNRs.
The top-1 classification accuracies of the VGG and residual network (RN) models and of the proposed model across different SNRs are shown in Figure 9. The results in Figure 9 are averaged over 10 experiments.
Figures 10-12 illustrate the confusion matrices of the considered models for the experiment with the highest overall top-1 classification accuracy at SNRs of 18dB, 6dB and 0dB. For the SNR of 18dB, the diagonal is very sharp for the proposed model while there is some confusion between AM-SSB-SC and 4ASK signals for RN and VGG. At the SNR of 0dB, it becomes more difficult for the proposed model to separate AM-SSB-SC and 4ASK. As shown in Figure 12, it becomes much more difficult to distinguish the signals at low SNRs, and all considered models start making mistakes when differentiating between GMSK, OQPSK and BPSK signals.
In addition to the top-1 classification accuracy, we also compare the considered models in terms of the number of trainable parameters, the number of floating point operations, the memory cost and the number of classifications per second on different computational platforms. As shown in Table V, the proposed model has the fewest trainable parameters and requires the smallest number of FLOPs and the least memory. Table VI shows that, in terms of the number of classifications per second, the proposed model is on average faster than both RN and VGG on Raspberry Pi 4, and likewise on Raspberry Pi 3. The mean and standard deviation of the number of classifications per second are calculated over 10 experiments on 1024 signals.
[Table V: number of trainable parameters, FLOPs and memory footprint of each model. Table VI: mean classifications per second, Raspberry Pi 4: 24.43, 10.32, 15.43; Raspberry Pi 3: 12.12, 5.28, 9.06.]
III-C Performance Comparison on Electrosense Data
We further evaluate performance of the proposed model on real-time over-the-air PSD data from Electrosense. The goal of the Electrosense initiative is to enable more efficient, safe and reliable monitoring of the electromagnetic space by improving accessibility of spectrum data to the general public. The aggregated spectrum measurements collected by sensors all over the world can be retrieved from the Electrosense API (https://electrosense.org/open-api-spec.html). Data for six commercially deployed technologies (WFM, TETRA, DVB, RADAR, LTE and GSM) are collected from indoor sensors with omni-directional antennas, with the frequency resolution set to 100kHz and the time resolution to 60s. 10k samples of length 2000 are retrieved for each technology, padded with 0s where needed for consistency of the sample lengths. 50%, 25% and 25% of the entire dataset are used for training, validation and testing, respectively.
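The zero-padding step for length consistency can be sketched as follows (the function name is ours; the target length 2000 comes from the text).

```python
import numpy as np

def pad_to_length(psd, target_len=2000):
    """Zero-pad (or truncate) a PSD sample so all samples share the same
    length, as done for the Electrosense data."""
    out = np.zeros(target_len, dtype=psd.dtype)
    out[: min(len(psd), target_len)] = psd[:target_len]
    return out

x = np.ones(1500)
y = pad_to_length(x)
assert y.shape == (2000,) and y[1500:].sum() == 0 and y[:1500].sum() == 1500
```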
Figure 13 shows the confusion matrices of our proposed model and LSTM for technology classification on the Electrosense data. The proposed model performs slightly better than LSTM. Notably, distinguishing DVB from LTE based on PSD is difficult since the power spectra of DVB and LTE are highly similar and both technologies are based on OFDM.
Next, we compare the considered models in terms of the number of trainable parameters, the number of FLOPs, the memory cost and the number of classifications per second on different platforms. As shown in Table VII, the proposed model has significantly fewer trainable parameters and requires fewer FLOPs and less memory. Table VIII shows that the proposed model is on average faster than LSTM on both Raspberry Pi 4 and Raspberry Pi 3 in terms of the number of classifications per second. The mean and standard deviation of the number of classifications per second are calculated over 10 experiments on 1024 signals.
[Table VII: number of trainable parameters, FLOPs and memory footprint of each model. Table VIII: mean classifications per second, Raspberry Pi 4: 12.79, 2.92; Raspberry Pi 3: 6.45, 1.27.]
In this paper, we introduced a denoising auto-encoder for the problem of inferring the modulation and technology type of a received radio signal. In particular, an LSTM auto-encoder is trained to simultaneously learn stable and robust features from noise-corrupted received signals, reconstruct the original received signals, and infer the modulation or technology type. Empirical studies show that the proposed framework generally outperforms the competing methods in terms of top-1 classification accuracy while requiring significantly fewer computational resources. In particular, the proposed framework employs a compact architecture that can be implemented on affordable computational devices, enabling real-time classification of the received signals at the required levels of accuracy.
References

- (2018) Learning the Chinese sentence representation with LSTM autoencoder. Proceedings of WWW '18: The Web Conference.
- (2007) Survey of automatic modulation classification techniques: classical approaches and new trends. IET Communications 1(2), pp. 137-156.
- (2005) A new approach to signal classification using spectral correlation and neural networks. First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN 2005), pp. 144-150.
- (1992) Signal interception: performance advantages of cyclic-feature detectors. IEEE Transactions on Communications 40(1), pp. 149-159.
- (1988) Signal interception: a unifying theoretical framework for feature detection. IEEE Transactions on Communications 36(8), pp. 897-906.
- (1987) Spectral correlation of modulated signals: Part II - digital modulation. IEEE Transactions on Communications 35(6), pp. 595-601.
- (1987) Spectral correlation of modulated signals: Part I - analog modulation. IEEE Transactions on Communications 35(6), pp. 584-594.
- (1999) Learning to forget: continual prediction with LSTM. Ninth International Conference on Artificial Neural Networks (ICANN 99), vol. 2, pp. 850-855.
- (2016) Deep learning. MIT Press.
- (2016) Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
- (1997) Long short-term memory. Neural Computation 9, pp. 1735-1780.
- (2020) A graph auto-encoder for haplotype assembly and viral quasispecies reconstruction. Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 719-726.
- (2014) Adam: a method for stochastic optimization. International Conference on Learning Representations.
- (2016) Unsupervised representation learning of structured radio communication signals. First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE), pp. 1-5.
- (2018) Over-the-air deep learning based radio signal classification. IEEE Journal of Selected Topics in Signal Processing 12(1), pp. 168-179.
- (2016) Convolutional radio modulation recognition networks. Engineering Applications of Neural Networks, pp. 213-226.
- (2016) Radio machine learning dataset generation with GNU Radio. Proceedings of the GNU Radio Conference 1(1).
- (2018) Electrosense: open and big spectrum data. IEEE Communications Magazine 56(1), pp. 210-217.
- (2018) Deep learning models for wireless signal classification with distributed low-cost spectrum sensors. IEEE Transactions on Cognitive Communications and Networking 4(3), pp. 433-445.
- (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
- (2008) Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, pp. 1096-1103.
- (2017) Deep architectures for modulation recognition. 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), pp. 1-6.
- (2019) Deep learning for over-the-air non-orthogonal signal classification. arXiv:1911.06174.
- (2006) Automatic modulation classification of communication signals. Ph.D. thesis, Department of Electrical and Computer Engineering, New Jersey Institute of Technology.