PETCGDNN
An Efficient Deep Learning Model for Automatic Modulation Recognition Based on Parameter Estimation and Transformation
Automatic modulation recognition (AMR) is a promising technology for intelligent communication receivers to detect signal modulation schemes. Recently, the emerging deep learning (DL) research has facilitated high-performance DL-based AMR (DL-AMR) approaches. However, most DL-AMR models focus only on recognition accuracy, leading to huge model sizes and high computational complexity, while some lightweight and low-complexity models struggle to meet the accuracy requirements. This letter proposes an efficient DL-AMR model based on phase parameter estimation and transformation, with convolutional neural network (CNN) and gated recurrent unit (GRU) layers as the feature extraction layers. It achieves recognition accuracy equivalent to the existing state-of-the-art models while reducing their parameter volume by more than a third. Meanwhile, our model is more competitive in training time and test time than the benchmark models with similar recognition accuracy. Moreover, we further propose to compress our model by pruning, which keeps the recognition accuracy above 90% while further shrinking the parameter count compared with state-of-the-art models.
Automatic modulation recognition (AMR) makes it possible for a receiver to identify signal modulation schemes automatically in non-cooperative communication scenarios, and it has various civilian and military applications, such as spectral interference detection, spectrum sensing, and cognitive radio [1]. Deep learning (DL) based AMR methods show improved performance in terms of recognition accuracy and complexity compared with traditional likelihood-based and feature-based methods [6].
Prior DL-AMR models have achieved benchmark performance by using various neural network layers, e.g., convolutional neural networks (CNN) [6, 2, 14], recurrent neural networks (RNN) [9, 3], and hybrid networks [13]. Since simple neural networks lack the ability to eliminate signal distortion caused by the wireless channel, [15] designed a signal distortion correction module to equalize the carrier frequency and phase offset, demonstrating the potential of incorporating expert domain knowledge into DL-AMR models. With the rapid development of 5G/B5G in recent years, the growth of massive Internet-of-Things (IoT) devices demands improved communication performance with limited available resources [10], and thus efficient AMR models are crucially important for future IoT devices with limited computing and energy resources. Consequently, researchers have begun to study lightweight and low-complexity DL-AMR models that reduce the model size or accelerate computation while preserving recognition accuracy [12, 4, 11], making them increasingly feasible to deploy on resource-limited devices. However, existing models with high recognition accuracy rarely consider model size and complexity in the design process, while lightweight and low-complexity models struggle to achieve high accuracy.

In this letter, we propose an efficient DL-AMR model inspired by radio transformer networks (RTN)
[5], CNN, and GRU. The original data is processed by a parameter estimation network and a transformation module, and then the spatial and temporal features of the signals are extracted by the CNN and GRU layers for classification. The model can achieve high recognition accuracy equivalent to state-of-the-art models but with much fewer parameters, while the training time and test time of our model outperform those of the benchmark models without sacrificing recognition accuracy. A network pruning method [16] is further applied to compress the model size while maintaining high recognition accuracy, making it a promising candidate for resource-limited systems.

The contributions of this letter are summarized as follows:
An efficient model that achieves state-of-the-art recognition accuracy over three benchmark datasets with the fewest parameters is proposed, based on a parameter estimator, a parameter transformer, CNN, and GRU.
To efficiently utilize the spatial-temporal features of AMR signals, we propose to decrease the kernel size and the number of feature maps in the CNN layers, and introduce the parameter estimator and transformer to reduce the adverse effects of phase offset, leading to improved recognition accuracy.
We demonstrate that the proposed lightweight model can be further compressed by five times using a pruning method to fit scenarios with extremely limited resources.
After the signal passes through the channel and is sampled, the equivalent baseband signal can be expressed by:

y(n) = h·x(n)·e^{j(2πf₀n + θ₀)} + w(n),  n = 1, …, N,   (1)

where x(n) is the signal modulated by the transmitter in a certain modulation scheme, w(n) denotes the complex additive white Gaussian noise (AWGN), h represents the channel gain, f₀ is the frequency offset, θ₀ is the phase offset, y(n) denotes the n-th value observed by the receiver, and N is the number of symbols in a signal sample. To facilitate data processing and modulation recognition, the received signals can be stored in in-phase/quadrature (I/Q) form, denoted as the 2 × N matrix y = [y_I; y_Q].
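As a concrete illustration of Eq. (1), the following NumPy sketch synthesizes a received sample in the 2 × N I/Q form the model consumes; the modulation, channel gain, offsets, and SNR values are illustrative choices, not parameters from the letter:

```python
import numpy as np

def received_signal(x, h=1.0, f0=0.01, theta0=0.3, snr_db=10):
    """Sketch of Eq. (1): channel gain, frequency/phase offset, plus AWGN."""
    rng = np.random.default_rng(0)
    n = np.arange(len(x))
    y = h * x * np.exp(1j * (2 * np.pi * f0 * n + theta0))
    sig_pow = np.mean(np.abs(y) ** 2)
    noise_pow = sig_pow / 10 ** (snr_db / 10)
    w = np.sqrt(noise_pow / 2) * (rng.standard_normal(len(x))
                                  + 1j * rng.standard_normal(len(x)))
    # store as a 2 x N real matrix [y_I; y_Q], the I/Q form fed to the model
    y = y + w
    return np.stack([y.real, y.imag])

rng = np.random.default_rng(1)
x = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=128) / np.sqrt(2)  # QPSK
iq = received_signal(x)
print(iq.shape)  # (2, 128)
```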
The proposed parameter estimation and transformation based CNN-GRU deep neural network (PETCGDNN) model comprises a parameter estimator, a parameter transformer, and a hybrid neural network, as shown in Fig. 1.
As described in the signal model, the sampled I/Q data are affected by noise and by interference from the channel and imperfect hardware, which may cause adverse effects such as temporal shifting, linear mixing/rotating, and spinning of the received signal. Many such effects can be inverted using parametric transformations according to classic signal processing theory. Hence, the parameter estimator (Part 1) and the parameter transformer (Part 2) are introduced in our model to extract phase-offset-related information and perform the phase parameter transformation, which are key to enhancing the recognition accuracy of our model.
The parameter estimator in Part 1 estimates the phase parameter θ by co-training with the subsequent model. For each I/Q sample with the dimension of 2 × N (N = 128 or 1024), one phase parameter, which carries the phase-offset-related information, is estimated by a neural network composed of a Flatten layer and a Dense (fully connected) layer. The input signal y is flattened into a vector by the Flatten layer to satisfy the input dimension of the Dense layer, and the entries of this vector are then combined by the Dense layer to obtain the phase parameter θ. The activation function of the Dense layer is linear, which yields an estimated phase parameter from a continuous unbounded range.
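Computationally, the estimator is just one linear unit over the flattened sample. A minimal NumPy sketch, with random placeholder weights standing in for the jointly trained W and b:

```python
import numpy as np

def estimate_phase(iq, W, b):
    """Flatten a 2 x N I/Q sample and apply a single linear (Dense) unit.
    W has shape (2N,); the values used here are placeholders, since the real
    weights are learned jointly with the rest of the model."""
    return float(iq.reshape(-1) @ W + b)

rng = np.random.default_rng(0)
iq = rng.standard_normal((2, 128))        # a stand-in I/Q sample
W = rng.standard_normal(256) * 0.01
theta_hat = estimate_phase(iq, W, b=0.0)  # unbounded scalar phase estimate
```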
The parameter transformer in Part 2 is a customized layer, which performs the parametric inverse transformation on the input y and θ, given by:

ŷ_I(n) = y_I(n)·cos θ + y_Q(n)·sin θ,
ŷ_Q(n) = y_Q(n)·cos θ − y_I(n)·sin θ,   (2)

where θ is the estimated phase parameter and ŷ = [ŷ_I; ŷ_Q] is the output of Part 2, i.e., the input rotated by −θ in the I/Q plane.
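A NumPy sketch of such a phase-compensating rotation (our reading of the Part 2 transform as a rotation by the negative of the estimated phase); with a perfect estimate, a pure phase offset is undone exactly:

```python
import numpy as np

def transform(iq, theta):
    """Rotate a 2 x N I/Q sample by -theta: the Part 2 inverse transform."""
    c, s = np.cos(theta), np.sin(theta)
    i, q = iq
    return np.stack([c * i + s * q, c * q - s * i])

rng = np.random.default_rng(0)
sym = (rng.choice([1, -1], 64) + 1j * rng.choice([1, -1], 64)) / np.sqrt(2)  # QPSK
theta0 = 0.7
y = sym * np.exp(1j * theta0)            # received with a pure phase offset
iq = np.stack([y.real, y.imag])
restored = transform(iq, theta0)         # rotation by -theta0 recovers sym
```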
Part 3 consists of CNN, GRU, and Dense layers, which realize feature extraction and classification. The first convolutional layer has 75 filters with a 2 × 8 kernel, which extracts the spatial features of the signal, while the second convolutional layer has 25 filters with a 1 × 5 kernel, which further compacts the extracted features. The subsequent GRU layer extracts the temporal features of the signal with 128 units. Finally, classification is completed by the Dense layer, whose number of hidden units C equals the number of modulation classes. The two convolutional layers use rectified linear unit (ReLU) activations, and the activation function of the last Dense layer is Softmax. To build a model with a smaller size and lower computational cost, the network in the third part keeps the number of parameters in the CNN layers small to control the model size.
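As a rough, back-of-envelope check of the model size implied by these layer settings (input 2 × 128, C = 11 classes), one can tally per-layer parameter counts. The GRU term assumes Keras defaults (reset_after gates, hence the +2 bias term) and that the 25 feature maps serve as the per-timestep input; neither assumption is stated in the letter:

```python
# Per-layer parameter counts for Part 3 as described in the text.
conv1 = (2 * 8 * 1 + 1) * 75       # 75 filters, 2x8 kernel, 1 input channel
conv2 = (1 * 5 * 75 + 1) * 25      # 25 filters, 1x5 kernel, 75 input channels
gru   = 3 * 128 * (25 + 128 + 2)   # 128 units; assumes 25-dim steps, reset_after
dense = (128 + 1) * 11             # C = 11 output classes
print(conv1 + conv2 + gru + dense)  # 71614, before the small Part 1 estimator
```

The dominant term is the GRU, while both convolutional layers together stay near 10K parameters, consistent with the letter's goal of keeping the CNN layers small.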
Although our model already has a small number of parameters, CNN-based structures usually have intrinsic redundancy, which can inflate the model size and incur unnecessary computational cost. To further compress the model, a network pruning method is adopted to reduce this redundancy. We apply the following approach to the proposed DL-AMR model, aiming to maintain high recognition accuracy while reducing the model size, modeled as follows:
s_t = s_f + (s_i − s_f)·(1 − (t − t₀)/(n·Δt))³,  t ∈ {t₀, t₀ + Δt, …, t₀ + n·Δt},   (3)

where s_i is the initial sparsity value, s_f denotes the final sparsity value, t₀ refers to the step at which training with pruning starts, Δt is the pruning frequency, and the fine-tuning process is divided into n steps to gradually increase the sparsity s_t. Binary mask variables of the same size and shape as the weight tensors are added to the Dense, CNN, and GRU layers. The weight masks of each layer are sorted during fine-tuning, and the weights with the smallest magnitudes are set to zero until the target sparsity is reached.
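The schedule and masking described above can be sketched as follows; sparsity_at and magnitude_prune are illustrative names, and the cubic-decay form follows the polynomial pruning schedule of Eq. (3):

```python
import numpy as np

def sparsity_at(t, s_i, s_f, t0, n, dt):
    """Eq. (3): target sparsity at step t under polynomial (cubic) decay."""
    frac = min(max((t - t0) / (n * dt), 0.0), 1.0)
    return s_f + (s_i - s_f) * (1 - frac) ** 3

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights with a binary mask."""
    k = int(round(sparsity * weights.size))
    mask = np.ones(weights.size)
    if k > 0:
        mask[np.argsort(np.abs(weights).ravel())[:k]] = 0.0
    return weights * mask.reshape(weights.shape)

rng = np.random.default_rng(0)
target = sparsity_at(5160, s_i=0.0, s_f=0.8, t0=0, n=5160, dt=1)  # final step
w = magnitude_prune(rng.standard_normal((64, 64)), target)
```

In the actual experiments this logic is handled by TensorFlow's Keras-based pruning tool rather than hand-rolled code; the sketch only makes the schedule and masking explicit.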
The experiments are conducted on the RML2016.10a, RML2016.10b [7], and RML2018.01a [8] datasets, with input data dimensions of 2 × 128, 2 × 128, and 2 × 1024, respectively. The RML2016.10a dataset includes 220,000 modulated signals covering 11 commonly used modulation schemes, RML2016.10b contains 1,200,000 signals with 10 schemes, and RML2018.01a has over 2.5 million signals with 24 modulation schemes. Owing to hardware limitations, only half of the RML2018.01a dataset, randomly selected, is used in our experiments. RML2016.10a and RML2016.10b are generated by simulation using GNU Radio, while RML2018.01a is produced in a laboratory environment.
We divide each dataset into training, validation, and test sets at a ratio of 6:2:2 per class with random selection. The loss function is categorical cross-entropy, and the optimizer is Adam. When the validation loss does not decrease for 5 epochs, the learning rate is multiplied by a coefficient of 0.5. When the validation loss does not decrease for 50 epochs, training is stopped and the model with the minimum validation loss is saved. The experiments are implemented on a GeForce GTX 1080Ti GPU using Keras with TensorFlow as the backend.
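The schedule above can be mimicked in a few lines of plain Python; run_schedule is a hypothetical helper that approximates the behavior of Keras' ReduceLROnPlateau and EarlyStopping callbacks (the real callbacks manage their patience counters slightly differently):

```python
def run_schedule(val_losses, lr=1e-3, lr_patience=5, stop_patience=50, factor=0.5):
    """Halve lr after lr_patience stagnant epochs; stop after stop_patience."""
    best, wait, epoch = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait % lr_patience == 0:   # 5 epochs without improvement
                lr *= factor
            if wait >= stop_patience:     # 50 epochs without improvement
                break
    return lr, epoch

# A flat validation loss triggers ten halvings, then early stopping at epoch 50:
lr, stop_epoch = run_schedule([1.0] * 60)
```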
Several indicators are selected in Table I for performance comparison and complexity analysis, including the number of parameters, training time, test time, highest accuracy over all SNRs (signal-to-noise ratios), and average accuracy over all SNRs. All models are assessed on the three datasets, with input and output layers adjusted to fit the data dimensions, so the number of parameters varies, as shown in the table. Our models in Table I are the vanilla versions without pruning.
[Table I: number of parameters, training time, test time, and highest/average recognition accuracy of ICAMCNET, MCNET, LSTM2, GRU2, MCLDNN, and the proposed PETCGDNN on RML2016.10a, RML2016.10b, and RML2018.01a.]
It is clear that the proposed PETCGDNN model without pruning already has the fewest parameters among the models compared in Table I. The time cost of PETCGDNN has obvious advantages over GRU2, LSTM2, and MCLDNN, and is comparable with ICAMCNET and MCNET while achieving much higher accuracy.
Note that the parameter count of PETCGDNN remains relatively stable when tested on datasets with a larger input dimension. ICAMCNET, originally designed for a dataset with an input length of 128, requires 6.8 times more parameters for the higher-dimensional input data (RML2018.01a). In addition, ICAMCNET and MCNET are composed of CNNs, which have lower computational complexity than RNNs, leading to low training and test times. PETCGDNN has some disadvantage in time cost compared with ICAMCNET and MCNET, but the relatively higher cost is offset by higher recognition accuracy in all test cases. Specifically, PETCGDNN has the shortest training time on RML2016.10b. Compared with high-accuracy benchmark models such as MCLDNN and LSTM2, our model sacrifices little recognition accuracy but greatly reduces complexity and model size.
We implement the pruning method with the Keras-based pruning tool integrated into TensorFlow. In our experiment, the fine-tuning process has 5 epochs, which are divided into 5,160 steps by setting the batch size to 128. The model is pruned and tested with different sparsity levels, while the size of the model is measured by the number of nonzero (NNZ) parameters. Table II and Fig. 3 present the number of model parameters and the recognition accuracy after pruning. While the number of parameters is less than 15K at a sparsity of 0.8, the pruned model maintains an accuracy above 90%. The recognition accuracy of the pruned models remains stable on RML2016.10a and RML2016.10b when the sparsity is between 0 and 0.8, even though the model size is only 1/5 of the original. The recognition accuracy decreases slightly on RML2018.01a, which indicates that more connections are needed to fully extract information from data with an input length of 1024.
[Table II: number of nonzero (NNZ) parameters and recognition accuracy of the pruned PETCGDNN at sparsity levels 0 (original), 0.5, 0.8, 0.9, and 0.95 on the three datasets.]
To further illustrate the effectiveness of the proposed model, we conduct an ablation experiment comparing the recognition accuracy of PETCGDNN with that of its final module alone (PETCGDNN-Part 3), which reflects the contribution of Part 1 and Part 2 to the recognition accuracy. It can be seen from Fig. 2 that the model without the parameter estimation and transformation module (PETCGDNN-Part 3) cannot achieve recognition accuracy equivalent to PETCGDNN in the high SNR range (above 0 dB). The average and best recognition accuracy of PETCGDNN are better overall, while the model size and time cost are almost the same.
To further assess the functions of the parameter estimator and parameter transformer (Parts 1 and 2) in PETCGDNN, the output of Part 2 is visualized in the I/Q plane, with a focus on the constellation distribution rather than the values of the points (the origins of the coordinates differ across images). Fig. 4 shows the output of Part 2 in comparison with the inputs to the model at an SNR of +10 dB. It is clearly visible that the signals converge to tighter clusters after the two modules, which benefits the classification module and leads to the improvement in overall recognition accuracy over PETCGDNN-Part 3.
In this letter, an efficient DL-AMR model, named PETCGDNN, is proposed to achieve state-of-the-art performance. By incorporating expert domain knowledge on phase offset estimation and compensation, the model is lightweight and low-complexity while achieving high recognition accuracy, and it exhibits good stability across datasets. Moreover, a pruning method is applied to further reduce the model size, showing that an AMR model can be compressed even when its parameter count is already small. Such efficient AMR models have potentially wide application in future massive machine-type communications and ultra-reliable low-latency communications scenarios, in line with the development trend of future communication systems.
[7] T. J. O'Shea and N. West, "Radio machine learning dataset generation with GNU Radio," in Proc. GNU Radio Conf., vol. 1, 2016.