1 Introduction
The increase in communication demands and the shortage of spectrum resources have led to the adoption of cognitive radio (CR) and multiple-input multiple-output (MIMO) techniques in wireless communication systems. As one of the essential steps of CR, modulation classification (MC) is widely applied in both civil and military applications, such as spectrum surveillance, electronic surveillance, electronic warfare, and network control and management 8471222 . It improves radio spectrum utilisation and enables intelligent decision-making for context-aware autonomous wireless spectrum monitoring systems liao2019sequential . However, most existing MC methods focus on single-input single-output (SISO) scenarios and cannot be directly applied when the transceivers are equipped with multiple transmit antennas 7384691 . Therefore, it is crucial to study MC methods for MIMO communication systems.
Traditional MC approaches for SISO systems can be classified into two main categories: likelihood-based (LB) approaches and feature-based (FB) approaches 4167652 . The LB approaches can theoretically achieve optimal performance, as they compute the likelihood functions of the different modulated signals to maximise the classification accuracy. However, they have very high computational complexity and require prior information, such as the channel coefficient 5461606 8333735 . Hence, the LB approaches cannot be directly applied to fast modulation classification or blind modulation classification (BMC). By contrast, the FB approaches cannot obtain the optimal result, but they have lower computational complexity and do not require prior information. The FB methods usually include two steps: feature extraction and classifier design. Higher-order statistics, instantaneous statistics, and other features are calculated in the feature extraction step. Then popular classification methods, such as decision trees 7295481 7728143 8017570 and artificial neural networks (ANN) 8754798 UAV , are adopted as the classifiers.
With the rapid rise of artificial intelligence and the emerging requirements of intelligent wireless communication, deep learning-based approaches are now widely studied and used in different aspects of wireless communication, such as transceiver design at the physical layer 8054694 and BMC problems ramjee2019fast 8357902 8643801 8760481 8454504 8761426 . For BMC in SISO scenarios, the raw in-phase and quadrature-phase (IQ) data or the time-domain amplitude and phase data can be used directly as the input of the deep neural network. More specifically, the authors in ramjee2019fast presented convolutional long short-term deep neural network and deep residual network (ResNet) algorithms to identify 10 different modulation types, with high classification accuracy over a wide range of signal-to-noise ratio (SNR) values. Rajendran et al. 8357902 proposed a new data-driven model for BMC based on long short-term memory (LSTM), which learnt the features from the time-domain amplitude and phase information of the modulation schemes and yielded an average classification accuracy close to 90% for SNRs from 0 dB to 20 dB. Zhang et al. 8643801 , adopting the ResNet model as the classifier, presented an approach that fuses the time-frequency images and the handcrafted features of the modulated signals to obtain more discriminating features. The experimental results showed that the proposed scheme has superior performance. The latest research indicates that deep learning-based MC methods achieve better overall performance than the traditional LB and FB approaches for SISO systems.
Although MC (or BMC) methods for SISO networks are becoming more mature, research into MC for MIMO networks has only just begun 7384691 . The authors in 6117042 and 7459788
proposed similar methods for the MC of MIMO transceiver systems, which calculate the higher-order statistical moments and cumulants of the received signal and then employ an artificial neural network to classify the modulation types. In 8422500 , a clustering classifier based on centroid reconstruction is presented to identify the modulation scheme with unknown channel matrix and noise variance in MIMO systems. The simulation results showed that the algorithm can obtain excellent performance, even at low SNR and with a very short observation interval. To deal with the BMC problem and the two major constraints in the railway transmission environment (i.e. the high speeds and the impulsive nature of the noise), Kharbech et al. 8356664 proposed a feature-based blind identification process that includes three parts: impulsive noise mitigation, feature extraction, and classification. By analysing the correlation functions of the received signals for certain modulation formats, Mohamed et al. resolved the BMC problem in single- and multiple-antenna systems operating over frequency-selective channels in MareyBlindMIMO , and the BMC problem in the Alamouti STBC system in MareyBlind .
For MIMO systems, it is difficult to apply deep learning directly to the raw IQ data or the time-domain amplitude and phase data, since the overlapped signals at the receiver of the MIMO system destroy the statistical features 8533632 . Hence, it is crucial to extract distinguishable features or convert the raw signals for BMC in MIMO systems. Time-frequency analysis methods can jointly analyse the time-domain and frequency-domain features of signals, and different modulation types have distinct time-domain and frequency-domain features. Hence, in this paper, in order to overcome the effect of the overlapped signals at the receiver, we analyse the time-frequency features of the modulated signals to resolve the BMC problem in MIMO systems. First, the time-frequency analysis method based on the windowed short-time Fourier transform (STFT) 2016xli is employed to generate the spectrum of the MIMO-modulated signals. Then the spectrum in different time windows is converted to a greyscale image, and the greyscale image is converted to a red-green-blue (RGB) spectrogram image OZER2018505 . Second, a fine-tuned AlexNet-based convolutional neural network (CNN) model is introduced to learn the features from the RGB spectrogram images. The modulation scheme of each receiving stream among the received MIMO signals is identified in this stage. Finally, the previously produced decisions are merged to form the final result. In addition, this method can be simplified and applied directly to SISO systems. The simulation results show that the proposed method achieves superior performance in low-SNR scenarios for both MIMO and SISO systems.
This paper is organised as follows. The signal model of the MIMO and SISO systems and the STFT-based time-frequency analysis method are introduced in Section 2. Section 3 presents the BMC scheme for MIMO systems, including the proposed CNN model and the decision method. Then the RGB spectrogram image and the classification performance in different scenarios are analysed in Section 4. Finally, conclusions are drawn in Section 5.
2 Signal model and time-frequency analysis method
In this section, we define the MIMO signal model and derive the simplified SISO signal model. Then the STFT-based time-frequency analysis method is introduced to generate the spectrogram image of the MIMO-modulated signals.
2.1 MIMO signal model
We consider a MIMO-based single-carrier wireless communication system with transmit antennas and receive antennas. A flat-fading and time-invariant MIMO channel is adopted herein. Therefore, the MIMO channel is defined as
(1) 
where represents the channel coefficient between the th transmit antenna and th receive antenna. The channel matrix is assumed to have full column rank, and the channel gains remain constant over the observation interval. Let denote the transmitted data streams, where represents the transmitted modulated signal at the th transmit antenna. Likewise, let represent the received data streams, where is the received signal at the th receive antenna. Then the received signals can be further described by
(2) 
where the vector represents the additive white Gaussian noise (AWGN) vector and each element of is an independent and identically distributed (i.i.d.) random variable with zero mean and variance (i.e. ). In order to obtain the RGB spectrogram image of , the datasets generated in this paper are time-domain signals 8643801 , instead of the baseband signals used in 8364579 ; 8846691 .
2.2 SISO signal model
When , the MIMO-based signal model in Section 2.1 reduces to a SISO-based signal model. The received signal corrupted by the AWGN in the SISO system can then be represented as
(3) 
where represents the original digital modulated signals, represents the digital modulated signals over the wireless channel, represents the channel attenuation coefficient, and denotes the AWGN. In this paper, the original digital modulated signals may be multiple amplitude-shift keying (MASK), multiple frequency-shift keying (MFSK), multiple phase-shift keying (MPSK), or quadrature amplitude modulation (QAM) signals proakis2001digital .
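As a rough illustration of the SISO model in (3), the following Python sketch scales a real-valued signal by an attenuation coefficient and adds white Gaussian noise at a target SNR. The function name `siso_channel`, its arguments, and the convention that the SNR is measured against the power of the attenuated signal are assumptions made for this example, not details taken from the paper. Only the standard library is used.

```python
import math
import random

def siso_channel(x, h, snr_db, rng=None):
    """Return h*x[n] + w[n], where w[n] is white Gaussian noise at the given SNR (dB)."""
    rng = rng or random.Random()
    faded = [h * v for v in x]
    # The average power of the attenuated signal sets the noise variance (assumed convention).
    sig_power = sum(v * v for v in faded) / len(faded)
    noise_std = math.sqrt(sig_power / (10.0 ** (snr_db / 10.0)))
    return [v + rng.gauss(0.0, noise_std) for v in faded]

# Usage: a 2 kHz tone sampled at 16 kHz, attenuated by 0.8, observed at 10 dB SNR.
tone = [math.cos(2 * math.pi * 2000 * n / 16000) for n in range(1600)]
received = siso_channel(tone, h=0.8, snr_db=10, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the noise realisation reproducible, which is convenient when generating training and test sets separately.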
The time-domain expression of MASK-modulated signals is described as
(4) 
where , , , and represent the modulation amplitude, symbol period, carrier frequency, and initial phase, respectively. The value of depends on the symbol sequence and the modulation order . In addition, is the baseband signal waveform, which is usually a square-root raised cosine pulse.
Similarly, the time-domain expressions of MFSK and MPSK are defined as
(5) 
and
(6) 
respectively.
In (5) and (6), and are the modulation frequency and phase, respectively, and the values of these parameters depend on the symbol sequence and the modulation order .
However, the QAM signal is slightly different from the MXSK (MASK, MFSK, and MPSK) modulated signals, because the QAM-modulated signal has two orthogonal carriers. Therefore, it can be represented as
(7)  
where , , and the two carriers are individually modulated by and 8017570 .
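The four signal families above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the generator used in the paper: a rectangular pulse replaces the square-root raised cosine pulse, the amplitude levels, the tone spacing `df`, the phase set `2*pi*m/M`, and the sign convention of the quadrature carrier are all hypothetical choices, and the function names are invented for this example.

```python
import math

def modulate(symbols, scheme, order, fs=16000.0, fc=2000.0, rs=100.0, df=500.0):
    """Passband samples for MASK/MFSK/MPSK using a rectangular pulse (assumed)."""
    sps = int(round(fs / rs))              # samples per symbol
    out = []
    for k, m in enumerate(symbols):
        amp, freq, phase = 1.0, fc, 0.0
        if scheme == "ASK":
            amp = m / (order - 1)          # amplitude levels 0 .. 1 (assumed)
        elif scheme == "FSK":
            freq = fc + m * df             # one tone per symbol value (assumed spacing)
        elif scheme == "PSK":
            phase = 2.0 * math.pi * m / order
        else:
            raise ValueError("unknown scheme: " + scheme)
        for i in range(sps):
            t = (k * sps + i) / fs
            out.append(amp * math.cos(2.0 * math.pi * freq * t + phase))
    return out

def modulate_qam(i_levels, q_levels, fs=16000.0, fc=2000.0, rs=100.0):
    """QAM: in-phase and quadrature amplitudes on two orthogonal carriers."""
    sps = int(round(fs / rs))
    out = []
    for k, (a, b) in enumerate(zip(i_levels, q_levels)):
        for i in range(sps):
            t = (k * sps + i) / fs
            arg = 2.0 * math.pi * fc * t
            out.append(a * math.cos(arg) - b * math.sin(arg))
    return out

# Usage: a 2ASK signal carrying the bits 0, 1, 1, 0.
ask = modulate([0, 1, 1, 0], "ASK", order=2)
```

With the default rates, each symbol spans 160 samples, so the four-symbol example yields 640 samples.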
2.3 STFT-based time-frequency analysis
In this paper, the STFT is adopted to analyse the modulated signals. That is, we use the STFT to analyse the frequency and phase of local sections of the time-varying modulated signals with a time window function 7181637 . Then the spectrogram image (the visual representation of the frequency spectrum of a signal) is constructed. In this subsection, we introduce the theory of the STFT, and then we present the method to generate the STFT-based RGB spectrogram image for the modulated signals.
2.3.1 Theory of the STFT
Consider a signal and a real, even window , whose Fourier transforms (FT) are and , respectively. To obtain a localised spectrum of at time , the signal is multiplied by the window centred at time , which results in
(8) 
Next, the FT at is taken at time , obtaining
(9) 
where is the STFT 2016xli .
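For reference, the windowing and transform steps described here are commonly written as follows. The symbols $s$, $w$, $t$, $\tau$, and $\omega$, as well as the $1/\sqrt{2\pi}$ normalisation, follow one common textbook convention and are assumptions, since the notation of (8) and (9) is not reproduced in this text.

```latex
s_t(\tau) = s(\tau)\, w(\tau - t), \qquad
S_t(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
              s(\tau)\, w(\tau - t)\, e^{-j\omega\tau}\, \mathrm{d}\tau .
```

Under this convention, the spectrogram at time $t$ is the squared magnitude $|S_t(\omega)|^2$.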
2.3.2 Generating the STFT-based RGB spectrogram image for the modulated signals
In order to perform the STFT and obtain the spectrogram image of the modulated signals, we implement the process shown in Fig. 1. Dividing a given discrete modulated signal vector of length into highly overlapped frames each with length generates the spectral vector , where is obtained by sampling the received modulated signal . Hence, the signal in the current frame, , is
(10) 
where is the current frame and is the window function. The window function can be a Hamming, Hanning, or Blackman window; we choose the Hamming window in this paper mitra2006digital . Then is the increment between two consecutive frames, which is calculated by
(11) 
Herein, the () is the length of overlapped signals between two consecutive frames, and the number of frames can be calculated by
(12) 
The larger the , the greater the , and hence the higher the time resolution of the STFT.
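The framing step can be sketched as follows, assuming that the increment is the window length minus the overlap and that the number of frames is the integer part of (signal length minus overlap) divided by the increment, which is the convention used by common spectrogram routines. The function name `frame_signal` and its argument names are hypothetical.

```python
def frame_signal(x, win_len, overlap):
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    if not 0 <= overlap < win_len:
        raise ValueError("overlap must satisfy 0 <= overlap < win_len")
    hop = win_len - overlap                    # step between consecutive frames
    n_frames = (len(x) - overlap) // hop if len(x) >= win_len else 0
    return [x[k * hop : k * hop + win_len] for k in range(n_frames)]

# Usage: ten samples, window of four, overlap of two.
frames = frame_signal(list(range(10)), win_len=4, overlap=2)
# hop = 2, so the frames start at samples 0, 2, 4 and 6
```

Raising the overlap from two to three samples in this example shrinks the increment to one sample and yields seven frames instead of four, which illustrates why a larger overlap gives a finer time grid.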
The Hamming window function is defined as
(13) 
where is a rectangular window with length .
Based on (10), we can obtain the spectral magnitude vector of the current frame ,
(14) 
where is the number of points of the Fourier transform. The larger the , the higher the frequency resolution of the STFT. Therefore, the linear value of the spectral magnitude vector is obtained as
(15) 
The linear value of the spectral magnitude vector can be normalised in the range of [0, 1] as
(16) 
By combining the normalised linear spectral magnitude vector of all the frames as
(17) 
we can obtain the time-frequency matrix . This matrix is a greyscale image of the spectral magnitude vector. The size of this image is , its horizontal axis represents time, and its vertical axis represents frequency.
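The chain from windowed frames to the normalised greyscale matrix can be sketched as below. This is an illustrative implementation under several assumptions: a direct DFT from the standard library stands in for an FFT routine, only the non-negative frequency bins are kept, the normalisation is a global min-max scaling over the whole matrix, and the number of transform points is at least the window length. All function names are hypothetical.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitude(frame, n_fft):
    """Magnitudes of the first n_fft//2 + 1 DFT bins of a zero-padded real frame."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(v * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                  for n, v in enumerate(padded))
        mags.append(abs(acc))
    return mags

def greyscale_spectrogram(x, win_len, overlap, n_fft):
    """Rows are frequency bins, columns are frames, values are scaled to [0, 1]."""
    hop = win_len - overlap
    window = hamming(win_len)
    n_frames = (len(x) - overlap) // hop
    cols = []
    for k in range(n_frames):
        seg = x[k * hop : k * hop + win_len]
        cols.append(dft_magnitude([s * w for s, w in zip(seg, window)], n_fft))
    lo = min(min(c) for c in cols)
    hi = max(max(c) for c in cols)
    span = (hi - lo) or 1.0                    # avoid dividing by zero
    # Transpose so that the vertical axis is frequency and the horizontal axis is time.
    return [[(c[f] - lo) / span for c in cols] for f in range(n_fft // 2 + 1)]

# Usage: a 2 kHz tone sampled at 16 kHz; with 64 DFT points the bin spacing is 250 Hz.
x = [math.cos(2 * math.pi * 2000 * n / 16000) for n in range(256)]
img = greyscale_spectrogram(x, win_len=64, overlap=48, n_fft=64)
```

In the usage example the tone falls exactly on the bin at 2 kHz, so each column of `img` peaks in the same row.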
Next, the greyscale image is quantised into its RGB components using the jet colour map of MATLAB R2016b jet . The mapping is expressed as
(18) 
where is the RGB spectrogram image and is the nonlinear quantisation function OZER2018505 . It is worth noting that we apply the colour mapping in this paper to facilitate the observation and analysis of the RGB spectrogram image; this step can be omitted in practical applications.
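A colour mapping of this kind can be illustrated with a piecewise-linear, jet-style map from grey levels to RGB triples. The sketch below is an approximation chosen for this example and is not the exact MATLAB colour table or the quantisation function of the paper; the function names are hypothetical. Low values map towards blue and high values towards red.

```python
def jet_like(v):
    """Map a grey level v in [0, 1] to an (r, g, b) triple with components in [0, 1]."""
    def clip(u):
        return max(0.0, min(1.0, u))
    v = clip(v)
    return (clip(1.5 - abs(4.0 * v - 3.0)),
            clip(1.5 - abs(4.0 * v - 2.0)),
            clip(1.5 - abs(4.0 * v - 1.0)))

def grey_to_rgb(grey):
    """Convert a 2-D list of grey levels into a 2-D list of RGB triples."""
    return [[jet_like(v) for v in row] for row in grey]

# Usage: a 2 x 2 greyscale patch.
rgb = grey_to_rgb([[0.0, 0.5], [0.75, 1.0]])
```

The output has the same height and width as the greyscale input, with three colour components per pixel.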
For the STFT, by adjusting the values of the window length and overlapped signal length , we can tune the time resolution of the RGB spectrogram image. Moreover, by adjusting the number of points of the Fourier transform , we can also tune the frequency resolution of the RGB spectrogram image.
3 Proposed BMC scheme
In this section, a time-frequency analysis is conducted and a deep learning-based BMC scheme is proposed. The block diagram of the proposed BMC scheme is shown in Fig. 2, which comprises four modules: signal generator, time-frequency analysis, CNN classifier, and decision fusion. The signal generator outputs the modulated signals (with the same modulation type) for each transmit antenna 6117042 . This process was described in subsections 2.1 and 2.2. Then the time-frequency analysis is performed on the received signal at each receive antenna, which generates the RGB spectrogram image (partially described in subsection 2.3). Next, the AlexNet-based CNN classifier is trained on a number of RGB spectrogram images in the training stage, and the modulation type of each received signal is identified in the test stage. Finally, the decisions of the different signal branches are combined by the decision fusion module to form the final decision. In the next three subsections, we describe in detail the procedures of the time-frequency analysis, the CNN-based classifier, and the decision fusion.
3.1 Time-frequency analysis for received signals
The flow chart of the STFT-based time-frequency analysis is shown in Fig. 1. First, using the ASK signal as an example, the received signal is divided into frames by the Hamming window with length , the details of which are described in Eqs. (10)-(13). Second, the spectrum of the windowed signal is obtained by its Fourier transform. Third, by normalising and combining the linear spectral magnitude vectors, the greyscale spectrogram image is obtained (the size of the related greyscale matrix is ). Finally, to accommodate the input layer of AlexNet and improve the distinguishability of the spectrogram image, the greyscale spectrogram image is mapped into an RGB spectrogram image (the size of the related RGB matrix is ). Then, the RGB matrix is cut or padded into before feeding it into the CNN.
3.2 AlexNet-based CNN classifier
In our proposed BMC scheme, AlexNet, which is utilised for object detection Krizhevsky2012ImageNet and was the winner of the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), is adopted as the classifier. The network architecture of AlexNet is shown in Fig. 3 8401505 .
As depicted in Fig. 3, AlexNet contains eight layers; the first five are convolutional and the remaining three are fully connected. The output of the last fully connected layer is fed to a 1000-way softmax that produces a distribution over the 1000 class labels Krizhevsky2012ImageNet . AlexNet uses the rectified linear unit (ReLU) as the activation function of the CNN. In practice, the dropout and max pooling techniques are applied to the CNN. AlexNet performs well in visual tracking and object detection due to its ability to sense the position of patterns in the image. Therefore, considering that the spectrogram image has rich pattern position information, it is sensible to choose AlexNet as the classifier network.
The motivation for transfer learning comes from the fact that people can intelligently apply knowledge learned previously to solve new problems faster or with better solutions 5288526 . In order to utilise the pre-trained AlexNet, transfer learning is employed to fine-tune AlexNet and accelerate the training process. The last layer of the pre-trained AlexNet network in Fig. 3 is configured with 1000 classes, and this layer must be fine-tuned to accommodate the new classification task. First, all layers except the last layer are extracted; then the last layer is replaced with a new fully connected layer that contains eight neurons (i.e. the number of modulation categories in this paper). Finally, the parameters of the activation layer and the classification output layer are set to accommodate the new classification task. With such fine-tuning, the output of AlexNet performs the modulation classification of the received signals.
3.3 Decision fusion
Since there are multiple antennas at the receiver of the MIMO network, the branches can cooperate with each other to achieve higher identification reliability 6117042 . As shown in Fig. 2, the received signals are classified independently. The influences of signal overlapping, interchannel noise, and random phase shifting may cause the received signals to be identified as different modulation types, which may lead to incorrect identification results. The decision fusion among all the receive antennas aims to improve the average classification accuracy. The decision vector of the th received signal, , can be defined as
(19) 
where is the number of modulation types, is the probability of identifying the received signal as modulation type , and meets the following condition,
(20) 
Therefore, the modulation type of the received signal is the modulation type which has the maximum probability. The modulation type with maximum probability can be defined as a set as follows,
(21) 
Note that there are two cases for the above equations: 1) the maximum probability is unique, i.e., , and the modulation type of the th received signal is the element of ; 2) the maximum probability is not unique, i.e., , and the modulation type of the th received signal is randomly chosen from .
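The per-branch decision, including the random choice when the maximum is not unique, can be sketched as follows. The function name `branch_decision`, the label strings, and the tolerance used to check that the probabilities sum to one are assumptions made for this example.

```python
import random

def branch_decision(probs, labels, rng=None):
    """Pick the label with the largest probability; break exact ties at random."""
    rng = rng or random.Random()
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to one")
    best = max(probs)
    candidates = [lab for p, lab in zip(probs, labels) if p == best]
    return candidates[0] if len(candidates) == 1 else rng.choice(candidates)

# Usage: a unique maximum selects 4PSK.
labels = ["2PSK", "4PSK", "8PSK"]
choice = branch_decision([0.1, 0.7, 0.2], labels)
```

With the probabilities 0.4, 0.4, and 0.2, the function returns either 2PSK or 4PSK, chosen at random.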
Hence, the decision fusion can be converted to the problem of deciding the final modulation type according to , . The fusion rule at the fusion module can be the OR, AND, or majority rule, which can be generalised as the “n-out-of- rule” Atapattu2014Energy . That is, a certain modulation scheme is identified when a classifier is decided on among the classifiers. Take the as an example, and let the possible modulation types form the set {2PSK, 4PSK, 8PSK}. If three or more classifiers identify the modulation type as 2PSK (4PSK or 8PSK), then the final modulation type is 2PSK (4PSK or 8PSK). If two classifiers identify the modulation type as 2PSK and the other two classifiers identify the modulation type as 4PSK and 8PSK, respectively, then the final decision is 2PSK. If two classifiers identify the modulation type as 2PSK and the other two classifiers identify the modulation type as 4PSK (or 8PSK), the decision fusion centre randomly chooses between 2PSK and 4PSK (or 8PSK) as the final result.
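The worked example above behaves like a plurality vote with random tie-breaking, which can be sketched as follows. This illustrates the example only, not the general rule with an explicit threshold; the function name `fuse_decisions` is hypothetical.

```python
import random
from collections import Counter

def fuse_decisions(decisions, rng=None):
    """Plurality vote over per-antenna decisions; ties are broken at random."""
    rng = rng or random.Random()
    counts = Counter(decisions)
    top = max(counts.values())
    tied = sorted(lab for lab, c in counts.items() if c == top)
    return tied[0] if len(tied) == 1 else rng.choice(tied)

# Usage: two votes for 2PSK beat one vote each for 4PSK and 8PSK.
final = fuse_decisions(["2PSK", "2PSK", "4PSK", "8PSK"])
```

When two labels receive two votes each, the function returns one of them at random, matching the last case of the example.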
4 Performance analysis
In this section, the proposed time-frequency analysis and deep learning-based BMC algorithm is tested under different modulation schemes in both the SISO and MIMO scenarios. Specifically, the random channel attenuation takes a value from , and random phase shifts within one symbol interval are considered for the MIMO scenario. AWGN with different SNRs is added to the modulated signals for both the SISO and MIMO scenarios. In addition, we consider the following MIMO antenna configurations: and . In the simulations, the 2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK, 8PSK, and 16QAM modulation schemes are considered, unless otherwise stated. The parameters of the modulated signals are assigned as follows. The sampling frequency is 16 kHz, the carrier frequency is 2 kHz, the symbol rate is 100 Hz, and the length of the original digital signal is 14 (i.e. each modulated signal contains sample points). In addition, in the training stage, 100 modulated signals for each modulation type and SNR are randomly generated for both the SISO and MIMO scenarios, in which the SNR varies from 4 to 10 dB at intervals of 2 dB 8643801 . In the test stage, 100 modulated signals for each modulation type and SNR are randomly generated. All the signal samples are generated in MATLAB 2017b, and the training and testing of AlexNet are based on the MATLAB neural network toolbox. Additionally, the parameters to generate the RGB spectrogram image are set as , , , and .
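The stated signal parameters imply a few derived quantities, worked out in the short sketch below. These values are computed here from the sampling frequency, carrier frequency, symbol rate, and sequence length given above; they are not quoted from the paper, and the variable names are hypothetical.

```python
fs_hz = 16_000      # sampling frequency
fc_hz = 2_000       # carrier frequency
rs_hz = 100         # symbol rate
n_symbols = 14      # length of the original digital signal

samples_per_symbol = fs_hz // rs_hz                   # samples in one symbol period
samples_per_signal = n_symbols * samples_per_symbol   # samples in one modulated signal
carrier_cycles_per_symbol = fc_hz // rs_hz            # carrier periods in one symbol

print(samples_per_symbol, samples_per_signal, carrier_cycles_per_symbol)  # 160 2240 20
```

Each symbol therefore spans 20 carrier cycles, and the ratio of the sampling frequency to the carrier frequency gives eight samples per carrier cycle.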
We now discuss how the modulation order, SNR, and overlapping of the MIMO signals influence the RGB spectrogram image of the modulated signals. Then the classification performance of the proposed scheme is validated for different scenarios.
4.1 RGB spectrogram image of the modulated signals
In this subsection, in order to simplify the analysis, we select only certain binary and quaternary digital signal sequences (as shown in Fig. 4) to generate the RGB spectrogram image. The binary signal in Fig. 4(a) is used to generate the second-order modulated signals (i.e. 2ASK, 2FSK, and 2PSK), and the quaternary signal in Fig. 4(b) is used for the fourth-order modulated signals (i.e. 4ASK, 4FSK, and 4PSK).
4.1.1 RGB spectrogram image of the modulated signals with different modulation orders
We first evaluate how the modulation order affects the RGB spectrogram image at an SNR of 10 dB for the SISO scenario. The considered modulation schemes are ASK, FSK, and PSK, and the results are shown in Fig. 5. They are analysed separately as follows.
First of all, the RGB spectrogram image is a time-frequency distribution image of the modulated signal. The horizontal axis of this image represents time and the vertical axis represents frequency. In addition, the colour of the RGB spectrogram image represents the value of the normalised spectral magnitude (i.e. the values corresponding to blue and red are zero and one, respectively).
Figs. 5(a) and 5(d) show the RGB spectrogram images of the ASK-modulated signals. The power of the ASK-modulated signals concentrates in one frequency band in the image, and the power in the image is discontinuous over time. The colour in the image is blue when the digital signal sequence in Fig. 4 is at zero level, and it is red when the digital signal sequence is at a nonzero level, which corresponds to the values of the spectral magnitude. In addition, compared with the 2ASK signal, the spectral magnitude of the 4ASK signal has a larger average value (i.e. more pixels in the 4ASK RGB spectrogram image have a value of 1).
Figs. 5(b) and 5(e) show the RGB spectrogram images of the FSK-modulated signals at an SNR of 10 dB. The spectral magnitude of the 2FSK-modulated signals has a larger value over two subbands, and the spectral magnitude of the 4FSK-modulated signals has a larger value over four subbands. For the FSK signals, the modulation order is equal to the number of modulated frequencies, which is the number of subbands in the RGB spectrogram image.
The RGB spectrogram images of the PSK-modulated signals are shown in Figs. 5(c) and 5(f). The phase mutation of the modulated signals is captured in the RGB spectrogram images. Specifically, comparing Figs. 4(a) and 5(c), the 2PSK-modulated signal has a phase mutation at each transition from 0 to 1 and from 1 to 0 in the binary digital signal sequence. The phase mutation decreases the value of the power spectral density at the modulated frequency, which appears as a ‘ring’ in the RGB spectrogram image. Similarly, comparing Figs. 4(b) and 5(f), the and phase mutations also partly decrease the value of the power spectral density at the modulated frequency, but they appear as a ‘half-ring’ in the RGB spectrogram image. Therefore, modulated signals with different modulation orders have different time-frequency features, and it is reasonable to classify the modulated signals using the time-frequency analysis.
4.1.2 RGB spectrogram image of the modulated signals with different SNRs
In this paper, only the second-order modulation schemes are analysed to show how different SNRs affect the RGB spectrogram image. For the 2ASK-modulated signals with dB and dB, the corresponding RGB spectrograms are shown in Figs. 5(a) and 6(a), respectively. As the noise power increases, the noise components become more prominent, as shown by the white patches in the RGB spectrogram image. However, the main features of the RGB spectrogram image of the 2ASK-modulated signals are not destroyed. That is, the power distribution of the 2ASK-modulated signals is still concentrated in one subband in the RGB spectrogram image. In addition, the distribution of the power spectral density values is almost the same at different SNRs. Similarly, the RGB spectrograms for the 2FSK- and 2PSK-modulated signals with dB and dB are shown in Figs. 5(b) and 6(b) and Figs. 5(c) and 6(c), respectively. From these figures, we can conclude that increases in the noise power do not destroy the main features of the RGB spectrogram images of these modulated signals, and thus they can be used as the features for modulation classification even in the low-SNR region.
4.1.3 RGB spectrogram image of the modulated signals for the MIMO channels
We now analyse how the MIMO channel influences the RGB spectrogram image of the modulated signals. The 2ASK, 2FSK, and 2PSK modulation schemes are discussed herein. The antenna configuration for the MIMO system is and . The random channel attenuation takes a value from , random phase shifts within one symbol interval are considered, and AWGN with an SNR of 10 dB is added to the modulated signals. In addition, a multiplexing-based transmission scheme is adopted for the MIMO system. Specifically, two transmit antennas send two independent data streams, but with the same modulation scheme (e.g. 2ASK, 2FSK, or 2PSK). The result is shown in Fig. 7.
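The signal overlapping in this setup can be illustrated with a small Python sketch in which each receive antenna observes the sum of attenuated, shifted copies of every transmit stream plus white Gaussian noise. This is a simplified illustration under stated assumptions: the attenuation range `gain_range` is hypothetical (the range used in the paper is not reproduced in this text), the random phase shift is modelled as a circular delay of less than one symbol, and the function name `mimo_receive` is invented for this example.

```python
import math
import random

def mimo_receive(tx_signals, n_rx, snr_db, sps, rng, gain_range=(0.5, 1.0)):
    """Superimpose attenuated, delayed copies of every transmit stream at each receive antenna."""
    n = len(tx_signals[0])
    rx = []
    for _ in range(n_rx):
        mix = [0.0] * n
        for x in tx_signals:
            gain = rng.uniform(*gain_range)           # hypothetical attenuation range
            delay = rng.randrange(sps)                # shift of less than one symbol
            for t in range(n):
                mix[t] += gain * x[(t - delay) % n]   # circular shift keeps the length fixed
        power = sum(v * v for v in mix) / n
        std = math.sqrt(power / 10.0 ** (snr_db / 10.0))
        rx.append([v + rng.gauss(0.0, std) for v in mix])
    return rx

# Usage: two transmit streams observed at two receive antennas with 10 dB SNR.
s1 = [math.cos(2 * math.pi * 2000 * k / 16000) for k in range(320)]
s2 = [-v for v in s1]
received = mimo_receive([s1, s2], n_rx=2, snr_db=10, sps=160, rng=random.Random(7))
```

Each receive branch can then be passed through the time-frequency analysis independently.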
A comparison of Figs. 7 and 5 shows that, for all the modulated signals, the signal overlapping of the MIMO system has no effect on the power distribution of the modulated signals in the frequency domain, but the power distribution over the time domain is changed. The latter can be explained by the fact that the overlapping of different transmitted signals partly destroys the time-frequency characteristics of the raw modulated signals. In spite of this, some crucial time-frequency characteristics are not destroyed by the MIMO signal overlapping, such as the ‘ring’ that is caused by the phase mutation in the 2PSK signal (shown in Figs. 5(c) and 7(c)). Therefore, the RGB spectrogram image can still be used to identify the modulation type, even in the MIMO scenario.
4.2 Classification accuracy of proposed scheme
The classification accuracy of the proposed scheme is tested and verified for both the SISO and MIMO scenarios. We first randomly generate the data stream, which is then modulated and passed through the SISO or MIMO channel. In order to verify the performance of the proposed scheme, two benchmark schemes are introduced: the scheme based on the smoothed pseudo Wigner-Ville distribution (SPWVD) proposed in 8643801 and the scheme based on the Wigner-Ville distribution (WVD) proposed in 2016xli .
4.2.1 Classification accuracy in the SISO scenario
For the SISO scenario, the average classification accuracy of the proposed scheme is evaluated by varying the SNR of the signals from 4 dB to 10 dB. The result is shown in Fig. 8. As the SNR of the signals increases, the classification accuracies of all three classification schemes gradually improve. Moreover, our proposed scheme always has the highest average accuracy. Its classification accuracy is at least 92.37%, even at dB, and it reaches 99.12% at dB. This significantly outperforms the SPWVD- and WVD-based methods. These results show that our method has high accuracy and robustness, even at low SNR.
The confusion matrices of the classification results, from which the classification accuracy of each modulation type can be derived, are shown in Fig. 9. Fig. 9(a) indicates that if the SNR of the modulated signals is low, the classification accuracies of the 2PSK- and 8PSK-modulated signals are low. For example, at dB, the 2PSK and 8PSK signals may be incorrectly identified as 4PSK with probabilities of 0.14 and 0.4, respectively. This can be explained by the fact that different PSK signals have similar time-frequency characteristics in the RGB spectrogram image, especially the 4PSK- and 8PSK-modulated signals. By contrast, at high SNR, all the modulated signals are successfully identified except the 8PSK signals, which achieve an identification accuracy of 93%. Therefore, the proposed time-frequency analysis and deep learning-based BMC scheme can achieve excellent performance at both low and high SNR in the SISO scenario.
4.2.2 Classification accuracy in the MIMO scenario
The classification performance of the proposed scheme in the MIMO scenario is now verified. In order to better understand the performance of the proposed scheme, the model is trained and tested with two data sets (as in 6117042 ): one for the modulation set = {2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK, 8PSK, 16QAM} and another for a smaller modulation set = {2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK}. In the testing stage, the SNR of the modulated signals is varied from dB to dB, and the result is shown in Fig. 10. Both with and without the decision fusion module, the classification accuracy of the proposed scheme increases as the SNR of the modulated signals increases, which is consistent with the theoretical analysis. Moreover, introducing the decision fusion module yields a 10% improvement in the classification accuracy. In more detail, the proposed scheme can achieve 80.42% and 87.92% accuracy at 4 and 10 dB SNR in , and 87.78% and 93.33% accuracy at 4 and 10 dB SNR in . In addition, the average classification accuracy for the MIMO scenario is lower than that for the SISO scenario. This is because, when multiple antennas are used in the system, the structure of the original signals is destroyed by overlapping at the receive antenna, as mentioned in Section 4.1.
Similarly, the confusion matrices of the classification results are shown in Fig. 11(a) and (b) for modulated signal SNRs of 4 dB and 10 dB, respectively. The MFSK- and QAM-modulated signals have the highest classification accuracies at both 4 dB and 10 dB, and the MASK-modulated signals have the second highest. The MPSK signals (especially the 4PSK signals) exhibit the worst classification performance, as shown in Fig. 11(a). Most of the 4PSK signals are misclassified as 8PSK at dB, and the performance improves only slightly at dB. This result indicates that the MIMO system structure has negative effects on the time-frequency characteristics of the MPSK signals, which is consistent with the theoretical analysis. Hence, our proposed scheme has difficulty identifying the high-order PSK signals in the MIMO system. However, the time-frequency analysis and deep learning-based scheme has excellent performance in classifying the MFSK-, MASK-, and QAM-modulated signals, and it can obtain superior average classification accuracy for the MIMO system.
5 Conclusion
In this paper, we addressed the problem of blind modulation classification for the MIMO system. Specifically, the windowed STFT was used to analyse the time-frequency characteristics of the modulated signals, and the time-frequency graphs of the modulated signals were converted to RGB spectrogram images. Transfer learning was then used to fine-tune AlexNet for our classification problem, and the generated RGB spectrogram images were fed into the fine-tuned CNN to extract features and train the network. Finally, the decisions for the individual received signals from the MIMO receivers were combined by the decision fusion module to produce the final decision. The STFT-based time-frequency analysis results showed that each modulation type had unique time-frequency characteristics, and that the additive noise had limited influence on the time-frequency characteristics of the modulated signals. The final classification results indicated that the proposed scheme can achieve 92.37% and 99.12% classification accuracy at SNRs of 4 dB and 10 dB in the SISO scenario. For the MIMO system, the proposed scheme still achieved 70% and 80% at 4 dB for the large and small modulation sets, respectively. In future work, we plan to improve the performance of the proposed scheme for the high-order PSK signals.
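The first stage summarised above (windowed STFT followed by conversion to an RGB image) can be sketched in a few lines. This is an illustration under stated assumptions, not the configuration used in the experiments: the Hann window, the window length of 64, the hop of 16, and the piecewise-linear approximation of a jet-style colour map are all choices made here, and the function names are hypothetical.

```python
import numpy as np


def stft_magnitude(x, win_len=64, hop=16):
    """Magnitude of a Hann-windowed STFT; returns shape (win_len, n_frames)."""
    x = np.asarray(x)
    if len(x) < win_len:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(win_len)  # assumed window type
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[k * hop : k * hop + win_len] * window for k in range(n_frames)]
    )
    return np.abs(np.fft.fft(frames, axis=1)).T


def to_rgb(mag):
    """Map magnitudes to uint8 RGB with a rough jet-like colour ramp."""
    span = mag.max() - mag.min()
    v = (mag - mag.min()) / span if span > 0 else np.zeros_like(mag)
    # Piecewise-linear approximation of a jet-style map (an assumption,
    # not the exact MATLAB colormap): red, green, blue peak at 0.75, 0.5, 0.25.
    rgb = np.stack(
        [np.clip(1.5 - np.abs(4 * v - c), 0, 1) for c in (3, 2, 1)], axis=-1
    )
    return (255 * rgb).astype(np.uint8)


# Toy input: a complex tone centred on frequency bin 8 of a 64-point frame.
n = np.arange(256)
tone = np.exp(2j * np.pi * 8 * n / 64)
image = to_rgb(stft_magnitude(tone))
print(image.shape)  # (64, 13, 3)
```

The resulting array has the layout of an image (frequency bins by time frames by three colour channels), so it could in principle be resized and passed to an image classifier such as a fine-tuned AlexNet.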
References
 (1) R. Gupta, S. Majhi, O. A. Dobre, Design and implementation of a tree-based blind modulation classification algorithm for multiple-antenna systems, IEEE Transactions on Instrumentation and Measurement 68 (8) (2019) 3020–3031. doi:10.1109/TIM.2018.2868556.
 (2) K. Liao, G. Tao, Y. Zhong, Y. Zhang, Z. Zhang, Sequential convolutional recurrent neural networks for fast automatic modulation classification (2019). arXiv:1909.03050.
 (3) Y. A. Eldemerdash, O. A. Dobre, M. Öner, Signal identification for multiple-antenna wireless systems: Achievements and challenges, IEEE Communications Surveys & Tutorials 18 (3) (2016) 1524–1551. doi:10.1109/COMST.2016.2519148.
 (4) O. A. Dobre, A. Abdi, Y. Bar-Ness, W. Su, Survey of automatic modulation classification techniques: classical approaches and new trends, IET Communications 1 (2) (2007) 137–156. doi:10.1049/iet-com:20050176.
 (5) J. L. Xu, W. Su, M. Zhou, Likelihood function-based modulation classification in bandwidth-constrained sensor networks, in: 2010 International Conference on Networking, Sensing and Control (ICNSC), 2010, pp. 530–533. doi:10.1109/ICNSC.2010.5461606.
 (6) M. Abdelbar, W. H. Tranter, T. Bose, Cooperative cumulants-based modulation classification in distributed networks, IEEE Transactions on Cognitive Communications and Networking 4 (3) (2018) 446–461. doi:10.1109/TCCN.2018.2824326.
 (7) R. Harjani, D. Cabric, D. Markovic, B. M. Sadler, R. K. Palani, A. Saha, H. Shin, E. Rebeiz, S. Basir-Kazeruni, F. Yuan, Wideband blind signal classification on a battery budget, IEEE Communications Magazine 53 (10) (2015) 173–181. doi:10.1109/MCOM.2015.7295481.
 (8) L. Han, F. Gao, Z. Li, O. A. Dobre, Low complexity automatic modulation classification based on order-statistics, IEEE Transactions on Wireless Communications 16 (1) (2017) 400–411. doi:10.1109/TWC.2016.2623716.
 (9) Z. Wu, S. Zhou, Z. Yin, B. Ma, Z. Yang, Robust automatic modulation classification under varying noise conditions, IEEE Access 5 (2017) 19733–19741. doi:10.1109/ACCESS.2017.2746140.
 (10) S. I. H. Shah, S. Alam, S. A. Ghauri, A. Hussain, F. Ahmed Ansari, A novel hybrid cuckoo search extreme learning machine approach for modulation classification, IEEE Access 7 (2019) 90525–90537. doi:10.1109/ACCESS.2019.2926615.
 (11) W. Li, Z. Dou, Y. Lin, C. Shi, Wavelet transform based modulation classification for 5G and UAV communication in multipath fading channel, Physical Communication 34 (Feb 2019). doi:10.1016/j.phycom.2018.12.019.
 (12) T. O'Shea, J. Hoydis, An introduction to deep learning for the physical layer, IEEE Transactions on Cognitive Communications and Networking 3 (4) (2017) 563–575. doi:10.1109/TCCN.2017.2758370.
 (13) S. Ramjee, S. Ju, D. Yang, X. Liu, A. E. Gamal, Y. C. Eldar, Fast deep learning for automatic modulation classification (2019). arXiv:1901.05850.
 (14) S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, S. Pollin, Deep learning models for wireless signal classification with distributed low-cost spectrum sensors, IEEE Transactions on Cognitive Communications and Networking 4 (3) (2018) 433–445. doi:10.1109/TCCN.2018.2835460.
 (15) Z. Zhang, C. Wang, C. Gan, S. Sun, M. Wang, Automatic modulation classification using convolutional neural network with features fusion of SPWVD and BJD, IEEE Transactions on Signal and Information Processing over Networks 5 (3) (2019) 469–478. doi:10.1109/TSIPN.2019.2900201.
 (16) J. Nie, Y. Zhang, Z. He, S. Chen, S. Gong, W. Zhang, Deep hierarchical network for automatic modulation classification, IEEE Access 7 (2019) 94604–94613. doi:10.1109/ACCESS.2019.2928463.
 (17) F. Meng, P. Chen, L. Wu, X. Wang, Automatic modulation classification: A deep learning enabled approach, IEEE Transactions on Vehicular Technology 67 (11) (2018) 10760–10772. doi:10.1109/TVT.2018.2868698.
 (18) J. Ma, S. Lin, H. Gao, T. Qiu, Automatic modulation classification under non-Gaussian noise: A deep residual learning approach, in: 2019 IEEE International Conference on Communications (ICC), 2019, pp. 1–6. doi:10.1109/ICC.2019.8761426.
 (19) K. Hassan, I. Dayoub, W. Hamouda, C. N. Nzeza, M. Berbineau, Blind digital modulation identification for spatially-correlated MIMO systems, IEEE Transactions on Wireless Communications 11 (2) (2012) 683–693. doi:10.1109/TWC.2011.122211.110236.
 (20) S. Kharbech, I. Dayoub, M. Zwingelstein-Colin, E. P. Simon, On classifiers for blind feature-based automatic modulation classification over multiple-input multiple-output channels, IET Communications 10 (7) (2016) 790–795. doi:10.1049/iet-com.2015.1124.
 (21) J. Tian, Y. Pei, Y. Huang, Y. Liang, A machine learning approach to blind modulation classification for MIMO systems, in: 2018 IEEE International Conference on Communications (ICC), 2018, pp. 1–6. doi:10.1109/ICC.2018.8422500.
 (22) S. Kharbech, I. Dayoub, M. Zwingelstein-Colin, E. P. Simon, Blind digital modulation identification for MIMO systems in railway environments with high-speed channels and impulsive noise, IEEE Transactions on Vehicular Technology 67 (8) (2018) 7370–7379. doi:10.1109/TVT.2018.2834869.
 (23) M. Marey, O. A. Dobre, Blind modulation classification algorithm for single and multiple-antenna systems over frequency-selective channels, IEEE Signal Processing Letters 21 (9) 1098–1102.
 (24) M. Marey, O. A. Dobre, Blind modulation classification for Alamouti STBC system with transmission impairments, IEEE Wireless Communications Letters 4 (5) 521–524.
 (25) M. Gao, Y. Li, O. A. Dobre, N. Al-Dhahir, Joint blind identification of the number of transmit antennas and MIMO schemes using Gerschgorin radii and FNN, IEEE Transactions on Wireless Communications 18 (1) (2019) 373–387. doi:10.1109/TWC.2018.2879941.
 (26) B. Boashash (Ed.), Time-Frequency Signal Analysis and Processing (Second Edition), Oxford Academic Press, 2016.
 (27) I. Ozer, Z. Ozer, O. Findik, Noise robust sound event classification with convolutional neural network, Neurocomputing 272 (2018) 505–512.
 (28) M. Öner, On the classification of binary space shift keying modulation, IEEE Communications Letters 22 (8) (2018) 1584–1587. doi:10.1109/LCOMM.2018.2840147.
 (29) T. V. R. O. Câmara, A. D. L. Lima, B. M. M. Lima, A. I. R. Fontes, A. D. M. Martins, L. F. Q. Silveira, Automatic modulation classification architectures based on cyclostationary features in impulsive environments, IEEE Access 7 (2019) 138512–138527. doi:10.1109/ACCESS.2019.2943300.
 (30) J. G. Proakis, M. Salehi, Digital communications, Vol. 4, McGraw-Hill New York, 2001.
 (31) B. Kim, S. Kong, S. Kim, Low computational enhancement of STFT-based parameter estimation, IEEE Journal of Selected Topics in Signal Processing 9 (8) (2015) 1610–1619. doi:10.1109/JSTSP.2015.2465310.
 (32) S. K. Mitra, Y. Kuo, Digital signal processing: a computer-based approach, Vol. 2, McGraw-Hill New York, 2006.
 (33) MathWorks, Jet color chart array, https://ww2.mathworks.cn/help/matlab/ref/jet.html, accessed March 19, 2020.
 (34) M. Rezaee, M. Mahdianpari, Y. Zhang, B. Salehi, Deep convolutional neural network for complex wetland classification using optical remote sensing imagery, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (9) (2018) 3030–3039. doi:10.1109/JSTARS.2018.2846178.
 (35) A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (2) (2012).
 (36) S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (10) (2010) 1345–1359. doi:10.1109/TKDE.2009.191.
 (37) S. Atapattu, C. Tellambura, J. Hai, Energy Detection for Spectrum Sensing in Cognitive Radio, Springer Publishing Company, Incorporated, 2014.