I Introduction
In traditional communication systems, cooperation between a transmitter and a receiver is the default configuration for ensuring reliable signal recovery at the receiver. The signal format is therefore important side information that must be known to both the transmitter and the receiver. This side information is delivered from the transmitter to the receiver, and the resulting overhead occupies additional time, frequency or space resources. Furthermore, wireless channels are time-variant, so the signal format side information may be out of date when a signal reaches the receiver after a delay, which would subsequently cause inaccurate signal detection. A more reliable solution is therefore required, one that avoids transmitting side information and lets the receiver extract signal format information from received signals in a timely manner.
An intelligent receiver can automatically identify signal formats based on data training. Deep learning (DL) was initially proposed for image processing since it can automatically and efficiently extract features from two-dimensional images. The representative deep learning strategy is the convolutional neural network (CNN), which employs multiple convolutional layers for feature extraction. CNN has been successfully applied to single-carrier modulation classification [OShea_classification_2018] and multi-carrier orthogonal frequency division multiplexing (OFDM) modulation classification [OFDM_classification_Zhou2019ARM, OFDM_classification_access2019]. The classification of non-orthogonal spectrally efficient frequency division multiplexing (SEFDM) signals [TongyangTVT2017] has been theoretically and practically investigated in [tongyang_VTC2020_DL_classification], in which a trained CNN classifier can efficiently identify feature-diversity dominant signals but cannot accurately classify feature-similarity dominant signals. Although the deep learning classifier is trained to automatically extract signal features without domain knowledge, the extensive fine-tuning needed to find optimal neural network hyperparameters is time consuming and inefficient. Therefore, manually extracting signal features, based on expert knowledge and traditional machine learning (ML), would be more efficient and convincing.

This work first studies different statistical features in a support vector machine (SVM) for non-orthogonal signal classification. Modelling results reveal that neither time-domain nor frequency-domain statistical features can train accurate classifiers for non-orthogonal signals. Therefore, a wavelet transform [wavelet_tour_Mallat_2008, wavelet_transform_TIT_1990] based time-frequency feature extraction approach is applied in this work. Previous work has explored multi-level structured wavelet decomposition [wavelet_classification_TPAMI_1997, wavelet_classification_TITB_2007] and wavelet scattering [wavelet_scattering_TSP_2014] for feature extraction and classification. This work focuses on a single-level wavelet filtering (WF) strategy. Results indicate that the time-frequency feature with statistical dimensionality reduction can help the SVM identify signals with high accuracy. In addition, this work evaluates classifier accuracy for signals at different Es/N0. Finally, a low-cost experiment is set up to verify the trained classifiers using over-the-air signals.

The main contributions of this work are as follows.

Statistical features are investigated in SVM for non-orthogonal signal classification.

Two-dimensional time-frequency features are evaluated via a single-level wavelet transform. Various time-frequency feature dimensionality reduction methods are studied to simplify the features and further improve the classification accuracy.

A low-cost over-the-air experiment is designed for non-orthogonal signal classification. Practical results verify the robustness of the wavelet classification.
II SEFDM Waveform
The time-domain SEFDM signals are illustrated in Fig. 1, where two types of signal patterns are presented. It is inferred that classifying the Type-I signal pattern is easier than classifying the Type-II signal pattern since the signal features in Type-I are more distinguishable.
The discrete format of one time-domain SEFDM symbol is defined as
$$X_k = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} s_n \exp\left(\frac{j 2\pi n k \alpha}{N}\right) \qquad (1)$$
where the expression is very similar to that of OFDM except for the bandwidth compression factor $\alpha = \Delta f \cdot T$, in which $\Delta f$ is the subcarrier spacing and $T$ is the time period of one SEFDM symbol. The signal spectral bandwidth in (1) is compressed when $\alpha < 1$ and is equivalent to that of OFDM when $\alpha = 1$. The number of subcarriers is determined by $N$. $s_n$ is the single-carrier symbol within one SEFDM symbol and $X_k$ is the time sample with $k = 1, 2, \ldots, N$. The instantaneous power for one SEFDM symbol is computed as follows
$$|X_k|^2 = \frac{1}{N} \sum_{n=1}^{N} |s_n|^2 + \underbrace{\frac{1}{N} \sum_{n=1}^{N} \sum_{m=1,\, m \neq n}^{N} s_n s_m^{*} \exp\left(\frac{j 2\pi (n-m) k \alpha}{N}\right)}_{\text{ICI}} \qquad (2)$$
The inter-carrier interference (ICI) term in (2), which depends on the value of $\alpha$, determines how easily different SEFDM signals can be told apart. It is inferred that when SEFDM signals have similar values of $\alpha$, their ICI terms become similar, which complicates signal classification.
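As an illustration of how the compression factor shapes the instantaneous power, the following minimal Python sketch generates one SEFDM symbol and splits its power into a per-subcarrier term and an ICI term. It assumes the commonly used SEFDM form in which subcarrier n contributes exp(j2πnkα/N) to time sample k, with N time samples per symbol and no oversampling; the function names and the 16-subcarrier QPSK example are hypothetical and are not taken from the experiments in this work.

```python
import cmath
import math
import random


def sefdm_symbol(symbols, alpha):
    """Return the N time samples X_k (k = 1..N) of one SEFDM symbol."""
    n_sub = len(symbols)
    return [
        sum(s * cmath.exp(2j * math.pi * n * k * alpha / n_sub)
            for n, s in enumerate(symbols, start=1)) / math.sqrt(n_sub)
        for k in range(1, n_sub + 1)
    ]


def power_terms(symbols, alpha, k):
    """Split |X_k|^2 into the per-subcarrier power term and the ICI term."""
    n_sub = len(symbols)
    self_term = sum(abs(s) ** 2 for s in symbols) / n_sub
    ici = sum(
        symbols[n - 1] * symbols[m - 1].conjugate()
        * cmath.exp(2j * math.pi * (n - m) * k * alpha / n_sub)
        for n in range(1, n_sub + 1)
        for m in range(1, n_sub + 1)
        if n != m
    ) / n_sub
    # The double sum is real because the (n, m) and (m, n) terms are conjugates.
    return self_term, ici.real


# Hypothetical example: 16 unit-energy QPSK symbols, OFDM (alpha = 1) vs SEFDM.
rng = random.Random(0)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
        for _ in range(16)]
for alpha in (1.0, 0.8):
    samples = sefdm_symbol(qpsk, alpha)
    self_term, ici = power_terms(qpsk, alpha, k=3)
    print(alpha, abs(samples[2]) ** 2, self_term + ici)
```

The per-subcarrier term does not depend on α, so any difference in instantaneous power between two compression factors comes from the ICI term alone.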
III Classification Strategies
III-A CNN Classification
A multi-layer CNN classifier was trained in recent work [tongyang_VTC2020_DL_classification] to automatically extract signal features in either the time domain or the frequency domain. Based on the extracted features, classification results are compared in Fig. 2, in which the time-domain classifier achieves higher accuracy than its frequency-domain counterpart. Classification accuracy can reach 95% when considering the limited number of non-orthogonal signals in Type-I. However, the accuracy drops greatly when more similar signals are added in Type-II.
III-B SVM Classification
The previous work [tongyang_VTC2020_DL_classification] cannot accurately classify Type-II signals, and addressing this limitation is the motivation for this work. Training a multi-layer CNN classifier is time-consuming since it requires extensive hyperparameter tuning and iterative back-propagation optimization. Therefore, it would be more efficient to use traditional machine learning strategies with manual feature extraction. The SVM classifier, based on domain-knowledge dependent features, is applied in this work. First, training is fast since features are obtained in advance rather than learned through time-consuming data training. Second, the machine learning methodology is deterministic and its working principle can be well explained. Since there are multiple signal classes in Type-I and Type-II, a multi-class error-correcting output codes (ECOC) model [ECOC_Dietterich_1995] is applied here. A one-versus-one [ECOC_2014_codingDesign] coding strategy is implemented for separating different classes, which simplifies the multi-class classification task into multiple binary classification tasks. Thus, multiple binary SVM learners, each with a polynomial kernel of order two, are used for the multi-class classification.
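The one-versus-one decomposition can be sketched as follows. This is a minimal illustration rather than the classifier used in this work: it assumes the seven bandwidth compression factors listed in the signal specifications form the Type-II classes, uses a polynomial kernel of the form (1 + x·y)^2 where the offset of 1 is an assumption, and decodes by simple majority vote, whereas an ECOC implementation typically uses loss-based decoding. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(class_labels):
    """List every unordered class pair; each pair gets one binary learner."""
    return list(combinations(class_labels, 2))


def poly2_kernel(x, y):
    """Second-order polynomial kernel; the offset of 1 is an assumption."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2


def majority_vote(pair_winners):
    """Pick the class that wins the most pairwise decisions (simplified decoding)."""
    return Counter(pair_winners).most_common(1)[0][0]


# Assumed Type-II classes: one class per bandwidth compression factor.
type_ii_alphas = [1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7]
pairs = one_vs_one_pairs(type_ii_alphas)
print(len(pairs))  # K(K-1)/2 binary learners for K classes
```

With K classes the scheme trains K(K-1)/2 binary learners, so the number of learners grows quadratically with the number of compression factors to be separated.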
IV Feature Selection
This section first explores the impact of different one-dimensional statistical features and their combinations in either the time domain or the frequency domain. The second part investigates the impact of two-dimensional time-frequency features via the single-level wavelet transform.
IV-A Statistical Features
The most commonly used statistical feature is the arithmetic mean, which computes the average value of a dataset. Variance measures the variation of a dataset. A small variance indicates that the dataset elements are close to the arithmetic mean, while a large variance indicates that the dataset elements are spread out away from the mean. Skewness [Feature_skewness_2011] measures the asymmetry of a data distribution. Negative skewness indicates that the distribution has a longer tail to the left of its mean; positive skewness indicates a longer tail to the right of the mean. The ratio between the maximum value and the minimum value is also studied here, and this Max-Min ratio indicates the fluctuation of a dataset. Interquartile range (IQR) [book_IQR_1996] measures data dispersion and equals the difference between the 75th percentile and the 25th percentile.
IV-B Time-Frequency Features
The previous work [tongyang_VTC2020_DL_classification] revealed that independent time-domain features or frequency-domain features cannot efficiently identify Type-II signals. Therefore, the joint analysis of time-frequency signal features is important since feature diversity would be enhanced by considering two domains. This section applies the wavelet transform [wavelet_tour_Mallat_2008] to manually extract hidden signal features in the time-frequency dimensions.
There are two types of wavelet transform for time-frequency analysis, namely the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). CWT provides a detailed representation of signals by using fine scale factors. It therefore leads to high-resolution signal analysis and can capture crucial signal features. However, the obvious disadvantage of CWT is its higher computational complexity compared with DWT, since the fine representation of scales produces a large time-frequency spectrogram grid. In this work, we would like to explore accurate signal transient localization via detailed time-frequency analysis. Therefore, the high-resolution CWT is used rather than the coarser DWT.
There are several wavelet candidates for the wavelet transform. This work employs the widely used Morse wavelet and the effects of different wavelets are not taken into account. The CWT time-frequency analysis for OFDM and SEFDM signals using the Morse wavelet is illustrated in Fig. 3. It is clearly shown that as the bandwidth compression factor α decreases, the frequency scales for SEFDM shrink to show the effect of bandwidth compression while its time scales are stretched to show the time-domain sample characteristics. A typical artificial intelligence solution is to feed the time-frequency grid as an image to a deep learning neural network such as a CNN. However, this would cause extra training complexity since the optimal neural network hyperparameters have to be tuned through iterative attempts. Therefore, preprocessing is required to simplify the two-dimensional time-frequency feature representation into a one-dimensional feature vector as illustrated in Fig. 4. The strategy is to maintain the fine frequency scales of CWT while reducing the time sample dimensionality using the statistical knowledge explained in Section IV-A.
V Classifier Training and Testing
To have a realistic training scenario, channel and hardware impairments have to be considered. The wireless channel power delay profile (PDP) and hardware impairments are defined in [tongyang_VTC2020_DL_classification] and are reused in this work. Signals are generated according to Table I, where 2048 time samples are produced at the transmitter for each OFDM/SEFDM symbol. There is no synchronization mechanism between the transmitter and the receiver. Therefore, the receiver captures 2048 time samples and randomly truncates them to 1024 samples for training. At the training stage, 2,000 OFDM/SEFDM symbols are generated for each class (i.e. each α) following the data augmentation principle in [tongyang_VTC2020_DL_classification]. In this case, there are 8,000 symbols overall for the Type-I signal pattern and 14,000 symbols for the Type-II signal pattern. For testing, there are 3,200 OFDM/SEFDM symbols overall for Type-I and 5,600 symbols for Type-II.
Parameter                       Value
Sampling frequency (kHz)        200
IFFT sample length              2048
Oversampling factor             8
No. of data subcarriers         256
Bandwidth compression factor    1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7
Modulation scheme               QPSK
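The unsynchronized capture and the dataset sizes quoted above can be sketched in a few lines. This is a minimal illustration that assumes the random truncation keeps a contiguous window of 1024 samples starting at a uniformly random offset; the source does not specify how the window is chosen, and the function names are hypothetical. The class counts of 4 (Type-I) and 7 (Type-II) follow from the quoted totals divided by the 2,000 symbols per class.

```python
import random


def random_truncate(samples, keep=1024, rng=random):
    """Keep `keep` consecutive samples starting at a random offset (assumed scheme)."""
    start = rng.randrange(len(samples) - keep + 1)
    return samples[start:start + keep]


def dataset_size(num_classes, symbols_per_class):
    """Total number of symbols when every class contributes the same amount."""
    return num_classes * symbols_per_class


captured = list(range(2048))  # stand-in for the 2048 captured time samples
window = random_truncate(captured)
print(len(window), dataset_size(4, 2000), dataset_size(7, 2000))
```

Because the window offset is random, the classifier never sees a symbol-aligned capture, which mimics the absence of synchronization at the receiver.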
At first, we assume a simple training and testing scenario, in which both the training data and testing data are contaminated by additive white Gaussian noise (AWGN) at a single Es/N0=20 dB. Multiple time-domain statistical features are extracted from the training dataset, which are labelled as ‘TStatisticsSVM’. Joint statistical features are investigated by combining the individual statistical features. In addition, the raw data without any manual feature extraction is also evaluated. Results in Fig. 5 show that none of the statistical features can properly classify Type-I signals. It should be noted that even the joint feature cannot improve the accuracy. Similar results are obtained for the Type-II signals, which have more challenging feature-similarity issues. The same feature extraction and training operations are repeated on the frequency-domain dataset, which are labelled as ‘FStatisticsSVM’. The same conclusion is obtained in Fig. 5: single-domain statistical features cannot classify signals even in the frequency domain.
The above results naturally lead to joint time-frequency analysis, which would enhance the feature extraction efficiency. The wavelet transform creates a two-dimensional time-frequency feature grid. The scale range of the Morse wavelet is configured to have 7 octaves and 10 scales per octave, giving 70 frequency scales. Therefore, considering both the real and imaginary parts of a signal, there are 140 frequency scales overall. In terms of time scale, following the signal specifications in Table I and the 50% random symbol truncation mechanism, 1024 time sample scales are retained. Therefore, CWT generates a pair of two-dimensional 70×1024 time-frequency analysis matrices.
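The grid dimensions follow directly from the scale configuration, as the short sketch below shows. It assumes geometrically spaced scales in which each octave doubles the scale, which is the usual CWT convention; the smallest scale of 2.0 and the function name are hypothetical placeholders, not values from this work.

```python
def scale_grid(octaves=7, voices_per_octave=10, smallest_scale=2.0):
    """Geometrically spaced CWT scales: each octave doubles the scale."""
    return [smallest_scale * 2 ** (j / voices_per_octave)
            for j in range(octaves * voices_per_octave)]


scales = scale_grid()
time_samples = 1024                        # samples kept after 50% truncation
grid_shape = (len(scales), time_samples)   # one grid per real/imaginary part
total_frequency_scales = 2 * len(scales)   # real and imaginary parts together
print(grid_shape, total_frequency_scales)
```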
There are many ways to reduce the time-frequency feature dimensionality. This work applies a statistical transform to reduce the number of time samples. Thus, the two-dimensional 70×1024 time-frequency grid is simplified into a 70×1 frequency-scale vector following the dimensionality reduction method in Fig. 4. Different statistical transform methods are evaluated at each frequency scale and results are shown in Fig. 6. It is clearly seen that the IQR and variance features enable higher classification accuracy than other features, and can even classify the feature-similarity dominant Type-II signals. The following classifier training will be based on those two statistical features.
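The row-wise reduction can be sketched as follows. This is a minimal illustration under stated assumptions: each grid holds real-valued coefficient magnitudes with one row per frequency scale, variance is the population variance, the IQR uses Python's inclusive quantile convention (other tools may interpolate differently), and the variance and IQR vectors of the real-part and imaginary-part grids are simply concatenated. The function names and the toy 2×5 grid are hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def reduce_grid(grid, stat):
    """Collapse each frequency-scale row of a (scales x time) grid to one number."""
    return [stat(row) for row in grid]


def composite_features(real_grid, imag_grid):
    """Concatenate variance and IQR vectors of both grids (assumed layout)."""
    features = []
    for grid in (real_grid, imag_grid):
        features += reduce_grid(grid, statistics.pvariance)
        features += reduce_grid(grid, iqr)
    return features


# Toy grid with 2 frequency scales and 5 time samples.
toy_grid = [[1.0, 2.0, 3.0, 4.0, 5.0],
            [2.0, 2.0, 2.0, 2.0, 2.0]]
print(reduce_grid(toy_grid, statistics.pvariance))
print(reduce_grid(toy_grid, iqr))
print(len(composite_features(toy_grid, toy_grid)))
```

Applied to a 70×1024 grid, `reduce_grid` returns a 70-element vector, so the time dimension is removed while every frequency scale is preserved.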
A wavelet classifier is first trained using data at a fixed Es/N0=20 dB and tested at various Es/N0, with accuracy results shown in Fig. 7(a). All the curves reach their peak accuracy at 20 dB. However, for other Es/N0 values, accuracy drops significantly. This indicates that training data at a single fixed Es/N0 is not sufficient to train a classifier that can classify signals over a wide range of Es/N0.
To train a robust classifier, a dataset covering different Es/N0 (20, 30, 40 dB) is generated. The classification results are shown in Fig. 7(b), in which better accuracy is reached at high Es/N0 for both Type-I and Type-II signals. However, the accuracy at low Es/N0 still needs improvement.
To enhance the classification sensitivity at low Es/N0, a classifier is trained on a dataset covering low Es/N0 (0, 10, 20 dB), with results shown in Fig. 7(c). All the curves are raised to achieve higher accuracy at low Es/N0. It should be noted that the variance feature enabled wavelet classifier can identify signals even below noise power and it achieves 78% classification accuracy when Es/N0=0 dB. However, its performance drops noticeably at high Es/N0, especially beyond Es/N0=20 dB. For the IQR feature trained classifiers, both the Type-I and Type-II curves are stable at high Es/N0. It should be noted that the IQR feature trained Type-I classifier outperforms the variance feature trained model at high Es/N0. It is concluded from the figure that the variance trained model is robust at low Es/N0 while the IQR trained model is robust at high Es/N0.
Based on the above results, it is inferred that classifiers trained at high Es/N0 enable high testing accuracy only at high Es/N0, while classifiers trained at low Es/N0 lead to high testing accuracy at low Es/N0. This indicates that a wider Es/N0 range has to be considered for the training data. In Fig. 7(d), classifiers are trained with data covering an Es/N0 range from 0 dB to 40 dB with an increment step of 10 dB, which combines the two Es/N0 ranges in Fig. 7(b) and Fig. 7(c). It clearly shows accuracy improvement for all the curves at both low and high Es/N0. In Fig. 7(e), a wider Es/N0 range between 20 dB and 50 dB is considered. The variance feature trained classifier shows apparent accuracy improvement for classifying Type-I signals at high Es/N0 while the other curves show no obvious improvement. However, there is still a minor performance degradation for the variance feature based classifier at high Es/N0 when compared with the IQR trained classifier. The robust performance of variance at low Es/N0 and of IQR at high Es/N0 motivates combining the two features for a more reliable classifier.
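Building a training set that spans several Es/N0 levels can be sketched as follows. This is a minimal illustration, not the data generation used in this work: it treats the mean energy of the supplied samples as Es, ignores any oversampling scaling, and cycles through the Es/N0 grid so that each level receives an equal share of symbols. The function names and the constant-valued stand-in symbols are hypothetical.

```python
import math
import random


def add_awgn(samples, es_n0_db, rng=random):
    """Add complex Gaussian noise so that mean sample energy / N0 equals es_n0_db."""
    es = sum(abs(x) ** 2 for x in samples) / len(samples)
    n0 = es / 10 ** (es_n0_db / 10)
    sigma = math.sqrt(n0 / 2)  # standard deviation per real/imaginary component
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in samples]


def mixed_es_n0_set(clean_symbols, es_n0_grid=(0, 10, 20, 30, 40), rng=random):
    """Cycle through the Es/N0 grid so each level gets an equal share of symbols."""
    return [
        (add_awgn(sym, es_n0_grid[i % len(es_n0_grid)], rng),
         es_n0_grid[i % len(es_n0_grid)])
        for i, sym in enumerate(clean_symbols)
    ]


# Stand-in clean symbols: ten symbols of eight unit-energy samples each.
clean = [[1 + 0j] * 8 for _ in range(10)]
training = mixed_es_n0_set(clean, rng=random.Random(0))
print([level for _, level in training])
```

The default grid matches the 0 dB to 40 dB range with a 10 dB step; passing a different `es_n0_grid` reproduces the narrower ranges discussed above.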
The composite classifiers, trained with joint variance and IQR features, can reach high classification accuracy for both Type-I and Type-II signals at both low and high Es/N0 ranges in Fig. 7(f). Therefore, the composite classifiers will be used in the following over-the-air experiments.
VI Low-Cost Experiment and Results
The experiment is conducted indoors in an open space, in which facilities cause signal reflections and therefore frequency-selective channel impairments. In addition, the movement of people in the space causes Doppler spread and therefore dynamic spectral fluctuations. This work uses a pair of low-cost Analog Devices software-defined radio (SDR) PLUTO devices [PlutoSDR] to transmit and classify over-the-air signals. The signals are designed according to Table I and transmitted at a license-free 900 MHz (33-centimeter band) carrier frequency.
The experiment setup, shown in Fig. 8, is low cost since a laptop and two PLUTO devices are sufficient to realize signal generation, over-the-air transmission, signal reception and classifier training. In order to collect diversified data from an indoor environment, we fix the position of the transmitter-side SDR device and place the receiver-side SDR device at different locations. In this case, a number of training datasets, impaired by channel multipath fading, power degradation and Doppler effect, are collected. Unlike the CNN classifier, where a large number of training symbols are required for feature extraction, the wavelet classifier can manually extract features based on a limited dataset. Therefore, in this experiment, at each location, 400 symbols are collected for the Type-I signal pattern and 700 symbols for the Type-II signal pattern. There are four data collections, one for each of four different receiver locations. Therefore, the overall numbers of collected training symbols for Type-I and Type-II are 1,600 and 2,800, respectively. For testing, the same process is repeated with four data collections. To have a fair comparison with the previous work [tongyang_VTC2020_DL_classification], the number of testing symbols per class is fixed at 800.
The collected data is used to train wavelet classifiers offline using Matlab. Once a wavelet classifier is trained, the model is saved. The SDR devices then reuse the saved model for online signal classification and there is no need to retrain classifiers. Thus, the offline training is a one-time operation. The confusion matrices are presented in Fig. 9. The classification accuracy for the Type-I signal pattern is nearly 100%. For Type-II signals, the accuracy is 90%, which is much higher than the 70.75% in [tongyang_VTC2020_DL_classification], where a transfer learning enabled CNN classifier is applied.
VII Conclusion
This work explores typical machine learning (ML) algorithms for non-orthogonal signal classification in non-cooperative communications. Multiple statistical approaches are tested for feature extraction in either the time domain or the frequency domain, but they show unreliable classification accuracy. The wavelet transform is therefore applied to extract two-dimensional time-frequency features, which are further converted to a one-dimensional feature vector using a statistical transform. Simulation results reveal that the Es/N0 of the training data has a great impact on classification accuracy, and that accuracy increases when the training data covers a wide range of Es/N0. Classifiers are trained and tested with results showing that variance and IQR are the most efficient features. The combination of variance and IQR, associated with the wavelet transform, enables classification accuracy up to 100%. Furthermore, the wavelet classifier can even identify signals when the signal power is below its noise power. Results show that the variance feature enabled wavelet classifier achieves 78% classification accuracy when Es/N0=0 dB. A low-cost experiment is set up using one laptop and two SDR devices. Practical results verify the efficacy of the wavelet enabled time-frequency features. Confusion matrices show nearly 100% classification accuracy for the Type-I signal pattern and 90% accuracy for Type-II.