In traditional communication systems, cooperation between a transmitter and a receiver is the default configuration to ensure reliable signal recovery at the receiver. The signal format is therefore important side information that must be known to both the transmitter and the receiver. This side information is delivered from the transmitter to the receiver, and the resulting overhead occupies additional time, frequency or space resources. Furthermore, wireless channels are time-variant, so the side information on signal format may be out of date by the time a signal reaches the receiver, which would subsequently cause inaccurate signal detection. A more reliable solution is therefore required, one that avoids transmitting side information and lets the receiver extract signal format information from received signals in a timely manner.
An intelligent receiver can automatically identify signal formats based on data training. Deep learning (DL) was initially proposed for image processing since it can automatically and efficiently extract features from two-dimensional images. The representative deep learning strategy is the convolutional neural network (CNN), which employs multiple convolutional layers for feature extraction. CNN has been successfully applied in single carrier modulation classification [OShea_classification_2018] and multicarrier orthogonal frequency division multiplexing (OFDM) modulation classification [OFDM_classification_Zhou2019ARM, OFDM_classification_access2019]. The classification of non-orthogonal spectrally efficient frequency division multiplexing (SEFDM) signals [TongyangTVT2017] has been theoretically and practically investigated in [tongyang_VTC2020_DL_classification], in which a trained CNN classifier can efficiently identify feature-diversity dominant signals but cannot accurately classify feature-similarity dominant signals. Although the deep learning classifier is trained to automatically extract signal features without domain knowledge, the extensive fine-tuning needed to find optimal neural network hyperparameters is time consuming and inefficient. Therefore, manually extracting signal features, based on expert knowledge and traditional machine learning (ML), would be more efficient and convincing.
This work first studies different statistical features in a support vector machine (SVM) for non-orthogonal signal classification. Modelling results reveal that neither time-domain nor frequency-domain statistical features can train accurate classifiers for non-orthogonal signals. Therefore, a wavelet transform [wavelet_tour_Mallat_2008, wavelet_transform_TIT_1990] based time-frequency feature extraction approach is applied in this work. Previous work has explored multilevel structured wavelet decomposition [wavelet_classification_TPAMI_1997, wavelet_classification_TITB_2007] and wavelet scattering [wavelet_scattering_TSP_2014] for feature extraction and classification. This work focuses on a single-level wavelet filtering (WF) strategy. Results indicate that the time-frequency feature with statistical dimensionality reduction can assist SVM to identify signals at high accuracy. In addition, this work evaluates classifier accuracy for signals at different Es/N0. Finally, a low-cost experiment is set up to verify the trained classifiers using over-the-air signals.
The main contributions of this work are as follows.
Statistical features are investigated in SVM for non-orthogonal signal classification.
Two-dimensional time-frequency features are evaluated via single-level wavelet transform. Various time-frequency feature dimensionality reduction methods are studied to simplify the features and further improve the classification accuracy.
A low-cost over-the-air experiment is designed for non-orthogonal signal classification. Practical results verify the robustness of the wavelet classification.
II SEFDM Waveform
The time-domain SEFDM signals are illustrated in Fig. 1 where two types of signal patterns are presented. It is inferred that the classification of Type-I signal pattern is easier than that of Type-II signal pattern since the signal features in Type-I are more distinguishable.
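The discrete format of one time-domain SEFDM symbol is defined as

$$X_k = \frac{1}{\sqrt{Q}} \sum_{n=0}^{N-1} s_n \exp\left(\frac{j 2\pi n k \alpha}{Q}\right), \tag{1}$$

where the expression is very similar to that of OFDM except for the bandwidth compression factor $\alpha = \Delta f \cdot T$, in which $\Delta f$ is the sub-carrier spacing and $T$ is the time period of one SEFDM symbol. The signal spectral bandwidth in (1) is compressed when $\alpha < 1$ and is equivalent to that of OFDM when $\alpha = 1$. The number of sub-carriers is determined by $N$. $s_n$ is the single-carrier symbol within one SEFDM symbol and $X_k$ is the time sample with $k = 0, 1, \ldots, Q-1$. The instantaneous power for one SEFDM symbol is computed as follows

$$|X_k|^2 = \frac{1}{Q} \sum_{n=0}^{N-1} |s_n|^2 + \underbrace{\frac{1}{Q} \sum_{n=0}^{N-1} \sum_{\substack{m=0 \\ m \neq n}}^{N-1} s_n s_m^{*} \exp\left(\frac{j 2\pi (n-m) k \alpha}{Q}\right)}_{\text{ICI}}. \tag{2}$$

It is clearly shown that the inter carrier interference (ICI) term in (2), which is related to the value of $\alpha$, determines the possibility of identifying different SEFDM signals. It is inferred that when SEFDM signals have similar values of $\alpha$, the ICI terms will become similar, which complicates signal classification.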
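The waveform and its instantaneous power can be illustrated numerically. The following is a minimal sketch, assuming the commonly used discrete SEFDM definition $X_k = \frac{1}{\sqrt{Q}} \sum_{n} s_n \exp(j 2\pi n k \alpha / Q)$ with $Q$ time samples per symbol; the function names, the QPSK mapping and the toy sizes ($N = Q = 16$) are illustrative assumptions rather than the configuration used in this work.

```python
import cmath
import math
import random


def sefdm_symbol(symbols, alpha, num_samples):
    """Return the Q time samples X_k of one SEFDM symbol.

    Assumed definition (illustrative):
    X_k = (1/sqrt(Q)) * sum_n s_n * exp(j*2*pi*n*k*alpha/Q)
    """
    q = num_samples
    scale = 1.0 / math.sqrt(q)
    samples = []
    for k in range(q):
        acc = 0j
        for n, s_n in enumerate(symbols):
            acc += s_n * cmath.exp(2j * math.pi * n * k * alpha / q)
        samples.append(scale * acc)
    return samples


def instantaneous_power(samples):
    """Per-sample power |X_k|^2."""
    return [abs(x) ** 2 for x in samples]


# Hypothetical toy setup: 16 unit-power QPSK symbols, 16 time samples.
rng = random.Random(0)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
        for _ in range(16)]

ofdm = sefdm_symbol(qpsk, alpha=1.0, num_samples=16)   # alpha = 1 gives OFDM
sefdm = sefdm_symbol(qpsk, alpha=0.8, num_samples=16)  # compressed bandwidth

ofdm_power = instantaneous_power(ofdm)
sefdm_power = instantaneous_power(sefdm)
```

With $\alpha = 1$ and $Q = N$ the transform is a unitary inverse DFT, so the total power of the time samples equals that of the input symbols. For $\alpha < 1$ the sub-carriers are no longer orthogonal and the cross terms between sub-carriers do not cancel in general.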
III Classification Strategies
III-A CNN Classification
A multi-layer CNN classifier is trained in a recent work [tongyang_VTC2020_DL_classification] to automatically extract signal features in either the time domain or the frequency domain. Based on the extracted features, classification results are compared in Fig. 2, in which the time-domain classifier achieves higher accuracy than its frequency-domain counterpart. Classification accuracy can reach 95% when considering the limited number of non-orthogonal signals in Type-I. However, the accuracy drops greatly when the additional, more similar signals in Type-II are included.
III-B SVM Classification
The previous work [tongyang_VTC2020_DL_classification] is limited by its low accuracy for Type-II signals, and the motivation for this work is to classify Type-II signals accurately. The training of a multi-layer CNN classifier is time-consuming since it requires extensive hyperparameter tuning and iterative back propagation optimization. Therefore, it would be more efficient to use traditional machine learning strategies with manual feature extraction. The SVM classifier, based on domain-knowledge dependent features, is applied in this work. Firstly, the training is fast since features are obtained in advance rather than learned through time-consuming data training. Secondly, the methodology of machine learning is deterministic and its working principle can be well explained. Since there are multiple signal classes in Type-I and Type-II, a multiclass error-correcting output codes (ECOC) model [ECOC_Dietterich_1995] is applied here. A one-versus-one [ECOC_2014_codingDesign] coding strategy is implemented for separating different classes, which simplifies the multiclass classification task into multiple binary classification tasks. Thus, multiple binary SVM learners, with a polynomial kernel of order two, are used for the multiclass classification.
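The one-versus-one coding strategy can be sketched as follows. This is a simplified illustration, not the toolbox implementation used in this work: the helper names are hypothetical, and decoding uses plain sign agreement rather than a loss-weighted rule. The class counts (four for Type-I, seven for Type-II) follow from the dataset sizes given in Section V.

```python
from itertools import combinations


def one_vs_one_coding_matrix(num_classes):
    """Rows are classes, columns are binary learners.

    +1 / -1 mark the positive / negative class of a learner and
    0 marks the classes that the learner ignores.
    """
    pairs = list(combinations(range(num_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(num_classes)]
    for col, (pos, neg) in enumerate(pairs):
        matrix[pos][col] = 1
        matrix[neg][col] = -1
    return matrix


def decode(matrix, learner_outputs):
    """Return the class whose code word agrees most with the +1/-1 outputs."""
    def agreement(row):
        return sum(code * out for code, out in zip(row, learner_outputs))
    return max(range(len(matrix)), key=lambda c: agreement(matrix[c]))


type1_code = one_vs_one_coding_matrix(4)   # Type-I: 4 classes
type2_code = one_vs_one_coding_matrix(7)   # Type-II: 7 classes
num_learners_type1 = len(type1_code[0])    # K(K-1)/2 = 6
num_learners_type2 = len(type2_code[0])    # K(K-1)/2 = 21

# Learner order for 4 classes: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3).
# These outputs are consistent with class 2 winning all of its pairings.
predicted = decode(type1_code, [1, -1, 1, -1, 1, 1])
```

Each binary learner only sees data from two classes, so the number of learners grows quadratically with the number of classes while every individual training problem stays small.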
IV Feature Selection
This section will firstly explore the impacts of different one-dimensional statistical features and their combinations either in time-domain or frequency-domain. The second part will investigate the impact of two-dimensional time-frequency features via the single-level wavelet transform.
IV-A Statistical Features
The most commonly used statistical feature is the arithmetic mean, which computes the average value of a dataset. Variance measures the variation of a dataset. A small variance indicates that the dataset elements are close to the arithmetic mean, while a large variance indicates that the elements are spread out away from the mean. Skewness [Feature_skewness_2011] measures the asymmetry of a data distribution. Negative skewness indicates that the distribution has a longer tail on the left side of its mean; positive skewness indicates a longer tail on the right side. The ratio between the maximum value and the minimum value is also studied here, and this MaxMin ratio reflects the fluctuations of a dataset. Interquartile range (IQR) [book_IQR_1996] measures data dispersion and equals the difference between the 75th percentile and the 25th percentile.
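The five features can be computed as in the sketch below. It assumes a real-valued, strictly positive input sequence (for example sample magnitudes) so that the MaxMin ratio is well defined, uses the population form of variance and skewness, and uses linear interpolation for the percentiles. These conventions and the function name are assumptions; the text does not specify them.

```python
import statistics


def statistical_features(values):
    """Return mean, variance, skewness, MaxMin ratio and IQR of a sequence.

    Assumes real, strictly positive values so that max/min is defined.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    # Skewness as the third standardized moment (population form).
    skewness = (
        sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        if std > 0 else 0.0
    )
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "maxmin_ratio": max(values) / min(values),
        "iqr": q3 - q1,
    }


features = statistical_features([1, 2, 3, 4, 5, 6, 7, 8])
```

For this symmetric toy sequence the skewness is zero, the variance is 5.25 and the IQR is 3.5.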
IV-B Time-Frequency Features
The previous work [tongyang_VTC2020_DL_classification] revealed that independent time-domain features or frequency-domain features cannot efficiently identify Type-II signals. Therefore, the joint analysis of time-frequency signal features is important since feature diversity would be enhanced by considering two domains. This section applies wavelet transform [wavelet_tour_Mallat_2008] to manually extract hidden signal features in time-frequency dimensions.
There are two types of wavelet transform for time-frequency analysis, namely continuous wavelet transform (CWT) and discrete wavelet transform (DWT). CWT provides a detailed representation of signals by using fine scale factors. It therefore leads to high-resolution signal analysis and can capture crucial signal features. However, the obvious disadvantage of CWT is its higher computational complexity compared with DWT, since the fine representation of scales produces a large time-frequency grid. This work aims at accurate localization of signal transients via detailed time-frequency analysis. Therefore, the high-resolution CWT is used rather than the coarser DWT.
There are several wavelet candidates for wavelet transform. This work employs the widely used Morse wavelet and the effects of different wavelets are not taken into account. The CWT time-frequency analysis for OFDM and SEFDM signals using the Morse wavelet is illustrated in Fig. 3. It is clearly shown that with the reduction of $\alpha$, the frequency scales for SEFDM shrink to show the effect of bandwidth compression while its time scales are stretched to show the time-domain sample characteristics. A typical artificial intelligence solution is to feed the time-frequency grid as an image to a deep learning neural network such as a CNN. However, this would cause extra training complexity since the optimal neural network hyperparameters have to be tuned through iterative attempts. Therefore, pre-processing is required to simplify the two-dimensional time-frequency feature representation into a one-dimensional feature vector as illustrated in Fig. 4. The strategy is to maintain the fine frequency scales of CWT while reducing the dimensionality of the time samples using the statistical features explained in Section IV-A.
V Classifier Training and Testing
To have a realistic training scenario, channel and hardware impairments have to be considered. The wireless channel power delay profile (PDP) and hardware impairments are defined in [tongyang_VTC2020_DL_classification] and are reused in this work. Signals are generated according to Table I, where 2048 time samples are produced at the transmitter for each OFDM/SEFDM symbol. There is no synchronization mechanism between the transmitter and the receiver. Therefore, the receiver captures 2048 time samples and randomly truncates them to 1024 samples for training. At the training stage, 2,000 OFDM/SEFDM symbols are generated for each class (i.e. each $\alpha$) following the data augmentation principle in [tongyang_VTC2020_DL_classification]. In this case, there are overall 8,000 symbols for the Type-I signal pattern and 14,000 symbols for the Type-II signal pattern. For testing, there are overall 3,200 OFDM/SEFDM symbols for Type-I and 5,600 symbols for Type-II.
Table I: Signal specifications

| Parameter | Value |
|---|---|
| Sampling frequency (kHz) | 200 |
| IFFT sample length | 2048 |
| No. of data sub-carriers | 256 |
| Bandwidth compression factor $\alpha$ | 1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7 |
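The unsynchronized capture described above can be emulated with a random window, as in the following sketch. It assumes that "randomly truncate" means keeping 1024 consecutive samples starting at a random offset. The function name and the stand-in data are hypothetical.

```python
import random


def random_truncate(samples, keep=1024, rng=random):
    """Keep `keep` consecutive samples starting at a random offset."""
    start = rng.randrange(len(samples) - keep + 1)
    return samples[start:start + keep]


# Stand-in for the 2048 received time samples of one OFDM/SEFDM symbol.
symbol = list(range(2048))
window = random_truncate(symbol, rng=random.Random(1))

# Dataset sizes implied by 2,000 training symbols per class.
type1_training = 4 * 2000   # 8,000 symbols for 4 Type-I classes
type2_training = 7 * 2000   # 14,000 symbols for 7 Type-II classes
```

Because the offset changes from symbol to symbol, the classifier never sees a fixed symbol boundary, which mimics a receiver without timing synchronization.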
At first, we assume a simple training and testing scenario, in which both the training data and testing data are contaminated by additive white Gaussian noise (AWGN) at a single Es/N0=20 dB. Multiple time-domain statistical features are extracted from the training dataset, which are labelled as ‘T-Statistics-SVM’. Joint statistical features are investigated by combining the individual statistical features. In addition, the raw data without any manual feature extraction is also evaluated. Results in Fig. 5 show that none of the statistical features can properly classify Type-I signals. It should be noted that even the joint feature cannot improve the accuracy. Similar results are obtained for the Type-II signals, which have more challenging signal feature-similarity issues. The same feature extraction and training operations are repeated on the frequency-domain dataset, which are labelled as ‘F-Statistics-SVM’. The same conclusion is obtained in Fig. 5: single-domain statistical features cannot classify signals in the frequency domain either.
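Noise contamination at a target Es/N0 can be sketched as below. The sketch assumes that Es is taken as the measured mean power of the complex samples and that the noise is circularly symmetric complex Gaussian. The function name and the constant test signal are illustrative only.

```python
import math
import random


def add_awgn(samples, es_n0_db, rng=random):
    """Add complex white Gaussian noise for a target Es/N0 in dB.

    Es is assumed to be the mean power of the input samples.
    """
    signal_power = sum(abs(x) ** 2 for x in samples) / len(samples)
    noise_power = signal_power / (10 ** (es_n0_db / 10))
    sigma = math.sqrt(noise_power / 2)  # per real/imaginary component
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for x in samples]


rng = random.Random(0)
clean = [complex(1, 0)] * 20000          # unit-power stand-in signal
noisy = add_awgn(clean, es_n0_db=20, rng=rng)

# Empirical check of the resulting ratio.
measured_noise = sum(abs(n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
measured_db = 10 * math.log10(1.0 / measured_noise)
```

Calling the function with several Es/N0 values and pooling the outputs is one way to build a training set that covers a range of noise levels.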
The above results naturally lead to the joint time-frequency analysis, which would enhance the feature extraction efficiency. Wavelet transform will create a two-dimensional time-frequency feature grid. The scale range of the Morse wavelet is configured to have 7 octaves and 10 scales per octave, i.e. 70 frequency scales per transform. Therefore, considering both the real and imaginary parts of a signal, there are overall 140 frequency scales. In terms of time scale, following the signal specifications in Table I and the 50% random symbol truncation mechanism, 1024 time samples will be retained. Therefore, CWT will generate a pair of two-dimensional $70\times1024$ time-frequency analysis matrices.
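The scale count can be checked with a short sketch. It assumes the usual CWT convention of geometrically spaced scales, $s_j = s_0 \cdot 2^{j/v}$ with $v$ scales per octave; the smallest scale $s_0$ and the function name are hypothetical placeholders, since the text does not state them.

```python
def scale_grid(num_octaves=7, scales_per_octave=10, smallest_scale=1.0):
    """Geometrically spaced CWT scales: s_j = s_0 * 2**(j / scales_per_octave)."""
    count = num_octaves * scales_per_octave
    return [smallest_scale * 2 ** (j / scales_per_octave) for j in range(count)]


scales = scale_grid()
grid_shape = (len(scales), 1024)        # one matrix per signal component
total_frequency_scales = 2 * len(scales)  # real part + imaginary part
```

Here `grid_shape` is `(70, 1024)` and `total_frequency_scales` is 140, matching the counts stated above.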
There are many ways to reduce the time-frequency feature dimensionality. This work applies a statistical transform to reduce the number of time samples. Thus, the two-dimensional $70\times1024$ time-frequency grid is simplified into a $70\times1$ frequency-scale vector following the dimensionality reduction method in Fig. 4. Different statistical transform methods are evaluated at each frequency scale and results are shown in Fig. 6. It is clearly seen that the IQR and variance features enable higher classification accuracy than other features, and can even classify the feature-similarity dominant Type-II signals. The following classifier training will be based on those two statistical features.
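The reduction along the time axis can be sketched as follows. The sketch assumes that the grid holds real-valued entries (for example coefficient magnitudes) with one row per frequency scale. A toy $3\times8$ grid stands in for the $70\times1024$ matrix, and the function name is hypothetical.

```python
import statistics


def reduce_time_axis(grid, feature="variance"):
    """Collapse each row (one frequency scale) of a grid to a single number."""
    reduced = []
    for row in grid:
        if feature == "variance":
            reduced.append(statistics.pvariance(row))
        elif feature == "iqr":
            q1, _, q3 = statistics.quantiles(row, n=4, method="inclusive")
            reduced.append(q3 - q1)
        else:
            raise ValueError(f"unknown feature: {feature}")
    return reduced


# Toy grid: rows are frequency scales, columns are time samples.
toy_grid = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 2, 2, 2, 2, 2, 2, 2],
    [0, 10, 0, 10, 0, 10, 0, 10],
]
variance_vector = reduce_time_axis(toy_grid, "variance")
iqr_vector = reduce_time_axis(toy_grid, "iqr")
```

Each output vector has one entry per frequency scale, so the frequency resolution of the CWT is preserved while the time axis is summarized by a single dispersion measure.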
A wavelet classifier is first trained using data at a fixed Es/N0=20 dB and tested at various Es/N0, with accuracy results shown in Fig. 7(a). All the curves reach their peak accuracy at 20 dB. However, for other Es/N0 values, accuracy drops significantly. This indicates that training data at a single fixed Es/N0 is not sufficient to train a classifier that can classify signals over a wide range of Es/N0.
To train a robust classifier, a dataset covering different Es/N0 (20, 30, 40 dB) is generated. The classification results are shown in Fig. 7(b), in which better accuracy is reached at high Es/N0 for both Type-I and Type-II signals. However, the accuracy at low Es/N0 still needs improvement.
To enhance the classification sensitivity at low Es/N0, a classifier is trained on a dataset covering low Es/N0 (0, 10, 20 dB), with results shown in Fig. 7(c). All the curves are raised to achieve higher accuracy at low Es/N0. It should be noted that the variance feature enabled wavelet classifier can identify signals even below noise power and it achieves 78% classification accuracy when Es/N0=0 dB. However, its performance drops noticeably at high Es/N0, especially beyond Es/N0=20 dB. For the IQR feature trained classifiers, both Type-I and Type-II curves are stable at high Es/N0. It should be noted that the IQR feature trained Type-I classifier outperforms the variance feature trained model at high Es/N0. It is concluded from the figure that the variance trained model is robust at low Es/N0 while the IQR trained model is robust at high Es/N0.
Based on the above results, it is inferred that classifiers trained at high Es/N0 enable high testing accuracy only at high Es/N0, while classifiers trained at low Es/N0 lead to high testing accuracy at low Es/N0. This indicates that a wider Es/N0 range has to be considered for the training data. In Fig. 7(d), classifiers are trained with data covering an Es/N0 range from 0 dB to 40 dB with an increment step of 10 dB, which combines the two Es/N0 ranges in Fig. 7(b) and Fig. 7(c). The accuracy improves for all the curves at both low and high Es/N0. In Fig. 7(e), a wider Es/N0 range between -20 dB and 50 dB is considered. The variance feature trained classifier shows a clear accuracy improvement for classifying Type-I signals at high Es/N0, while all other curves show no obvious improvement. However, there is still a minor performance degradation for the variance feature based classifier at high Es/N0 when compared with the IQR trained classifier. The robust performance of variance at low Es/N0 and of IQR at high Es/N0 motivates combining the two features for a more reliable classifier.
The composite classifiers, trained by joint variance and IQR features, can reach high classification accuracy for both Type-I and Type-II signals at both low and high Es/N0 ranges in Fig. 7(f). Therefore, the composite classifiers will be used in the following over-the-air experiments.
VI Low-Cost Experiment and Results
The experiment is conducted indoors in an open space, in which facilities cause signal reflections and therefore frequency selective channel impairments. In addition, the movement of people in the space causes Doppler spread and therefore dynamic spectral fluctuations. This work uses a pair of low-cost Analog Devices software-defined radio (SDR) PLUTO [PlutoSDR] devices to practically transmit and classify over-the-air signals. The signals are designed according to Table I and transmitted at a license-free 900 MHz (33-centimeter band) carrier frequency.
The experiment setup, shown in Fig. 8, is low cost since a laptop and two PLUTO devices are sufficient to realize signal generation, over-the-air transmission, signal reception and classifier training. In order to collect diversified data from an indoor environment, we fix the position of the transmitter side SDR device and place the receiver side SDR device at different locations. In this case, a number of training datasets, impaired by channel multipath fading, power degradation and Doppler effect, are collected. Unlike the CNN classifier where a large number of training symbols are required for feature extractions, the wavelet classifier can manually extract features based on a limited dataset. Therefore, in this experiment, at each location, 400 symbols are collected for the Type-I signal pattern and 700 symbols for the Type-II signal pattern. There are four data collections considering four different locations of the receiver. Therefore, the overall collected training symbols for Type-I and Type-II are 1,600 and 2,800, respectively. For testing, the same process is repeated with four data collections. To have a fair comparison with the previous work [tongyang_VTC2020_DL_classification], the number of testing symbols per class is fixed at 800.
The collected data is used to train wavelet classifiers off-line using Matlab. Once a wavelet classifier is trained, the model is saved. The SDR devices then reuse the saved model for online signal classification, so there is no need to re-train classifiers and the off-line training is a one-time operation. The confusion matrices are presented in Fig. 9. The classification accuracy for the Type-I signal pattern is nearly 100%. For Type-II signals, the accuracy is 90%, which is much higher than the 70.75% in [tongyang_VTC2020_DL_classification] where a transfer learning enabled CNN classifier is applied.
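An overall accuracy figure is obtained from a confusion matrix as the ratio of the diagonal sum to the total count. The sketch below uses made-up labels purely to show the computation; it does not reproduce the measured results, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Entry [t][p] counts symbols of true class t classified as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of symbols on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Illustrative labels only: one class-1 symbol is mistaken for class 2.
truth = [0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 0, 1, 2, 2, 2, 3, 3]
cm = confusion_matrix(truth, predicted, num_classes=4)
accuracy = overall_accuracy(cm)  # 7 correct out of 8
```

Off-diagonal entries show which pairs of classes are confused, which is most informative for the Type-II pattern where neighbouring values of the bandwidth compression factor produce similar signals.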
This work explores typical machine learning (ML) algorithms for non-orthogonal signal classification in non-cooperative communications. Multiple statistical approaches are tested for feature extraction in either the time domain or the frequency domain, but they show unreliable classification accuracy. Wavelet transform is therefore applied to extract two-dimensional time-frequency features, which are further converted to a one-dimensional feature vector using a statistical transform. Simulation results reveal that the Es/N0 of the training data has a great impact on classification accuracy, and that training over a wide range of Es/N0 increases classification accuracy. Classifiers are trained and tested, with results showing that variance and IQR are the most efficient features. The combination of variance and IQR, associated with wavelet transform, enables classification accuracy up to 100%. Furthermore, the wavelet classifier can even identify signals when the signal power is below its noise power. Results show that the variance feature enabled wavelet classifier achieves 78% classification accuracy when Es/N0=0 dB. A low-cost experiment is set up using one laptop and two SDR devices. Practical results verify the efficacy of the wavelet enabled time-frequency features. The confusion matrices show nearly 100% classification accuracy for the Type-I signal pattern and 90% accuracy for Type-II.