SR2CNN: Zero-Shot Learning for Signal Recognition

04/10/2020
by   Yihong Dong, et al.

Signal recognition is one of the most significant and challenging tasks in the signal processing and communications field. It is a common situation that no training data is accessible for some signal classes on which a recognition task must be performed. Hence, zero-shot learning (ZSL), which is widely used in the image processing field, is also very important for signal recognition. Unfortunately, ZSL in this field has hardly been studied, due to inexplicable signal semantics. This paper proposes a ZSL framework, signal recognition and reconstruction convolutional neural networks (SR2CNN), to address the relevant problems in this situation. The key idea behind SR2CNN is to learn a representation of the signal semantic feature space by introducing a proper combination of cross entropy loss, center loss and autoencoder loss, and by adopting a suitable distance metric space such that semantic features have a greater minimal inter-class distance than maximal intra-class distance. The proposed SR2CNN can discriminate signals even if no training data is available for some signal classes. Moreover, SR2CNN can gradually improve itself with the aid of signal detection, because the class center vectors in the semantic feature space are constantly refined. These merits are all verified by extensive experiments.



1 Introduction

Nowadays, developments in deep convolutional neural networks (CNNs) have led to remarkable achievements in the area of signal recognition, improving the state of the art significantly [16, 7, 11]. Generally, the vast majority of existing learning methods follow a closed-set assumption [6, 13], that is, all of the test classes are assumed to be the same as the training classes. However, in real-world applications new signal categories often appear, while the model is trained only on the current dataset with a limited number of known classes. Open-set learning [22, 1] was proposed to partially tackle this issue (i.e., test samples may come from unknown classes). The goal of an open-set recognition system is to reject test samples from unknown classes while maintaining the performance on known classes. However, in some cases the learned model should be able not only to differentiate the unknown classes from the known classes, but also to distinguish among different unknown classes. Zero-shot learning (ZSL) [18, 23] is one way to address the above challenges and has been applied in image tasks. For images, it is easy to extract some human-specified high-level descriptions as semantic attributes. For example, from a picture of a zebra, we can extract the following semantic attributes: 1) color: white and black, 2) stripes: yes, 3) size: medium, 4) shape: horse, 5) land: yes. However, for a real-world signal it is almost impossible to give such a high-level description, due to obscure signal semantics. Therefore, although ZSL has been widely used in image tasks, to the best of our knowledge it has not yet been studied for signal recognition.

Fig. 1: Overview of SR2CNN. A pre-processing step (top left) transforms the raw signal data into the network input. A deep net (right) is trained to provide a semantic feature for each input within the known classes, while maintaining the performance of the decoder and the classifier according to the reconstruction and the prediction. A zero-shot learning classifier, which consists of a known classifier and an unknown classifier, exploits the semantic feature for discrimination.

Fig. 2: The architecture of the feature extractor (S), classifier (C) and decoder (D). S takes any input signal x and produces a latent semantic feature z, which is used by C to predict the class label and by D to reconstruct the signal. The cross entropy, center and autoencoder losses are calculated to train these networks.

In this paper, unlike the conventional signal recognition task where a classifier is learned to distinguish only known classes (i.e., the labels of test data and training data are all within the same set of classes), we aim to propose a learning framework that can classify not only known classes but also unknown classes without annotations. To do so, a key issue that needs to be addressed is to automatically learn a representation of the semantic attribute space of signals. In our scheme, a CNN combined with an autoencoder is exploited to extract the semantic attribute features. Afterwards, semantic attribute features are well-classified using a suitably defined distance metric. The overview of the proposed scheme is illustrated in Fig. 1.

In addition, to make a self-evolution learning model, incremental learning [3, 20] needs to be considered when the algorithm is executed continuously. The goal of incremental learning is to dynamically adapt the model to new knowledge from newly coming data without forgetting the already learned one. Based on incremental learning, the obtained model will gradually improve its performance over time.

In summary, the main contributions of this paper are threefold:

  • First, we propose a deep CNN-based zero-shot learning framework, called SR2CNN, for open-set signal recognition. SR2CNN is trained to extract semantic feature while maintaining the performance on decoder and classifier. Afterwards, the semantic feature is exploited to discriminate signal classes.

  • Second, extensive experiments on various signal datasets show that the proposed SR2CNN can discriminate not only known classes but also unknown classes and it can gradually improve itself.

  • Last but not least, we provide a new signal dataset SIGNAL-202002 including eight digital and three analog modulation classes.

The code and dataset of this paper will be published upon acceptance.

2 Related Work

In recent years, signal recognition via deep learning has achieved a series of successes. The work [15] proposed the Convolutional Radio Modulation Recognition Networks, which adapt to the complex temporal radio signal domain and also work well at low SNRs. Another paper [12] proposed an ensemble model of deep convolutional networks to recognize 7 classes of signals from real-life data in the fiber optic field. Moreover, [17] used a Residual Neural Network [8] to perform signal recognition tasks across a range of configurations and channel impairments, offering referable statistics. These experiments basically follow the closed-set assumption; namely, their deep models are expected to, and are only capable of, distinguishing among already-known signal classes.

When considering the recognition task for unknown signal classes, some traditional machine learning methods such as anomaly (also called outlier or novelty) detection can more or less provide some guidance. Isolation Forest [10] constructs a binary search tree to preferentially isolate anomalies. Elliptic Envelope [21] fits an ellipse enveloping the central data points while rejecting the outsiders. One-class SVM [5], an extension of SVM, finds a decision hyperplane to separate the positive samples from the outliers. Local Outlier Factor [2] uses distance and density to determine whether a data point is abnormal or not. The above open-set learning methods can indeed identify known samples (positive samples) and detect unknown ones (outliers). However, a common and inevitable defect of these methods is that they can never carry out any further classification of the unknown signal classes.

Zero-shot learning is well known to be able to classify unknown classes, and it has already been widely used in image tasks. For example, the work [18] proposed a ZSL framework that can predict unknown classes omitted from a training set by leveraging a semantic knowledge base. Another paper [23] proposed a novel model for jointly doing standard and ZSL classification based on deeply learned word and image representations. The efficiency of ZSL in the image processing field profits mainly from the perspicuous semantic attributes which can be manually defined by high-level descriptions. However, it is almost impossible to give any high-level description of a signal, and thus the corresponding semantic attributes cannot be easily acquired beforehand. This may be the main reason why ZSL has not yet been studied in signal recognition.

Fig. 3: The diagrams of (a) max unpooling, (b) average unpooling and (c) deconvolution. (a) Max unpooling, where the stride and padding are 2 and 0. (b) Average unpooling, where the stride and padding are 2 and 0. (c) Deconvolution, where the stride and padding are 1 and 0, respectively.

3 Problem Definition

We begin by formalizing the problem. Let X and Y be the signal input space and output space, respectively. The set Y is partitioned into Y_k and Y_u, denoting the collections of known class labels and unknown class labels, respectively.

Given training data (x, y) with y in Y_k, the task is to extrapolate and recognize signal classes belonging to Y. Specifically, when we obtain the signal input x, the proposed learning framework, elaborated in the sequel, should rightly predict the label ŷ in Y. Notice that our learning framework differs from open-set learning in that we not only classify x into either Y_k or Y_u, but also predict the label ŷ itself. Note that Y includes both the known classes Y_k and the unknown classes Y_u.

We restrict our attention to ZSL that uses semantic knowledge to recognize known classes and extrapolate to unknown ones. To this end, we first map from X into the semantic space Z, and then map this semantic encoding to a class label. Mathematically, we can describe our scheme by a nonlinear mapping f: X → Y, which is the composition of two other functions, s: X → Z and c: Z → Y, such that:

f = c ∘ s     (1)

Hence, our task is to find proper s and c to build up a learning framework that can identify both known and unknown signal classes.

4 Proposed Approach

This section formally presents a non-annotation zero-shot learning framework for signal recognition. Overall, the proposed framework is mainly composed of four modules as follows:

  1. Feature Extractor (S)

  2. Classifier (C)

  3. Decoder (D), and

  4. Discriminator

Fig. 2 shows the architecture of the feature extractor (S), the classifier (C) and the decoder (D). The feature extractor S is modeled by a CNN architecture that projects the input signal onto a latent semantic space representation. The classifier C, modeled by a fully-connected neural network, takes the latent semantic space representation as input and determines the label of the data. The decoder D, modeled by another CNN architecture, aims to produce a reconstructed signal that is as similar as possible to the input signal. Finally, the discriminator is devised to discriminate among all classes, both known and unknown.

4.1 Feature Extractor, Classifier and Decoder

The feature extractor network can be represented by a mapping S from the input space X to the latent semantic space Z. It consists of four convolutional layers and two fully connected layers. In order to minimize the intra-class variations in Z while keeping the inter-class semantic features well separated, the center loss [24] is used. Let z_i = S(x_i) and y_i be the label of x_i. Assuming that the batch size is m, the center loss is expressed as follows:

L_C = (1/2) · Σ_{i=1}^{m} ||z_i − c_{y_i}||²     (2)

where c_j denotes the semantic center vector of class j in Z, and c_j needs to be updated as the semantic features of class j change. Ideally, the entire training dataset should be taken into account and the features of each class should be averaged in every iteration. In practice, c_j can be updated for each batch according to c_j ← c_j − α·Δc_j, where α is the learning rate and Δc_j is computed via

Δc_j = ( Σ_{i=1}^{m} δ(y_i = j)·(c_j − z_i) ) / ( 1 + Σ_{i=1}^{m} δ(y_i = j) )     (3)

where δ(·) = 1 if the condition inside holds true, and 0 otherwise.
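The center loss of Eq. (2) and the batch-wise center update of Eq. (3) can be sketched as follows. This is a NumPy illustration only, not the released implementation; the 2-D feature dimension and the learning rate value are arbitrary.

```python
import numpy as np

def center_loss(z, y, centers):
    """L_C = 1/2 * sum_i ||z_i - c_{y_i}||^2 over a batch (Eq. 2)."""
    diffs = z - centers[y]                    # each feature minus its class center
    return 0.5 * np.sum(diffs ** 2)

def update_centers(z, y, centers, lr=0.5):
    """Move each class center toward its batch features (Eq. 3)."""
    new_centers = centers.copy()
    for j in range(len(centers)):
        mask = (y == j)
        # delta_c_j = sum_i 1[y_i = j] (c_j - z_i) / (1 + sum_i 1[y_i = j])
        delta = np.sum(centers[j] - z[mask], axis=0) / (1 + mask.sum())
        new_centers[j] = centers[j] - lr * delta
    return new_centers
```

Note that the `1 +` in the denominator keeps the update well defined for classes absent from the batch, for which the delta is simply zero.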

The classifier C discriminates the label of samples based on their semantic features. It consists of several fully connected layers. Furthermore, the cross entropy loss is utilized to control the error of the classifier, defined as

L_E = − Σ_{i=1}^{m} log p_{y_i}(x_i)     (4)

where p_{y_i}(x_i) is the predicted probability that x_i belongs to class y_i.

Further, an auto-encoder [4, 9, 14] is used in order to retain the effective semantic information in z. As shown in the right part of Fig. 2, the decoder D is used to reconstruct the signal from z. It is made up of deconvolution, unpooling and fully connected layers. Among them, unpooling is the reverse of pooling and deconvolution is the reverse of convolution. Specifically, max unpooling keeps the position of the maximum during max pooling, and then restores the maximum values to the corresponding positions while setting the rest to zero, as shown in Fig. 3(a). Analogously, average unpooling expands the feature map by copying values, as shown in Fig. 3(b).

The deconvolution, also called transpose convolution, recovers the shape of the input from the output, as shown in Fig. 3(c). See Appendix A for the detailed convolution and deconvolution operations, as well as toy examples.
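The max pooling/unpooling pair described above can be sketched as a toy example. The 2×2 window, stride 2 and zero padding follow Fig. 3(a); everything else (array shapes, helper names) is illustrative.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """Max pooling that records where each maximum came from."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            patch = x[i*k:(i+1)*k, j*k:(j+1)*k]
            out[i, j] = patch.max()
            idx[i, j] = patch.argmax()        # flat position inside the k x k window
    return out, idx

def max_unpool(out, idx, k=2):
    """Restore each max to its recorded position; the rest stay zero."""
    h, w = out.shape
    x = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(idx[i, j], k)
            x[i*k + di, j*k + dj] = out[i, j]
    return x
```

For instance, pooling [[1, 2], [4, 3]] yields 4 with its position recorded, and unpooling places the 4 back at the bottom-left corner with zeros elsewhere.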

In addition, the autoencoder loss is utilized to evaluate the difference between the original signal data and the reconstructed signal data:

L_A = Σ_{i=1}^{m} ||x_i − x̂_i||²     (5)

where x̂_i = D(S(x_i)) is the reconstruction of signal x_i. Intuitively, the more completely a signal is reconstructed, the more valid information is carried within z. Thus, the auto-encoder greatly helps the model to generate appropriate semantic features.

As a result, the total loss function combines the cross entropy loss, center loss and autoencoder loss as

L = L_E + λ_C·L_C + λ_A·L_A     (6)

where the weights λ_C and λ_A are used to balance the three loss functions. The whole learning process with loss L is summarized in Algorithm 1, where Θ_S, Θ_C and Θ_D denote the model parameters of the feature extractor S, the classifier C and the decoder D, respectively.

Input: Labeled training set {(x_i, y_i)} and hyperparameters λ_C, λ_A, α.
Output: Parameters Θ_S, Θ_C and Θ_D.
  Initialize parameters Θ_S, Θ_C, Θ_D.
  Initialize the class centers c_j.
  repeat
     for each batch with size m do
        Update c_j for each class j: c_j ← c_j − α·Δc_j, with Δc_j from Eq. (3).
        Calculate L_C via Eq. (2).
        Calculate L_E via Eq. (4).
        Calculate L_A via Eq. (5).
        L = L_E + λ_C·L_C + λ_A·L_A.
        Update Θ_S by descending the gradient of L.
        Update Θ_C by descending the gradient of L.
        Update Θ_D by descending the gradient of L.
     end for
  until convergence
Algorithm 1: Pseudocode for the SR2CNN update
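The combined objective of Eq. (6) used in Algorithm 1 can be sketched numerically as follows. This is a NumPy illustration, not the released training code; the weight values lam_c and lam_a are arbitrary placeholders.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def total_loss(logits, y, z, centers, x, x_rec, lam_c=0.01, lam_a=0.1):
    """L = L_E + lam_c * L_C + lam_a * L_A (Eq. 6)."""
    p = softmax(logits)
    l_ce = -np.mean(np.log(p[np.arange(len(y)), y]))   # cross entropy, Eq. (4)
    l_center = 0.5 * np.sum((z - centers[y]) ** 2)     # center loss, Eq. (2)
    l_ae = np.sum((x - x_rec) ** 2)                    # autoencoder loss, Eq. (5)
    return l_ce + lam_c * l_center + lam_a * l_ae
```

In an actual training loop the three terms would be computed on tensors so that one backward pass updates Θ_S, Θ_C and Θ_D jointly.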

4.2 Discriminator

The discriminator is the tail but also the core of the proposed framework. It discriminates among known and unknown classes based on the latent semantic space Z. For each known class k, the feature extractor S extracts the semantic features and computes the corresponding semantic center vector as:

c_k = (1/N_k) · Σ_{y_i = k} S(x_i)     (7)

where N_k is the number of data points in class k. When a test signal x appears and z = S(x) is obtained, the difference between z and c_k can be measured for each k. Specifically, the generalized distance between z and c_k is used, which is defined as follows:

d_{M_k}(z, c_k) = sqrt( (z − c_k)^T · M_k^{-1} · (z − c_k) )     (8)

where M_k is the transformation matrix associated with class k and M_k^{-1} denotes the inverse of M_k. When M_k is the covariance matrix Σ_k of the semantic features of signals of class k, d is called the Mahalanobis distance. When M_k is the identity matrix (this is also the only possible choice when the covariance matrix is not available, which happens for example when the signal set of some class is a singleton), d reduces to the Euclidean distance. M_k can also be chosen as diag(Σ_k), a diagonal matrix formed by taking the diagonal elements of Σ_k, or as a scaled identity built from Σ_k and the dimension n of Z; the corresponding distances are called the second distance and the third distance, respectively. Note that when the Mahalanobis, second or third distance is applied, the covariance matrix of each known class needs to be computed in advance.
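The generalized distance of Eq. (8) is small enough to show directly. A minimal NumPy sketch: with M equal to the class covariance it is the Mahalanobis distance, and with M = I it reduces to the Euclidean distance.

```python
import numpy as np

def generalized_distance(z, center, M):
    """d_M(z, c) = sqrt((z - c)^T M^{-1} (z - c)), Eq. (8)."""
    d = z - center
    return float(np.sqrt(d @ np.linalg.inv(M) @ d))

z = np.array([3.0, 4.0])
c = np.array([0.0, 0.0])
print(generalized_distance(z, c, np.eye(2)))   # Euclidean case: prints 5.0
```

In practice, a per-class covariance estimated from the training features of that class would be passed as M.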

With the above distance metric, we can establish our discriminant model, which is divided into two steps. First, distinguish between known and unknown classes. Second, discriminate which known class or unknown class the test signal belongs to. The first step is done by comparing a threshold θ with the minimal distance given by

d_min = min_{c_k ∈ C_K} d(z, c_k)     (9)

where C_K is the set of known semantic center vectors. Let us denote by ŷ the prediction of x. If d_min ≤ θ, then ŷ ∈ Y_k; otherwise ŷ ∈ Y_u. Owing to the use of the center loss in training, the semantic features of signals of class k are assumed to obey an n-dimensional Gaussian distribution. Thus, θ can be set according to the three-sigma rule [19], i.e., proportional to three standard deviations of the class features,

θ = 3γ·σ     (10)

where γ is a control parameter. We also refer to γ as the discrimination coefficient.
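One possible instantiation of the three-sigma threshold of Eq. (10) is sketched below. The exact per-class deviation used in the paper is not recoverable from this text, so treating σ as the norm of the per-dimension standard deviations of a class's training features is an assumption of this sketch, as is the γ value.

```python
import numpy as np

def class_threshold(features, gamma=0.4):
    """Three-sigma style threshold for one class (illustrative reading of Eq. 10).
    `features` holds the semantic features of that class, one row per sample."""
    sigma = features.std(axis=0)               # per-dimension standard deviation
    return 3.0 * gamma * float(np.linalg.norm(sigma))
```

A test feature whose minimal center distance d_min exceeds this threshold would then be routed to the unknown branch.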

The second step is more complicated. If x belongs to the known classes, its label can be easily obtained via

ŷ = argmin_{k ∈ Y_k} d(z, c_k)     (11)

Obviously, the main difficulty lies in dealing with the case when x is classified as unknown in the first step. To illustrate, let us denote by U the set of recorded unknown classes and define C_U to be the set of semantic center vectors of U. In this difficult case, if C_U = ∅, a new signal label is added to U, z is set to be its semantic center vector, and the unknown signal is saved in the corresponding sample set. If instead C_U ≠ ∅, a threshold θ' is compared to the minimal distance defined by

d'_min = min_{c ∈ C_U} d(z, c)     (12)

TABLE I: Standard metadata of dataset 2016.10A.

  total samples: 220000
  # of samples per class: 20000
  # of samples per SNR (per class): 1000
  classes (modulations): 11 modulation types: 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, WBFM
  # of SNR values: 20
  SNR values: -20, -18, -16, -14, -12, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18

For a larger version, 2016.10B, the class "AM-SSB" is removed, while the number of samples for each class is sixfold (120000). For a smaller one, 2016.04C, all 11 classes are included, but the number of samples per class is disparate (ranging from 4120 to 24940).

Here, the threshold θ' is set as

θ' = β·d_min + (1 − β)·d_med     (13)

where d_med is the median distance between z and each c ∈ C_U, and β is used to balance the two distances. The above formula follows the intuition that θ' is closely related to d_min and d_med. To proceed, let N_U denote the number of recorded signal labels in U. Then, if d'_min > θ', a new signal label is added to U and z is set as its semantic center vector. Otherwise, we set

ŷ = argmin_{c ∈ C_U} d(z, c)     (14)

and save the signal in the corresponding set Z_ŷ. Accordingly, the center c_ŷ is updated via

c_ŷ = (1/|Z_ŷ|) · Σ_{z' ∈ Z_ŷ} z'     (15)

where |Z_ŷ| denotes the number of signals in set Z_ŷ. As a result, as the number of predictions for unknown signals increases, the model gradually improves itself by refining the center vectors.
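The refinement of Eq. (15) keeps each unknown-class center equal to the mean of all semantic features assigned to that class so far; this can be done incrementally without storing running sums explicitly. A small sketch (function name and calling convention are illustrative):

```python
import numpy as np

def refine_center(center, n, z):
    """Running-mean center update: `center` is the mean of the `n` features
    assigned so far; return the mean after also assigning feature `z`."""
    return (n * center + z) / (n + 1)
```

Applying it repeatedly reproduces the batch mean of Eq. (15), so the discriminator's centers drift toward the true class centroids as more unknown signals are detected.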

To summarize, we present the whole procedure of the discriminator in Algorithm 2.

Input: Test input x, transformation matrices M_k, the center-vector sets C_K and C_U, and hyperparameters γ, β.
Output: Prediction ŷ.
  Calculate z = S(x).
  Calculate d_min via Eq. (9).
  Calculate d'_min via Eq. (12).
  if d_min ≤ θ then
     Calculate ŷ via Eq. (11).
  else if C_U = ∅ then
     Add a new label to U; set its center to z.
  else if C_U ≠ ∅ and d'_min > θ' then
     Add a new label to U; set its center to z.
  else
     Calculate ŷ via Eq. (14).
  end if
  Save z in the set of class ŷ; update c_ŷ via Eq. (15).
Algorithm 2: Pseudocode for the discriminator

5 Experiments and Results

In this section, we demonstrate the effectiveness of the proposed SR2CNN approach by conducting extensive experiments with the dataset 2016.10A, as well as its two counterparts, 2016.10B and 2016.04C [15]. The data description is presented in Table I. All types of modulations are numbered with class labels from left to right.

Fig. 4: In-training statistics on three datasets. The accuracy is based on the known test set.

Sieve samples. Samples with SNR less than 16 are first filtered out, leaving only a purer, higher-quality portion (one-tenth of the original) to serve as the overall dataset in our experiments.

Choose unknown classes. Empirically, a class whose features are hard to learn is an arduous challenge for a standard supervised learning model, let alone when it plays an unknown role in our ZSL scenario. Hence, a completely supervised learning stage is necessarily carried out beforehand to help us nominate suitable unknown classes. As shown in Table II, the ultimate candidates are AM-SSB (3) and GFSK (6) for 2016.10A and 2016.04C, and CPFSK (5) and GFSK (6) for 2016.10B.

Split training and test data. 80% of the samples from the known classes make up the overall training set, while the remaining 20% make up the known test set. For the unknown classes, only a test set is needed, consisting of 20% of the unknown samples.

After these three preprocessing steps, we obtain a small copy of, e.g., dataset 2016.10A, consisting of a training set, a known test set and an unknown test set.
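The three preprocessing steps can be sketched end to end as follows. The array layout, function name and random split are illustrative assumptions, not the released data loader.

```python
import numpy as np

def preprocess(X, y, snr, unknown_classes, snr_min=16, seed=0):
    """Sieve by SNR, hold out unknown classes, split known classes 80/20."""
    keep = snr >= snr_min                        # sieve: keep only SNR >= 16
    X, y = X[keep], y[keep]
    is_unknown = np.isin(y, unknown_classes)
    Xk, yk = X[~is_unknown], y[~is_unknown]      # known classes
    Xu, yu = X[is_unknown], y[is_unknown]        # unknown classes (test only)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Xk))
    cut = int(0.8 * len(Xk))                     # 80% train / 20% known test
    train, test = idx[:cut], idx[cut:]
    n_u = int(0.2 * len(Xu))                     # 20% of the unknown samples
    return (Xk[train], yk[train]), (Xk[test], yk[test]), (Xu[:n_u], yu[:n_u])
```

A stratified split per class would be closer to the described protocol; the plain shuffle here keeps the sketch short.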

All of the networks in SR2CNN are implemented in Python, computed on a single GTX Titan X graphics processor, and trained using the Adam optimizer. Generally, we allow our model to learn and update itself for at most 250 epochs. Interestingly, however, we find in our experiments that the best performance is always achieved within 150 epochs.

5.1 In-training Views

Basically, the average softmax accuracy on the known test set converges to a similar level on 2016.10A and 2016.10B, and to a higher level on 2016.04C, as indicated in Fig. 4. Note that there is almost no perceptible loss in accuracy when using the clustering approach (i.e., the distance-based classification method described in Section 4) to predict instead of softmax, meaning that the semantic feature space established by our SR2CNN functions very well. For ease of exposition, we will refer to the known cluster accuracy as the upper bound (UB).

It can be inferred that the cross entropy loss remains the decisive factor affecting accuracy the most, as the curves of these two indicators in Fig. 4 roughly imply a negative correlation on the whole. During the training course, the cross entropy loss undergoes sharp and violent oscillations. This phenomenon makes sense, since the extra center loss and autoencoder loss intermittently shift the learning focus of SR2CNN.

| indicator / scenario | 2016.10A sup. | 2016.10A ZSL | 2016.10B sup. | 2016.10B ZSL | 2016.04C sup. | 2016.04C ZSL |
|---|---|---|---|---|---|---|
| 8PSK (1) | 85.0% | 85.3% | 95.5% | 86.7% | 74.9% | 69.3% |
| AM-DSB (2) | 100.0% | 66.0% | 100.0% | 41.3% | 100.0% | 91.1% |
| BPSK (4) | 99.0% | 95.5% | 99.8% | 96.5% | 99.8% | 97.6% |
| PAM4 (7) | 98.5% | 95.0% | 97.6% | 93.4% | 99.6% | 96.8% |
| QAM16 (8) | 41.6% | 24.8% | 56.8% | 40.0% | 97.6% | 98.4% |
| QAM64 (9) | 60.6% | 58.0% | 47.5% | 49.6% | 94.0% | 97.6% |
| QPSK (10) | 95.0% | 87.3% | 98.9% | 90.6% | 86.8% | 81.5% |
| WBFM (11) | 38.2% | 46.8% | 39.6% | 50.4% | 88.8% | 86.9% |
| CPFSK (5) | 100.0% | 99.5% | 100.0% | 75.9%/8.4% | 100.0% | 96.2% |
| GFSK (6) | 100.0% | 99.0% | 100.0% | 95.6%/2.3% | 100.0% | 82.0% |
| AM-SSB (3) | 100.0% | 100.0% | - | - | 100.0% | 100.0% |
| average total accuracy | 83.5% | 77.9% | 83.6% | 72.0% | 94.7% | 91.5% |
| average known accuracy | 79.8% | 73.1% | 79.5% | 68.5% | 93.5% | 91.6% |
| true known rate | - | 95.9% | - | 86.9% | - | 97.0% |
| true unknown rate | - | 99.5% | - | 91.1% | - | 90.0% |

TABLE II: Contrast between supervised learning and our ZSL scenario on three datasets. The bottom class rows (CPFSK, GFSK, AM-SSB) contain the unknown classes for the respective datasets. Items split by a slash "/", like "75.9%/8.4%", denote the accuracies of two isotopic classes. The average total accuracy in the supervised columns and the average known accuracy in the ZSL columns are computed only to help draw a transverse comparison.

Fig. 5: Correlation between the true known/unknown accuracy and the discrimination coefficient (γ) on three datasets.

5.2 Critical Results

The most critical results are presented in Table II. To better illustrate them, we first make a few definitions in analogy to the binary classification problem. By superseding the binary conditions positive and negative with known and unknown, respectively, we can similarly derive true known (TK), true unknown (TU), false known (FK) and false unknown (FU). Subsequently, we get two important indicators:

TKR = TK / (TK + FU),  TUR = TU / (TU + FK)

where TKR and TUR denote the true known rate and the true unknown rate, respectively.

Furthermore, we define the known precision (KP) and the unknown precision (UP) likewise, where KP is computed from the total number of known samples that are classified to their exact known classes correctly, while UP is computed from the total number of unknown samples that are classified to their exact newly-identified unknown classes correctly. Note that, sometimes, our SR2CNN may unexpectedly classify a small portion of signals into different unknown classes although their real labels are identical and correspond to one certain unknown class (we name these unknown classes isotopic classes). In this rare case, we only count the identified unknown class with the highest accuracy when calculating UP.

For ZSL, we test our SR2CNN with several different combinations of the aforementioned parameters, namely the discrimination coefficient γ and the balance weight β of Eq. (13), hoping to snatch a satisfying result out of multiple trials. Fixing β to 1 simply leads to fair performance; still, we adjust γ in a range between 0.05 and 1.0. Here, the pre-defined indicators above play an indispensable part in helping us sift the results. Generally, a well-chosen result is supposed to meet the following requirements: 1. the weighted true rate (WTR) = 0.4·TKR + 0.6·TUR is as great as possible; 2. KP is as close as possible to UB, where UB is the upper bound defined as the known cluster accuracy; 3. the number of isotopic classes corresponding to any unknown class is at most 2.
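The selection indicators above are simple enough to make concrete. A minimal sketch (the counts in the usage comment are illustrative):

```python
def rates(tk, fu, tu, fk):
    """Return TKR, TUR and the weighted true rate WTR = 0.4*TKR + 0.6*TUR
    from confusion counts: true known, false unknown, true unknown, false known."""
    tkr = tk / (tk + fu)        # true known rate
    tur = tu / (tu + fk)        # true unknown rate
    return tkr, tur, 0.4 * tkr + 0.6 * tur
```

For example, with 95 of 100 known samples kept and 99 of 100 unknown samples rejected, TKR = 0.95, TUR = 0.99 and WTR = 0.974.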

In order to make a better transverse comparison, we compute two extra indicators, the average total accuracy in the ZSL scenario and the average known accuracy in completely supervised learning, as shown in Table II.

| indicator / unknown classes | AM-SSB & GFSK | CPFSK & GFSK | AM-SSB & CPFSK | AM-SSB, CPFSK & GFSK |
|---|---|---|---|---|
| AM-SSB (3) | 100.0% | - | 100.0% | 100.0% |
| CPFSK (5) | - | 71.0% | 87.8%/9.0% | 65.5% |
| GFSK (6) | 99.0% | 100.0% | - | 90.5% |
| average known accuracy | 73.1% | 68.3% | 75.6% | 69.6% |
| true known rate | 95.9% | 89.6% | 96.2% | 90.9% |
| true unknown rate | 99.5% | 85.5% | 98.4% | 85.4% |
| known precision | 76.2% | 76.2% | 78.6% | 76.6% |
| unknown precision | 100.0% | 100.0% | 95.4% | 99.9% |

TABLE III: Performance among different sets of chosen unknown classes on 2016.10A. The first three rows are recall rates for the unknown classes. Items split by a slash "/", like "87.8%/9.0%", have the same meaning as in Table II (isotopic classes).

On the whole, the results are promising. However, we have to admit that ZSL incurs a slight performance loss compared with the fully supervised model, reflected especially in the class AM-DSB among all modulations, and in dataset 2016.10B compared with the other two datasets. After all, when losing sight of the two unknown classes, SR2CNN can only acquire a segment of the intact knowledge that would be learned in a fully supervised case. It is this imperfection that presumably leads to a fluctuation (better or worse) in each class's accuracy when compared with supervised learning. Among these classes, the poorest victim is always AM-DSB, with a considerable portion of its samples rejected as unknown. Besides, the features, especially those of the unknown classes, are not equally difficult to learn across the three datasets. Some unknown features may even be akin to known ones, which can consequently cause confusion in the discrimination tasks. There is no doubt that these uncertainties and differences in the feature domain matter a lot. Take 2016.10B: compared with its two counterparts, it suffers the greatest loss (more than 10%) in average accuracy (both total and known), as well as a pair of inferior true rates. Moreover, it is indeed the single case where both unknown classes are separately identified into two isotopic classes.

| rate / detector | SR2CNN | IsolationForest [10] | IF* | EllipticEnvelope [21] | EE* | OneClassSVM [5] | OCSVM* | LocalOutlierFactor [2] |
|---|---|---|---|---|---|---|---|---|
| AM-SSB (3) | 100.0% | 72.3% | 00.0% | 100.0% | 100.0% | 96.3% | 26.0% | 100.0% |
| GFSK (6) | 99.0% | 01.3% | 00.0% | 90.0% | 00.0% | 00.0% | 00.0% | 00.0% |
| true known | 95.9% | 81.3% | 99.9% | 46.1% | 97.6% | 85.5% | 92.0% | 96.7% |
| true unknown | 99.5% | 36.8% | 00.0% | 95.0% | 50.0% | 48.1% | 13.0% | 50.0% |

TABLE IV: Comparison between our SR2CNN model and several traditional outlier detectors on 2016.10A. The starred columns (IF*, EE*, OCSVM*) give the performance of the respective traditional methods when their true known rates reach the highest; the unstarred columns give the standard results.

It is obvious that the average accuracy strongly depends on the weighted true rate (WTR): the clearer the discrimination between known and unknown, the more accurate the further classification and identification. Therefore, to better study this discrimination ability, we depict in Fig. 5 its variation trends with respect to the discrimination coefficient (γ). At the same time, we introduce a new concept, the discrimination interval, defined as an interval of γ over which the weighted true rate is always greater than 80%. The width of this interval is used to quantify the discrimination ability.

Apparently, the curves for the two primary kinds of true rate are monotonic: increasing for the known rate while decreasing for the unknown rate. The maximum points of the weighted true rate curves for the three datasets are at about 0.4, 0.2 and 0.4, respectively, which exactly correspond to the results shown in Table II. Besides, the width of the discrimination interval of 2016.10B is only approximately one third of those of 2016.10A and 2016.04C. This implies that the features of 2016.10B are more difficult to learn, which accounts for its relatively poor performance.

Fig. 6: Effect of the center loss. The presence of the center loss is distinguished by line shape (solid or dashed); the quantity plotted (known or unknown accuracy) is distinguished by line color (blue or green).

Fig. 6 indicates that the use of the center loss on 2016.10A indeed helps our model to discriminate more distinctly, resulting in a notably broader discrimination interval. Still, there is a thought-provoking point regarding 2016.10B that is worth discussing. Referring back to Fig. 4, we can clearly see that the center loss curve for 2016.10B is the smoothest and converges to the lowest value. However, the performance of the discrimination task on 2016.10B is actually the poorest, as if 2016.10B had never benefited from the center loss. Reflecting on this irregular phenomenon, we conjecture that it is still the features of the data that dominantly determine the discrimination performance, while the center loss only works secondarily to help cluster samples of the known classes.

| indicator / scenario | supervised learning | zero-shot learning |
|---|---|---|
| BPSK (1) | 84.3% | 70.8% |
| QPSK (2) | 86.5% | 67.8% |
| 8PSK (3) | 67.8% | 70.3% |
| 16QAM (4) | 99.5% | 96.8% |
| 64QAM (5) | 95.5% | 84.8% |
| PAM4 (6) | 97.0% | 89.0% |
| GFSK (7) | 56.3% | 38.3% |
| AM-DSB (10) | 63.8% | 67.3% |
| AM-SSB (11) | 44.3% | 62.0% |
| CPFSK (8) | 100.0% | 81.0% |
| B-FM (9) | 93.5% | 74.5% |
| average total accuracy | 80.8% | 73.0% |
| average known accuracy | 77.3% | 71.9% |
| true known rate | - | 82.3% |
| true unknown rate | - | 84.9% |
| known precision | - | 87.4% |
| unknown precision | - | 91.6% |

TABLE V: Contrast between supervised learning and our ZSL scenario on dataset SIGNAL-202002. CPFSK (8) and B-FM (9), listed at the bottom, are the unknown classes.

5.3 Other Extensions

We tentatively change the unknown classes on 2016.10A, seeking to excavate more in the feature domain of the data. As shown in Table III, both the known precision (KP) and the unknown precision (UP) are insensitive to the change of unknown classes, showing that the classification ability of SR2CNN is consistent and well preserved for the considered dataset. Nevertheless, the unknown class CPFSK is obviously always the hardest obstacle in the course of discrimination, since its accuracy is always the lowest and some isotopic classes are observed in its case. When the classes CPFSK and GFSK simultaneously play the unknown roles, the performance loss (on both TKR and TUR) is quite striking. We attribute this phenomenon to resemblances among the classes in the feature domain. Specifically, the unknown CPFSK and GFSK may share a considerable number of similarities with their known counterparts, which can mislead SR2CNN in the further discrimination task.

To justify SR2CNN’s superiority, we compare it with several traditional methods prevailing in the field of outlier detection. The results are presented in Table IV. Concretely, when these methods are applied, a sample that is deemed an outlier with respect to every known class is regarded as an unknown sample. Note that no unknown-class identification task is launched here; only the discrimination task is considered. Hence, for a certain unknown class $c$, we compute its unknown rate, instead of accuracy, as $N_u^c / N^c$, where $N^c$ denotes the number of samples from unknown class $c$, and $N_u^c$ denotes the number of those samples that are discriminated as unknown. The aforementioned requirement that the weighted true rate (WTR), $0.4\,\mathrm{TKR} + 0.6\,\mathrm{TUR}$, be as great as possible is employed to select the reported operating points. As expected, SR2CNN stands out unquestionably, while the traditional methods all suffer a drastic performance loss and fail to discriminate reliably. Only Elliptic Envelope comes close: its true unknown rate does exceed 90%, though at the cost of a badly degraded true known rate.
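The two quantities defined above can be written down directly. A minimal sketch follows; the illustrative TKR/TUR values are those reported for SIGNAL-202002 in Table V:

```python
def unknown_rate(n_flagged_unknown, n_total):
    """Unknown rate of one unknown class: the fraction of its samples
    that the detector discriminates as unknown."""
    return n_flagged_unknown / n_total

def weighted_true_rate(tkr, tur):
    """Weighted true rate used to select operating points: 0.4*TKR + 0.6*TUR."""
    return 0.4 * tkr + 0.6 * tur

# TKR = 82.3%, TUR = 84.9% (Table V, SIGNAL-202002 ZSL scenario)
print(round(weighted_true_rate(0.823, 0.849), 4))  # → 0.8386
```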

6 Dataset SIGNAL-202002

We synthesize a new dataset, named SIGNAL-202002, which we hope will be of use for further research in the signal recognition field. The dataset consists of 11 modulation types: BPSK, QPSK, 8PSK, 16QAM, 64QAM, PAM4, GFSK, CPFSK, B-FM, AM-DSB and AM-SSB. Each type comprises 20000 frames. Data is modulated at a rate of 8 samples per symbol, with 128 samples per frame. The channel impairments are modeled by a combination of additive white Gaussian noise, Rayleigh fading, multipath and clock offset. Each frame of our synthetic signals is passed independently through this channel model, seeking to emulate the real-world case, which involves translation, dilation, impulsive noise, etc. The configuration is set as follows:

20000 samples per modulation type

feature dimension

20 different SNRs, taking the even values in [2 dB, 40 dB]
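As a minimal sketch of one channel impairment listed above, the following applies AWGN at a target SNR to a complex baseband frame; the full SIGNAL-202002 pipeline (Rayleigh fading, multipath, clock offset) is not reproduced here:

```python
import numpy as np

def add_awgn(frame, snr_db, rng=None):
    """Add complex white Gaussian noise so the output frame has the given SNR (dB)."""
    if rng is None:
        rng = np.random.default_rng()
    sig_power = np.mean(np.abs(frame) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    # split the noise power evenly between the I and Q components
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(frame.shape)
                                        + 1j * rng.standard_normal(frame.shape))
    return frame + noise
```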

The complete dataset is stored as a Python pickle file of about 450 MBytes, with samples in complex 32-bit floating-point format. The code for the generation process is implemented in MATLAB.

We conduct zero-shot learning experiments on this newly generated dataset and report the results here. As before, a supervised learning trial is first carried out to give an overview of the regular performance on each class of SIGNAL-202002. Unfortunately, as Table V shows, the two unknown candidates used for 2016.10A, AM-SSB and GFSK, no longer rank among the top classes. Therefore, we reassign the unknown roles to two other modulations: CPFSK, which attains the highest accuracy overall, and B-FM, which stands out among the three analog modulation types (B-FM, AM-SSB and AM-DSB).

According to Table V, an apparent loss of discrimination ability is observed, as both the TKR and the TUR only slightly exceed 80%. However, SR2CNN still maintains its classification ability: the accuracy for each class remains encouraging compared with the fully supervised model. Most interestingly, the known precision (KP) is remarkably high, exceeding the KPs on 2016.10A (Table III) by almost 10%. To account for this, we speculate that the absence of the two unknown classes during training allows SR2CNN to focus better on the features of the known ones, which in turn yields superior performance on the known classification task.

7 Future Directions

It is worth mentioning that there is still room for SR2CNN to improve and mature. One obvious limitation is that the randomness in the arrival order of the unknown test samples (each run, the test data is shuffled with a random seed) may sometimes greatly derail SR2CNN during the unknown classification task. To be more concrete, suppose the first sample discriminated as unknown is actually an anomaly of its unknown class, i.e., it does not represent the typical features of that class. SR2CNN is completely unaware of this abnormality and will still routinely record this improper sample as a newly identified semantic center, which inevitably corrupts the classification of subsequent test samples.
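The incremental recording of semantic centers described above can be sketched as a running mean. This is an illustrative assumption, not necessarily the exact update rule of SR2CNN, but it makes the failure mode concrete: the first sample flagged as unknown seeds the new center, so an anomalous first sample biases every later distance comparison until enough typical samples dilute it.

```python
import numpy as np

class SemanticCenters:
    """Toy store of per-class semantic centers, refined by a running mean."""

    def __init__(self):
        self.centers = {}  # class id -> center vector
        self.counts = {}   # class id -> number of samples absorbed

    def update(self, cls, feat):
        feat = np.asarray(feat, dtype=float)
        if cls not in self.centers:
            # a newly identified unknown class is seeded by its FIRST sample,
            # whether or not that sample is typical of the class
            self.centers[cls] = feat.copy()
            self.counts[cls] = 1
        else:
            # running mean: c <- c + (x - c) / n
            self.counts[cls] += 1
            self.centers[cls] += (feat - self.centers[cls]) / self.counts[cls]
```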

Therefore, further research will primarily focus on handling the uncertainty of the unknown samples demonstrated above. We hope to strengthen SR2CNN so that it becomes more robust, and can ultimately be widely applied to ZSL in the signal recognition field.

8 Conclusion

In this paper, we have proposed a ZSL framework, SR2CNN, which successfully extracts precise semantic features of signals and discriminates both known and unknown classes. SR2CNN works well even when no sufficient training data is available for certain classes. Moreover, SR2CNN can gradually improve itself by updating its semantic center vectors. Extensive experiments demonstrate the effectiveness of SR2CNN. In addition, we provide a new signal dataset, SIGNAL-202002, comprising eight digital and three analog modulation classes for further research.

Appendix A Convolution and Deconvolution Operation

Let $x$ and $y$ denote the vectorized input and output matrices, respectively. Then the convolution operation can be expressed as

$$y = Cx, \tag{16}$$

where $C$ denotes the convolutional matrix, which is sparse. In the back propagation of convolution, $\partial L/\partial x$ is obtained, thus

$$\frac{\partial L}{\partial x_i} = \sum_j \frac{\partial L}{\partial y_j}\frac{\partial y_j}{\partial x_i} = \sum_j \frac{\partial L}{\partial y_j}\,C_{j,i} = \Big(\frac{\partial L}{\partial y}\Big)^{\top} C_{*,i}, \tag{17}$$

where $x_i$ denotes the $i$-th element of $x$, $y_j$ denotes the $j$-th element of $y$, $C_{i,j}$ denotes the element in the $i$-th row and $j$-th column of $C$, and $C_{*,i}$ denotes the $i$-th column of $C$. Hence,

$$\frac{\partial L}{\partial x} = C^{\top}\frac{\partial L}{\partial y}. \tag{18}$$

Similarly, the deconvolution operation can be notated as

$$y = \tilde{C}^{\top} x, \tag{19}$$

where $\tilde{C}$ denotes a convolutional matrix that has the same shape as $C$, and it needs to be learned. Then the back propagation of deconvolution can be formulated as follows:

$$\frac{\partial L}{\partial x} = \tilde{C}\,\frac{\partial L}{\partial y}. \tag{20}$$

For example, suppose the input and output matrices are of size $4 \times 4$ and $2 \times 2$, as shown in Fig. 3(c). Then $x$ is a 16-dimensional vector and $y$ is a 4-dimensional vector. Define the convolutional kernel as

$$K = \begin{pmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{pmatrix}. \tag{21}$$

It is not hard to see that $C$ is a $4 \times 16$ matrix, which can be represented as follows:

$$C = \begin{pmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{pmatrix}. \tag{22}$$

Hence, deconvolution amounts to left-multiplying by $\tilde{C}^{\top}$ in forward propagation and left-multiplying by $\tilde{C}$ in back propagation.

Acknowledgments

The authors would like to thank…
