Classification of simulated radio signals using Wide Residual Networks for use in the search for extra-terrestrial intelligence
We describe a new approach and algorithm for the detection of artificial signals and their classification in the search for extraterrestrial intelligence (SETI). The characteristics of radio signals observed during SETI research are often most apparent when those signals are represented as spectrograms. Additionally, many observed signals tend to share the same characteristics, allowing for sorting of the signals into different classes. For this work, complex-valued time-series data were simulated to produce a corpus of 140,000 signals from seven different signal classes. A wide residual neural network was then trained to classify these signal types using the gray-scale 2D spectrogram representation of those signals. An average F_1 score of 95.11% was attained when tested on previously unobserved simulated signals. We also report on the performance of the model across a range of signal amplitudes.
Advances over the last two decades in neural network training algorithms, computational power, and available data have led to astonishing success in automatic image classification and similar applications. In this work, we apply these techniques to the unique case of signal classification of time-series radio signals.
In a typical ETI search at radio frequencies, a radio telescope observes signals emanating from selected directions on the sky. After down-conversion and digitization, the raw data output of the telescope is a time series of digital voltage samples representing the electromagnetic field in the focal plane of the telescope. At the Allen Telescope Array (ATA), an array of 42 six-meter dual-polarity offset-Gregorian radio telescopes[Welch2009], a specialized program called SonATA sifts through these time series data looking for weak radio signals with telltale signs of artificial origin. At the heart of SonATA is a sensitive algorithm (the Doubling Accumulation Drift Detector, or DADD) that uses conventional digital signal processing techniques honed over decades of effort[Cullers1985]. DADD effectively detects just one kind of narrowband signal: a tone with a frequency that drifts linearly with time. While DADD has a low probability of generating false negatives, there are many different kinds of narrowband signals that generate false positives[HarpRichardsTarterApJ2016]. For a human, however, these signals are easily distinguishable from a drifting tone.
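The linear-drift integration underlying this kind of detection can be illustrated with a naive sketch. This is not the DADD algorithm itself, which computes the same family of sums far more efficiently by doubling accumulation; the spectrogram, drift rate, and injected tone below are invented for illustration only:

```python
import numpy as np

def drift_line_power(spectrogram, drift_rate):
    """Sum power along the linear drift line f = f0 + drift_rate * t for
    every starting frequency bin f0. A naive O(N^2) illustration of
    linear-drift integration, not the actual DADD implementation."""
    n_time, n_freq = spectrogram.shape
    totals = np.zeros(n_freq)
    for t in range(n_time):
        shift = int(round(drift_rate * t))            # bins drifted after t steps
        cols = (np.arange(n_freq) + shift) % n_freq   # wrap at the band edges
        totals += spectrogram[t, cols]
    return totals  # totals[f0] = integrated power along the line starting at f0

# a drifting tone buried in noise-like power
rng = np.random.default_rng(0)
spec = rng.chisquare(2, size=(128, 256))
for t in range(128):
    spec[t, (40 + t // 4) % 256] += 10.0              # drift of ~0.25 bins/step
sums = drift_line_power(spec, 0.25)
print(int(np.argmax(sums)))  # strongest integrated line starts near bin 40
```

Integrating along the correct drift line concentrates the tone's power into a single sum, which is why a weak drifting tone that is invisible in any single spectrum becomes detectable.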
In this paper, we implement convolutional neural networks (CNN) for the purposes of signal detection and classification, which may one day replace or complement the use of the DADD algorithm. The CNN is a sensitive (low false negative rate) detector of several types of narrowband signals and additionally classifies those signals by type. The classifier makes our SETI search even more effective since signals generating false positive detections with DADD are correctly classified, effectively eliminating false positives without additional telescope time. Moreover, the CNN is a reliable detector of many different signal types (instead of just one). This expands our SETI search many-fold by generalizing the search to include a number of different signal types.
A common approach to signal detection is to transform the demodulated time-series signal to a spectrogram. The utility of a spectrogram is highlighted with the following signal observed from the ISEE3 satellite on August 20, 2013. Figure 0(a) shows the modulus of the time series dominated by noise and with a weak signal superimposed. The time series has been broken up (reshaped) into rows of length 1000 points and then arranged in time order from top to bottom. Visually, this representation of the time series is indistinguishable from pure noise.
A spectrogram is a representation of a signal’s power across the available frequency band, estimated by the squared modulus of the Fourier transform, as a function of time. To produce the spectrogram of the time series in Fig. 0(a), the data in each row are passed through a frequency filter bank (based on the fast Fourier transform) and the squared moduli of the resultant rows are arranged into a two-dimensional array as before (Fig. 0(b)). Striking evidence of the superimposed signal is observed in the spectrogram.
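The construction just described can be sketched as follows. The row length of 1000 samples follows the ISEE3 example above; the injected tone's amplitude and frequency are illustrative stand-ins, not values from the actual observation:

```python
import numpy as np

# Reshape the complex voltage time series into rows, Fourier transform each
# row, and take the squared modulus to form the power spectrogram.
rng = np.random.default_rng(1)
n_rows, row_len = 100, 1000
noise = rng.normal(size=(n_rows, row_len)) + 1j * rng.normal(size=(n_rows, row_len))
t = np.arange(n_rows * row_len)
tone = 0.2 * np.exp(2j * np.pi * 0.123 * t)    # weak narrowband tone
series = (noise.ravel() + tone).reshape(n_rows, row_len)

spectrogram = np.abs(np.fft.fft(series, axis=1)) ** 2   # power vs (time, frequency)

# invisible in the raw moduli, but dominant in the spectrogram
peak_bin = np.argmax(spectrogram.mean(axis=0))
print(peak_bin)  # 0.123 cycles/sample maps to FFT bin 123
```

The coherent sum over each 1000-sample row boosts the tone's power by a factor of the row length relative to the per-sample noise, which is exactly why the spectrogram reveals a signal the raw time series hides.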
In both Fig. 0(a) and Fig. 0(b), the same data is presented as a 2D representation, suitable for input to a machine vision algorithm. The difference is that we have used our domain knowledge to choose the representation where the signal features are most easily detected. The Fourier transformation efficiently adds structure that CNNs can utilize, while we do not know of any general purpose neural network architecture that can learn from the complex time-series with a reasonable quantity of sample data or within a reasonable training time. Training on data similar to Fig. 0(b) will succeed in producing an effective data classifier with a modest training set (demonstrated below).
It should be noted that deep neural networks have been used in other fields to classify spectrogram representations of time-series data, from musical instrument recognition, to noise detection at the Laser Interferometer Gravitational-Wave Observatory (LIGO), and recently to the discovery of new exoplanets in Kepler Space Telescope data [DBLP:journals/corr/ParkL15a, 2017arXiv170607446G, 2018ShallueVanderburgKepler90].
For full context of this work, we briefly describe normal signal detection operations at the ATA. For twelve hours each day, the SETI Institute searches for persistent radio signals emanating from an extra-terrestrial source. The telescope array combines observations from up to 42 dishes into three dual-polarization beams that can be pointed anywhere within the large ATA field of view (FOV, frequency dependent with a FWHM of 3.5° at 1 GHz), in a process called beamforming. These pencil-shaped synthetic beams measure only a small region of the sky at one time (FWHM 0.1° at 1 GHz), centered on a target of interest. After downconversion and digital sampling, each beam produces voltage time series data at 104 million complex samples per second, representing about 70 MHz of useful radio frequency bandwidth with a center frequency tuned anywhere from 1-10 GHz.
The six data streams (three beams, two polarizations) are digitally processed as follows. A cascade of polyphase filter banks divides each high-speed beam stream into many parallel low-speed time series, each representing a sliver of about 8 kHz bandwidth. From here, each low-speed time series is examined independently for evidence of artificial signals. A spectrogram is created and then searched with the DADD algorithm.
The full data acquisition system is managed by SonATA. It schedules the nightly observations, controls beamforming, maintains and uses an RFI database for comparison with observed signals, and performs signal classification based on signal characteristics estimated by the DADD algorithm. SonATA also manages the follow-up observation procedure when signals are classified as candidate signals. A candidate signal is one whose data exhibit sufficient traits of a narrowband signal.
After a candidate signal is found, SonATA performs a series of tests for persistence. In particular, SonATA directs the ATA to make observations away from the direction from which the candidate signal appeared, in order to determine that the candidate signal is not observed elsewhere, and then reforms a beam toward the original location to affirm its continued presence. This process repeats up to five times, at which point a human is notified. This series of tests can take many tens of minutes to perform.
In comparison to a neural net classifier, which we show can classify many different signal types at once, the DADD algorithm is limited. DADD can reliably find narrowband signals in spectrograms (low false negative rate) but is easily fooled by signals that are clearly of another class (high false positive rate). Observing time is wasted every time DADD identifies a false positive, which must be followed up with additional observing. From this perspective, the neural-net-enabled classifier described here is more flexible and can potentially speed up the SETI search. Moreover, a robust and efficient multi-class classifier opens up the possibility for SETI research to consider different signal classes beyond narrowband. For example, a signal that is initially detected as narrowband by DADD may later exhibit a stochastically varying central frequency or amplitude modulations that make DADD detection unreliable. A classifier trained on signals with these different characteristics, however, would be more reliable.
Recently, attempts have been made to cluster and classify candidate signals found by SonATA/DADD in the data set from 2013 to 2015, which contained slightly more than 4 million candidate signals. Promising approaches included one technique that utilized simple affine transformations followed by a comparator to examples of known signal types, and another technique that utilized an autoencoder to extract a subset of features from spectrograms and then attempted to perform clustering with the t-SNE algorithm[Luus2018VAEForSETI]. However, neither approach resulted in satisfactory classification capabilities. This motivated the present approach of developing a classifier based on a set of simulated data.
The simulated data set was designed to achieve a number of goals: 1) reproduce real SonATA data in basic structure, 2) simulate a feasible number of different signal classes, 3) test machine-learning model performance over a range of signal strengths, and 4) produce signals with distinct characteristics between the classes, while still including signals that may be difficult to classify. Ultimately, we simulated 140,000 time-series signals for the training data. There were six signal classes, plus a set of simulations that contained only background noise.
The data acquisition system at the ATA demodulates the observed signals from the GHz range and digitizes with 8-bit digitizers. This produces a complex-valued time-series data set with 8 bits for each of the real and imaginary components. As such, our simulation program also outputs complex-valued (8-bit real, 8-bit imaginary) time-series data. It should be noted that the "digitization rate" of the simulated data is unspecified, which means the time- and frequency-resolution in the spectrograms is arbitrary. This has no effect, however, on the algorithm used to classify signals.
Based on domain knowledge from SETI Institute researchers, six signal classes were chosen to be simulated. These classes represent, roughly, the more common signal types that have been observed. These signal types were also relatively simple to simulate, making them attractive choices for this study. Signal types are specified by the apparent "shape" of the signal when observed as a gray-scale spectrogram. In the examples in this paper, white was chosen to represent the largest amplitude values and black to represent zero (Fig. 2). The labels for the six simulated signal classes were brightpixel, narrowband, narrowbanddrd, squarepulsednarrowband, squiggle, and squigglesquarepulsednarrowband. The label for the simulated data files that were noise only was noise. In the full training set there were 20,000 simulations for each of the seven classes.
Examples of the six simulated signal types. A seventh signal class was also simulated, but contained only Gaussian white noise and is not shown here.
The simulated signals in the time domain can all be described with a single equation,

s(t) = A(t) exp{i [2π f(t) t + φ]} + n(t),    (1)

where A(t) is the signal amplitude, f(t) is the frequency, and φ is a random phase offset. The noise component of all simulated signals, n(t), was generated by random sampling from a Gaussian with zero mean and width σ_n for each of the real and imaginary components at every time sample. The seven signal types were generated using the following frequency and amplitude functions. The frequency function is
f(t) = f_0 + d t + (1/2) d' t^2 + m Σ_{t_i ≤ t} r(t_i),    (2)

where f_0 is the starting frequency, d is the drift rate, and d' is its derivative. The term r(t) is a uniformly sampled random value between -1 and 1 that is updated at each time step. The expression m Σ r(t_i) corresponds to a random walk in frequency versus time, and the arbitrary constant m is referred to as the squiggle amplitude. The time-dependent amplitude is
A(t) = A S(t; P, η, φ_s),    (3)

where A is a constant overall amplitude and S is a square-wave modulation function that depends on a period, P, a duty cycle, η, and starting phase, φ_s.
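Under these equations, a simulated signal can be sketched as follows. This is a hedged illustration: the parameter names mirror the text, but the defaults and sampling details are stand-ins rather than the paper's actual simulation code:

```python
import numpy as np

def simulate_signal(n=32768, A=0.05, f0=0.1, d=1e-6, d_prime=0.0, m=0.0,
                    period=None, duty=1.0, phase0=0.0, sigma_n=1.0, seed=0):
    """Complex tone with amplitude modulation A(t), drifting/random-walk
    frequency f(t), and additive complex Gaussian noise n(t). Default
    parameter values are illustrative, not the paper's sampled ranges."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    # frequency: start + linear drift + drift derivative + random walk ("squiggle")
    walk = m * np.cumsum(rng.uniform(-1.0, 1.0, size=n))
    f = f0 + d * t + 0.5 * d_prime * t**2 + walk
    # amplitude: optional square-wave modulation with period, duty cycle, start phase
    if period is None:
        amp = A * np.ones(n)
    else:
        amp = A * (((t - phase0) % period) < duty * period)
    phi = rng.uniform(0, 2 * np.pi)               # random phase offset
    signal = amp * np.exp(1j * (2 * np.pi * f * t + phi))
    noise = sigma_n * (rng.normal(size=n) + 1j * rng.normal(size=n))
    return signal + noise

s = simulate_signal()
print(s.dtype, s.shape)
```

Setting A=0 yields the noise class, m=0 and d_prime=0 with no modulation yields narrowband, and so on for the other classes.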
Each signal class was defined by a combination of setting some parameters to 0 and allowing the others to be randomly sampled (see the simulation parameter table). For all signal types, the starting frequency, f_0, was sampled uniformly over a fixed range, and the drift rate, d, was sampled between [-7.324e-6, 7.324e-6].
The narrowband class (Fig. 1(a)) is the simplest signal type, defined by m = 0, d' = 0, and a constant amplitude A(t) = A over the full length of the simulation, which was fixed for all simulated signals. The equation for narrowband reduces to

s(t) = A exp{i [2π (f_0 + d t) t + φ]} + n(t).
The squarepulsednarrowband (Fig. 1(d)), squigglesquarepulsednarrowband (Fig. 1(e)) and brightpixel (Fig. 1(f)) classes all have a time-dependent amplitude. For the squarepulsednarrowband and squigglesquarepulsednarrowband, the square-wave period, P, and duty cycle, η, were uniformly sampled within the ranges specified in the simulation parameter table. For the brightpixel, the period was fixed and the duty cycle was uniformly sampled within a much smaller range. The square-wave phase φ_s, which sets the start time of the modulation, was uniformly distributed between 7% and 93% of the full length of the signal, preventing the signal from being found only near the very beginning or end of the simulation. Finally, signals of the noise class were defined by A = 0.
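The per-class parameter constraints described in this section can be summarized in code. This is an illustrative summary only; the exact sampled ranges live in the paper's simulation parameter table:

```python
# Which parameters of the general signal equation each class fixes, and
# which it samples: m is the squiggle amplitude, d_prime the drift-rate
# derivative, A the overall amplitude, and "pulsed" marks a sampled
# square-wave period and duty cycle. Unlisted parameters are sampled.
CLASS_DEFINITIONS = {
    "noise":                          dict(A=0.0),
    "narrowband":                     dict(m=0.0, d_prime=0.0, pulsed=False),
    "narrowbanddrd":                  dict(m=0.0, pulsed=False),
    "squiggle":                       dict(d_prime=0.0, pulsed=False),
    "squarepulsednarrowband":         dict(m=0.0, d_prime=0.0, pulsed=True),
    "squigglesquarepulsednarrowband": dict(d_prime=0.0, pulsed=True),
    "brightpixel":                    dict(m=0.0, d_prime=0.0, pulsed=True),
}
print(len(CLASS_DEFINITIONS))
```

The brightpixel class differs from squarepulsednarrowband only in its fixed period and much smaller duty-cycle range, which yields a single brief pulse.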
The range of values for the simulation parameters were tuned, by hand, to produce a set of simulated signals that appeared to match the real ATA data sets. They were also tuned in such a way as to generate signals from one class that appeared very similar to signals from another. For example, narrowband signals with very small values of A appear much like noise (Fig. 3), while narrowbanddrd signals with small values of d', and squiggle signals with small values of m, appear very much like narrowband (Fig. 3(a) and Fig. 3(b), respectively).
As may be noticed by those with signal processing experience, this model would generate spectrograms in which high-frequency signals alias to low (negative) frequencies once the simulated frequency f(t) drifts past the upper edge of the band, and, similarly, low-frequency signals alias up to high frequencies once f(t) drifts below the lower edge. To prevent this aliasing effect, code was included in the simulation that altered the frequency evolution whenever f(t) crossed either of these thresholds, keeping it within the band.
We frame the challenge of classifying radio signal time series of complex amplitudes as an image recognition task on their 2D spectrograms, and apply a Wide Residual Network with 34 convolutional layers and a widening factor of 2 (WRN-34-2).
The motivation to recast signal classification as an image classification problem comes from the fact that humans regularly classify these signals by inspecting a spectrogram. For this image recognition task, a spectrogram for a given time series of complex amplitudes is created by first reshaping the data to a height and width more appropriate for our neural network classifier. Then a Hanning window and Fourier transform are applied. The squared absolute value of the result provides the spectrogram, whose logarithm we take to produce the first of the images supplied to our convolutional neural network. The second image is the phase of the Fourier-transformed signal, which was included to provide additional potentially useful features to the neural network models tested. Although a follow-up investigation indicated that phase did not increase the classification accuracy of our final model trained on simulated signals, phase may be useful with real signals that have properties not included in our simulations, such as polarization, and is an area of future research.
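A minimal sketch of this preprocessing follows, assuming an illustrative 384x512 reshape (the actual image dimensions used are not specified here) and random stand-in data:

```python
import numpy as np

# Reshape the complex series, apply a Hanning window and FFT per row, then
# form the two input images: log power spectrogram and phase.
rng = np.random.default_rng(2)
series = rng.normal(size=384 * 512) + 1j * rng.normal(size=384 * 512)
rows = series.reshape(384, 512)                  # height x width for the CNN
windowed = rows * np.hanning(512)                # Hanning window per row
spectrum = np.fft.fftshift(np.fft.fft(windowed, axis=1), axes=1)
log_power = np.log(np.abs(spectrum) ** 2)        # first input image
phase = np.angle(spectrum)                       # second input image
features = np.stack([log_power, phase])          # 2 x 384 x 512 input tensor
print(features.shape)
```

The logarithm compresses the large dynamic range between strong signals and the noise floor, which otherwise dominates the input scale seen by the network.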
The reciprocally dependent time and frequency resolutions of a spectrogram can be arbitrarily chosen within the bounds of the total length of the signal. This choice results in spectrogram images of different shapes (heights and widths). While these differences can change signal detectability as a human perceives it, for the purposes of classification by a convolutional neural network we choose an aspect ratio close to square, so that the repeated downsampling inside the network does not reduce either the horizontal or the vertical resolution to near zero.
Deep learning is an active area of research in which advances in deep neural networks have produced near-human or superhuman accuracy in supervised machine learning tasks, including image recognition [AlexNet]. Convolutional neural networks (aka convnets or CNNs) are at the core of these advances, and the gains in state-of-the-art performance demonstrated by ResNet are a go-to example[ResNet]. To accurately classify these images generated from radio signal time series, we test multiple convolutional neural network architectures in the classification task using only the spectrogram and phase images as input features.
The milestone advance in image classification performance seen in the ResNet team's 2015 ImageNet showing suggests that this tried-and-true network is a suitable baseline for these experiments. Augmentations and further improvements to the ResNet architecture have been presented since, so we benchmarked these against ResNet and each other to determine which performed best on this particular dataset. Although these competing networks have shown state-of-the-art performance on various tasks and datasets, their performances are close enough that, for this task, which differs somewhat from typical image recognition benchmarks, empirical evidence is needed to determine which network best classifies simulated radio signal spectrograms.
ResNet introduced the residual connection between convolutional layers in a very deep convolutional neural network in order to combat the loss of gradient signal during backpropagation. Previously, very deep networks tended not to train successfully, but this skip connection, which bypasses a layer's nonlinearity, allows gradients to backpropagate further into the network, allowing for deeper, more expressive networks. Hence, we experimented with ResNets up to the limit of computational constraints for these medium-resolution images.
The development of wide residual networks was based on the observation that increasing the depth of ResNet provided diminishing returns on network performance, and offered shallower convolutional networks with more convolutional filters at each layer as a solution[WideResNet]. Additionally, the authors changed the order of convolution, batch normalization, and activation, and added dropout to tune their architecture to train faster and perform competitively with deeper ResNet models. In our experiments, we used the variations with which they had the most success; namely, the same 3x3 basic convolutional block (as opposed to a bottleneck block) and a dropout rate of 0.3.
DenseNet extends the idea of residual connections introduced by ResNet by adding residual connections not only between consecutive convolutional layers, but also between all subsequent convolutional layers[DenseNet]. The dense residual block allows the gradient signal to skip more layers, which more closely ties the loss function to earlier layers of the network. The additional skip connections are also thought to encourage feature reuse by sending signal from multiple convolutional layers to later layers, leading to more expressive power in a more compact network.
Finally, we experimented with dual path networks, which integrate both residual networks and densely connected residual networks to realize the advantages of each, while sharing weights to maintain a reasonable model complexity[DualPathNetworks].
We trained several models for the ResNet, WRN, DenseNet, and DPN architectures, since each architecture can be implemented with varying sizes. The accuracy of a learned classifier depends on this size because less complex networks cannot necessarily express as complex a pattern as a larger network, but the larger networks do not necessarily succeed in learning the patterns they can express. Table 1 lists the models trained for each architecture type, with the best performing model shown in bold.
| Architecture | Models trained | Best accuracy (%) | Best model size |
| --- | --- | --- | --- |
| ResNet | ResNet-18, ResNet-50, ResNet-101, ResNet-152, and ResNet-203 | 94.99 | 42.6M |
| Wide ResNet | WRN-34-2, WRN-16-8 | 95.77 | 1.9M |
| Dual Path Network | DPN-92, DPN-98, and DPN-131 | 95.08 | 35.1M |
The accuracies included in Table 1 are validation accuracies on the same train-validate split for all architectures. The performances of the four best models are comparable, with the WRN-34-2 (Fig. 7) only slightly outperforming the rest. However, a very significant basis for choosing the WRN-34-2 over the other models is the size of the model. The WRN-34-2 contains only 1.9M parameters (also called weights). In comparison, the second smallest model, the DenseNet-201, is nearly ten times as large. The ResNet-101 is the largest, with over twenty times as many parameters. For the purposes of quick training and inference on-site at the ATA, it is ideal to have a small memory footprint.
The comparison of these four network architectures showed similar and encouraging results, but the edge that WRN-34-2 had over the others led us to continue working to improve its performance. A common practice to increase the generalizability of a machine learning model that threatens to overfit on training data is to create an ensemble of models. For this task, ensemble averaging was used. Five WRN-34-2 models were trained on different training data but with the same hyperparameters and training strategy. While the comparative study of architectures used the same four of five folds for training and the fifth for validation, the ensemble members trained on the five distinct four-fold subsets of data, as in k-fold cross-validation.
To evaluate the ensemble model, each of the five member models outputs its softmax predictions and the average of these scores is taken as the final score. With no validation data left over, this ensemble model was evaluated on new test data provided as part of the code challenge, which yielded an accuracy of 94.99%.
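The ensemble scoring step is simple to sketch. The logits below are random stand-ins; in practice each member is a trained WRN-34-2:

```python
import numpy as np

# Each of the five ensemble members emits softmax scores over the 7 classes;
# the final prediction is the argmax of the member-averaged scores.
rng = np.random.default_rng(4)
logits = rng.normal(size=(5, 7))                       # 5 members, 7 classes
softmax = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
ensemble_scores = softmax.mean(axis=0)                 # average member scores
prediction = int(np.argmax(ensemble_scores))
print(prediction)
```

Averaging probabilities rather than hard votes preserves each member's confidence, so a strongly confident member can outweigh several weakly confident ones.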
The model with the best validation set accuracy, the WRN-34-2 using a 5-fold averaging, was then tested using a separate test set withheld during the training phase of all models. As this work was performed in the context of an online code challenge, the other models described in the previous section were not tested with this test data set.
Confusion matrix (rows: actual class; columns: predicted class) over the classes brightpixel (bp), narrowband (nb), narrowbanddrd (drd), noise (no), squarepulsednarrowband (sqnb), squiggle (sgl), and squigglesquarepulsednarrowband (sglsqnb).
Clearly, the largest source of uncertainty was distinguishing brightpixel signals from noise. As we will see in the next section, this was due to very low-amplitude brightpixel signals, as one would intuitively expect.
In order to briefly explore the performance characteristics of this trained model in a controlled way, 14 new sets of test data were generated, each with 250 signals of each class. For each set, the signals were simulated with a fixed signal amplitude: 0.008, 0.01, 0.02, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.12, 0.16, 0.2, or 0.4. Many of these signal amplitudes, it should be noted, are below the amplitudes of signals found in the training data set, allowing us to explore the model performance slightly outside the range of signal amplitudes on which it was trained. All other parameters of the test data, however, remained consistent with the training data.
For each of these test sets, we performed inference and recorded the model's multinomial cross-entropy loss, classification accuracy (Fig. 7(a)), and F_1 score (Fig. 7(b)). The model performs as expected: signals with smaller amplitudes were more difficult to classify and tended to be classified as noise. The model also does not appear to have any classification power outside of the signal amplitude range on which it was trained.
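The per-class F_1 scores used in this evaluation can be computed from a confusion matrix as in this sketch (the toy two-class matrix is invented for illustration):

```python
import numpy as np

def per_class_f1(confusion):
    """Per-class F1 from a confusion matrix whose rows are the actual class
    and whose columns are the predicted class."""
    tp = np.diag(confusion).astype(float)
    precision = tp / np.maximum(confusion.sum(axis=0), 1)  # tp / predicted count
    recall = tp / np.maximum(confusion.sum(axis=1), 1)     # tp / actual count
    return 2 * precision * recall / np.maximum(precision + recall, 1e-12)

# toy 2-class check: 9 of 10 class-0 correct, 8 of 10 class-1 correct
cm = np.array([[9, 1],
               [2, 8]])
print(per_class_f1(cm))
```

Because F_1 is the harmonic mean of precision and recall, a class such as brightpixel that is frequently confused with noise is penalized on both sides of the trade-off.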
Furthermore, the onset rise of each class's F_1 score (Fig. 7(b)) can be intuitively explained by considering the average amount of power per pixel in each signal, akin to the apparent brightness of the signal in the spectrogram. The brightpixel signals have the lowest total power overall. These signals have zero amplitude for most of the simulation except for a brief moment at a particular frequency. As such, the model struggles most to recognize these signals. The simulations that contain a non-zero squiggle amplitude m (squiggle and squigglesquarepulsednarrowband) also have a reduced apparent brightness in the spectrogram. The stochastic fluctuations of these signals spread the power across a larger bandwidth of frequencies during any particular time range of the simulation, as compared with narrowband. As such, one expects the classification accuracy and recall of those signals to be reduced relative to the other signals that appear brighter in the spectrogram. The narrowband and narrowbanddrd classes have the earliest onset of significant scores, followed by squarepulsednarrowband, consistent with the argument that the brighter signal types are more easily recognizable.
This work demonstrates the potential usefulness of applying contemporary convolutional neural networks to SETI research. With further improvements, these algorithms may soon become part of SETI research data acquisition systems.
Subsequent work building on these results could focus on a number of aspects. The precision and recall scores for the different signal classes were quite good, and the most immediate next step would be to build new models trained on data generated using a larger range of simulation parameters for each class (see the simulation parameter table). Additionally, more signal types could be added to the training and test sets. For example, a common signal type not included in our repertoire is a short burst of power over all frequencies, sometimes identified with radar pulses.
The model's performance falloff just at the lower bound of the signal amplitudes found in the training data (Section 4.2) leads to a particular question: down to what small signal amplitude can we train a model of the same network architecture and still retain robust classification accuracy? That is, if we were to construct new training data with smaller signal amplitudes and retrain new models, how small in amplitude can we go before the models fail to accurately classify signals?
Besides new signal types and characteristics, the noise component of the signal, n(t), could be made more realistic. For this study, we used a very simple Gaussian white noise model. In a previous version of the training data set, however, we used real observations of the Sun as the background component. Observations of the Sun do not have constant power at all frequencies and are non-stationary. This background data was not used in this work, but may be the basis for simulations in future work.
Real observations from the ATA do not always contain just one signal or signal type in the spectrogram. Therefore, another avenue for future work would be to include a more complete set of commonly observed signal types within the same simulation, and to build models that can find multiple signal types with techniques similar to those used for object detection in everyday photographs.
The set of simulated data could also be improved through the use of some type of generative network. This would require some effort, though not unreasonably burdensome, to hand-label real spectrogram observations. An autoencoder and t-SNE clustering approach has already been applied to ATA data from 2013 to 2015 to cluster signals [Luus2018VAEForSETI]. Although simulations from a generative network model would appear more realistic, the signals would not have controllable (or learnable) parameters such as f_0, d, and m.
Finally, the DADD algorithm currently in use at the ATA estimates a linear fit to observed signals, with parameters including the signal power, initial signal frequency, f_0, and drift rate, d. This regression is important because it allows prediction of the future signal frequency, which is needed to identify the same signal in a subsequent observation. We speculate either that the parameters of human-constructed models (Eqs. 1 - 3) for each signal class could be estimated through regression after signal classification with the 2D CNN, or that some type of recurrent neural network or causal convolutional architecture could estimate signal characteristics and predict future signal frequency by using each row of the spectrogram as the input vector at each time step.
In conclusion, we report the adaptation of a convolutional neural network, specifically a wide residual network, to the problem of signal discovery and classification relevant to SETI. We find that by treating spectrograms as if they were images, we can train an image classification network and achieve very good results. The sensitivity of the neural network detector is comparable to that of a finely honed conventional processing algorithm, and the signal classification accuracy is quite good. Using such signal classifications will improve the efficiency of radio SETI by providing extra information that can be used to decide whether to follow up on detected signals. Compared to the conventional algorithm, the neural network approach lacks the signal parameterization necessary for extrapolation to later times. We suggest that this limitation may be overcome with a traditional approach or with additional neural networks.
We would like to thank Galvanize, Skymind, Nimbix, and The SETI League for their financial contributions and for the hard work put in by many employees of those organizations to ensure a successful code challenge. Thanks to IBM for providing significant compute and data storage. We acknowledge the helpful suggestions of Francois Luus of IBM Research South Africa. Special thanks to Graham Mackintosh for establishing the IBM-SETI Institute partnership.