Signal Clustering with Class-independent Segmentation

by Stefano Gasperini, et al.

Radar signals have been dramatically increasing in complexity, limiting the source separation ability of traditional approaches. In this paper we propose a Deep Learning-based clustering method, which encodes concurrent signals into images, and, for the first time, tackles clustering with image segmentation. Novel loss functions are introduced to optimize a Neural Network to separate the input pulses into pure and non-fragmented clusters. Outperforming a variety of baselines, the proposed approach is capable of clustering inputs directly with a Neural Network, in an end-to-end fashion.




1 Introduction

Radars are used for situation awareness in a variety of applications, from weather forecasting to adaptive cruise control. Aircraft also use them as sensors, inspecting the received signals to gain insights about their surroundings. To this end, signals are first isolated and then compared against prior knowledge for identification. In this paper, we focus on separating simultaneous and aligned signals by source.

Radar signals are composed of pulses, described by time of arrival (TOA), radio frequency (RF), pulse width (PW), amplitude (AM), etc. [14]. This task is challenging, especially because the separation is done at the pulse level, based on only a few parameters that can be shared across multiple sources.

Furthermore, due to the rapid progress of antenna technologies and electronics [14], traditional methods can no longer cope with the increased complexity of incoming signals. Being based on statistics and pattern matching, they struggle even with a few concurrent inputs [10]. To overcome this issue, the task has been addressed with data analysis clustering methods, among which DBSCAN reached superior performance and wide applicability [16, 9].

Over the past years, Deep Learning (DL) has achieved state-of-the-art results in a plethora of tasks, including clustering [1]. In the imaging domain, Neural Networks (NNs) are trained to map the inputs to a clustering-friendly representation [1, 17, 5, 3]. These newly extracted features are then clustered with a commonly used data analysis method [1, 12], such as K-means [20] or hierarchical agglomerative clustering [21]. The same idea has been applied to speaker separation, using K-means [4]. Pairing a NN with traditional clustering methods improves performance, but inherits shortcomings from both approaches, such as requiring a predefined number of clusters and a sufficiently large dataset to train the NN. Additionally, these methods would cluster pulses based on the similarity of their parameters, missing the aim of our task, which is grouping them by source. Furthermore, mapping inputs to separable features requires high-dimensional inputs, while our pulses have only a few parameters.

In radar signal processing, NNs have mostly been applied to classification and identification problems [2, 11, 6]. Recently, Recurrent NNs (RNNs) have been deployed to cluster pulses [7]. Each RNN was trained to identify a specific signal and checked at every possible starting pulse of a sequence. Tested on 5 simple signals [7], this method would not be feasible in real-world applications with thousands of emitters, each capable of producing multiple signals: it would require long processing times and could not separate signals for which no RNN has been trained.

Figure 1: Domain change from signal to image processing.

The contribution of this paper can be summarized as follows: 1) We propose an original NN-based clustering method and we evaluate it on the challenging task of aligned radar pulse deinterleaving; 2) We introduce novel loss functions, aimed at delivering pure and non-fragmented clusters, while performing class-independent segmentation; 3) To the best of our knowledge, for the first time, we address clustering as an image segmentation problem.

2 Method

The idea is to move the problem into the imaging domain, to benefit from the extensive DL research done in that field. Image segmentation is the task of separating pixels into regions according to given criteria, much like clustering, which groups elements based on similarity. We therefore exploit image segmentation techniques to perform clustering and tackle the radar pulse deinterleaving problem.

Our proposed method enables a NN to cluster input elements directly through image segmentation. In this paper, we apply it to radar signal deinterleaving. We first encode the signals into spectrogram-inspired segmentable images (Section 2.1). These are then forwarded to a U-Net [13] trained for image segmentation to group the inputs. The NN is optimized with novel loss functions derived from a newly adjusted confusion matrix, which we named soft confusion matrix (Section 2.3). The objective functions aim at improving clustering performance indicators such as purity and fragmentation [15] (Section 2.2).

2.1 From Signals to Segmentable Images

To cross from the radar to the imaging domain, we need to encode the signals into images. Spectrograms are graphical representations of signals, showing RF over time. For radar signals, however, preserving the PW resolution within a reasonably sized 512x512 image would show too few pulses per signal, if any, complicating the clustering task.

Signals encoding. Our representation is therefore inspired by spectrograms: we encode concurrent signals into an image, using it as an RF-TOA coordinate system. We keep a time resolution of 5 µs, which allows a reasonable number of pulses to be encoded within each 512x512 image. Each pulse is marked in the RF-TOA grid at the cell given by its time of arrival (column) and its RF value (row). Pixels have gray-scale values representing PW and AM in two separate channels. The encoding is schematized for PW in Fig. 1; the same is done for AM. We maximize the resolution by scaling each parameter individually to cover its full available range: [0, 512] for RF, and (0, 1] for PW and AM. This further diversifies the values, easing the separation task. Moreover, we improve readability by extending each pulse's values across its neighboring 3x3 pixels. Overall, this transformation causes no major loss of information, and the discretization can help against noise.
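As a rough illustration of this encoding, the pure-Python sketch below maps a list of pulses onto a two-channel (PW, AM) RF-TOA grid. The pulse tuple layout `(toa_us, rf, pw, am)`, the helper names `scale_to_unit` and `encode_pulses`, the min-max stretching of RF over the rows and the division-by-maximum scaling into (0, 1] are all assumptions made for illustration, not the authors' implementation; a real pipeline would use tensors rather than nested lists.

```python
from typing import List, Sequence, Tuple

# Hypothetical pulse record: (toa_us, rf, pw, am). Field order and units are assumptions.
Pulse = Tuple[float, float, float, float]


def scale_to_unit(values: Sequence[float]) -> List[float]:
    """Map strictly positive values into (0, 1] by dividing by their maximum (assumed scaling)."""
    top = max(values)
    return [v / top for v in values]


def encode_pulses(pulses: Sequence[Pulse], size: int = 512, dt_us: float = 5.0):
    """Encode concurrent pulses into a 2-channel (PW, AM) RF-TOA image.

    Returns image[channel][row][col] as nested lists, plus the (row, col)
    centre of every pulse so that cluster labels can be read back later.
    """
    t0 = min(p[0] for p in pulses)
    rf_lo = min(p[1] for p in pulses)
    rf_span = (max(p[1] for p in pulses) - rf_lo) or 1.0  # avoid division by zero
    pws = scale_to_unit([p[2] for p in pulses])
    ams = scale_to_unit([p[3] for p in pulses])

    image = [[[0.0] * size for _ in range(size)] for _ in range(2)]
    centres = []
    for k, (toa, rf, _, _) in enumerate(pulses):
        col = int((toa - t0) / dt_us)  # TOA bin at the chosen time resolution
        if col >= size:
            raise ValueError("pulse falls outside the size * dt_us window")
        row = int((rf - rf_lo) / rf_span * (size - 1))  # RF stretched over all rows
        centres.append((row, col))
        # Extend the values to the 3x3 neighbourhood, clipped at the image border;
        # where neighbourhoods overlap, the later pulse overwrites the earlier one.
        for r in range(max(row - 1, 0), min(row + 2, size)):
            for c in range(max(col - 1, 0), min(col + 2, size)):
                image[0][r][c] = pws[k]
                image[1][r][c] = ams[k]
    return image, centres
```

With the defaults (512 columns of 5 µs each), one image spans 2.56 ms of signal. The pulse centres are returned because the cluster assignments are later read back at exactly those locations.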

Our problem differs substantially from traditional image segmentation: instead of identifying a preset number of classes, we deal with an unknown and variable number of clusters. Nevertheless, we can retrieve the cluster assignments from a segmented image by remembering the pulse locations from the encoding step, filtering out the background and applying majority voting within each 3x3 region.
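Reading the clusters back can be sketched as follows. The sketch assumes that the segmentation output has already been reduced to a per-pixel label map (e.g. by an argmax over the output channels), that label `0` denotes background, and that the pulse centres were kept from the encoding step. `decode_clusters` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def decode_clusters(label_map: Sequence[Sequence[int]],
                    centres: Sequence[Tuple[int, int]],
                    background: int = 0) -> List[Optional[int]]:
    """Assign each pulse the majority label of its 3x3 region, ignoring background."""
    rows, cols = len(label_map), len(label_map[0])
    assignments: List[Optional[int]] = []
    for row, col in centres:
        votes: Counter = Counter()
        for r in range(max(row - 1, 0), min(row + 2, rows)):
            for c in range(max(col - 1, 0), min(col + 2, cols)):
                if label_map[r][c] != background:
                    votes[label_map[r][c]] += 1
        # most_common(1) returns the most frequent label; ties go to the first one seen.
        # A region that is entirely background yields None (pulse left unassigned).
        assignments.append(votes.most_common(1)[0][0] if votes else None)
    return assignments
```

Majority voting makes the read-out tolerant to a few mislabelled pixels at the border of a pulse's 3x3 patch.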

2.2 Confusion Matrix-based Metrics

Figure 2: Confusion matrix and properties for evaluation.

We choose a U-Net architecture [13] due to its state-of-the-art performance in a variety of image segmentation tasks. While U-Nets are typically trained with pixel-wise cross-entropy or Dice loss, these functions are designed for a fixed number of classes and are not suitable for clustering.

Instead, to address clustering problems, we derive new loss functions that maximize performance metrics. Such indicators can be formalized with the help of a confusion matrix [15], like the one shown in Fig. 2. The matrix has the predicted clusters along the horizontal axis and the ground-truth clusters along the vertical axis, with each cell counting the elements. With $c_{ij}$ we refer to the cell in row $i$ and column $j$ of the confusion matrix $C$, sized $m \times n$; with $c_{i\cdot}$ and $c_{\cdot j}$ to its $i$-th row and $j$-th column respectively.

Cluster Purity. The first step is identifying the column maxima $\max_i c_{ij}$, shown in green in Fig. 2. This matches each predicted cluster with a ground truth. In the example, what was predicted as cluster "4" corresponds to "B". Cluster purity is defined as

$$P = \frac{\sum_{j=1}^{n} \max_{i} c_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}},$$

i.e., the fraction of elements that belong to the ground truth defining the identity of their predicted cluster. A system would score perfect purity by predicting a different cluster for each input element, while performing rather poorly.
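The confusion matrix and cluster purity can be computed as in the sketch below, with rows as ground-truth clusters and columns as predicted clusters, matching the layout of Fig. 2. Labels are assumed to be integers starting at 0, and every predicted cluster is assumed to contain at least one element; the function names are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     m: int, n: int) -> List[List[int]]:
    """Count elements per (ground-truth row, predicted column) pair."""
    c = [[0] * n for _ in range(m)]
    for t, p in zip(truth, pred):
        c[t][p] += 1
    return c


def column_owners(c: Sequence[Sequence[int]]) -> List[int]:
    """Row index of each column maximum: matches predicted clusters to ground truths."""
    return [max(range(len(c)), key=lambda i: c[i][j]) for j in range(len(c[0]))]


def purity(c: Sequence[Sequence[int]]) -> float:
    """Sum of the column maxima divided by the total number of elements."""
    col_max = sum(max(row[j] for row in c) for j in range(len(c[0])))
    return col_max / sum(map(sum, c))
```

Predicting one cluster per element, e.g. `purity(confusion_matrix([0, 0, 1], [0, 1, 2], 2, 3))`, yields a purity of 1.0, which is why purity alone is not a sufficient criterion.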

Fragmentation Ratio. To address these cases, we additionally consider the fragmentation ratio. It compares the number of fragments $f$, marked with a circle in Fig. 2, with the number of predicted clusters $n$, and is defined as $F = f / n$. Fragments occur when a row contains more than one column maximum, i.e., when a single ground-truth cluster is split across several predicted clusters.
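A sketch of the fragmentation ratio on a confusion matrix with the same row/column layout follows. The exact counting convention is not spelled out here, so the sketch assumes that every column maximum beyond the first one in a row counts as one fragment; ties between rows are resolved in favour of the lower row index.

```python
from typing import Sequence


def fragmentation_ratio(c: Sequence[Sequence[int]]) -> float:
    """F = f / n on a confusion matrix with ground truths as rows, predictions as columns."""
    m, n = len(c), len(c[0])
    # Match each predicted cluster (column) with the ground-truth row holding its maximum.
    owners = [max(range(m), key=lambda i: c[i][j]) for j in range(n)]
    maxima_per_row = [owners.count(i) for i in range(m)]
    # Assumed convention: each column maximum beyond the first in a row is one fragment.
    fragments = sum(max(k - 1, 0) for k in maxima_per_row)
    return fragments / n
```

For instance, `[[3, 2, 0], [0, 0, 4]]` splits the first ground truth across two predicted clusters, giving one fragment out of three predicted clusters, so F = 1/3.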

2.3 Overcoming Non-differentiability

The two metrics are non-differentiable: they are extracted from a confusion matrix, which relies on an argmax over the assignment probabilities. To circumvent this issue, we take a step back and compute the losses directly from the softmax probabilities over the predicted clusters.

Soft Confusion Matrix. To this end, we create a new matrix, which we name soft confusion matrix (SCM). Its construction is shown in Fig. 3. Instead of reporting the discrete argmax assignments, we build the SCM from the softmax probabilities of each input element: we average them over the element's 3x3 pixels and add the result to the SCM row of its ground-truth cluster. After repeating this for all input elements, each SCM cell holds the sum of the probabilities accumulated within it. Since the SCM is constructed using only differentiable operations, we can compute our losses from its values. Following the notation of Section 2.2, $S$ is the SCM, sized $m \times K$, with $K$ being the number of output channels of the NN, and $s_{ij}$ are its cells.
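The SCM construction can be sketched in plain Python as below: each input element contributes the average of its pixels' softmax vectors to the row of its ground-truth cluster. The input layout (`pixel_probs[e]` as a list of per-pixel probability vectors of length K) and the name `soft_confusion_matrix` are assumptions. This version computes values only; in a framework such as PyTorch the same averaging and summation would be expressed with tensor operations so that gradients flow back to the network outputs.

```python
from typing import List, Sequence


def soft_confusion_matrix(pixel_probs: Sequence[Sequence[Sequence[float]]],
                          truth: Sequence[int], m: int) -> List[List[float]]:
    """pixel_probs[e] holds the softmax vectors of the pixels covering element e
    (e.g. its 3x3 patch); truth[e] is the ground-truth row of element e."""
    k = len(pixel_probs[0][0])
    scm = [[0.0] * k for _ in range(m)]
    for patch, t in zip(pixel_probs, truth):
        # Average the probabilities over the element's pixels...
        avg = [sum(p[j] for p in patch) / len(patch) for j in range(k)]
        # ...and accumulate them in the row of the element's ground truth.
        for j in range(k):
            scm[t][j] += avg[j]
    return scm
```

Because every averaged softmax vector sums to one, the SCM entries sum to the number of input elements, mirroring the element counts of the hard confusion matrix.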

Figure 3: Construction of the soft confusion matrix.

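Soft Cluster Impurity Loss. From the cluster purity metric (Section 2.2), we derive its differentiable complement on the SCM: the cluster impurity loss, whose minimization maximizes the column maxima $\max_i s_{ij}$ of the SCM cells $s_{ij}$. It is defined in Eq. 1:

$$\mathcal{L}_{P} = 1 - \frac{\sum_{j=1}^{K} \max_{i} s_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{K} s_{ij}} \qquad (1)$$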
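Taking the cluster impurity loss as one minus the purity evaluated on the SCM, a value-only sketch is shown below (`soft_impurity_loss` is an illustrative name). In an autodiff framework, a maximum over a dimension (e.g. `torch.max(x, dim=0)` in PyTorch) propagates gradients only to the selected entries, so minimizing this loss directly pushes up the column maxima.

```python
from typing import Sequence


def soft_impurity_loss(scm: Sequence[Sequence[float]]) -> float:
    """1 - (sum of SCM column maxima) / (sum of all SCM cells)."""
    total = sum(map(sum, scm))
    col_max = sum(max(row[j] for row in scm) for j in range(len(scm[0])))
    return 1.0 - col_max / total
```

A confident, perfectly pure SCM such as `[[2.0, 0.0], [0.0, 1.0]]` gives a loss of 0, while the softer `[[0.8, 0.2], [0.2, 0.8]]` gives 0.2.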


Soft Cluster Fragmentation Loss. Analogously, the soft fragmentation loss corresponds to the fragmentation metric (Section 2.2), which is computed with non-differentiable counting functions. We make it suitable for training by applying the trick in Eq. 2, using only differentiable operations.

With reference to Eq. 2: $\mathcal{F}$ is the set containing the SCM cells responsible for fragments, which are penalized by this loss; $\mathcal{Q}_i$ is the set of SCM column maxima within the $i$-th row; $\oslash$ is the Hadamard (element-wise) division. Therefore, $\mathcal{F} \oslash \mathcal{F}$ is an all-ones vector whose length equals the number of fragments, and the numerator is the differentiable equivalent of counting them.

Additionally, at each training iteration we randomly swap the cluster target values. This strategy ensures that the NN focuses on grouping rather than on learning specific label associations, and thereby makes the segmentation class-independent.
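The random target swap can be sketched as drawing a fresh permutation of cluster ids at every iteration. The sketch assumes integer targets smaller than the number of output channels `k`; `swap_targets` is a hypothetical helper. Since the permutation is a bijection, it changes which id each cluster carries but never which elements are grouped together.

```python
import random
from typing import List, Sequence


def swap_targets(truth: Sequence[int], k: int, rng: random.Random) -> List[int]:
    """Relabel ground-truth cluster ids with a random permutation of 0..k-1."""
    perm = list(range(k))
    rng.shuffle(perm)  # a new permutation at every call, i.e. every training iteration
    return [perm[t] for t in truth]
```

Inside a training loop this would be called once per iteration, e.g. `targets = swap_targets(targets, k, rng)`, before the targets are compared with the network output.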

The key to our method is the combination of the two novel loss functions described above, which enables predicting the clusters directly and addressing clustering as an image segmentation task.

3 Experimental Setup

Dataset. Since no suitable public dataset was available, we created one with the help of a domain expert. 140 realistic signals were defined and generated with an RF software simulator. Training and test sets were designed separately, increasing the complexity of the task: 75 signals are used for training and 65 for testing. To create the dataset, we randomly combined concurrent signals from different sources. Since a good clustering method should deal with a variable number of clusters, random 2.56 ms portions of each signal appear alone, as well as together with 1 to 10 other signals, a variable number of times. At most 11 signals are concurrent; all are aligned, i.e., they come from the same direction, and must be distinguished through TOA, RF, PW and AM. Overall, 1200 combinations are generated from the training signals and 300 from the test signals, with the number of pulses varying significantly throughout the dataset (min: 6, max: 1032, average: 467).
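The way such combinations could be assembled is sketched below. This is a hypothetical reconstruction, not the authors' dataset generator. It assumes that each signal is a TOA-sorted list of `(toa_us, rf, pw, am)` pulses, picks a random 2560 µs (2.56 ms) portion of every chosen signal, shifts the portions to a common time origin so they overlap, and keeps the source id of each pulse as its ground-truth label.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pulse = Tuple[float, float, float, float]  # (toa_us, rf, pw, am), assumed layout


def make_combination(signals: Dict[str, Sequence[Pulse]], n_signals: int,
                     rng: random.Random, window_us: float = 2560.0
                     ) -> List[Tuple[float, float, float, float, str]]:
    """Overlay random window_us portions of n_signals distinct signals."""
    chosen = rng.sample(sorted(signals), n_signals)
    merged = []
    for src in chosen:
        pulses = signals[src]  # assumed sorted by TOA
        duration = pulses[-1][0] - pulses[0][0]
        # Random start so that the window stays inside the signal when possible.
        start = pulses[0][0] + rng.uniform(0.0, max(duration - window_us, 0.0))
        for toa, rf, pw, am in pulses:
            if start <= toa < start + window_us:
                # Shift every portion to a common origin so the signals overlap in time.
                merged.append((toa - start, rf, pw, am, src))
    merged.sort(key=lambda p: p[0])  # interleave pulses by time of arrival
    return merged
```

Calling it with `n_signals` drawn between 1 and 11 mimics the variable number of concurrent sources described above.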

Figure 4: Ablative testing. Different configurations of the loss functions, indicated in the left columns, are evaluated with ARI and NMI. The last row shows the proposed method.

Architecture. A U-Net [13] was used for all experiments. To increase its robustness against overfitting and improve its suitability for embedded devices, we reduced its size: the number of feature map channels starts at 8 and reaches 64 in the bottleneck layer. Since segmentation requires a fixed number of classes, we fix the number of concurrently predictable clusters, i.e., the $K$ output channels, following [8].

Model Training. Across all experiments, hyperparameters remained constant to ensure comparability. The U-Net was trained for 300 epochs with the Adam optimizer and a batch size of one, optimizing the novel losses described in Section 2.3. We implemented our method in PyTorch and trained our models on an NVIDIA Titan Xp GPU.

Figure 5: Comparison with baseline methods across five performance metrics.

Evaluation Metrics. The evaluation is based on the criteria presented in Section 2.2: cluster purity ($P$) and (non-)fragmentation ratio ($\bar{F} = 1 - F$). On top of pure and non-fragmented clusters, we want to ensure that each ground truth has a corresponding predicted cluster. Specifically, a ground-truth cluster remains uncovered (marked red in Fig. 2) when its confusion matrix row contains no column maxima. This is measured by the detection ratio, defined as $D = 1 - u/m$, with $u$ the number of uncovered ground-truth clusters and $m$ the total number of ground-truth clusters. These three metrics should be considered together, since they assess the clustering performance from different, complementary aspects. Additionally, we use standard clustering evaluation metrics, namely the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). We repeated all experiments five times, reporting mean and standard deviation.
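Reading the detection ratio as the fraction of ground-truth rows that hold at least one column maximum (our reading of the definition, i.e. $D = 1 - u/m$ with $u$ uncovered rows out of $m$), a sketch is shown below; `detection_ratio` is an illustrative name and the confusion matrix keeps ground truths as rows and predictions as columns.

```python
from typing import Sequence


def detection_ratio(c: Sequence[Sequence[int]]) -> float:
    """Share of ground-truth rows matched by at least one predicted cluster."""
    m, n = len(c), len(c[0])
    # Rows that own at least one column maximum are covered.
    covered = {max(range(m), key=lambda i: c[i][j]) for j in range(n)}
    uncovered = m - len(covered)
    return 1.0 - uncovered / m
```

In `[[3, 2, 0], [0, 0, 4], [1, 0, 1]]` no predicted cluster is dominated by the third ground truth, so one of three ground truths is uncovered and D = 2/3.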

Ablative Testing. To showcase the effectiveness of the main components of our method, we performed ablative testing, evaluating the impact of different loss function configurations on the clustering performance.

Baseline Comparison. Furthermore, we compared our method against various clustering approaches. Despite being popular and effective in other domains, several methods, such as K-means, are not applicable to the task at hand, since they require a predefined number of clusters, which in our case is unknown and varies over time. We deployed five suitable baseline methods, namely DBSCAN, OPTICS, BIRCH, Affinity propagation and Mean shift [19, 18].

4 Results and Discussion

Ablative Testing. Fig. 4 shows that combining the losses can improve the results in terms of both ARI and NMI. The purity loss achieves satisfactory results by itself too, since its optimization acts on every cell of the confusion matrix (Section 2.2), trying to improve all predictions. On the other hand, the fragmentation loss focuses only on a few cells, which prevents it from delivering good overall performance by itself. However, optimizing them jointly, weighted 0.7 for purity and 0.3 for fragmentation, delivers higher ARI and NMI. The fragmentation loss is important: it contributes by penalizing the prediction of unnecessary clusters.

Comparative Methods. Fig. 5 highlights the superior performance of our method compared to the other clustering approaches. Most baseline methods either overpredict the number of clusters (high purity, low non-fragmentation ratio) or underestimate it (low purity, high non-fragmentation ratio). BIRCH stands out among the baseline approaches. Nevertheless, our method outperforms BIRCH as well, with an improvement of 1.5% in NMI and 8.9% in ARI. We attribute this to the learning capacity of our method, which is trained to improve purity and non-fragmentation jointly. Repeating the experiments with the baseline approaches yields the same clusters. Our method, despite relying on a random initialization of its model weights, delivers consistent results with a small standard deviation.

Metrics Trade-off. As can be seen in Fig. 5, there is often a trade-off between purity and non-fragmentation ratio scores. The two metrics evaluate the performance from different perspectives, and it is challenging to achieve high scores for both, especially in a task such as radar pulse deinterleaving. By contrast, a perfect score on only one of the two is trivial to obtain, at the cost of substantially decreasing the other: predicting a single cluster maximizes the non-fragmentation ratio, while predicting one cluster per input element maximizes purity. The strength of the proposed method lies in its ability to optimize both metrics simultaneously, improving two complementary aspects.

Cross Domain Applicability. The results show good potential for a previously unexplored way of solving clustering problems. Although applied here to pulse deinterleaving, the method can be extended to other domains: the novel soft confusion matrix-based loss functions could pave the way towards new NN-based clustering approaches that operate end-to-end, without requiring conventional data analysis methods. This would alleviate shortcomings of traditional clustering approaches, such as requiring a predefined number of clusters, or depending on cluster shapes and point densities.

5 Conclusion and Future Work

In this paper we proposed a new DL-based clustering method, which we applied to the challenging task of aligned radar pulse deinterleaving. For the first time, clustering was addressed as an image segmentation problem. We changed domain by encoding the concurrent signals into segmentable images. The NN was trained to predict the clusters directly in an end-to-end fashion, aiming at pure and non-fragmented clusters, thanks to new loss functions derived from a novel probability-based confusion matrix.

In future work, we plan to schedule different loss weightings over training and to explore other input representations, suitable for an even larger number of simultaneous and aligned signals. Moreover, we will adapt the proposed method and losses to other domains, such as image clustering, and compare them against other DL-based approaches.


  • [1] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers (2018) Clustering with deep learning: taxonomy and new methods. arXiv preprint arXiv:1801.07648.
  • [2] L. Cain, J. Clark, E. Pauls, B. Ausdenmoore, R. Clouse, and T. Josue (2018) Convolutional neural networks for radar emitter classification. In 2018 IEEE Annual Computing and Communication Workshop and Conference (CCWC), pp. 79–83.
  • [3] K. Ghasedi Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang (2017) Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5736–5745.
  • [4] J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe (2016) Deep clustering: discriminative embeddings for segmentation and separation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31–35.
  • [5] P. Huang, Y. Huang, W. Wang, and L. Wang (2014) Deep embedding network for clustering. In 2014 22nd International Conference on Pattern Recognition (ICPR), pp. 1532–1537.
  • [6] H. Li, W. Jing, and Y. Bai (2016) Radar emitter recognition based on deep learning architecture. In 2016 CIE International Conference on Radar (RADAR), pp. 1–5.
  • [7] Z. Liu and P. S. Yu (2018) Classification, denoising and deinterleaving of pulse streams with recurrent neural networks. IEEE Transactions on Aerospace and Electronic Systems.
  • [8] Y. Lukic, C. Vogt, O. Dürr, and T. Stadelmann (2016) Speaker identification and clustering using convolutional neural networks. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6.
  • [9] S. Mahmod (2019) Deinterleaving pulse trains with DBSCAN and FART. Master thesis, Uppsala Universitet, Uppsala, Sweden.
  • [10] K. Manickchand, J. J. Strydom, and A. Mishra (2017) Comparative study of TOA based emitter deinterleaving and tracking algorithms. In 2017 IEEE AFRICON, pp. 221–226.
  • [11] J. Matuszewski (2018) Radar signal identification using a neural network and pattern recognition methods. In 2018 International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), pp. 79–83.
  • [12] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long (2018) A survey of clustering with deep learning: from the perspective of network architecture. IEEE Access 6, pp. 39501–39514.
  • [13] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241.
  • [14] J. A. Scheer, M. A. Richards, W. A. Holm, and W. L. Melvin (2013) Principles of modern radar. SciTech Publishing. ISBN 9781613532010, LCCN 2010013808.
  • [15] M. J. Thompson, S. Lin, and J. C. Sciortino Jr (2006) Measures of effectiveness for analysis of radar pulse train deinterleavers. In Signal Processing, Sensor Fusion, and Target Recognition XV, Vol. 6235, pp. 62351I.
  • [16] X. Wang, X. Zhang, R. Tian, and X. Qi (2013) A new method of unknown radar signals sorting. In Proceedings of 2013 Chinese Intelligent Automation Conference, pp. 727–733.
  • [17] J. Xie, R. Girshick, and A. Farhadi (2016) Unsupervised deep embedding for clustering analysis. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 478–487.
  • [18] D. Xu and Y. Tian (2015) A comprehensive survey of clustering algorithms. Annals of Data Science 2 (2), pp. 165–193.
  • [19] R. Xu and D. C. Wunsch II (2005) Survey of clustering algorithms. IEEE Transactions on Neural Networks 16 (3), pp. 645–678.
  • [20] B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong (2017) Towards K-Means-friendly spaces: simultaneous deep learning and clustering. In Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 3861–3870.
  • [21] J. Yang, D. Parikh, and D. Batra (2016) Joint unsupervised learning of deep representations and image clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5147–5156.