Blind Deinterleaving of Signals in Time Series with Self-attention Based Soft Min-cost Flow Learning

10/24/2020
by Oğul Can, et al.

We propose an end-to-end learning approach to address the deinterleaving of patterns in time series, in particular, radar signals. We link the signal clustering problem to min-cost flow as an equivalent problem once the proper costs exist. We formulate a bi-level optimization problem involving min-cost flow as a sub-problem to learn such costs from supervised training data. We then approximate the lower-level optimization problem by self-attention based neural networks and provide a trainable framework that clusters the patterns in the input as distinct flows. We evaluate our method with extensive experiments on a large dataset with several challenging scenarios to demonstrate its effectiveness.



1 Introduction

Practical Electronic Warfare (EW) systems often operate in environments where multiple signal sources exist simultaneously. Signal patterns from different emitters are received through the same channel of the receiver in time order, yielding an interleaved signal sequence. For robust classification of radar signals, recovering the individual sequences of the emitters (i.e., deinterleaving, emitter separation or signal sorting) is essential [1, 2].

The first-degree attributes of radar pulse streams are time of arrival (ToA), direction of arrival (DoA), pulse amplitude (PA), pulse width (PW) and radio frequency (RF), which are stored in pulse description words (PDWs) [1, 2]. Although clustering the pulses with respect to those attributes seems appealing, DoA, PA, PW and RF may be missing or follow unpredictable, irregular patterns [3, 4]. On the other hand, radar signals usually have a characteristic pulse repetition interval (PRI), which is the first difference of the ToA sequence. Following the existing efforts [3, 4, 5, 6], we exploit only ToA measures to deinterleave radar signals.

Conventional methods construct histograms from ToA differences to capture the periodicity of the pulses [3, 4]. While the cumulative difference histogram (CDIF) [3] accumulates histogram values for each difference level of the ToA sequence in a single iteration, the sequential difference histogram (SDIF) [4] computes the histogram values simultaneously for each difference order. They then estimate the PRI values of the radar signals from the calculated histogram using heuristic threshold functions, and sequentially search and cluster the pulses that follow the estimated PRI regimes to form radar signals. Since those approaches are sensitive to sub-harmonics, the PRI transform (PRIT) [5] uses a complex-valued autocorrelation-like integral instead of histograms, under the assumption that the ToA sequence is sufficiently long. Although these conventional algorithms work well for simple PRI modulation types, they fail in scenarios with largely missing pulses, highly jittered PRI values or staggered PRI modulation modes.
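To make the histogram idea concrete, below is a minimal NumPy sketch of SDIF-style difference histograms. It is an illustration under simplified assumptions (fixed bin width, no threshold function) and not the algorithm of [4].

```python
import numpy as np

def difference_histograms(toa, max_order=3, bin_width=1.0):
    """SDIF-style sketch: histogram the order-th ToA differences; peaks
    indicate candidate PRIs. Threshold heuristics of [4] are omitted."""
    hists = {}
    for order in range(1, max_order + 1):
        diffs = toa[order:] - toa[:-order]  # order-th differences of the ToA sequence
        edges = np.arange(0.0, diffs.max() + bin_width, bin_width)
        hists[order], _ = np.histogram(diffs, bins=edges)
    return hists

# Toy stream: two interleaved constant-PRI emitters with PRIs 7 and 11.
toa = np.sort(np.concatenate([np.arange(0.0, 200.0, 7.0),
                              np.arange(3.0, 200.0, 11.0)]))
hists = difference_histograms(toa)
# Bins near the true PRIs (7 and 11) accumulate counts across the orders,
# since pulses of the other emitter may fall between two pulses of one emitter.
print(hists[1][7], hists[2][7])
```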

Figure 1: Overview of the Soft Min-cost Flow Framework

Recently, heuristic approaches [7, 8, 9, 10] have been proposed to deinterleave radar signals with complex PRI modulation types. However, their restriction to specific scenarios prevents them from generalizing to problems of a different nature. Machine learning techniques have also been applied to the deinterleaving task. For instance, fuzzy adaptive resonance theory (ART) is utilized in [11] to cluster pulse streams by exploiting RF, PW and DoA features before applying one of the histogram based deinterleaving methods. Moreover, [12] utilizes recurrent neural networks (RNNs) to capture the current context of the pulse stream and predict the attributes of the next PDWs. Yet, the RNNs classify PDWs against a predefined radar list; hence, they identify pulses rather than solving the deinterleaving problem. Finally, denoising pulse streams with autoencoders [13] has been explored to improve deinterleaving solutions.

We alleviate the limitations of the aforementioned approaches with a novel end-to-end trainable clustering framework. Starting by linking deinterleaving to the min-cost flow problem, we develop a bi-level optimization problem and relate the solver of the lower-level problem to residual networks, yielding a novel unified self-attention based framework that learns to cluster an arbitrary number of patterns in signals.

2 Method

2.1 Min-cost Flow Formulation of Signal Clustering

We consider deinterleaving of PRI patterns using relative time of arrival (RToA) measures, though our formulation is general for deinterleaving in time series. RToA is computed from ToA differences: $x_i = t_i - t_1$ for $i = 1, \dots, N$, with $x_1 = 0$.

We have $N$ RToAs, $X = \{x_i\}_{i=1}^{N}$, of mixed signals. Our aim is to divide $X$ into $K$ disjoint sets, $X_1, \dots, X_K$, containing the RToAs of the distinct sources. We encode such sets by the ground truth assignment matrix, $\hat{A}$, where $\hat{A}_{ij} = 1$ tells that $x_i$ is preceding $x_j$ in a distinct pattern, i.e., $x_i, x_j \in X_k$ and $x_j$ immediately follows $x_i$.

We formulate the deinterleaving problem as a min-cost flow problem on a directed graph. The vertices of the graph are the RToAs, which we denote as $v_1, \dots, v_N$, and the edges are the possible connections among the RToAs. We augment the graph with a start and a terminal vertex, $v_s$ and $v_t$, to facilitate the formulation. We associate the edges with costs, $c_{ij}$, and aim to find min-cost flows from the start vertex to the terminal vertex. Such flows will be the deinterleaved signals. Formally, we consider the problem:

(P1)  $\min_{A \geq 0} \sum_{i,j} c_{ij} A_{ij}$  subject to  (1) $\sum_{j} A_{ij} \leq 1 \;\; \forall i$,  (2) $\sum_{i} A_{ij} \leq 1 \;\; \forall j$,  (3) flow continuity at every vertex

to find an assignment matrix, $A$, from which we can recover the flows. The constraints avoid one-to-many or many-to-one connections (1 & 2) and preserve flow continuity (3). Total unimodularity of the constraints ensures integer solutions for $A$ [14], meaning that we obtain a binary assignment matrix which encodes the linking of the flows. We note that the number of sources, $K$, can be arbitrary, since no explicit dependency on $K$ exists in the formulation.
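As a concrete reference point, the sketch below solves a tiny instance of (P1) with an off-the-shelf min-cost flow solver. The node-splitting trick enforces constraints (1 & 2), negative costs encode linking affinity as in network-flow tracking [16], and, unlike (P1), the number of flows is fixed here ($K = 2$) purely for illustration.

```python
import networkx as nx

# Toy instance of (P1): 4 samples; negative costs mean "likely same emitter".
costs = {(0, 1): -5, (1, 3): -4, (0, 2): -1, (2, 3): -1}

G = nx.DiGraph()
G.add_node("S", demand=-2)  # K = 2 unit flows leave the start vertex ...
G.add_node("T", demand=2)   # ... and enter the terminal vertex
for i in range(4):
    # split each sample into in/out nodes: at most one flow per sample (1 & 2)
    G.add_edge(("in", i), ("out", i), capacity=1, weight=0)
    G.add_edge("S", ("in", i), capacity=1, weight=0)   # any sample may start a chain
    G.add_edge(("out", i), "T", capacity=1, weight=0)  # any sample may end a chain
for (i, j), w in costs.items():
    G.add_edge(("out", i), ("in", j), capacity=1, weight=w)

flow = nx.min_cost_flow(G)  # flow conservation gives constraint (3)
links = [(i, j) for (i, j) in costs if flow[("out", i)][("in", j)] > 0]
print(links)  # [(0, 1), (1, 3)]: chains 0-1-3 and 2, i.e., two deinterleaved flows
```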

2.2 Learning Association Costs and Soft Min-cost Flow

The optimality of the computed flows depends on the edge costs, $c_{ij}$. Ideally, given the ground truth assignments $\hat{A}$, the costs would yield the flows as the desired deinterleaved signals. In practice, we have no knowledge of such costs.

We aim to learn the association costs from training data, $\{(X^{(m)}, \hat{A}^{(m)})\}_m$, with known assignments. To formulate our learning framework, we express the cost matrix as a parametric function of $X$ with parameters $\theta$ to be learned, $C(\theta) = c(X; \theta)$, where we drop the explicit dependency on $X$ for simplicity. Then the problem becomes a bi-level optimization problem:

(P2)  $\min_{\theta} \sum_{m} L\big(A^{*}(\theta), \hat{A}^{(m)}\big)$  subject to  $A^{*}(\theta) = \arg\min_{A} \sum_{i,j} c_{ij}(\theta) A_{ij}$  s.t.  $G A \leq h, \; E A = b$

where $G A \leq h$ and $E A = b$ denote the flow constraints in (P1). The Jacobian of the linear program (LP) solution must be computed to perform gradient updates for $\theta$. This can be achieved by transforming the LP into an unconstrained problem with a unique solution and using implicit differentiation [15]. The equality constraints can be eliminated by expressing $A = A_0 + Z u$, where $A_0$ is any solution to $E A = b$ and $Z$ is a basis for the null space of the constraints. The inequality constraints can be relaxed by introducing strictly convex penalty terms as in [16]. Hence, the relaxed unconstrained problem, $\min_u f(u)$, has a unique solution which can be obtained by gradient updates over $u$:

(2.1)  $u^{(k+1)} = u^{(k)} - \eta \, \nabla f\big(u^{(k)}\big)$

where $k$ is the iteration number and $\eta$ is the step size. For a solution obtained at the $K$-th iteration, $u^{(K)}$, computing the partial derivatives for $\theta$ can be seen as applying the chain rule recursively on the iterates $u^{(k)}$, where we deliberately omit the dependency on $\theta$ for simplicity. To this end, with a slight abuse of the gradient expression, we consider each update as a non-linear transform of $u^{(k)}$ added to itself:

(2.2)  $u^{(k+1)} = u^{(k)} + g\big(u^{(k)}\big)$

which is similar to residual connections in deep learning [17, 18]. The expression in Eqn. (2.2) informally suggests approximating a solver by stacked residual layers; the sketch below makes this concrete.
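As a minimal numerical illustration of this view, the following runs gradient descent on a strictly convex quadratic (standing in for the penalized LP objective) written as an identity-plus-update map; each iteration plays the role of one residual layer.

```python
import numpy as np

# f(u) = 0.5 u^T Q u - b^T u, a stand-in for the penalized LP objective
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
eta = 0.1                                # step size

def g(u):
    # the update added to the identity; affine here, non-linear in general
    return -eta * (Q @ u - b)

u = np.zeros(2)
for _ in range(200):      # each iteration acts as one residual layer
    u = u + g(u)          # Eqn. (2.2): identity plus transform

print(np.allclose(u, np.linalg.solve(Q, b)))  # True: reached argmin of f
```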

We use this resemblance to build our formulation, which couples cost learning and LP solving within a single framework. Specifically, we consider the assignment matrix as a parametric function of $X$ directly, $A = p(X; \theta)$, rather than the solution of an LP, and formulate the learning problem as:

(P3)  $\min_{\theta} \; L\big(p(X; \theta), \hat{A}\big)$, with $A = p(X; \theta)$,
subject to (C3.1)  $\sum_{j} A_{ij} = 1 \;\; \forall i$
(C3.2)  $\sum_{i} A_{ij} \leq 1 \;\; \forall j$
(C3.3)  flow continuity of $A$
(C3.4)  $A_{ij} \in \{0, 1\}$

where $p(X; \theta)$ is realized by a stack of bidirectional transformer encoder [18] layers. We omit the start node, $v_s$, and its related constraints from the formulation and embed flow continuity in (C3.3). We explicitly introduce the flow constraints to the learning problem in terms of $A = p(X; \theta)$. Thus, the constraints are no longer linear nor convex. However, such a formulation allows us to use supervision to inherently learn to solve the min-cost flow problem with proper costs. Hence, no explicit LP solver is required in practice to compute the assignment matrix.

We solve (P3) by relaxing the constraints with proper penalty terms to obtain an unconstrained problem. As we will show in Sec. 2.3, (C3.1) is already satisfied by design. We use a hinge loss for (C3.2),

(2.3)  $\mathcal{L}_{3.2} = \sum_{j} \max\Big(0, \; \sum_{i} A_{ij} - 1\Big)$

and a squared L2 norm penalty for (C3.3),

(2.4)  $\mathcal{L}_{3.3} = \sum_{j} \Big( \sum_{i} A_{ij} - \sum_{k} A_{jk} \Big)^{2}$

For the binary constraint (C3.4), we exploit the fact that $\|a_i\|_2 \leq \|a_i\|_1 = 1$ always holds for each row $a_i$ of $A$, with equality exactly when $a_i$ is a one-hot vector:

(2.5)  $\mathcal{L}_{3.4} = \sum_{i} \big(1 - \|a_i\|_2\big)$

where $a_i$ is the $i$-th row of $A$. Hence, the total loss becomes:

(2.6)  $\mathcal{L} = L\big(p(X; \theta), \hat{A}\big) + \lambda_{1} \mathcal{L}_{3.2} + \lambda_{2} \mathcal{L}_{3.3} + \lambda_{3} \mathcal{L}_{3.4}$

where the $\lambda$s are the penalty coefficient hyperparameters. Once the parameters $\theta$ are learned by minimizing $\mathcal{L}$, the distinct patterns in a sequence $X$ can be clustered using the computed flows.
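For concreteness, here is a TensorFlow sketch of one plausible reading of the penalties in Eqns. (2.3)-(2.5); the exact penalty forms, the supervised term $L$ and the terminal-column handling (see Sec. 2.3) are assumptions, not the authors' released code.

```python
import tensorflow as tf

def smcf_loss(A, A_true, lambdas=(1.0, 1.0, 1.0)):
    """One reading of Eqn. (2.6). A: soft assignments of shape [N, N+1],
    where the last column is assumed to be a terminal node; rows sum to
    one by softmax, so (C3.1) holds by design."""
    inner = A[:, :-1]                              # sample-to-sample block
    # (2.3) hinge penalty for (C3.2): each sample receives at most one link
    l_o2m = tf.reduce_sum(tf.nn.relu(tf.reduce_sum(inner, axis=0) - 1.0))
    # (2.4) squared-L2 continuity penalty for (C3.3): assumed here to match
    # each sample's incoming mass with its outgoing mass
    residual = tf.reduce_sum(inner, axis=0) - tf.reduce_sum(inner, axis=1)
    l_flow = tf.reduce_sum(tf.square(residual))
    # (2.5) binarization penalty for (C3.4): ||a_i||_2 <= 1, tight iff one-hot
    l_bin = tf.reduce_sum(1.0 - tf.norm(A, axis=1))
    # placeholder supervised term: cross-entropy against the ground truth rows
    l_sup = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(A_true, A))
    l1, l2, l3 = lambdas
    return l_sup + l1 * l_o2m + l2 * l_flow + l3 * l_bin
```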

2.3 Soft Min-cost Flow Architecture: $p(X; \theta)$

We exploit a stack of bidirectional transformer encoder (BTE) [18] layers to model $p(X; \theta)$, owing to the success of architectures [19, 18] equipped with such layers in encoding high-order relations among sequential data. A BTE layer is a stack of a self-attention layer and a feed-forward layer, each of which contains a non-linear function and a residual connection. Thus, such layers match our intuition of expressing a solver in terms of stacked residual units as in Eqn. (2.2).

A self-attention layer expresses the representation of each sample in the sequence as a convex combination of the representations of all samples. The mixing weights are determined by the relative similarities of the representations. Thus, a stack of such layers hierarchically encodes higher-order relations among the samples of the sequential data.
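The convex-combination view corresponds to standard scaled dot-product attention [18]; below is a minimal single-head sketch, omitting the feed-forward sublayer, residual connections and layer normalization of a full BTE layer.

```python
import tensorflow as tf

def self_attention(Z, Wq, Wk, Wv):
    """Single-head self-attention over a sequence Z of shape [N, d]."""
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    d_k = tf.cast(tf.shape(K)[-1], Z.dtype)
    # each row of `coeffs` is non-negative and sums to one: convex weights
    coeffs = tf.nn.softmax(Q @ tf.transpose(K) / tf.sqrt(d_k), axis=-1)
    return coeffs @ V  # every output is a convex combination of the values
```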

In our architecture, which is depicted in Fig. 1, we first represent each sample $x_i$ in the sequence with a $d$-dimensional vector. We normalize the input sequence $X$ and then assign each sample to a token with a learned embedding according to the quantization of the unit interval. One can also use a neural network to embed the samples; however, we empirically found that the former works well in practice and is simpler. We then perform relative position encoding as in [20] to emphasize the ordering in time. These encoded representations are fed to the stack of BTE layers to yield transformed representations, $z_i$, which are enhanced with the higher-order relations. We apply an affine transformation followed by a soft-max to these representations, $A = \mathrm{softmax}(Z W)$, where we consider the columns of $W$ as the decision representatives which tell to which sample the corresponding sample is to be linked. Thus, $(Z W)_{ij}$ is a proxy of the similarity between $z_i$ and the $j$-th decision representative: a large value means $x_i$ is probably linked to $x_j$. Finally, the row-wise soft-max of $Z W$ reads off the assignment matrix, and the constraint (C3.1) in (P3) is readily satisfied with this formulation: $\sum_j A_{ij} = 1$.
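A sketch of the input tokenization and the output head described above follows; the dimensions match Sec. 3.2, while the variable names, the omitted bias and the extra terminal column are illustrative assumptions.

```python
import tensorflow as tf

num_levels, d, max_len = 5001, 512, 256   # quantization levels, width, length
embed = tf.keras.layers.Embedding(num_levels, d)

def tokenize(x):
    """Normalize the RToA sequence to [0, 1] and quantize into token ids."""
    x = (x - tf.reduce_min(x)) / (tf.reduce_max(x) - tf.reduce_min(x))
    return tf.cast(tf.round(x * (num_levels - 1)), tf.int32)

# decision representatives: one column per position plus a terminal column
W = tf.Variable(tf.random.normal([d, max_len + 1], stddev=0.02))

def head(Z):
    """Affine map + row-wise softmax; rows of A sum to one, so (C3.1) holds."""
    return tf.nn.softmax(Z @ W, axis=-1)
```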

Though using fixed decision representatives brings the limitation of processing fixed-length sequences, this limitation can be alleviated during inference by processing the sequences in overlapping windows. We empirically found that the interaction of samples far from each other can be neglected. Thus, fixed-length training with an appropriate window size achieves good generalization performance.

2.4 Inference

In general, we have soft assignments in our estimated assignment matrix, $A$. We compute a binary assignment matrix from $A$ by solving (P1) with costs derived from the soft assignments. We also experimented with greedily performing assignments starting from the highest $A_{ij}$. We provide the comparisons in Sec. 3.3.

For sequences whose lengths exceed the model capacity, we take overlapping segments of the sequence and perform the assignments segment by segment. We take the most recent assignment for a sample in case of multiple assignments due to overlapping.
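A sketch of the greedy alternative is given below; the stopping threshold and terminal handling are assumptions, and the LP alternative would instead solve (P1) over the soft scores.

```python
import numpy as np

def greedy_assign(A_soft, threshold=0.5):
    """Commit links in decreasing soft-score order while respecting the
    one-outgoing / one-incoming constraints; a sketch, threshold assumed."""
    A = np.zeros_like(A_soft, dtype=int)
    used_rows, used_cols = set(), set()
    for flat in np.argsort(A_soft, axis=None)[::-1]:
        i, j = np.unravel_index(flat, A_soft.shape)
        if A_soft[i, j] < threshold:
            break                      # remaining scores are too weak
        if i not in used_rows and j not in used_cols:
            A[i, j] = 1                # x_i -> x_j becomes a hard link
            used_rows.add(i)
            used_cols.add(j)
    return A
```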

3 Experimental Work

Dataset (metrics: P_pos / P_num / V_o2m) CDIF SDIF PRIT Baseline + Greedy Baseline + LP SMCF + Greedy SMCF + LP
Validation - Case 1 0.841 / 0.605 / 0.000 0.850 / 0.614 / 0.000 0.982 / 0.631 / 0.000 0.975 / 0.940 / 0.013 0.963 / 0.966 / 0.000 0.991 / 0.998 / 0.001 0.991 / 0.998 / 0.000
Validation - Case 2 0.519 / 0.312 / 0.000 0.518 / 0.309 / 0.000 0.631 / 0.538 / 0.000 0.894 / 0.758 / 0.201 0.885 / 0.764 / 0.000 0.945 / 0.996 / 0.004 0.943 / 0.994 / 0.000
Validation - Case 3 0.352 / 0.154 / 0.000 0.362 / 0.157 / 0.000 0.579 / 0.437 / 0.000 0.858 / 0.623 / 7.477 0.856 / 0.627 / 0.000 0.912 / 0.950 / 0.025 0.915 / 0.954 / 0.000
Validation - Case 4 0.679 / 0.480 / 0.000 0.683 / 0.481 / 0.000 0.768 / 0.545 / 0.000 0.956 / 0.977 / 0.011 0.955 / 0.977 / 0.000 0.991 / 0.999 / 0.002 0.990 / 0.998 / 0.000
Validation - Case 5 0.548 / 0.370 / 0.000 0.550 / 0.364 / 0.000 0.642 / 0.496 / 0.000 0.901 / 0.836 / 2.736 0.900 / 0.841 / 0.000 0.929 / 0.975 / 0.015 0.929 / 0.975 / 0.000
Test 0.304 / 0.147 / 0.000 0.311 / 0.148 / 0.000 0.564 / 0.436 / 0.000 0.842 / 0.539 / 11.943 0.823 / 0.555 / 0.000 0.903 / 0.883 / 0.098 0.898 / 0.886 / 0.000
Table 1: Comparison with the existing and baseline methods for the deinterleaving task. The best results are indicated in bold.

3.1 Datasets

We use a simulator to generate the ToA sequences for the training and validation data. The PRI types of the radars (Fig. 2) are constant, jitter, constant stagger, random stagger and switch&dwell [21]. To produce realistic data, we set the maximum deviations of the constant, jitter, constant stagger, random stagger and switch&dwell PRI values to ±1%, ±15%, ±40%, ±40% and ±40%, respectively. Since the PRI values are normalized in the pre-processing stage of the algorithm, we choose them from a pre-defined range between 1 and 1000 µs. The stagger level can vary from 2 to 9. We sample the switch&dwell repetition parameter (the number of dwells) between 4 and 10, where the range of the number of switches is kept the same as the stagger level range. The number of samples belonging to the same radar ranges from 5 to 100, and the number of radars in a sequence ranges from 1 to 10.
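A simplified sketch of such a generator for three of the PRI types is shown below; the actual simulator (including missing-pulse and switch&dwell logic) is not public, so the details beyond the quoted deviation maxima are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gen_toa(pri_type, n_pulses, base_pri):
    """Generate a ToA sequence for one emitter; a sketch, not the simulator."""
    if pri_type == "constant":
        pris = base_pri * (1.0 + rng.uniform(-0.01, 0.01, n_pulses))
    elif pri_type == "jitter":
        pris = base_pri * (1.0 + rng.uniform(-0.15, 0.15, n_pulses))
    elif pri_type == "constant_stagger":           # 3-level stagger example
        pattern = base_pri * np.array([1.0, 0.8, 1.2])
        pris = np.tile(pattern, n_pulses // 3 + 1)[:n_pulses]
    else:
        raise ValueError(pri_type)
    return np.cumsum(pris)

# interleave two emitters as seen at the receiver: one time-ordered stream
stream = np.sort(np.concatenate([gen_toa("constant", 60, 500.0),
                                 gen_toa("jitter", 45, 730.0)]))
```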

We generate 200000 interleaved radar sequences for training and 5000 for validation. Sequences in each dataset are sampled from the following cases with equal probability. Case 1: only constant and jitter PRI types. Case 2: no missing pulses; the ending times of the radar sequences are the same and the beginning times are close. Case 3: pulses are missing with a maximum probability of 0.20, and consecutive missing is favored for up to 10 pulses; the ending times of the radar sequences are the same and the beginning times are close. Case 4: no missing pulses; the ending times of the radar sequences are different and the beginning times might be far apart. Case 5: pulses are missing with a maximum probability of 0.20; the ending and beginning times of the radar sequences might be close or far apart.

For the test dataset, we use 15 deinterleaved signals provided by ASELSAN A.Ş. We interleave random batches of distinct signals, where the number of interleaved signals varies between 2 and 15. The test dataset contains 1000 interleaved sequences and is of a different modality and disjoint from the training and validation data.

Figure 2: PRI types. (a) 500 µs constant PRI. (b) 500 µs jitter PRI with ±10% jitter. (c) [500, 400, 600] µs constant stagger PRI with 3 stagger levels. (d) [400, 500, 600] µs random stagger PRI with 3 stagger levels. (e) [600, 400, 500] µs switch&dwell PRI with [5, 7, 3] dwells and 3 switches.

3.2 Experimental Setup and Evaluation Metrics

We use the TensorFlow [22] deep learning library throughout the experiments. We set the penalty coefficient hyperparameters, $\lambda_1$, $\lambda_2$ and $\lambda_3$, after tuning on the validation set. Furthermore, we select the maximum sequence length, the dimension of the representations and the number of quantization levels as 256, 512 and 5001, respectively. For the optimization procedure, we utilize the ADAM [23] optimizer for mini-batch gradient descent with a mini-batch size of 128, default moment parameters and a fixed base learning rate. The remaining BTE parameters are kept the same as in [19].

We use three performance metrics for better analysis: the average probability of the correctly estimated next PDW position (P_pos), the average probability of the correctly estimated number of radars (P_num) and the average number of one-to-many assignments (V_o2m), which violate (C3.2). The desired values of these metrics are 1, 1 and 0, respectively. Each predicted cluster containing more than 3 pulses is assigned as a radar signal.
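Two of these metrics can be computed from a hard assignment matrix as sketched below; the chain extraction and the exact metric definitions are assumptions based on the descriptions above.

```python
import numpy as np

def one_to_many_violations(A):
    """V_o2m sketch: columns of a hard assignment matrix that receive more
    than one link, i.e., violations of (C3.2)."""
    return int(np.sum(A.sum(axis=0) > 1))

def count_radars(A, min_pulses=3):
    """P_num numerator sketch: follow the chains encoded by A and count
    those longer than `min_pulses`, per the rule stated above."""
    n = len(A)
    succ = {i: int(np.argmax(A[i])) for i in range(n) if A[i].any()}
    heads = [i for i in range(n) if i not in set(succ.values())]
    count = 0
    for h in heads:
        length, node = 1, h
        while node in succ and length <= n:   # guard against malformed cycles
            node = succ[node]
            length += 1
        if length > min_pulses:
            count += 1
    return count
```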

3.3 Quantitative Results

We examine the effectiveness of the proposed soft min-cost flow (SMCF) framework through evaluation on the validation dataset and the disjoint test data. We compare SMCF with CDIF [3], SDIF [4] and PRIT [5], which are conventional non-learning based deinterleaving methods, to show their limitations in realistic scenarios. Additionally, we show the significance of the relaxed constraints in Eqn. (2.6) by training without the corresponding penalty terms, i.e., $\lambda_1 = \lambda_2 = \lambda_3 = 0$, which we denote as Baseline. We also compare the two inference methods (Greedy, LP) proposed in Sec. 2.4.

Table 1 shows the quantitative results for the deinterleaving tasks. The SMCF algorithm consistently outperforms the associated baseline methods by up to 6.1 and 32.8 percentage points on the P_pos and P_num metrics, respectively. The conventional methods perform poorly in challenging scenarios where largely missing pulses and complex PRI modulation types exist. Additionally, the clusters they obtain miss related PDWs even when the PRI values are predicted correctly.

Comparing the one-to-many violations (V_o2m) of Baseline + Greedy with those of SMCF + Greedy, as well as the small performance gap between SMCF + Greedy and SMCF + LP, indicates that the proposed penalty terms improve the solution by enforcing the constraints. The similar performances of Greedy and LP inference in SMCF, together with the superior performance of SMCF + Greedy over the Baseline, imply that the constraints are better satisfied at the output of SMCF. We note that all the methods except for the Greedy ones have zero one-to-many violations owing to hard assignment.

4 Conclusion

We link deinterleaving to the min-cost flow problem and formulate the training of a self-attention based neural network to solve the min-cost flow problem. The mechanism behind the formulation is supported by extensive evaluations on a large dataset. Experimental results on data of a different modality show that our framework is able to learn to deinterleave the signals.

References

  • [1] Richard G Wiley, ELINT: The interception and analysis of radar signals, Artech House, 2006.
  • [2] David K Barton, Radar system analysis and modeling, Artech House, 2004.
  • [3] HK Mardia, “New techniques for the deinterleaving of repetitive sequences,” in IEE Proceedings F (Radar and Signal Processing). IET, 1989, vol. 136, pp. 149–154.
  • [4] DJ Milojević and BM Popović, “Improved algorithm for the deinterleaving of radar pulses,” in IEE Proceedings F (Radar and Signal Processing). IET, 1992, vol. 139, pp. 98–104.
  • [5] Ken’ichi Nishiguchi and Masaaki Kobayashi, “Improved algorithm for estimating pulse repetition intervals,” IEEE transactions on Aerospace and Electronic Systems, vol. 36, no. 2, pp. 407–421, 2000.
  • [6] X. Li, Z. Liu, and Z. Huang, “Deinterleaving of pulse streams with denoising autoencoders,” IEEE Transactions on Aerospace and Electronic Systems, pp. 1–1, 2020.
  • [7] Zhipeng Ge, Xian Sun, Wenjuan Ren, Wenbin Chen, and Guangluan Xu, “Improved algorithm of radar pulse repetition interval deinterleaving based on pulse correlation,” IEEE Access, vol. 7, pp. 30126–30134, 2019.
  • [8] Yanchao Liu and Qunying Zhang, “Improved method for deinterleaving radar signals and estimating pri values,” IET Radar, Sonar & Navigation, vol. 12, no. 5, pp. 506–514, 2018.
  • [9] Ala Mahdavi and Amir Mansour Pezeshk, “A fast enhanced algorithm of pri transform,” in 2011 Sixth International Symposium on Parallel Computing in Electrical Engineering. IEEE, 2011, pp. 179–184.
  • [10] Yan Mao, Jun Han, Guohua Guo, and Xu Qing, “An improved algorithm of pri transform,” in 2009 WRI Global Congress on Intelligent Systems. IEEE, 2009, vol. 3, pp. 145–149.
  • [11] AW Ata’a and SN Abdullah, “Deinterleaving of radar signals and prf identification algorithms,” IET radar, sonar & navigation, vol. 1, no. 5, pp. 340–347, 2007.
  • [12] Zhang-Meng Liu and S Yu Philip, “Classification, denoising, and deinterleaving of pulse streams with recurrent neural networks,” IEEE Transactions on Aerospace and Electronic Systems, vol. 55, no. 4, pp. 1624–1639, 2018.
  • [13] Xueqiong Li, Zhangmeng Liu, and Zhitao Huang, “Deinterleaving of pulse streams with denoising autoencoders,” IEEE Transactions on Aerospace and Electronic Systems, 2020.
  • [14] Alexander Schrijver, Theory of linear and integer programming, John Wiley & Sons, 1998.
  • [15] Peter Ochs, René Ranftl, Thomas Brox, and Thomas Pock, “Bilevel optimization with nonsmooth lower level problems,” in International Conference on Scale Space and Variational Methods in Computer Vision. Springer, 2015, pp. 654–665.
  • [16] Samuel Schulter, Paul Vernaza, Wongun Choi, and Manmohan Chandraker, “Deep network flow for multi-object tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6951–6960.
  • [17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [18] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.
  • [19] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • [20] Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani, “Self-attention with relative position representations,” arXiv preprint arXiv:1803.02155, 2018.
  • [21] Moein Ahmadi and K Mohamedpour, “Pri modulation type recognition using level clustering and autocorrelation,” American Journal of Signal Processing, vol. 2, no. 5, pp. 83–91, 2012.
  • [22] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al., “Tensorflow: A system for large-scale machine learning,” in 12th USENIX symposium on operating systems design and implementation (OSDI 16), 2016, pp. 265–283.
  • [23] Diederik P Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.