## I Introduction

Classical space-time adaptive processing (STAP) techniques for target detection and localization have been refined through seminal contributions that study their performance limits (see [murali_stap02, melvin_stap96, guerci_stap00]). Broadly speaking, three issues degrade clutter suppression and target localization. (1) When the clutter interference is not known a priori, it must be estimated from the available data. Traditionally, the clutter covariance matrix is estimated from a limited number of returns from a radar dwell, and the accuracy of this estimate governs the performance of STAP [guerci2003space]. As the dimensionality of the STAP weight vector grows, so does the amount of data required to obtain a good estimate of the clutter covariance matrix. A fundamental assumption of covariance estimation methods is that the clutter scattering function is a wide-sense stationary process. This assumption holds only over relatively short dwell times, which limits the amount of data available for estimating the clutter covariance matrix. Consequently, there is always a mismatch between the true clutter statistics and their estimates, and this mismatch becomes more severe as the dimensionality of the STAP weight vector increases to meet the ever-growing demand for resolution (see [murali_stap02, melvin_stap96]). (2) When a target lies close to the clutter, there is always considerable overlap between the azimuth-Doppler subspace of the target response and that of the clutter response. Subspace leakage from the clutter significantly increases the false alarm rate, while projecting the clutter response out (through either orthogonal or oblique projections) inadvertently removes part of the target response, reducing the probability of correct detection (see [murali_stap02, melvin_stap96]). (3) Fully adaptive STAP, in which a separate adaptive weight is applied to every pulse and array element, requires solving a system of linear equations whose size equals the spatio-temporal product, that is, the number of antenna array elements multiplied by the number of pulses processed. In typical radar systems, this product can exceed tens of thousands, and the computational power needed to solve such large systems within short time intervals, especially for high-sampling-rate radars (see [guerci_stap00]), may be prohibitive.

In recent years, we have also observed the emergence of big data in deep learning applications, especially in computer vision. After decades of intensive work, several publicly available datasets of natural images, such as CIFAR-10/100 [krizhevsky2009learning], ImageNet [ILSVRC15], and COCO [lin2014microsoft], now exist for training and testing computer vision algorithms. Extending this scope to radar STAP, the availability of satellite topographic maps has made it possible to develop simulators that generate highly accurate radar data. The state-of-the-art software for this application, RFView^{®} [gogineni_RFView], offers high-fidelity data generation and enables users to synthesize radar returns through a splatter, clutter, and target signal (SCATS) phenomenology engine. Using RFView^{®}, it is possible to generate comprehensive datasets for radar STAP applications. This has motivated our ongoing data-driven approach to the aforementioned challenges in target detection and localization.

In our approach, received radar array data is generated using RFView^{®} for randomized target locations and strengths within a constrained area in the presence of high-fidelity clutter returns. Over this constrained area, we produce heatmap tensors in range, azimuth, and elevation angle using selected subspace separation techniques. Our objective is to localize targets from these 3D heatmap tensors. Our approach is inspired by computer vision algorithms for object detection and localization such as Faster R-CNN [ren2015faster], which consists of a region proposal network for determining regions of interest, a regression network for positioning anchor boxes around objects, and a classification algorithm. While Faster R-CNN was developed and optimized mainly for natural images, our ongoing research will develop analogous tools for heatmap images of radar array data. In doing so, we will generate a large, representative radar STAP database for training and testing, which is similar in spirit to the COCO dataset for natural images [lin2014microsoft]. By building upon and adapting the existing components of Faster R-CNN for radar STAP, we develop tools to localize targets in the heatmap tensors described above.

As a preliminary example, in this work, we analyze the theoretical performance guarantees of the normalized adaptive matched filter (NAMF) test statistic in the context of radar target localization, and augment this analysis through a proposed robust deep learning framework for target location estimation. We consider two example scenarios in RFView^{®}—one in which our framework is trained and tested using data from the original platform location instance (matched case), and another in which our framework is trained with data from the original platform location instance, and tested using data from several displaced platform location instances (mismatched case). The airborne radar platform is stationary with respect to the radar scene, and the data examples are heatmap tensors of the output power of a generalized sidelobe canceller (GSC) (see [vanveen_el88, griffiths_GSC]) or of the NAMF test statistic (see [michels_performance, kraut_adaptive]).

The outline of the paper is as follows. In Section II, we define the problem considered in our analysis. In Section III, we briefly review RFView^{®}, a site-specific, high-fidelity RF modeling and simulation tool used to generate the rich datasets required for our research, and we describe the matched and mismatched RFView^{®} example scenarios on which our framework is tested. In Section IV, we review the subspace separation methods used to generate the heatmap tensors for our training and test datasets, and we outline the mean absolute error metric used to measure localization accuracy and the normalized output signal-to-clutter-plus-noise ratio (SCNR) metric used to gauge scenario similarity. In Section V, we analyze the asymptotic threshold SCNR performance of the NAMF test statistic in the context of radar target localization. In Section VI, we present a regression convolutional neural network (CNN) framework for target localization and a novel encoding scheme for applying regression networks to mismatched scenarios. In Section VII, we provide numerical results for the matched case and demonstrate the robustness of our proposed regression CNN scheme and the improvements it provides over the classical method. In Section VIII, we provide numerical results for the mismatched case, demonstrate how the normalized output SCNR metric can be used to benchmark network generalization, and illustrate how few-shot learning can improve network generalization. Finally, in Section IX, we present conclusions and future research directions.

We note that this journal paper is an expanded version of our 2022 IEEE Radar Conference paper [Shyam_STAP]. Specifically, this journal paper shows that the finite sample performance of the NAMF test statistic matches the performance guarantees of random matrix theory in the threshold SCNR region, whereas [Shyam_STAP] considers only the performance of the MVDR beamformer [capon_mvdr]. Furthermore, the performance of our CNN framework is quantified rigorously and comprehensively in this journal paper, significantly exceeding the results reported in [Shyam_STAP]. The performance analysis of the CNN in the mismatched case is also benchmarked using the normalized output SCNR metric for the first time, and augmented via few-shot learning. Altogether, the extensions presented in this journal paper are significant and merit publication.

## II Problem Statement

In the realm of radar signal processing, subspace separation methods such as the normalized adaptive matched filter (NAMF) and the generalized sidelobe canceller (GSC) are known to improve target localization accuracy [yoon_beamforming]. Prior results in random matrix theory (RMT) further show that, in the asymptotic regime, the performance of these subspace separation techniques deteriorates below a threshold SCNR [Benaych_eigenvalues, Benaych_singular]. Extending this knowledge to radar STAP, and enabled by RFView^{®} [gogineni_RFView], we investigate whether additional processing using modern deep learning methods further improves target localization accuracy.

To evaluate this hypothesis, we first outline our example scenarios in RFView^{®} and review the aforementioned subspace separation methods used to produce the heatmap tensors comprising our datasets. Thereafter, we perform a threshold SCNR analysis of the NAMF test statistic to select parameters for the dataset generation procedure. Finally, we evaluate the localization performance of the NAMF and GSC relative to our augmented deep learning framework and draw conclusions.

## III RFView^{®} Example Scenarios

We first describe the RFView^{®} modeling and simulation environment, which has been used to construct the realistic example scenarios in our work.

### A RFView^{®} Modeling and Simulation Tool

RFView^{®} [gogineni_RFView] is a site-specific modeling and simulation environment developed by ISL Inc. With this platform, one can generate representative radar data for a variety of signal processing applications. The RFView^{®} software is built on a Splatter, Clutter, and Target Signal (SCATS) phenomenology engine that has successfully supported numerous advanced development projects for various organizations, including the U.S. Air Force and Navy, since 1989. The SCATS model was one of the first analysis tools to accurately characterize complex RF environments and to support tasks such as system analysis and high-fidelity data generation. Effects modeled in RFView^{®} include target returns, ground-scattered clutter returns, the direct-path signal, and coherent and incoherent interference. To specify simulation scenarios and parameters in RFView^{®}, one can use either a MATLAB package or a cloud-based web interface that runs on a remote, high-speed computer cluster. A worldwide database of terrain and land cover data is also provided with RFView^{®}.

The RFView^{®} platform provides numerous radar simulation capabilities, including high-fidelity electromagnetic propagation modeling, multi-static clutter modeling, multi-channel and MIMO radar simulation, high-fidelity RF system modeling, channel mismatch modeling, and post-processing pipelines to analyze generated data. Furthermore, RFView^{®} has three main categories of user-adjustable parameters. The first is the platform and target structure, which includes the locations and trajectories of the radar platforms, as well as important target characteristics such as speed and radar cross-section (RCS). The second is the task scheduler, which controls specific radar parameters such as the range swath, the pulse repetition frequency (PRF), the bandwidth, and the number of pulses; one can also control other receiver-specific parameters involving non-cooperative emitters. The third is the antenna structure, which defines parameters pertaining to multi-channel planar arrays for both radar receivers and transmitters. Beyond these categories, RFView^{®} provides advanced simulation options, such as intrinsic clutter motion (ICM) modeling and parallel execution in ‘cluster mode’ to reduce simulation runtimes.

Parameters | Values |
---|---|
Carrier frequency | |
Bandwidth | |
PRF & Duty Factor | & 10 |
Receiving antenna | (horizontal × vertical elements) |
Transmitting antenna | (horizontal × vertical elements) |
Antenna element spacing | |
Platform height | |
Area latitude (min, max) | |
Area longitude (min, max) | |

By utilizing these functionalities, we can define synthetic example scenarios within RFView^{®} that accurately model real-world environments. We consider two of these scenarios, both of which consist of an airborne radar platform system flying over the coast of Southern California. The simulation region covers a area, which is uniformly divided into a grid. Each grid cell is in size. RFView^{®} aggregates the information on land types, the geographical characteristics within each grid cell, and the radar parameters to simulate the radar return signal. The common radar and site parameters from our matched and mismatched scenarios are given in Table I—we consider a single-channel transmitter and an -channel receiver. The radar operates in ‘spotlight’ mode and always points toward the center of the simulation region.

### B Matched Case RFView^{®} Example Scenario

For our matched case RFView^{®} example scenario, we consider a single airborne radar platform within the scene parameterized by Table I. We randomly place a target within a constrained area that spans a fixed number of range bins and is bounded in range, azimuth angle, and elevation angle. The default grid resolution is specified in range, azimuth, and elevation, where the range spacing is given by the chip size. The target RCS is drawn at random from a uniform distribution with a specified mean and range. We then use RFView^{®} to generate radar returns for different combinations of target location and RCS. The radar and site parameters pertaining to the matched case are provided in Table II. The simulation region and platform location are shown in Figure 1.
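The random target placement described above can be sketched as follows. This is a minimal illustration rather than the RFView^{®} workflow: the helper `sample_targets` is hypothetical, locations are drawn uniformly and continuously inside the constrained area, and the RCS (in dBsm) is assumed to be uniform on an interval whose width equals the specified range and whose center is the specified mean. The numerical bounds are placeholders, since the values in Table II are not reproduced here.

```python
import numpy as np

def sample_targets(n, r_bounds, az_bounds, el_bounds, rcs_mean, rcs_range, rng):
    """Hypothetical helper: n random target locations and RCS values.

    Locations are uniform over the constrained area in (range, azimuth,
    elevation); the RCS in dBsm is assumed uniform on
    [rcs_mean - rcs_range / 2, rcs_mean + rcs_range / 2].
    """
    r = rng.uniform(*r_bounds, size=n)
    az = rng.uniform(*az_bounds, size=n)
    el = rng.uniform(*el_bounds, size=n)
    rcs = rng.uniform(rcs_mean - rcs_range / 2, rcs_mean + rcs_range / 2, size=n)
    return np.column_stack([r, az, el]), rcs

rng = np.random.default_rng(0)
# Placeholder bounds: range in meters, angles in degrees, RCS in dBsm
locations, rcs = sample_targets(1000, (30000.0, 30450.0), (-10.0, 10.0),
                                (-15.0, -5.0), rcs_mean=10.0, rcs_range=4.0, rng=rng)
```

Each row of `locations` together with the matching entry of `rcs` would then define one RFView^{®} run.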

Original Location – Parameters | Values |
---|---|
Platform latitude, longitude | |
Constrained area range | |
Constrained area azimuth | |
Constrained area elevation | |

### C Mismatched Case RFView^{®} Example Scenario

For our mismatched case RFView^{®} example scenario, we again consider the airborne radar platform from Section III.B (matched case) and further examine a 1 km displacement of this platform in each cardinal direction: North, West, South, and East. These platform locations are henceforth referred to as Original Location, 1 km North, 1 km West, 1 km South, and 1 km East, respectively. Following the procedure detailed in Section III.B for the matched case, we first use RFView^{®} to generate radar return signals with different target locations and RCS combinations for the original platform location instance. Next, we use RFView^{®} to generate radar return signals with different target locations and RCS combinations for each of the four displaced platform location instances. The simulation parameters pertaining to these displaced platform locations are provided in Table III, and the platform locations are displayed in Figure 2 alongside their respective constrained areas for target placement.

1 km North – Parameters | Values |
---|---|
Platform latitude, longitude | |
Constrained area range | |
Constrained area azimuth | |
Constrained area elevation | |

1 km West – Parameters | Values |
---|---|
Platform latitude, longitude | |
Constrained area range | |
Constrained area azimuth | |
Constrained area elevation | |

1 km South – Parameters | Values |
---|---|
Platform latitude, longitude | |
Constrained area range | |
Constrained area azimuth | |
Constrained area elevation | |

1 km East – Parameters | Values |
---|---|
Platform latitude, longitude | |
Constrained area range | |
Constrained area azimuth | |
Constrained area elevation | |
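To make the 1 km displacements concrete, the sketch below converts north/east offsets in meters into latitude and longitude changes under a spherical-Earth approximation. The starting coordinates are hypothetical placeholders (the platform latitudes and longitudes in Tables II and III are not reproduced here), and RFView^{®} may use a different geodetic model.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth approximation

def displace(lat_deg, lon_deg, north_m, east_m):
    """Shift a (lat, lon) position by small north/east offsets given in meters."""
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon

# Offsets (north_m, east_m) for the four displaced platform location instances
OFFSETS = {
    "1 km North": (1000.0, 0.0),
    "1 km West": (0.0, -1000.0),
    "1 km South": (-1000.0, 0.0),
    "1 km East": (0.0, 1000.0),
}

original = (33.0, -117.5)  # hypothetical original platform (lat, lon) in degrees
displaced = {name: displace(*original, *off) for name, off in OFFSETS.items()}
```

A 1 km step corresponds to roughly 0.009° of latitude, while the longitude step grows as the cosine of the latitude shrinks.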

## IV Subspace Separation Methods

In our analysis, we make use of two subspace separation techniques: the generalized sidelobe canceller [vanveen_el88] and the NAMF test statistic [michels_performance]. These techniques are used to generate heatmap tensors in range, azimuth, and elevation angle of the constrained area(s) pertaining to each of our RFView^{®} example scenarios.

### A Heatmap Tensor Generation

We consider an -element receiver array for the radar receiver. Let be a matrix comprising independent realizations of the radar return signal (received radar array data) and let be a matrix comprising independent realizations of the clutter-plus-noise data, both of which have been matched filtered to range bin , where denotes the range (see Section III.B), and is the distance between the platform location and simulation region. Accordingly, we define and as the null and alternative hypotheses, respectively. We can derive the following signal model, where consists of independent realizations of the matched filtered target data, are unique sets of independent realizations of the matched filtered clutter data, and are unique sets of independent realizations of the matched filtered noise data:

(1) | |||

(2) |

Subsequently, we perform a whitening transformation of the matched filtered radar array data:

(3) | |||

(4) |
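The whitening step can be sketched as follows. Because equations (3) and (4) are not reproduced here, we assume that the whitening matrix is the inverse Cholesky factor of the sample covariance of the clutter-plus-noise data. Any matrix square root would serve equally well, and the Cholesky factor is simply a convenient choice. The helper names are hypothetical.

```python
import numpy as np

def sample_covariance(Y):
    """Sample covariance of the columns (snapshots) of Y, shape (N, K)."""
    return Y @ Y.conj().T / Y.shape[1]

def whiten(X, R):
    """Return L^{-1} X, where R = L L^H is the Cholesky factorization of R."""
    L = np.linalg.cholesky(R)
    return np.linalg.solve(L, X)  # solves L Z = X without forming L^{-1}

# Hypothetical usage: Y holds clutter-plus-noise snapshots, X the radar returns.
# X_w = whiten(X, sample_covariance(Y))
```

After whitening, the clutter-plus-noise snapshots used to build the covariance have an identity sample covariance.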

Now, let denote the sample covariance matrix obtained from for range bin . Let denote the array steering vector associated with coordinates in range, azimuth, and elevation. This array steering vector is provided by RFView^{®}. As such, the output power, , of the generalized sidelobe canceller for coordinates is given by:

(5) |

Similarly, the NAMF test statistic, , for coordinates is given by:

(6) |

Sweeping the steering vector across and at the default angular resolution, , and recording or at each location, we produce a heatmap image in azimuth and elevation. Stacking these images over the consecutive range bins (indexed by ) produces a three-dimensional heatmap tensor for STAP; these heatmap tensors comprise the examples of our dataset. Following the notation of Section III.B (the matched case), we have radar array data matrices altogether, which yields heatmap tensors.
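For concreteness, the sketch below implements the textbook forms of the two statistics, the MVDR/GSC output power $1/(\mathbf{s}^H\mathbf{R}^{-1}\mathbf{s})$ and the NAMF statistic $|\mathbf{s}^H\mathbf{R}^{-1}\mathbf{x}|^2/((\mathbf{s}^H\mathbf{R}^{-1}\mathbf{s})(\mathbf{x}^H\mathbf{R}^{-1}\mathbf{x}))$, and sweeps them over an azimuth-elevation grid for each range bin. It illustrates the procedure under stated assumptions and does not reproduce the exact form of equations (5) and (6). The analytic half-wavelength planar-array steering vector is a hypothetical stand-in for the steering vectors that RFView^{®} provides, each range bin is represented by a single data vector, and the caller chooses which inverse covariance to supply.

```python
import numpy as np

def steering_vector(az_deg, el_deg, n_h, n_v):
    """Hypothetical planar-array steering vector (half-wavelength spacing).

    Stand-in for the RFView-provided steering vectors; angles in degrees.
    """
    az, el = np.radians(az_deg), np.radians(el_deg)
    m = np.arange(n_h)[:, None]  # horizontal element index
    n = np.arange(n_v)[None, :]  # vertical element index
    phase = np.pi * (m * np.sin(az) * np.cos(el) + n * np.sin(el))
    return np.exp(1j * phase).ravel()

def gsc_power(s, R_inv):
    """MVDR/GSC output power 1 / (s^H R^-1 s)."""
    return 1.0 / np.real(s.conj() @ R_inv @ s)

def namf(s, x, R_inv):
    """NAMF statistic |s^H R^-1 x|^2 / ((s^H R^-1 s)(x^H R^-1 x)), in [0, 1]."""
    num = np.abs(s.conj() @ R_inv @ x) ** 2
    den = np.real(s.conj() @ R_inv @ s) * np.real(x.conj() @ R_inv @ x)
    return num / den

def heatmap_tensor(x_bins, R_inv_bins, az_grid, el_grid, n_h, n_v, stat="namf"):
    """Stack (elevation x azimuth) heatmaps over range bins.

    x_bins[k] is a data vector and R_inv_bins[k] an inverse covariance for
    range bin k; the result has shape (n_bins, len(el_grid), len(az_grid)).
    """
    out = np.empty((len(x_bins), len(el_grid), len(az_grid)))
    for k, (x, R_inv) in enumerate(zip(x_bins, R_inv_bins)):
        for i, el in enumerate(el_grid):
            for j, az in enumerate(az_grid):
                s = steering_vector(az, el, n_h, n_v)
                out[k, i, j] = namf(s, x, R_inv) if stat == "namf" else gsc_power(s, R_inv)
    return out
```

Refining the grid (a smaller angular step in `az_grid` and `el_grid`) enlarges each heatmap at the cost of more steering-vector evaluations per range bin.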

The matched filtering and whitening transformation procedures described above substantially increase the SCNR. This improved ratio is defined as the output SCNR, which is computed separately for each range bin of a given radar array data matrix. For our signal model, the output SCNR is defined as:

(7) | |||

(8) |

Following the notation of Section III.B, for each of our radar array data matrices, a single target is present in one of the range bins. As such, we solely consider the output SCNR for this range bin. Computing the mean of the resulting output SCNR values, we obtain the Mean Output SCNR of our dataset. Our empirical results are conveyed through this measure.

As described in Section III, the constrained areas pertaining to our RFView^{®} example scenarios are defined by range bounds ( and ). By altering the chip size, , the value of (the number of range bins comprising each constrained area) can also be changed. Accordingly, the value of can be increased to achieve greater resolution, which increases the depth of the heatmap tensor. Per this observation, we refer to as the Depth Parameter for the remainder of this work. Similarly, we can designate the common factor by which the default grid resolution is reduced. This factor, , is defined as the Resolution Divisor, whereby the updated grid resolution is equivalent to: .
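The relation between the constrained area bounds, the grid resolution, and the heatmap tensor dimensions can be sketched as follows. The helper is hypothetical and assumes that the number of cells along each axis is the span divided by the resolution, and that the resolution divisor divides every component of the default grid resolution. The bounds and resolutions in the example are placeholders.

```python
def grid_shape(r_bounds, az_bounds, el_bounds, default_res, divisor=1):
    """Hypothetical helper: (range, azimuth, elevation) cell counts of a heatmap tensor.

    default_res = (range, azimuth, elevation) spacing of the default grid;
    every component is divided by the resolution divisor.
    """
    res = [step / divisor for step in default_res]
    spans = [hi - lo for lo, hi in (r_bounds, az_bounds, el_bounds)]
    return tuple(int(round(span / step)) for span, step in zip(spans, res))

# Placeholder bounds: range in meters, angles in degrees
for d in (1, 2, 3, 4):
    print(d, grid_shape((0.0, 450.0), (-10.0, 10.0), (-15.0, -5.0), (15.0, 1.0, 1.0), d))
```

Under these assumptions, a divisor of d multiplies the cell count along each axis by d, so the tensor grows by a factor of d³.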

An example of this three-dimensional heatmap tensor is provided in Figure 3 for the radar and site parameters specified in Section III.B. The heatmap tensor is generated using the NAMF test statistic, and the default grid resolution is defined as . The constrained area stretches over range bins, where , such that the resulting heatmap tensor has dimensions . The target is present at . Encoding this target location relative to the top-left corner of the heatmap tensor (see Section VI.B) gives us the grid location: .

We record the ground truth target location for each heatmap tensor example using the standard Cartesian coordinate system, with the platform at the origin, the Northward-pointing line as the x-axis, and the upward-pointing line as the z-axis. Our final dataset comprises the heatmap tensors as the features and the coordinate-encoded true target locations (see Section VI.B) as the labels, which are used by our CNN framework.

### B Mean Absolute Error Metric

Following the heatmap tensor generation process outlined in Section IV.A, we must define a metric to quantify the localization accuracy of the NAMF test statistic, the generalized sidelobe canceller, and our augmented CNN frameworks. Although our heatmap tensors are defined in spherical coordinates, we transform the ground truth target locations into Cartesian coordinates to define the localization accuracy. Moreover, to report the localization error in meters, we plot the Mean Absolute Error (average Euclidean distance) between the predicted and ground truth target locations. We now let represent the ground truth target location for example from our dataset and let be the predicted target location for this example (output by the CNN model). Note that the heatmap tensors input to the CNN come from separate datasets for the NAMF test statistic and the generalized sidelobe canceller. The localization error, , over the test examples, is defined as:

(9) |
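The following is a minimal sketch of the localization error in (9), under the assumption that it is the mean Euclidean distance between Cartesian points, as described above. The spherical-to-Cartesian conversion uses the platform-centered frame of Section IV.A (x-axis North, z-axis up). It also assumes that azimuth is measured from the x-axis toward the y-axis and that elevation is measured from the horizontal plane, because the paper's angle conventions are not stated in this section.

```python
import numpy as np

def spherical_to_cartesian(r, az_deg, el_deg):
    """Platform-centered Cartesian point(s): x North, z up (angle conventions assumed)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.stack([r * np.cos(el) * np.cos(az),
                     r * np.cos(el) * np.sin(az),
                     r * np.sin(el)], axis=-1)

def localization_error(p_true, p_pred):
    """Mean Euclidean distance in meters between rows of two (M, 3) arrays."""
    diff = np.asarray(p_true, float) - np.asarray(p_pred, float)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))
```

Both functions accept arrays, so a whole test set of (range, azimuth, elevation) triples can be converted and scored in one call.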

We compare this error in target localization from our regression network with the error from a more traditional approach of using the cell with the peak NAMF test statistic or GSC output power as the predicted target location. Let be the center of the grid cell that contains the peak NAMF test statistic for example from the particular dataset. Let be the center of the grid cell that contains the peak GSC output power for example from the particular dataset. Over the test examples, we can compute the errors, and , in using this traditional method as:

(10) | |||

(11) |
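The peak-picking baseline of (10) and (11) can be sketched as follows. The sketch assumes the following:

- Each heatmap tensor is stored with axes ordered (range, elevation, azimuth).
- The grid starts at the given lower bounds.
- Cell k along an axis spans [lower + k·res, lower + (k+1)·res).
- For the Cartesian conversion, the x-axis points North and the z-axis up, azimuth is measured from the x-axis toward the y-axis, and elevation is measured from the horizontal plane.

The helper names and conventions are illustrative only.

```python
import numpy as np

def peak_cell_center(tensor, lower, res):
    """Center (r, az, el) of the grid cell holding the tensor's maximum.

    tensor: array with axes (range, elevation, azimuth)
    lower:  (r_min, az_min, el_min) lower grid bounds
    res:    (dr, daz, del) grid resolution
    """
    k, i, j = np.unravel_index(np.argmax(tensor), tensor.shape)
    return (lower[0] + (k + 0.5) * res[0],
            lower[1] + (j + 0.5) * res[1],
            lower[2] + (i + 0.5) * res[2])

def to_cartesian(r, az_deg, el_deg):
    """x North, z up; azimuth from x toward y, elevation from horizontal (assumed)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([r * np.cos(el) * np.cos(az),
                     r * np.cos(el) * np.sin(az),
                     r * np.sin(el)])

def peak_picking_error(tensors, truths_xyz, lower, res):
    """Mean Euclidean error (m) of using the peak cell as the location estimate."""
    preds = np.array([to_cartesian(*peak_cell_center(t, lower, res)) for t in tensors])
    return float(np.mean(np.linalg.norm(preds - np.asarray(truths_xyz, float), axis=1)))
```

The same routine applies to NAMF and GSC heatmaps; only the input tensors differ.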

### C Normalized Output SCNR

A central concern with ‘black-box’ regression models such as the CNN is that their predictive performance on data that differ from the training data is difficult to predict or guarantee. For our matched case and mismatched case RFView^{®} example scenarios, it is therefore important to gauge how well our CNN framework generalizes to perturbations; our mismatched case exists to answer this very question. As a preliminary step, we can use existing methods from classical radar STAP theory to predict how well our CNN framework localizes targets in the displaced platform location instances when trained to localize targets in the original platform location instance.

The metric we consult in this analysis is the normalized output SCNR, first introduced by Reed, Mallett, and Brennan in 1974 [reed_rapid]. Originally derived as a means to characterize the convergence rate of adaptive arrays, this metric, defined as the ratio between the adaptive SCNR and the clairvoyant SCNR, is a function of the sample and true clutter-plus-noise covariance matrices, and , respectively. Extending this metric to our signal model, we obtain the following, for :

(12) | |||

(13) | |||

(14) |

Through the normalized output SCNR, we aim to characterize the ‘similarity’ between the platform location instances in our matched and mismatched case RFView example scenarios. We define as the matched filtered radar array data matrix from the original platform location instance, and as the clutter-plus-noise covariance matrix of the original platform location instance. We observe that in the matched case, and that is the clutter-plus-noise covariance matrix of a displaced platform location instance in the mismatched case. For the matched case, we observe that for each of the radar array data matrices, where a single target is present in one of the range bins. As such, we solely consider the normalized output SCNR for this range bin. Thus, the expectation of the resulting normalized output SCNR values gives us a Mean Normalized Output SCNR of . For the mismatched case, we similarly compute the mean of the resulting normalized output SCNR values to obtain .
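As a reference point, the sketch below computes the classical Reed-Mallett-Brennan form of the normalized output SCNR, $\rho = (\mathbf{s}^H\hat{\mathbf{R}}^{-1}\mathbf{s})^2 / ((\mathbf{s}^H\hat{\mathbf{R}}^{-1}\mathbf{R}\hat{\mathbf{R}}^{-1}\mathbf{s})(\mathbf{s}^H\mathbf{R}^{-1}\mathbf{s}))$. By the Cauchy-Schwarz inequality, this ratio never exceeds one. Equations (12) to (14) adapt the ratio to our whitened signal model and may differ in detail. For $K$ i.i.d. complex Gaussian training snapshots of dimension $N$, the mean of $\rho$ is $(K+2-N)/(K+1)$, and the Monte Carlo loop reproduces this value. In the spirit of the mismatched case, $\hat{\mathbf{R}}$ could instead be estimated from one platform location and $\mathbf{R}$ taken from another. The covariance and steering vector in the example are arbitrary placeholders.

```python
import numpy as np

def normalized_scnr(s, R_hat, R):
    """RMB ratio of adaptive SCNR (weight R_hat^-1 s) to clairvoyant SCNR."""
    w = np.linalg.solve(R_hat, s)                        # adaptive (SMI) weight
    adaptive = np.abs(w.conj() @ s) ** 2 / np.real(w.conj() @ R @ w)
    clairvoyant = np.real(s.conj() @ np.linalg.solve(R, s))
    return adaptive / clairvoyant

def complex_gaussian(R, K, rng):
    """K i.i.d. CN(0, R) snapshots as the columns of an (N, K) array."""
    N = R.shape[0]
    Z = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    return np.linalg.cholesky(R) @ Z

rng = np.random.default_rng(0)
N, K = 4, 20
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = A @ A.conj().T + N * np.eye(N)           # placeholder true covariance
s = np.exp(1j * np.pi * 0.3 * np.arange(N))  # placeholder steering vector

rhos = []
for _ in range(2000):
    Y = complex_gaussian(R, K, rng)
    rhos.append(normalized_scnr(s, Y @ Y.conj().T / K, R))
print(np.mean(rhos), (K + 2 - N) / (K + 1))  # Monte Carlo mean vs. RMB prediction
```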

## V Threshold SCNR Analysis

Over the past decade, advances in random matrix theory (RMT) have shown that, in the asymptotic regime, subspace separation methods experience a severe degradation in performance below a threshold signal-to-noise ratio (SNR), referred to as the ‘phase transition threshold’, with high probability [Benaych_eigenvalues, Benaych_singular, Nadakuditi_correlation, Nadler_PCA]. The asymptotic regime refers to the joint limit in which the radar array data matrix grows infinitely large in both dimensions. In a passive radar context, Gogineni et al. demonstrated in 2018 that, in the finite sample regime, the detection performance of the generalized likelihood ratio test (GLRT) exhibits this severe degradation below the threshold SNR, in agreement with the asymptotic performance guarantees predicted by random matrix theory [Gogineni_Passive]. More precisely, below the threshold SNR, the dominant sample eigenvalue and eigenvector do not accurately represent their true counterparts. For the parameter choice made in that analysis, the threshold SNR (phase transition threshold) is derived as:

(15) |

The authors further demonstrate that the Kolmogorov-Smirnov (KS) test statistic, which measures the accuracy of the Gaussian perturbation model for the given parameters, degrades severely below this threshold SNR.
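The phase transition can be illustrated with the classical rank-one spiked covariance model. This is a standard random matrix theory example and not the GLRT setting behind (15). With population covariance $\mathbf{I} + \theta\mathbf{u}\mathbf{u}^T$ (so $\theta$ plays the role of the SNR) and aspect ratio $c = p/n$, the squared overlap between the leading sample eigenvector and $\mathbf{u}$ tends to $(1 - c/\theta^2)/(1 + c/\theta)$ when $\theta > \sqrt{c}$, and to zero otherwise. The sketch compares a single simulation against this limit. The dimensions and spike strengths are arbitrary.

```python
import numpy as np

def top_eigvec_overlap(p, n, spike, rng):
    """Squared overlap between the leading sample eigenvector and the spike
    direction u, for n real Gaussian snapshots with covariance I + spike*u u^T."""
    u = np.zeros(p)
    u[0] = 1.0
    X = rng.standard_normal((p, n)) + np.sqrt(spike) * np.outer(u, rng.standard_normal(n))
    S = X @ X.T / n                      # sample covariance
    _, V = np.linalg.eigh(S)             # eigenvalues in ascending order
    return float(V[:, -1] @ u) ** 2

def predicted_overlap(spike, c):
    """Large-(p, n) limit of the overlap with c = p / n; zero below sqrt(c)."""
    if spike <= np.sqrt(c):
        return 0.0
    return (1 - c / spike**2) / (1 + c / spike)

rng = np.random.default_rng(0)
p, n = 200, 400
for spike in (0.2, 3.0):  # below and above the threshold sqrt(0.5) ~ 0.707
    print(spike, top_eigvec_overlap(p, n, spike, rng), predicted_overlap(spike, p / n))
```

Below the threshold, the leading sample eigenvector carries essentially no information about u, which mirrors the breakdown described above.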

Extending this analysis to our problem of radar target localization, we can derive two sets of SCNR thresholds below which the performance of the NAMF test statistic is expected to deteriorate: (1) for fixed clutter-to-noise ratio (CNR) and variable mean target RCS, and (2) for fixed mean target RCS and variable CNR. These thresholds are derived for the matched case RFView^{®} example scenario from Section III.B, with default grid resolution . In the first case, we fix the CNR to dB and vary the mean target RCS, , from dBsm to dBsm. In the second case, we fix to dBsm and vary the CNR from dB to dB. In both cases, the mean output SCNR varies from dB to dB. Regarding data preliminaries (see Section III.B), we fix the target RCS range to dBsm, and generate two datasets for each of the combinations discussed above, one with and , yielding threshold SCNR ( dB), and another with and , yielding threshold SCNR ( dB). For both cases, we compute the localization error, (see Section IV.B), in meters using examples for each dataset, across all CNR and mean target RCS combinations. These results are summarized in Figures 4 and 5.

| Data Matrix Parameters | Grid Resolution | NAMF Test Statistic Elapsed Time^{1} |
|---|---|---|
| | | s |
| | | s |

^{1} Each heatmap tensor generation time was measured using an NVIDIA GeForce RTX 3090 GPU and averaged across all examples in the respective dataset.

Figures 4 and 5 show that the localization performance of the NAMF test statistic degrades rapidly below the threshold SCNR region for both data matrix parameter choices, with the latter depicting a closer match with the derived threshold SCNR. Both cases agree with the asymptotic performance guarantees of random matrix theory in the finite sample regime. We note that for , the heatmap tensor generation time is seven times that of the case (see Table IV). Given this trade-off, we adopt the data matrix parameters .

## VI Regression CNN Framework

To augment the subspace separation methods outlined in Section IV.A, we design a regression CNN framework to estimate the position of a single target in the presence of clutter and noise using heatmap tensors from either the NAMF test statistic or the generalized sidelobe canceller. For the matched case (Section III.B), we consider four evaluations (Section VII), and for the mismatched case (Section III.C), we consider two evaluations (Section VIII). For each step in these evaluations, we produce two datasets consisting of heatmap tensors: one for the NAMF test statistic and another for the generalized sidelobe canceller. These datasets pertain to the original platform location instance and are split such that of the dataset is used for training and the remaining is used for testing, where and . Across these evaluations, we build four separate regression networks using a CNN architecture to learn the location of each target from its heatmap tensor. The CNN architectures proposed below extend the baseline architecture outlined in our 2022 IEEE Radar Conference paper (see [Shyam_STAP]).

### A Proposed Networks for Target Localization

#### 1 Baseline CNN – Default Angular Resolution

The structure of our baseline regression CNN is shown in Figure 6. This network is used for evaluations where the default angular resolution, , is employed with depth parameter . Each training example is of size and can be visualized as an array of heatmaps (one for each range bin) of size (see Figure 3). Each input example first passes through a convolutional layer with kernel size and stride 1, yielding 32 feature maps, to which we apply ReLU activation and batch normalization. We apply max pooling to the output with kernel size . Subsequently, we pass this output through another convolutional layer with kernel size and stride 1, yielding 64 feature maps, to which we again apply ReLU activation and batch normalization. After applying max pooling to this output with kernel size , we flatten it and pass it through two fully connected layers to obtain the predicted location of our target in coordinates.

#### 2 Bisected CNN – Resolution Divisor

The architecture of our bisected regression CNN is shown in Figure 7. This network is used for evaluations where the default grid resolution is employed with resolution divisor , whereby . Each training example is of size and can be visualized as a set of heatmaps (one for each range bin) of size .

#### 3 Trisected CNN – Resolution Divisor

The architecture of our trisected regression CNN is shown in Figure 8. This network is used for evaluations where the default grid resolution is employed with resolution divisor , such that . Each training example is of size and can be visualized as a set of heatmaps (one per range bin) of size .

#### 4 Quadrisected CNN – Resolution Divisor

The architecture of our quadrisected regression CNN is shown in Figure 9. This network is used for evaluations where the default grid resolution is employed with resolution divisor , such that . Each training example is of size and can be visualized as a set of heatmaps (one per range bin) of size .
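Because the heatmap dimensions grow with the resolution divisor, the length of the vector entering the first fully connected layer changes as well, which is consistent with building a separate network for each divisor. The sketch below propagates spatial sizes through the two convolution and max-pooling stages of the baseline design. Because the kernel sizes are not reproduced here, the following are assumptions:

- 3 × 3 convolution kernels and 2 × 2 pooling windows.
- No padding and floor rounding.
- Range bins treated as input channels.

ReLU and batch normalization leave shapes unchanged.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(height, width, conv_k=3, pool_k=2, channels=(32, 64)):
    """Length of the vector passed to the first fully connected layer.

    Each stage is a stride-1 convolution (no padding) followed by ReLU, batch
    normalization (both shape-preserving), and non-overlapping max pooling.
    """
    for _ in channels:
        height, width = conv_out(height, conv_k), conv_out(width, conv_k)
        height, width = conv_out(height, pool_k, pool_k), conv_out(width, pool_k, pool_k)
    return channels[-1] * height * width

# Placeholder 20 x 20 default heatmap; divisor d scales each angular axis by d
for d in (1, 2, 3, 4):
    print(d, flattened_features(20 * d, 20 * d))
```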

### B Coordinate Encoding Scheme

Previously, we noted in Section IV.A that the ground truth target location for each heatmap tensor example is recorded using the standard Cartesian coordinate system, with the platform at the origin, the Northward-pointing line as the x-axis, and the upward-pointing line as the z-axis. While this representation works well in the matched case, it does not work well in the mismatched case, where the regression network is trained on the original platform location instance but applied to the displaced platform location instances. In particular, as a comparison of Tables II and III shows, the range and elevation bounds of the displaced locations’ constrained areas differ considerably from those of the original location’s constrained area. Consequently, if the regression CNN trained on the original platform location instance were applied to the displaced platform location instances, the predicted target locations would be highly inaccurate, because our choice of coordinate system ties the labels to the original constrained area bounds. To address this, we introduce a novel coordinate encoding scheme that enables our trained model to generalize to new platform location instances whose heatmap tensors (constrained areas) have different bounds. We consider the instances shown in Figure 10.

We first note the following prior knowledge regarding the instances at hand. For the original platform location instance, we are provided with the default grid resolution, , and the original heatmap tensor bounds:

Now, for the new platform location instance, we are solely provided with the new heatmap tensor bounds:

As we are provided with the new heatmap tensor bounds of the new platform location instance, we know the exact location of the ‘top-left corner’ of the new heatmap tensor, which corresponds to the coordinates: . Similarly, we also know the exact location of the ‘top-left corner’ of the original heatmap tensor, which corresponds to the coordinates: . We can now partition the constrained area of the new platform location instance to ensure the new and original heatmap tensors have identical dimensions: . The new heatmap tensor accordingly has grid resolution , where:

(16)

(17)

(18)
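As a rough sketch of this partitioning step, assume that each new grid resolution is the extent of the new heatmap tensor bounds along an axis divided by the number of cells the original tensor has along that same axis. This rule, the function name `new_grid_resolution`, the per-axis `(low, high)` bound tuples, and the example numbers are all hypothetical illustrations rather than the exact form of the equations above.

```python
def new_grid_resolution(orig_bounds, orig_resolution, new_bounds):
    """Per-axis cell sizes that give the new heatmap tensor the same number of
    cells per axis as the original one (assumed partitioning rule).

    orig_bounds, new_bounds: sequences of (low, high) pairs, one per axis
                             (e.g. range, azimuth, elevation).
    orig_resolution:         original cell size along each axis.
    """
    resolution = []
    for (lo, hi), step, (new_lo, new_hi) in zip(orig_bounds, orig_resolution, new_bounds):
        n_cells = round((hi - lo) / step)               # cells along this axis originally
        resolution.append((new_hi - new_lo) / n_cells)  # same count over the new extent
    return tuple(resolution)


# Hypothetical bounds: range in metres, azimuth and elevation in degrees.
original = [(10000.0, 20000.0), (-30.0, 30.0), (-10.0, 0.0)]
displaced = [(12000.0, 23000.0), (-30.0, 30.0), (-12.0, 0.0)]
print(new_grid_resolution(original, (100.0, 1.0, 0.5), displaced))  # (110.0, 1.0, 0.6)
```

Keeping the cell count fixed (rather than the cell size) is what lets a network trained on the original tensor shape accept the new tensor without architectural changes.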

Now that we have established an equivalence between our new and original heatmap tensors, we encode the ground truth target locations of the original platform location instance’s heatmap tensors using the top-left heatmap tensor corner as the reference point. Afterward, our regression CNN is trained using the heatmap tensors as the features and the grid-indexed ground truth target locations as the labels. More explicitly, consider heatmap tensor example from the original platform location instance dataset, which has the ground truth target location: . Using our coordinate encoding scheme, we can supply the following grid-indexed target location to our regression CNN:

(19)

When our regression CNN is thereafter applied to the new platform location instance, it will predict the grid-indexed target location for new heatmap tensor example from the new platform location instance dataset, using the top-left heatmap tensor corner as a reference. We can convert this grid-indexed target location, , to spherical coordinates via the transformation:

(20)

(21)

(22)

Finally, we can convert this predicted target location from spherical coordinates to the standard Cartesian coordinate system, such that . We compare this prediction with the ground truth target location, , for example from the new platform location instance dataset (see Section IV.B).
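The encoding and decoding steps can be sketched as a round trip. This is a minimal illustration under assumed conventions, not a transcription of equations (19) to (22): the top-left corner is taken to lie at minimum range, minimum azimuth, and maximum elevation; grid indices increase with range, with azimuth (left to right), and downward in elevation; indices are left continuous because the CNN regresses them; and azimuth is measured in the horizontal plane from the north-pointing x-axis toward +y, with elevation measured from the horizontal. All function names and numbers are hypothetical.

```python
import math


def encode(target, corner, resolution):
    """Grid-indexed label of `target` relative to the top-left tensor corner.

    target, corner: (range_m, azimuth_deg, elevation_deg)
    resolution:     (range_step_m, azimuth_step_deg, elevation_step_deg)
    Indices grow with range, with azimuth (left to right), and downward in
    elevation (top to bottom); they are left continuous for regression.
    """
    r, az, el = target
    r0, az0, el0 = corner
    d_r, d_az, d_el = resolution
    return ((r - r0) / d_r, (az - az0) / d_az, (el0 - el) / d_el)


def decode(index, corner, resolution):
    """Inverse of `encode`: grid index -> (range_m, azimuth_deg, elevation_deg)."""
    i, j, k = index
    r0, az0, el0 = corner
    d_r, d_az, d_el = resolution
    return (r0 + i * d_r, az0 + j * d_az, el0 - k * d_el)


def spherical_to_cartesian(r, az_deg, el_deg):
    """Platform at the origin, x toward North, z up (assumed angle conventions:
    azimuth from +x toward +y in the horizontal plane, elevation from horizontal)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (r * math.cos(el) * math.cos(az),
            r * math.cos(el) * math.sin(az),
            r * math.sin(el))


# Hypothetical example: a label built on the original instance's grid ...
label = encode((15000.0, 5.0, -3.0), corner=(10000.0, -30.0, 0.0),
               resolution=(100.0, 1.0, 0.5))
# ... and a prediction of that same index decoded on a new instance's grid.
r, az, el = decode(label, corner=(12000.0, -30.0, 0.0), resolution=(110.0, 1.0, 0.6))
xyz = spherical_to_cartesian(r, az, el)
```

The key design point is that the network only ever sees and predicts indices; the instance-specific corner and resolution are applied outside the network, at label-construction and decoding time.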

## VII Matched Case Empirical Results

As outlined in Section VI, in the matched case, we consider four evaluations of our regression CNN framework to estimate the position of a single target in the presence of clutter and noise using heatmap tensors from either the NAMF test statistic or the generalized sidelobe canceller.

### A Evaluating Regression CNN Framework for Variable SCNR

For our initial evaluation, we gauge the localization accuracy of our CNN framework over a range of SCNR values. As noted in Section V, the SCNR can be changed either by fixing the mean target RCS, , and varying the CNR, or by fixing the CNR and varying the mean target RCS. We again consider both cases: in the first case, we fix the CNR to dB and vary from dBsm to dBsm, whereas in the second case, we fix to dBsm and vary the CNR from dB to dB. The mean output SCNR varies from dB to dB in both cases. We further fix the target RCS range to dBsm and define , with default grid resolution . Thereafter, we produce two datasets consisting of heatmap tensors: one for the NAMF test statistic and another for the generalized sidelobe canceller. These datasets pertain to the original platform location instance and are split such that of each dataset is used for training and the remaining is used for testing ( and ).

Subsequently, we consider the baseline CNN from Section VI.A.1 and train this network until convergence using the Adam optimizer [kingma_14] with learning rate hyperparameter (tuned via experimentation). Using the localization accuracy metric described in Section IV.B, we compare the target localization error of our baseline CNN with that of a more traditional scheme, which uses the midpoint of the cell with the peak NAMF test statistic or GSC output power as the predicted target location (see Figures 11, 12, 13, and 14). The results indicate that our baseline CNN achieves a significantly lower error in predicting the true target location than the traditional scheme, culminating in a 12-fold improvement at an SCNR of dB for all four datasets. Furthermore, in the low-SCNR regime, the fixed-CNR cases (Figures 11 and 12) show a 4-fold improvement over the traditional scheme, and the fixed mean target RCS cases (Figures 13 and 14) show a 5-fold improvement.
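A minimal sketch of the traditional baseline and of the fold-improvement arithmetic follows. It assumes each heatmap tensor is stored as nested lists indexed [range bin][azimuth bin][elevation bin], with bin 0 at the lower edge of each axis, and it assumes localization errors are summarized by their mean, which may differ from the exact metric of Section IV.B. The function names and example values are hypothetical.

```python
def peak_cell_midpoint(heatmap, lows, resolution):
    """Traditional baseline: midpoint of the cell with the largest statistic.

    heatmap:    nested lists [range_bin][azimuth_bin][elevation_bin] holding
                NAMF test statistics or GSC output powers.
    lows:       coordinate of the lower edge of bin 0 on each axis.
    resolution: cell size on each axis.
    """
    best_value, best_index = None, None
    for i, plane in enumerate(heatmap):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if best_value is None or value > best_value:
                    best_value, best_index = value, (i, j, k)
    # The midpoint of bin n on an axis lies half a cell above its lower edge.
    return tuple(lo + (n + 0.5) * d for lo, n, d in zip(lows, best_index, resolution))


def fold_improvement(baseline_errors, cnn_errors):
    """Ratio of mean baseline error to mean CNN error (12.0 means 12-fold)."""
    baseline_mean = sum(baseline_errors) / len(baseline_errors)
    cnn_mean = sum(cnn_errors) / len(cnn_errors)
    return baseline_mean / cnn_mean


heatmap = [[[0.1, 0.2], [0.3, 0.0]],
           [[0.4, 0.9], [0.5, 0.6]]]
print(peak_cell_midpoint(heatmap, lows=(1000.0, -10.0, -5.0),
                         resolution=(100.0, 2.0, 1.0)))  # (1150.0, -9.0, -3.5)
```

Because the baseline can only ever output cell midpoints, its error is bounded below by the grid quantization, whereas a regression network can output locations anywhere inside a cell.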

### B Evaluating Regression CNN Framework for Variable Dataset Size

In the computer vision literature, the training dataset size, , is regarded as a critical factor in benchmarking CNN performance [zhang2017understanding]. Extending this notion, our second evaluation investigates the effects of varying the size of the dataset used to train our CNN framework. We fix the CNR to dB, the mean target RCS to dBsm, the target RCS range to dBsm, and define , with default grid resolution . Next, we produce two datasets consisting of heatmap tensors (one for the NAMF test statistic and another for the generalized sidelobe canceller), with ranging from to in increments of . These datasets pertain to the original platform location and are split such that of each dataset is used for training and the remaining is used for testing ( and ).

We again consider the baseline CNN from Section VI.A.1 and train this network until convergence using the Adam optimizer with . Using the localization accuracy metric from Section IV.B, we compare the target localization error of our baseline CNN with that of the traditional method described above, which uses the midpoint of the cell with the peak NAMF test statistic or GSC output power. The results of this analysis are depicted in Figures 15 and 16.

The plots show that our baseline CNN achieves a far lower error in predicting the true target location than the traditional scheme, as indicated by the 7-fold improvement (NAMF test statistic) and 4-fold improvement (generalized sidelobe canceller) when the dataset size is limited to examples. We also observe that as the dataset size increases, the error of our baseline CNN decreases and appears to approach an asymptote. As such, increasing beyond examples would likely yield diminishing returns.

### C Evaluating Regression CNN Framework for Variable Chip Size

Building on our first evaluation, we note that there are several additional ways to improve the localization accuracy of the traditional scheme. Among them, we can reduce the chip size, , which increases the range resolution. This in turn increases the number of range bins, and hence the depth, , of each heatmap tensor (see Section IV.A). Since the angular resolution remains unchanged, the azimuth and elevation dimensions of each heatmap tensor are constant. Thus, we can use the baseline CNN from Section VI.A.1 to evaluate the effects of decreasing the chip size. Regarding data preliminaries, we fix the CNR to dB, the mean target RCS to dBsm, the target RCS range to dBsm, and let . We decrease the chip size by changing the depth parameter from to in increments of . Subsequently, we produce two datasets consisting of heatmap tensors (one for the NAMF test statistic and another for the generalized sidelobe canceller). These datasets pertain to the original platform location and are split such that of each dataset is used for training and the remaining is used for testing, where and .
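The inverse relation between chip size and tensor depth described above can be sketched under a simplifying assumption of ours (not necessarily the exact definition in Section IV.A): `depth` range bins uniformly tile a fixed range extent. The helper names and numbers are hypothetical. The second helper shows why finer chips enlarge each heatmap tensor: the number of cells grows linearly with depth while the azimuth and elevation dimensions stay fixed.

```python
def chip_size(range_extent_m, depth):
    """Width of each range bin when `depth` bins uniformly tile the extent (assumed)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return range_extent_m / depth


def tensor_cells(depth, n_azimuth, n_elevation):
    """Total heatmap cells: one azimuth-by-elevation map per range bin."""
    return depth * n_azimuth * n_elevation


# Hypothetical 1 km extent: doubling the depth halves the chip size
# and doubles the number of cells that must be computed.
print(chip_size(1000.0, 20), chip_size(1000.0, 40))  # 50.0 25.0
```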

We train the baseline CNN until convergence using the Adam optimizer with . Using the localization accuracy metric described in Section IV.B, we compare the error in target localization using our baseline CNN with the aforementioned traditional approach. The results of this analysis are depicted in Figures 17 and 18.

Table V: Heatmap tensor generation time.

NAMF Test Statistic

| Data Matrix Parameters | Grid Resolution | Elapsed Time (s) |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
|  |  |  |

Generalized Sidelobe Canceller (GSC)

| Data Matrix Parameters | Grid Resolution | Elapsed Time (s) |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
|  |  |  |

We observe that our baseline CNN markedly outperforms the traditional method across all chip sizes and achieves the lowest error for m. However, as shown in Table V for both the NAMF test statistic and GSC cases, the heatmap tensor generation time for chip size m is more than four times the generation time for chip size m, a clear trade-off between localization accuracy and computational speed. We also note that the heatmap tensor generation time using the generalized sidelobe canceller is consistently shorter than that using the NAMF test statistic. Therefore, for evaluations outside the low-SCNR regime, the generalized sidelobe canceller may be preferred due to its comparable localization accuracy and faster computation.

### D Evaluating Regression CNN Framework for Variable Grid Resolution

In Section VII.C, we noted that reducing the chip size, , improves the localization accuracy of the traditional scheme. As an extension of this evaluation, we can also refine the grid resolution, , itself by varying the resolution divisor, . Correspondingly, we use the baseline CNN from Section VI.A.1 for , the bisected CNN from Section VI.A.2 for , the trisected CNN from Section VI.A.3 for , and the quadrisected CNN from Section VI.A.4 for .

Regarding data preliminaries, we fix the CNR to dB, the mean target RCS to dBsm, the target RCS range to dBsm, and let . We change the grid resolution by increasing the resolution divisor from to . Subsequently, we produce two datasets consisting of heatmap tensors—one for the NAMF test statistic and another for the generalized sidelobe canceller. These datasets pertain to the original platform location, and are further split such that of each dataset is used for training and the remaining is used for testing ( and ).

Table VI: Heatmap tensor generation time.

NAMF Test Statistic

| Data Matrix Parameters | Grid Resolution | Elapsed Time (s) |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
|  |  |  |

Generalized Sidelobe Canceller (GSC)

| Data Matrix Parameters | Grid Resolution | Elapsed Time (s) |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
|  |  |  |

We train the aforementioned CNNs until convergence using the Adam optimizer with . Using the localization accuracy metric described in Section IV.B, we compare the target localization error of our CNNs with that of the traditional approach. The results of this analysis are depicted in Figures 19 and 20.

We observe that each of our CNNs greatly outperforms the traditional method across all tested grid resolutions, with the quadrisected CNN in particular achieving a 25-fold improvement for , where the resolution divisor . However, as shown in Table VI for both the NAMF test statistic and GSC cases, the heatmap tensor generation time for is more than 20 times the generation time for resolution divisor , a substantial trade-off between localization accuracy and computational speed. As in Section VII.C, the heatmap tensor generation time using the generalized sidelobe canceller is also much shorter than that using the NAMF test statistic. Thus, for evaluations outside the low-SCNR regime, the GSC may be preferred due to its similar localization accuracy and much faster computation compared with the NAMF test statistic.

## VIII Mismatched Case Empirical Results

As outlined in Section VI, we consider two evaluations of our regression CNN framework in the mismatched case to estimate the position of a single target in the presence of clutter and noise using heatmap tensors from either the NAMF test statistic or the GSC. We now gauge how well our framework can generalize to these new scenarios.

### A Evaluating Regression CNN Framework for Displaced Platform Location Instances

While we have demonstrated that our regression CNN framework achieves substantial gains over the traditional method in a matched setting (where our CNN is trained and tested on the same platform location instance), a central question is whether it also improves on the traditional method in a mismatched setting (where our CNN is trained and tested on different platform location instances). This analysis is necessary because the traditional method of using the midpoint of the cell with the peak NAMF test statistic or GSC output power is largely invariant across displaced platform location instances, whereas a trained CNN need not be.

For our analysis, we consider the scenario outlined in Section III.B. We fix the CNR to dB, the mean target RCS to dBsm, the target RCS range to dBsm, and define . We consider the radar and site parameters provided in Table III, with default grid resolution . For the original platform location instance, we produce two datasets consisting of heatmap tensors (one for the NAMF test statistic and another for the generalized sidelobe canceller) and split them such that of each dataset is used for training and the remaining is used for testing ( and ). Next, for each displaced platform location instance, we produce two datasets consisting of heatmap tensors (one for the NAMF test statistic and another for the GSC) that are used only for testing. Altogether, we have one training dataset and five test datasets for each of our subspace separation techniques, four of which pertain to the displaced platform location instances. The fifth test dataset reconsiders the matched case, in which our regression CNN framework is trained and tested on the original platform location instance.

Thereafter, we consider the baseline CNN from Section VI.A.1 and train this network on the original platform location instance until convergence using the Adam optimizer with . We then apply this network to the matched case and to each displaced platform location instance, and compare the resulting target localization accuracies with those of the traditional approach, using the metric described in Section IV.B. For the displaced platform location instances, we use the coordinate encoding scheme described in Section VI.B, since the heatmap tensor bounds of the original and displaced platform location instances differ (see Table III).

Before presenting the results of our analysis, we recall the Normalized Output SCNR metric from Section IV.C. As outlined previously, this metric can be used to characterize the ‘similarity’ between the platform location instances in our mismatched case RFView^{®} example scenario. By doing so, we can gauge in advance how well our baseline CNN framework, trained on the original platform location instance, can localize targets in the displaced platform location instances. We consider the examples of the radar array data matrix, , used to generate the training dataset of the original platform location instance, and let denote the clutter-plus-noise covariance matrix of the (true) original platform location instance. Furthermore, we let denote the clutter-plus-noise covariance matrix of each (sample) displaced platform location instance. Following the procedure detailed in Section IV.C, we obtain the Mean Normalized Output SCNR values summarized in Table VII:

Table VII: Mean Normalized Output SCNR.

| Sample Covariance | True Covariance | Mean Normalized Output SCNR |
|---|---|---|
| Original Location | Original Location | 1.0000 |
| 1 km North (N) | Original Location | 0.6919 |
| 1 km West (W) | Original Location | 0.6852 |
| 1 km South (S) | Original Location | 0.5124 |
| 1 km East (E) | Original Location | 0.5320 |
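The Normalized Output SCNR can be sketched as follows. Since its exact definition in Section IV.C is not restated here, the sketch assumes the standard normalized SINR form from the STAP literature, $\rho = |s^H \hat{R}^{-1} s|^2 / \big[(s^H \hat{R}^{-1} R \hat{R}^{-1} s)(s^H R^{-1} s)\big]$, where $R$ is the true covariance, $\hat{R}$ the mismatched (sample) covariance, and $s$ a steering vector. Under this form, $\rho \in (0, 1]$, with $\rho = 1$ when $\hat{R} = R$, which is consistent with the first row of Table VII. The helper names are hypothetical, and a small pure-Python solver stands in for a numerical linear algebra library. The last lines rank the instances by the Table VII values.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Works with complex entries; intended only for small matrices.
    """
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


def matvec(a, v):
    """Matrix-vector product a @ v for nested-list matrices."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def normalized_output_scnr(r_true, r_est, s):
    """Assumed form: SCNR achieved by w = R_est^-1 s, divided by the optimum."""
    w = solve(r_est, s)                          # adaptive weights from mismatched covariance
    achieved = abs(inner(w, s)) ** 2 / inner(w, matvec(r_true, w)).real
    optimal = inner(s, solve(r_true, s)).real    # s^H R_true^-1 s
    return achieved / optimal


def mean_normalized_output_scnr(r_true, r_est, steering_vectors):
    """Average rho over a collection of steering vectors (one per example)."""
    values = [normalized_output_scnr(r_true, r_est, s) for s in steering_vectors]
    return sum(values) / len(values)


# Ranking the instances by the Mean Normalized Output SCNR values in Table VII.
table_vii = {"Original": 1.0000, "1 km N": 0.6919, "1 km W": 0.6852,
             "1 km S": 0.5124, "1 km E": 0.5320}
ranking = sorted(table_vii, key=table_vii.get, reverse=True)
```

A toy check: with $R = I$ and $\hat{R} = \mathrm{diag}(1, 4)$, a steering vector $s = (1, 1)$ gives $\rho = 25/34 \approx 0.735$, a loss caused purely by the covariance mismatch.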

Based on Table VII, we should expect our baseline CNN, when trained on the original platform location instance, to achieve the highest gain over the traditional method in the matched case, followed by the 1 km North-displaced (N) and 1 km West-displaced (W) platform location instances. Accordingly, the 1 km South-displaced (S) and 1 km East-displaced (E) platform location instances should yield the lowest gains. Note that the decrease in is related to the perturbation caused by displacing the original platform location instance. This perturbation is quantifiable in terms of the subspace perturbation error [shah_dimension] (we will be reporting this result at the 2023 IEEE Radar Conference). Having provided the Mean Normalized Output SCNR for our matched and mismatched cases, we now present the results of our analysis in Figures 21 and 22, comparing the localization accuracies of our baseline CNN and the traditional method.

We observe that our baseline CNN outperforms the traditional method across all displaced platform location instances. The 1 km North- and 1 km West-displaced platform location instances provide the highest gains (exceeding a two-fold improvement over the traditional scheme), followed by the 1 km South- and 1 km East-displaced platform location instances, which have diminished gains. Ordering these displaced scenarios by their respective gains (N and W outperform S and E) matches, pairwise, the ordering predicted by . Our analysis suggests that the Mean Normalized Output SCNR is a viable metric for predicting the generalization capabilities of our CNN across displaced platform location instances.

### B Few-Shot Learning for Displaced Platform Location Instances

Revisiting the mismatched case from Section VIII.A, we note that while our baseline CNN improves localization accuracy over the traditional method, the gain is substantially reduced relative to the matched case. To mitigate this reduction, we can adapt existing methods from computer vision to the mismatched localization problem at hand. In particular, we are interested in few-shot learning (FSL), a method of training machine learning algorithms using minimal data. To motivate this approach, we begin by reviewing few-shot learning in the context of computer vision, and then extend FSL to our analysis.

Much of the basis for few-shot learning derives from one-shot learning: a challenging task in which a single example is used to train a learning model to classify objects (see [fei_fei_one_shot, fink_single_example]). Over the past decade, one-shot and few-shot methods have also been applied to visual object detection, the computer vision paradigm of identifying and localizing objects in images and videos. A related but distinct line of work is single-shot detection: detectors such as YOLO [redmon2016you] and YOLOv3 [redmon_yolov3] achieve near real-time visual object detection by predicting all detections in a single network pass (here "single-shot" refers to inference, not to learning from a single example). More recently, few-shot learning has been used in conjunction with transfer learning and fine-tuning [shen_partial] via a data-level approach: after a learning model is trained on a large base dataset for a specified task, FSL fine-tunes it to perform a similar task using minimal new examples.

Adapting these methods for our analysis, we reconsider the baseline CNN from Section VI.A.1, trained on the original platform location instance. From Section VIII.A, we know that directly applying this CNN to the displaced platform location instances yields diminished improvements over the traditional method. However, rather than retraining our baseline CNN from scratch using large datasets for each displaced platform location instance, we can use few-shot learning to fine-tune it.

Accordingly, we augment our trained baseline CNN from Section VIII.A with few-shot learning. First, to reduce computation time, we freeze the convolutional and batch normalization layers of our trained baseline CNN (their weights and biases remain constant), halving the number of trainable parameters. The weights and biases of the remaining layers are updated via fine-tuning with FSL. Subsequently, we generate two datasets (one for the NAMF test statistic and another for the GSC) consisting of new heatmap tensors for each displaced platform location instance, where dB, dBsm, dBsm, and , with default grid resolution . The target location in each heatmap tensor is randomly generated (independently of the examples from the original platform location instance and the examples from each displaced platform location instance). Finally, we use few-shot learning to fine-tune our network for each displaced platform location instance, training our baseline CNN further until convergence on the new examples, using the Adam optimizer with a reduced learning rate . The results of this analysis are depicted in Figures 23 and 24.
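The fine-tuning step can be illustrated with a deliberately tiny, hypothetical stand-in for the CNN. In this toy, the parameters `u` and `a` play the role of the frozen convolutional and batch normalization layers, `v` and `c` play the role of the trainable head, and only the unfrozen parameters are updated with Adam [kingma_14] at a small learning rate on a handful of examples. The model, data, and hyperparameter values are illustrative assumptions, not our network or settings.

```python
import math


def mse(params, examples):
    """Mean squared error of the toy model y_hat = v * (u * x + a) + c."""
    return sum((params["v"] * (params["u"] * x + params["a"]) + params["c"] - y) ** 2
               for x, y in examples) / len(examples)


def adam_finetune(params, frozen, examples, lr=0.01, steps=3000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Fine-tune the unfrozen parameters of the toy model with Adam.

    `params` is not modified; a fine-tuned copy is returned.
    """
    params = dict(params)
    trainable = [k for k in params if k not in frozen]
    m = {k: 0.0 for k in trainable}  # first-moment estimates
    s = {k: 0.0 for k in trainable}  # second-moment estimates
    n = len(examples)
    for t in range(1, steps + 1):
        grads = {"u": 0.0, "a": 0.0, "v": 0.0, "c": 0.0}
        for x, y in examples:
            h = params["u"] * x + params["a"]        # "feature extractor" output
            err = params["v"] * h + params["c"] - y  # head prediction error
            grads["v"] += 2 * err * h / n
            grads["c"] += 2 * err / n
            grads["u"] += 2 * err * params["v"] * x / n
            grads["a"] += 2 * err * params["v"] / n
        for k in trainable:                          # frozen parameters are skipped
            g = grads[k]
            m[k] = beta1 * m[k] + (1 - beta1) * g
            s[k] = beta2 * s[k] + (1 - beta2) * g * g
            m_hat = m[k] / (1 - beta1 ** t)          # bias-corrected moments
            s_hat = s[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (math.sqrt(s_hat) + eps)
    return params


# "Pretrained" toy parameters and four few-shot examples from a shifted task.
pretrained = {"u": 2.0, "a": 0.5, "v": 1.0, "c": 0.0}
few_shot = [(x, 6.0 * x + 0.5) for x in (0.0, 1.0, 2.0, 3.0)]
tuned = adam_finetune(pretrained, frozen={"u", "a"}, examples=few_shot)
```

Freezing two of the four toy parameters mirrors the halving of trainable parameters described above: the frozen values are left untouched while the head adapts to the new examples.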

Using few-shot learning, we observe a substantial increase in the gain of our trained baseline CNN over the traditional method for all four displaced platform location instances. This analysis demonstrates that our regression CNN framework can be augmented with FSL to improve network generalization and to mitigate the reduced target localization accuracy observed in Section VIII.A across displaced platform location instances.

We conclude our discussion with the following distinction. While the performance of our regression CNN framework appears to vary monotonically with the Mean Normalized Output SCNR (), deriving the exact relationship quickly becomes mathematically intractable due to the black-box nature of the CNN. Likewise, in our few-shot learning evaluation, determining the exact improvement in for a given displaced platform location instance after FSL has been applied is nontrivial. Such an investigation is beyond the scope of this work.

## IX Concluding Remarks

The emergence of site-specific high-fidelity radio frequency modeling and simulation tools such as RFView^{®} has made it possible to approach classical problems in radar using a data-driven methodology. In this work, as part of our ongoing data-driven approach to radar STAP, we analyzed the asymptotic performance guarantees of select subspace separation methods in the context of radar target localization, and augmented this analysis with a proposed robust deep learning framework for target location estimation in matched and mismatched settings. Our approach builds upon classical radar STAP techniques and was inspired by existing computer vision algorithms for object detection and localization. To demonstrate the feasibility of our approach, we evaluated our deep learning framework on a diverse array of perturbations and quantified its gain over the classical approach. Regarding future directions, further analysis (and extensions) of the mismatched case can serve to better gauge the efficacy of knowledge transfer in radar STAP applications.

## Acknowledgment

This work is supported in part by the Air Force Office of Scientific Research (AFOSR) under award FA9550-21-1-0235. Dr. Muralidhar Rangaswamy and Dr. Bosung Kang are supported by the AFOSR under project 20RYCORO51. Dr. Sandeep Gogineni is supported by the AFOSR under project 20RYCOR052. The opinions and statements within this paper are the authors’ own and do not constitute any explicit or implicit endorsement by the U.S. Department of Defense.