
Broad Area Search and Detection of Surface-to-Air Missile Sites Using Spatial Fusion of Component Object Detections from Deep Neural Networks

Here we demonstrate how Deep Neural Network (DNN) detections of multiple constitutive or component objects that are part of a larger, more complex, and encompassing feature can be spatially fused to improve the search, detection, and retrieval (ranking) of the larger complex feature. First, scores computed from a spatial clustering algorithm are normalized to a reference space so that they are independent of image resolution and DNN input chip size. Then, multi-scale DNN detections from various component objects are fused to improve the detection and retrieval of DNN detections of a larger complex feature. We demonstrate the utility of this approach for broad area search and detection of Surface-to-Air Missile (SAM) sites that have a very low occurrence rate (only 16 sites) over a 90,000 km^2 study area in SE China. The results demonstrate that spatial fusion of multi-scale component-object DNN detections can reduce the detection error rate of SAM Sites by >85% while still maintaining a 100% recall. The novel spatial fusion approach demonstrated here can be easily extended to a wide variety of other challenging object search and detection problems in large-scale remote sensing image datasets.





I Introduction

Deep Neural Networks (DNNs) have been shown through extensive experimental validation to deliver outstanding performance for object detection/recognition in a variety of benchmark high-resolution EO/MS remote sensing image datasets [1]-[7]. The demonstrated ability of DNNs to automatically detect a wide variety of man-made objects with very high accuracy has tremendous potential to assist human analysts in labor-intensive visual searches for objects of interest in high-resolution imagery over large areas of the Earth's surface. However, even DNN detectors with exceptionally high accuracy (e.g. 99%) will still generate a tremendous number of errors when applied to large-scale remote-sensing image datasets. For example, a DNN detector with 99% average accuracy, a chip size of 128 x 128 pixels, and a chip scan overlap of 50% will generate 88,000 errors when applied to a 0.5 m GSD image dataset covering an AOI of only 10,000 km2 (e.g. a 1° x 1° cell).

If post-DNN detection results are intended to be reviewed by human analysts in machine-assisted analytic workflows, then large numbers of detection errors can quickly lead to "error fatigue" and a correspondingly negative end-user perception of machine-assisted workflows. Thus, it is important to develop methods that reduce the error rates resulting from the application of DNN detectors to large-scale remote-sensing image datasets in order to improve machine-assisted analytic workflows.

II Study Area and Source Data

This study builds upon Marcum et al. [8], where broad area search and detection of SAM Sites (Fig. 1) was demonstrated over a 90,000 km2 study Area of Interest (AOI) along the SE coast of China. Key results from the prior study were:

  1. A machine-assisted approach was used to reduce the original AOI search area by 660× to only 135 km2.

  2. The average machine-assisted search time for 2100 candidate SAM Site locations was 42 minutes, which was 81× faster than a traditional human visual search.

While Marcum et al. used a single binary DNN detector to locate candidate SAM Site locations, here we explore the benefit of fusing multi-scale DNN detectors of smaller component objects to improve the detection of the larger encompassing SAM Site features.

First, a binary SAM Site DNN detector was trained using a slightly enhanced version of the curated SAM Sites training data in China from [8]. To ensure blind scanning, only the 101 SAM Sites lying outside the SE China AOI were used to train the DNN. While [8] used a 227x227 pixel chip size at 1 m GSD to train a ResNet-101 DNN, here we used a 299x299 pixel chip size at 1 m GSD for training a NASNet DNN.

As in [8], negative training chip samples were selected using a 5-km offset in the four cardinal directions (i.e. N/S/E/W) for each SAM Site. The SE China AOI has 16 known SAM Sites which includes 2 newer SAM Sites found in the previous study [8].
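The offset-based negative sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the spherical-Earth approximation, and the example coordinates are assumptions introduced here. Four offsets per site for 101 sites is consistent with the 404 SAM Site negatives listed in Table I.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def cardinal_offsets(lat_deg, lon_deg, offset_km=5.0):
    """Return points roughly offset_km to the N, S, E, and W of a site."""
    dlat = math.degrees(offset_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink with latitude, so widen the step accordingly.
    dlon = dlat / math.cos(math.radians(lat_deg))
    return {
        "N": (lat_deg + dlat, lon_deg),
        "S": (lat_deg - dlat, lon_deg),
        "E": (lat_deg, lon_deg + dlon),
        "W": (lat_deg, lon_deg - dlon),
    }


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical site location, used only to exercise the functions.
negatives = cardinal_offsets(25.0, 119.0)
```

Each returned point would then serve as the center of one negative training chip.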

We next developed binary DNN detectors for four different SAM Site component objects: Launch Pads, Missiles, Transporter Erector Launchers (TELs), and TEL Groups (two or more co-located TELs) (Fig. 2). Component binary DNN detectors were trained using curated data at 0.5 m GSD from China SAM Sites outside the AOI. We first created negative training samples for each component using nearby image chips (similar land cover context), but outside the known spatial extent of a SAM Site. This produced a 1:1 ratio of negative to positive component training samples (Table I).

In addition, we created a second training dataset using all four components to train a combined Launch Pad detector (empty and non-empty), knowing that the other components (e.g. Missiles, TELs, etc.) are generally co-located with Launch Pads. We then developed a second set of component detectors for the Missile, TEL, and TEL Group object classes by combining negative training data from the other components and then randomly paring down the data to produce a 4:1 ratio of negative to positive samples (Table I). For the Missile component, samples from empty Launch Pads, TELs, and TEL Groups and their negatives were added. However, only samples from empty Launch Pads and Missiles were added to the negatives for TELs and TEL Groups to reduce confusion between these two components.

Different chip sizes were used for the training samples based on known object sizes. A 128x128 pixel chip size was used for detecting both empty and combined Launch Pads and TEL Groups, while a 64x64 pixel chip size was used for Missiles and TELs. Counts for all training data are provided in Table I; these only include component samples outside the SE China AOI to ensure blind scanning.

Object Class | SAM Sites | Launch Pads | Missiles | TELs | TEL Groups
TP | 101 | 3910 (1) | 1976 | 2733 | 1179
TN | 404 | 3696 | 2624 | 2272 | 1054
Combo TP | n/a | 9798 (2) | as above | as above | as above
Combo TN | n/a | 8512 | 6530 | 10,078 | 5762
(1) Empty
(2) Includes those with co-located Missiles, TELs, and TEL Groups
TABLE I: Summary of Curated Training Data

Fig. 1: Example Surface-to-Air Missile (SAM) Site with smaller-scale Launch Pad and TEL Group component objects.
(a) Empty Launch Pad
(b) Empty Launch Pad
(c) Missile
(d) Missile
(e) TEL
(f) TEL
(g) TEL Group
(h) TEL Group
Fig. 2: Samples of SAM Site component objects used in this study.

III Data Processing

III-A Training Data Augmentation

Augmentation strategies from [8] were used to train the SAM Site DNN and all component DNNs to improve detector performance. A 144× augmentation was used for all 5-fold validation experiments, while a 9504× augmentation was used for the final SAM Site DNN used in the AOI scanning. To save computing time, augmentations were reduced for training the component DNNs due to the much larger sample sizes. These changes included using RGB samples only, reducing the number of rotations, using a single jitter distance, and removing the contrast augmentation. Most of the final component DNNs were trained with a 648× augmentation, except the combined Launch Pad DNN, which used a 216× augmentation.

III-B Scanning and Spatial Clustering

We used the Neural Architecture Search Network (NASNet) [9] for training all DNN detectors along with transfer learning from ImageNet weights. The NASNet DNN significantly outperformed the ResNet-101 DNN (Table II), which was the best performing SAM Site DNN detector evaluated in [8].

DNN | TPR (%) | TNR (%) | ACC (%) | AUC (%)
ResNet-101 | 99.0 | 98.8 | 96.4 | 96.4
NASNet | 99.9 | 99.0 | 99.4 | 99.4
TABLE II: Summary of SAM Site DNN Detector Performance from 5-fold Cross-Validation. Metrics shown are True Positive Rate (TPR), True Negative Rate (TNR), Average Accuracy (ACC), and Area Under the ROC Curve (AUC).

As in [8], the images used for broad area search for SAM Sites in the SE China AOI comprised 66K 1280x1280 pixel tiles at 1 m GSD with 10% overlap between tiles. Individual tiles were scanned by generating 19.7M image chips with 75% overlap (25% stride) that were then input to the NASNet DNN. This produced a detection field of softmax outputs from the DNN. After thresholding with an alpha-cut of 0.9, the resulting greatly reduced detection field is used to produce an amplified spatial detection field. The amplified field is then used to weight a spatial clustering of the thresholded detections to produce mode clusters within a 300-m aperture radius (see [8]). Cluster locations were then rank-ordered by summing the scores of all detections within a mode cluster to generate an initial set of "candidate" SAM Sites.
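The chip scanning and thresholding steps can be illustrated with a short sketch. It assumes a simple sliding window with no padding at tile edges and a stride rounded to whole pixels; how the original scanner handles tile borders is not stated in the text, so per-tile chip counts from this sketch should not be read as the study's figures. The function names are hypothetical.

```python
def chip_origins(tile_size=1280, chip_size=299, overlap=0.75):
    """Top-left (row, col) origins of chips scanned across one square tile.

    Assumes no padding: chips that would cross the tile edge are skipped.
    """
    stride = max(1, round(chip_size * (1.0 - overlap)))  # 25% stride
    last_start = tile_size - chip_size
    starts = list(range(0, last_start + 1, stride))
    return [(r, c) for r in starts for c in starts]


def alpha_cut(detections, alpha=0.9):
    """Keep only detections whose softmax score reaches the alpha-cut.

    detections: iterable of (location, score) pairs.
    """
    return [(loc, score) for loc, score in detections if score >= alpha]


origins = chip_origins()
kept = alpha_cut([((0, 0), 0.95), ((0, 75), 0.42), ((75, 0), 0.90)])
```

Because most chips in a broad area scan score well below the alpha-cut, this thresholding is what shrinks the detection field before clustering.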

A new 1280x1280 pixel tile at 0.5 m GSD centered on each candidate SAM Site's cluster location was then used for all component DNN scans. Component scanning outputs were also spatially clustered to generate locations and cluster scores for each component object. An aperture radius of R = 32 m was used since this is approximately half the typical distance between SAM Site launch pads in China. An alpha cut was also applied to generate distinct cluster locations for a given component relative to neighboring components that were present at each candidate SAM Site.

Likewise, in order to determine the thresholds used for the decision-theoretic approach (DTA) described in Section III-E, a training set of 1280x1280 pixel pseudo-candidate tiles was generated, centered on the known SAM Sites outside the SE China AOI, along with corresponding offset tile negatives. The same scans and processes performed for candidate tiles within the SE China AOI (described above) were used for the pseudo-candidate training dataset.

III-C Cluster Score Normalization and Truncation

Cluster scores from one object class to another are not necessarily comparable since they can result from objects with different physical sizes and corresponding R values. In addition, results generated from image tile scans with different DNN input chip size and/or GSD will have a variable spatial density. Since we wish to spatially fuse, and potentially weight, various component DNN detections, the cluster scores must be normalized to bring both the SAM Site and component detections into a common reference space. Here we use detection density, i.e. the number of detections per unit area, as the means to achieve a common reference space prior to spatially fusing the cluster scores from the candidate SAM Sites and their associated component detections.

III-C1 Normalization for a Single Detection Location

The amplified spatial detection field, , contains an intersected volume, , for each detection, , in . is calculated as the weighted sum of scores of each with its neighboring detections, . The weight is determined by the distance-decay function , where and is otherwise. An approximate maximum intersection volume for a single detection can be calculated by integrating the truncated distance-decay function around a detection location. As mentioned above, and are normalized using detection density. Let and , where stride is the image chip’s scanning stride distance in meters, so that . The approximate max intersection volume for a single DNN detection can then be calculated as:


III-C2 Max Cluster Score Truncation

The cluster score should also be limited for normalization. In the previous algorithm from [8], the number of detections in a cluster was virtually unbounded. As a result, detections that were a large distance away from the cluster location could potentially contribute to the cluster score. Using the initial detections for each cluster as a base, detection locations with a haversine distance less than R from the cluster location are given a weight of 1; otherwise the detection receives a weight of 0. The cluster score is then a weighted sum of the DNN detection inference scores with their respective weights (Procedure 1). In Section III-D, we discuss the possibility of applying a negative penalty weight to all detections with a haversine distance greater than R.

III-C3 Approximate Max Cluster Score

The number of detections within the aperture radius R can be seen as a Gauss Circle Problem. Thus, the max number of detections within the aperture area surrounding a cluster location can be approximated in terms of detection density as:


Using equations (1) and (2), a normalizing cluster factor can be calculated as:

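Input: Alpha-cut detections, Mode-Clusters
Output: Ranked Clusters, where each cluster carries a normalized score
while unprocessed mode clusters remain do
      take the next cluster; score ← 0 // Init. with chip
     for all detections in the cluster do
         weight ← 1
         if haversine distance to cluster location ≥ R then
               weight ← penalty // 0 if no penalty
         end if
         score ← score + weight × detection score
     end for
      normalize score
end while
sort by score, descending
Procedure 1 Object Detection Cluster Ranking with Normalized Scores and Optional Penalty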
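A runnable sketch of the truncated, optionally penalized cluster ranking is given below. It follows the prose of Sections III-C2 and III-C3 rather than the original equations: the normalizer used here is only the Gauss-circle estimate of the maximum detection count, pi * (R / stride)^2, which is an assumption standing in for the paper's normalizing cluster factor. All names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical-Earth approximation (assumption)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def max_detections(radius_m, stride_m):
    """Gauss-circle estimate of grid detections inside a disc of radius R."""
    return math.pi * (radius_m / stride_m) ** 2


def rank_clusters(clusters, radius_m, stride_m, penalty=0.0):
    """Score and rank clusters.

    clusters: list of (center, detections), detections = [((lat, lon), score)].
    penalty: weight for detections at or beyond radius_m (0.0 means none,
             -1.0 is the flat penalty described in Section III-D).
    """
    norm = max_detections(radius_m, stride_m)
    ranked = []
    for center, detections in clusters:
        total = 0.0
        for loc, score in detections:
            weight = 1.0 if haversine_m(center, loc) < radius_m else penalty
            total += weight * score
        ranked.append((center, total / norm))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy input: two clusters, one with a stray detection about 1.1 km away.
toy = [
    ((25.0, 119.0), [((25.0, 119.0), 1.0)] * 3),
    ((25.1, 119.1), [((25.1, 119.1), 0.9), ((25.11, 119.1), 0.95)]),
]
ranking = rank_clusters(toy, radius_m=300.0, stride_m=75.0)
```

With `penalty=0.0` the stray detection is simply ignored; with `penalty=-1.0` it is subtracted, which pushes clusters surrounded by scattered detections down the ranking.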

III-D Over-Detection Penalty

In previous work we have observed FP hotspots, i.e. large numbers of spatially co-occurring false positive detections. To mitigate this potential problem, a penalty can be applied when computing the cluster score. As mentioned in Section III-C2, instead of using a weight of 0 when a detection's haversine distance is greater than R, a negative weight can be applied. We explored two types of penalty assignments. The first used a flat weight of -1. The second is similar to the distance-decay function; however, the sign was changed to negative and the penalty grows exponentially in magnitude as the distance increases (Fig. 3). The penalty is calculated using the following formula: .

Fig. 3: Distance-decay functions used for calculating local clustering scores. The distance-decay function (blue) is used as a weight when summing detections within distance R and the penalty function (red) is used to calculate an exponential penalty weight for detections outside R.

Fig. 4: True Positive Rate (TPR or recall), Positive Predictive Value (PPV or precision), and F1 score versus the threshold for the cluster count of TEL cluster centers within 150 m of a candidate SAM Site location. In this example, the value 3 was used for the final threshold as shown in Table III.
Fig. 5: Processing flow chart for decision-level fusion of multiple component object detections.

III-E Decision-Theoretic Approach for Optimization

In order to make discrete decisions, we used the DTA [11] advocated by Lewis [12] that computes thresholds based on the optimal prediction of a model to obtain the highest expected F-measure. In this study, decision thresholds were selected based on the optimization of the F1 score from features extracted from the pseudo-candidate training dataset (Fig. 4). Optimal F1 score thresholds were determined through empirical analysis and selected examples are provided (Table III).

Feature Type | Empty LPs | Combo LPs | Missiles | TELs | TEL Groups
Cluster Count | 2 | 1 | 1 | 3 | 1
Raw Count | 5 | 4 | 4 | 15 | 1
Raw Max | 1.00000 | 0.99954 | 1.00000 | 1.00000 | 0.58236
Including Component Negatives
Cluster Count | n/a | n/a | 1 | 1 | 1
Raw Count | n/a | n/a | 2 | 2 | 1
Raw Max | n/a | n/a | 0.99989 | 0.98450 | 0.60315
TABLE III: Sample thresholds calculated by DTA.
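The empirical threshold selection can be sketched as a sweep over candidate thresholds that keeps the one with the highest F1 score. This is a simplified stand-in for the DTA of [11] and [12], assuming the decision rule "feature value >= threshold"; the function names and the toy feature values are invented for illustration and are not data from the study.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion counts; defined as 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(values, labels):
    """Return (threshold, f1) maximizing F1 for the rule value >= threshold.

    values: one feature value per pseudo-candidate (e.g. a TEL cluster count).
    labels: truthy for true SAM Sites, falsy for offset negatives.
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and not y)
        fn = sum(1 for v, y in zip(values, labels) if v < t and y)
        score = f1_score(tp, fp, fn)
        if score > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy cluster counts for four negatives followed by four positives.
counts = [0, 1, 1, 2, 3, 4, 5, 3]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, score = best_f1_threshold(counts, truth)
```

Keeping the lowest threshold on ties favors recall, which suits a search for rare objects; a different tie-break would be an equally valid choice.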

IV Five-Fold Experiment Results

Five-fold cross-validation experiments were performed for the training datasets in Table I. The results provided in Table IV show an average F1 score above 99% for the baseline dataset and above 98% for the dataset with component negatives. The decrease in F1 score for the DNNs with component negatives was anticipated given the inclusion of objects in the negative training data that were visually similar to the component object that a given DNN was trained to detect.

Object Class | TPR (%) | TNR (%) | F1 (%) | SD
SAM Site | 99.00 | 99.75 | 99.39 | 1.06
Empty LPs | 99.80 | 99.70 | 99.65 | 0.2
Combo LPs | 99.74 | 99.74 | 99.74 | 0.15
Missiles | 99.8 | 99.46 | 99.63 | 0.24
TELs | 99.72 | 99.38 | 99.55 | 0.32
TEL Groups | 99.41 | 99.43 | 99.42 | 0.21
Including Component Negatives
Missiles | 97.42 | 99.66 | 98.52 | 0.72
TELs | 97.51 | 99.37 | 98.42 | 0.5
TEL Groups | 96.78 | 99.6 | 98.15 | 1.1
TABLE IV: NASNet DNN five-fold cross validation results for DNN models of SAM Sites and each component, including component models with negative component data. Metrics shown are True Positive Rate (TPR), True Negative Rate (TNR), F1 score, and Standard Deviation (SD).

NASNet significantly outperformed ResNet-101 for scanning the SE China AOI for SAM Sites (Table V). This is consistent with the cross-validation results given in Table II. NASNet had 44× fewer SAM Site detections after the 0.9 alpha-cut (Section III-B). Further, while both DNNs correctly detected all 16 known SAM Sites (i.e. TPs) in the SE China AOI, NASNet had 6× fewer clusters compared to ResNet-101, while the average TP cluster rank (Table V) was also 3× lower.

DNN Architecture & Post-Processing | Pre-Cluster Count | Post-Cluster Count | AVG TP Cluster Rank
ResNet-101 [8] | 93,000 | 2100 | 181.9
NASNet | 2079 | 354 | 62.8
NASNet w/ norm | 2079 | 354 | 62.8
NASNet w/ norm and penalty | 2079 | 354 | 62.8
TABLE V: Spatial clustering results from DNN scanning of the SE China AOI for candidate SAM Sites. Given values are pre-cluster counts over the alpha-cut threshold (0.9), post-cluster counts, and average True Positive (TP) cluster rank.

V Decision-Level Component Metric Fusion

This section describes the feature selection and fusion techniques used to reduce the number of candidate SAM Sites that could then be presented for human review in machine-assisted analytic workflows. An overview of the processing flow is provided in Fig. 5.

V-A Data Features

Five different feature types were used in [10] for decision-level fusion of component objects to improve the detection of construction sites. Here we tested the feature types that used the F1 score optimization from [10]; these are the first three feature types listed below. We used the normalized cluster scores from the spatial clustering as an additional feature type. To maintain consistency between the techniques employed in this study, only inference responses within a 150 m radius of the candidate SAM Site location were used. The feature types that were evaluated were:

  1. Maximum inference response (confidence value) for each component

  2. Number of raw (pre-clustered) inference detections for each component

  3. Number of clusters produced for each component

  4. Sum of normalized cluster scores for each component

Fig. 6: Comparison of F1 scores produced for candidate SAM Site features from different fusion techniques. Techniques include individual component threshold from DTA as well as component fusion using an OR gate and MLP. Note that "(CN)" at the end of the feature type label in the key indicates that component negative models were used in the processing.
Fig. 7: Comparison of True Positive Rate (TPR) produced for candidate SAM Site features from different fusion techniques.

V-B Decision-Level Fusion Techniques

Baseline results for the candidate SAM Site locations are first computed using only the spatial cluster outputs of the NASNet SAM Site detections. We then tested how each individual component would perform using the various feature types. SAM Site cluster scores were excluded because the pseudo-candidate training dataset was not generated through scanning and clustering. Consequently, some of the pseudo-candidates would have no cluster within a sufficient radius of the SAM Site center location.

Three data fusion techniques were tested:

  1. Decision Tree:

    A simple decision tree (see [10]) was used to combine the decisions generated for each component using DTA. However, unlike [10], this study does not use an alpha-cut threshold since this was part of the spatial clustering algorithm. Therefore, the decision tree is simplified to a digital logic OR gate with the DTA decisions as binary inputs.

  2. Multi-Layer Perceptron (MLP):

    A feature vector was created for each candidate SAM Site location and used as input for training and validation. The MLP architecture consisted of two fully connected hidden layers of 100 nodes. We also tested normalization and feature bounds before being used as input based on the thresholds from DTA optimization (Section III-E).

  3. ANFIS:

    A first order Takagi-Sugeno-Kang (TSK) adaptive neuro-fuzzy inference system (ANFIS) [13] [14] [15] was utilized. The goal is to explore a neural encoding and subsequent optimization of expert knowledge input. Specifically, five IF-THEN rules were used whose IF components (aka rule firing strengths) were derived from the expert knowledge from the Decision Tree in 1) above. The consequent (i.e., THEN) parameters of ANFIS were optimized via backpropagation [13]. The reader can refer to [16] [17] and [18] for an in-depth discussion of the mathematics, optimization, and robust possibilistic clustering-based initialization of ANFIS. Finally, the output decision threshold was chosen through DTA.

The different Launch Pad detector types were tested independently and in combination during the fusion step with the other three component types (i.e. Missiles, TELs, and TEL Groups):

  • Empty Launch Pads plus three (Empty LPs+3)

  • Combined Launch Pads plus three (Combo LPs+3)

  • Empty LPs and Combo LPs plus three (All 5)
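The OR-gate form of the decision tree and the three component combinations can be sketched as follows. The thresholds are the baseline Cluster Count values from Table III; the decision rule "feature value >= threshold", the dictionary layout, and the function name are assumptions made for illustration.

```python
# Cluster Count thresholds from Table III (models without component negatives).
CLUSTER_COUNT_THRESHOLDS = {
    "Empty LPs": 2,
    "Combo LPs": 1,
    "Missiles": 1,
    "TELs": 3,
    "TEL Groups": 1,
}

PLUS_THREE = ["Missiles", "TELs", "TEL Groups"]
COMBINATIONS = {
    "Empty LPs+3": ["Empty LPs"] + PLUS_THREE,
    "Combo LPs+3": ["Combo LPs"] + PLUS_THREE,
    "All 5": ["Empty LPs", "Combo LPs"] + PLUS_THREE,
}


def or_gate_fusion(features, combination, thresholds=CLUSTER_COUNT_THRESHOLDS):
    """Accept a candidate if any component in the combination passes its threshold.

    features: {component name: feature value within 150 m of the candidate}.
    Missing components are treated as a feature value of 0.
    """
    return any(
        features.get(name, 0) >= thresholds[name]
        for name in COMBINATIONS[combination]
    )


# Hypothetical candidate with one empty launch pad cluster and three TEL clusters.
candidate = {"Empty LPs": 1, "Missiles": 0, "TELs": 3, "TEL Groups": 0}
keep = or_gate_fusion(candidate, "Empty LPs+3")
```

An OR gate keeps a candidate whenever a single component is convincing, so it tends to favor recall over precision; the MLP and ANFIS variants instead learn how much each component should count.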

V-C MLP Input Data Normalization

We found that the MLPs had some difficulty training with datasets that had larger values, so we used the common practice of linearly scaling and bounding to constrain the data to fall within the range . Let be the vector of values over the entire dataset for component for a given feature and let be the DTA thresholds computed for component , then the normalized and bounded vector can be defined as follows:


V-D Results & Observations

Over 200 different combinations of data feature types, component combinations, and fusion techniques were tested in this study to improve the detection of candidate SAM Sites.

Evaluation of the F1 score improvements (Table VI) shows that decision-level component fusion can reduce the relative error rate by up to 96.75%. It was somewhat surprising that the Raw Count feature generated five out of the top six best results. Although Combo LPs were only able to generate an F1 score of 68.4% using DTA, the neural approaches (MLP and ANFIS) were able to do slightly better using multiple components, where the top result fused all 5 components in an MLP to yield an F1 score of 71.4%. Comparisons of F1 scores for different feature types and fusion techniques can be found in Fig. 6.

However, when performing a broad area search for a very rare object (low geographic occurrence rate), it is often desirable to sacrifice some error reduction in order to achieve a higher TPR. The results in Table VII show that the highest F1 score is 45.1% while achieving a TPR of 100%. Although this F1 score is well below the maximum in Table VI, this technique still achieved an 88.5% relative error reduction compared to the baseline (no component fusion) results for the candidate SAM Site locations within the SE China AOI. These scores were produced using Cluster Count features and the All 5 component combination as inputs to a simple MLP. It is also worth noting that four of the top five scores used the Empty LPs+3 component combination. Comparisons of TPRs for different feature types and fusion techniques can be found in Fig. 7.

It was also observed that cluster score truncation and normalization improved the F1 scores for DTA when fusing multiple component detectors. However, the introduction of a negative score penalty did not improve the score further (Fig. 8), and introducing expert weighting (described in Section VI-A) also showed no improvement in the F1 scores.

Components | Feature Type | Processing Technique | Component Negatives | TP | FP | TPR (Recall) | PPV (Precision) | F1 score | Error/km2 (x) | Relative Error Reduction
SAM Sites | BASELINE-NO COMPONENTS | - | - | 16 | 338 | 100.00% | 4.52% | 8.65% | 3.080 | n/a
All 5 | Raw Count | MLP | NO | 15 | 11 | 93.75% | 57.69% | 71.43% | 0.109 | 96.45%
Combo LPs+3 | Raw Count | ANFIS | NO | 13 | 8 | 81.25% | 61.90% | 70.27% | 0.100 | 96.75%
Combo LPs+3 | Raw Count | MLP | NO | 14 | 10 | 87.50% | 58.33% | 70.00% | 0.109 | 96.45%
Combo LPs | Cluster Count | DTA | n/a | 13 | 9 | 81.25% | 59.09% | 68.42% | 0.109 | 96.45%
Combo LPs | Raw Count | DTA | n/a | 13 | 9 | 81.25% | 59.09% | 68.42% | 0.109 | 96.45%
All 5 | Raw Count | ANFIS | NO | 13 | 9 | 81.25% | 59.09% | 68.42% | 0.109 | 96.45%
TABLE VI: Experiment results with highest F1 Scores. The first line after the header is the SAM Site candidates without error reduction from spatial fusion of the component detections. Highest F1 scores were from fusing multiple component detections using neural learning (MLP or ANFIS). Also, raw detection count features (pre-clustering) showed the most separability. All top solutions show a reduction of relative error greater than 96%. These results would be optimal if error reduction was the only goal. Error includes false positives and false negatives.
Components | Feature Type | Processing Technique | Component Negatives | TP | FP | TPR (Recall) | PPV (Precision) | F1 score | Error/km2 (x) | Relative Error Reduction
SAM Sites | BASELINE-NO COMPONENTS | - | - | 16 | 338 | 100% | 4.52% | 8.65% | 3.080 | n/a
All 5 | Cluster Count | MLP | NO | 16 | 39 | 100% | 29.09% | 45.07% | 0.355 | 88.46%
Empty LPs+3 | Cluster Count | MLP (Normalized) | NO | 16 | 43 | 100% | 27.12% | 42.67% | 0.392 | 87.28%
Empty LPs+3 | Cluster Count | MLP | NO | 16 | 45 | 100% | 26.23% | 41.56% | 0.410 | 86.69%
Empty LPs+3 | Raw Count | MLP (Normalized) | YES | 16 | 48 | 100% | 25.00% | 40.00% | 0.437 | 85.80%
Empty LPs+3 | Raw Count | MLP | NO | 16 | 50 | 100% | 24.24% | 39.02% | 0.456 | 85.21%
TABLE VII: Experiment results with highest F1 scores while maintaining a TPR of 100%. The highest F1 scores resulted from fusing all component detections with a simple MLP. Also, Cluster Count features yielded the top results. All top solutions show a reduction of relative error between 85% and 88.5%, which corresponds to roughly 3× the error rate shown in Table VI.
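The derived columns of Tables VI and VII follow directly from the TP and FP counts, given the 16 known SAM Sites and the 338 baseline errors. The sketch below reproduces them, taking "error" to be false positives plus false negatives as stated in the Table VI caption; the function name and return layout are illustrative choices.

```python
def fusion_metrics(tp, fp, n_true=16, baseline_errors=338):
    """Recompute TPR, PPV, F1, and relative error reduction from counts."""
    fn = n_true - tp
    errors = fp + fn  # error counts both false positives and false negatives
    return {
        "tpr": tp / n_true,
        "ppv": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "errors": errors,
        "relative_error_reduction": 1.0 - errors / baseline_errors,
    }


best_f1_row = fusion_metrics(tp=15, fp=11)      # All 5, Raw Count, MLP (Table VI)
best_recall_row = fusion_metrics(tp=16, fp=39)  # All 5, Cluster Count, MLP (Table VII)

for row in (best_f1_row, best_recall_row):
    print({k: round(100 * v, 2) if k != "errors" else v for k, v in row.items()})
```

The first row trades one missed site for 12 total errors, while the second keeps every site at the cost of 39 false positives, which is the trade-off discussed in the text.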

Additionally, F1 scores generally improved for models trained with component negatives; however, these improvements came at the expense of TPR, and such models appear only once in Tables VI and VII. This can be interpreted as ambiguity being introduced to the dataset by essentially asking the detector to ignore the background (i.e. the Launch Pad) and focus on the smaller component.

Fig. 8: F1 score results for DTA thresholds of original cluster scores, normalized cluster scores, cluster scores with a penalty of -1 and distance-decay penalty with m.

VI Component Metrics Fusion for Improving Candidate SAM Site Rankings

This section discusses the techniques, observations, and results used to re-rank candidate SAM Sites for utilization in machine-assisted human analytic workflows. The objective is to utilize the component detections to re-rank the candidate SAM Sites such that true SAM Sites appear higher in a rank-ordered list relative to a baseline ranking derived only from the candidate SAM Sites' cluster scores (Table VIII). An overview of the processing flow is given in Fig. 9.

VI-A Candidate Site and Component Score Spatial Fusion

Normalized cluster scores for candidate SAM Sites and all components found within R are summed using uniform or human expert provided weights (Fig. 9). Expert weights were only used when fusing all four components with their corresponding candidate SAM Site. The weights were: 4 for Launch Pads, 2 for TEL Groups, and 1 for Missiles, TELs, and SAM Sites.
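The weighted score fusion and re-ranking can be sketched as below. The expert weights are those listed above; the data layout (one summed normalized cluster score per object class per candidate), the function names, and the toy scores are assumptions for illustration only.

```python
# Expert weights from Section VI-A; uniform fusion uses 1.0 for every class.
EXPERT_WEIGHTS = {
    "SAM Site": 1.0,
    "Launch Pads": 4.0,
    "TEL Groups": 2.0,
    "Missiles": 1.0,
    "TELs": 1.0,
}


def fused_score(scores, weights=None):
    """Weighted sum of normalized cluster scores for one candidate.

    scores: {object class: summed normalized cluster score within R}.
    weights: per-class weights, or None for uniform weighting.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in scores.items())


def rank_candidates(candidates, weights=None):
    """Return candidate ids ordered by fused score, highest first."""
    return sorted(
        candidates,
        key=lambda cid: fused_score(candidates[cid], weights),
        reverse=True,
    )


# Toy candidates with invented scores.
toy = {
    "A": {"SAM Site": 0.9},
    "B": {"SAM Site": 0.5, "Launch Pads": 0.3, "TELs": 0.2},
    "C": {"SAM Site": 0.8, "TELs": 0.5},
}
uniform_order = rank_candidates(toy)
expert_order = rank_candidates(toy, EXPERT_WEIGHTS)
```

In this toy case candidate A has the strongest SAM Site score but no component support, so it drops to last under either weighting, while the Launch Pad weight of 4 lifts candidate B above C.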

Fig. 9: Process flow used for improved ranking of candidate SAM Sites.

VI-B Results & Observations

The TEL detector yielded the most improvement in the average cluster rank of known SAM Sites (TPs) compared to fusion with any other single component detector (Table VIII). This, coupled with the Combo LPs detector and the other component detectors with expert weighting (Section VI-A), improved the average cluster rank of known SAM Sites (TPs) to 15.9 (Table IX). This is 4× better than the average rank for SAM Sites without spatial fusion of the component object cluster scores.

We observed that the addition of normalization and penalty had no detectable impact on the known SAM Site TP average cluster rank. This indicates minimal FP presence and/or uniformly distributed FP noise within the candidate SAM Site locations generated by the spatial clustering algorithm.

Component negative models improved the ranking results compared to the SAM Site score alone, but not as well as models trained without component negatives. Again, this can be interpreted as ambiguity being introduced to the dataset by essentially asking the detector to ignore the background (i.e. the Launch Pad) and focus on the smaller component.

Columns after "SAM Site Only" are with Single Component Fusion.
Model | SAM Site Only | Empty LPs | Combo LPs | Missiles | TELs | TEL Groups
ResNet-101 [8] | 139.9 | n/a | n/a | n/a | n/a | n/a
NASNet | 62.8 | 36.4 | 40.8 | 43.0 | 28.0 | 46.1
w/ Norm | 62.8 | 34.3 | 34.4 | 43.6 | 28.1 | 47.3
w/ Norm & Penalty | 62.8 | 34.0 | 34.3 | 43.6 | 27.9 | 47.1
Including Component Negatives
w/ Norm | n/a | n/a | n/a | 79.1 | 28.8 | 51.6
w/ Norm & Penalty | n/a | n/a | n/a | 79.1 | 28.7 | 51.48
TABLE VIII: Average rank of known SAM Sites (TPs) in SE China AOI from fusing cluster scores from a single component object class with a baseline candidate SAM Site cluster score.
with Single Component Fusion









NASNet 26.3 21.4 25.3 22.9
w/ Norm 20.3 22.9 17.9 15.9
w/ Norm & Penalty 19.9 22.5 17.8 16.0
Including Component Negatives
w/ Norm 24.8 24.9 18.1 16.8
w/ Norm & Penalty 24.1 24.9 18.1 16.8
TABLE IX: Average rank of known SAM Sites (TPs) in SE China AOI from fusing cluster scores from all four component object classes with the baseline candidate SAM Site cluster score.

VII Conclusion and Future Work

This study extended the work in [8], where a combination of DNN scanning and spatial clustering was used to perform a machine-assisted broad area search and detection of SAM Sites in a SE China AOI of 90,000 km2.

Here we significantly improved upon this prior study by using multiple DNNs to detect smaller component objects, e.g. Launch Pads, TELs, etc. belonging to the larger and more complex SAM Site feature. Scores computed from an enhanced spatial clustering algorithm were normalized to a reference space so that they were independent of image resolution and DNN input chip size. DNN detections from the multiple component objects were then fused to improve the final detection and retrieval (ranking) of DNN detections of candidate SAM Sites. Key results from this effort include:

  1. An initial set of approximately 350 SAM Site detections (Table V) was reduced to only about 25 candidate SAM Sites (Table VI) using neural learning techniques to spatially fuse DNN detections from multiple component objects.

  2. Component fusion reduced the overall error rate by >85% while still preserving a 100% TPR (Table VII).

  3. The average rank of the 16 known SAM Sites (TPs) in a list of approximately 350 candidate SAM Sites was improved by 9× (Tables VIII and IX) compared to the previous study [8].

In future work we plan to A) apply this approach to a variety of other challenging object search and detection problems in large-scale remote sensing image datasets, B) investigate data-driven optimization of the component fusion weights and compare performance vs. human-expert provided weights, C) extend this approach to include fusion of multi-temporal DNN detections, D) extend this approach to include fusion of multi-source DNN detectors applied to high-resolution EO/MS and SAR imagery, and E) explore how to use more sophisticated fusion techniques (similar to ANFIS) to maintain TPR while achieving even higher error reduction.