I Introduction
Deep Neural Networks (DNNs) have been shown through extensive experimental validation to deliver outstanding performance for object detection/recognition in a variety of benchmark high-resolution EO/MS remote sensing image datasets [1]-[7]. The demonstrated ability of DNNs to automatically detect a wide variety of man-made objects with very high accuracy has tremendous potential to assist human analysts in labor-intensive visual searches for objects of interest in high-resolution imagery over large areas of the Earth's surface. However, even DNN detectors with exceptionally high accuracy (e.g. 99%) will still generate a tremendous number of errors when applied to large-scale remote-sensing image datasets. For example, a DNN detector with 99% average accuracy, a chip size of 128x128 pixels, and a chip scan overlap of 50% will generate 88,000 errors when applied to a 0.5 m GSD image dataset covering an AOI of only 10,000 km² (e.g. a 1° x 1° cell).
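The arithmetic behind this estimate can be sketched as follows; the exact chip accounting used in the text is not given, so this back-of-the-envelope version lands near, not exactly at, the quoted 88,000 figure:

```python
# Back-of-the-envelope check of the error estimate above.
AOI_KM2 = 10_000      # ~1 deg x 1 deg cell
GSD_M = 0.5           # ground sample distance (m/pixel)
CHIP_PX = 128         # DNN input chip size
OVERLAP = 0.5         # 50% chip scan overlap
ACC = 0.99            # average detector accuracy

aoi_px = AOI_KM2 * 1e6 / GSD_M**2        # 4e10 pixels in the AOI
stride_px = CHIP_PX * (1 - OVERLAP)      # 64-pixel scan stride
n_chips = aoi_px / stride_px**2          # ~9.8M chips evaluated
print(f"{(1 - ACC) * n_chips:,.0f} erroneous chip decisions")  # ~98,000
```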
If post-detection DNN results are intended to be reviewed by human analysts in machine-assisted analytic workflows, then large numbers of detection errors can quickly lead to "error fatigue" and a corresponding negative end-user perception of machine-assisted workflows. Thus, it is important to develop methods to reduce the error rates resulting from application of DNN detectors to large-scale remote-sensing image datasets in order to improve machine-assisted analytic workflows.
II Study Area and Source Data
This study builds upon Marcum et al. [8] where broad area search and detection of SAM Sites (Fig. 1) was demonstrated over a 90,000 km2 study Area of Interest (AOI) along the SE coast of China. Key results from the prior study were:
- A machine-assisted approach was used to reduce the original AOI search area by 660× to only 135 km².
- The average machine-assisted search time for 2100 candidate SAM Site locations was 42 minutes, which was 81× faster than a traditional human visual search.
While Marcum et al. used a single binary DNN detector to locate candidate SAM Site locations, here we explore the benefit of fusing multi-scale DNN detectors of smaller component objects to improve the detection of the larger encompassing SAM Site features.
First, a binary SAM Site DNN detector was trained using a slightly enhanced version of the curated SAM Site training data in China from [8]. To ensure blind scanning, only the 101 SAM Sites lying outside the SE China AOI were used to train the DNN. While [8] used a 227x227 pixel chip size at 1 m GSD to train a ResNet-101 DNN, here we used a 299x299 pixel chip size at 1 m GSD for training a NASNet DNN.
As in [8], negative training chip samples were selected using a 5-km offset in the four cardinal directions (i.e. N/S/E/W) for each SAM Site. The SE China AOI has 16 known SAM Sites, which include 2 newer SAM Sites found in the previous study [8].
We next developed binary DNN detectors for four different SAM Site component objects: Launch Pads (initially empty ones only), Missiles, Transporter-Erector-Launchers (TELs), and TEL Groups.
In addition, we created a second training dataset using all four components to train a combined Launch Pad detector (empty and non-empty), knowing that the other components (e.g. Missiles, TELs, etc.) are generally co-located with Launch Pads. We then developed a second set of component detectors for the Missile, TEL, and TEL Group object classes by combining negative training data from the other components and then randomly paring down the data to produce a 4:1 ratio of negative to positive samples (Table I). For the Missile component, samples from empty Launch Pads, TELs, and TEL Groups and their negatives were added. However, only samples from empty Launch Pads and Missiles were added to the negatives for TELs and TEL Groups to reduce confusion between these two components.
Different chip sizes were used for the training samples based on known object sizes. A 128x128 pixel chip size was used for detecting both empty and combined Launch Pads and TEL Groups, while a 64x64 pixel chip size was used for Missiles and TELs. Counts for all training data are provided in Table I; these only include component samples outside the SE China AOI to ensure blind scanning.
Table I. Training sample counts for each object class.

| Object Class | SAM Sites | Launch Pads | Missiles | TELs | TEL Groups |
|---|---|---|---|---|---|
| TP | 101 | 3910¹ | 1976 | 2733 | 1179 |
| TN | 404 | 3696 | 2624 | 2272 | 1054 |
| Combo TP | n/a | 9798² | as above | as above | as above |
| Combo TN | n/a | 8512 | 6530 | 10,078 | 5762 |

1: Empty.
2: Includes those with co-located Missiles, TELs, and TEL Groups.
III Data Processing
III-A Training Data Augmentation
Augmentation strategies from [8] were used to train the SAM Site DNN and all component DNNs to improve detector performance. A 144× augmentation was used for all 5-fold validation experiments, while a 9504× augmentation was used for the final SAM Site DNN used in the AOI scanning. To save computing time, augmentations were reduced for training the component DNNs due to the much larger sample sizes. These changes included using RGB samples only, reducing the number of rotations, using a single jitter distance, and removing the contrast augmentation. Most of the final component DNNs were trained with 648× augmentations, except the combined Launch Pad DNN, which used a 216× augmentation.
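The augmentation factor is the product of the per-transform counts. Only the totals (144×, 9504×, 648×, 216×) are reported here, so the specific counts below are illustrative assumptions chosen to reproduce the 144× case:

```python
from itertools import product

# Illustrative sketch of multiplicative chip augmentation; per-transform
# counts are assumptions, not the factorization actually used in the study.
ROTATIONS = range(0, 360, 45)    # 8 rotation angles
FLIPS = (False, True)            # 2 mirror states
JITTERS_M = (-8, 0, 8)           # 3 spatial jitter offsets (meters)
CONTRASTS = (0.9, 1.0, 1.1)      # 3 contrast scalings

variants = list(product(ROTATIONS, FLIPS, JITTERS_M, CONTRASTS))
print(len(variants))             # 8 * 2 * 3 * 3 = 144 variants per chip
```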
III-B Scanning and Spatial Clustering
We used the Neural Architecture Search Network (NASNet) [9] for training all DNN detectors, along with transfer learning from ImageNet weights. The NASNet DNN significantly outperformed the ResNet-101 DNN (Table II), which was the best performing SAM Site DNN detector evaluated in [8].

Table II. SAM Site DNN detector performance comparison.

| DNN | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|
| ResNet-101 | 99.0 | 98.8 | 96.4 | 96.4 |
| NASNet | 99.9 | 99.0 | 99.4 | 99.4 |
As in [8], images used for broad area search for SAM Sites in the SE China AOI comprised 66K 1280x1280 pixel tiles at 1 m GSD with 10% overlap between tiles. Individual tiles were scanned by generating 19.7M image chips with 75% overlap (25% stride) that were then input to the NASNet DNN. This produced a detection field, $\mathcal{D}$, of softmax outputs from the DNN. After thresholding at $\alpha = 0.9$, a greatly reduced detection field, $\mathcal{D}_\alpha$, is then used to produce an amplified spatial detection field, $\mathcal{A}$. The $\mathcal{A}$ is used to weight a spatial clustering of $\mathcal{D}_\alpha$ to produce mode clusters, $\mathcal{C}$, within a 300 m aperture radius, $R$ (see [8]). Cluster locations were then rank-ordered by summing the scores of all detections within a mode cluster to generate an initial set of "candidate" SAM Sites.

A new 1280x1280 pixel tile at 0.5 m GSD centered on each candidate SAM Site's cluster location was then used for all component DNN scans. Component scanning outputs were also spatially clustered to generate locations and cluster scores for each component object. An aperture radius of $R = 32$ m was used since this is approximately half the typical distance between SAM Site launch pads in China. An alpha cut was applied to generate distinct cluster locations for a given component relative to neighboring components that were present at each candidate SAM Site.
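For concreteness, the tile-scanning and alpha-cut step can be sketched as below. This is a minimal sketch, not the production pipeline; `model_predict` is a hypothetical stand-in for the trained NASNet inference call, and the chip/stride values follow the SAM Site scan parameters above:

```python
import numpy as np

# Minimal sketch of tile scanning with 75% chip overlap followed by an
# alpha-cut on the softmax score.
def scan_tile(tile: np.ndarray, model_predict, chip=299, overlap=0.75, alpha=0.9):
    stride = int(chip * (1 - overlap))              # 25% stride
    detections = []                                 # thresholded field D_alpha
    for y in range(0, tile.shape[0] - chip + 1, stride):
        for x in range(0, tile.shape[1] - chip + 1, stride):
            score = model_predict(tile[y:y + chip, x:x + chip])
            if score >= alpha:                      # keep only strong responses
                detections.append((y + chip // 2, x + chip // 2, score))
    return detections
```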
Likewise, in order to determine the thresholds used for the decision-theoretic approach (DTA) described in Section III-E, a training set of 1280x1280 pixel pseudo-candidate tiles was generated, centered about the known SAM Sites outside the SE China AOI, along with corresponding offset tile negatives. The same scans and processes performed for candidate tiles within the SE China AOI (described above) were used for the pseudo-candidate training dataset.
III-C Cluster Score Normalization and Truncation
Cluster scores from one object class to another are not necessarily comparable since they can result from objects with different physical sizes and corresponding R values. In addition, results generated from image tile scans with different DNN input chip size and/or GSD will have a variable spatial density. Since we wish to spatially fuse, and potentially weight, various component DNN detections, the cluster scores must be normalized to bring both the SAM Site and component detections into a common reference space. Here we use detection density, i.e. the number of detections per unit area, as the means to achieve a common reference space prior to spatially fusing the cluster scores from the candidate SAM Sites and their associated component detections.
III-C1 Normalization for a Single Detection Location
The amplified spatial detection field, $\mathcal{A}$, contains an intersected volume, $v_i$, for each detection, $d_i$, in $\mathcal{D}_\alpha$. $v_i$ is calculated as the weighted sum of scores of each $d_i$ with its neighboring detections, $d_j$. The weight is determined by the distance-decay function $f(\Delta_{ij})$, where $f(\Delta_{ij}) > 0$ for $\Delta_{ij} \le R$ and is $0$ otherwise. An approximate maximum intersection volume for a single detection can be calculated by integrating the truncated distance-decay function around a detection location. As mentioned above, $\Delta_{ij}$ and $R$ are normalized using detection density. Let $\rho = 1/\text{stride}$ and $R' = R\rho$, where stride is the image chip's scanning stride distance in meters, so that distances are expressed in units of detections rather than meters. The approximate max intersection volume for a single DNN detection can then be calculated as:
$$V_{max} \approx \int_0^{2\pi}\!\!\int_0^{R'} f(r)\, r\, dr\, d\theta = 2\pi \int_0^{R'} f(r)\, r\, dr \qquad (1)$$
III-C2 Max Cluster Score Truncation
The cluster score, $S_c$, should also be limited for normalization. In the previous algorithm from [8], the number of detections in a cluster was virtually unbounded. As a result, detections that were a large distance away from the cluster location could potentially contribute to $S_c$. Using the initial detections for each cluster as a base, detection locations with a haversine distance less than $R$ are given a weight of 1; otherwise the detection receives a weight of 0. The truncated $S_c$ is then a weighted sum of the DNN detection inference scores with their respective weights (Procedure 1). In Section III-D, we discuss the possibility of applying a negative penalty weight to all detections with a haversine distance greater than $R$.
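Since Procedure 1 itself did not survive extraction, the following is a minimal sketch of the truncated cluster score as described above; the function and variable names are illustrative:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def truncated_cluster_score(cluster_lat, cluster_lon, detections, R=32.0):
    """Weight-1 sum of inference scores within the aperture radius R;
    detections beyond R contribute 0 (cf. Procedure 1)."""
    return sum(score for lat, lon, score in detections
               if haversine_m(cluster_lat, cluster_lon, lat, lon) < R)
```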
III-C3 Approximate Max Cluster Score
The number of detections within $R$ can be seen as a Gauss Circle Problem. Thus, the max number of detections within the aperture area surrounding a cluster location can be approximated in terms of detection density as:
$$N_{max} \approx \pi R'^2 = \pi (R\rho)^2 \qquad (2)$$
Using equations (1) and (2), a normalizing cluster factor can be calculated as:
$$F_{norm} \approx \frac{1}{N_{max}\left(1 + V_{max}\right)} \qquad (3)$$
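A numeric illustration of these normalization terms under the stated SAM Site scan parameters (a 300 m aperture and a 25% stride of a 299-pixel chip at 1 m GSD) is given below; the linear decay `f` is an assumed placeholder, since the paper's exact distance-decay function is not recoverable from this extraction:

```python
import numpy as np

R = 300.0                         # aperture radius (m)
stride = 299 * 0.25               # detection spacing (m)
rho = 1.0 / stride                # detection density (detections/m)
R_prime = R * rho                 # aperture radius in detection units

N_max = np.pi * R_prime ** 2      # Eq. (2): ~50 detections fit in the aperture

f = lambda r: 1.0 - r / R_prime   # assumed decay, f(0)=1, f(R')=0
r = np.linspace(0.0, R_prime, 10_000)
V_max = 2 * np.pi * np.sum(f(r) * r) * (r[1] - r[0])   # Eq. (1), Riemann sum
print(f"N_max ~ {N_max:.1f}, V_max ~ {V_max:.1f}")
```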
III-D Over-Detection Penalty
In previous work we have observed FP hotspots, i.e. large numbers of spatially co-occurring false positive detections. In order to mitigate this potential problem, a penalty can be applied when computing $S_c$. As mentioned in Section III-C2, instead of using a weight of 0 when $\Delta > R$, a negative weight can be applied. We explored two types of penalty assignments. The first used a flat weight of -1. The second is similar to the distance-decay function, however the sign is changed to negative and the penalty increases exponentially in magnitude as $\Delta$ grows beyond $R$ (Fig. 3).
III-E Decision-Theoretic Approach for Optimization
In order to make discrete decisions, we used the DTA [11] advocated by Lewis [12], which computes thresholds based on the optimal prediction of a model to obtain the highest expected F-measure. In this study, decision thresholds were selected based on the optimization of the F1 score from features extracted from the pseudo-candidate training dataset (Fig. 4). Optimal F1 score thresholds were determined through empirical analysis and selected examples are provided (Table III).

Table III. Selected optimal F1 score thresholds by feature type.

| Feature Type | Empty LPs | Combo LPs | Missiles | TELs | TEL Groups |
|---|---|---|---|---|---|
| Cluster Count | 2 | 1 | 1 | 3 | 1 |
| Raw Count | 5 | 4 | 4 | 15 | 1 |
| Raw Max | 1.00000 | 0.99954 | 1.00000 | 1.00000 | 0.58236 |
| *Including Component Negatives* | | | | | |
| Cluster Count | n/a | n/a | 1 | 1 | 1 |
| Raw Count | n/a | n/a | 2 | 2 | 1 |
| Raw Max | n/a | n/a | 0.99989 | 0.98450 | 0.60315 |
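One simple way to realize the DTA threshold selection described above is an empirical sweep over candidate thresholds on the pseudo-candidate training scores. The sketch below optimizes F1 directly and is illustrative rather than the exact procedure of [11]:

```python
import numpy as np

def dta_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    """Sweep candidate thresholds over the training scores and keep the one
    maximizing F1 (labels are 0/1 ground truth)."""
    best_t, best_f1 = 0.0, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t
```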
IV Five-Fold Experiment Results
Five-fold cross validation experiments were performed for the training datasets in Table I. The results provided in Table IV show an average F1 score for the baseline dataset and for the dataset with component negatives. The decrease in F1 score for the DNNs with component negatives was anticipated given the inclusion of objects in the negative training data that were visually similar to the component object that a given DNN was trained to detect.
Table IV. Average five-fold cross-validation TPR, TNR, F1 score, and Standard Deviation (SD).

| Object Class | TPR (%) | TNR (%) | F1 (%) | SD |
|---|---|---|---|---|
| SAM Site | 99.00 | 99.75 | 99.39 | 1.06 |
| Empty LPs | 99.80 | 99.70 | 99.65 | 0.2 |
| Combo LPs | 99.74 | 99.74 | 99.74 | 0.15 |
| Missiles | 99.8 | 99.46 | 99.63 | 0.24 |
| TELs | 99.72 | 99.38 | 99.55 | 0.32 |
| TEL Groups | 99.41 | 99.43 | 99.42 | 0.21 |
| *Including Component Negatives* | | | | |
| Missiles | 97.42 | 99.66 | 98.52 | 0.72 |
| TELs | 97.51 | 99.37 | 98.42 | 0.5 |
| TEL Groups | 96.78 | 99.6 | 98.15 | 1.1 |

NASNet significantly outperformed ResNet-101 for scanning the SE China AOI for SAM Sites (Table V). This is consistent with the cross-validation results given in Table IV. NASNet had 44× fewer SAM Site detections after the 0.9 alpha-cut (Section III-B). Further, while both DNNs correctly detected all 16 known SAM Sites (i.e. TPs) in the SE China AOI, NASNet had 6× fewer clusters compared to ResNet-101, while the average TP cluster rank (Table V) was also 3× lower.
Table V. SE China AOI broad area scanning results.

| DNN Architecture & Post-Processing | Detection Count | Cluster Count | Avg TP Cluster Rank |
|---|---|---|---|
| ResNet-101 [8] | 93,000 | 2100 | 181.9 |
| NASNet | 2079 | 354 | 62.8 |
| NASNet w/ norm | 2079 | 354 | 62.8 |
| NASNet w/ norm and penalty | 2079 | 354 | 62.8 |
V Decision-Level Component Metric Fusion
This section describes the feature selection and fusion techniques used to reduce the number of candidate SAM Sites that could then be presented for human review in machine-assisted analytic workflows. An overview of the processing flow is provided in Fig. 5.

V-A Data Features
Five different feature types were used in [10] for decision-level fusion of component objects to improve the detection of construction sites. Here we tested feature types that use the F1 score optimization from [10]; these are the first three feature types listed below. We used the normalized cluster scores from the spatial clustering as an additional fourth feature type. To maintain consistency between techniques employed in this study, only inference responses within a 150 m radius of the candidate SAM Site location were used. The feature types that were evaluated were:
- Maximum inference response (confidence value) for each component
- Number of raw (pre-clustered) inference detections for each component
- Number of clusters produced for each component
- Sum of normalized cluster scores for each component
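A sketch of how these four feature types might be assembled into a per-candidate feature vector is given below; the dict layout and component names are assumptions for illustration, with only detections within the 150 m radius included upstream:

```python
import numpy as np

COMPONENTS = ("empty_lp", "combo_lp", "missile", "tel", "tel_group")

def candidate_features(candidate: dict) -> np.ndarray:
    feats = []
    for c in COMPONENTS:
        raw = candidate[c]["raw_scores"]            # pre-clustered inferences
        norm = candidate[c]["norm_cluster_scores"]  # one entry per cluster
        feats += [
            max(raw, default=0.0),   # 1) maximum inference response
            len(raw),                # 2) raw detection count
            len(norm),               # 3) cluster count
            sum(norm),               # 4) sum of normalized cluster scores
        ]
    return np.asarray(feats, dtype=float)
```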
V-B Decision-Level Fusion Techniques
Baseline results for the candidate SAM Site locations are first computed using only the spatial cluster outputs of the NASNet SAM Site detections. We then tested how each individual component would perform using the various feature types. SAM Site cluster scores were excluded because the pseudo-candidate training dataset was not generated through scanning and clustering. Consequently, some of the pseudo-candidates would have no cluster within a sufficient radius of the SAM Site center location.
Three data fusion techniques were tested:
- Decision Tree: A simple decision tree (see [10]) was used to combine the decisions generated for each component using DTA. However, unlike [10], this study does not use an alpha-cut threshold since this was part of the spatial clustering algorithm. Therefore, the decision tree simplifies to a digital logic OR gate with the DTA decisions as binary inputs (a minimal sketch follows this list).
- Multi-Layer Perceptron (MLP): A feature vector was created for each candidate SAM Site location and used as input for training and validation. The MLP architecture consisted of two fully connected hidden layers of 100 nodes. We also tested normalizing and bounding the features, based on the thresholds from DTA optimization (Section III-E), before they were used as input.
- ANFIS: A first-order Takagi-Sugeno-Kang (TSK) adaptive neuro-fuzzy inference system (ANFIS) [13][14][15] was utilized. The goal is to explore a neural encoding and subsequent optimization of expert knowledge input. Specifically, five IF-THEN rules were used whose IF components (aka rule firing strengths) were derived from the expert knowledge in the Decision Tree above. The consequent (i.e., ELSE) parameters of ANFIS were optimized via backpropagation [13]. The reader can refer to [16], [17], and [18] for an in-depth discussion of the mathematics, optimization, and robust possibilistic clustering-based initialization of ANFIS. Finally, the output decision threshold was chosen through DTA.
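The OR-gate simplification in the Decision Tree item can be sketched as follows, using the Cluster Count thresholds from Table III (no component negatives) for illustration; the component names are assumptions:

```python
# Each component's DTA decision is a binary input; the fused decision is
# their logical OR.
def or_gate_fusion(values: dict, thresholds: dict) -> bool:
    return any(values[c] >= thresholds[c] for c in thresholds)

thresholds = {"empty_lp": 2, "combo_lp": 1, "missile": 1, "tel": 3, "tel_group": 1}
values = {"empty_lp": 0, "combo_lp": 1, "missile": 0, "tel": 0, "tel_group": 0}
print(or_gate_fusion(values, thresholds))  # True: the Combo LP input fired
```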
The different Launch Pad detector types were tested independently and in combination during the fusion step with the other three component types (i.e. Missiles, TELs, and TEL Groups):
- Empty Launch Pads plus three (Empty LPs+3)
- Combined Launch Pads plus three (Combo LPs+3)
- Empty LPs and Combo LPs plus three (All 5)
V-C MLP Input Data Normalization
We found that the MLPs had some difficulty training with datasets that had larger values, so we used the common practice of linearly scaling and bounding to constrain the data to fall within the range $[0, 1]$. Let $\mathbf{x}_c$ be the vector of values over the entire dataset for component $c$ for a given feature, and let $t_c$ be the DTA threshold computed for component $c$; then the normalized and bounded vector $\hat{\mathbf{x}}_c$ can be defined as follows:
$$\hat{x}_{c,i} = \min\!\left(\frac{x_{c,i}}{2\,t_c},\; 1\right) \qquad (4)$$
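A minimal sketch of Eq. (4) as reconstructed above, which maps the DTA threshold to 0.5 and bounds the result at 1; the exact scaling used in the study may differ:

```python
import numpy as np

def normalize_feature(x: np.ndarray, t_c: float) -> np.ndarray:
    # Scale so the DTA threshold t_c maps to 0.5, then bound at 1.
    return np.minimum(x / (2.0 * t_c), 1.0)

print(normalize_feature(np.array([0.0, 2.0, 4.0, 10.0]), t_c=2.0))
# [0.  0.5 1.  1. ]
```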
V-D Results & Observations
Over 200 different combinations of data feature types, component combinations, and fusion techniques were tested in this study to improve the detection of candidate SAM Sites.
Evaluation of the F1 score improvements (Table VI) shows that decision-level component fusion can reduce the relative error rate by up to 96.75%. It was somewhat surprising that the Raw Count feature generated five out of the top six best results. Although Combo LPs were only able to generate an F1 score of 68.4% using DTA, the neural approaches (MLP and ANFIS) were able to do slightly better using multiple components; the top result fused all 5 components in an MLP to yield an F1 score of 71.4%. Comparisons of F1 scores for different feature types and fusion techniques can be found in Fig. 6.
However, when performing a broad area search for a very rare object (low geographic occurrence rate), it is often desirable to sacrifice some error reduction in order to achieve a higher TPR. The results in Table VII show that the highest F1 score while achieving a TPR of 100% is 45.1%. Although this F1 score is less than half of the maximum in Table VI, this technique still achieved an 88.5% relative error reduction compared to the baseline (no component fusion) results for the candidate SAM Site locations within the SE China AOI. These scores were produced using Cluster Count features and the All 5 component combination as inputs to a simple MLP. It is also worth noting that four of the top five scores used the Empty LPs+3 component combination. Comparisons of TPRs for different feature types and fusion techniques can be found in Fig. 7.
It was also observed that cluster score truncation and normalization improved the F1 scores for DTA when fusing multiple component detectors. However, the introduction of the negative score penalty did not improve the scores further (Fig. 8), while introducing expert weighting (described in Section VI-A) also showed no improvement in the F1 scores.
Table VI. Top component fusion results ranked by F1 score.

| Components | Feature Type | Processing Technique | Component Negatives | TP | FP | TPR (Recall) | PPV (Precision) | F1 score | Errors/km² (×10⁻³) | Relative Error Reduction |
|---|---|---|---|---|---|---|---|---|---|---|
| SAM Sites | BASELINE (no components) | n/a | n/a | 16 | 338 | 100.00% | 4.52% | 8.65% | 3.080 | n/a |
| All 5 | Raw Counts | MLP | No | 15 | 11 | 93.75% | 57.69% | 71.43% | 0.109 | 96.45% |
| Combo LPs+3 | Raw Counts | ANFIS | No | 13 | 8 | 81.25% | 61.90% | 70.27% | 0.100 | 96.75% |
| Combo LPs+3 | Raw Counts | MLP | No | 14 | 10 | 87.50% | 58.33% | 70.00% | 0.109 | 96.45% |
| Combo LPs | Cluster Count | DTA | n/a | 13 | 9 | 81.25% | 59.09% | 68.42% | 0.109 | 96.45% |
| Combo LPs | Raw Count | DTA | n/a | 13 | 9 | 81.25% | 59.09% | 68.42% | 0.109 | 96.45% |
| All 5 | Raw Count | ANFIS | No | 13 | 9 | 81.25% | 59.09% | 68.42% | 0.109 | 96.45% |
Table VII. Top component fusion results with 100% TPR preserved.

| Components | Feature Type | Processing Technique | Component Negatives | TP | FP | TPR (Recall) | PPV (Precision) | F1 score | Errors/km² (×10⁻³) | Relative Error Reduction |
|---|---|---|---|---|---|---|---|---|---|---|
| SAM Sites | BASELINE (no components) | n/a | n/a | 16 | 338 | 100% | 4.52% | 8.65% | 3.080 | n/a |
| All 5 | Cluster Count | MLP | No | 16 | 39 | 100% | 29.09% | 45.07% | 0.355 | 88.46% |
| Empty LPs+3 | Cluster Count | MLP (Normalized) | No | 16 | 43 | 100% | 27.12% | 42.67% | 0.392 | 87.28% |
| Empty LPs+3 | Cluster Count | MLP | No | 16 | 45 | 100% | 26.23% | 41.56% | 0.410 | 86.69% |
| Empty LPs+3 | Raw Count | MLP (Normalized) | Yes | 16 | 48 | 100% | 25.00% | 40.00% | 0.437 | 85.80% |
| Empty LPs+3 | Raw Count | MLP | No | 16 | 50 | 100% | 24.24% | 39.02% | 0.456 | 85.21% |
Additionally, there was a general improvement in F1 scores for models trained with component negatives; however, these improvements came at a sacrifice in TPR, and such models appear only once in Tables VI and VII. This can be interpreted as ambiguity being introduced to the dataset by essentially asking the detector to ignore the background (i.e. the Launch Pad) and focus on the smaller component.
VI Component Metric Fusion for Improving Candidate SAM Site Rankings
This section discusses the techniques, observations, and results used to re-rank candidate SAM Sites for utilization in machine-assisted human analytic workflows. The objective is to utilize the component detections to re-rank the candidate SAM Sites such that true SAM Sites appear higher in a rank-ordered list relative to a baseline ranking derived only from the candidate SAM Sites' cluster scores (Table VIII). An overview of the processing flow is given in Fig. 9.
VI-A Candidate Site and Component Score Spatial Fusion
Normalized cluster scores for candidate SAM Sites and all components found within $R$ are summed using uniform or human-expert-provided weights (Fig. 9). Expert weights were only used when fusing all four components with the corresponding candidate SAM Site. The weights were: 4 for Launch Pads, 2 for TEL Groups, and 1 for Missiles, TELs, and SAM Sites.
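A minimal sketch of this weighted spatial fusion, using the stated expert weights; the class names and dict layout are illustrative:

```python
# Normalized cluster scores for the SAM Site and all components within R
# are summed under expert (or uniform) weights to produce the ranking score.
EXPERT_WEIGHTS = {"sam_site": 1, "launch_pad": 4, "tel_group": 2,
                  "missile": 1, "tel": 1}

def fused_rank_score(norm_scores: dict, weights: dict = EXPERT_WEIGHTS) -> float:
    """norm_scores maps object class -> summed normalized cluster score
    within the aperture radius R of the candidate location."""
    return sum(weights[k] * v for k, v in norm_scores.items())
```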
VI-B Results & Observations
The TEL detector rendered the most improvement in the average cluster rank of known SAM Sites (TPs) compared to fusion with any other single component detector (Table VIII). This, coupled with the Combo LPs detector and the other component detectors fused with expert weighting (Section VI-A), improved the average cluster rank of known SAM Sites (TPs) to 15.9 (Table IX). This is 4× better than the average rank for SAM Sites without spatial fusion of the component object cluster scores.
We observed that the addition of normalization and penalty had no detectable impact on the known SAM Site TP average cluster rank. This indicates minimal FP presence and/or uniformly distributed FP noise within the candidate SAM Site locations generated by the spatial clustering algorithm.

Component negative models improved the ranking results compared to the SAM Site score alone, but not as well as models trained without component negatives. Again, this can be interpreted as ambiguity being introduced to the dataset by essentially asking the detector to ignore the background (i.e. the Launch Pad) and focus on the smaller component.
Table VIII. Avg TP cluster rank with single component fusion.

| DNN & Post-Processing | SAM Site Only | Empty LPs | Combo LPs | Missiles | TELs | TEL Groups |
|---|---|---|---|---|---|---|
| ResNet-101 [8] | 139.9 | n/a | n/a | n/a | n/a | n/a |
| NASNet | 62.8 | 36.4 | 40.8 | 43.0 | 28.0 | 46.1 |
| w/ Norm | 62.8 | 34.3 | 34.4 | 43.6 | 28.1 | 47.3 |
| w/ Norm & Penalty | 62.8 | 34.0 | 34.3 | 43.6 | 27.9 | 47.1 |
| *Including Component Negatives* | | | | | | |
| w/ Norm | n/a | n/a | n/a | 79.1 | 28.8 | 51.6 |
| w/ Norm & Penalty | n/a | n/a | n/a | 79.1 | 28.7 | 51.48 |
Table IX. Avg TP cluster rank with fusion of all four components (unweighted vs. expert-weighted).

| DNN & Post-Processing | Empty LPs+3, Unweighted | Empty LPs+3, Weighted | Combo LPs+3, Unweighted | Combo LPs+3, Weighted |
|---|---|---|---|---|
| NASNet | 26.3 | 21.4 | 25.3 | 22.9 |
| w/ Norm | 20.3 | 22.9 | 17.9 | 15.9 |
| w/ Norm & Penalty | 19.9 | 22.5 | 17.8 | 16.0 |
| *Including Component Negatives* | | | | |
| w/ Norm | 24.8 | 24.9 | 18.1 | 16.8 |
| w/ Norm & Penalty | 24.1 | 24.9 | 18.1 | 16.8 |
VII Conclusion and Future Work
This study extended the work in [8] where a combination of a DNN scanning and spatial clustering was used to perform a machine-assisted broad area search and detection of SAM Sites in a SE China AOI of 90,000 km2.
Here we significantly improved upon this prior study by using multiple DNNs to detect smaller component objects (e.g. Launch Pads, TELs, etc.) belonging to the larger and more complex SAM Site feature. Scores computed from an enhanced spatial clustering algorithm were normalized to a reference space so that they were independent of image resolution and DNN input chip size. DNN detections from the multiple component objects were then fused to improve the final detection and retrieval (ranking) of candidate SAM Sites. Key results from this effort include:
- Decision-level fusion of the component object detections reduced the overall error rate by 85% while still preserving a 100% TPR (Table VII).
In future work we plan to A) apply this approach to a variety of other challenging object search and detection problems in large-scale remote sensing image datasets, B) investigate data-driven optimization of the component fusion weights and compare performance vs. human-expert provided weights, C) extend this approach to include fusion of multi-temporal DNN detections, D) extend this approach to include fusion of multi-source DNN detectors applied to high-resolution EO/MS and SAR imagery, and E) explore how to use more sophisticated fusion techniques (similar to ANFIS) to maintain TPR while achieving even higher error reduction.
References
- [1] G. J. Scott, K. C. Hagan, R. A. Marcum, J. A. Hurt, D. T. Anderson, and C. H. Davis, “Enhanced fusion of deep neural networks for classification of benchmark high-resolution image datasets,” IEEE Geoscience & Remote Sensing Letters, Vol. 15, No. 9, pp. 1451-1455, 2018, DOI: 10.1109/LGRS.2018.2839092.
- [2] J. A. Hurt, G. J. Scott, D. T. Anderson, and C. H. Davis, "Benchmark meta-dataset of high-resolution remote sensing imagery for training robust deep learning models in machine-assisted visual analytics," 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 9-11 October, 2018.
- [3] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, "Training deep convolutional neural networks for land cover classification of high-resolution imagery," IEEE Geoscience & Remote Sensing Letters, Vol. 14, No. 4, 2017, pp. 549-553, DOI: 10.1109/LGRS.2017.2657778.
- [4] G. J. Scott, R. A. Marcum, C. H. Davis, and T. W. Nivin, "Fusion of deep convolutional neural networks for land cover classification of high-resolution imagery," IEEE Geoscience & Remote Sensing Letters, Vol. 14, No. 9, 2017, pp. 1638-1642, DOI: 10.1109/LGRS.2017.2722988.
- [5] Y. Yang and S. Newsam, “Bag-of-visual words and spatial extensions for land-use classification,” Proc. ACM SIGSPATIAL Int. Conf. Adv. Geogr. Inf. Syst., 2010, pp. 270–279.
- [6] G. Sheng, W. Yang, T. Xu, and H. Sun, "High-resolution satellite scene classification using a sparse coding based multiple feature combination," International Journal of Remote Sensing, Vol. 33, No. 8, 2012, pp. 2395-2412.
- [7] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, Vol. 105, No. 10, 2017, pp. 1865-1883, DOI: 10.1109/JPROC.2017.2675998.
- [8] R. A. Marcum, C. H. Davis, G. J. Scott, and T. W. Nivin, “Rapid broad area search and detection of Chinese surface-to-air missile sites using deep convolutional neural networks,” Journal of Applied Remote Sensing, Vol. 11, No. 4, 042614, 2017, DOI: 10.1117/1.JRS.11.042614.
- [9] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," arXiv preprint arXiv:1707.07012, 2017.
- [10] A. B. Cannaday II, R. L. Chastain, J. A. Hurt, C. H. Davis, G. J. Scott, and A. J. Maltenfort, "Decision-level fusion of DNN outputs for improving feature detection performance on large-scale remote sensing image datasets," 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp. 5428-5436, DOI: 10.1109/BigData47090.2019.9006502.
- [11] N. Ye, K. M. A. Chai, W. S. Lee, and H. L. Chieu, "Optimizing F-measures: a tale of two approaches," Proceedings of the International Conference on Machine Learning, 2012.
- [12] D. D. Lewis, "Evaluating and optimizing autonomous text classification systems," in SIGIR, 1995, pp. 246-254.
- [13] J.-S. R. Jang, "ANFIS: Adaptive-Network-based Fuzzy Inference System," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 23, No. 3, 1993, pp. 665-685, DOI: 10.1109/21.256541.
- [14] A. Abraham, "Adaptation of fuzzy inference system using neural learning," in N. Nedjah and L. de Macedo Mourelle (eds.), Fuzzy Systems Engineering: Theory and Practice, Studies in Fuzziness and Soft Computing, Vol. 181, Germany: Springer Verlag, 2005, pp. 53-83.
- [15] D. Karaboga and E. Kaya, "Adaptive network based fuzzy inference system (ANFIS) training approaches: a comprehensive survey," Artificial Intelligence Review, 2018, DOI: 10.1007/s10462-017-9610-2.
- [16] B. Ruprecht, C. Veal, B. Murray, M. Islam, D. Anderson, F. Petry, J. Keller, G. Scott, and C. Davis, "Fuzzy logic-based fusion of deep learners in remote sensing," IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2019.
- [17] B. Ruprecht, C. Veal, A. Cannaday, D. Anderson, F. Petry, J. Keller, G. Scott, C. Davis, C. Northworthy, K. Nock, and E. Glimour, "Are neural fuzzy logic systems really explainable and interpretable?," SPIE Security and Defense, 2020.
- [18] B. Ruprecht, C. Veal, B. Murray, M. Islam, D. Anderson, F. Petry, J. Keller, G. Scott, and C. Davis, "Possibilistic clustering enabled neuro fuzzy logic," under review, World Congress on Computational Intelligence, 2020.