Deep Multiple Instance Learning for Airplane Detection in High Resolution Imagery

Automatic airplane detection in aerial imagery has a variety of applications. Two of the major challenges in this area are variations in the scale and direction of the airplanes. To address these challenges, we present a rotation-and-scale invariant airplane proposal generator. This proposal generator is based on the symmetric and regular boundaries of airplanes seen from the top view, which we call symmetric line segments (SLS). The generated proposals are then used to train a deep convolutional neural network that removes non-airplane proposals. Since each airplane can have multiple SLS proposals, some of which are not aligned with the fuselage, we collect all proposals corresponding to one ground truth as a positive bag and the others as negative instances. To obtain multiple instance deep learning, we modify the training approach of the network to learn at least one instance from each positive bag as well as all negative instances. Finally, we employ non-maximum suppression to remove duplicate detections. Our experiments on the NWPU VHR-10 dataset show that our method is a promising approach for automatic airplane detection in very high resolution images. Moreover, as an additional capability, the proposed algorithm can estimate the direction of the airplanes using only box-level annotations.




1 Introduction

Remote sensing (RS), as a contactless technique for information collection, is used in a wide range of civil, agricultural, and military applications bai2014vhr ; seelan2003remote ; pelton2017handbook . Since the beginning of earth observation from space, many satellites have been launched and used successfully in remote sensing applications. With the development of very high resolution (VHR) imaging equipment, the resolution of the available images has increased in the spatial, spectral, and temporal domains. Given these large and valuable data, automatic analysis of VHR images is of increasing interest.

Automatic object detection in VHR images plays an important role in a wide range of applications and has received significant attention in recent years cheng2016survey . In this paper, we focus on the detection of airplanes in VHR images. Although airplane detection has been studied for many years, it is still a challenging problem because of the complex and cluttered background, airplane appearance and shape variations, the different resolutions of satellite images, and the arbitrary rotation of airplanes.

In order to obtain a rotation-and-scale invariant airplane detection system, we employ the common characteristics of airplanes. An airplane is a man-made object that is seen symmetrically from the top view (Fig. 1). In addition, the boundary of each airplane is a regular shape that can be approximated by a chain of line segments. According to these characteristics, we develop an algorithm to generate proposals based on the symmetric line segments (SLS).

Figure 1: Outline drawing airplane in a flat style (top view)
Figure 2: Schematic view of the proposed airplane detection approach

Although the SLS algorithm can detect almost all airplanes, there are many other regions in VHR images that contain symmetric line segments. Therefore, it is required to refine the generated proposals. In recent years, convolutional neural networks (CNNs) have achieved remarkable results in a wide range of computer vision applications krizhevsky2012imagenet ; simonyan2014very ; szegedy2015going ; he2016deep ; huang2017densely . In this paper, we use a CNN to separate the airplane candidates from the others.

To train a deep convolutional neural network, a significant number of instances with the desired labels is required. A question then arises: how should a symmetric line segment be labeled? As can be seen in Fig. 1, an airplane can have several SLS proposals, some of which are not symmetric about the fuselage. Therefore, we consider all proposals that have significant overlap with an airplane as a positive bag, in which classifying one of them as positive is sufficient. This problem is known as multiple instance learning (MIL) in the machine learning literature. In this paper, we modify the training process of the CNN in order to obtain a MIL algorithm.

In the test phase, we use non-maximum suppression (NMS) after the CNN to eliminate redundant detections. The combination of the proposed deep multiple instance learning and non-maximum suppression leads to detecting the most common SLS among airplanes. As will be shown in the experiments, the most common SLS is formed by the two wing line segments that are symmetrical about the fuselage. As a result, the direction of the airplanes can be estimated using box-level annotations. The schematic view of the proposed approach is shown in Fig. 2.

The contributions of this paper can be summarized as follows: 1) We introduce a novel proposal generation algorithm for airplanes called symmetric line segments (SLS). 2) We formulate the CNN training process as a multiple instance learning problem. 3) We estimate the airplane direction using only box-level annotations. 4) We validate our framework on the NWPU VHR-10 dataset cheng2014multi .

The rest of this paper is organized as follows. A brief review of the existing methods for airplane detection is presented in Section 2. Our proposed approach is presented in Section 3. The experimental results on the NWPU VHR-10 dataset are reported in Section 4, and finally the paper is concluded in Section 5.

2 Related Works

Generally, object detection algorithms consist of three main modules: proposal generation, feature extraction, and classification. We discuss these modules as follows.

2.1 Proposal Generation

The goal of this module is to generate a pool of proposal candidates, some of which correspond to the desired objects. Proposal generation is one of the main differences among the published algorithms, and its precision directly affects the next steps.

The simplest proposal generator is the sliding window, used in many studies such as sun2012automatic ; redmon2016you ; liu2016ssd . A sliding window is a rectangular region of fixed width and height that slides across an image. The number of proposals generated by this approach is very high, especially if we want to support different scales and aspect ratios.

To reduce the number of generated proposals and make them more meaningful, several alternative algorithms have been proposed. Image segmentation is used in bo2010region to combine neighboring pixels and generate homogeneous regions as the proposals. In order to manage objects of different dimensions, multiple segmentations are used in li2012automatic . Moreover, there are many general proposal generators such as Edge Boxes zitnick2014edge , Selective Search uijlings2013selective , BING cheng2014bing , and RIGOR humayun2014rigor . For a more in-depth survey of general proposal generators, we refer the readers to hosang2016makes .

Although the general proposal generators are also useful for airplane detection, the special geometry of airplanes has led to the development of some dedicated proposal generators. The circle-frequency filter (CFF) is the most famous airplane proposal generator; it was introduced in kawato2001circle and then used in many works such as cai2006airplane ; gao2013aircraft ; an2014automated ; zhang2015unsupervised . CFF is based on the property that an airplane has two wings and a long fuselage and is symmetrical about the fuselage. Therefore, if an array of pixels is extracted along a circle with a proper center and a proper radius, the array approximates a sine curve with period 4. Of course, this property does not hold for airplanes whose wings and fuselage have different colors.
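The circle-frequency idea can be sketched as follows: sample the image along a circle and measure the strength of the component that oscillates four times per revolution (the two wings, nose, and tail crossing the circle). This is only an illustrative simplification, not the exact formulation of kawato2001circle; the function name and parameters are hypothetical.

```python
import math

def circle_frequency_response(image, cx, cy, radius, n_samples=64):
    """Magnitude of the 4-cycles-per-revolution Fourier component of the
    pixel values sampled along a circle centered at (cx, cy)."""
    re = im = 0.0
    for k in range(n_samples):
        theta = 2.0 * math.pi * k / n_samples
        x = int(round(cx + radius * math.cos(theta)))
        y = int(round(cy + radius * math.sin(theta)))
        v = image[y][x]
        # Correlate the sampled values with frequency 4.
        re += v * math.cos(4 * theta)
        im += v * math.sin(4 * theta)
    return math.hypot(re, im) / n_samples
```

An airplane-like pattern whose intensity along the circle follows cos(4*theta) yields a strong response, while a flat background yields a response near zero.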

In this paper, we will propose a novel special proposal generator for airplanes based on the symmetric line segments.

Figure 3: Proposal generation based on the symmetric line segments, a) original image, b) line segments detected by LSD, c) line segments nearby a selected line segment, and d) line segments that are symmetrical enough to the reference line segment

2.2 Feature Extraction

The number of false alarms produced by a proposal generator, even a dedicated one, is usually far more than acceptable. Thus, it is required to eliminate the undesired proposals with a supervised algorithm. In this module, each proposal is represented by a discriminative feature vector. Various geometrical and textural feature extraction methods have been used in previous works; some of them are reviewed here.

The boundary of an airplane in top view is quite distinct from other objects, so simple shape descriptors can be good features if the boundaries of the objects are extracted carefully. In hosomura2010airplane , simple geometrical features such as area, perimeter, roundness, and aspect ratio are extracted from the boundaries. More meaningful geometrical features are used in inglada2007automatic . However, accurate boundary extraction is still a challenging problem. Textural features are used more frequently in this field. The histogram of oriented gradients (HOG) was introduced in dalal2005histograms and used in many airplane detection studies such as an2014automated ; cheng2013object ; cheng2016object . Bag-of-words (BoW) is another useful feature extractor, established in bai2014vhr for airplane detection. Gabor filters, local binary patterns (LBP), and Haar-like features are some other common feature extractors; a review of them is collected in kumar2014detailed .

Despite the progress made in the design of engineered features, their performance has plateaued in recent years. In contrast, feature learning algorithms become more popular every day. Feature learning is a set of methods to automatically discover, from raw data, the representations needed for detection or classification. Convolutional neural networks lecun1998gradient ; szegedy2015going ; howard2017mobilenets are among the most successful feature learning algorithms for images and are used in zhong2018multi ; deng2018multi ; zou2018random for airplane detection.

2.3 Classification

After extracting the proposals and representing them by feature vectors, the last module is decision making, separating the objects with a classifier. Similar to the feature extraction step, neural networks are widely used as the classifier in airplane detection studies wu2015fast ; zhang2016weakly ; yang2017m ; xu2017deformable . The other popular classifier is the support vector machine (SVM), employed in li2011saliency ; cheng2013object ; bai2014vhr ; cheng2014scalable ; cheng2016learning .

3 Proposed Approach

The schematic view of the proposed approach is shown in Fig. 2. As can be observed, this approach consists of three modules discussed in the following sections.

3.1 Proposal Generation

Figure 4: Measuring the symmetry degree of two line segments

Airplanes are man-made objects whose boundary, seen from the top view, is regular and symmetrical and can be approximated by a chain of line segments. Based on these properties, we propose a novel proposal generator for airplanes called symmetric line segments (SLS). The stages of SLS are shown in Fig. 3. The first stage of SLS is line detection. The Hough transform duda1972use is a widely used tool for line detection. However, the gradient direction is not considered in the Hough transform, and for this reason it cannot detect small and noisy line segments. On the other hand, the line segment detector (LSD) von2010lsd is a newer tool that considers the gradient direction. The line segments detected by LSD are drawn in Fig. 3.

The next stage is to extract pairs of line segments that can belong to an airplane. Desired pairs of line segments have close endpoints and are symmetric with respect to the main axis of the airplane (i.e., the fuselage direction). Thus, for a candidate line segment (the blue line segment in Fig. 3), we first select the other line segments that have at least one endpoint within a limited distance of the endpoints of the candidate line segment. Then, we omit the pairs that are not sufficiently symmetrical (Fig. 3). The symmetry of two line segments is quantified as follows.

Figure 5: Sample proposals generated by the SLS method

Let us consider p1 and p2 as the endpoints of the first line segment, and similarly q1 and q2 for the second one. An essential property of LSD is that it specifies the line segment direction based on the gradient direction. Thus, for an appropriate pair, q1 and q2 should be the mirrors of p1 and p2, respectively. The symmetry axis for these line segments is defined as the line that passes through the midpoints of the pairs (p1, q1) and (p2, q2). Then, the line segments are mirrored about the symmetry axis, and the mirrored endpoints are called p1', p2', q1', and q2', respectively. A symbolic display of this concept is shown in Fig. 4. Finally, the relative Euclidean distance of the original and mirrored endpoints is taken as the measure of symmetry:

S = (||p1 - q1'|| + ||p2 - q2'||) / (||p1 - p2|| + ||q1 - q2||),

where ||.|| is the L2 norm (Euclidean distance). Pairs whose symmetry measure S is below a threshold are retained (Fig. 3). As can be observed, one line segment may be present in more than one proposal or in none of them.
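This symmetry check can be sketched in code for two segments with endpoints p1, p2 and q1, q2. It is a sketch under two stated assumptions (the axis passes through the midpoints of corresponding endpoints, and the distance is normalized by the segment lengths); the paper's exact formula may differ.

```python
import math

def reflect(p, a, b):
    """Mirror point p about the line through points a and b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    d2 = dx * dx + dy * dy
    # Projection of (p - a) onto the axis direction gives the foot
    # of the perpendicular; the mirror is twice as far along it.
    t = ((px - ax) * dx + (py - ay) * dy) / d2
    fx, fy = ax + t * dx, ay + t * dy
    return (2 * fx - px, 2 * fy - py)

def symmetry_measure(p1, p2, q1, q2):
    """Relative distance between one segment's endpoints and the mirror
    of the other segment's endpoints; 0 means perfect symmetry."""
    # Assumed symmetry axis: line through midpoints of (p1, q1), (p2, q2).
    m1 = ((p1[0] + q1[0]) / 2, (p1[1] + q1[1]) / 2)
    m2 = ((p2[0] + q2[0]) / 2, (p2[1] + q2[1]) / 2)
    q1m, q2m = reflect(q1, m1, m2), reflect(q2, m1, m2)
    num = math.dist(p1, q1m) + math.dist(p2, q2m)
    den = math.dist(p1, p2) + math.dist(q1, q2)
    return num / den
```

For a pair mirrored exactly about a vertical axis the measure is zero, and it grows as the pair becomes less symmetric.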

This proposal generator is rotation-and-scale invariant, so it can be used effectively on aerial imagery with different resolutions. Some positive and negative proposals are shown in Fig. 5.

3.2 Feature Extraction and Classification

Figure 6: Sample cropped and resized proposals

To extract rotation-and-scale invariant representations from the SLS proposals, we crop a rotated square for each proposal whose center, side length, and orientation are computed from the endpoints of the two line segments. Then, each crop is resized to a fixed size; some samples are depicted in Fig. 6. To train the CNN, one needs to assign a label to each proposal. We use the intersection over union (IoU) measure shown in Fig. 7. The intersection and union regions of the rectangles of Fig. 7 are shown with red and cyan colors, respectively. The intersection area divided by the union area yields a normalized quantity known as IoU. We label each proposal by its IoU with the ground truth: proposals with high IoU are labeled +1, proposals with low IoU are labeled 0, and intermediate cases are labeled -1.

Figure 7: Computation of intersection over union (IoU)

In the training phase, labels +1 and 0 are used as the airplane and other-object classes, respectively. Label -1 is not used in the training phase. Some proposals with the corresponding labels are shown in Fig. 8.

Figure 8: Labeling proposals based on the intersection over union with the ground truth bounding box (blue: ground truth, green: label +1, yellow: label -1, red: label 0)

Another point in the training phase is the possibility of having more than one proposal for each airplane. Moreover, due to the weak labeling of the targets (box-level annotations), some of the proposals with label +1 may be undesirable, in that their symmetry axis is not the same as the main axis of the airplane. Therefore, it is inevitable to let the CNN misclassify some of the proposals with label +1. For this purpose, we propose the following training strategy.

All proposals highly overlapped with one ground truth bounding box are considered as a unique positive bag. On the other hand, all proposals with label 0 are considered as negative samples. In each iteration, a batch is formed by either some negative samples or one positive bag (we call these negative iterations and positive iterations, and they are applied alternately). In a negative iteration, all of the samples are used to update the network parameters based on the following focal loss:



FL(p_t) = -(1 - p_t)^gamma * log(p_t), with p_t = p for airplane samples and p_t = 1 - p otherwise,

where p is the model's estimated probability for the airplane class. We used the focal loss defined in lin2017focal due to the large number of simple negative samples. On the other hand, in a positive iteration, at least one sample should be classified as an airplane. Thus, we define the following loss for the instances of a positive bag:

L_i = CE_i if CE_i <= alpha * CE_min, and L_i = 0 otherwise,

where CE_i is the cross-entropy loss of instance i, CE_min is the minimum cross-entropy loss over the instances of this bag, and alpha is a constant. In other words, we bypass the instances with a relatively large loss from training. By this approach, in the first iterations, when the model has not yet converged, CE_min is large and consequently almost all instances in a positive bag are involved in the training. As the model converges, CE_min decreases and the undesired positive instances are removed from the training. As reported in Section 4, the trained model rejects about 55% of the positive instances but less than 2% of positive bags.
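The positive-bag rule can be sketched as follows on per-instance cross-entropy losses. The gating constant `alpha` is a placeholder (its value is not fixed in this excerpt), so this is a sketch of the idea rather than the paper's exact rule.

```python
def positive_bag_losses(instance_losses, alpha=2.0):
    """Keep (for back-propagation) only the instances of a positive bag
    whose cross-entropy loss is within a factor alpha of the bag minimum;
    the remaining, presumably mis-oriented, proposals are bypassed
    (their loss is zeroed so they contribute no gradient)."""
    l_min = min(instance_losses)
    return [l if l <= alpha * l_min else 0.0 for l in instance_losses]
```

Early in training, when all losses are large and similar, every instance passes the gate; as the bag minimum shrinks, only the instances the network agrees with keep contributing.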

3.3 Non-Maximum Suppression

In the test phase, all generated proposals are fed to the network and the estimated scores are computed. However, as mentioned in the previous section, several positive proposals may overlap heavily. Thus, it is required to merge the results. For this purpose, we select the detected square with the highest score and remove all other detections that have an IoU greater than 0.4 with it. Then, the next highest-scoring proposal is selected to remove its similar squares, and this procedure is performed iteratively.
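The greedy procedure above can be sketched as follows (an IoU helper on axis-aligned boxes is included to keep the sketch self-contained; the paper applies the same idea to rotated squares):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(detections, iou_threshold=0.4):
    """detections: list of (box, score). Greedily keep the highest-scoring
    box and drop every remaining box overlapping it by IoU > threshold."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(d[0], best[0]) <= iou_threshold]
    return kept
```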

The non-maximum suppression (NMS) approach is shown in Fig. 9: the squares detected by the network, the highest-scoring square (drawn in green) together with the squares it removes (drawn in red), and the next iteration.

Figure 9: Non-maximum suppression procedure

4 Experimental Validations

In this section, we first review the dataset used in our experiments. Then, the implementation details and the evaluation method are presented. Finally, the results are reported quantitatively and qualitatively.

4.1 Dataset

To evaluate the performance of the proposed approach for airplane detection in VHR remote sensing imagery, we use the publicly available NWPU VHR-10 dataset cheng2014multi . NWPU VHR-10 is a collection of 800 very high resolution optical remote sensing images with objects of 10 classes annotated at box level. These classes are airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, and vehicle. In this paper, we focus on the detection of airplanes, which appear in 90 images with 757 instances.

Two of the challenges in this dataset are the variety of scale and direction of the airplanes. In aerial images, the direction of objects is not controllable, and the algorithm should be rotation-invariant. Moreover, the size of the airplanes and the resolution of the imaging systems may differ. In the NWPU VHR-10 dataset, 715 color images were acquired from Google Earth with spatial resolutions ranging from 0.5 to 2 m, and 85 pansharpened color infrared images were acquired from the Vaihingen data with a spatial resolution of 0.08 m. In order to show the variation in the length of the airplanes, we approximate the length of each airplane by the geometric mean of its bounding box width and height. The histogram of the approximated lengths of the 757 airplanes is plotted in Fig. 10. The minimum and maximum lengths are 30 and 126 pixels, a ratio of 4.2.

Figure 10: Histogram of the airplanes lengths in NWPU VHR-10 dataset

4.2 Implementation Details

Figure 11: The convolutional neural network architecture used for feature extraction and classification

The proposed approach consists of three modules, depicted in Fig. 2. The first module is the SLS proposal generator. In SLS, we use the LSD algorithm for line segment detection, which works only on gray-scale images. Hence, it is required to convert the RGB images to gray-scale, but what is the best RGB-to-gray conversion? In RGB-to-gray conversion, 2^24 colors are mapped to only 256 numbers (on average, every 65,536 colors are mapped to the same number). Therefore, some boundaries clearly visible in the RGB image may disappear in the new gray space. For this reason, we run LSD and the subsequent extraction of symmetric line segments on each of the three channels individually and then pool all the proposals. This increases the number of proposals (roughly 3 times) and, of course, the recall value.

The second module is the supervised classification of the proposals. As mentioned, we use a deep convolutional neural network for feature extraction and classification. In recent years, several architectures have been proposed for classification tasks. We employ the convolutional part of the VGG16 architecture simonyan2014very for feature extraction, followed by three fully connected layers with 256, 256, and 2 neurons (Fig. 11). As is well known, 757 positive bags are insufficient for training such a network with millions of trainable parameters. So, we use the convolutional layers of VGG16 trained on the ImageNet dataset, which contains more than 14 million images. Then, the pre-trained convolutional layers and the randomly initialized fully connected layers are fine-tuned for the airplane detection domain. For implementation, we use the Keras API chollet2015keras and the Adam optimizer kingma2014adam .

Non-maximum suppression is the third module used only in the test phase. In this module, an iterative procedure is done in which the best proposal with the maximum score is selected and the overlapped proposals with IoU greater than 0.4 are removed.

4.3 Evaluation Method

To evaluate the performance of the proposed approach, we use the 3-fold cross-validation method. In this validation method, the dataset is split into three folds, and three independent experiments are done in which two folds are used for training the model and the held-out fold is used for validation. However, since objects in an aerial image have similar resolution, brightness, angle of view, and geographic characteristics, folding is done at the image level to achieve fair results. In other words, each fold consists of all proposals generated from one-third of the images (i.e., 30 images). This allows us to investigate how well our system works on new, unseen images.
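Image-level folding can be sketched as follows (function name and seed are illustrative): whole images, not individual proposals, are assigned to folds, so all proposals from one image land in the same fold.

```python
import random

def image_level_folds(image_ids, k=3, seed=0):
    """Split whole images into k roughly equal folds so that all proposals
    generated from one image end up in the same fold."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)   # fixed seed for reproducibility
    return [ids[i::k] for i in range(k)]
```

For the 90 airplane images of NWPU VHR-10, this yields three disjoint folds of 30 images each.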

We use recall, precision, and the F1 score to evaluate the trained model, defined by the following equations:

Recall = TP / (TP + FN),  Precision = TP / (TP + FP),  F1 = 2 * Precision * Recall / (Precision + Recall),

where TP (true positive) and FN (false negative) are the numbers of detected and missed airplanes, respectively. Also, FP (false positive) is the number of other objects that are mistakenly identified as an airplane. Recall is the fraction of detected airplanes over the total number of airplanes, while precision is the fraction of true airplanes among the detections. Recall and precision each capture only one side of the algorithm, while F1 combines both into an overall score.
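As a check, the totals reported later in Table 1 (TP = 744, FN = 13, FP = 8) reproduce the reported scores:

```python
def detection_metrics(tp, fn, fp):
    """Recall, precision, and F1 from detection counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

r, p, f1 = detection_metrics(tp=744, fn=13, fp=8)  # totals from Table 1
# round(r, 3) == 0.983, round(p, 3) == 0.989, round(f1, 3) == 0.986
```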

Another evaluation metric widely used in the object detection domain is the average precision (AP). AP computes the average value of the precision over the recall interval from Recall = 0 to Recall = 1 (i.e., the area under the precision-recall curve).
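One common all-point approximation of AP from score-ranked detections is sketched below; this is a generic formulation, not necessarily the exact computation used in the compared works.

```python
def average_precision(scored, n_positives):
    """scored: list of (score, is_true_positive) for every detection.
    AP is the sum, over true positives, of the precision at that rank,
    divided by the total number of ground-truth positives (all-point
    approximation of the area under the precision-recall curve)."""
    scored = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(scored, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank   # precision at this recall step
    return ap / n_positives
```

A ranking that places all true positives above every false positive yields AP = 1.0; each misranked false positive lowers it.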

4.4 Performance Evaluation

Fold     Step  Pos.    Neg.     Ind.   TP    FN   FP      Recall  Precision  F1

Fold 1   SLS   5044    197465   1154   255   0    203408  -       -          -
         CNN   2510    34       25     252   3    2317    -       -          -
         NMS   252     5        0      252   3    5       0.988   0.981      0.984

Fold 2   SLS   4593    190284   1060   251   0    195686  -       -          -
         CNN   2063    4        9      246   5    1830    -       -          -
         NMS   246     0        0      246   5    0       0.980   1.000      0.990

Fold 3   SLS   5082    195085   1360   251   0    201276  -       -          -
         CNN   2126    10       17     247   4    1906    -       -          -
         NMS   246     3        0      246   5    3       0.980   0.988      0.984

Total    SLS   14719   582834   3574   757   0    600370  -       -          -
         CNN   6699    48       51     745   12   6053    -       -          -
         NMS   744     8        0      744   13   8       0.983   0.989      0.986


  • * Pos.: Positive Instances, Neg.: Negative Instances, Ind.: Indeterminate Instances

Table 1: Results of the modules of the proposed approach for airplane detection on NWPU VHR-10 dataset

The detailed evaluation results of the modules of the proposed approach are reported in Table 1. In this table, positive, negative, and indeterminate instances are the numbers of instances with labels +1, 0, and -1, respectively. TP is the number of ground truth airplanes with at least one corresponding positive instance. FN is the number of ground truth airplanes with no corresponding positive instance. FP is the number of all instances minus TP (i.e., of the positive instances corresponding to one ground truth, only one is considered a true positive and the others are counted as false positives).

As can be observed in Table 1, for all airplanes with different scales and directions, at least one appropriate candidate is extracted by the SLS proposal generator (on average, about 19 positive instances per airplane). These positive instances are formed from different symmetric line segments of an airplane in the three channels. Some of the positive instances are properly aligned with the fuselage and others are not (this challenge is mainly due to the box-level annotation of the ground truth). Because of this, we proposed to use the MIL idea for training the network, so that the network can learn at least one positive instance among all positive instances corresponding to one airplane. By comparing the first and second rows in each fold, it can be seen that about 55% of the positive instances are rejected by the CNN, but less than 2% of the airplanes are missed.

Non-maximum suppression (NMS) is the last module of the proposed approach, and it is very important due to the multiple detections of each airplane. From the results reported in Table 1, the NMS step has worked well: only one airplane is missed in this step.

Altogether, more than 98% of the airplanes are detected by the proposed approach, while only 8 false alarms are produced (i.e., a precision of 0.989). Some detection results of the proposed approach are shown in Fig. 12. As can be observed, in addition to the high detection performance of the proposed algorithm, for most airplanes the direction is correctly estimated. As a result, one of the main advantages of the proposed algorithm is the ability to estimate airplane direction using box-level annotations.

Figure 12: Sample results of the proposed airplane detection algorithm on NWPU VHR-10 dataset.

4.5 Comparison with Previous Works

Unfortunately, the experimental setups of the airplane detection algorithms in the literature are very different, which makes it difficult to compare them fairly. For example, the datasets used in yu2015rotation ; yu2016rotation ; long2017accurate were collected by the authors and are not publicly available. Also, different evaluation measures, such as F1 and AP, are reported. In addition, various data partitioning methods, such as k-fold and holdout, are used.

Considering these differences, the reported results of some previous works on the NWPU VHR-10 dataset, as well as our results, are presented in Table 2. According to this table, our results are promising and are among the best ones for airplane detection. It is notable that deng2018multi uses the holdout method (i.e., 60% of the images for training and only 40% for validation) while we use 3-fold cross-validation, so the choice of validation images in the holdout method is crucial.

Using box-level annotations, most of the existing studies focus on detecting the bounding box of airplanes. In contrast, as a significant advantage, our algorithm can estimate the direction of airplanes.

Method TP FN FP Recall Prec. F1 AP


cheng2016object - - - - - - 0.663
xu2017deformable - - - - - - 0.873
cheng2016learning - - - - - - 0.884
zhong2018multi - - - - - - 0.907
zou2018random - - - - - - 0.941
deng2018multi - - - - - - 0.987
qiu2017occluded - - - - - 0.912 -
qiu2017automatic - - - - - 0.917 -
qiu2018unified - - - - - 0.920 -
Ours 744 13 8 0.983 0.989 0.986 0.973
  • * Prec.: Precision

Table 2: Comparison with previous works on NWPU VHR-10 dataset

5 Conclusion

In this paper, we presented a rotation-and-scale invariant airplane detection algorithm that consists of three main modules: proposal generation, deep classification, and non-maximum suppression.

For proposal generation, we introduced a new method based on symmetric line segments. This method employs the common property that the boundary of an airplane seen from the top view can be approximated by a chain of symmetric line segments. As reported in the experimental results, the method is promising for detecting airplanes with arbitrary direction and over a wide range of scales. However, in addition to airplanes, there are other symmetric line segments in the images that need to be filtered by the subsequent modules.

Due to the box-level annotations of the NWPU VHR-10 dataset, which are much simpler and more practical than pixel-level annotations, it is not possible to separate the symmetric line segments that are properly aligned with the airplane direction from the other symmetric line segments inside a ground truth box. So, in the second module, we modified the training process of the deep convolutional neural network through the idea of multiple instance learning. More precisely, we allow the network to learn at least one proposal among the proposals corresponding to one ground truth airplane. This idea has been successfully implemented, and as a significant result, the direction of most of the airplanes has been correctly estimated without being specified in the dataset.

Since each airplane may have more than one appropriate symmetric line segment, as the third module in the test phase, we used the non-maximum suppression algorithm.

Experiments conducted on the NWPU VHR-10 dataset show that the proposed algorithm detects 744 airplanes out of 757 and gives only 8 false alarms. In other words, the quantitative parameters are: TP = 744, FN = 13, FP = 8, Recall = 0.983, Precision = 0.989, F1 = 0.986, and AP = 0.973.

As future work, we are developing algorithms that extract such extra parameters (i.e., parameters not specified in the dataset, such as the direction) in an end-to-end deep learning style.


  • (1) X. Bai, H. Zhang, J. Zhou, Vhr object detection based on structural feature extraction and query expansion, IEEE Transactions on Geoscience and Remote Sensing 52 (10) (2014) 6508–6520.
  • (2) S. K. Seelan, S. Laguette, G. M. Casady, G. A. Seielstad, Remote sensing applications for precision agriculture: A learning community approach, Remote Sensing of Environment 88 (1-2) (2003) 157–169.
  • (3) J. N. Pelton, S. Madry, S. Camacho-Lara, Handbook of satellite applications, Springer Publishing Company, Incorporated, 2017.
  • (4) G. Cheng, J. Han, A survey on object detection in optical remote sensing images, ISPRS Journal of Photogrammetry and Remote Sensing 117 (2016) 11–28.
  • (5) A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105.
  • (6) K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
  • (7) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
  • (8) K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • (9) G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269. doi:10.1109/CVPR.2017.243.
  • (10) J. Amores, Multiple instance classification: Review, taxonomy and comparative study, Artificial Intelligence 201 (2013) 81–105.
  • (11) G. Cheng, J. Han, P. Zhou, L. Guo, Multi-class geospatial object detection and geographic image classification based on collection of part detectors, ISPRS Journal of Photogrammetry and Remote Sensing 98 (2014) 119–132.
  • (12) H. Sun, X. Sun, H. Wang, Y. Li, X. Li, Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model, IEEE Geoscience and Remote Sensing Letters 9 (1) (2012) 109–113.
  • (13) J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
  • (14) W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, in: European conference on computer vision, Springer, 2016, pp. 21–37.
  • (15) S. Bo, Y. Jing, Region-based airplane detection in remotely sensed imagery, in: Image and Signal Processing (CISP), 2010 3rd International Congress on, Vol. 4, IEEE, 2010, pp. 1923–1926.
  • (16) Y. Li, X. Sun, H. Wang, H. Sun, X. Li, Automatic target detection in high-resolution remote sensing images using a contour-based spatial model, IEEE Geoscience and Remote Sensing Letters 9 (5) (2012) 886–890.
  • (17) C. L. Zitnick, P. Dollár, Edge boxes: Locating object proposals from edges, in: European conference on computer vision, Springer, 2014, pp. 391–405.
  • (18) J. R. Uijlings, K. E. Van De Sande, T. Gevers, A. W. Smeulders, Selective search for object recognition, International journal of computer vision 104 (2) (2013) 154–171.
  • (19)

    M.-M. Cheng, Z. Zhang, W.-Y. Lin, P. Torr, Bing: Binarized normed gradients for objectness estimation at 300fps, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 3286–3293.

  • (20) A. Humayun, F. Li, J. M. Rehg, Rigor: Reusing inference in graph cuts for generating object regions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 336–343.
  • (21) J. Hosang, R. Benenson, P. Dollár, B. Schiele, What makes for effective detection proposals?, IEEE transactions on pattern analysis and machine intelligence 38 (4) (2016) 814–830.
  • (22) S. Kawato, N. Tetsutani, Circle-frequency filter and its application, in: ITE Technical Report 25.12, The Institute of Image Information and Television Engineers, 2001, pp. 49–54.
  • (23) H. Cai, Y. Su, Airplane detection in remote sensing image with a circle-frequency filter, in: International Conference on Space Information Technology, Vol. 5985, International Society for Optics and Photonics, 2006, p. 59852T.
  • (24) F. Gao, Q. Xu, B. Li, Aircraft detection from vhr images based on circle-frequency filter and multilevel features, The Scientific World Journal 2013.
  • (25) Z. An, Z. Shi, X. Teng, X. Yu, W. Tang, An automated airplane detection system for large panchromatic image with high spatial resolution, Optik-International Journal for Light and Electron Optics 125 (12) (2014) 2768–2775.
  • (26)

    W. Zhang, W. Lv, Y. Zhang, J. Tian, J. Ma, Unsupervised-learning airplane detection in remote sensing images, in: MIPPR 2015: Remote Sensing Image Processing, Geographic Information Systems, and Other Applications, Vol. 9815, International Society for Optics and Photonics, 2015, p. 981503.

  • (27) T. Hosomura, et al., Airplane extraction from high resolution satellite image using boundary feature, in: Proc. of ISPRS Technical Com. VIII Symposium, 2010.
  • (28) J. Inglada, Automatic recognition of man-made objects in high resolution optical remote sensing images by svm classification of geometric image features, ISPRS journal of photogrammetry and remote sensing 62 (3) (2007) 236–248.
  • (29) N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, Vol. 1, IEEE, 2005, pp. 886–893.
  • (30) G. Cheng, J. Han, L. Guo, X. Qian, P. Zhou, X. Yao, X. Hu, Object detection in remote sensing imagery using a discriminatively trained mixture model, ISPRS Journal of Photogrammetry and Remote Sensing 85 (2013) 32–43.
  • (31) G. Cheng, P. Zhou, X. Yao, C. Yao, Y. Zhang, J. Han, Object detection in vhr optical remote sensing images via learning rotation-invariant hog feature, in: Earth Observation and Remote Sensing Applications (EORSA), 2016 4th International Workshop on, IEEE, 2016, pp. 433–436.
  • (32) G. Kumar, P. K. Bhatia, A detailed review of feature extraction in image processing systems, in: Advanced Computing & Communication Technologies (ACCT), 2014 Fourth International Conference on, IEEE, 2014, pp. 5–12.
  • (33) Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
  • (34) A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861.
  • (35) Y. Zhong, X. Han, L. Zhang, Multi-class geospatial object detection based on a position-sensitive balancing framework for high spatial resolution remote sensing imagery, ISPRS Journal of Photogrammetry and Remote Sensing 138 (2018) 281–294.
  • (36) Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, H. Zou, Multi-scale object detection in remote sensing imagery with convolutional neural networks, ISPRS Journal of Photogrammetry and Remote Sensing.
  • (37) Z. Zou, Z. Shi, Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images, IEEE Transactions on Image Processing 27 (3) (2018) 1100–1111.
  • (38) H. Wu, H. Zhang, J. Zhang, F. Xu, Fast aircraft detection in satellite images based on convolutional neural networks, in: Image Processing (ICIP), 2015 IEEE International Conference on, IEEE, 2015, pp. 4210–4214.
  • (39)

    F. Zhang, B. Du, L. Zhang, M. Xu, Weakly supervised learning based on coupled convolutional neural networks for aircraft detection, IEEE Transactions on Geoscience and Remote Sensing 54 (9) (2016) 5553–5563.

  • (40) Y. Yang, Y. Zhuang, F. Bi, H. Shi, Y. Xie, M-fcn: Effective fully convolutional network-based airplane detection framework, IEEE Geosci. Remote Sens. Lett 14 (8) (2017) 1293–1297.
  • (41) Z. Xu, X. Xu, L. Wang, R. Yang, F. Pu, Deformable convnet with aspect ratio constrained nms for object detection in remote sensing imagery, Remote Sensing 9 (12) (2017) 1312.
  • (42) Z. Li, L. Itti, Saliency and gist features for target detection in satellite images, IEEE Transactions on Image Processing 20 (7) (2011) 2017–2029.
  • (43) G. Cheng, J. Han, P. Zhou, L. Guo, Scalable multi-class geospatial object detection in high-spatial-resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2014 IEEE International, IEEE, 2014, pp. 2479–2482.
  • (44) G. Cheng, P. Zhou, J. Han, Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 54 (12) (2016) 7405–7415.
  • (45) R. O. Duda, P. E. Hart, Use of the hough transformation to detect lines and curves in pictures, Communications of the ACM 15 (1) (1972) 11–15.
  • (46) R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, G. Randall, Lsd: A fast line segment detector with a false detection control, IEEE transactions on pattern analysis and machine intelligence 32 (4) (2010) 722–732.
  • (47) T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, arXiv preprint arXiv:1708.02002.
  • (48) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, Ieee, 2009, pp. 248–255.
  • (49) F. Chollet, et al., Keras, (2015).
  • (50) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
  • (51) Y. Yu, H. Guan, Z. Ji, Rotation-invariant object detection in high-resolution satellite imagery using superpixel-based deep hough forests, IEEE Geoscience and Remote Sensing Letters 12 (11) (2015) 2183–2187.
  • (52) Y. Yu, H. Guan, D. Zai, Z. Ji, Rotation-and-scale-invariant airplane detection in high-resolution satellite images based on deep-hough-forests, ISPRS Journal of Photogrammetry and Remote Sensing 112 (2016) 50–64.
  • (53) Y. Long, Y. Gong, Z. Xiao, Q. Liu, Accurate object localization in remote sensing images based on convolutional neural networks, IEEE Transactions on Geoscience and Remote Sensing 55 (5) (2017) 2486–2498.
  • (54) S. Qiu, G. Wen, Y. Fan, Occluded object detection in high-resolution remote sensing images using partial configuration object model, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (5) (2017) 1909–1925.
  • (55) S. Qiu, G. Wen, Z. Deng, Y. Fan, B. Hui, Automatic and fast pcm generation for occluded object detection in high-resolution remote sensing images, IEEE Geosci. Remote Sens. Lett 14 (2017) 1730–1734.
  • (56) S. Qiu, G. Wen, J. Liu, Z. Deng, Y. Fan, Unified partial configuration model framework for fast partially occluded object detection in high-resolution remote sensing images, Remote Sensing 10 (3) (2018) 464.