Domain Adaptive Generation of Aircraft on Satellite Imagery via Simulated and Unsupervised Learning

06/08/2018 ∙ by Junghoon Seo, et al. ∙ Satrec Initiative Co., Ltd.

Object detection and classification of aircraft are among the most important tasks in satellite image analysis. The success of modern detection and classification methods rests on machine learning and deep learning, and a key requirement of those learning processes is a large amount of training data. However, aircraft make up only a small portion of the available imagery because the targets are engaged in military action and operation. Considering the characteristics of satellite imagery, this paper proposes a framework based on simulated and unsupervised learning that requires no additional supervision or physical assumptions. Qualitative and quantitative analyses show its potential to replenish insufficient data for machine learning platforms for satellite image analysis.




1 Introduction

Satellite image analysis draws on various computer vision and machine learning techniques. In particular, object detection and classification play key roles. For the most part, automatic target recognition has centered on machine learning [1]. However, crucial targets are too sparse to be observed because of military operations and actions. For example, military aircraft, which are primary targets, may currently be operated only rarely or be hidden in air-raid shelters.

To augment the scarce data in satellite imagery, numerous studies have attempted to generate synthetic aircraft images. Previous studies can be categorized into three approaches: (i) radiometric processing [9], (ii) graphics-based models grounded in optics [7], and (iii) graphics-based models grounded in atmospheric science [6]. Despite these continued efforts, previous models apply only to a limited range of situations, because they depend on rigid physical assumptions and specific conditions rather than on real data.

This paper aims to provide an alternative framework for aircraft simulation on satellite imagery. The proposed method does not rely on any assumption about physical phenomena, which are hard to model perfectly or reproduce appropriately. Instead, during an adversarial training process, a refiner learns to generate realistic satellite imagery from synthetic aircraft images. The remainder of this paper describes the simulation model and presents both qualitative and quantitative results.

2 Proposed Method

Figure 1: Overview of our proposed method. (a) overlay of aircraft images on satellite imagery. (b) adversarial learning to refine the synthesized image

2.1 Overlay satellite imagery with aircraft images

Object simulation on satellite images is a difficult task because of the extremely wide diversity of the data rather than because of object deformation. Moreover, the images that an image analysis platform handles are usually of very low quality. Therefore, we chose not to rely on a detailed 3D CAD model of the aircraft.

This step of our pipeline is shown in Fig. 1 (a). Various near-top-down-view aircraft images are crawled from the web, and their backgrounds are cropped out. Each aircraft is then simply overlaid on a valid background region of a satellite image. Such images are much easier to obtain than CAD models, and using them frees us from any graphical rendering. The only augmentation applied to the aircraft images is rotation.
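The overlay step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the cutout is assumed to be an RGBA array whose alpha channel is zero where the web image background was cropped away, and rotation is restricted to multiples of 90 degrees to keep the sketch dependency-free.

```python
import numpy as np

def rotate_right_angles(cutout, k):
    """Rotate an h x w x 4 RGBA cutout by k * 90 degrees (hypothetical helper)."""
    return np.rot90(cutout, k=k, axes=(0, 1))

def overlay(background, cutout, top, left):
    """Alpha-blend an RGBA cutout onto an RGB background patch.

    background: H x W x 3 float array with values in [0, 1]
    cutout:     h x w x 4 float array with values in [0, 1]
    The cutout is assumed to fit inside the background at (top, left).
    """
    out = background.copy()
    h, w = cutout.shape[:2]
    region = out[top:top + h, left:left + w]   # view into the copy
    alpha = cutout[..., 3:4]                   # keep the last axis for broadcasting
    region[...] = alpha * cutout[..., :3] + (1.0 - alpha) * region
    return out

# Toy usage: a fully opaque red 2 x 3 "aircraft" on a black 8 x 8 background.
background = np.zeros((8, 8, 3))
cutout = np.zeros((2, 3, 4))
cutout[..., 0] = 1.0   # red channel
cutout[..., 3] = 1.0   # opaque everywhere
patch = overlay(background, rotate_right_angles(cutout, 1), top=1, left=2)
```

Arbitrary rotation angles would need an interpolating rotation from an image library; the blending step stays the same.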

2.2 Simulated and Unsupervised Adversarial Learning

Right after an aircraft is placed on a background, the result looks artificial because the visual correlation between object and background is not considered at all. To obtain a harmonious synthesis that accounts for both the objectness of the aircraft and the background, we adopt a simGAN model [10], which is derived from the generative adversarial network [3]. Fig. 1 (b) shows that two neural network models, called the refiner and the discriminator, are trained adversarially.

Suppose there are two sets of samples $X$ and $Y$, where $x \in X$ and $y \in Y$ are sampled from the source-domain images and the target-domain images, respectively. The goal of the refiner $R$ is to generate a synthetic image $R(x)$ that deceives the discriminator $D$ into classifying it as a real image while keeping its pixels as close to those of $x$ as possible. The discriminator, on the other hand, aims to classify the synthetic image $R(x)$ as fake and the real image $y$ as real. The overall refiner loss $\mathcal{L}_R$ combines this adversarial objective with an identity-mapping term weighted by a hyper-parameter $\lambda$, and the discriminator loss $\mathcal{L}_D$ is the corresponding real-versus-fake classification objective. Using gradient descent in the training step, $R$ and $D$ are updated alternately to minimize $\mathcal{L}_R$ and $\mathcal{L}_D$, respectively. Finally, a set of refined images $\tilde{X} = \{R(x) : x \in X\}$ can be generated by the refiner; these images are realistic yet similar to the original $X$. In our task, $X$ is a set of synthetic images on which fake aircraft are overlaid, and $Y$ is a set of real images in which authentic aircraft appear.
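One form of the two losses that is consistent with the description above can be sketched as follows. It is an assumption rather than the exact objective of this paper: it uses the standard GAN log loss [3], with the discriminator $D$ outputting the probability that its input is real, and an $\ell_1$ identity term in the style of [10]. Here $R$ is the refiner, $x \in X$ a synthetic sample, $y \in Y$ a real sample, and $\lambda$ the identity-mapping weight.

```latex
\begin{align}
\mathcal{L}_R &= -\sum_{x \in X} \log D\big(R(x)\big)
                 + \lambda \sum_{x \in X} \big\lVert R(x) - x \big\rVert_1, \\
\mathcal{L}_D &= -\sum_{y \in Y} \log D(y)
                 - \sum_{x \in X} \log\Big(1 - D\big(R(x)\big)\Big).
\end{align}
```

Under this form, the first term of $\mathcal{L}_R$ rewards refined images that the discriminator accepts as real, while the second term penalizes pixel-level departure from the input. A larger $\lambda$ therefore keeps the refined image closer to the original overlay.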

3 Experiments and Discussions

Figure 2: Results from our method and their t-SNE visualization. (a) Four synthetic samples before and after refinement. (b) t-SNE visualization. The cross ('×') marks denote the mean of the embedded manifold of each set (synthetic, refined, and real). The circle marks denote the example images of (a), and the arrows denote the matching between each synthetic image and its refined counterpart.

3.1 Experiment Details

We collected RGB satellite imagery using Google Earth Pro 7.1 [4]. The dataset consists of 8,604 real satellite image patches, each containing at least one combat aircraft, and 2,917 fake image patches, each containing an overlaid near-top-down aircraft image. The aircraft models in the overlaid images are all different from those in the collected real images.

For our neural network architecture we use a one-way version of [12], whose architecture is reported to be well suited to style transfer tasks. The identity-mapping weight $\lambda$ is set to a fixed value. In the training step, the batch size is one and training ends after 180k steps. In the evaluation step, each synthetic sample $x$ is drawn independently and identically from the synthetic set $X$ without replacement. The remaining hyperparameters are set almost the same as in the reference work.


3.2 Qualitative Evaluation

Refinement results are shown in Fig. 2 (a). After refinement, the aircraft look much more natural than in the original synthetic images. To examine the visual results analytically, we apply t-SNE [8]. As the feature extractor we select a pre-trained VGG19 [11], which has already proven useful in neural style transfer [2]. Feature vectors from its fc6 layer are extracted for the synthetic, refined, and real image sets.

Fig. 2 (b) shows the t-SNE visualization. The mean of the refined features lies closer to the mean of the real features than the mean of the synthetic features does, which suggests that the domain difference between the refined and the real images is smaller than that between the synthetic and the real images. Consistently, each example shown as a point also moves closer to the distribution of the real images after refinement.
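The centroid comparison described above can be expressed compactly. The sketch below assumes the two-dimensional t-SNE embeddings of the three sets are already available as arrays. The function name and the toy coordinates are hypothetical and do not come from the experiments.

```python
import numpy as np

def centroid_distances(emb_synth, emb_refined, emb_real):
    """Return (synthetic-to-real, refined-to-real) centroid distances.

    Each argument is an (n_i, 2) array of embedded points for one image set.
    """
    c_synth = np.mean(emb_synth, axis=0)
    c_refined = np.mean(emb_refined, axis=0)
    c_real = np.mean(emb_real, axis=0)
    d_synth = float(np.linalg.norm(c_synth - c_real))
    d_refined = float(np.linalg.norm(c_refined - c_real))
    return d_synth, d_refined

# Toy coordinates chosen only to illustrate the comparison.
emb_synth = np.array([[0.0, 0.0], [2.0, 0.0]])
emb_refined = np.array([[3.0, 0.0], [5.0, 0.0]])
emb_real = np.array([[4.0, 1.0], [6.0, -1.0]])
d_synth, d_refined = centroid_distances(emb_synth, emb_refined, emb_real)
refined_is_closer = d_refined < d_synth
```

Because t-SNE does not preserve global distances faithfully, such a comparison is indicative only, which is one reason to complement it with a distribution-level statistic.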

3.3 Quantitative Evaluation

Maximum mean discrepancy (MMD) [5] is a test statistic that measures the difference between two distributions. As the associated continuous kernel we use a mixture of 16 Gaussian radial basis function (RBF) kernels with varying bandwidths $\sigma$. To avoid excessive time complexity, a linear-time unbiased estimate of MMD is used.
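A linear-time estimate of this kind can be sketched as below, following the estimator of squared MMD in Gretton et al. [5], which pairs consecutive samples so that the number of kernel evaluations grows linearly with the sample size. Several details are assumptions made for illustration: the mixture is taken as an unweighted sum of the RBF kernels, the bandwidth range in the usage lines is hypothetical, and the function names are not from the paper.

```python
import numpy as np

def rbf_mixture(a, b, sigmas):
    """Row-wise sum of Gaussian RBF kernels k(a_i, b_i) over all bandwidths."""
    sigmas = np.asarray(sigmas, dtype=float)
    sq_dist = np.sum((a - b) ** 2, axis=1)                      # shape (m,)
    scaled = -sq_dist[None, :] / (2.0 * sigmas[:, None] ** 2)   # shape (s, m)
    return np.sum(np.exp(scaled), axis=0)                       # shape (m,)

def linear_time_mmd2(x, y, sigmas):
    """Linear-time unbiased estimate of squared MMD between samples x and y.

    x, y: arrays of shape (n, d). Consecutive samples are paired, so only
    floor(n / 2) evaluations of each kernel term are needed.
    """
    m = min(len(x), len(y)) // 2
    x1, x2 = x[0:2 * m:2], x[1:2 * m:2]
    y1, y2 = y[0:2 * m:2], y[1:2 * m:2]
    h = (rbf_mixture(x1, x2, sigmas) + rbf_mixture(y1, y2, sigmas)
         - rbf_mixture(x1, y2, sigmas) - rbf_mixture(x2, y1, sigmas))
    return float(np.mean(h))

# Hypothetical bandwidths: 16 values spaced geometrically between 2^-3 and 2^3.
sigmas = np.logspace(-3, 3, num=16, base=2.0)
rng = np.random.default_rng(0)
features_a = rng.normal(0.0, 1.0, size=(200, 8))
features_b = rng.normal(1.0, 1.0, size=(200, 8))
estimate = linear_time_mmd2(features_a, features_b, sigmas)
```

The estimate is unbiased but has higher variance than the quadratic-time version, so on finite samples it can be slightly negative even when the two distributions differ.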


Pair    (i)      (ii)     (iii)
MMD     0.3329   0.3423   0.2300
Table 1: Comparison of maximum mean discrepancy between each pair of image sets.

We report the three MMD values between the pairs of image sets in Table 1. It is worth noting that (iii) is the smallest. This suggests that the domain difference between the refined and the real images is lower than that between the other pairs, i.e., the synthetic and the real images, or the synthetic and the refined images. The result shows that the proposed method effectively reduces the gap between synthetic and real images in quantitative terms.

4 Conclusions

In this paper, we introduced a method for building a graphics-free simulation of aircraft in satellite imagery. Because our approach is based on data-dependent simulated and unsupervised learning, it can be readily adapted to other conditions of similar tasks. The experiments show meaningful qualitative and quantitative performance. In future work, we will focus on using the refined image data to improve the classification and detection methods of our satellite image analysis platform. We expect that our method will contribute to the field of remote sensing, especially to data generation for automatic target recognition.


  • [1] G. Cheng and J. Han. A survey on object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 117:11–28, 2016.
  • [2] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 2414–2423. IEEE, 2016.
  • [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • [4] N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore. Google earth engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202:18–27, 2017.
  • [5] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012.
  • [6] S. Han, A. Fafard, J. Kerekes, M. Gartley, E. Ientilucci, A. Savakis, C. Law, J. Parhan, M. Turek, K. Fieldhouse, et al. Efficient generation of image chips for training deep learning algorithms. In Automatic Target Recognition XXVII, volume 10202, page 1020203. International Society for Optics and Photonics, 2017.
  • [7] E. J. Ientilucci and S. D. Brown. Advances in wide-area hyperspectral image simulation. In Targets and Backgrounds IX: Characterization and Representation, volume 5075, pages 110–122. International Society for Optics and Photonics, 2003.
  • [8] L. v. d. Maaten and G. Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
  • [9] J. R. Schott, S. D. Brown, R. V. Raqueno, H. N. Gross, and G. Robinson. An advanced synthetic image generation model and its application to multi/hyperspectral algorithm development. Canadian Journal of Remote Sensing, 25(2):99–111, 1999.
  • [10] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 2242–2251, 2017.
  • [11] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
  • [12] J. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2242–2251, 2017.