Proper communication at both the public and personal level is key to the healthy development of human civilization. Over the years, the means of communication have evolved, and today the Internet is the most popular and important platform for communication. Many social media systems have been built on the Internet, and they provide a very cheap and effective way to express and share one’s ideas with the rest of the world. While an effective communication system for sharing information can help us become more informed and connected as a society, it can also be used to spread misinformation to achieve nefarious objectives. Hence, it is of paramount importance that we verify and authenticate the data shared on these systems.
While there are many ways of communicating ideas, such as speech, symbols, and written text, images are today one of the most popular. Unfortunately, manipulating images has become very easy. Tools such as GIMP and Photoshop, which are easily accessible to the general public, can be used to manipulate images in a wide variety of ways. To address this problem, the forensic community has developed a wide variety of tools to detect various kinds of image forgeries [16, 15, 19]. While most of the images shared on the Internet come from consumer cameras and smartphones, other types of imagery, such as satellite images, are also very important in business and government applications and thus pose new problems for the forensic community [5, 18].
With the increasing number of satellites equipped with imaging sensors and the advances in satellite imaging technology, high-resolution images of the ground are becoming widely available. It is now possible not only to access these overhead images from public websites but also to buy custom satellite imagery of specific locations. Like any other image, satellite images can be doctored. While the forensic community has been developing tools to address forgeries of all types, these tools have been biased towards imagery captured by consumer cameras and smartphones [3, 11, 6, 4]. The acquisition process of satellite imagery is quite different from that of consumer cameras; hence, it is important to develop forensic tools that specifically target satellite imagery.
In recent years, some methods [9, 21, 2] for detecting forgeries in satellite images have been developed. Ho et al. have proposed an active forensic method based on watermarks to verify the authenticity of a satellite image. While watermarks are an effective way of ascertaining whether an image is forged, their absence renders such methods ineffective. Ali et al. have proposed a passive method based on machine learning to detect inpainting in satellite images. Yarlagadda et al. have proposed a method based on deep learning to detect splicing in satellite images. They employ Generative Adversarial Networks (GANs) [8, 7] to learn a compact representation of pristine satellite images and use it to detect spliced regions of various sizes.
In this paper, we address the detection and localization of splicing in satellite images. Splicing refers to the replacement of the pixels in a region of an image in order to add or remove an object. We employ a Conditional Generative Adversarial Network (cGAN) to learn a mapping from a satellite image to its splicing mask. The trained cGAN operates on a satellite image of interest and outputs a mask of the same resolution whose values indicate the likelihood that each pixel belongs to a spliced region. Our cGAN architecture is an extension of the popular pix2pix model. Unlike the method of Yarlagadda et al., we learn a direct mapping from an image to its forgery mask rather than operating in a one-class fashion. To achieve this, we train our model on both pristine and spliced images, whereas Yarlagadda et al. train only on pristine images, since they aim to learn a compact representation of pristine data and use it to identify forgeries.
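To make the definition of splicing concrete, the following Python sketch pastes a rectangular patch into a copy of a host image and records the corresponding ground-truth mask (255 for spliced pixels, 0 elsewhere). The helper name `splice` and the list-of-rows image representation are hypothetical choices for illustration; real splices follow the outline of the pasted object rather than a rectangle.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB value
Image = List[List[Pixel]]      # rows of pixels, indexed img[y][x]
Mask = List[List[int]]         # rows of 0/255 values, indexed mask[y][x]


def splice(host: Image, patch: Image, top: int, left: int) -> Tuple[Image, Mask]:
    """Hypothetical helper: paste `patch` into a copy of `host`.

    The patch's upper-left corner lands at column `left`, row `top`.
    Returns the forged image and its ground-truth mask, where 255 marks
    spliced pixels and 0 marks pristine ones. Patch pixels that would fall
    outside the host are ignored.
    """
    height, width = len(host), len(host[0])
    forged = [row[:] for row in host]             # copy so the host stays pristine
    mask = [[0] * width for _ in range(height)]
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            y, x = top + dy, left + dx
            if 0 <= y < height and 0 <= x < width:
                forged[y][x] = pixel              # replace the host pixel
                mask[y][x] = 255                  # and flag it in the mask
    return forged, mask
```

Removing an object can be modeled the same way, by pasting a background patch over it; in both cases the mask marks every replaced pixel.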
We validate our method on the dataset proposed by Yarlagadda et al. and report both detection and localization performance.
2 Problem Formulation
In this paper, we investigate two specific objectives: forgery detection and forgery localization. Detection refers to determining whether an RGB satellite image has been modified via splicing. It is a binary classification problem in which an image is considered forged if it has been modified, and pristine otherwise. Localization refers to the image segmentation task of identifying each pixel of a forged image that belongs to the spliced entity, otherwise known as the forgery. These objectives are defined similarly to those of Yarlagadda et al.
Forgery masks help us visualize and evaluate the outcomes of these objectives. For an image $I$, a forgery mask $M$ of the same dimensions shows the forgery in $I$ (if one exists). In other words, for a satellite image $I$ with pixels $I(x, y)$, where $(x, y)$ specifies the coordinate location of a pixel in $I$, the corresponding forgery mask $M$ is composed of values defined as
$$M(x, y) = \begin{cases} 255, & \text{if } I(x, y) \text{ is forged},\\ 0, & \text{if } I(x, y) \text{ is pristine}. \end{cases}$$
Therefore, the shape, size, and location of a forgery in an image can be ascertained from its mask if the mask contains white pixels (i.e., values of 255). At the extremes, an entirely white mask indicates that every pixel in $I$ has been manipulated, whereas an entirely black mask represents a pristine image.
Our approach is to train a cGAN to create $\hat{M}$, an estimate of the forgery mask $M$. The image $I$ is considered doctored if $\hat{M}(x, y) = 255$ for at least one pixel $(x, y)$, meaning that a forgery is detected in it and that the forgery is composed of the pixels located at $\{(x, y) : \hat{M}(x, y) = 255\}$. Conversely, the image is considered pristine if no forgery is detected, i.e., $\hat{M}(x, y) = 0$ for all $(x, y)$. Examples of satellite images and their corresponding ground truth forgery masks can be seen in Figure 1.
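The detection and localization rules above can be sketched as follows for a mask estimate that has already been binarized to {0, 255}. The function names are hypothetical; the bounding box is one simple way to read off the location and size of a forgery, while the full pixel set gives its shape.

```python
from typing import List, Optional, Set, Tuple

Mask = List[List[int]]  # rows of 0/255 values, indexed mask[y][x]


def forged_pixels(mask: Mask) -> Set[Tuple[int, int]]:
    """Coordinates (x, y) whose mask value is 255, i.e. the localized forgery."""
    return {(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v == 255}


def is_doctored(mask: Mask) -> bool:
    """An image is doctored if at least one pixel of its mask is white (255)."""
    return any(v == 255 for row in mask for v in row)


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """(x_min, y_min, x_max, y_max) of the forgery, or None for a pristine mask."""
    pixels = forged_pixels(mask)
    if not pixels:
        return None
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)
```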
In this section, we describe our technique for splicing detection and localization. Additional details about the general cGAN concepts used in this section can be found in the original pix2pix work. We train our cGAN on both pristine and forged images to learn a mapping from an input image $I$ to a forgery mask $M$. The cGAN consists of two parts: a generator G and a discriminator D. Figure 2 shows the overall cGAN architecture.
The generator G has a 16-layer U-Net architecture (8 encoder layers and 8 decoder layers) with skip connections. When G is presented with an image $I$, it computes an estimated forgery mask $\hat{M} = G(I)$. The generator’s objective is to create an $\hat{M}$ that is close to the true mask $M$. Meanwhile, the discriminator D is trained to differentiate between true input-mask pairs $(I, M)$ and synthesized input-mask pairs $(I, \hat{M})$ coming from the generator. In a cGAN, the generator and the discriminator are coupled through a loss function. During training, the discriminator forces the generator to produce masks that are not only close to the ground truth but also realistic enough that the discriminator cannot distinguish them from the ground truth, which in turn makes the generator do a better job.
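The following sketch traces tensor shapes through a 16-layer U-Net with skip connections. It assumes pix2pix-style defaults that the text does not state: a 256x256 input, stride-2 convolutions that halve the resolution in each of the 8 encoder layers, the channel widths listed in `ENCODER_CHANNELS` and `DECODER_CHANNELS`, and a single-channel mask output. It only computes shapes and does not build a network.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# Assumed pix2pix-style widths; the paper states only 8 encoder + 8 decoder layers.
ENCODER_CHANNELS = [64, 128, 256, 512, 512, 512, 512, 512]
DECODER_CHANNELS = [512, 512, 512, 512, 256, 128, 64]  # before concatenation


def unet_shapes(size: int = 256, out_channels: int = 1) -> Tuple[List[Shape], List[Shape]]:
    """Return the output shapes of the encoder and decoder layers of the U-Net."""
    if size % 2 ** len(ENCODER_CHANNELS):
        raise ValueError("input size must be divisible by 2**8 = 256")
    encoder: List[Shape] = []
    s = size
    for c in ENCODER_CHANNELS:
        s //= 2                                   # stride-2 conv halves H and W
        encoder.append((c, s, s))
    decoder: List[Shape] = []
    for j, c in enumerate(DECODER_CHANNELS):
        s *= 2                                    # upsampling doubles H and W
        skip_c = encoder[-2 - j][0]               # mirrored encoder layer, same H, W
        decoder.append((c + skip_c, s, s))        # skip connection: channel concat
    decoder.append((out_channels, s * 2, s * 2))  # last layer emits the mask estimate
    return encoder, decoder
```

The skip connections are what let the decoder recover fine spatial detail lost in the 1x1 bottleneck, which matters for pixel-accurate forgery masks.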
The discriminator D is a 5-layer CNN that performs binary classification on masks. Sometimes, a true image-mask pair $(I, M)$ is presented to D; other times, an image-mask estimate pair $(I, \hat{M})$ is presented. In both cases, the image under analysis is presented to D along with either its true forgery mask $M$ or a synthesized forgery mask $\hat{M}$. D divides the input into patches of 70x70 pixels and classifies each patch as fake (containing a synthesized mask) or real, assigning labels 0 and 1, respectively. The values for all patches are averaged to determine the classification of the entire input. The following equations describe the ideal outputs of D in the two cases outlined in this paragraph:
$$D(I, M) = 1, \qquad D(I, \hat{M}) = D(I, G(I)) = 0.$$
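A 70x70 patch size arises naturally from the receptive field of a small convolutional stack. The sketch below assumes the pix2pix 70x70 PatchGAN configuration (five 4x4 convolutions, the first three with stride 2 and the last two with stride 1), which is consistent with, but not confirmed by, the 5-layer CNN described above. It also shows the averaging of per-patch scores into a single decision.

```python
from typing import List, Sequence, Tuple

# Assumed (kernel, stride) of each layer in a pix2pix-style 70x70 PatchGAN.
PATCHGAN_LAYERS: List[Tuple[int, int]] = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Side length of the input patch that one output unit depends on."""
    r = 1
    for kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel  # grow the field back through each layer
    return r


def discriminator_decision(patch_scores: Sequence[Sequence[float]]) -> float:
    """Average per-patch scores (1 = real pair, 0 = synthesized pair) into one value."""
    values = [v for row in patch_scores for v in row]
    return sum(values) / len(values)
```

In such a design, neighboring output units have overlapping receptive fields, so the 70x70 patches judged by D overlap rather than tile the input.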
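The generator G and the discriminator D compete in a min-max game, training and improving each other over time. The coupled loss function of the network, which D tries to maximize and G tries to minimize, is described by the following equation:
$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{I, M}\left[\log D(I, M)\right] + \mathbb{E}_{I}\left[\log\left(1 - D(I, G(I))\right)\right].$$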
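The coupled loss can be estimated on a batch by averaging over the discriminator's outputs. This Python sketch assumes D's outputs are already probabilities in [0, 1] (for example, the averaged patch scores) and clips them with a small `eps` to avoid log(0); the function name is hypothetical.

```python
import math
from typing import Sequence


def cgan_loss(d_real: Sequence[float], d_fake: Sequence[float], eps: float = 1e-12) -> float:
    """Batch estimate of E[log D(I, M)] + E[log(1 - D(I, G(I)))].

    d_real: D's outputs on true pairs (I, M); d_fake: D's outputs on (I, G(I)).
    D is trained to maximize this value and G to minimize it.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

A perfect discriminator drives this value to its maximum of 0, while a discriminator that outputs 0.5 everywhere gives 2 log 0.5 ≈ -1.386.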
So far, we have described a network in which the generator G learns to create masks that could be mistaken for real forgery masks by D. However, this does not ensure that the synthesized masks correctly show the forgeries in the images. For example, $\hat{M}$ may “fool” D and be classified as an authentic mask for $I$ without resembling its ground truth mask $M$. In such a case, $D(I, \hat{M}) = 1$ even though $\hat{M} \neq M$. Therefore, we impose an additional constraint on the generator so that it learns to reconstruct the ground truth masks of the training images, i.e., $\hat{M} \approx M$. This is achieved by training G to minimize a reconstruction loss $\mathcal{L}_R$ between $\hat{M}$ and $M$. Since our task is primarily to classify each individual pixel into one of two classes (i.e., forged or pristine), we choose $\mathcal{L}_R$ to be a binary cross-entropy (BCE) loss term. This differs from the classic pix2pix, which uses an $L_1$ loss term. We verify in our experiments that BCE is indeed a better choice than $L_1$. The total loss function of the cGAN is:
$$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_R(G),$$
where $\mathcal{L}_{cGAN}$ is the coupled adversarial loss and $\lambda$ weights the reconstruction term.
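The reconstruction term can be sketched as follows, assuming both masks are flattened and scaled from {0, 255} to [0, 1], so that the per-pixel BCE is $-[m \log \hat{m} + (1 - m)\log(1 - \hat{m})]$. The function names, the clipping constant `eps`, and passing the adversarial term in as a precomputed number are assumptions for illustration; `lam` plays the role of $\lambda$.

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy; pred in [0, 1], target in {0, 1}."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error, the reconstruction term used by the classic pix2pix."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def generator_objective(adversarial_term: float, pred: Sequence[float],
                        target: Sequence[float], lam: float = 100.0) -> float:
    """Generator side of the total loss: adversarial term + lam * BCE reconstruction."""
    return adversarial_term + lam * bce_loss(pred, target)
```

One intuition for BCE in a per-pixel two-class problem: a confident mistake (predicting 0.01 for a forged pixel) costs about 4.6 under BCE but only 0.99 under $L_1$, so BCE penalizes it much more strongly.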
Once training is complete, the generator G produces masks that are realistic and close to $M$. At test time, the discriminator is discarded, and only the generator is used to produce mask estimates $\hat{M} = G(I)$ for new images under analysis.
4 Experimental Validation
In this section, we report the details of our experiments. First, we describe the image dataset. Next, training strategies are discussed. Finally, we present experimental results and analysis.
We use the dataset presented by Yarlagadda et al. for our experiments. It contains color images of overhead scenes from a satellite and their corresponding ground truth forgery masks. Each image-mask pair is denoted as $(I, M)$, and both the image and the mask have a resolution of $W \times H$ pixels. The images were adapted from ones originally provided by the Landsat Science program [12, 13], run jointly by NASA and the US Geological Survey (USGS). To create forged images, objects such as airplanes and clouds were spliced into some of the images at random locations. These doctored images fall into one of three size categories (small, medium, or large) based on the approximate dimensions of their forgery relative to the patch size (70x70 pixels) used by the discriminator D to analyze a mask. Small forgeries are approximately 32x32 pixels, medium forgeries are approximately 64x64 pixels, and large forgeries are approximately 128x128 pixels. The remaining satellite images were left pristine. For our purposes, pristine and small-forgery samples underwent data augmentation to increase the size of the training dataset. Augmentation consisted of rotating pristine and small-forgery $(I, M)$ pairs by multiples of 90° and flipping them about their vertical and horizontal center axes. This produced our dataset $\mathcal{D}$, which contains 344 $(I, M)$ pairs in total: 158 pairs contain small forgeries, 32 pairs contain medium forgeries, 31 pairs contain large forgeries, and 123 pairs are pristine. These subsets of $\mathcal{D}$ are denoted as $\mathcal{D}_S$, $\mathcal{D}_M$, $\mathcal{D}_L$, and $\mathcal{D}_P$, respectively. Examples are shown in Figure 3.
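The augmentation step can be sketched as follows. The key point is that each geometric transform is applied identically to an image and its mask, so the mask still marks the same pixels. The exact combination of rotations and flips used to build the dataset is not specified, so the six variants returned by the hypothetical `augment_pair` function are an assumption.

```python
from typing import List, Tuple

Grid = List[list]  # a 2-D image or mask stored as a list of rows


def rot90(grid: Grid) -> Grid:
    """Rotate a grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_lr(grid: Grid) -> Grid:
    """Mirror about the vertical center axis (left-right flip)."""
    return [list(row[::-1]) for row in grid]


def flip_ud(grid: Grid) -> Grid:
    """Mirror about the horizontal center axis (top-bottom flip)."""
    return [list(row) for row in grid[::-1]]


def augment_pair(image: Grid, mask: Grid) -> List[Tuple[Grid, Grid]]:
    """Return rotated (0, 90, 180, 270 degrees) and flipped copies of (image, mask)."""
    variants = []
    img, msk = image, mask
    for _ in range(4):
        variants.append((img, msk))
        img, msk = rot90(img), rot90(msk)   # same transform for image and mask
    variants.append((flip_lr(image), flip_lr(mask)))
    variants.append((flip_ud(image), flip_ud(mask)))
    return variants
```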
The dataset was split into three sets for training, validation, and testing. The training set contains 128 small-forgery pairs and 90 pristine pairs. The validation set contains 32 small-forgery pairs and 18 pristine pairs. The evaluation set consists of 32 medium-forgery, 31 large-forgery, and 15 pristine pairs. Because the training/validation and evaluation sets are disjoint and contain different forgery sizes, we can observe how well a trained model generalizes to forgery sizes not seen during training. We hypothesized that small forgeries might pose the biggest challenge to the network, so they compose the training and validation sets. The cGAN was trained for 200 epochs using the Adam optimizer with an initial learning rate of 0.0002. The reconstruction loss coefficient was set to 100. After training, the model that performed best on the validation set was selected for testing.
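The reported training settings can be collected in a small configuration object, together with the checkpoint selection rule. The `TrainConfig` class and `select_best_epoch` function are hypothetical, and the validation metric being maximized (for example, a localization AUC) is an assumption, since the text does not name it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the text (field names are illustrative)."""
    epochs: int = 200
    optimizer: str = "adam"
    initial_learning_rate: float = 0.0002
    reconstruction_weight: float = 100.0  # coefficient of the reconstruction loss


def select_best_epoch(val_scores: Dict[int, float]) -> int:
    """Pick the epoch whose checkpoint scored highest on the validation set."""
    if not val_scores:
        raise ValueError("no validation scores recorded")
    return max(val_scores, key=lambda epoch: val_scores[epoch])
```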
We performed both visual and numerical analyses of the results to determine the effectiveness of our proposed method. Figure 22 contains examples of mask estimates $\hat{M}$ produced by G and their corresponding ground truth masks $M$. It shows that the model produces mask estimates for both pristine and forged images that closely resemble the ground truth masks, i.e., $\hat{M} \approx M$. Thus, we can clearly see whether a forgery is present in an image and, if so, its shape, size, and location. A numerical analysis of the results further confirms this.
To evaluate forgery detection, the average pixel value of a mask estimate is defined as
$$\bar{m} = \frac{1}{W H} \sum_{x=1}^{W} \sum_{y=1}^{H} \hat{M}(x, y),$$
where $W \times H$ is the image resolution. Then, binary thresholding with a threshold $T$ is used to determine whether the image under analysis is pristine or forged. As described above, an image is considered pristine when its mask estimate is entirely black, i.e., $\hat{M}(x, y) = 0$ for all $(x, y)$. From a thresholding standpoint, $I$ is labeled as pristine when $\bar{m} < T$; otherwise, it is labeled as forged. Figure 4 shows the receiver operating characteristic (ROC) curves, which reveal the performance obtained with different thresholds $T$. It also compares the models trained with BCE loss and with $L_1$ loss for reconstruction. The areas under the curve (AUC) for both BCE and $L_1$ loss are 1.000, indicating that perfect detection accuracy can be achieved with a suitable threshold. These results are further confirmed by the precision-recall (PR) plot in Figure 5 for a model using BCE loss, which also indicates that perfect detection is possible with our 2-class model, as its average precision score is also 1.000.
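The detection rule and the ROC AUC summary can be sketched as follows. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve obtained by sweeping $T$ over all score values. The function names are hypothetical, and mask values are assumed to be non-negative intensities on the same scale as the ground truth masks.

```python
from typing import Sequence


def mean_pixel_value(mask: Sequence[Sequence[float]]) -> float:
    """Average of M_hat(x, y) over all W x H pixels of a mask estimate."""
    values = [v for row in mask for v in row]
    return sum(values) / len(values)


def detect(mask: Sequence[Sequence[float]], threshold: float) -> bool:
    """Return True (forged) unless the mean pixel value is below the threshold T."""
    return mean_pixel_value(mask) >= threshold


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random forged image (label 1) outscores a random
    pristine one (label 0), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one forged and one pristine image")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.000 means every forged image receives a higher mean pixel value than every pristine one, so some threshold $T$ separates the two classes perfectly.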
To assess forgery localization, a similar evaluation process is used, but only for images in which forgeries are detected. Their mask estimates $\hat{M}$ are thresholded and then compared pixel-wise to their corresponding ground truth masks $M$. Figure 4 also shows ROC curves for localization at different thresholds. In this case, a performance difference between BCE and $L_1$ is observed: BCE yields a higher AUC of 0.988, whereas $L_1$ achieves an AUC of 0.927. The PR curve (again using BCE loss), with an average precision score of 0.953, confirms that the localization results are very good.
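For localization, every pixel of a detected image becomes a sample whose score is the estimated mask value and whose label comes from the ground truth mask. The sketch below computes average precision as $\sum_k (R_k - R_{k-1}) P_k$ over decreasing thresholds, a common definition (the one used, for instance, by scikit-learn). Whether the reported score pools pixels across images in this way is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP = sum_k (R_k - R_{k-1}) * P_k over distinct thresholds, highest first."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive (forged) pixels")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Pixels tied at this score are all accepted by the same threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall, precision = tp / total_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def pixelwise_ap(est_masks, gt_masks) -> float:
    """Pool all pixels of the given images; ground truth 255 -> label 1, else 0."""
    scores, labels = [], []
    for est, gt in zip(est_masks, gt_masks):
        for est_row, gt_row in zip(est, gt):
            scores.extend(est_row)
            labels.extend(1 if v == 255 else 0 for v in gt_row)
    return average_precision(scores, labels)
```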
In this paper, we propose a forensic image analysis method based on a cGAN for splicing detection and localization in satellite images. The proposed technique follows a data-driven approach and thus learns to distinguish forged regions from pristine ones directly from the available training data.
Results show that the developed methodology accomplishes both tampering detection and localization with very high accuracy on the dataset used. Moreover, the proposed solution generalizes to forgeries of sizes different from those seen during training.
While the results of this experiment are very good, it would be interesting to see how the technique performs on different types of forgeries, as well as on datasets containing images from different satellites, to further test the generalization capability of the method.
- [1] 15 free satellite imagery data sources. GIS Geography, http://gisgeography.com/free-satellite-imagery-data-list (accessed on 12/01/2018).
- [2] Proceedings of the IEEE International Conference of Information and Communication Technology for Embedded Systems, Chonburi, Thailand, 2017-05.
- [3] Identification of cut&paste tampering by means of double-JPEG detection and image segmentation. Proceedings of the IEEE International Symposium on Circuits and Systems, Paris, France, 2010-05.
- [4] Tampering detection and localization through clustering of camera-based CNN features. pp. 1855–1864, Honolulu, HI, 2017-07.
- [5] Conspiracy files: who shot down MH17? BBC News, http://www.bbc.com/news/magazine-35706048 (accessed on 12/01/2018).
- [6] Splicebuster: a new blind image splicing detector. Proceedings of the IEEE International Workshop on Information Forensics and Security, Rome, Italy, 2015-11.
- [7] Deep learning. MIT Press, Cambridge, MA, 2016.
- [8] Generative adversarial nets. Advances in Neural Information Processing Systems, pp. 2672–2680, Montréal, Canada, 2014-12.
- [9] A semi-fragile pinned sine transform watermarking system for content authentication of satellite images. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Seoul, Korea, 2005-01.
- [10] Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967–5976, Honolulu, HI, 2017-07.
- [11] Forensic Camera Model Identification. In Handbook of Digital Forensics of Multimedia Data and Devices, 2015.
- [12] Landsat on AWS. Amazon Web Services Inc., https://aws.amazon.com/public-datasets/landsat/ (accessed on 12/01/2018).
- [13] Landsat science. National Aeronautics and Space Administration, https://landsat.gsfc.nasa.gov/ (accessed on 12/01/2018).
- [14] NASA. National Aeronautics and Space Administration, https://www.nasa.gov/ (accessed on 12/01/2018).
- [15] An overview on image forensics. ISRN Signal Processing 2013, pp. 22, 2013-11.
- [16] Vision of the unseen: current trends and challenges in digital image and video forensics. ACM Computing Surveys 43, pp. 1–42, 2011-10.
- [17] U-Net: convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Munich, Germany, 2015-10.
- [18] Satellite images show clearly that Russia faked its MH17 report. Mashable, http://mashable.com/2015/05/31/russia-fake-mh17-report (accessed on 12/01/2018).
- [19] Information forensics: an overview of the first decade. IEEE Access 1, pp. 167–200, 2013-05.
- [20] USGS.gov: science for a changing world. U.S. Geological Survey, https://www.usgs.gov/ (accessed on 12/01/2018).
- [21] Satellite image forgery detection and localization using GAN and one-class classifier. arXiv:1802.04881, 2018-01.