Remote sensing applications have recently been adopting deep learning techniques at an increasing pace. However, tools for applying deep neural networks (DNNs) are still not well developed for satellite imagery: most methods are imported as-is from natural image processing, which is restricted to the visible range of the electromagnetic spectrum. Satellite images are usually hyperspectral, and their statistics differ from those of natural images. To exploit the full power of DNN models in satellite applications, it is necessary to assess the usability of current tools and tailor them better to this domain.
One technique widely used in deep learning is image augmentation. It has been a very successful technique for increasing the variety of natural images used to train deep neural networks [2, 3, 9]. It was used in training AlexNet, one of the earliest DNNs for natural image classification, where it increased the number of training images by a factor of 2048. The advantage of this technique is that it enlarges the training dataset while preserving labels by exploiting symmetries in natural images. Since then, augmentation has become a standard technique in training DNNs, and some DNNs trained on satellite images have made use of it, but these models usually restricted themselves to the RGB spectral space or attempted to squeeze all channels into a three-channel domain.
This study is an exploratory attempt to investigate the usability of standard image augmentation techniques for DNN training on satellite image classification. Additionally, I investigate a novel augmentation technique: the addition of speckle noise, one of the most common noise types in satellite images. Optimizing such techniques could help improve DNN models for satellite images, which could open new horizons in remote sensing, especially given the high volume of satellite images covering the whole Earth. This could enable many researchers, especially in developing areas, to make use of the open-access satellite data currently available through portals such as the Copernicus Open Access Hub.
2.1 Model Architecture
I used the standard convolutional neural network VGG19 as a base model for training. The input layer dimensions were expanded to accommodate the thirteen channels of the hyperspectral satellite images. Additionally, the fully connected layers were replaced by three layers, with the last layer's ten units corresponding to the classification targets, as elaborated in the dataset description section. The model was implemented in TensorFlow with Keras. The layers were initialized randomly for the first training run; these random weights were then saved and reloaded at each re-initialization to ensure that all models start from the same random weights. This was done three times, as elaborated in the data augmentation section.
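A minimal sketch of this setup in TensorFlow/Keras might look as follows. The two hidden fully connected layer sizes (256 and 128) are illustrative placeholders, since the exact dimensions did not survive in the text above; only the thirteen input channels and the ten output units are taken from the description:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CHANNELS = 13  # Sentinel-2 spectral bands
NUM_CLASSES = 10   # EuroSAT land-cover classes

def build_model(input_shape=(64, 64, NUM_CHANNELS)):
    # VGG19 convolutional base with a 13-channel input; weights must be
    # random (weights=None) because pretrained weights assume 3 channels.
    base = tf.keras.applications.VGG19(
        include_top=False, weights=None, input_shape=input_shape)
    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)  # placeholder size
    x = layers.Dense(128, activation="relu")(x)  # placeholder size
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(base.input, out)

model = build_model()
# Keep one copy of the initial random weights and restore it before
# each training condition so all models start from the same point.
initial_weights = model.get_weights()
model.set_weights(initial_weights)
```

Saving and restoring the weights in memory (rather than rebuilding the model) is one simple way to guarantee identical initialization across conditions.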
2.2 Dataset

Hyperspectral images from the EuroSAT dataset were used. Images in this dataset were taken by the Sentinel-2A satellite, covering areas from different regions of Europe. Each image is 64 × 64 pixels with thirteen channels representing different electromagnetic bands centered at 443, 490, 560, 665, 705, 740, 783, 842, 865, 945, 1375, 1610, and 2190 nm. Each image belongs to one of ten classification targets (industrial building, residential building, annual crop, permanent crop, river, sea and lake, herbaceous vegetation, highway, pasture, and forest). A more complete description of the data acquisition can be found in Helber et al. (2019).
2.3 Data Augmentation
Because the Keras/TensorFlow image generator tool does not support hyperspectral images, I developed an image generator seeded from previously published code but with extended functionality, using the Python scikit-image toolbox. The tool allows image augmentation by horizontal and vertical flipping, rotation, translation, zooming, shearing, and the addition of speckle (multiplicative) noise. All but the last are common procedures for natural image augmentation in training DNN models. Speckle noise addition was included as an augmentation technique that better simulates the types of noise affecting satellite images.
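As a rough illustration (not the actual generator code), flipping-based augmentation for hyperspectral batches reduces to reversing the spatial axes while leaving all thirteen channels aligned:

```python
import numpy as np

def flip_batch(batch, rng):
    """Randomly apply horizontal and/or vertical flips to a batch of
    shape (N, H, W, C). Channels stay aligned because only the two
    spatial axes are reversed; pixel values are never altered."""
    out = batch.copy()
    for i in range(out.shape[0]):
        if rng.random() < 0.5:
            out[i] = out[i][:, ::-1, :]  # horizontal flip
        if rng.random() < 0.5:
            out[i] = out[i][::-1, :, :]  # vertical flip
    return out
```

Because flips only permute pixels, they are label-preserving for overhead imagery, where no canonical up or left direction exists.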
I first tested each of the common image augmentation techniques separately and compared performance to the no-augmentation condition. Horizontal and vertical flipping were combined into one condition. Rotation augmentation was implemented with varying maximum rotation ranges, where the image generator randomly samples a rotation angle between the maximum range in the clockwise and anticlockwise directions. Zooming augmentation was implemented with varying maximum ranges, where the generator randomly samples a zooming factor between the maximum range (zooming in) and its inverse (zooming out). When the image is shrunk, edge pixels are repeated to pad the missing regions and preserve the image size. Translation augmentation was implemented with maximum ranges expressed as fractions of the image size, where the generator randomly samples x-axis and y-axis translation values between zero and the maximum range in either direction; missing values are again padded using edge pixel values. Shearing augmentation was tested with varying maximum ranges, with padding implemented as in zooming, rotation, and translation. After testing each augmentation technique individually, those that improved classification accuracy were tested again in combination with each other and compared against the no-augmentation condition.
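The sampling scheme described above can be sketched with scikit-image's affine transform machinery. The `max_*` defaults below are illustrative stand-ins, since the paper's exact ranges did not survive in the text; `mode="edge"` implements the edge-pixel padding described above:

```python
import numpy as np
from skimage.transform import AffineTransform, warp

def random_affine(image, rng, max_angle=30.0, max_zoom=1.2,
                  max_shift=0.1, max_shear=0.2):
    """Sample one random rotation/zoom/translation/shear combination.
    All max_* ranges are hypothetical placeholders."""
    h, w = image.shape[:2]
    angle = np.deg2rad(rng.uniform(-max_angle, max_angle))
    zoom = rng.uniform(1.0 / max_zoom, max_zoom)   # zoom in or out
    shear = rng.uniform(-max_shear, max_shear)
    tx = rng.uniform(-max_shift, max_shift) * w
    ty = rng.uniform(-max_shift, max_shift) * h
    tform = AffineTransform(scale=(zoom, zoom), rotation=angle,
                            shear=shear, translation=(tx, ty))
    # mode="edge" repeats border pixels to fill regions exposed by the
    # transform, preserving the original image size.
    return warp(image, tform.inverse, mode="edge", preserve_range=True)
```

`skimage.transform.warp` operates on (rows, cols, bands) arrays, so the same spatial transform is applied consistently across all thirteen channels.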
Finally, I tested the introduction of speckle noise, a multiplicative noise component common in satellite images, as an augmentation technique for training the DNN model. I tested the addition of speckle noise with zero mean and variance values of 0.001, 0.004, 0.007, and 0.010. These conditions were tested against another no-augmentation condition in a similar fashion to the other tests.
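Concretely, zero-mean speckle with variance `var` multiplies each pixel by (1 + n) with n ~ N(0, var), which is the definition used by scikit-image's `random_noise(mode='speckle')`; the sketch below assumes that definition:

```python
import numpy as np

def add_speckle(image, var, rng):
    """Speckle (multiplicative) augmentation: out = x + x * n,
    where n is zero-mean Gaussian noise with the given variance."""
    noise = rng.normal(0.0, np.sqrt(var), image.shape)
    return image + image * noise

# Variance levels tested in this study: 0.001, 0.004, 0.007, 0.010
```

Because the noise scales with pixel intensity, bright pixels are perturbed more than dark ones, which is what distinguishes speckle from additive Gaussian noise.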
2.4 Training and Performance Measurement
I trained each DNN model for ten epochs of 500 batches, with 128 images per batch. For optimization, the Adam optimizer (learning rate = 0.001) was used to minimize a categorical cross-entropy loss function. While image augmentation usually aims to increase both the number and variety of training images, I kept the number of training samples fixed so that augmentation only increased their variety. This ensured a fair comparison between conditions and ruled out the number of training images as a confounding factor. The dataset was split into 21,000 training images and 9,000 test images, and the networks were tested after each epoch of training so that the discrepancy between training and test performance could be monitored to detect over-fitting due to lack of image variety. Without augmentation, each training sample is therefore used on average 30 times over the ten epochs, which could lead to over-fitting given that the network has 21,025,226 trainable parameters. While the resulting performance, as will be seen, is not superior to currently existing models (Helber et al. 2019), the goal of this study is rather to explore the utility of different augmentation techniques in training DNN models on satellite images.
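The training loop can be sketched in Keras as below. The tiny stand-in model and random toy data replace the modified VGG19 and the EuroSAT split (for illustration only), while the optimizer, loss, and per-epoch validation match the description above:

```python
import numpy as np
import tensorflow as tf

EPOCHS, STEPS_PER_EPOCH, BATCH_SIZE = 10, 500, 128  # as in the paper

# Tiny stand-in for the modified VGG19 (illustration only).
model = tf.keras.Sequential([
    tf.keras.layers.Input((64, 64, 13)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Toy data standing in for the 21,000 / 9,000 train/test split.
x = np.random.rand(32, 64, 64, 13).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 10, 32), 10)

# Evaluating after every epoch (validation_data) exposes the gap
# between training and test accuracy used to monitor over-fitting.
history = model.fit(x, y, batch_size=16, epochs=2,
                    validation_data=(x, y), verbose=0)
```

In the actual experiments, `x` and `y` would come from the augmenting generator, and `steps_per_epoch=STEPS_PER_EPOCH` would fix each epoch at 500 batches.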
For the first condition, I tested each of the image augmentation techniques common in DNN models for natural image classification. Table 1 summarizes the accuracy at the epoch with the highest test accuracy for each testing condition. Horizontal and vertical flipping exhibited the largest increase in classification accuracy over the no-augmentation condition, followed by shearing, zooming, and rotation. Image translation, in contrast, performed worse than the baseline at both levels tested. Most augmentation conditions reached their best epoch late in training, with training accuracy lower than test accuracy, signaling that further training could yield even better accuracy.
Table 1. Best-epoch performance for each augmentation technique tested individually.

| Flipping | Rotation | Zooming | Translation | Shear | Best Epoch | Training Accuracy | Test Accuracy |
|---|---|---|---|---|---|---|---|
When testing combinations of techniques (excluding translation), most models showed an increase in accuracy, albeit lower than flipping alone from the first test (Table 2). This could be attributed to the increased image variety requiring longer training to reach the best accuracy. However, even the combinations showing lower accuracy have the potential to improve with longer training, since no over-fitting was observed in them, in contrast to the no-augmentation condition, which showed slight over-fitting.
Table 2. Best-epoch performance for combinations of augmentation techniques (excluding translation).

| Flipping | Rotation | Zooming | Shear | Best Epoch | Training Accuracy | Test Accuracy |
|---|---|---|---|---|---|---|
These results show that traditional image augmentation techniques can improve performance even in a hyperspectral setting. They also suggest that translation augmentation, in its traditional form, is better avoided. One reason for the performance decrease with translation could be the padding mechanism that follows it, which can be destructive to the image. A better implementation could be beneficial, especially given that satellite images are usually cropped from larger tiles: if translation were implemented on a larger tile, the translated image would simply include other pixels from the tile rather than discard pixels and pad the gaps afterwards. This could be an interesting endeavor for a future study, as it could also improve the performance of other techniques that destroy pixels, such as rotation, zooming, and shearing.
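The tile-based translation proposed above could be realized roughly as follows (a hypothetical sketch, assuming each 64 × 64 image is cropped from a larger tile whose crop coordinates are known):

```python
import numpy as np

def translate_from_tile(tile, top, left, size, max_shift, rng):
    """Translate by re-cropping: shift the crop window inside the
    larger tile so that real neighboring pixels enter the image,
    instead of repeating edge pixels to pad exposed regions."""
    dy = int(rng.integers(-max_shift, max_shift + 1))
    dx = int(rng.integers(-max_shift, max_shift + 1))
    # Clamp so the shifted window stays within the tile bounds.
    y = int(np.clip(top + dy, 0, tile.shape[0] - size))
    x = int(np.clip(left + dx, 0, tile.shape[1] - size))
    return tile[y:y + size, x:x + size]
```

The same windowing idea could serve rotation, zooming, and shearing, by transforming a margin of surrounding tile pixels before cropping the final image.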
When testing the introduction of speckle noise, performance improved at several noise levels (Table 3). However, training accuracy was higher than test accuracy, suggesting that over-fitting was starting to occur. This could be because noise addition produces only small variations, rendering noised images almost identical to clean ones. Nonetheless, this technique appears to be a promising approach to image augmentation for satellite data and could prove even more beneficial when combined with traditional augmentation techniques. Also, being native to satellite images, it could help models generalize to images originating from different satellites.
Table 3. Best-epoch performance for speckle noise augmentation.

| Noise Variance | Best Epoch | Training Accuracy | Test Accuracy |
|---|---|---|---|
While many traditional image augmentation techniques appear readily transferable to satellite image processing models, more specific augmentation methods could further enhance the accuracy and generalizability of these models. Traditional augmentation techniques could also be modified to fit the nature of satellite images, further increasing their benefit.
-  François Chollet. Keras. https://github.com/fchollet/keras, 2015.
-  Dan Ciregan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3642–3649. IEEE, 2012.
-  Dan C Cireşan, Ueli Meier, Jonathan Masci, Luca M Gambardella, and Jürgen Schmidhuber. High-performance neural networks for visual object classification. arXiv preprint arXiv:1102.0183, 2011.
-  M. A. A. Ghaffar, A. McKinstry, T. Maul, and T. T. Vu. Data augmentation approaches for satellite image super-resolution. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2/W7:47–54, 2019.
-  Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226, 2019.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  Jens Leitloff and Felix M. Riese. Examples for CNN training and classification on Sentinel-2 data. http://doi.org/10.5281/zenodo.3268451, 2018.
-  Lei Ma, Yu Liu, Xueliang Zhang, Yuanxin Ye, Gaofei Yin, and Brian Alan Johnson. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS journal of photogrammetry and remote sensing, 152:166–177, 2019.
-  Patrice Y Simard, David Steinkraus, John C Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In Icdar, volume 3, 2003.
-  Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
-  M Vijay and L Saranya Devi. Speckle noise reduction in satellite images using spatially adaptive wavelet thresholding. International Journal of Computer Science and Information Technologies (IJCSIT), 3(2012):3432–3435, 2012.