A Comparison of Data Augmentation Techniques in Training Deep Neural Networks for Satellite Image Classification

03/30/2020 ∙ by Mohamed Abdelhack, et al.

Satellite imagery enables a plethora of applications ranging from weather forecasting to land surveying. The rapid development of computer vision systems, together with the abundance of large volumes of data, could open new horizons for the utilization of satellite data. However, current state-of-the-art computer vision systems mainly cater to applications involving natural images. While useful, those images exhibit a different distribution from satellite images and have fewer spectral channels. Pretrained deep learning models can therefore be applied only to the subset of spectral channels that is equivalent to natural images, discarding valuable information from the remaining channels. This calls for research effort to optimize deep learning models for satellite imagery and to assess their utility in the domain of remote sensing. This study focuses on image augmentation in the training of deep neural network classifiers. I tested different image augmentation techniques for training a standard deep neural network on satellite images from EuroSAT. Results show that while some image augmentation techniques commonly used in natural image training transfer readily to satellite images, others can actually decrease performance. Additionally, novel image augmentation techniques that take the nature of satellite images into account could be useful to incorporate in training.




1 Introduction

Remote sensing applications have been adopting deep learning techniques at an increasing pace [8]. However, tools for utilizing deep neural networks (DNNs) are still not well developed for satellite imagery: most methods are imported as-is from natural image processing, which is restricted to the visible range of the electromagnetic spectrum. Satellite images are usually hyperspectral, and their statistics differ from those of natural images. To exploit the full power of DNN models in satellite applications, it is necessary to assess the usability of current tools and tailor them to this domain.
One technique widely used in deep learning is image augmentation. It has been very successful at increasing the variety of natural images used for training deep neural networks [2, 3, 9]. It was used in training AlexNet [6], one of the earliest DNNs for natural image classification, where it increased the number of training images by a factor of 2048. The advantage of this technique is that it enlarges the training dataset while preserving labels by exploiting symmetries in natural images. Augmentation has since become a standard technique in DNN training, and some DNNs trained on satellite images have made use of it, but these models usually restricted themselves to the RGB spectral space [5] or attempted to squeeze all channels into a three-channel domain [4].
This study is an exploratory attempt to investigate the usability of different standard image augmentation techniques for DNN training on satellite image classification. Additionally, I investigate a novel augmentation technique: the addition of speckle noise, one of the most common noise types in satellite images [11]. Optimizing such techniques could help improve DNN models for satellite images, which could open new horizons in remote sensing, especially given the high volume of satellite images covering the whole Earth. This could enable many researchers, especially in developing areas, to make use of the open-access satellite data currently available through portals such as the Copernicus Open Access Hub.

2 Methods

2.1 Model Architecture

I used the standard convolutional neural network VGG19 [10] as the base model for training. The input layer dimensions were expanded to accommodate the thirteen channels of the hyperspectral satellite images. Additionally, the fully connected layers were replaced by three layers of dimension , , and , where the last ten units correspond to the classification targets, as elaborated in the dataset description section. The model was implemented in TensorFlow with Keras [1]. The layers of the first training model were initialized randomly; these random weights were then saved and reused at each re-initialization stage to ensure that all models start from the same random weights. This was done thrice, as elaborated in the data augmentation section.
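The setup above can be sketched in Keras. This is a minimal sketch, not the paper's exact implementation: the fully connected layer sizes below (`fc_sizes`) are placeholders, since the paper's exact dimensions were lost, and `build_model` is a hypothetical helper name.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(64, 64, 13), n_classes=10, fc_sizes=(256, 128)):
    # VGG19 convolutional base with a widened 13-channel input. weights=None
    # because ImageNet weights assume a 3-channel input, so all layers are
    # randomly initialized, matching the training setup described above.
    base = tf.keras.applications.VGG19(
        include_top=False, weights=None, input_shape=input_shape)
    x = layers.Flatten()(base.output)
    for n in fc_sizes:  # placeholder sizes; the paper elides the exact dims
        x = layers.Dense(n, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(base.input, out)

model = build_model()
init_weights = model.get_weights()   # snapshot the random initialization
# ...before each augmentation condition, restore the same starting point:
model.set_weights(init_weights)
```

Saving the initial weights once and restoring them before every condition is what guarantees that differences between conditions cannot be attributed to initialization luck.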

2.2 Dataset

Hyperspectral images from the EuroSAT dataset were used [5]. Images in this dataset were taken by the Sentinel-2A satellite and cover areas from different regions in Europe. Each image is 64×64 pixels with thirteen channels representing different electromagnetic bands centered at wavelengths of 443, 490, 560, 665, 705, 740, 783, 842, 865, 945, 1375, 1610, and 2190 nm. Each image belongs to one of ten classification targets (industrial building, residential building, annual crop, permanent crop, river, sea and lake, herbaceous vegetation, highway, pasture, and forest). A more complete description of the data acquisition can be found in Helber et al. (2019) [5].

2.3 Data Augmentation

Due to the lack of a Keras/TensorFlow implementation of the image generator tool that supports hyperspectral images, I developed an image generator seeded from previously published code [7] but with extended functionality using the Python scikit-image toolbox. The tool allows for image augmentation by horizontal and vertical flipping, rotation, translation, zooming, shearing, and addition of speckle (multiplicative) noise. All but the last technique are common procedures for natural image augmentation in training DNN models. Speckle noise addition was added as an augmentation technique that better simulates the types of noise that affect satellite images [11].
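The core of such a generator can be sketched with numpy alone (the published generator uses scikit-image for rotation, zoom, and shear; those are omitted here). `flip_augment`, `translate`, and `batch_generator` are illustrative names, not the paper's API; the edge-pixel padding matches the behavior described below.

```python
import numpy as np

def flip_augment(img, rng):
    # Nadir satellite images have no canonical orientation, so horizontal
    # and vertical flips preserve the class label.
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)   # horizontal flip
    if rng.random() < 0.5:
        img = np.flip(img, axis=0)   # vertical flip
    return img

def translate(img, dy, dx):
    # Shift an (H, W, C) image by (dy, dx) pixels; missing regions are
    # padded by repeating edge pixels so the image size is preserved.
    H, W = img.shape[:2]
    padded = np.pad(img, ((abs(dy),) * 2, (abs(dx),) * 2, (0, 0)), mode="edge")
    return padded[abs(dy) - dy:abs(dy) - dy + H, abs(dx) - dx:abs(dx) - dx + W]

def batch_generator(images, labels, batch_size, rng):
    # Infinite Keras-style generator yielding augmented mini-batches.
    n = len(images)
    while True:
        idx = rng.integers(0, n, size=batch_size)
        batch = np.stack([flip_augment(images[i], rng) for i in idx])
        yield batch, labels[idx]
```

Applying every transform identically across all thirteen channels is the key difference from RGB pipelines, which is why the stock Keras generator could not be used directly.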
I first tested each of the common image augmentation techniques separately and compared performance to a no-augmentation condition. Horizontal and vertical flipping were combined into one condition. Rotation augmentation was implemented with varying maximum ranges of , , and ; the image generator randomly generates a rotation angle between the maximum range in the clockwise and anticlockwise directions. Zooming augmentation was implemented with varying maximum ranges of and , where the generator randomly generates a zooming factor between the maximum range (zooming in) and the inverse of the maximum range (zooming out). When the image is shrunk, the edge pixels are repeated to pad the missing regions and preserve the image size. Translation augmentation was implemented with maximum ranges of and of the image size, where the generator randomly generates x-axis and y-axis translation values between zero and the maximum range in either direction; missing values are likewise padded using edge pixel values. Shearing augmentation was tested with maximum ranges of and , with padding implemented as in the zooming, rotation, and translation conditions. After testing each augmentation technique individually, those that improved classification accuracy were tested again in combination with each other and compared against the no-augmentation condition.
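The sampling scheme described above can be sketched as follows. Note that the paper's actual maximum ranges did not survive extraction, so every default value here is a hypothetical stand-in, and `sample_augmentation_params` is an illustrative name.

```python
import numpy as np

def sample_augmentation_params(rng, max_rot=30.0, max_zoom=1.2, max_shift=6):
    # Each parameter is drawn uniformly between the maximum range in either
    # direction, as described above. All defaults are hypothetical values.
    angle = rng.uniform(-max_rot, max_rot)           # degrees, cw/ccw
    zoom = rng.uniform(1.0 / max_zoom, max_zoom)     # <1 zooms out, >1 zooms in
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return angle, zoom, (int(dy), int(dx))
```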
Finally, I tested the introduction of speckle noise, a multiplicative noise component common in satellite imagery [11], as an augmentation technique for training the DNN model. I tested the addition of speckle noise with zero mean and variance values of 0.001, 0.004, 0.007, and 0.010. These conditions were tested against another no-augmentation condition in a similar fashion to the other tests.
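Multiplicative speckle noise scales each pixel by a random factor, I′ = I + I·n with n ~ N(0, σ²), rather than adding an intensity offset as Gaussian noise would. A minimal sketch (`add_speckle` is an illustrative name):

```python
import numpy as np

def add_speckle(img, variance, rng):
    # Zero-mean multiplicative (speckle) noise: I' = I + I * n, n ~ N(0, var),
    # i.e. each pixel is scaled by (1 + n).
    noise = rng.normal(0.0, np.sqrt(variance), img.shape)
    return img + img * noise

rng = np.random.default_rng(0)
img = rng.random((64, 64, 13))
for var in (0.001, 0.004, 0.007, 0.010):
    noisy = add_speckle(img, var, rng)
    rel = np.std((noisy - img) / img)   # ~= sqrt(var): a 3-10% perturbation
```

With the variances tested here, the per-pixel relative perturbation is only about 3-10%, which is relevant to the over-fitting observation discussed in the results.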

2.4 Training and Performance Measurement

I trained each DNN model for ten epochs with 500 batches per epoch and 128 images per batch. For optimization, an Adam optimizer (learning rate = 0.001) was used to minimize a categorical cross-entropy loss function. While image augmentation usually aims at increasing both the number and the variety of training images, I kept the number of training samples fixed so that augmentation only increased their variety. This was done to ensure a fair comparison between conditions and to rule out the number of training images as a confounding factor. The image dataset was split into 21,000 training images and 9,000 test images, and the networks were tested after each epoch of training to monitor the discrepancy between training and test performance and thereby investigate over-fitting due to lack of image variety. Without augmentation, each training sample would be used on average about 30 times throughout the ten epochs of training, which could lead to over-fitting given that the network has a total of 21,025,226 trainable parameters. While the resulting performance, as will be observed, is not superior to currently existing models [5], the goal of this study is rather to explore the utility of different augmentation techniques in training DNN models on satellite images.
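The roughly 30-fold reuse of each sample follows directly from the training configuration, as a quick check shows:

```python
# Training configuration from the section above.
batches_per_epoch = 500
batch_size = 128
epochs = 10
train_size = 21000

draws = batches_per_epoch * batch_size * epochs   # 640,000 samples drawn
reuse = draws / train_size                        # average uses per sample
print(round(reuse, 1))                            # -> 30.5
```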

3 Results

For the first condition, I tested each of the image augmentation techniques common in DNN models for natural image classification. Table 1 summarizes accuracy for the epoch with the highest test accuracy in each testing condition. Results show that horizontal and vertical flipping exhibit the highest increase in classification accuracy over the no-augmentation condition, followed by shearing, zooming, and rotation. Image translation, in contrast, performed worse than the baseline at both levels tested. Most of the augmentation conditions reached their best epochs late in training, with training accuracy lower than test accuracy, signaling that further training could yield even better accuracy.

Table 1: Results of testing standard image augmentation techniques separately. Columns: Flipping, Rotation, Zooming, Translation, Shear, Best Epoch, Training Accuracy, Test Accuracy.

When testing combinations of the different techniques, excluding translation, most of the models showed an increase in accuracy, albeit lower than that of flipping alone in the first test (Table 2). This could be attributed to the increased image variety requiring longer training to reach the best accuracy. However, even the combinations showing lower accuracy have the potential to improve with longer training, since no over-fitting was observed, in contrast to the no-augmentation condition, which showed slight over-fitting.

Table 2: Results of testing standard image augmentation techniques in combination with each other. Columns: Flipping, Rotation, Zooming, Shear, Best Epoch, Training Accuracy, Test Accuracy.

These results show that traditional image augmentation techniques can improve performance even in a hyperspectral setting. They also suggest that translation augmentation, in its traditional form, is better avoided. One reason for the performance decrease with translation could be the padding mechanism that follows it, which can be destructive to the image. A better implementation could be beneficial, especially since satellite images are usually cropped from larger tiles: if translation were implemented on the larger tile, it would simply include other pixels from that tile rather than remove pixels and pad them afterwards. This could be an interesting endeavor for a future study, as it could also improve the performance of other techniques that cause pixel destruction, such as rotation, zooming, and shearing.
When testing the introduction of speckle noise, performance improved at all noise levels tested (Table 3). However, given that training accuracy was higher than test accuracy, over-fitting may have been starting to occur. This could be because the noise introduces only small variations, rendering noised images almost identical to clean ones. Nonetheless, this technique appears to be a promising approach to image augmentation for satellite data and could prove even more beneficial when introduced alongside traditional augmentation techniques. Also, being native to satellite images, it could aid generalization of the models to images originating from different satellites.

Table 3: Results of testing speckle noise addition as an image augmentation technique with different noise variance values. Columns: Noise Variance, Best Epoch, Training Accuracy, Test Accuracy.

4 Conclusion

While many of the traditional image augmentation techniques appear to be readily transferable to satellite image processing models, more specific augmentation methods could further enhance the accuracy and generalizability of these models. Traditional augmentation techniques could also be modified to fit the nature of satellite images and extract further benefit from them.