Deep Learning for Change Detection in Remote Sensing Images: Comprehensive Review and Meta-Analysis

06/10/2020 ∙ by Lazhar Khelifi, et al. ∙ Université de Montréal

Deep learning (DL) algorithms have been considered a methodology of choice for remote sensing image analysis over the past few years. Owing to its effective applications, deep learning has also been introduced for automatic change detection and has achieved great success. The present study attempts to provide a comprehensive review and a meta-analysis of the recent progress in this subfield. Specifically, we first introduce the fundamentals of the deep learning methods that are frequently adopted for change detection. Second, we present the details of the meta-analysis conducted to examine the status of change detection DL studies. Then, we focus on deep learning-based change detection methodologies for remote sensing images by giving a general overview of the existing methods. These deep learning-based methods are classified into three groups: fully supervised learning-based methods, fully unsupervised learning-based methods, and transfer learning-based techniques. As a result of these investigations, promising new directions are identified for future research. This study will contribute in several ways to our understanding of deep learning for change detection and will provide a basis for further research.




1 Introduction

Deep learning (DL) has attracted growing interest over the past decade owing to its powerful representation learning ability. It allows models built from multiple processing layers to learn representations of data samples at several levels of abstraction [1]. It may also be regarded as the study of models that involve a greater composition of learned concepts or functions than conventional machine learning models such as naive Bayes [2] [3], support vector machines (SVM) [4] [5], random forests [6] [7], and decision trees.


On the basis of its state-of-the-art performance, deep learning has been applied to various domains, such as computer vision [10], speech recognition [11], and information retrieval [12]. In the computer vision field in particular, deep learning has taken great leaps thanks to recent advances in processing power, improvements in graphics processors, and increased data volumes (i.e., videos and images). Notably, remote sensing (RS) has seen a massive increase in the generation and enhancement of digital images captured from airplanes or satellites that cover almost every part of the surface of the earth. This growth in data has pushed the geoscience and remote sensing community to apply deep learning algorithms to different remote sensing tasks. Prominent among these is the change detection (CD) task, defined in [13] as ‘the process of identifying differences in the state of an object or phenomenon by observing it at different times’. In other words, change detection refers to identifying the differences between images acquired over the same geographical zone but taken at two distinct times [14].

Change detection techniques are extensively used in various applications [15], including disaster assessment [16], environmental monitoring [17], land management [18], and urban change analysis [19]. Currently, the number of extreme disasters caused by climate change, such as droughts, floods, hurricanes, and heat waves, has revealed both a new challenge for researchers and a need for more effective automated change detection methods. Motivated by these observations, deep learning has been introduced for change detection in remote sensing and has achieved good performance.

Recently, various reviews that focus on deep learning for remote sensing data have been published. These studies have summarized the deep learning techniques adopted in all major remote sensing sub-areas, including classification, restoration, denoising, target recognition, scene understanding, and other tasks (for further details we refer the reader to [20] [21] [22]). To the best of our knowledge, however, no work has studied the recent progress of deep learning for the task of change detection in a specific and extensive way. Therefore, the purpose of this report is to provide an overview of the state of deep learning algorithms as applied to remote sensing images for change detection. To this end, we performed a meta-analysis in which we selected and categorized the relevant papers related to DL and change detection. We then provide a technical review of these studies that sheds more light on the advance of deep learning for change detection. This review will serve as a base for future studies in this subfield of research.

The rest of this paper is structured as follows. Section 2 presents the definition of the change detection problem. Section 3 gives a brief overview of deep learning as well as the typical deep models used for change detection. Section 4 describes the methods and data used to review the state of the art. In Section 5, we divide previous works into three categories: fully supervised learning-based methods, fully unsupervised learning-based methods, and transfer learning-based methods. Section 6 suggests two interesting research directions to further advance the field. Finally, Section 7 outlines the conclusions.

2 Change detection in remote sensing

Change detection is the quantitative analysis and determination of surface changes in phenomena or objects over two distinct periods [13]. This process, a basic technology in the field of earth observation, attempts to distinguish the changed and unchanged pixels of bi-temporal or multi-temporal remote sensing images acquired over the same geographical area at different times [23] [24]. The main purpose of a change detection system is to assign a binary label to each pixel based on a pair or series of co-registered images. A positive label means that the area of that pixel has changed, while a null label represents an unchanged area (see Figs. 1 and 2) [25]. Change detection is thus a powerful tool for video surveillance, mapping urban areas, and other forms of multi-temporal analysis.

Formally, let $I_1$ and $I_2$ be two co-registered images of the same size $H \times W$, taken over the same geographical region at two separate times $t_1$ and $t_2$, respectively, using the same sensor in the classic monomodal case:

$$I_1 = \{ I_1(i, j) \mid 1 \le i \le H,\ 1 \le j \le W \}, \qquad I_2 = \{ I_2(i, j) \mid 1 \le i \le H,\ 1 \le j \le W \}$$

The primary purpose of a change detection system is to generate an accurate binary change map (CM):

$$CM(i, j) = \begin{cases} 1 & \text{if the pixel at } (i, j) \text{ has changed} \\ 0 & \text{otherwise} \end{cases}$$

where $(i, j)$ represents the position coordinates of the pixel. In traditional methods, this change map can be obtained by a difference image (DI) operation, based on a differencing or log-ratio function $f(I_1, I_2)$, followed by a final analysis of the DI result.
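As a rough illustration of the traditional pipeline described above, the following sketch computes a difference image by differencing or log-ratio and then binarizes it with a fixed threshold. The function names, the small constant `eps`, the toy images, and the threshold values are assumptions made for this example only; practical systems usually replace the fixed threshold with an automatic analysis of the DI.

```python
import numpy as np

def difference_image(img1, img2, mode="diff", eps=1e-6):
    """Per-pixel difference image (DI) for two co-registered images."""
    a = img1.astype(np.float64)
    b = img2.astype(np.float64)
    if mode == "diff":
        # Absolute differencing, common for optical images.
        return np.abs(b - a)
    if mode == "log_ratio":
        # Log-ratio, common for SAR images; eps avoids division by zero.
        return np.abs(np.log((b + eps) / (a + eps)))
    raise ValueError(f"unknown mode: {mode}")

def change_map(di, threshold):
    """Binarize the DI: 1 = changed pixel, 0 = unchanged pixel."""
    return (di > threshold).astype(np.uint8)

# Toy bi-temporal pair: only the center pixel changes substantially.
img_t1 = np.array([[10, 10, 10],
                   [10, 10, 10],
                   [10, 10, 10]])
img_t2 = np.array([[10, 10, 10],
                   [10, 80, 10],
                   [10, 10, 12]])

cm_diff = change_map(difference_image(img_t1, img_t2, "diff"), threshold=20)
cm_log = change_map(difference_image(img_t1, img_t2, "log_ratio"), threshold=1.0)
print(cm_diff)
```

Both operators flag only the center pixel here; the small variation in the bottom-right corner stays below either threshold.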

Change detection has been successfully used in a wide variety of applications. In the agricultural sector in particular, change detection is adopted for deforestation monitoring, disaster assessment, and shifting cultivation monitoring. In the military field, it is now used to collect information about new military installations, the movement of enemy forces, battlefield areas, and damage assessment [26]. In the civil field, change detection is used to control urban area development and city extension [27]. It is also currently adopted to monitor the effects of climate change, usually associated with the increasing level of greenhouse gas (GHG) emissions in the atmosphere, such as changes in mass balance and glacier facies or sea-level change.

While change detection algorithms have shown many benefits in various fields of application, they face some serious challenges. One of them is the variation in data acquisition parameters, which can hinder the search for relevant changes by adding irrelevant information to the data. In addition, unwanted changes can arise from atmospheric features such as fog, clouds, and dust. For example, a cloud present in one image (at time $t_1$) but not in the other (at time $t_2$) leads to a bright patch that can be registered as a difference and consequently degrades the quality of the resulting change map. The angle of sunlight poses another problem, related to the presence and direction of shadows in the scene [26]. Besides, vegetation growth and the surface reflectance of objects, such as soil before and after rain, can also affect the change detection result [28]. Thus, in addition to detecting temporal changes, a robust change detection method must be able to differentiate between relevant and irrelevant changes in satellite images. Motivated by their successful applications elsewhere, deep learning techniques, capable of extracting information from data (image or video), have recently been applied to this problem and have achieved good performance.

Figure 1: Graphical illustration of the change detection problem.
Figure 2: An illustration of typical change detection results on a high-resolution satellite image [29] [30].

3 Brief overview of deep learning

Deep learning (DL) algorithms, which aim to learn representative and discriminative features from a set of data in a hierarchical way, have received much attention from the worldwide geoscience and remote sensing communities in recent years. In the first part of this section, we briefly present the history of deep learning to explain the trend in its growth. In the second part, we outline the different deep network models widely used for change detection in remote sensing images. These deep networks include deep belief networks (DBNs), stacked autoencoders (SAEs), generative adversarial networks (GANs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs).

3.1 History

Deep learning (DL) is a particular approach to machine learning that draws on knowledge of statistics, the human brain, and applied mathematics, as it developed over recent years [31]. By gathering this knowledge, the approach relieves human experts from formally specifying all the knowledge that the computer requires to solve a particular problem. It achieves good flexibility and scalability by representing the world as a nested hierarchy of concepts. This concept hierarchy enables the machine to learn complex concepts by building them from simpler ones [31]. DL is derived from connectivity theory related to the functioning of our brain cells, also called neurons, which led to the concept of artificial neural networks (ANNs). An ANN is built from layers of artificial neurons that receive input data and transform it into outputs by applying an activation function and progressively learning higher-level features. The intermediate layers (between the input and the output) are often called “hidden layers” because they are not directly observable from the inputs and outputs of the system [32]. In practice, to solve complex tasks such as change detection in remote sensing images, a neural network that contains multiple hidden layers is applied. This multi-layered structure is referred to as a “deep” neural network (DNN), hence the term “deep learning”.

As described in [31], the development of deep learning has followed three main waves reflecting different philosophical viewpoints. The first wave refers to cybernetics in the 1940s-1960s, characterized by advances in theories of biological learning [33] [34] and the application of the first models such as the perceptron [35], which allows the training of an architecture based on a single neuron. The second wave began with the connectionist approach, which expanded from the 1980s to 1995 with back-propagation [36] used to train a neural network with one or two hidden layers (connectionism is a movement in cognitive science that aims to interpret intellectual abilities through artificial neural networks [43]). This fundamental building block repeatedly updates the weights of the connections in the network so as to minimize a measure of the gap between the actual output vector of the network and the desired output vector [37]. While this approach works quite well for simple applications, the computer vision community in particular found it difficult to apply to complex problems. The main challenge was the lack of suitable computing hardware to train deep neural networks (DNNs) efficiently. The third wave started in 2006 under the name of deep learning [38] [39]. Since then, there has been renewed interest in deep neural networks thanks to the availability of powerful computer systems, larger datasets, and new training techniques. Currently, deep learning receives much attention in different research areas of computer vision, including the analysis of remote sensing images.

3.2 Deep models

3.2.1 DBNs

Deep belief networks (DBNs) are mainly built from a layerwise training model called the restricted Boltzmann machine (RBM). RBMs are stochastic undirected graphical models containing a layer of visible variables and a single layer of hidden variables. Fig. 3 illustrates the graph structure of the RBM. It is a bipartite graph in which visible units, representing observations, are linked to hidden units, which learn to describe features, through undirected weighted connections [40]. In this model, no connections are permitted among the variables in the visible layer or among the units in the hidden layer. Mathematically, suppose that the visible layer $v$ contains a set of $n_v$ binary random variables, and the hidden layer $h$ consists of $n_h$ binary random variables. The energy function of the canonical RBM can be formulated as [31]:

$$E(v, h) = -b^\top v - c^\top h - v^\top W h$$

where $b$, $c$, and $W$ represent learnable parameters. The weights on the connections and the biases of the individual units define a probability distribution, through the energy function, over the joint states of the visible and hidden units [41]. The probability of a joint configuration $(v, h)$ is then defined as:

$$P(v, h) = \frac{1}{Z} \exp(-E(v, h))$$

where $Z$ is the normalizing constant, usually referred to as the partition function:

$$Z = \sum_{v} \sum_{h} \exp(-E(v, h))$$

Because of the limited feature representation capability of a single RBM, several RBMs can be stacked one on top of another to form a DBN that can be effectively trained to obtain a deep hierarchical model of the training data [42]. Fig. 4 presents a DBN composed by stacking multiple RBM layers.
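To make the RBM energy and partition function concrete, the sketch below enumerates every joint state of a very small RBM and normalizes the resulting probabilities. It assumes the canonical form E(v, h) = -b^T v - c^T h - v^T W h with visible biases `b`, hidden biases `c`, and weight matrix `W`; the helper names and the layer sizes are hypothetical. Exhaustive enumeration is only feasible for toy sizes, since the number of states grows as 2^(n_v + n_h).

```python
import itertools
import numpy as np

def rbm_energy(v, h, W, b, c):
    """Energy of a joint configuration: E(v, h) = -b^T v - c^T h - v^T W h."""
    return -(b @ v) - (c @ h) - (v @ W @ h)

def rbm_joint_distribution(W, b, c):
    """Enumerate all binary states and return (states, probabilities, Z)."""
    n_v, n_h = W.shape
    states = [(np.array(v), np.array(h))
              for v in itertools.product([0, 1], repeat=n_v)
              for h in itertools.product([0, 1], repeat=n_h)]
    unnormalized = np.array([np.exp(-rbm_energy(v, h, W, b, c))
                             for v, h in states])
    Z = unnormalized.sum()          # partition function
    return states, unnormalized / Z, Z

# Toy RBM with 3 visible and 2 hidden binary units (arbitrary parameters).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 2))
b = np.zeros(3)
c = np.zeros(2)

states, probs, Z = rbm_joint_distribution(W, b, c)
print(len(states), probs.sum())
```

With all parameters set to zero, every configuration has zero energy, so the distribution is uniform and Z equals the number of states.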

Figure 3: A general illustration of RBM.
Figure 4: General illustration of DBN.

3.2.2 SAEs

The autoencoder (AE) is the principal building block of the stacked autoencoder (SAE) [44]. An autoencoder is a feedforward neural network trained with backpropagation, where the target values are set equal to the inputs. This model consists of two parts: an encoder $h = f(x)$ and a decoder that attempts to provide a reconstruction $\hat{x} = g(h)$. On the one hand, based on a non-linear function, the encoder projects the input vector $x$ to the hidden layer:

$$h = f(x) = \sigma(W_1 x + b_1)$$

On the other hand, the decoder maps the hidden layer back to the output layer, which contains the same number of units as the input layer:

$$\hat{x} = g(h) = \sigma(W_2 h + b_2)$$

where $\sigma$ denotes the logistic sigmoid function $\sigma(z) = 1 / (1 + e^{-z})$, $W_1$ and $W_2$ represent the input-to-hidden and hidden-to-output weights, respectively, and $b_1$ and $b_2$ are the biases of the hidden and output units. To reduce the reconstruction error between $x$ and $\hat{x}$, a metric based on the Euclidean distance is generally minimized. This reconstruction loss is defined by:

$$\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert^2$$

A typical autoencoder architecture is presented in Fig. 5. A stacked autoencoder (SAE) is a neural network built from several layers of autoencoders, where the output of each hidden layer is connected to the input of the next hidden layer. Fig. 6 shows a simple representation of an SAE.
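The forward pass of a single autoencoder can be sketched in a few lines. The example assumes a sigmoid encoder and decoder and a squared Euclidean reconstruction loss; the parameter names (`W1`, `b1`, `W2`, `b2`), the layer sizes, and the random initialization are illustrative assumptions, and no training step is shown.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    """Project the input vector to the hidden layer."""
    return sigmoid(W1 @ x + b1)

def decode(h, W2, b2):
    """Map the hidden code back to a vector of the input size."""
    return sigmoid(W2 @ h + b2)

def reconstruction_loss(x, x_hat):
    """Squared Euclidean distance between input and reconstruction."""
    return float(np.sum((x - x_hat) ** 2))

# Toy dimensions: 6 input units compressed to 3 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 6, 3
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b2 = np.zeros(n_in)

x = rng.uniform(size=n_in)
h = encode(x, W1, b1)
x_hat = decode(h, W2, b2)
loss = reconstruction_loss(x, x_hat)
print(h.shape, x_hat.shape, loss)
```

In a stacked autoencoder, the hidden code `h` of one autoencoder would serve as the input of the next one.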

Figure 5: Auto-Encoders.
Figure 6: Stacked Auto-Encoders.

3.2.3 CNNs

Convolutional neural networks, also known as CNNs [45], are a special form of neural network designed for processing data that has a known grid-like topology, for example image data, which can be considered a two-dimensional (2D) grid of pixels. Generally, a CNN can be thought of as a hierarchical feature extractor which, on the one hand, extracts features at different levels of abstraction and, on the other hand, maps the raw pixel intensities into a feature vector [46]. The architecture of a typical CNN is illustrated in Fig. 7, where $C$, $P$, and $F$ denote convolutional, max-pooling, and fully connected layers, respectively. The convolutional layer is the fundamental component of the CNN architecture [47]. In this layer, several trainable convolution kernels (also called filters) are applied to the previous layer. The weights of these kernels connect units in a feature map with the previous layer. As a result of convolution, local conjunctions of features are detected and their appearance is mapped to the feature maps. Stacking several convolutional layers increases the depth of the network, which makes the extracted maps more abstract. The earlier layers detect features such as edges, whereas the following layers aggregate these features into motifs, parts, or objects. Formally, suppose that $N_l$ represents the number of convolution filters in layer $l$ of the network, and $x_i^{l-1}$ the 2D array corresponding to the $i$-th input of layer $l$. The $j$-th output feature map of layer $l$, denoted $x_j^l$, can be computed as follows:

$$x_j^l = \sum_{i=1}^{N_{l-1}} k_{ij}^l * x_i^{l-1} + b_j^l$$

where $b_j^l$ is the bias matrix, $k_{ij}^l$ represents a filter connecting the $i$-th feature map in the previous layer ($l-1$) with the $j$-th feature map in layer $l$, and $*$ denotes the convolution operator. Typically, after the convolution operation, a nonlinear activation function is applied to each element of the convolution result.
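The convolutional layer described above can be sketched with plain NumPy loops. Two simplifications are assumed here: the sliding-window operation is implemented as cross-correlation without kernel flipping (the convention used by most deep learning libraries under the name "convolution"), and the bias is one scalar per output map rather than a full matrix. The array layouts and function names are hypothetical choices for this example.

```python
import numpy as np

def correlate2d_valid(x, k):
    """Slide kernel k over 2D array x without padding ('valid' mode)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

def conv_layer(inputs, kernels, biases):
    """inputs: (N_in, H, W); kernels: (N_in, N_out, kh, kw); biases: (N_out,).

    Output map j is the sum over input maps i of kernel (i, j) applied
    to input map i, plus the bias of map j.
    """
    n_in, n_out = kernels.shape[:2]
    outputs = []
    for j in range(n_out):
        acc = sum(correlate2d_valid(inputs[i], kernels[i, j])
                  for i in range(n_in))
        outputs.append(acc + biases[j])
    return np.stack(outputs)

def relu(x):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(x, 0.0)

# One 4x4 input map and one 2x2 diagonal-difference kernel.
inputs = np.arange(16, dtype=np.float64).reshape(1, 4, 4)
kernels = np.array([[[[1.0, 0.0],
                      [0.0, -1.0]]]])        # shape (1, 1, 2, 2)
biases = np.array([0.5])

feature_maps = conv_layer(inputs, kernels, biases)
activated = relu(feature_maps)
print(feature_maps.shape)
```

Because the input grows by 5 along each diagonal step, every output entry equals -5 + 0.5 = -4.5 before the activation, and ReLU then sets the whole map to zero.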


A range of activation functions has been proposed in the literature to improve the performance of CNNs, for example the sigmoid function [48], the hyperbolic tangent function (tanh) [49], the adaptive piecewise linear activation (APL) [50], and the popular rectified linear unit (ReLU) [51]. The convolution process is followed by a pooling operation. This step replaces the output of the network at a given position with a summary statistic of the neighborhood of that position [31]. The pooling operation gradually reduces the spatial size of the output feature maps and hence decreases the number of parameters of the network. Generally, there are two standard choices for the pooling operation: max and average. Formally, consider a pooling window (neighborhood) represented by $R$. Average pooling takes the arithmetic mean of the elements in each pooling region:

$$a_{\mathrm{avg}} = \frac{1}{|R|} \sum_{i \in R} a_i$$

while max pooling takes the largest element:

$$a_{\max} = \max_{i \in R} a_i$$

where $|R|$ is the size of the pooling region (i.e., the number of elements in $R$) and $a_i$ is the activation value at position $i$. After the pooling operation, the output feature maps of the previous layer are flattened and provided to fully connected layers. These layers are exploited to extract higher-level information by reshaping the feature maps into an $n$-dimensional vector [42]. At the last layer of the network, called the classification layer, neurons are gathered into outputs whose number corresponds to the number of classes. Then, using a softmax function, the output of the classification layer $z$ is converted into a normalized probability distribution. Specifically, the probability distribution over the $K$ classes is produced via the following function:

$$p_c = \frac{\exp(z_c)}{\sum_{k=1}^{K} \exp(z_k)}, \quad c = 1, \ldots, K$$

where the calculated probabilities are within the range $[0, 1]$, and the sum of all the probabilities is equal to $1$. Convolutional neural networks (CNNs) are well established as a powerful class of models for a variety of computer vision tasks [52], including change detection in remote sensing images. Hence, different successful CNN architectures have been proposed in the literature. The current surge of CNNs in many tasks relies heavily on modern network architectures such as AlexNet [53], VGG [54], and ResNet [55]. These modern architectures explore new and innovative ways of constructing convolutional layers that support more efficient learning [56].
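The pooling and softmax steps can be illustrated on small arrays. The sketch below assumes non-overlapping square pooling windows and uses the usual max-subtraction trick for a numerically stable softmax; the function names and the toy values are chosen for this example only.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows of a 2D map."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    # Axes 1 and 3 index the rows and columns inside each window.
    windows = x[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

def softmax(z):
    """Convert class scores into a normalized probability distribution."""
    shifted = z - np.max(z)     # shifting does not change the result
    e = np.exp(shifted)
    return e / e.sum()

feature_map = np.array([[1.0, 2.0, 5.0, 6.0],
                        [3.0, 4.0, 7.0, 8.0],
                        [0.0, 0.0, 1.0, 1.0],
                        [0.0, 4.0, 1.0, 1.0]])

max_pooled = pool2d(feature_map, 2, "max")   # [[4, 8], [4, 1]]
avg_pooled = pool2d(feature_map, 2, "avg")   # [[2.5, 6.5], [1, 1]]

scores = np.array([2.0, 1.0, 0.0])
probs = softmax(scores)
print(max_pooled, avg_pooled, probs.sum())
```

Both pooling variants halve each spatial dimension here, and the softmax output keeps the ordering of the input scores while summing to one.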

Figure 7: A flowchart of a conventional CNN, which consists of two convolutional layers (C1, C2), two pooling layers (P1, P2), two fully connected layers (F1, F2) and a softmax layer (output).

3.2.4 RNNs

Recurrent neural networks, also known as RNNs [37], are a class of neural networks designed for processing sequential data. In particular, this model is augmented with edges that span adjacent time steps, which introduces the notion of time [57]. Whereas a convolutional neural network is specialized for processing a grid of values such as an input image, a recurrent neural network operates over a sequence of vectors or values with the help of a recurrent hidden state (see Fig. 8). Formally, suppose that $x = (x_1, x_2, \ldots, x_T)$ is a sequence of vectors, where $x_t$ represents the data at the $t$-th time step. Two equations define all the calculations required at each time step $t$:

$$h_t = f(W_{hx} x_t + W_{hh} h_{t-1} + b_h)$$

$$y_t = g(W_{yh} h_t + b_y)$$

where $f$ and $g$ are activation functions and $W_{hh}$ is the same matrix used at each time step. Via this matrix, the hidden state from the previous step, $h_{t-1}$, is used to compute $h_t$, while the current observation $x_t$ provides a weighted term $W_{hx} x_t$, which is summed with $W_{hh} h_{t-1}$ and a bias term $b_h$. Both $W_{hx}$ and $b_h$ are typically shared across time steps. The output layer applies a conventional neural network activation function to a linear transformation of the hidden units, and the process is repeated for every time step. Unfortunately, standard RNNs suffer from a critical drawback, the vanishing gradient problem, which makes the network hard to train properly. To overcome this serious problem, long short-term memory (LSTM) and gated recurrent unit (GRU) [60] networks were proposed. One advantage of the LSTM is that it introduces the notion of a memory cell, a unit of computation that replaces the classical nodes in the hidden layer of a network. These memory cells make it possible to overcome the training difficulties encountered by earlier recurrent networks. Like the LSTM unit, the GRU has gating units that control the flow of information within the unit, but without a distinct memory cell [61].
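The recurrence of a standard (non-gated) RNN can be sketched as a loop over time steps in which the same parameters are reused at every step. The choice of tanh for the hidden activation and softmax for the output, the zero initial hidden state, and all names and dimensions below are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Normalize a score vector into probabilities."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, W_hx, W_hh, W_yh, b_h, b_y):
    """Run a vanilla RNN over a sequence xs of shape (T, n_in)."""
    h = np.zeros(W_hh.shape[0])     # assumed initial hidden state
    hidden_states, outputs = [], []
    for x_t in xs:
        # New hidden state mixes the current input and the previous state.
        h = np.tanh(W_hx @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
        outputs.append(softmax(W_yh @ h + b_y))
    return np.stack(hidden_states), np.stack(outputs)

# Toy sequence: 5 time steps, 4 input features, 3 hidden units, 2 classes.
rng = np.random.default_rng(0)
T, n_in, n_hidden, n_out = 5, 4, 3, 2
xs = rng.normal(size=(T, n_in))
W_hx = rng.normal(scale=0.3, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
W_yh = rng.normal(scale=0.3, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)

hs, ys = rnn_forward(xs, W_hx, W_hh, W_yh, b_h, b_y)
print(hs.shape, ys.shape)
```

In a bi-temporal change detection setting, the sequence would typically contain the observations of the same location at the successive acquisition dates.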

Figure 8: An unrolled recurrent neural network.

3.2.5 GANs

Generative adversarial networks (GANs) were proposed by Goodfellow et al. [62]. Given real data (e.g., images), this generative technique learns to produce novel data with the same statistics as the original data. GANs are based on a game-theoretic scenario in which the generator network must compete against an adversary [31]. A general illustration of the structure of a GAN is shown in Fig. 9. Formally, from training data $x$ and a given prior distribution (i.e., random noise) $p_z(z)$, the generator network $G$ directly generates fake samples $G(z)$. Its adversary, the discriminator network $D$, aims to differentiate between samples drawn from the training data and samples produced by the generator. While the discriminator is trained to maximize the value of $\log D(x)$, indicating the probability of assigning the correct labels to the training samples, the generator is trained to minimize $\log(1 - D(G(z)))$ [42]. Thus, $D$ and $G$ play a two-player minimax game as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Here, $\mathbb{E}$ denotes the expectation operator. The main goal of training generative networks is to produce examples that appear realistic compared to the original data. On that basis, GANs have been successfully used in different computer vision and image processing applications.
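The minimax objective can be estimated from samples as the mean of log D(x) over real data plus the mean of log(1 - D(G(z))) over generated data. The sketch below uses deliberately tiny one-dimensional stand-ins for the two networks (an affine generator and a logistic discriminator); these toy models, their parameter vectors `theta` and `phi`, and the small constant `eps` are assumptions for illustration, not an actual GAN architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generator(z, theta):
    """Hypothetical 1-D generator: shift and scale the input noise."""
    return theta[0] + theta[1] * z

def discriminator(x, phi):
    """Hypothetical 1-D discriminator: probability that x is real."""
    return sigmoid(phi[0] * x + phi[1])

def value_function(x_real, z, theta, phi, eps=1e-12):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    d_real = discriminator(x_real, phi)
    d_fake = discriminator(generator(z, theta), phi)
    return float(np.mean(np.log(d_real + eps))
                 + np.mean(np.log(1.0 - d_fake + eps)))

rng = np.random.default_rng(0)
x_real = rng.normal(loc=2.0, scale=1.0, size=1000)   # "training data"
z = rng.normal(size=1000)                            # random noise

theta = np.array([0.0, 1.0])       # generator far from the data mean
phi_blind = np.array([0.0, 0.0])   # discriminator that always outputs 0.5
phi_sharp = np.array([2.0, -2.0])  # discriminator that favors larger x

v_blind = value_function(x_real, z, theta, phi_blind)
v_sharp = value_function(x_real, z, theta, phi_sharp)
print(v_blind, v_sharp)
```

A discriminator that outputs 0.5 everywhere yields a value of -2 log 2, which is also the value reached at the theoretical equilibrium where the generator matches the data distribution; a discriminator that separates the two sample sets better obtains a higher value.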

Figure 9: General illustration of the structure of a Generative Adversarial Network (GAN).

4 Methods and data used to review DL for CD in remote sensing images

4.1 A meta-analysis process for data extraction

We searched for and collected all published studies relevant to change detection in remote sensing images using the deep learning approach. The search was conducted using the Web of Science database, which is the most trusted publisher-independent global citation database. The dataset was built using an advanced search option (search conducted in April) with a relevant controlled vocabulary that included the topics deep learning, change detection, and remote sensing. All of the studies included in our research had been published by the search date. Excluded from the search query were prefaces, article summaries, interviews, discussions, news items, correspondence, readers’ letters, comments, summaries of tutorials, panels, workshops, and poster sessions. This search strategy resulted in a set of unique papers, including journal articles, conference papers, two early access papers, and one editorial material paper. All of the included studies are summarized in one publicly accessible file. It is worth noting that every article included in the review was read in detail by the authors.

4.2 Referred journals and conference papers

Among the peer-reviewed journal papers, most articles were published in the ten journals shown in Table 1. Note that journals with only one publication are not listed here. Overall, these journals include 82 peer-reviewed journal papers related to DL, change detection, and remote sensing (the sum of the counts in Table 1). Regarding the number of articles published per journal, the top five journals are: Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing (TGRS), IEEE Access, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, and IEEE Geoscience and Remote Sensing Letters.

We found that the topic is now well represented at the major international remote sensing conferences. Among the conference papers, a majority of the articles were published by the two remote sensing academic societies listed in Table 2, namely the IEEE Geoscience and Remote Sensing Society (organizer of IGARSS) and the Society of Photo-Optical Instrumentation Engineers (SPIE). Conferences with only one publication are not listed in this table. The reader should bear in mind that conference papers were excluded from the present meta-analysis because many were expanded into journal papers after being presented at the conferences (as in [20]). In addition, after a careful reading of all the papers, journal papers that do not cover the subject of this study were also excluded.

4.3 Brief interpretation of the results

Several general conclusions may be drawn from the statistical analysis conducted to examine the trend in the use of DL for change detection. Trends and projections are illustrated in this study using histograms in order to better visualize the distribution of the data. Figure 10 reveals that there has been a marked increase in the number of scientific papers released on the topic in recent years. The number of published papers is expected to grow even further in the coming years. Similarly, Fig. 11 shows that there has been an important increase in the number of citations of those papers. Table 3 highlights the top three most-cited papers. This rapid growth, both in the number of published papers and in the number of citations, confirms the growing interest in deep learning for change detection in remote sensing images. Notably, the number of journal papers on this topic now exceeds the number of conference papers, which indicates the technical maturity of this research area. As can be seen from Fig. 12, the CNN model has been the most widely applied for change detection, followed by the SAE, DBN, RNN, AE, RBM, and GAN models. The higher popularity of CNNs is probably because they are better suited to learning hierarchical image representations from the input data by sequentially abstracting higher-level features [47]. Looking at Fig. 13, it is apparent that SAR images have been the most commonly used with deep learning models for change detection, followed by multispectral, aerial, optical, heterogeneous (i.e., multi-modal), and hyperspectral images. The reason is that synthetic-aperture radar captures images using microwave signals that can penetrate clouds [63], which gives it the significant advantage of being insensitive to sunlight and complex atmospheric conditions [64].

Figure 10: Growing number of published papers related to deep learning for change detection in remote sensing (we predict more than 100 papers in 2020).
Figure 11: The number of citations per year for papers related to deep learning for change detection in remote sensing (we predict more than 1000 citations in 2020).
Figure 12: Distribution of DL models used in the studies.
Figure 13: Distribution of types of remote sensing images used in the studies.
Name of journal | Number of papers
Remote Sensing | 31
IEEE Transactions on Geoscience and Remote Sensing (TGRS) | 13
IEEE Access | 7
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 7
IEEE Geoscience and Remote Sensing Letters | 5
ISPRS Journal of Photogrammetry and Remote Sensing | 5
Journal of Applied Remote Sensing | 5
IEEE Transactions on Neural Networks and Learning Systems | 4
Applied Sciences-Basel | 3
International Journal of Image and Data Fusion | 2
Table 1: Journals identified as pertinent, and number of relevant papers.
Title of conference/proceedings | Number of papers
International Geoscience and Remote Sensing Symposium (IGARSS) | 17
Proceedings of Society of Photo-Optical Instrumentation Engineers (SPIE) | 9
Table 2: Conferences and proceedings determined as pertinent, and number of relevant papers.
Authors | Title | Year of publication | Times cited
Gong et al. [65] | Change Detection in Synthetic Aperture Radar Images Based on Deep Neural Networks | 2016 | 224
Lyu et al. [66] | Learning a Transferable Change Rule from a Recurrent Neural Network for Land Cover Change Detection | 2016 | 126
Zhang et al. [67] | Change detection based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images | 2016 | 108
Table 3: The top three most-cited papers.

5 Deep learning for change detection in remote sensing images

Deep learning has recently become the focus of considerable interest in the change detection field [20]. In contrast to traditional methods based on hand-crafted features, it aims to automatically learn high-level features from various remote sensing data [46]. Deep learning approaches for change detection can be grouped in several ways, depending on the perspective adopted. In this study, they are classified into three groups based on the learning technique and the availability of training data, which can be either labeled or unlabeled. The first type contains fully supervised methods, which solve the problem by learning from a labeled training dataset. The second type contains fully unsupervised methods, which learn from unlabeled datasets. Both supervised and unsupervised methods serve to select the available features that are consistent with the target concept. In supervised learning, the target concept is explicitly related to class affiliation, while in unsupervised learning the target concept is typically captured through inherent structures of the data [68]. The third type contains transfer learning-based methods. Transfer learning is an important machine learning technique that attempts to use the knowledge learned from one task and apply it to another, related task in order to either reduce the amount of fine-tuning data required or improve performance [69]. Sections 5.1, 5.2, and 5.3 outline these types of methods in detail.

5.1 Fully supervised learning-based methods

For a long time, it was commonly assumed that training deep supervised neural networks is challenging, time-consuming, and too difficult to perform [46]. The standard learning strategy consists of randomly initializing the weights; more recently, however, it was found that deep supervised networks can be trained with proper weight initialization. This strategy is sufficient to improve the gradient flow as well as the transmission of useful information by the activations [70] [22]. The efficiency of supervised deep networks is particularly evident when a large amount of labeled data is available to train them properly.

In recent years, several purely supervised DL methods relying on CNNs have been suggested for change detection in RS images [71] [46]. These CNN-based studies have shown superior performance to classical state-of-the-art methods. CNNs are hierarchical models that convert the input image into multiple layers of feature maps. These generated maps consist of high-level discriminative features that reflect the original input data [22]. U-Net, which is based on fully convolutional networks, is considered one of the standard CNN architectures for the change detection task. Its general architecture is symmetric, with an encoder that extracts spatial features from the image and a decoder that builds the segmentation map from the encoded features [72].

Jaturapitpornchai et al. [63] have proposed a U-Net-based network that detects new building construction in developing regions using two SAR images captured at different times. Subsequently, the U-Net architecture was extended through a few modifications in other works. In this regard, Hamdi et al. [73] have developed an algorithm using a modified U-Net model for automatic detection and mapping of damaged areas in an ArcGIS environment. Their model was trained on a database of a forest area in Bavaria, Germany. Recently, an improved UNet++ architecture was proposed by Peng et al. [74] for end-to-end change detection of VHR satellite images. In order to learn multi-scale feature maps, dense skip connections were established between the different layers of this architecture. In addition, a residual block strategy was followed to facilitate gradient convergence of the network. For change detection in hyperspectral images, a general end-to-end two-dimensional CNN framework, called GETNET, was presented by Wang et al. [75]. In this work, a conventional change vector analysis (CVA) method [76] was adopted to generate pseudo-training sets with labels. Wiratama et al. [77] proposed a dual-dense convolutional network for recognizing pixel-wise change on the basis of a dissimilarity analysis of neighborhood pixels in high-resolution panchromatic (PAN) images. In their algorithm, two fully convolutional neural networks are used to compute the dissimilarity of neighboring pixels. Further, dense connections in the convolution layers reuse preceding feature maps by connecting them to all subsequent layers. Zhang et al. [78] have introduced a fully atrous convolutional neural network (FACNN). In this FACNN, an encoder consisting of fully atrous convolution layers is first used to extract scale features from VHR images.
Afterwards, a pixel-based change map is generated using the classification map of the current images and an outdated land cover geographical information system (GIS) map. Daudt et al. [79] have proposed an integrated network based on deep FCNNs that performs land cover mapping and change detection simultaneously, using information from the land cover mapping branches to help with change detection. Zhang et al. [80] presented a spectral-spatial joint learning network (SSJLN). In the first part of this model, the spectral-spatial joint representation is derived from a network similar to the Siamese CNN (S-CNN) [81]. Second, the extracted features are combined using a feature fusion block. Finally, discrimination learning is performed to explore the underlying information of the combined features. Liu et al. [25] have demonstrated the complementarity of CNNs and the bidirectional long short-term memory network (BiLSTM) by combining them into one unified architecture. While the former is useful for extracting rich spectral-spatial features from bi-temporal images, the latter is powerful in analyzing the temporal dependence of bi-temporal images and transferring the image features. Similarly, Cao et al. [82] have combined a deep denoising model, trained on a huge number of simulated SAR image patches, with a CNN model. While the deep denoising network is adopted to keep useful information and suppress noise simultaneously, a three-layer CNN model is built to carry out the feature learning process. In contrast to the previous approaches, which rely on CNN-based models, Wiratama et al. [83] have proposed a fusion architecture combining front-end and back-end neural networks. In order to accomplish low-level and high-level differential detection, the fusion network contains both single-path and dual-path networks.
In addition, based on the two dual outputs, the authors proposed a two-stage decision algorithm to efficiently provide the final change detection result. This method has shown good performance in identifying changed/unchanged areas in high-resolution panchromatic images.
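To make the atrous (dilated) convolution used by the FACNN encoder mentioned above more concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the implementation of [78]; the function name `atrous_conv1d`, the toy signal, and the kernel are illustrative assumptions. It only shows how the dilation rate widens the receptive field without adding kernel weights.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D convolution (cross-correlation form) with dilation `rate`.

    Kernel taps are applied to samples spaced `rate` positions apart, so the
    receptive field is (len(kernel) - 1) * rate + 1 samples wide.
    """
    span = (len(kernel) - 1) * rate
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Hypothetical toy input: a linear ramp and a simple difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)   # receptive field of 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)  # receptive field of 5 samples
```

With the same three weights, the rate-2 version compares samples four positions apart instead of two, which is the property that lets stacked atrous layers cover larger image contexts.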

5.2 Fully unsupervised learning-based methods

Supervised deep learning methods such as CNNs and their modified models have achieved satisfactory results in many computer vision tasks owing to the availability of large annotated datasets [46]. Unfortunately, for the change detection task, there are often not enough training data to build such models. In addition, building a ground-truth map that reflects the real change information of ground objects costs a lot of time and effort [84]. Therefore, in many cases, it is more efficient to learn the change features generated from a remote sensing image in an unsupervised manner [85].

Unsupervised feature-learning methods are mainly based on models that can learn feature representations from patches (of images, for example) without any supervision [22]. The unsupervised deep learning approach has seen numerous enhancements and has been successfully applied to recognizing remote sensing (RS) scenes and targets. One of the most well-known and significant approaches is to stack (or combine) different shallow feature-learning methods such as the Gaussian mixture model, AEs, sparse coding and RBMs [46]. In this regard, for change detection in multispectral images, Zhang et al. [86] have proposed a new unsupervised method combining the DBN and feature change analysis (FCA). To capture the information useful for discriminating between changed and unchanged regions while suppressing irrelevant variations, the available spectral channels are transformed into an abstract feature space via the DBN. Then, using these learned features, an FCA is performed to identify the different types of change. Similarly, Su et al. [87] have introduced a novel deep learning and mapping (DLM) framework oriented to the ternary change detection task for information-unbalanced images. Their method uses two types of neural networks. First, a stacked denoising autoencoder is applied to the two input images, serving as a feature extractor. Then, after a step that selects relevant samples, mapping functions are generated by a stacked mapping network, establishing the relationship between the features of each class. Afterwards, the features are compared and the final ternary map is generated by clustering the comparison result.

Gao et al. [88] have proposed a novel SAR image change detection method based on deep semi-nonnegative matrix factorization (Deep Semi-NMF) [89] and singular value decomposition (SVD) networks [90]. In their method, the Deep Semi-NMF is used as a pre-classification step. Following this, an SVD network with two SVD convolutional layers is applied to obtain reliable features, whose quality effectively improves the classification performance. To achieve more precise ternary change detection without any supervision, Gong et al. [14] have combined an SAE, a CNN and an unsupervised clustering algorithm. First, noise is removed and key change information is extracted by transforming the difference image into a suitable feature space using the SAE. Next, unsupervised clustering is performed on the feature maps learned by the SAE. This step aims to provide reliable pseudo labels for training the CNN as a change feature classifier. Lv et al. [85] have presented a feature learning method based on the combination of a stacked contractive autoencoder (sCAE) and a simple clustering algorithm. In this method, first, an affiliated temporal change image is built using three different metrics. The aim of this strategy is to provide more information about the temporal difference at the pixel level. Second, homogeneous change samples are provided by generating a set of superpixels using a simple linear iterative clustering algorithm. Third, these superpixel samples are used as input to train an sCAE network. Finally, the features encoded by the sCAE model are classified into two classes to create the change result map.

Gong et al. [91] have developed a generative discriminatory classified network (GDCN) for multispectral image change detection. Generative adversarial networks form the key block of this model by providing three types of data: labeled data, unlabeled data, and new fake data. More precisely, the GDCN is composed of a discriminatory classified network (DCN) and a generator (G). While the DCN divides the input data into a changed class, an unchanged class, and an extra class (i.e., a fake class), the generator recovers the real data from input noise to provide additional training samples. Finally, the bitemporal multispectral images are fed to the DCN to obtain a reliable final change map. For change detection in SAR images, Geng et al. [92] have proposed SGDNNs, unsupervised saliency-guided deep neural networks. The first step in this model consists of extracting from the difference image (DI) a salient region that probably belongs to the changed object. Then, hierarchical fuzzy C-means (HFCM) clustering [93] is used to select the samples that have higher probabilities of being changed and unchanged. Using these pseudo-training samples, DNNs based on the nonnegative- and Fisher-constrained autoencoder are applied to obtain a reliable final detection. Li et al. [94] performed change detection for hyperspectral images using a novel noise modeling-based unsupervised fully convolutional network (FCN) framework. Specifically, their deep CNN is trained using the change detection maps of existing unsupervised change detection methods, while the noise is removed during the end-to-end training process. Recently, Huang et al. [95] have proposed a new unsupervised deep learning algorithm, called ABCDHIDL, to automatically detect building changes from multi-temporal high-resolution remote sensing (HRRS) images.
In this algorithm, a convolution operation is first adopted for two purposes: to extract the spatial, texture and spectral features, and to generate a combined low-level feature vector for each pixel. Then, the unlabeled samples are used to pre-train a DBN, whose parameters are optimized jointly with an extreme learning machine (ELM) classifier [96]. To further improve the detection process, labeled samples are provided by an automatic selection based on a morphological operation.
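Several of the unsupervised methods above share the same pattern: build a difference image, cluster it, and use the cluster assignments as pseudo labels for training a deep network. The sketch below illustrates only that pattern and is not the procedure of any cited paper. It assumes a log-ratio difference image (a common choice for SAR pairs, not stated in the text above) and replaces HFCM with a plain two-cluster k-means; all function names and toy values are hypothetical.

```python
import math


def log_ratio_di(img_t1, img_t2, eps=1.0):
    """Absolute log-ratio difference image; eps avoids division by zero."""
    return [
        [abs(math.log((b + eps) / (a + eps))) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img_t1, img_t2)
    ]


def two_means(values, iterations=20):
    """Plain 1-D k-means with k=2, initialised at the min and max values."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if near_low:
            low = sum(near_low) / len(near_low)
        if near_high:
            high = sum(near_high) / len(near_high)
    return low, high


# Toy 3x3 intensity images; two pixels change strongly between the dates.
t1 = [[10, 12, 11], [10, 100, 11], [12, 10, 11]]
t2 = [[11, 12, 10], [10, 10, 11], [12, 90, 11]]

di = log_ratio_di(t1, t2)
low, high = two_means([v for row in di for v in row])

# Pseudo labels: 1 = changed (closer to the high centre), 0 = unchanged.
pseudo = [[1 if abs(v - high) < abs(v - low) else 0 for v in row] for row in di]
```

In the reviewed methods, labels of this kind are then used to train a deep classifier; a fuzzy clustering such as HFCM additionally yields membership degrees, which allows only the most confident samples to be kept.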

5.3 Deep transfer learning-based methods

In many remote sensing applications, it is very expensive or even impossible to re-collect the required training data and rebuild the models [97]. In particular, for the change detection task, there are often not enough training data that accurately represent the real change information of ground objects. Therefore, it is important to reduce the need and effort to re-collect training data. In this context, transfer learning, or knowledge transfer among task domains, can be a reliable solution.

Transfer learning is defined as the capability of extracting knowledge from one or more source tasks and applying it to a novel or target task [97]. Formally, given a source domain $D_S$ with a related source task $T_S$ and a target domain $D_T$ with a corresponding task $T_T$, transfer learning is the process of improving the target predictive function $f_T(\cdot)$ by utilizing the corresponding information from $D_S$ and $T_S$, where $D_S \neq D_T$ or $T_S \neq T_T$ [98].

There are two basic approaches currently adopted in transfer learning research. The first consists of using the outputs of one or more layers of a network (such as AlexNet or ResNet-101) trained on a different task as generic high-dimensional feature detectors, and training a new shallow model on these features [99]. The second approach is more involved and consists of fine-tuning a network pre-trained on general images. Here, the final layer (for classification/regression) is not merely replaced; the previous layers are also retrained [100]. Following this second approach, Hou et al. [101] have transferred a CNN pre-trained on a large-scale natural image dataset (e.g., ImageNet [102]) to the RS domain. Specifically, to get better results, they fine-tuned VGG-16 [54] to adapt it to optical RS images using an aerial image dataset (AID) [103]. Similarly, Venugopal et al. [104] have used a ResNet [105] network as a pretrained model and fine-tuned its parameters within a dilated convolutional neural network (DCNN) that detects the changes between the two images. Afterwards, the final feature map is classified into unchanged and changed areas. To solve the change detection problem in optical aerial images, Zhang et al. [106] proposed a new method based on a deep Siamese semantic network trained using an improved triplet loss function. Because the Siamese network is difficult to train directly, a DeepLabv2 [107] model pretrained on a large-scale image dataset (e.g., the PASCAL VOC 2012 dataset [108]) was first transferred to the network. With this strategy, the network achieved comparable performance with limited computational cost and minimal training samples. The change detection method proceeds in the following steps. First, the input bitemporal images are preprocessed using histogram matching in order to apply a radiometric correction to the two coregistered images. Second, the preprocessed image pair is fed to the deep Siamese semantic network to generate two feature maps. Following this, the two semantic feature maps are resized by bilinear interpolation. Afterwards, a distance map is obtained by computing the Euclidean distance between the semantic feature maps. Finally, a simple threshold segmentation method is applied to the distance map to generate the final change detection result.
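The last two steps of the pipeline described above (distance map, then thresholding) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code of [106]: histogram matching, the Siamese network and the bilinear resizing are omitted, the feature maps are hypothetical hand-written values, and the threshold `tau` is an arbitrary choice.

```python
import math


def distance_map(feat_a, feat_b):
    """Euclidean distance between the feature vectors at each spatial location.

    Each feature map is a list of rows; each location holds a list of channels.
    """
    return [
        [math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
         for va, vb in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]


def threshold_map(dist, tau):
    """Binary change map: 1 where the feature distance exceeds tau."""
    return [[1 if d > tau else 0 for d in row] for row in dist]


# Hypothetical 2x2 feature maps with three channels per location.
feat_t1 = [[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
           [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]]
feat_t2 = [[[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
           [[0.5, 0.5, 0.0], [0.0, 1.0, 2.0]]]

dist = distance_map(feat_t1, feat_t2)
change = threshold_map(dist, tau=1.0)
```

Because the decision is a single threshold on feature distance, the quality of the result depends almost entirely on how well the transferred network separates changed from unchanged locations in feature space.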

Fang et al. [109] proposed a novel hybrid end-to-end framework, named dual learning-based Siamese framework (DLSF), for change detection in very high resolution (VHR) images. This framework consists of two parallel streams: dual learning-based domain transfer and Siamese-based change decision. The first stream aims to reduce the domain differences between the two paired images while maintaining their intrinsic information, by translating each image into the other's domain. The second stream aims to learn a decision strategy for identifying the changes in each of the two domains. Yang et al. [64] have transferred the concept of change learned in the source domain to the target domain by reducing the distribution discrepancy between the two domains. In their model, the pretraining stage includes two tasks: supervised change detection in the source domain using a U-Net architecture, and a reconstruction network in the target domain without labels. The lower layers are shared between the two tasks, whereas the final layers of each task are trained separately. After the pretraining step, reliable labels chosen from a CD map are used to fine-tune the change detection network for the target domain. Although training data are limited in the task of sea ice change detection, Gao et al. [110] used a large dataset to train a transferred multilevel fusion network (MLFN) and then applied a fine-tuning strategy to optimize the network parameters.

6 Promising research directions

To advance the progress of the change detection task, in this section, we suggest two important directions for research, specifically deep reinforcement learning and weakly supervised change detection.

6.1 Deep reinforcement learning

Due to the lack of sufficient labeled training databases for the supervised change detection task, the description capability of the features generated by deep learning methods may become limited or even impoverished. Recently, deep reinforcement learning [111] [112] [113] has become the focus of considerable interest in the field of machine learning and has shown an excellent potential and great performance in various domains of computer vision such as autonomous driving [114] [115], object tracking [116] [117], person re-identification [118] [119], etc.

Deep reinforcement learning combines deep neural networks with a reinforcement learning architecture, in which intelligent machines can learn from their actions in a way similar to how humans learn from experience. Reinforcement learning enables software-defined agents to learn from the environment on the basis of random exploration and to adjust toward the best possible actions based on continuous feedback in order to attain their goals. Actions that lead them to the target outcome are rewarded (i.e., exploitation) [120]. Formally, it consists of a finite set of states $s \in S$, which represent the agent and the environment; a set of actions $a \in A$ carried out by the agent; the probability $P_a(s, s')$ of moving from state $s$ to state $s'$ under action $a$; and the reward $r$ associated with the move to the next state $s'$ under action $a$. To predict the best action, as given by the function $Q(s, a)$, it is necessary to balance and maximize the current reward and the future reward. Here, $\gamma$ denotes a fixed discount factor. Hence, this function is expressed as the sum of the current reward and the discounted future reward in the following way [120]:

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$

Reinforcement learning is particularly suited to problems involving both short-term and long-term rewards, for example, games such as Go and chess. Combining reinforcement learning with a deep network architecture yields deep reinforcement learning (DRL), which extends the use of reinforcement learning to robustly solve more difficult games and other challenging problems [121]. Deep reinforcement learning not only provides rich representations, owing to the larger number of hidden layers in deep networks, but also relies on the Q-learning algorithm (a reinforcement learning algorithm that finds an optimal action-selection strategy maximizing the sum of the discounted rewards [124]), which maximizes the reward for actions taken by the agent [121]. Fu et al. [122] have shown the feasibility of using deep reinforcement learning for the remote sensing ship detection task. Recently, Li et al. [123] have proposed an interesting aircraft detection framework that combines a CNN model with reinforcement learning. Similarly, the change detection process can be cast as an action-decision problem, in which a sequence of actions refines the size of the changed regions between two input images.
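As a small illustration of the Q-function described above, the following sketch applies the standard update Q(s, a) ← r + γ max Q(s', a') on a toy five-state chain. Everything here is an assumption made for illustration: the environment, the reward, and the constants are hypothetical, and exhaustive sweeps over all state-action pairs replace random exploration so that the result is deterministic. It is a tabular example, not a deep reinforcement learning model.

```python
GAMMA = 0.9        # fixed discount factor
N_STATES = 5       # states 0..4; state 4 is the terminal goal state
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


# Q-table initialised to zero; the terminal row is never updated.
q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(50):                      # repeated sweeps until convergence
    for s in range(N_STATES - 1):
        for a in ACTIONS:
            nxt, r = step(s, a)
            # Current reward plus discounted best future value.
            q[s][a] = r + GAMMA * max(q[nxt])

# Greedy policy: pick the action with the highest Q-value in each state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The learned values decay geometrically with the distance to the goal (1, 0.9, 0.81, 0.729), which shows how the discount factor trades off immediate and future reward. In deep reinforcement learning, the table `q` is replaced by a neural network that approximates Q(s, a).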

6.2 Weakly supervised change detection

Considering the high cost of data labeling, in many computer vision tasks it is hard to obtain strong supervision information (e.g., a dataset with complete ground-truth labels) [125]. Notably, in remote sensing images, the manual annotation of objects is generally expensive and sometimes unreliable. For the change detection task in particular, the changed regions are very small, the background is often cluttered and complex, and the images may be taken by different sensors [126]. Training a change detection framework based on weakly supervised learning (WSL) can alleviate the need for manual annotation. Weakly supervised data contain only a small quantity of accurate label information, which distinguishes them from the data used in traditional supervised learning [127]. In general, there are three classes of weak supervision [125]:

  • Incomplete supervision: only a small subset of the training data is provided with labels, which is inadequate to successfully train a learner.

  • Inexact supervision: some supervision information is available, but it is not as accurate as required (i.e., only coarse-grained label information is provided).

  • Inaccurate supervision: the provided labels are not always ground truth and suffer from errors (i.e., learning with label noise).

Recent progress in the geospatial object detection field [128] [129] has shown the feasibility of using weakly supervised learning. Similarly, it will be interesting to explore the potential of WSL-based change detection models for accurately identifying the changed regions between two images. However, the performance of existing WSL-based methods on remote sensing images is still far from satisfactory. For example, the accurate position of the change cannot be obtained when detecting building changes [130]. Considerable effort is still needed to establish more efficient methods that improve the detection accuracy [126].

7 Conclusion

Deep learning-based change detection in the remote sensing field has recently drawn significant attention and achieved good performance. In contrast to traditional methods based on hand-crafted features, deep learning-based methods can automatically learn complex features of remote sensing images through a large number of hierarchical layers. In this work, publications related to DL in remote sensing images were systematically analyzed through a meta-analysis. In addition, a deeper review was conducted to describe and discuss the use of DL algorithms specifically in the field of change detection, which differentiates our study from previous reviews on DL and remote sensing. Several deep models that are often used for change detection were described. We then concentrated on deep learning-based change detection approaches for remote sensing images by providing a general overview of the existing methods. Specifically, these methods were divided into three groups: fully supervised learning-based methods, fully unsupervised learning-based methods and transfer learning-based methods. We have also proposed two promising directions for future research. Further studies focusing on deep reinforcement learning and weakly supervised change detection methods are therefore strongly suggested.


  • [1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436 – 444, 2015.
  • [2] I. Rish, “An empirical study of the naive bayes classifier,” in IJCAI 2001 workshop on empirical methods in artificial intelligence, vol. 3, no. 22.   IBM New York, 2001, pp. 41–46.
  • [3] D. D. Lewis, “Naive (bayes) at forty: The independence assumption in information retrieval,” in Machine Learning: ECML-98, C. Nédellec and C. Rouveirol, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 4–15.
  • [4] J. Suykens, L. Lukas, P. V. Dooren, B. D. Moor, and J. Vandewalle, “Least squares support vector machine classifiers: a large scale algorithm,” 1999.
  • [5] G. Cauwenberghs and T. A. Poggio, “Incremental and decremental support vector machine learning,” in NIPS, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds.   MIT Press, 2000, pp. 409–415.
  • [6] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  • [7] P. O. Gislason, J. A. Benediktsson, and J. R. Sveinsson, “Random forests for land cover classification,” Pattern Recognition Letters, vol. 27, no. 4, pp. 294 – 300, 2006.
  • [8] S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660–674, 1991.
  • [9] M. Friedl and C. Brodley, “Decision tree classification of land cover from remotely sensed data,” Remote Sensing of Environment, vol. 61, no. 3, pp. 399 – 409, 1997.
  • [10] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, “Deep learning for computer vision: A brief review,” Computational Intelligence and Neuroscience, vol. 2018, pp. 1–13, 02 2018.
  • [11] L. Deng, G. Hinton, and B. Kingsbury, “New types of deep neural network learning for speech recognition and related applications: an overview,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8599–8603.
  • [12] H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward, “Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 694–707, 2016.
  • [13] A. Singh, “Review article digital change detection techniques using remotely-sensed data,” International Journal of Remote Sensing, vol. 10, no. 6, pp. 989–1003, 1989.
  • [14] M. Gong, H. Yang, and P. Zhang, “Feature learning and change feature classification based on deep learning for ternary change detection in sar images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 129, pp. 212 – 225, 2017.
  • [15] R. Liu, D. Jiang, L. Zhang, and Z. Zhang, “Deep depthwise separable convolutional network for change detection in optical aerial images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 1109–1118, 2020.
  • [16] F. Bovolo and L. Bruzzone, “A split-based approach to unsupervised change detection in large-size multitemporal images: Application to tsunami-damage assessment,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 6, pp. 1658–1670, 2007.
  • [17] P. Coppin, I. Jonckheere, K. Nackaerts, B. Muys, and E. Lambin, “Digital change detection methods in ecosystem monitoring: a review,” International Journal of Remote Sensing, vol. 25, no. 9, pp. 1565–1596, 2004.
  • [18] J. Feranec, G. Hazeu, S. Christensen, and G. Jaffrain, “Corine land cover change detection in europe (case studies of the netherlands and slovakia),” Land Use Policy, vol. 24, no. 1, pp. 234 – 247, 2007.
  • [19] C. M. Viana, S. Oliveira, S. C. Oliveira, and J. Rocha, “29 - land use/land cover change detection and urban sprawl analysis,” in Spatial Modeling in GIS and R for Earth and Environmental Sciences, H. R. Pourghasemi and C. Gokceoglu, Eds.   Elsevier, 2019, pp. 621 – 651.
  • [20] “Deep learning in remote sensing applications: A meta-analysis and review,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 152, pp. 166 – 177, 2019.
  • [21] X. X. Zhu, D. Tuia, L. Mou, G. Xia, L. Zhang, F. Xu, and F. Fraundorfer, “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8–36, 2017.
  • [22] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 22–40, 2016.
  • [23] S. Liu, L. Bruzzone, F. Bovolo, and P. Du, “Hierarchical unsupervised change detection in multitemporal hyperspectral images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 1, pp. 244–260, 2015.
  • [24] G. Yang, H. Li, W. Wang, W. Yang, and W. J. Emery, “Unsupervised change detection based on a unified framework for weighted collaborative representation with rddl and fuzzy clustering,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 11, pp. 8890–8903, 2019.
  • [25] R. Liu, Z. Cheng, L. Zhang, and J. Li, “Remote sensing image change detection based on information transmission and attention mechanism,” IEEE Access, vol. 7, pp. 156 349–156 359, 2019.
  • [26] K. L. de Jong and A. S. Bosman, “Unsupervised change detection in satellite images using convolutional neural networks,” CoRR, vol. abs/1812.05815, 2018.
  • [27] N. Kadhim, M. Mourshed, and M. Bray, “Advances in remote sensing applications for urban sustainability,” Euro-Mediterranean Journal for Environmental Integration, vol. 1, no. 7, 2016.
  • [28] Z. Dianjun and Z. Guoqing, “Estimation of soil moisture from optical and thermal remote sensing: A review,” Sensors (Basel), vol. 16, no. 8, 2016.
  • [29] J. Sublime and E. Kalinicheva, “Automatic post-disaster damage mapping using deep-learning techniques for change detection: Case study of the tohoku tsunami,” Remote Sensing, vol. 11, no. 9, p. 1123, May 2019.
  • [30] M. Kolos, A. Marin, A. Artemov, and E. Burnaev, “Procedural synthesis of remote sensing images for robust change detection with neural networks,” in Advances in Neural Networks – ISNN 2019, H. Lu, H. Tang, and Z. Wang, Eds.   Cham: Springer International Publishing, 2019, pp. 371–387.
  • [31] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning.   The MIT Press, 2016.
  • [32] R. D. Reed and R. J. Marks, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks.   MIT Press, 1999.
  • [33] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bulletin of Mathematical Biophysics, vol. 5, pp. 115–133, 1943.
  • [34] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory.   New York: Wiley, 1949.
  • [35] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
  • [36] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Representations by Back-propagating Errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
  • [37] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Neurocomputing: Foundations of research,” J. A. Anderson and E. Rosenfeld, Eds.   Cambridge, MA, USA: MIT Press, 1988, ch. Learning Representations by Back-propagating Errors, pp. 696–699.
  • [38] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, pp. 504 – 507, 2006.
  • [39] Y. Bengio and Y. LeCun, “Scaling learning algorithms towards ai,” in Large Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, Eds.   Cambridge, MA: MIT Press, 2007.
  • [40] A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14–22, 2012.
  • [41] A. Mohamed, T. N. Sainath, G. Dahl, B. Ramabhadran, G. E. Hinton, and M. A. Picheny, “Deep belief networks using discriminative features for phone recognition,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 5060–5063.
  • [42] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Deep learning for hyperspectral image classification: An overview,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 9, pp. 6690–6709, 2019.
  • [43] C. Buckner and J. Garson, “Connectionism,” E. N. Zalta (Ed.), The stanford encyclopedia of philosophy (Fall 2019). Stanford: Metaphysics Research Lab, Stanford University., 2019. [Online]. Available:
  • [44] J. Zabalza, J. Ren, J. Zheng, H. Zhao, C. Qing, Z. Yang, P. Du, and S. Marshall, “Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging,” Neurocomputing, vol. 185, pp. 1–10, 2016.
  • [45] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, pp. 541–551, 1989.
  • [46] L. Zhang, F. Yang, Y. Daniel Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in 2016 IEEE International Conference on Image Processing (ICIP), Sep. 2016, pp. 3708–3712.
  • [47] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, “Convolutional neural networks: an overview and application in radiology,” Insights into Imaging, vol. 9, pp. 611–629, 2018.
  • [48] Y. Ito, “Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory,” Neural Networks, vol. 4, no. 3, pp. 385–394, 1991.
  • [49] G. A. Anastassiou, “Univariate hyperbolic tangent neural network approximation,” Mathematical and Computer Modelling, vol. 53, no. 5, pp. 1111–1132, 2011.
  • [50] F. Agostinelli, M. D. Hoffman, P. J. Sadowski, and P. Baldi, “Learning activation functions to improve deep neural networks,” CoRR, vol. abs/1412.6830, 2014.
  • [51] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical Evaluation of Rectified Activations in Convolutional Network,” 2015.
  • [52] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Proceedings of International Computer Vision and Pattern Recognition (CVPR 2014), 2014.
  • [53] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds.   Curran Associates, Inc., 2012, pp. 1097–1105.
  • [54] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015.
  • [55] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 770–778.
  • [56] J. Jordan, “Common architectures in convolutional neural networks,” in, 2018.
  • [57] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural networks for sequence learning,” 2015.
  • [58] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., ser. Morgan Kaufmann Series in Data Management Systems.   Amsterdam: Morgan Kaufmann, 2011.
  • [59] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [60] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014.
  • [61] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” in NIPS 2014 Workshop on Deep Learning, December 2014.
  • [62] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds.   Curran Associates, Inc., 2014, pp. 2672–2680.
  • [63] R. Jaturapitpornchai, M. Matsuoka, N. Kanemoto, S. Kuzuoka, R. Ito, and R. Nakamura, “Newly built construction detection in SAR images using deep learning,” Remote Sensing, vol. 11, no. 12, pp. 1–24, Jun 2019.
  • [64] M. Yang, L. Jiao, F. Liu, B. Hou, and S. Yang, “Transferred deep learning-based change detection in remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 9, pp. 6960–6973, Sep. 2019.
  • [65] M. Gong, J. Zhao, J. Liu, Q. Miao, and L. Jiao, “Change detection in synthetic aperture radar images based on deep neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 1, pp. 125–138, 2016.
  • [66] H. Lyu, H. Lu, and L. Mou, “Learning a transferable change rule from a recurrent neural network for land cover change detection,” Remote Sensing, vol. 8, no. 6, p. 506, Jun 2016.
  • [67] P. Zhang, M. Gong, L. Su, J. Liu, and Z. Li, “Change detection based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 116, pp. 24–41, 2016.
  • [68] Z. Zhao and H. Liu, “Spectral feature selection for supervised and unsupervised learning,” in ICML ’07, 2007.
  • [69] Y.-A. Chung, H.-Y. Lee, and J. Glass, “Supervised and unsupervised transfer learning for question answering,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1.   New Orleans, Louisiana: Association for Computational Linguistics, Jun. 2018, pp. 1585–1594.
  • [70] Y. Bengio, “Deep learning of representations: Looking forward,” in Statistical Language and Speech Processing, A.-H. Dediu, C. Martín-Vide, R. Mitkov, and B. Truthe, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 1–37.
  • [71] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?” in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2015, pp. 44–51.
  • [72] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds.   Cham: Springer International Publishing, 2015, pp. 234–241.
  • [73] Z. M. Hamdi, M. Brandmeier, and C. Straub, “Forest Damage Assessment Using Deep Learning on High Resolution Remote Sensing Data,” Remote Sensing, vol. 11, no. 17, Sep. 2019.
  • [74] D. Peng, Y. Zhang, and H. Guan, “End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++,” Remote Sensing, vol. 11, no. 11, Jun. 2019.
  • [75] Q. Wang, Z. Yuan, Q. Du, and X. Li, “GETNET: A General End-to-End 2-D CNN Framework for Hyperspectral Image Change Detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 1, pp. 3–13, Jan. 2019.
  • [76] W. A. Malila, “Change vector analysis: An approach for detecting forest changes with landsat,” in LARS Symp., 1980.
  • [77] W. Wiratama, J. Lee, S.-E. Park, and D. Sim, “Dual-Dense Convolution Network for Change Detection of High-Resolution Panchromatic Imagery,” Applied Sciences, vol. 8, no. 10, Oct. 2018.
  • [78] C. Zhang, S. Wei, S. Ji, and M. Lu, “Detecting Large-Scale Urban Land Cover Changes from Very High Resolution Remote Sensing Images Using CNN-Based Classification,” ISPRS International Journal of Geo-Information, vol. 8, no. 4, Apr. 2019.
  • [79] R. C. Daudt, B. L. Saux, A. Boulch, and Y. Gousseau, “Multitask learning for large-scale semantic change detection,” Computer Vision and Image Understanding, vol. 187, p. 102783, 2019.
  • [80] W. Zhang and X. Lu, “The Spectral-Spatial Joint Learning for Change Detection in Multispectral Imagery,” Remote Sensing, vol. 11, no. 3, Feb. 2019.
  • [81] Z. Zhang, G. Vosselman, M. Gerke, D. Tuia, and M. Y. Yang, “Change detection between multimodal remote sensing data using siamese CNN,” CoRR, vol. abs/1807.09562, 2018.
  • [82] X. Cao, Y. Ji, L. Wang, B. Ji, L. Jiao, and J. Han, “SAR image change detection based on deep denoising and CNN,” IET Image Processing, vol. 13, no. 9, pp. 1509–1515, 2019.
  • [83] W. Wiratama and D. Sim, “Fusion Network for Change Detection of High-Resolution Panchromatic Imagery,” Applied Sciences, vol. 9, no. 7, Apr. 2019.
  • [84] C. Cao, S. Dragicevic, and S. Li, “Land-use change detection with convolutional neural network methods,” Environments, vol. 6, 2019.
  • [85] N. Lv, C. Chen, T. Qiu, and A. K. Sangaiah, “Deep learning and superpixel feature extraction based on contractive autoencoder for change detection in SAR images,” IEEE Transactions on Industrial Informatics, vol. 14, no. 12, pp. 5530–5538, Dec 2018.
  • [86] H. Zhang, M. Gong, P. Zhang, L. Su, and J. Shi, “Feature-level change detection using deep representation and feature change analysis for multispectral imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 11, pp. 1666–1670, Nov 2016.
  • [87] L. Su, M. Gong, P. Zhang, M. Zhang, J. Liu, and H. Yang, “Deep learning and mapping based ternary change detection for information unbalanced images,” Pattern Recognition, vol. 66, pp. 213–228, 2017.
  • [88] F. Gao, X. Liu, J. Dong, G. Zhong, and M. Jian, “Change detection in SAR images based on deep semi-NMF and SVD networks,” Remote Sensing, vol. 9, p. 435, May 2017.
  • [89] Y. Wang and Y. Zhang, “Nonnegative matrix factorization: A comprehensive review,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1336–1353, 2013.
  • [90] J. Xue, J. Li, and Y. Gong, “Restructuring of deep neural network acoustic models with singular value decomposition.” in INTERSPEECH, F. Bimbot, C. Cerisara, C. Fougeron, G. Gravier, L. Lamel, F. Pellegrino, and P. Perrier, Eds.   ISCA, 2013, pp. 2365–2369.
  • [91] M. Gong, Y. Yang, T. Zhan, X. Niu, and S. Li, “A generative discriminatory classified network for change detection in multispectral imagery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 1, pp. 321–333, Jan 2019.
  • [92] J. Geng, X. Ma, X. Zhou, and H. Wang, “Saliency-guided deep neural networks for SAR image change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 10, pp. 7365–7377, 2019.
  • [93] A. B. Geva, “Hierarchical unsupervised fuzzy clustering,” IEEE Transactions on Fuzzy Systems, vol. 7, no. 6, pp. 723–733, 1999.
  • [94] X. Li, Z. Yuan, and Q. Wang, “Unsupervised deep noise modeling for hyperspectral image change detection,” Remote Sensing, vol. 11, no. 3, 2019.
  • [95] F. Huang, Y. Yu, and T. Feng, “Automatic building change image quality assessment in high resolution remote sensing based on deep learning,” Journal of Visual Communication and Image Representation, vol. 63, p. 102585, 2019.
  • [96] S. Suresh, R. V. Babu, and H. J. Kim, “No-reference image quality assessment using modified extreme learning machine classifier,” Applied Soft Computing, vol. 9, no. 2, pp. 541–552, 2009.
  • [97] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
  • [98] K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” Journal of Big Data, vol. 3, 2016.
  • [99] Z.-Q. Zhao, P. Zheng, S.-t. Xu, and X. Wu, “Object detection with deep learning: A review,” 2018.
  • [100] A. Abdalla, H. Cen, L. Wan, R. Rashid, H. Weng, W. Zhou, and Y. He, “Fine-tuning convolutional neural network with transfer learning for semantic segmentation of ground-level oilseed rape images in a field with high weed pressure,” Computers and Electronics in Agriculture, vol. 167, p. 105091, 2019.
  • [101] B. Hou, Y. Wang, and Q. Liu, “Change detection based on deep features and low rank,” IEEE Geoscience and Remote Sensing Letters, vol. PP, pp. 1–5, Nov. 2017.
  • [102] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR09, 2009.
  • [103] G. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, L. Zhang, and X. Lu, “AID: A benchmark data set for performance evaluation of aerial scene classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3965–3981, 2017.
  • [104] N. Venugopal, “Sample selection based change detection with dilated network learning in remote sensing images,” Sensing and Imaging, vol. 20, Dec. 2019.
  • [105] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015.
  • [106] M. Zhang, G. Xu, K. Chen, M. Yan, and X. Sun, “Triplet-based semantic relation learning for aerial remote sensing image change detection,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 2, pp. 266–270, Feb 2019.
  • [107] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” CoRR, vol. abs/1606.00915, 2016.
  • [108] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
  • [109] B. Fang, L. Pan, and R. Kou, “Dual learning-based siamese framework for change detection using bi-temporal VHR optical remote sensing images,” Remote Sensing, vol. 11, p. 1292, 2019.
  • [110] Y. Gao, F. Gao, J. Dong, and S. Wang, “Transferred deep learning for sea ice change detection from synthetic-aperture radar images,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 10, pp. 1655–1659, Oct 2019.
  • [111] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529–533, 2015.
  • [112] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, “Target-driven visual navigation in indoor scenes using deep reinforcement learning,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 3357–3364.
  • [113] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,” IEEE Transactions on Cybernetics, pp. 1–14, 2020.
  • [114] C. Hoel, K. Driggs-Campbell, K. Wolff, L. Laine, and M. Kochenderfer, “Combining planning and deep reinforcement learning in tactical decision making for autonomous driving,” IEEE Transactions on Intelligent Vehicles, pp. 1–12, 2019.
  • [115] Y. Dai, D. Xu, K. Zhang, S. Maharjan, and Y. Zhang, “Deep reinforcement learning and permissioned blockchain for content caching in vehicular edge computing and networks,” IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4312–4324, 2020.
  • [116] Z. Teng, B. Zhang, and J. Fan, “Three-step action search networks with deep Q-learning for real-time object tracking,” Pattern Recognition, vol. 101, pp. 1–11, 2020.
  • [117] W. Luo, P. Sun, F. Zhong, W. Liu, T. Zhang, and Y. Wang, “End-to-end active object tracking and its real-world deployment via reinforcement learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–14, 2019.
  • [118] Z. Liu, J. Wang, S. Gong, H. Lu, and D. Tao, “Deep reinforcement active learning for human-in-the-loop person re-identification,” in The IEEE International Conference on Computer Vision (ICCV), October 2019.
  • [119] W. Zhang, X. He, W. Lu, H. Qiao, and Y. Li, “Feature aggregation with reinforcement learning for video-based person re-identification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 12, pp. 3847–3852, 2019.
  • [120] A. Shrestha and A. Mahmood, “Review of deep learning algorithms and architectures,” IEEE Access, vol. 7, pp. 53040–53065, 2019.
  • [121] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. F. Balcan and K. Q. Weinberger, Eds., vol. 48.   New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 1928–1937.
  • [122] K. Fu, Y. D. Li, H. Sun, X. Yang, G. Xu, Y. Li, and X. Sun, “A ship rotation detection model in remote sensing images based on feature fusion pyramid network and deep reinforcement learning,” Remote Sensing, vol. 10, pp. 1–26, 2018.
  • [123] Y. Li, K. Fu, H. Sun, and X. Sun, “An aircraft detection framework based on reinforcement learning and convolutional neural networks in remote sensing images,” Remote Sensing, vol. 10, pp. 1–17, 2018.
  • [124] J. Peng and R. J. Williams, “Incremental multi-step Q-learning,” in Machine Learning Proceedings 1994, W. W. Cohen and H. Hirsh, Eds. San Francisco (CA): Morgan Kaufmann, 1994, pp. 226–232.
  • [125] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National Science Review, vol. 5, no. 1, pp. 44–53, Aug. 2017.
  • [126] G. Cheng and J. Han, “A survey on object detection in optical remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.
  • [127] Y. Li, L. Guo, and Z. Zhou, “Towards safe weakly supervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–14, 2019.
  • [128] P. Zhou, G. Cheng, Z. Liu, S. Bu, and X. Hu, “Weakly supervised target detection in remote sensing images based on transferred deep features and negative bootstrapping,” Multidimensional Systems and Signal Processing, vol. 27, pp. 925–944, 2016.
  • [129] P. Zhou, D. Zhang, G. Cheng, and J. Han, “Negative bootstrapping for weakly supervised target detection in remote sensing images,” in 2015 IEEE International Conference on Multimedia Big Data, 2015, pp. 318–323.
  • [130] H. Jiang, X. Hu, K. Li, J. Zhang, J. Gong, and M. Zhang, “PGA-SiamNet: Pyramid feature-based attention-guided siamese network for remote sensing orthoimagery building change detection,” Remote Sensing, vol. 12, no. 3, 2020.