The Earth's surface is constantly evolving. Timely and accurate access to surface changes is of great importance for better understanding human activities, ecosystems, and their interactions [Lu2004]. Multi-temporal remote sensing images can reveal dynamic changes on the surface, so change detection based on multi-temporal images has become increasingly significant. Change detection (CD) is the process of identifying differences in the state of an object or phenomenon by observing it at different times [Singh1989]. It plays an important role in a large number of applications, including land-use and land-cover change analysis, resource management, ecosystem monitoring, and disaster assessment [Xian2009, Coppin2004, Luo2018, Sim2006, Zelinski2014, 2014Li, Brunner2010, Wu2017b, Doroodgar2014]. Recently, owing to the development of Earth observation technology, very-high-resolution (VHR) images have become available from more and more satellite sensors (e.g., SPOT, IKONOS, QuickBird, and GaoFen). VHR images provide abundant surface details and spatial distribution information, which makes it possible to detect subtler changes. Consequently, CD in multi-temporal VHR images has attracted increasing attention [Bovolo2009, Somasundaram2010, Chen2012, Lei2014, Huo2016, Tan2016, Zhan2017, Lv2018a, Daudt2018, Security2018, Wu2017a].
The pixel-based change detection (PBCD) model is the earliest CD approach and has been deeply studied and widely used [Singh1989, Sharma2007, Bovolo2012, Thonfeld2016, Deng2008, Nielsen1997, Nielsen2007, Wu2014]. Change vector analysis (CVA) [Sharma2007] is a classic CD model designed to calculate change intensity and change direction for binary and multi-class CD. Based on CVA, many extensions have been proposed, such as compressed CVA (
, robust CVA (RCVA) [Thonfeld2016], and so on. Principal component analysis (PCA) is a transformation method that maps the difference images or stacked images into a new feature space and selects a subset of the principal components for CD [Deng2008]. In [Nielsen1997], Nielsen et al. propose multivariate alteration detection (MAD), which aims to maximize the variance of the difference between the transformed variables and is invariant to affine transformations. Based on the theory of slow feature analysis (SFA), Wu et al. [Wu2014] propose a novel CD model that extracts the most invariant components from multi-temporal images and transforms the original images into a new feature space, where changed pixels are highlighted and unchanged ones are suppressed. Although these PBCD models are effective in their respective application scenarios, they use only the spectral information of multi-temporal images because of the limitations of the models themselves. However, the increase in the spatial resolution of VHR images limits the spectral resolution and introduces spectral variability, which causes the CD results of PBCD models to suffer from salt-and-pepper noise and internal fragmentation [Huo2016, Lv2018a, Li2017b]. Therefore, it is important to exploit both spatial context information and spectral information in VHR image CD.
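As an illustration of the pixel-based idea, the change intensity and change direction computed by CVA can be sketched in a few lines of NumPy. This is only a minimal sketch under stated assumptions, not the implementation of any cited work: the function names `cva` and `binary_change_map`, the synthetic images, and the mean-plus-k-standard-deviations threshold are all hypothetical choices made for the example.

```python
import numpy as np

def cva(img_t1, img_t2):
    """Per-pixel change magnitude and unit direction for two (H, W, B) images."""
    diff = img_t2.astype(np.float64) - img_t1.astype(np.float64)
    magnitude = np.linalg.norm(diff, axis=-1)            # change intensity
    safe = np.where(magnitude > 0, magnitude, 1.0)       # avoid division by zero
    direction = diff / safe[..., None]                   # unit change vector
    return magnitude, direction

def binary_change_map(magnitude, k=2.0):
    """Assumed rule for the example: changed if magnitude > mean + k * std."""
    threshold = magnitude.mean() + k * magnitude.std()
    return magnitude > threshold

# Synthetic 4-band image pair with a 4 x 4 changed block (illustrative data only).
rng = np.random.default_rng(0)
t1 = rng.normal(0.5, 0.01, size=(32, 32, 4))
t2 = t1 + rng.normal(0.0, 0.01, size=t1.shape)
t2[8:12, 8:12, :] += 0.4

magnitude, direction = cva(t1, t2)
change_map = binary_change_map(magnitude)
```

Because each pixel is judged only by its own spectral difference vector, nothing in this sketch uses the neighborhood of a pixel, which is the limitation the paragraph above attributes to PBCD models on VHR images.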
To exploit the spatial context information in VHR images, two major conventional approaches have been developed: the object-based change detection (OBCD) model and the spatial feature extraction method. In OBCD, the image object, which consists of pixels with similar spectral signatures, becomes the basic unit of CD [Hussain2013, Gil-Yepes2016, Desclee2006, Somasundaram2010]. To obtain the image objects, the first step of OBCD is image segmentation, which divides the images into multiple homogeneous regions. After the image objects are obtained, representative object features, such as shape index, texture index, and the mean and standard deviation of each band, are extracted for change detection. The second approach is spatial feature extraction, in which spatial context information is extracted from the local area around each pixel [Lu2017, Lei2014, Hoberg2015, Zhou2016]. In [Lei2014], a CD method based on texton forest is proposed to capture spatial context information in VHR images. Integrating macro- and micro-texture features with random forest and a fuzzy set model, Li et al. [Li2017c] propose a multi-texture CD method to extract spatial features for CD. Hoberg et al. [Hoberg2015] introduce the conditional random field (CRF) model into CD of multi-temporal VHR images to model spatial information, and much research has been conducted since [Zhou2016, Lv2016, Cao2016, Lv2018a]. Nonetheless, both approaches use only low-level features, many of which are hand-crafted; such features are insufficient for representing the key information of the original data and for coping with complex ground situations in VHR images. In addition, the performance of OBCD depends on the quality of the segmentation, and it is sometimes difficult to select appropriate segmentation parameters.
In the last few years, deep learning (DL) has achieved spectacular performance in computer vision and remote sensing image interpretation [Lecun2015, Zhang2016b, Zhu2017a]. Unlike conventional methods, DL models are able to extract representative high-level features from VHR images. Therefore, a variety of DL-based models have been proposed for CD in multi-temporal VHR images. In [Gong2016], a method based on a deep neural network (DNN) is proposed for CD in SAR images. In [Zhan2017], Zhan et al. design a deep siamese convolutional network specifically for multi-temporal aerial images, which extracts spatial-spectral features with two weight-sharing branches. Lyu et al. [Lyu2016] adopt a recurrent neural network (RNN) to model the temporal connection between multi-temporal images. Going one step further, in [Mou2019], a CD architecture based on a recurrent convolutional neural network is proposed to extract unified features for binary and multi-class CD. Combining a pre-trained deep convolutional neural network with CVA, Saha et al. [Saha2019] design a method called deep CVA for VHR image CD. In [CayeDaudt2018], Daudt et al. first introduce the fully convolutional network (FCN) into CD and propose two siamese extensions of FCN, which achieve good performance on two open VHR image CD datasets. Although these DL-based methods achieve good performance in CD, their training is supervised and requires annotated data. Manually selecting annotated samples is undeniably labor-intensive, especially for remote sensing data.
Therefore, several unsupervised feature extraction models, including restricted Boltzmann machines (RBMs) [hinton2006] and auto-encoders (AEs) [Bengio2007], have been adopted to solve this problem. However, because they are built from fully connected layers, these models flatten image patches into vectors and thus ignore the spatial structure of images [Mou2019]. Another way is to combine deep learning models with unsupervised pre-detection algorithms. Gao et al. [Gao2016] present a CD method based on PCANet [Chan2015] for multi-temporal SAR images, in which a pre-detection algorithm based on Gabor wavelets and fuzzy c-means selects pixels of interest that have a high probability of being changed or unchanged. The network is then trained on the samples selected by this automatic pre-detection algorithm. Learning nonlinear features with a DNN and highlighting changes via SFA, Du et al. [Du2019a] propose an unsupervised deep slow feature analysis (DSFA) model for CD. To train the DNN of DSFA, a pre-detection method based on CVA is used to select samples. In [Li2019a], a supervised spatial fuzzy clustering is adopted to produce pseudo-labels for training a deep convolutional neural network (DCNN). This approach alleviates the sample problem of DL models to a certain degree. However, if the pre-detection algorithm does not perform well on a data set, the performance of the DL model also suffers. Moreover, most existing DL-based methods focus only on binary CD, and currently only a few methods [Saha2019, Zhang2019] can be used for unsupervised multi-class CD.
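The pre-detection strategy described above can be illustrated with a small sketch: a change-intensity map from some pre-detection algorithm is used to pick only the pixels that are most confidently unchanged or changed, and the remaining pixels are left out of training. The function name `select_pseudo_labels`, the percentile cut-offs, and the random stand-in intensity map are assumptions made for this example; the cited methods use their own pre-detection algorithms and selection rules.

```python
import numpy as np

def select_pseudo_labels(intensity, low_pct=50.0, high_pct=98.0):
    """Assign 0 (unchanged), 1 (changed), or -1 (uncertain, not used for training).

    The percentile cut-offs are illustrative assumptions, not values from the
    cited papers.
    """
    low, high = np.percentile(intensity, [low_pct, high_pct])
    labels = np.full(intensity.shape, -1, dtype=np.int8)
    labels[intensity <= low] = 0     # confidently unchanged
    labels[intensity >= high] = 1    # confidently changed
    return labels

# Stand-in for the output of a pre-detection algorithm (illustrative data only).
rng = np.random.default_rng(1)
intensity = rng.random((20, 20))

labels = select_pseudo_labels(intensity)
train_mask = labels >= 0             # pixels that would be passed to the network
```

The sketch also makes the stated weakness visible: every training label is inherited from the pre-detection map, so errors in that map propagate directly into the trained model.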
Considering the above issues, in this paper we use an unsupervised subspace learning algorithm, kernel principal component analysis (KPCA) [Bernhard1998], to develop a novel feature extraction model called KPCA convolution, which extracts representative spatial-spectral features from VHR images in a totally unsupervised manner. Based on KPCA convolution, a powerful and general network called KPCA-MNet is designed for unsupervised binary and multi-class CD. First, high-level spatial-spectral feature maps are extracted by KPCA-MNet, which has a deep siamese architecture. Then, pixel-wise subtraction is applied to obtain the feature difference map. To use the change information in the feature difference map efficiently, KPCA-MNet maps it into a 2-D polar domain. Finally, unsupervised threshold segmentation methods or clustering techniques are applied to obtain the desired CD results.
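The last three steps of this pipeline (pixel-wise subtraction, mapping to a 2-D polar domain, and unsupervised thresholding) can be sketched as follows. This is a hedged illustration, not the method defined later in the paper: the feature maps are synthetic stand-ins for the output of KPCA-MNet, the magnitude-and-angle mapping is an assumption borrowed from the compressed CVA formulation (angle between each difference vector and the all-ones vector), and Otsu's method is just one possible unsupervised threshold. The names `to_polar` and `otsu_threshold` are hypothetical.

```python
import numpy as np

def to_polar(feature_diff):
    """Map an (H, W, C) feature difference map to magnitude and angle maps."""
    c = feature_diff.shape[-1]
    rho = np.linalg.norm(feature_diff, axis=-1)
    safe = np.where(rho > 0, rho, 1.0)                   # avoid division by zero
    # Cosine of the angle between each difference vector and the all-ones vector.
    cos_theta = feature_diff.sum(axis=-1) / (np.sqrt(c) * safe)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return rho, theta

def otsu_threshold(values, bins=256):
    """Otsu's threshold: the bin edge that maximizes between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weight_low = np.cumsum(hist)
    weight_high = weight_low[-1] - weight_low
    cum_sum = np.cumsum(hist * centers)
    valid = (weight_low > 0) & (weight_high > 0)
    mean_low = cum_sum[valid] / weight_low[valid]
    mean_high = (cum_sum[-1] - cum_sum[valid]) / weight_high[valid]
    between = weight_low[valid] * weight_high[valid] * (mean_low - mean_high) ** 2
    return edges[1:][valid][np.argmax(between)]

# Synthetic stand-ins for the two siamese feature maps (illustrative data only).
rng = np.random.default_rng(2)
feat_t1 = rng.normal(0.0, 1.0, size=(64, 64, 8))
feat_t2 = feat_t1 + rng.normal(0.0, 0.05, size=feat_t1.shape)
feat_t2[16:32, 16:32, :] += 1.0                          # simulated change

rho, theta = to_polar(feat_t2 - feat_t1)                 # pixel-wise subtraction
change_map = rho > otsu_threshold(rho)                   # binary CD result
```

In this sketch only the magnitude is thresholded, which corresponds to binary CD; the angle map is the quantity a clustering step could use to separate different kinds of change in the multi-class case.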
The rest of this paper is organized as follows. Section II introduces the background of KPCA and CNN. Section III elaborates the proposed KPCA convolution and KPCA-MNet. Section IV provides the experimental settings, experimental results, and discussion. Section V presents the multi-class CD experiment. Finally, Section VI draws the conclusions of this paper.