Deep Siamese Domain Adaptation Convolutional Neural Network for Cross-domain Change Detection in Multispectral Images

04/13/2020 ∙ by Hongruixuan Chen, et al. ∙ Wuhan University

Recently, deep learning has achieved promising performance in the change detection task. However, deep models are task-specific and data set bias often exists, so it is difficult to transfer a network trained on one multi-temporal data set (source domain) to another multi-temporal data set with very limited (or even no) labeled data (target domain). In this paper, we propose a novel deep siamese domain adaptation convolutional neural network (DSDANet) architecture for cross-domain change detection. In DSDANet, a siamese convolutional neural network first extracts spatial-spectral features from multi-temporal images. Then, through multiple kernel maximum mean discrepancy (MK-MMD), the learned feature representation is embedded into a reproducing kernel Hilbert space (RKHS), in which the distributions of the two domains can be explicitly matched. By optimizing the network parameters and kernel coefficients with the source labeled data and target unlabeled data, DSDANet can learn transferable feature representations that bridge the discrepancy between the two domains. To the best of our knowledge, this is the first time that such a domain adaptation-based deep network has been proposed for change detection. The theoretical analysis and experimental results demonstrate the effectiveness and potential of the proposed method.




1 Introduction

Change detection (CD) is one of the most widely used interpretation techniques in remote sensing and has been studied intensively in previous years [Singh1989]. Nonetheless, most traditional CD models explore only low-level features of multispectral images, which are insufficient to represent the key information in the original images. Recently, deep learning (DL) has shown great promise in computer vision and remote sensing image interpretation, and a number of DL-based CD methods have accordingly been developed [Zhu2017, Chen2019].

However, training these DL-based CD methods requires a large amount of labeled data, and manually selecting labeled samples is labor-intensive, especially for remote sensing images. Moreover, deep networks are often task-specific; in other words, they generalize relatively poorly. Owing to several factors, including noise and distortions, sensor characteristics, and imaging conditions, the data distributions of different CD data sets are often quite dissimilar. Thus, a deep network trained on one multi-temporal data set with abundant labeled samples suffers degraded performance when it is transferred to a new multi-temporal data set, which makes it unavoidable to manually label numerous samples in the new data set. Nowadays, massive amounts of remote sensing images acquired by satellite sensors are available, and these images provide diverse and abundant information about the regions they cover. Therefore, there is a strong incentive to develop an efficient CD model that is trained on a data set (source domain) with enough labeled data but can be easily transferred to a new data set (target domain) with very limited (or even no) labeled data. This can be defined as a domain adaptation problem in change detection.

Considering the above issues comprehensively, this paper proposes a novel deep network architecture, called DSDANet, for cross-domain CD. By incorporating the domain discrepancy metric MK-MMD into the network architecture, DSDANet learns transferable features in which the distributions of the two domains are similar. To the best of the authors' knowledge, this is the first time that a deep network based on domain adaptation has been designed for CD in multispectral images.

2 Methodology

2.1 MK-MMD

Owing to many factors, the probability distributions $p$ and $q$ characterizing the source domain $\mathcal{D}_s$ and the target domain $\mathcal{D}_t$ are dissimilar. Because only limited (or no) labeled data are available in the target domain, it is challenging to construct a model that can match these two domains and learn a transferable representation. An efficient and common approach is to combine the CD error with a domain discrepancy metric.

A widely used metric is the maximum mean discrepancy (MMD), a nonparametric kernel-based metric that measures the distance between two distributions in an RKHS. When the kernel is characteristic (for example, when the RKHS is universal), the MMD equals zero if and only if the two distributions are identical, so it approaches zero as the distributions of the two domains become the same.
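As an illustration of the definition above, the following is a minimal sketch, not the authors' implementation, of the biased quadratic-time estimate of the squared MMD with a single Gaussian kernel. The kernel choice, the bandwidth parameter `gamma`, the helper names, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a_i - b_j||^2) between rows of a and b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2_biased(xs, xt, gamma=0.25):
    """Biased (V-statistic) estimate of the squared MMD; needs O(n^2) kernel evaluations."""
    k_ss = gaussian_kernel(xs, xs, gamma)
    k_tt = gaussian_kernel(xt, xt, gamma)
    k_st = gaussian_kernel(xs, xt, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 4))          # synthetic "source" features
target_same = rng.normal(0.0, 1.0, size=(200, 4))     # same distribution as the source
target_shifted = rng.normal(1.0, 1.0, size=(200, 4))  # mean-shifted distribution

print(mmd2_biased(source, target_same))     # close to zero
print(mmd2_biased(source, target_shifted))  # clearly larger
```

The estimate for two samples from the same distribution is small but not exactly zero, because this biased estimator includes the diagonal kernel terms; with a characteristic kernel such as the Gaussian, only the population value is guaranteed to be zero when the distributions coincide.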

Figure 1: Overview of the CD architecture based on the proposed DSDANet.

However, finding an optimal RKHS is difficult, and a single kernel has limited representation ability. It is reasonable to assume that the optimal kernel can be expressed as a linear combination of single kernels; thus, the multi-kernel variant of MMD, MK-MMD [Gretton2012], is introduced.

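Consider a source data set $\mathcal{D}_s = \{\mathbf{x}_i^s\}_{i=1}^{n_s}$ and a target data set $\mathcal{D}_t = \{\mathbf{x}_j^t\}_{j=1}^{n_t}$ drawn from the distributions $p$ and $q$, respectively. The MK-MMD is defined as

$$d_k^2(p, q) \triangleq \left\| \mathbb{E}_{p}\big[\phi(\mathbf{x}^s)\big] - \mathbb{E}_{q}\big[\phi(\mathbf{x}^t)\big] \right\|_{\mathcal{H}_k}^2 \tag{1}$$

where $\|\cdot\|_{\mathcal{H}_k}$ is the RKHS norm and $\phi(\cdot)$ is the feature map induced by the multi-kernel $k$, which is defined as a linear combination of $m$ positive semi-definite kernels $\{k_u\}_{u=1}^{m}$:

$$\mathcal{K} \triangleq \left\{ k = \sum_{u=1}^{m} \beta_u k_u \;:\; \sum_{u=1}^{m} \beta_u = 1,\ \beta_u \ge 0,\ \forall u \right\} \tag{2}$$

where each $k_u$ is uniquely associated with an RKHS $\mathcal{H}_u$, and the kernels are assumed to be bounded. By leveraging diverse kernels, MK-MMD gains representation ability over a single-kernel MMD.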
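The multi-kernel construction can be sketched in the same style. Assuming Gaussian base kernels over a hypothetical bandwidth grid `gammas` and uniform coefficients `betas` (neither is specified in this section), the snippet computes a biased MK-MMD estimate and checks that it equals the coefficient-weighted sum of the single-kernel estimates.

```python
import numpy as np


def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def multi_kernel(a, b, gammas, betas):
    """k = sum_u beta_u * k_u, with one Gaussian base kernel k_u per bandwidth gamma_u."""
    betas = np.asarray(betas, dtype=float)
    # Constraints on the kernel coefficients: nonnegative and summing to one.
    assert np.all(betas >= 0) and np.isclose(betas.sum(), 1.0)
    return sum(beta * gaussian_kernel(a, b, g) for g, beta in zip(gammas, betas))


def mk_mmd2_biased(xs, xt, gammas, betas):
    """Biased estimate of the squared MK-MMD between source rows xs and target rows xt."""
    k_ss = multi_kernel(xs, xs, gammas, betas)
    k_tt = multi_kernel(xt, xt, gammas, betas)
    k_st = multi_kernel(xs, xt, gammas, betas)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


gammas = [0.0625, 0.25, 1.0, 4.0]                # hypothetical bandwidth grid
betas = np.full(len(gammas), 1.0 / len(gammas))  # uniform coefficients

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(150, 4))
xt = rng.normal(0.5, 1.0, size=(150, 4))

mk = mk_mmd2_biased(xs, xt, gammas, betas)
# The MMD is linear in the kernel, so MK-MMD equals the beta-weighted single-kernel MMDs.
per_kernel = [mk_mmd2_biased(xs, xt, [g], [1.0]) for g in gammas]
print(mk, np.dot(betas, per_kernel))
```

This linearity (the squared MMD under $k = \sum_u \beta_u k_u$ equals $\sum_u \beta_u d_{k_u}^2$) is also what allows the kernel coefficients to be optimized as a small quadratic program in Section 2.3.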

If the network learns a domain-invariant representation that minimizes the MK-MMD between the two domains, it can be easily transferred to the target domain with sparsely labeled data.

2.2 Network Architecture

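With MK-MMD introduced for domain adaptation, the structure of the proposed DSDANet is shown in Fig. 1. Consider a source data set $\mathcal{D}_s = \{(\mathbf{x}_i^s, y_i^s)\}_{i=1}^{n_s}$ with enough labeled data and a target data set $\mathcal{D}_t = \{\mathbf{x}_j^t\}_{j=1}^{n_t}$ without labels, where $\mathbf{x}_i$ is the pair of multi-temporal image patches centered at the $i$-th pixel and $y_i$ is the corresponding label of the $i$-th pixel. For each image patch pair in both domains, the spatial-spectral features $\mathbf{f}^{T_1}$ and $\mathbf{f}^{T_2}$ of the two acquisition dates are extracted by cascaded convolutional and max-pooling layers.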

The absolute value of the difference between the multi-temporal spatial-spectral features is then calculated. Because the two branches of DSDANet share weights, both dates are mapped into the same feature space, so this operation highlights the change information.
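The weight-sharing argument can be illustrated with a toy sketch. Here one linear projection followed by a ReLU is a hypothetical stand-in for the cascaded convolutional and max-pooling layers, the 5 × 5 patch size and all helper names are assumptions, and the images are synthetic. Because both dates pass through the same `SharedBranch` instance, an unchanged patch pair yields an all-zero difference vector, while a changed pair does not.

```python
import numpy as np


def extract_patch(image, row, col, size):
    """Return the size x size x bands patch centred on (row, col), reflect-padding borders."""
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    # Padded index `row` corresponds to original row - half, so this window is centred.
    return padded[row:row + size, col:col + size, :]


class SharedBranch:
    """Hypothetical stand-in for one siamese branch (the real one stacks conv and
    max-pooling layers). The SAME instance is applied to both acquisition dates."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(0.0, 0.1, size=(in_dim, out_dim))

    def __call__(self, patch):
        return np.maximum(patch.reshape(-1) @ self.weights, 0.0)  # linear map + ReLU


def change_feature(branch, img_t1, img_t2, row, col, size=5):
    """Absolute difference of the two dates' features at pixel (row, col)."""
    f_t1 = branch(extract_patch(img_t1, row, col, size))
    f_t2 = branch(extract_patch(img_t2, row, col, size))
    return np.abs(f_t1 - f_t2)


rng = np.random.default_rng(1)
img_t1 = rng.random((20, 20, 4))  # synthetic 4-band image at date T1
img_t2 = img_t1.copy()
img_t2[10:, 10:, :] += 0.5        # simulated change in the lower-right quarter

branch = SharedBranch(in_dim=5 * 5 * 4, out_dim=16)
unchanged = change_feature(branch, img_t1, img_t2, 2, 2)
changed = change_feature(branch, img_t1, img_t2, 15, 15)
print(unchanged.sum(), changed.sum())  # zero for the unchanged pair, positive for the changed one
```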

It is well known that the features learned by a CNN transition from general to specific as the network goes deeper. Especially in the last few fully connected (FC) layers, there is a large transferability gap between features learned from different domains, so a network trained in the source domain cannot be effectively transferred to the target domain by fine-tuning with sparse target labeled data alone. Therefore, MK-MMD is adopted to make the network learn domain-invariant features from the two domains. An intuitive idea is to apply MK-MMD to the penultimate FC layer, which directly makes the classifier adaptive to the two domains. However, a single layer may not be able to cope with the domain distribution bias, so MK-MMD is embedded into the two FC layers in front of the classifier. Since we aim to construct a network that is trained on the source CD data set but also performs well on the target task, the loss function of DSDANet is


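$$\mathcal{L} = \mathcal{L}_c + \lambda \sum_{l} d_k^2\big(\mathcal{D}_s^l, \mathcal{D}_t^l\big) \tag{3}$$

where $\mathcal{L}_c$ is the CD loss on the source labeled data, $l$ is the layer index running over the two adapted FC layers, $d_k^2(\mathcal{D}_s^l, \mathcal{D}_t^l)$ is the MK-MMD between the two domains computed on the $l$-th layer features $\mathcal{D}_s^l$ and $\mathcal{D}_t^l$ of the source and target data, and $\lambda > 0$ is a domain adaptation penalty parameter.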
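A minimal NumPy sketch of the objective follows. It assumes a cross-entropy CD loss on two-class (unchanged/changed) probabilities, uses the biased MK-MMD estimate for brevity, and treats the penalty value, kernel settings, and random feature arrays standing in for the two adapted FC layers as hypothetical. It is meant to show how the terms combine, not to reproduce the authors' training code.

```python
import numpy as np


def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mk_mmd2(hs, ht, gammas, betas):
    """Biased squared MK-MMD between source features hs and target features ht."""
    def k(a, b):
        return sum(beta * gaussian_kernel(a, b, g) for g, beta in zip(gammas, betas))
    return k(hs, hs).mean() + k(ht, ht).mean() - 2.0 * k(hs, ht).mean()


def cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i, c] is the predicted probability of class c
    (0 = unchanged, 1 = changed) for source sample i."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


def dsdanet_loss(src_probs, src_labels, src_feats, tgt_feats, lam, gammas, betas):
    """CD loss on labeled source data plus lam times the MK-MMD summed over the
    adapted layers (one feature array per layer in src_feats / tgt_feats)."""
    cd_loss = cross_entropy(src_probs, src_labels)
    adaptation = sum(mk_mmd2(hs, ht, gammas, betas) for hs, ht in zip(src_feats, tgt_feats))
    return cd_loss + lam * adaptation


rng = np.random.default_rng(0)
n, gammas, betas = 64, [0.1, 1.0], [0.5, 0.5]  # hypothetical settings
src_labels = rng.integers(0, 2, size=n)
p_change = rng.uniform(0.05, 0.95, size=n)
src_probs = np.stack([1.0 - p_change, p_change], axis=1)
src_fc = [rng.normal(0, 1, (n, 32)), rng.normal(0, 1, (n, 16))]  # two adapted FC layers
tgt_fc = [h + 1.0 for h in src_fc]                                # shifted target features

print(dsdanet_loss(src_probs, src_labels, src_fc, tgt_fc, 1.0, gammas, betas))
```

When the target features match the source features, the adaptation term vanishes and the loss reduces to the CD loss; shifting the target features makes it grow, which is the signal that pushes the FC layers toward domain-invariant representations during training.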

2.3 Optimization

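In the training procedure, two types of parameters must be learned: the network parameters $\Theta$ and the kernel coefficients $\beta = (\beta_1, \dots, \beta_m)$. However, computing MK-MMD with the kernel trick costs $O(n^2)$, which is unacceptable for deep networks on large-scale data sets and makes training more difficult. Therefore, the linear-time unbiased estimate of MK-MMD [Gretton2012] is utilized to reduce the computation cost from $O(n^2)$ to $O(n)$. For a mini-batch of $n_s$ source and $n_s$ target samples, it can be formulated as

$$d_k^2\big(\mathcal{D}_s^l, \mathcal{D}_t^l\big) = \frac{2}{n_s} \sum_{i=1}^{n_s/2} g_k\big(\mathbf{z}_i^l\big) \tag{4}$$

$$g_k\big(\mathbf{z}_i^l\big) = k\big(\mathbf{h}_{2i-1}^{s,l}, \mathbf{h}_{2i}^{s,l}\big) + k\big(\mathbf{h}_{2i-1}^{t,l}, \mathbf{h}_{2i}^{t,l}\big) - k\big(\mathbf{h}_{2i-1}^{s,l}, \mathbf{h}_{2i}^{t,l}\big) - k\big(\mathbf{h}_{2i}^{s,l}, \mathbf{h}_{2i-1}^{t,l}\big)$$

where $g_k(\cdot)$ is evaluated by the multi-kernel $k$ on the quad-tuple $\mathbf{z}_i^l = \big(\mathbf{h}_{2i-1}^{s,l}, \mathbf{h}_{2i}^{s,l}, \mathbf{h}_{2i-1}^{t,l}, \mathbf{h}_{2i}^{t,l}\big)$, and $\mathbf{h}^{s,l}$ and $\mathbf{h}^{t,l}$ are the learned source and target features in the $l$-th layer.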
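The linear-time estimate can be sketched as follows. This is an illustration under assumptions, not the authors' code: Gaussian base kernels with hypothetical bandwidths, equal coefficients, and synthetic features. Each sample is used in exactly one quad-tuple, so the number of kernel evaluations grows linearly with the batch size.

```python
import numpy as np


def pairwise_multi_kernel(a, b, gammas, betas):
    """Row-wise k(a_i, b_i) = sum_u beta_u * exp(-gamma_u * ||a_i - b_i||^2)."""
    sq = np.sum((a - b) ** 2, axis=1)
    return sum(beta * np.exp(-g * sq) for g, beta in zip(gammas, betas))


def quad_tuple_terms(hs, ht, gammas, betas):
    """g_k(z_i) for each quad-tuple z_i = (s_{2i-1}, s_{2i}, t_{2i-1}, t_{2i})."""
    n = (min(len(hs), len(ht)) // 2) * 2  # use an even number of samples
    s1, s2 = hs[0:n:2], hs[1:n:2]         # s_{2i-1}, s_{2i}
    t1, t2 = ht[0:n:2], ht[1:n:2]         # t_{2i-1}, t_{2i}

    def k(a, b):
        return pairwise_multi_kernel(a, b, gammas, betas)

    return k(s1, s2) + k(t1, t2) - k(s1, t2) - k(s2, t1)


def mk_mmd2_linear(hs, ht, gammas, betas):
    """Linear-time unbiased estimate: (2/n) * sum_i g_k(z_i), i.e. the mean over tuples."""
    return quad_tuple_terms(hs, ht, gammas, betas).mean()


rng = np.random.default_rng(0)
gammas, betas = [0.25, 1.0], [0.5, 0.5]  # hypothetical kernel settings
hs = rng.normal(0.0, 1.0, size=(2000, 4))
ht_same = rng.normal(0.0, 1.0, size=(2000, 4))
ht_shift = rng.normal(1.0, 1.0, size=(2000, 4))
print(mk_mmd2_linear(hs, ht_same, gammas, betas))   # near zero (may be slightly negative)
print(mk_mmd2_linear(hs, ht_shift, gammas, betas))  # clearly positive
```

Unlike the biased estimate, this one can be negative for two samples from the same distribution; it trades higher variance for linear cost, which is why the kernel coefficients are chosen with that variance in mind.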

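As for the kernel coefficients $\beta$, the optimal coefficient $\beta_u$ of each kernel $k_u$ can be sought by jointly maximizing $d_k^2$ itself and minimizing its estimation variance, which results in the optimization over the family $\mathcal{K}$ of admissible multi-kernels

$$\max_{k \in \mathcal{K}}\ d_k^2\big(\mathcal{D}_s^l, \mathcal{D}_t^l\big)\,\sigma_k^{-2} \tag{5}$$

where $\sigma_k^2$ is the estimation variance of the linear-time MK-MMD estimate. This optimization can ultimately be solved as a quadratic program (QP) [Gretton2012].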
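Following the kernel-selection result in [Gretton2012], the criterion above can be written (when at least one kernel has a positive MMD estimate) as the QP $\min_{\beta} \beta^\top (\mathbf{Q} + \varepsilon \mathbf{I}) \beta$ subject to $\mathbf{d}^\top \beta = 1$ and $\beta \ge 0$, where $\mathbf{d}$ holds the per-kernel MMD estimates, $\mathbf{Q}$ is the covariance of the per-kernel quad-tuple terms, and $\varepsilon$ is a small ridge term. The sketch below solves this QP by enumerating candidate supports, which is exact but only practical for a handful of kernels; the solver strategy, the value of `eps`, and the final rescaling of $\beta$ to sum to one are assumptions for illustration, and a production implementation would call a dedicated QP solver.

```python
import itertools

import numpy as np


def optimal_kernel_weights(d, Q, eps=1e-6):
    """Solve min_beta beta^T (Q + eps I) beta  s.t.  d^T beta = 1, beta >= 0.

    For each candidate support S, the equality-constrained minimiser is
    A_SS^{-1} d_S / (d_S^T A_SS^{-1} d_S); the best nonnegative candidate is the
    exact optimum of this convex QP. Returns beta rescaled to sum to one."""
    d = np.asarray(d, dtype=float)
    m = len(d)
    A = np.asarray(Q, dtype=float) + eps * np.eye(m)
    best, best_obj = None, np.inf
    for size in range(1, m + 1):
        for support in itertools.combinations(range(m), size):
            idx = list(support)
            A_ss = A[np.ix_(idx, idx)]
            sol = np.linalg.solve(A_ss, d[idx])  # A_SS^{-1} d_S
            denom = d[idx] @ sol
            if denom <= 0:
                continue
            beta_s = sol / denom                 # satisfies d_S^T beta_S = 1
            if np.any(beta_s < 0):
                continue                         # violates beta >= 0
            obj = beta_s @ A_ss @ beta_s
            if obj < best_obj:
                best = np.zeros(m)
                best[idx] = beta_s
                best_obj = obj
    if best is None:
        raise ValueError("no feasible beta: d needs at least one positive entry")
    return best / best.sum()


# Toy checks with two kernels and identity covariance.
print(optimal_kernel_weights([1.0, 2.0], np.eye(2)))   # approximately [1/3, 2/3]
print(optimal_kernel_weights([1.0, -1.0], np.eye(2)))  # [1, 0]
```

In practice, $\mathbf{d}$ and $\mathbf{Q}$ would be estimated from a matrix `G` whose column $u$ holds the quad-tuple terms of kernel $k_u$ on a mini-batch, for example with `G.mean(axis=0)` and `np.cov(G, rowvar=False)`.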

By alternately using stochastic gradient descent (SGD) to update $\Theta$ and solving the QP to optimize $\beta$, DSDANet gradually learns transferable representations from source labeled data and target unlabeled data. Minimizing Eq. 3 makes the marginal distributions $p(\mathbf{x}^s)$ and $q(\mathbf{x}^t)$ of the two domains very similar, yet the conditional distributions $p(y^s \mid \mathbf{x}^s)$ and $q(y^t \mid \mathbf{x}^t)$ of the two domains may still differ slightly. Therefore, a very small amount of target labeled data is selected to fine-tune the classifier of DSDANet. Because the target domain provides far fewer labeled samples than the source domain, this procedure can be regarded as semi-supervised learning.
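The final fine-tuning step can be sketched as plain gradient descent on the classifier alone, with the adapted feature extractor frozen. The logistic (single-output) classifier, the learning rate, the number of epochs, the zero initialization standing in for the source-trained weights, and the synthetic 16-dimensional features are assumptions; only the budget of 200 labeled target samples comes from the experimental setup.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def finetune_classifier(feats, labels, w, b, lr=0.1, epochs=200):
    """Gradient descent on binary cross-entropy for the final layer only;
    the adapted feature extractor is frozen, so `feats` stay fixed."""
    w, b = w.copy(), float(b)
    n = len(labels)
    for _ in range(epochs):
        p = sigmoid(feats @ w + b)
        grad_logit = p - labels  # d(BCE)/d(logit) for each sample
        w -= lr * feats.T @ grad_logit / n
        b -= lr * grad_logit.mean()
    return w, b


def sample_target(rng, n, dim=16):
    """Hypothetical frozen target features: changed (1) and unchanged (0) classes shifted apart."""
    labels = rng.integers(0, 2, size=n)
    offset = np.where(labels[:, None] == 1, 0.75, -0.75)
    return rng.normal(0.0, 1.0, size=(n, dim)) + offset, labels


rng = np.random.default_rng(0)
train_x, train_y = sample_target(rng, 200)  # 200 labeled target samples, as in Sec. 3.1
test_x, test_y = sample_target(rng, 1000)

w0, b0 = np.zeros(16), 0.0                  # placeholder for the source-trained classifier
w, b = finetune_classifier(train_x, train_y, w0, b0)
accuracy = np.mean((sigmoid(test_x @ w + b) > 0.5) == test_y)
print(accuracy)
```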

Figure 2: The WH data set adopted as source domain. In the ground truth, red indicates change and green means non-change.
Figure 3: Two data sets adopted as target domains. (a)-(c) HY data set. (d)-(f) QU data set. In the ground truth, red indicates change and green means non-change.

3 Experiments

3.1 General Information

The data set used as the source domain is the WH data set captured by GF-2, as shown in Fig. 2. Both images are 1000 × 1000 pixels with four spectral bands and a spatial resolution of 4 m.

The data sets adopted as the target domains are the HY and QU data sets, as shown in Fig. 3. The HY data set was also captured by GF-2, with a size of 1000 × 1000 pixels. The second target data set, denoted QU, was acquired by QuickBird with four spectral bands and a spatial resolution of 2.4 m; both images in this data set are 358 × 280 pixels. Since WH and QU were acquired by different sensors with different spatial resolutions and statistical characteristics, the data distributions of these two data sets are significantly different.

In the training procedure, we randomly select 10% of the samples (50,416 samples) from the source domain as labeled training samples. DSDANet is then trained with the labeled source training samples and all unlabeled target samples. After training, only 200 labeled samples from each target domain are selected to fine-tune the classifier. Compared with the labeled source data, the labeled data provided by the target domain are sparse.

To evaluate the proposed method, we compare it with change vector analysis (CVA) [Sharma2007] and a support vector machine (SVM). To further evaluate the effectiveness of MK-MMD, we compare DSDANet with variants that do not perform domain adaptation: directly inferring the target data without fine-tuning (DSCNet-v1), training directly on the target labeled data instead of the source domain (DSCNet-v2), and fine-tuning with the target labeled data without MK-MMD (DSCNet-v3).
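For reference, the CVA baseline can be sketched as the magnitude of the per-pixel spectral change vector followed by a threshold. The section does not say how the CVA threshold was chosen, so Otsu's method is used here purely as an assumption, and the four-band images are synthetic.

```python
import numpy as np


def cva_magnitude(img_t1, img_t2):
    """Change vector analysis: per-pixel Euclidean norm of the spectral difference vector."""
    return np.linalg.norm(img_t2.astype(float) - img_t1.astype(float), axis=-1)


def otsu_threshold(values, bins=256):
    """Otsu's threshold: maximise the between-class variance of the histogram split."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                  # pixels at or below each candidate bin
    w1 = w0[-1] - w0                      # pixels above it
    cum_mass = np.cumsum(hist * centers)
    mu0 = cum_mass / np.maximum(w0, 1)
    mu1 = (cum_mass[-1] - cum_mass) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[np.argmax(between) + 1]  # upper edge of the last "unchanged" bin


rng = np.random.default_rng(0)
img_t1 = rng.random((30, 30, 4))
img_t2 = img_t1 + rng.normal(0.0, 0.02, size=img_t1.shape)  # acquisition noise only
img_t2[:10, :10, :] += 0.8                                   # simulated change

magnitude = cva_magnitude(img_t1, img_t2)
change_map = magnitude > otsu_threshold(magnitude)
print(change_map[:10, :10].mean(), change_map[10:, 10:].mean())
```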

Figure 4: Binary change maps obtained by the proposed method and comparison methods on the HY data set. (a) CVA. (b) SVM. (c)-(e) Variants of DSDANet. (f) DSDANet.
Figure 5: Binary change maps obtained by the proposed method and comparison methods on the QU. (a) CVA. (b) SVM. (c)-(e) Variants of DSDANet. (f) DSDANet.

3.2 Experimental Results

The binary change maps obtained by different methods on the HY data set are shown in Fig. 4. The proposed model generates the best CD result, with more complete changed regions and less noise. For the QU data set (Fig. 5), even though the distributions of the two domains are significantly different owing to the different characteristics of the two sensors, DSDANet still generates an accurate binary change map. This implies that, by embedding the data distributions into the optimal RKHS and minimizing the distance between them, the network can learn a domain-invariant representation from source labeled data and target unlabeled data and can be easily transferred from one CD data set to another.

The quantitative results, in terms of overall accuracy (OA) and kappa coefficient (KC), are listed in Table 1. Because the very limited target labeled data cannot cover all kinds of changed and unchanged land-cover types, fine-tuning without domain adaptation (DSCNet-v3) also does not perform well. By contrast, DSDANet achieves the best OA and KC on both target data sets (OA 0.9618 and KC 0.8021 on HY; OA 0.9016 and KC 0.7670 on QU).

Method       HY OA    HY KC    QU OA    QU KC
CVA          0.9445   0.7171   0.8079   0.5352
SVM          0.8467   0.4565   0.8381   0.6285
DSCNet-v1    0.8751   0.4310   0.7060   0.1147
DSCNet-v2    0.8759   0.5610   0.8286   0.5404
DSCNet-v3    0.9279   0.6650   0.8297   0.5391
DSDANet      0.9618   0.8021   0.9016   0.7670
Table 1: Accuracy assessment (OA: overall accuracy; KC: kappa coefficient) of the binary change maps obtained by different methods on the two target data sets.
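Table 1 reports OA and KC. Under their usual definitions, both follow from the confusion matrix of a binary change map; a minimal sketch is given below. It assumes that only pixels with reference labels are evaluated, and the toy labels are made up for illustration. Kappa is undefined when the chance agreement $p_e$ equals one (for example, when every evaluated pixel belongs to one class).

```python
import numpy as np


def oa_and_kappa(pred, truth):
    """Overall accuracy (OA) and Cohen's kappa coefficient (KC) of a binary change map.
    pred and truth hold 0 (unchanged) or 1 (changed) for the evaluated pixels."""
    pred = np.asarray(pred).ravel().astype(int)
    truth = np.asarray(truth).ravel().astype(int)
    cm = np.zeros((2, 2))
    np.add.at(cm, (truth, pred), 1)                  # rows: reference, columns: prediction
    n = cm.sum()
    p_o = np.trace(cm) / n                           # observed agreement = OA
    p_e = cm.sum(axis=1) @ cm.sum(axis=0) / n ** 2   # agreement expected by chance
    return p_o, (p_o - p_e) / (1.0 - p_e)


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
oa, kc = oa_and_kappa(pred, truth)
print(oa, kc)  # 0.8 and 7/12 (about 0.583)
```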

4 Conclusion

In this paper, a novel network architecture named DSDANet is proposed for cross-domain CD in multispectral images. By restricting the domain discrepancy with MK-MMD and optimizing the network parameters and kernel coefficients, DSDANet learns transferable representations from source labeled data and target unlabeled data, which efficiently bridges the discrepancy between the two domains. The experimental results on two target data sets demonstrate the effectiveness of the proposed DSDANet in cross-domain CD. Even when the data distributions of the two domains are significantly different, DSDANet needs only sparse labeled data of the target domain to fine-tune the classifier, which makes it attractive for practical applications.