Cross-Domain Collaborative Learning via Cluster Canonical Correlation Analysis and Random Walker for Hyperspectral Image Classification

08/29/2018 ∙ by Yao Qin, et al. ∙ Southwest Jiaotong University ∙ Università di Trento

This paper introduces a novel heterogeneous domain adaptation (HDA) method for hyperspectral image classification with a limited amount of labeled samples in both domains. The method is achieved by means of cross-domain collaborative learning (CDCL), which is addressed via cluster canonical correlation analysis (C-CCA) and random walker (RW) algorithms. Specifically, the proposed CDCL method is an iterative process of three main stages, i.e., two rounds of RW-based pseudolabeling and one round of cross-domain learning via C-CCA. Firstly, given the initially labeled target samples as the training set (TS), RW-based pseudolabeling is employed to update the TS and extract target clusters (TCs) by fusing the segmentation results obtained by the RW and extended RW (ERW) classifiers. Secondly, cross-domain learning via C-CCA is performed using the labeled source samples and the TCs. The unlabeled target samples are then classified, with estimated probability maps, using the model trained in the projected correlation subspace. Thirdly, both the TS and the estimated probability maps are used to update the TS again via RW-based pseudolabeling. When the iterative process finishes, the result obtained by the ERW classifier using the final TS and estimated probability maps is regarded as the final classification map. Experimental results on four real HSIs demonstrate that the proposed method achieves better performance than state-of-the-art HDA and ERW methods.


I Introduction

Hyperspectral images (HSIs) can capture detailed spectral information measured in contiguous bands of the electromagnetic spectrum [1, 2, 3] and have been widely used in various remote sensing applications, such as environmental monitoring [4] and mineral exploration [5]. One fundamental challenge in these applications is to assign a unique label to each pixel in the image, which is called HSI classification. When the problem is treated as a supervised learning problem and solved using machine learning methods (including random forest [6], support vector machine (SVM) [7], Laplacian SVM (LapSVM) [8, 9, 10], decision trees [11] and support tensor machine (STM) [12]), a large amount of labeled samples is required due to the high dimensionality of hyperspectral data. This would require extensive and expensive field data collection campaigns. Consequently, only a small quantity of labeled samples is available in most practical applications of HSI classification. In order to address this problem, several machine learning and feature extraction methods have been widely applied to hyperspectral data [2], such as active learning (AL) [13, 14, 15, 16, 17], semi-supervised learning (SSL) [18, 19, 15], spectral-spatial classification [20, 21, 22, 23], domain adaptation (DA) [24, 25, 3] and, more recently, deep learning based techniques [26, 27, 28]. In this paper, we focus on applying DA to HSI classification.

According to the machine learning and pattern recognition literature, DA refers to the problem of adapting a model trained on the source domain to the target domain. When applied to HSI classification, DA aims to generate an accurate classification map of the target HSI by utilizing the knowledge learned on the source HSI. According to [24], unsupervised DA refers to the case where no labeled samples are available in the target domain, whereas semi-supervised DA represents the case where few labeled target samples are available. Further, heterogeneous DA (HDA) refers to the case where the feature dimensions of the two domains are different. Since we assume that a limited amount of labeled samples is available in the target HSI, we focus on semi-supervised HDA for HSI classification. Although there are several HDA methods based on deep learning for visual and remote sensing applications [29, 30, 31], the feature representation ability of deep learning models strongly depends on the availability of a large number of training samples [32, 33]. Therefore, it is difficult to obtain a reliable deep learning model with very few samples in hyperdimensional feature spaces. Given the assumption of a limited number of training samples, in this paper we focus on handcrafted features for HDA.

In the HDA literature, one of the simplest feature-based approaches is the feature augmentation proposed in [34], whose extended versions, called heterogeneous feature augmentation (HFA) and semi-supervised HFA (SHFA), have recently been proposed in [35]. In [36], a robust domain adaptation low-rank reconstruction method is introduced, where a transformed intermediate representation of the samples in the source domain is linearly reconstructed by the target samples. In [37], the authors align domains with canonical correlation analysis (CCA) and then perform change detection. The approach is extended to a kernel and semi-supervised version in [38], where the authors perform change detection with different sensors. In [39], a supervised multi-view canonical correlation analysis ensemble is presented to address HDA problems. In [40], the proposed cross-domain landmark selection (CDLS) method learns representative cross-domain landmarks for deriving a proper feature subspace for adaptation and classification purposes. Different from the above feature-based category, several studies employ manifold learning to preserve the original geometry. In [41], the method of domain adaptation using manifold alignment (DAMA) can reuse labeled data from multiple source domains in the target domain even when the input domains do not share any common features or instances. In [42], semi-supervised manifold alignment (SSMA) is proposed, where the domains are matched through manifold alignment while preserving label (dis)similarities and the geometric structures of the single manifold in both domains. Recently, kernelized manifold alignment (KEMA) has been introduced in [43]. In [31], a deep feature alignment neural network is introduced to carry out the domain adaptation, where discriminative features for the source and target domains are extracted using deep convolutional recurrent neural networks and then aligned with each other layer by layer. In [44], a kernel-based domain-invariant feature selection method has been proposed for the classification of hyperspectral images, where a novel measure of data shift for evaluating domain stability is defined.

As stated earlier, it is not feasible to obtain a large amount of labeled target samples in practical applications. On the other hand, if a sufficient number of labeled samples were available in the target HSI, an accurate classification map could be achieved by using newly-developed deep learning methods [45]. Therefore, it is reasonable to assume that only limited labeled samples can be used in the semi-supervised HDA problem. In order to address the problem and obtain better classification performance, two key problems should be solved, i.e., how to obtain more reliable pseudo-labeled target samples for adaptation and how to achieve better adaptation with these samples.

In this paper, random walker (RW)-based pseudolabeling [46] and cluster canonical correlation analysis (C-CCA) [47] are employed to solve the above two problems, respectively. The RW-based pseudolabeling algorithm has been proven effective for extracting high-confidence samples [46], whereas C-CCA uses all pairwise correspondences within a cluster across the two domains and results in cluster segregation [47]. Fig. 1 illustrates the difference between CCA and C-CCA. It is clear that CCA requires paired samples and can hardly be applied directly when multiple clusters of samples in the source domain correspond to several clusters of samples in the target domain.

In the proposed approach, the two algorithms work in a collaborative manner, i.e., RW-based pseudolabeling is employed to extract target samples with high confidence, whereas C-CCA is employed for cross-domain learning; the projected samples are then used for RW-based pseudolabeling. The proposed method is therefore denoted as cross-domain collaborative learning (CDCL). As shown in Fig. 2, the proposed method is based on an iterative process consisting of three main components, i.e., RW-based pseudolabeling, cross-domain learning via C-CCA and classification using the extended RW (ERW) algorithm. Firstly, given the initially labeled target samples as the training set (TS), RW-based pseudolabeling is employed to update the TS and extract target clusters (TCs) by fusing the segmentation results obtained by the RW and ERW classifiers. Secondly, cross-domain learning via C-CCA is applied using the labeled source samples and the TCs. The unlabeled target samples are then classified, with estimated probability maps, using the model trained in the projected correlation subspace. Then, both the TS and the estimated probability maps are used for updating the TS again via RW-based pseudolabeling. Finally, when the iterative process converges, the classification map is obtained by the ERW classifier using the final TS and the estimated probability maps. Comprehensive experiments on four publicly available benchmark HSIs have been conducted to demonstrate the effectiveness of the proposed algorithm.

Fig. 1: Representation of CCA and C-CCA methods to obtain correlated subspaces between source and target samples. (a) CCA uses pairwise correspondences between source and target samples and can hardly segregate the two clusters. (b) C-CCA uses all pairwise correspondences within a cluster across the two sets of samples and results in cluster segregation.

The rest of the paper is organized as follows. The C-CCA and RW algorithms are reviewed in Section II. The proposed CDCL methodology is presented in Section III. Section IV describes the experimental datasets and setup. Results and discussions are presented in Section V. Section VI summarizes the contributions of our research.

Fig. 2: Illustration of the proposed CDCL technique. Note that source clusters refer to the labeled source samples.

II Background Algorithms

This section briefly reviews the background algorithms, i.e., the C-CCA and RW algorithms.

II-A C-CCA

Let us consider two sets of labeled samples $S$ and $T$ extracted from the source and the target domains, respectively. Each set is divided into $C$ corresponding clusters, denoted as $S = \{S_1, \dots, S_C\}$ and $T = \{T_1, \dots, T_C\}$, respectively. The $c$-th cluster of $S$ is represented as $S_c = \{\mathbf{s}_1^c, \dots, \mathbf{s}_{n_c^s}^c\}$ and the $c$-th cluster of $T$ is denoted as $T_c = \{\mathbf{t}_1^c, \dots, \mathbf{t}_{n_c^t}^c\}$, where $n_c^s$ and $n_c^t$ represent the number of samples in $S_c$ and $T_c$, respectively. The aim of C-CCA is to find a projection $\mathbf{w}_s$ for $S$ and $\mathbf{w}_t$ for $T$, so that the correlation between the projections of $S$ and $T$ is maximized and the clusters are well separated.

In C-CCA, a one-to-one correspondence between all pairs of samples in a given cluster across the two sets is established and thereafter standard CCA is used to learn the projections. The C-CCA problem is written as

$$\rho = \max_{\mathbf{w}_s, \mathbf{w}_t} \frac{\mathbf{w}_s^\top \Sigma_{st}\, \mathbf{w}_t}{\sqrt{\mathbf{w}_s^\top \Sigma_{ss}\, \mathbf{w}_s}\,\sqrt{\mathbf{w}_t^\top \Sigma_{tt}\, \mathbf{w}_t}} \qquad (1)$$

where the covariance matrices $\Sigma_{st}$, $\Sigma_{ss}$ and $\Sigma_{tt}$ are defined as:

$$\Sigma_{st} = \frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{n_c^s}\sum_{j=1}^{n_c^t} \mathbf{s}_i^c\, (\mathbf{t}_j^c)^\top \qquad (2)$$

$$\Sigma_{ss} = \frac{1}{N}\sum_{c=1}^{C} n_c^t \sum_{i=1}^{n_c^s} \mathbf{s}_i^c\, (\mathbf{s}_i^c)^\top \qquad (3)$$

$$\Sigma_{tt} = \frac{1}{N}\sum_{c=1}^{C} n_c^s \sum_{j=1}^{n_c^t} \mathbf{t}_j^c\, (\mathbf{t}_j^c)^\top \qquad (4)$$

where $N = \sum_{c=1}^{C} n_c^s n_c^t$ is the total number of cross-set correspondences. The problem can be solved as a generalized eigenvalue problem, as in CCA.
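To make this construction concrete, the following Python sketch (our own naming; it assumes centered data arranged as one array per cluster) builds the covariance matrices of (2)-(4) and solves the resulting generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

def ccca_projections(S_clusters, T_clusters, reg=1e-6):
    """Cluster CCA sketch. S_clusters/T_clusters: lists of (n_c x d)
    arrays, one per class, with matching cluster order across domains."""
    ds, dt = S_clusters[0].shape[1], T_clusters[0].shape[1]
    Sst, Sss, Stt, N = np.zeros((ds, dt)), np.zeros((ds, ds)), np.zeros((dt, dt)), 0
    for Sc, Tc in zip(S_clusters, T_clusters):
        ns, nt = len(Sc), len(Tc)
        N += ns * nt
        # all pairwise correspondences within the cluster, eq. (2)
        Sst += Sc.sum(axis=0)[:, None] @ Tc.sum(axis=0)[None, :]
        Sss += nt * (Sc.T @ Sc)   # each source sample is paired n_t times, eq. (3)
        Stt += ns * (Tc.T @ Tc)   # each target sample is paired n_s times, eq. (4)
    Sst, Sss, Stt = Sst / N, Sss / N, Stt / N
    Sss += reg * np.eye(ds)       # small ridge for numerical stability
    Stt += reg * np.eye(dt)
    # generalized eigenproblem: Sst Stt^-1 Sts ws = rho^2 Sss ws
    M = Sst @ np.linalg.solve(Stt, Sst.T)
    rho2, Ws = eigh(M, Sss)
    order = np.argsort(rho2)[::-1]
    rho = np.sqrt(np.clip(rho2[order], 0.0, 1.0))
    Ws = Ws[:, order]
    Wt = np.linalg.solve(Stt, Sst.T @ Ws) / np.maximum(rho, 1e-12)
    return Ws, Wt, rho
```

Each pair of columns (Ws[:, k], Wt[:, k]) with coefficient rho[k] corresponds to one dimension of the correlation subspace; Section III-C keeps the pairs with $\rho_k \geq 0.5$.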

II-B RW

The RW algorithm was initially designed for general image segmentation based on a small set of labeled pixels [48]. The algorithm assigns each unlabeled pixel the label that a random walker starting from that pixel would most likely reach first. To be specific, it considers an image as a graph $G = (V, E)$ with vertices $V$ and edges $E$; the vertices and edges represent the pixels in the image and the links connecting adjacent pixels, respectively. The structure of the image intensities is encoded by the edge weights. The edge weight between the $i$-th and $j$-th pixels is defined as $w_{ij} = \exp(-\beta (g_i - g_j)^2)$, where $g_i$ indicates the image intensity at pixel $i$ and $\beta$ is a free parameter that controls the smoothness of the graph edges. The corresponding Laplacian matrix of the graph is denoted as $L$.

The vertices $V$ can be divided into the set of labeled pixels $V_L$ and the unlabeled set $V_U$, where each pixel in $V_L$ has been assigned a label from the set $\{1, \dots, K\}$. Given the intensity representation of the image $G$ and $V_L$, the RW algorithm determines the probability $p_i^k$ that a random walker starting at unlabeled pixel $i$ first reaches a labeled pixel with label $k$. The set of probabilities is obtained analytically, in closed form, by minimizing the energy function $E_{spatial}(\mathbf{p}^k) = (\mathbf{p}^k)^\top L\, \mathbf{p}^k$. By assigning each pixel the label with the largest probability, a high-quality image segmentation is obtained [48].
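As a concrete illustration, the sketch below (Python; a 4-neighbor grid for brevity, whereas Section III-B uses eight neighbors; function names are ours) builds the graph Laplacian with the edge weights defined above and solves the sparse linear system that minimizes $E_{spatial}$, following Grady's closed-form formulation:

```python
import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.sparse.linalg import splu

def graph_laplacian(img, beta):
    """Laplacian L = D - W of a 2-D intensity image, with RW edge
    weights w_ij = exp(-beta * (g_i - g_j)^2) on a 4-neighbor grid."""
    H, W = img.shape
    g = img.ravel()
    idx = np.arange(H * W).reshape(H, W)
    pairs = [(idx[:-1, :].ravel(), idx[1:, :].ravel()),   # vertical edges
             (idx[:, :-1].ravel(), idx[:, 1:].ravel())]   # horizontal edges
    rows, cols, vals = [], [], []
    for a, b in pairs:
        w = np.exp(-beta * (g[a] - g[b]) ** 2)
        rows += [a, b]; cols += [b, a]; vals += [w, w]    # symmetric weights
    Wm = coo_matrix((np.concatenate(vals),
                     (np.concatenate(rows), np.concatenate(cols))),
                    shape=(H * W, H * W)).tocsr()
    return diags(np.asarray(Wm.sum(axis=1)).ravel()) - Wm

def rw_probabilities(L, seeds, seed_labels, n_classes):
    """Closed-form RW probabilities: minimizing E_spatial with the seed
    values fixed yields the linear system Lu Pu = -B M on unlabeled nodes."""
    n = L.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), seeds)
    Lcsr = L.tocsr()
    Lu = Lcsr[unlabeled][:, unlabeled]
    B = Lcsr[unlabeled][:, seeds]
    M = np.zeros((len(seeds), n_classes))
    M[np.arange(len(seeds)), seed_labels] = 1.0           # one-hot seeds
    return splu(Lu.tocsc()).solve(-(B @ M))               # (n_u, K) probabilities
```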

When RW is directly applied to HSI classification, the spectral information can hardly be integrated in the energy function $E_{spatial}$. To address this problem, an ERW-based spectral-spatial algorithm is proposed in [49], which includes the aspatial energy function defined as follows:

$$E_{aspatial}(\mathbf{p}^k) = \sum_{q \neq k} (\mathbf{p}^k)^\top \Lambda^q\, \mathbf{p}^k + (\mathbf{p}^k - \mathbf{1})^\top \Lambda^k\, (\mathbf{p}^k - \mathbf{1}) \qquad (5)$$

where $\Lambda^k$ is a diagonal matrix whose entries are the initial probabilities that the pixels belong to class $k$. These probabilities can be estimated by applying an SVM classifier to the HSI. The combined energy function of the ERW algorithm is formulated as

$$E^k = E_{spatial}(\mathbf{p}^k) + \gamma\, E_{aspatial}(\mathbf{p}^k) \qquad (6)$$

where $\gamma$ is a free parameter controlling the dynamic range of the aspatial function. Similar to the solution of RW, the set of probabilities in ERW can be estimated by solving a system of linear equations [50]. Given the optimized probabilities, each unlabeled pixel is assigned the label with the largest probability.
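Setting the derivative of (6) with respect to each $\mathbf{p}^k$ to zero gives one sparse linear system per class. The following is a minimal sketch under our naming (seed handling, i.e., fixing the labeled pixels as in plain RW, is omitted for brevity):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def erw_probabilities(L, priors, gamma):
    """ERW probabilities. L: sparse (n x n) Laplacian; priors: (n, K)
    initial class probabilities (e.g. from an SVM); gamma: weight of the
    aspatial term. Minimizing (6) w.r.t. p^k yields
    (L + gamma * Lambda) p^k = gamma * lambda^k, with
    Lambda = diag(sum_q priors[:, q]) and lambda^k = priors[:, k]."""
    n, K = priors.shape
    A = (L + gamma * diags(priors.sum(axis=1))).tocsc()
    P = np.column_stack([spsolve(A, gamma * priors[:, k]) for k in range(K)])
    return P / P.sum(axis=1, keepdims=True)   # renormalize per pixel
```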

III Proposed Method

III-A Problem Definition

Assume that we have $n_s$ labeled training samples $\{(\mathbf{x}_i^s, y_i^s)\}_{i=1}^{n_s}$ in the source domain, where $\mathbf{x}_i^s \in \mathbb{R}^{d_s}$ and $y_i^s \in \{1, \dots, C\}$. The samples in the target domain are divided into the labeled and unlabeled sets, which are denoted as $\{(\mathbf{x}_i^t, y_i^t)\}_{i=1}^{n_l}$ and $\{\mathbf{x}_i^u\}_{i=1}^{n_u}$, respectively, with $\mathbf{x}_i^t, \mathbf{x}_i^u \in \mathbb{R}^{d_t}$. In this paper, we only consider the semi-supervised heterogeneous problem; thus we assume that $n_l \ll n_s$ and $d_s \neq d_t$. For better illustration, the sets of labeled training source samples and of labeled and unlabeled target samples are denoted as $D_s$, $D_l$ and $D_u$, respectively.

As shown in Fig. 2, the proposed algorithm is based on an iterative process including three main components, i.e., RW-based pseudolabeling, cross-domain learning via C-CCA and ERW-based classification. It is notable that both RW-based pseudolabeling and ERW-based classification require a training set and probability maps (which measure the probabilities that each sample of the target HSI belongs to the different classes). The pseudolabeling procedure is introduced to extract labeled samples with high confidence as target clusters for C-CCA, as well as more reliably labeled samples for updating the training set. To be specific, the strategy of RW-based label verification in [46] is applied to obtain reliable pseudolabeling results. For simplicity, the estimated probability maps, the target clusters and the training set are denoted as $P$, TCs and TS, respectively.

In the following, the RW-based pseudolabeling will be first described. Then, the details of the proposed method will be introduced.

Fig. 3: Illustration of the RW-based label verification. (a) $S_{RW}$: segmentation result of RW; (b) $S_{ERW}$: segmentation result of ERW; (c) fusion of $S_{RW}$ and $S_{ERW}$: samples with high confidence via label verification.

III-B RW-Based Pseudolabeling

Given the training set TS and the estimated probability map $P$, RW-based target sample pseudolabeling consists of the following five steps:

1) Graph construction: In order to make full use of the spatial information, the first principal component (PC) of the hyperspectral image is used to construct a weighted graph $G = (V, E)$. Here, the vertices ($V$) refer to the sample values in the first PC, and the edges ($E$) refer to the links connecting adjacent samples (eight neighbors are considered for each sample). A weight $w_{ij} = \exp(-\beta (g_i - g_j)^2)$ is defined for each edge to model the difference between adjacent samples in the weighted graph, where $\beta$ is a free parameter.

2) RW segmentation: When the graph representation and TS are available, the RW probabilities can be directly obtained by minimizing the energy function $E_{spatial}$. The segmentation result, denoted as $S_{RW}$, is obtained by choosing the label with the maximum probability for each sample.

3) ERW segmentation: Given the graph representation, TS and the initial probability map $P$, the ERW probabilities are optimized by minimizing the combined energy function in (6), where the free parameter $\gamma$ controls the dynamic range of the aspatial function. Once the optimized probability map is obtained, the segmentation result $S_{ERW}$ is computed by choosing the label corresponding to the maximum probability for each sample.

4) Label verification [46]: After obtaining $S_{RW}$ and $S_{ERW}$, label verification is employed to extract sample candidates for the subsequent TCs extraction and TS updating. As illustrated in Fig. 3, $S_{RW}$ and $S_{ERW}$ are compared to verify the confidence of the unlabeled samples in the target HSI. To be specific, samples assigned the same label in $S_{RW}$ and $S_{ERW}$ are considered sample candidates with high confidence. The rationale of this strategy is as follows. Firstly, RW and ERW take complementary decisions: the RW algorithm is based only on the spatial correlation among adjacent samples, whereas the ERW algorithm combines the spectral information with the spatial correlation of adjacent samples. Secondly, the core idea of the strategy is similar to a voting-based decision fusion strategy, i.e., if different classifiers take the same decision for a sample, that decision is assumed to be more reliable.

5) TS and TCs updating: Although candidate samples are extracted with high confidence by the label verification strategy, TS and TCs are expected to include as many correctly labeled samples as possible. In order to ensure the accuracy of TS, unlabeled samples among the candidates are selected according to the modified breaking ties (MBT)-based query strategy [51]. To be specific, the MBT strategy selects, per class, the candidate samples maximizing the ERW probability; these samples are then added into TS with their predicted labels. In addition, every candidate whose largest probability is greater than the mean probability of its predicted class is used for TCs extraction. A compact sketch of steps 4 and 5 is given below.
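The following Python sketch condenses steps 4 and 5 (label verification and the TS/TCs update); the variable names and the simplified MBT selection are our own reading of [46, 51]:

```python
import numpy as np

def rw_pseudolabel(seg_rw, seg_erw, prob_erw, m_per_class):
    """seg_rw, seg_erw: (n,) hard labels from RW and ERW; prob_erw: (n, K)
    ERW probabilities; m_per_class: query size for the TS update.
    Returns the sample indices added to TS and the TCs per class."""
    candidates = np.flatnonzero(seg_rw == seg_erw)       # label verification
    pred = seg_erw[candidates]
    conf = prob_erw[candidates, pred]                    # prob. of own label
    ts_add, tcs = [], {}
    for k in np.unique(pred):
        idx, c = candidates[pred == k], conf[pred == k]
        # TS update: the m most confident verified candidates of class k
        ts_add.extend(idx[np.argsort(c)[::-1][:m_per_class]])
        # TCs: candidates whose probability exceeds the class mean
        tcs[k] = idx[c > c.mean()]
    return np.array(ts_add), tcs
```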

Algorithm 1 Cross Domain Collaborative Learning (CDCL)
Input:
   Samples $D_s$, $D_l$ and $D_u$.
   Threshold for correlation coefficients: $\rho_0 = 0.5$.
   Initial training set: TS $= D_l$.
   Free parameters for RW and ERW: $\beta$, $\gamma$.
   Number of samples added into TS: $m$.
1: Repeat:
2:   Train a linear SVM classifier with probability estimation using TS.
3:   Classify $D_u$ and estimate the probability map $P_1$.
4:   Update TS and TCs via RW-based pseudolabeling using TS and $P_1$.
5:   Perform C-CCA using $D_s$ and TCs.
6:   Project the samples onto the subspace kept by the components with $\rho_k \geq \rho_0$.
7:   Train a linear SVM classifier with probability estimation using the projected $D_s$ and TS.
8:   Classify $D_u$ and estimate the probability map $P_2$.
9:   Update TS via RW-based pseudolabeling using TS and $P_2$.
10: Until convergence
11: ERW-based classification using TS and $P_2$.
12: Return the classification map.

III-C Details of the Proposed Technique

As illustrated in Algorithm 1, the proposed algorithm, denoted as cross-domain collaborative learning (CDCL), combines RW-based pseudolabeling and C-CCA, with TS and TCs updated iteratively. The details of the proposed algorithm are as follows.

1) RW-based pseudolabeling: As illustrated in Fig. 2, in one iteration, pseudolabeling is applied twice, i.e., before and after C-CCA. Firstly, probability estimation for pseudolabeling is achieved by training a linear SVM classifier on TS. Then, the obtained probability map $P_1$ and TS are employed to extract TCs and update TS. Note that the initial TS only contains $D_l$. Secondly, after C-CCA using $D_s$ and TCs, the probability map $P_2$ is estimated by the linear SVM trained using the projected $D_s$ and TS. Given TS and the newly estimated probability map $P_2$, pseudolabeling is applied again to update TS. In summary, TS is updated twice and TCs is computed only once in a single iteration.

2) Cross-domain learning via C-CCA: Given $D_s$ and TCs, $d$ pairs of projection vectors $\mathbf{w}_s^k$ and $\mathbf{w}_t^k$ with corresponding correlation coefficients $\rho_k$ are derived via C-CCA, where $d$ is the dimension of the obtained subspace, which is smaller than both $d_s$ and $d_t$. Higher values of the correlation coefficients indicate better correlation between samples projected from the different domains, resulting in better domain transfer ability. In order to generate a correlation subspace with good transfer ability, we fix the threshold for $\rho_k$ at 0.5 and keep the corresponding vectors. After projecting all samples in both domains onto the correlation subspace, $D_u$ is classified, with estimated probability maps, by a linear SVM trained on the projected $D_s$ and TS. Although non-linear classifiers such as the SVM with RBF kernel generally perform better than linear classifiers in classification tasks, the optimal parameters of such a classifier tuned on source samples usually perform worse than expected on target samples in the context of DA. On the contrary, a linear kernel can capture the original relationships between samples from different domains.
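A sketch of this step with scikit-learn (names ours; $\rho_0 = 0.5$ as in the text), reusing the projections returned by the C-CCA sketch of Section II-A:

```python
import numpy as np
from sklearn.svm import SVC

def classify_in_correlation_subspace(Ws, Wt, rho, Xs, ys, Xt_lab, yt_lab,
                                     Xt_unlab, rho_min=0.5, C=1.0):
    """Keep the projection pairs with rho_k >= rho_min, project both
    domains onto the correlation subspace, and train a probabilistic
    linear SVM on the projected source plus labeled target samples.
    Returns the estimated probability map for the unlabeled samples."""
    keep = rho >= rho_min
    Zs, Zt = Xs @ Ws[:, keep], Xt_lab @ Wt[:, keep]
    clf = SVC(kernel="linear", C=C, probability=True)
    clf.fit(np.vstack([Zs, Zt]), np.concatenate([ys, yt_lab]))
    return clf.predict_proba(Xt_unlab @ Wt[:, keep])
```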

3) ERW-based classification: When the iterative process of RW-based pseudolabeling and C-CCA converges, the final classification map is obtained by ERW using the estimated probability maps and the final TS.

III-D Performance Analysis and Convergence

The classification ability and convergence of the proposed CDCL method are analyzed as follows:

1) Given the labeled samples in both domains, the classification ability of the proposed method relies on two factors, i.e., the transfer ability of C-CCA and the ERW-based classification. It is clear that the transfer ability of C-CCA relies on the number of samples in TCs and their accuracy, whereas ERW-based classification requires a good estimation of $P$ and TS to achieve higher accuracy. In each iteration, the samples with the highest confidence are added into TS and several samples are extracted by label verification as TCs. If TS and TCs are accurate, good cross-domain learning is achieved. Since the classifier is trained using labeled samples from both domains, $P_2$ performs better than $P_1$ when used for RW-based pseudolabeling. Therefore, more reliable samples are added into TS, ensuring that TS and TCs are accurately updated in the next iteration. With TS and TCs updated iteratively, a good classification result can be obtained by the proposed method.

2) As stated above, higher classification accuracy is easily obtained by the proposed method under the assumption of reasonably accurate TS and TCs. Moreover, since C-CCA is based on the pairwise correspondences within a cluster across domains, the source clusters are expected to be aligned with the corresponding target clusters even if there are a few mislabeled samples in TCs. As the iterative process goes on, both RW and ERW segmentation results get closer to the ground truth, resulting in more samples being extracted as candidates via label verification and, in turn, more samples being extracted as TCs. Since only the samples whose probability is larger than the mean probability of their predicted class are considered for TCs, the number of samples in TCs is smaller than the number of all unlabeled samples. In fact, the number of samples in TCs can hardly be monotonically increasing over iterations, due to the inconsistency between the segmentation results obtained by the RW and ERW algorithms. Therefore, if the increase of the number of samples in TCs is less than 5% of the total unlabeled samples, we consider that convergence is reached.
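The stopping rule therefore reduces to a one-line check (a sketch; the 5% tolerance is the value stated above):

```python
def cdcl_converged(n_tcs_new, n_tcs_old, n_unlabeled, tol=0.05):
    """Convergence test: stop when the growth of TCs between two
    consecutive iterations is below tol of all unlabeled samples."""
    return (n_tcs_new - n_tcs_old) < tol * n_unlabeled
```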

IV Experimental Data and Setup

IV-A Dataset Description

Pavia data set (TM = training map, GT = ground truth):

No. | Class    | Pavia University TM / GT | Pavia Center TM / GT
 1  | Asphalt  |  548 /  6631 | 678 / 7585
 2  | Meadows  |  540 / 18649 | 797 / 2905
 3  | Trees    |  524 /  3064 | 785 / 6508
 4  | Baresoil |  532 /  5029 | 820 / 6549
 5  | Bricks   |  514 /  3682 | 485 / 2140
 6  | Bitumen  |  375 /  1330 | 808 / 7287
 7  | Shadows  |  231 /   947 | 195 / 2165

Salinas/Indian data set (class name and number of labeled samples):

No. | Salinas class / samples | Indian Pines class / samples
 1  | Weeds_1     /  2009 | Alfalfa         /   46
 2  | Weeds_2     /  3726 | Corn_n          / 1428
 3  | Fallow      /  1976 | Corn_m          /  830
 4  | Fallow_r    /  1394 | Corn            /  237
 5  | Fallow_s    /  2678 | Grass-pasture   /  483
 6  | Stubble     /  3959 | Grass-trees     /  730
 7  | Celery      /  3579 | Grass-pasture_m /   28
 8  | Graphes_u   / 11271 | Hay_w           /  478
 9  | Soil_v      /   547 | Oats            /   20
10  | Corn_s      /  3278 | Soybean_n       /  972
11  | Lettuce_4wk /  1068 | Soybean_m       / 2455
12  | Lettuce_5wk /  1927 | Soybean_c       /  593
13  | Lettuce_6wk /   916 | Wheat           /  205
14  | Lettuce_7wk /  1070 | Woods           / 1265
15  | Vinyard_u   /  7268 | Buildings-Grass /  386
16  | Vinyard_v   /  1807 | Stone-Steel     /   93
TABLE I: Number of Labeled Samples Available for the Pavia Data Set (Top) and the Salinas/Indian Data Set (Bottom).

The first dataset consists of two hyperspectral images collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over the University of Pavia and the Pavia City Center. The Pavia City Center image contains 102 spectral bands and has a size of 1096 × 492 pixels. The Pavia University image contains 103 spectral bands and has a size of 610 × 340 pixels. Only the seven classes shared by both images are considered herein. In the experiments, the Pavia University image is considered as the source domain and the Pavia City Center image as the target domain, or vice versa; these two cases are denoted as Univ/Center and Center/Univ, respectively. Note that there are manually selected training maps (TM) which are publicly available and widely used in related publications [51, 46, 1, 52]. The color composite image, ground truth (GT) and TM of the Pavia dataset are illustrated in Fig. 4, whereas the corresponding numbers of labeled samples are detailed in Table I.

The second dataset consists of two hyperspectral images captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over Salinas Valley, California, and Northwest Indiana. After discarding 20 water absorption bands, the Salinas image contains 204 bands of 512 × 217 pixels. Fig. 5(a)-(b) show the color composite image and the GT of the Salinas data set, in which the 16 different classes represent mostly different types of crops. After removing 20 spectral bands due to noise and water absorption, the Indian Pines image contains 200 bands of 145 × 145 pixels, and its spatial resolution is 20 m per pixel. The color composite image and the GT containing 16 different classes are presented in Fig. 5(c)-(d). The classes of both images are listed in Table I with the corresponding numbers of samples. Since we mainly focus on the HDA problem, a low-dimensional image is considered as the source domain, obtained by clustering the spectral space of the original data of each image. Specifically, the original bands of the HSI are clustered into 50 groups using the K-means algorithm, and the mean value of each cluster is considered as a new spectral band, providing a total of 50 new bands. The corresponding cases are denoted as the Salinas and Indian cases, respectively.
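The construction of the 50-band source image can be sketched as follows (Python; the function name is ours), where the spectral bands themselves are the items being clustered:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_bands(cube, n_groups=50, seed=0):
    """Build the low-dimensional source image: cluster the spectral
    bands of an (H, W, B) cube into n_groups with K-means and replace
    each group by its mean band, as described in the text."""
    H, W, B = cube.shape
    bands = cube.reshape(-1, B).T            # one row per spectral band
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit(bands)
    out = np.stack([bands[km.labels_ == g].mean(axis=0)
                    for g in range(n_groups)], axis=1)
    return out.reshape(H, W, n_groups)       # (H, W, 50) source image
```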

Fig. 4: ROSIS Pavia dataset used in our experiments. (a) Color composite image, (b) ground truth and (c) training map of the University scene; (d) color composite image, (e) ground truth and (f) training map of City Center scene.
Fig. 5: AVIRIS Salinas and Indian Pines datasets used in our experiments. (a) Color composite image and (b) ground truth of the Salinas data; (c) color composite image and (d) ground truth of Indian Pines data.

IV-B Experimental Setup

In order to make a general comparison, the default parameters of the ERW classifier given in [49, 46] are adopted for the proposed algorithm; specifically, the free parameters $\beta$ and $\gamma$ of RW and ERW in the proposed method are set to their default values. In addition, the threshold of the correlation coefficients and the query size are set to 0.5 and 10, respectively, in all experiments. The regularization parameter of the linear SVM in our method is tuned with 5-fold cross-validation.

Several approaches of semi-supervised HDA proposed for visual and remote sensing applications are employed as baseline methods:

CCA [53]: CCA aligns both domains by using the same number of labeled samples from source and target domains. To be specific, a random selection of samples from source or target domain is applied to ensure pairwise correspondences between domains.

C-CCA [47]: C-CCA is directly employed by using the labeled samples in both domains.

DAMA [41]: DAMA adopts a linear projection to match the differences between the source and target subspaces.

SSMA [42]: SSMA carries out adaptation through manifold alignment while preserving label (dis)similarities and the geometric structures of the single manifold in both domains.

KEMA [43]: KEMA is a kernelized version of SSMA.

SHFA [35]: SHFA simultaneously learns the target classifier and infers the target labels in an augmented common feature space.

CDLS [40]: CDLS jointly explores a domain-invariant feature subspace and identifies cross-domain landmarks.

Data Set | Univ/Center | Center/Univ | Salinas | Indian
TR_S     | 50          | 10, 20, 50  | 50      | 5, 10, 15
TR_T     | 2           | 2, 3, 5     | 2       | 2, 3, 5
TE_T     | 2%          | 2%          | 2%      | 10%
TABLE II: Number of Training and Test Samples Used for the Pavia and Salinas/Indian Data Sets.
TABLE III: Classification Results for the Pavia Center Dataset: Per-Class Accuracies (Asphalt, Meadows, Trees, Baresoil, Bricks, Bitumen, Shadows) and OA, AA and Kappa for the CCA, C-CCA, DAMA, SSMA, KEMA, SHFA, CDLS, NA, LapSVM, ERW and CDCL Methods. The Best Results for Each Row Are Reported in Italic Bold. The Proposed CDCL Approach Significantly Outperforms All the Baseline Methods.

Moreover, several methods applied only to the target domain are also employed as baselines:

No Adaptation (NA): NA is a basic baseline that learns linear SVM [54] using the initially labeled target samples.

LapSVM [9]: LapSVM is a typical baseline for semi-supervised classification and the one-vs-one strategy for linear SVM is applied for fair comparison.

ERW [49]: ERW carries out classification using the initial probabilities learned by linear SVM and the initially labeled target samples.
The threshold of the correlation coefficient for CCA and C-CCA is set to 0.5. The trade-off parameter of DAMA, SSMA and KEMA is set to 0.9, whereas the optimal dimensionality of the final projection for the three methods is cross-validated by exploiting the labeled source and target samples. Once the samples are projected onto the new subspace, the final classification results of CCA, C-CCA, DAMA, SSMA and KEMA are obtained by training a linear SVM using the labeled samples of both domains, with the regularization parameter tuned by cross-validation. The parameters of SHFA are tuned as in [35]. The dimensionality of PCA in CDLS is set to 30, whereas the other parameters of CDLS are tuned as in [40]. The parameters of LapSVM and of ERW are set to the same fixed values in all experiments for fair comparison.

Fig. 6: Classification map of Pavia Center by (a) CCA (OA=44.63%), (b) C-CCA (OA=75.24%), (c) DAMA (OA=82.28%), (d) SSMA (OA=71.51%), (e) KEMA (OA=72.26%), (f) SHFA (OA=69.31%), (g) CDLS (OA=70.93%), (h) NA (OA=79.98%), (i) LapSVM (OA=71.74%), (j) ERW (OA=85.89%), (k) CDCL (OA=91.03%) methods and (l) denotes the corresponding ground truth.

In a practical application, the number of labeled samples in the target HSI is typically not enough to learn a reliable classifier, whereas the amount of labeled samples in the source HSI is relatively larger. To model this scenario, we randomly select a limited amount of samples from the target HSI as labeled. Table II lists the settings of training and test samples used in our experiments, which consist of three parts: 1) training samples (labeled) from the source HSI (TR_S); 2) training samples (labeled) from the target HSI (TR_T); and 3) test samples (unlabeled) from the target HSI (TE_T). The integers (i.e., 2, 3 and 5) in Table II represent the number of samples per class, whereas the percentages refer to the ratio of training or testing samples. For example, the setting of the Univ/Center case means that 50 labeled source samples and 2 labeled target samples per class are selected as training samples, and 2% of all unlabeled target samples are used for testing. Note that the testing samples of the four cases are selected from the corresponding target ground truth. The training samples for the Pavia dataset are selected from the publicly available training maps (see Fig. 4), whereas the training samples for the Salinas and Indian datasets are selected from the ground truth. To explore the effect of varying the number of training samples in both domains, various settings of TR_S and TR_T are applied for the Center/Univ and Indian cases. For each setting in Table II, 50 trials of the classification have been performed to ensure the stability of the results. The classification results are evaluated in terms of Overall Accuracy (OA), Average Accuracy (AA) and the Kappa statistic. All our experiments have been conducted using Matlab R2017b on a desktop PC equipped with an Intel Core i5 CPU (at 3.1 GHz) and 8 GB of RAM.
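For reference, the three evaluation metrics can be computed from the confusion matrix as in the following sketch (standard definitions; names ours):

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, n_classes):
    """Overall Accuracy, Average Accuracy and Cohen's Kappa."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                                 # overall accuracy
    aa = np.mean(np.diag(cm) / np.maximum(cm.sum(axis=1), 1))  # mean recall
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2       # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```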

V Results and Discussions

TABLE IV: Classification Results for the Pavia University Dataset: OA, AA and Kappa of Each Method (CCA, C-CCA, DAMA, SSMA, KEMA, SHFA, CDLS, NA, LapSVM, ERW, CDCL) for 10/2, 20/2, 50/2, 10/3, 20/3, 50/3, 10/5, 20/5 and 50/5 Source/Target Training Samples per Class. The Best Results for Each Column Are Reported in Italic Bold. The Proposed CDCL Approach Outperforms All the Baseline Methods.
Fig. 7: Individual accuracies of different classes (a-g) obtained by NA, C-CCA, ERW and CDCL methods on the Pavia University dataset. Note that the different configurations of training samples correspond to the settings in Table IV.

V-A Results of Univ/Center Case

To illustrate the effectiveness of the proposed CDCL on the whole HSI, an experiment is performed with the setting (TR_T and TR_S) in Table II and all unlabeled samples in the Pavia Center HSI as TE_T. Fig. 6(a)-(k) show the classification results obtained by the different methods, i.e., CCA, C-CCA, DAMA, SSMA, KEMA, SHFA, CDLS, NA, LapSVM, ERW and the proposed CDCL. Fig. 6(l) represents the corresponding ground truth. From this figure, it can be seen that the CDCL method effectively removes the noise present in the NA and ERW classification results. Furthermore, the CDCL method obtained the highest OA = 91.03%. Table III reports the results of the different methods in terms of individual class accuracies and the mean and standard deviation of the OA, AA and Kappa statistics using the setting in Table II. The following observations can be made:

TABLE V: Classification Results for the Salinas Dataset: Per-Class Accuracies (Weeds_1, Weeds_2, Fallow, Fallow_r, Fallow_s, Stubble, Celery, Graphes_u, Soil_v, Corn_s, Lettuce_4wk, Lettuce_5wk, Lettuce_6wk, Lettuce_7wk, Vinyard_u, Vinyard_v) and OA, AA and Kappa for the CCA, C-CCA, DAMA, SSMA, KEMA, SHFA, CDLS, NA, LapSVM, ERW and CDCL Methods. The Best Results for Each Row Are Reported in Italic Bold. The Proposed CDCL Approach Outperforms All the Baseline Methods.

The CDCL method gives the highest classification accuracies for the “Baresoil”, “Bricks” and “Bitumen” classes. Moreover, the CDCL method also shows the best performance in terms of OA = 83.24%, AA = 82.29% and Kappa = 80.00%.
The results of KEMA and SHFA are comparable and better than those of the other HDA methods, whereas the CCA method performs worst due to the fact that only a subset of the labeled samples is used.
The NA method outperforms the LapSVM and ERW methods, and even all the baseline HDA methods. It can be concluded that the knowledge of the Pavia University data can hardly be transferred well to the Center data with limited labeled target samples. In addition, both CDCL and ERW perform worse than the NA method on the “Meadows” and “Shadows” classes, confirming the relation between the ERW and CDCL methods.

Fig. 8: Classification map of Salinas Image by (a) CCA (OA=36.07%), (b) C-CCA (OA=76.19%), (c) DAMA (OA=73.90%), (d) SSMA (OA=77.02%), (e) KEMA (OA=74.27%), (f) SHFA (OA=81.48%), (g) CDLS (OA=79.27%), (h) NA (OA=78.17%), (i) LapSVM (OA=81.36%), (j) ERW (OA=83.40%), (k) CDCL (OA=88.13%) methods and (l) denotes the corresponding ground truth.

V-B Results of Center/Univ Case

Table IV illustrates the OAs, AAs, Kappa statistics and the corresponding standard errors obtained by the proposed CDCL method and the baseline methods for the Center/Univ case. The experiments are performed with the different numbers of source and target training samples indicated in Table II. The following observations can be drawn:
When increasing the number of labeled source and target samples, the mean OAs, AAs and Kappa statistics of most methods increase as expected. The increasing trend of mean OAs with more target training samples confirms that 50 trials are enough for achieving stable results. Moreover, the standard errors of OAs, AAs and Kappa statistics for smaller numbers of labeled samples appear to be higher.
The CDCL method gives the highest classification accuracies for all numbers of training samples. To be specific, the mean OAs of the NA, ERW and CDCL methods are in the ranges 58.88%-67.4%, 70.05%-83.12% and 72.35%-85.66%, respectively. Further, when only 10 labeled source samples per class are used for training, the CDCL method yields 2.30%, 7.11% and 2.48% higher mean OAs than ERW with 2, 3 and 5 target samples per class, respectively.

Fig. 7 reports the individual class accuracies for the Center/Univ case obtained by the C-CCA, NA, ERW and CDCL methods using different numbers of labeled samples, assessed by the mean accuracies (main curves) and their standard errors (shaded area around each curve). The classification accuracies of the 7 classes (“asphalt”, “meadows”, “trees”, “baresoil”, “bricks”, “bitumen”, “shadows”) are shown in Fig. 7(a)-(g), respectively. Note that the abscissas represent the different settings of training samples in Table IV. The CDCL method outperforms the C-CCA, NA and ERW methods on the “asphalt” (a), “meadows” (b), “baresoil” (d) and “bitumen” (f) classes, and shows accuracy comparable with the ERW method on the “bricks” class (e), yielding a better overall classification accuracy. Further, the ERW method performs worse than NA on the “trees” (c) and “shadows” (g) classes, resulting in low accuracies of the CDCL method on these two classes.

V-C Results of Salinas Case

[Table: OA, AA and Kappa of the CCA, C-CCA, DAMA, SSMA, KEMA, SHFA and CDLS methods for 5/2, 10/2, 15/2, 5/3, 10/3, 15/3, 5/5, 10/5 and 15/5 source/target training samples per class.]