
Active Multi-Kernel Domain Adaptation for Hyperspectral Image Classification

by   Cheng Deng, et al.

Recent years have witnessed rapid progress in hyperspectral image (HSI) classification. Most existing studies either rely heavily on expensive label information through supervised learning or can hardly exploit the discriminative information borrowed from related domains. To address these issues, in this paper we present a novel framework for HSI classification based on domain adaptation (DA) with active learning (AL). The main idea of our method is to retrain the multi-kernel classifier by utilizing the available labeled samples from the source domain and adding a minimum number of the most informative samples, obtained with active queries, from the target domain. The proposed method adaptively combines multiple kernels, forming a DA classifier that minimizes the bias between the source and target domains. Further equipped with a nested active updating process, it sequentially expands the training set and gradually converges to a satisfactory level of classification performance. We study this active adaptation framework with the Margin Sampling (MS) strategy in the HSI classification task. Our experimental results on two popular HSI datasets demonstrate its effectiveness.





1 Introduction

A very challenging problem in the remote sensing community is to generate land-cover maps for semantically characterizing the Earth’s surface DBLP:journals/lgrs/AlajlanPMF14 ; Li:2015 ; Li:2016 ; Tong:2016 ; Tang:2016 ; Li:2017 . As one of the most widely used approaches, hyperspectral image (HSI) classification has recently gained popularity and attracted research interest from other scientific disciplines such as image processing, machine learning, and computer vision Plaza:2009 ; Tarabalka:2012 ; Li:2014 ; Samat:2016 ; Ye:2016 ; Yan:2016 ; Zhang:2016 ; Shao:2017 ; Appice:2017 . Most of these studies are supervised learning methods, which have shown promising classification performance in practice. However, they usually require many labeled samples to train the classifier’s parameters properly, which is expensive and time-consuming in real-world applications. Moreover, the high dimensionality of HSI makes it difficult to obtain a reliable classifier with only a few labeled samples Ifarraguerri:2000 ; DBLP:journals/tgrs/RajanGC08 .

To address these issues, one feasible way is to exploit the available information from other geographical areas with abundant labeled samples (regarded as the source domain). However, these areas usually differ from the target one, and there always exist certain shifts in data distribution, especially for image data with underlying structures Chen:2015 . From a machine learning perspective, this shift problem can be modeled by transfer learning, especially by domain adaptation (DA) approaches. In such a scenario, it is always assumed that the source domain and the target domain possess similar characteristics, i.e., they share the same set of label classes or correlated class distributions DBLP:journals/pami/BruzzoneM10 .

A number of DA techniques have been adopted in HSI classification tasks DBLP:journals/pami/BruzzoneM10 ; zhuo ; DBLP:journals/tgrs/BruzzoneF01 , where both labeled samples from source domain and unlabeled samples from target domain are exploited to train a classifier for the target domain. DBLP:journals/pami/BruzzoneM10 proposed a DA framework named DASVM, which extended Transductive SVM (T-SVM) to label unlabeled target samples and remove some source samples progressively. In zhuo , a multiple-kernel DA technique was designed to learn a discriminative model by simultaneously minimizing both the SVM structural risk function and the distribution mismatch. The DA work of DBLP:journals/tgrs/BruzzoneF01 adapted the maximum-likelihood classifier to target domain through updating the classifier parameters.

Most existing domain adaptation works assume that labeled samples are available only for the source domain and not for the target domain, or that only a few labeled data exist in the target domain. In fact, labeled information in the target domain can directly and notably improve the classification performance. To alleviate the expensive cost of collecting labeled data, an effective solution for discriminative classifier training is to interactively generate the labeled data using the active learning (AL) technique. In the AL setting, some samples from the target domain are selected and labeled by the user, and together with the existing training set in the source domain they form a new training set used to adapt the classifier to the target domain DBLP:conf/igarss/PerselloB11 . A number of studies have already taken advantage of the AL strategy in HSI classification 5764734 ; DBLP:journals/tgrs/PerselloB12 ; Dai:2007:BTL:1273496.1273521 ; Tuia2011 . However, they usually either focus only on the target domain without exploiting the useful information of the source domain, or fail to capture the data set shift because their active query selection does not fully utilize the domain correlations.

In this paper, we propose a novel framework named multi-kernel learning with active learning (MKL-AL for short) for HSI classification, which combines the powerful AL and DA techniques based on multiple kernels and largely helps compensate for the data shift that occurs between two image acquisitions. The main idea of our method is to retrain a multi-kernel classifier using the labeled samples from the source domain together with the user-labeled samples selected from the target domain, repeating the retraining until it converges to the desired level of performance. On one side, the AL technique enables us to fully utilize the target information. On the other side, DA based on multi-kernel learning offers a good measurement of the distribution distance across domains, which helps us determine which kernel space best fits the data of the two domains and thus select the informative samples. To illustrate the performance of our MKL-AL framework, we choose margin sampling (MS) schohn2000less DBLP:journals/prl/MitraSP04 DBLP:journals/tgrs/DemirPB11 , a simple and commonly used uncertainty criterion, as the active learning strategy DBLP:journals/spm/Camps-VallsTBB14 DBLP:journals/pieee/CrawfordTY13 DBLP:journals/tgrs/DemirPB11 for selecting the most informative pixels. In the following, we therefore specialize our proposed MKL-AL method as multi-kernel learning with margin sampling (MKL-MS for short). We conducted extensive experiments on HSI classification over two hyperspectral datasets, and the experimental results demonstrate the effectiveness of our proposed framework.

The rest of this paper is organized as follows. The following section introduces the related works. Section 3 presents our framework for hyperspectral image classification and elaborates on the detailed components of the proposed framework including the multi-kernel learning, domain adaptation and the active learning MS strategy. In Section 4, we present and discuss experimental results. Finally, we conclude in Section 5.

2 Related Works

In the literature, the support vector machine (SVM) has become one of the most successfully used techniques for hyperspectral image classification Melgani:2004 , mainly because SVM can deal with high-dimensional and noisy data with the help of the sparse set of support vectors Xu:2017 and the powerful kernel trick Liu:2014:pr . This has been demonstrated in many applications, such as biological problems involving high-dimensional, noisy data, for which SVMs are known to behave well compared to other statistical or machine learning methods Bandyopadhyay:2007 . Therefore, many kernel-based SVM methods have been studied to capture the complex semantic structure of hyperspectral images. Basically, these methods first map the data from the original feature space to a high-dimensional kernel space, and then solve a linear classification problem in the mapped space. With the supervised information from many labeled samples fed to the model, the kernel-based solutions have demonstrated excellent performance in hyperspectral data classification in terms of accuracy and robustness Camps:2005 ; Camps:2006 ; Camps:2007 ; Liu:2016 ; Li:2016 .

Most of these existing HSI classification methods assume that labeled samples are available for the concerned domains. However, in practice hyperspectral images often contain different domains, and some of them do not have sufficient labeled samples because it is usually expensive to collect labels for all domains. Therefore, training a separate classifier for each domain becomes infeasible. At the same time, due to the spectral shifts among the domains, a model trained in one domain cannot directly fit the other domains. This obviously limits the power of the traditional HSI classification methods. To address the problem, domain adaptation (DA) serves as a successful strategy that transfers a well-trained classifier from a source domain to a different, yet related, target domain Tuia:2016 . In DA based HSI classification, both labeled samples from the source domain and unlabeled samples from the target domain are exploited to train a classifier for the target domain. Such a technique has been shown to avoid expensive and time-consuming labelling efforts while achieving satisfactory classification performance across domains DBLP:journals/pami/BruzzoneM10 ; zhuo ; DBLP:journals/tgrs/BruzzoneF01 .

There are several ways in DA research to migrate classifiers between the source and the target domains. One typical strategy is to make the data distributions more similar across the domains so that a single model can simultaneously classify the source and target domains. zhuo designed a multiple-kernel DA technique to learn a discriminative model by simultaneously minimizing both the SVM structural risk and the distribution divergence. DBLP:journals/tgrs/BruzzoneF01 adapted the maximum-likelihood classifier to the target domain by updating the classifier parameters. Despite the promising progress achieved by DA based HSI classification methods, their performance still falls short of the desired level without supervised information from the target domain, mainly due to the divergence between the source and the target domains.

Since labeled information in the target domain can directly and notably improve the classification performance, it is helpful to exploit a small number of labeled data to boost the classification performance, which brings only a limited additional cost for labeled data collection. Active learning (AL) strategies have been widely studied in recent years to tackle this challenging task Tong:2002 ; Jain:2010 ; Huang:2014 ; Liu:2016:cvpr , with the aim of exploiting the information available from unlabeled data and enriching the labeled data. In the AL process, the originally unlabeled data are usually labeled by a user according to a specific informativeness measure.

Due to the promising performance, many efforts have been devoted to the integration of AL in domain adaptation based HSI classification DBLP:conf/igarss/PerselloB11 ; DBLP:journals/tgrs/PerselloB12 ; Tuia2011 ; 6353565 ; 5764734 . DBLP:journals/tgrs/PerselloB12 offered an iterative AL process by adding the most informative samples to the training set, while removing the source-domain samples that do not fit with the distributions of the classes in the target domain. 6353565 is a framework that efficiently combines the DA and AL techniques, and the most informative pixels are sampled with active queries from the target image while adapting the obtained classifier using a transfer learning strategy. In Tuia2011 , a DA framework equipped with the active selection pursued the training samples in unknown areas using the strategies based on uncertainty and clustering. These active learning methods are able to alleviate the expensive cost on the collection of labeled data by interactively generating the labeled data.

3 Active Multi-Kernel Domain Adaptation

First of all, we introduce the setting adopted throughout this paper. The source domain provides labeled samples (i.e., pixels) with their corresponding labels, while the target domain provides unlabeled samples. All the labeled samples from both the source and target domains form the labeled set, which is iteratively updated in our active multi-kernel domain adaptation. Here, we adopt a binary classifier for simplicity and ease of understanding, so each label takes one of two values. In practical scenarios, this can be further extended to the multi-class problem using the one-against-all strategy.

The flowchart in Fig. 1 outlines the general procedure, including the updating of the base kernels and the maximum mean discrepancy. It consists of two parts. The first one (the top part) is the active learning for target data labelling, in which the MS selection criterion heuristically updates the labeled dataset from both the source and target domains. The other one (the bottom part) corresponds to the retraining of the multi-kernel classifier based on the adaptation from the source domain to the target domain. In active learning, the most interesting candidates for labelling are the ones that fall within the margin of the current classifier, as they are the most likely to become new support vectors devis1 . The most interesting candidates from the target domain are therefore identified by using the margin sampling (MS) strategy. After annotators assign the corresponding true labels, these candidates are added to the training data in the target domain to obtain a better MKL classifier.

Figure 1: The flowchart of our proposed MKL-AL method. It consists of two parts: the active learning for target data labelling (the top part) and multi-kernel based domain adaptation (the bottom part). The active learning technique heuristically updates the labeled target domain according to the MS selection criteria, while the domain adaptation is applied to the retraining of multi-kernel classifier based on the adaption from source domains to the target.

3.1 Multi-Kernel Learning for HSI Classification

The multiple kernel learning (MKL) framework has proved powerful for support vector machine (SVM) based classification in the literature Vishwanathan:2010 . Therefore, in this paper we employ the MKL technique for HSI classification. Specifically, we pursue a decision function composed of several component functions, where each component function belongs to a different reproducing kernel Hilbert space (RKHS). Within this functional framework, the well-known SimpleMKL method proposes a weighted 2-norm regularization formulation with an L1-norm constraint on the combination weights, which encourages sparsity simplemkl . Based on the combination, it solves a standard SVM optimization problem with a kernel defined as a linear combination of multiple kernels. Given the set of labeled samples (or pixels), the MKL based SVM problem can be addressed by solving a convex problem, which is referred to as the primal MKL problem in SimpleMKL.


Here, the linear combination coefficients weight the individual components, and each coefficient controls the contribution of its component to the objective function. A smaller coefficient forces the corresponding component function to be smoother, as measured by its RKHS norm (note that when a coefficient is zero, the corresponding component function must also be zero for the objective value to be finite). The constraint on the coefficients corresponds to the popular sparsity-inducing L1-norm constraint, which forces some coefficients to be zero and thus encourages sparse basis kernel expansions. Since the above formulation is convex and differentiable, the problem can be solved by a simple gradient method simplemkl .
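For reference, the primal problem described above can be written in the standard notation of the SimpleMKL literature. The symbols used here (f_m for the component functions, H_m for their RKHSs, d_m for the combination coefficients, xi_i for the slack variables, C for the regularization parameter, n for the number of labeled samples, and M for the number of kernels) are assumptions taken from that literature and may differ from the notation of this paper.

```latex
\begin{aligned}
\min_{\{f_m\},\, b,\, \xi,\, d} \quad
  & \frac{1}{2} \sum_{m=1}^{M} \frac{1}{d_m} \, \| f_m \|_{\mathcal{H}_m}^{2}
    + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad
  & y_i \sum_{m=1}^{M} f_m(x_i) + y_i b \;\ge\; 1 - \xi_i, \qquad
    \xi_i \ge 0, \qquad i = 1, \dots, n, \\
  & \sum_{m=1}^{M} d_m = 1, \qquad d_m \ge 0, \qquad m = 1, \dots, M .
\end{aligned}
```

In this form, the sum-to-one and non-negativity conditions on d_m together act as the L1-norm constraint on the weights.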

In SVM based classification, we usually adopt a linear projection based classifier. Specifically, each component function is a linear projection of a nonlinearly mapped sample, defined by a projection vector and a nonlinear feature mapping function whose inner products induce the corresponding kernel function. Based on this linear classifier formulation, the above MKL problem can be rewritten in terms of the projection vectors.


Such a formulation is quite similar to the standard SVM: given the combination weights, its dual problem can be easily obtained using Lagrangian multipliers satisfying the KKT conditions. According to the theoretical result in simplemkl , the above optimization problem can be turned into its associated dual problem, in which the kernel matrix of each component is defined over the labeled data by the corresponding kernel function, and the Hadamard product operator performs the product in an elementwise manner.
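As a sketch of what such a dual looks like, assume the combination coefficients d_m are held fixed. The problem is then a standard SVM dual with a combined kernel matrix. The notation (alpha for the vector of dual variables, y for the label vector, K_m for the base kernel matrices on the labeled data, the circle for the Hadamard product, C for the regularization parameter) is assumed here for illustration and is not taken from the original equations.

```latex
\begin{aligned}
\max_{\alpha} \quad
  & \mathbf{1}^{\top} \alpha
    - \frac{1}{2} \, (\alpha \circ y)^{\top}
      \Big( \sum_{m=1}^{M} d_m K_m \Big) (\alpha \circ y) \\
\text{s.t.} \quad
  & \alpha^{\top} y = 0, \qquad
    0 \le \alpha_i \le C, \qquad i = 1, \dots, n .
\end{aligned}
```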

3.2 Domain Adaptation based on MKL

In our MKL framework, besides the label information from the training samples, we further consider the correlation between the source and target domains for HSI classification, and incorporate the domain adaptation technique into the MKL based classification. Intuitively, since we have more training data from the source domain than those from the target domain, it is natural to find the optimal migration between the two domains. Therefore, we attempt to reduce the distribution discrepancy when we transfer from source domains to the target one.

To address this issue, we follow duan1 and extend the SimpleMKL formulation with the Maximum Mean Discrepancy (MMD). The MMD measures the mismatch as the distance, in an RKHS, between the mean of the samples from the source domain and the mean of the samples from the target domain. By defining a suitable coefficient matrix over all source and target samples, the MMD can be rewritten in terms of this matrix and the kernel matrix. The kernel matrix is the weighted combination of the base kernel matrices, and each base kernel matrix is assembled from the kernel matrices defined for the source domain, the target domain, and the cross domain from the source domain to the target domain, respectively. The above formulation indicates that MMD serves as a good measure of distribution discrepancy in kernel space. For more details about MMD, readers can refer to Borgwardt MMD .
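The following minimal sketch illustrates the MMD computation for a single kernel. It assumes the usual construction in which a vector s holds 1/n_s for each source sample and -1/n_t for each target sample, so that the squared MMD equals s^T K s, i.e., the trace of K S with S = s s^T. The function names, the Gaussian kernel, and the toy data are hypothetical choices for illustration only.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, kernel=gaussian_kernel):
    """Squared MMD between two sample sets, computed as s^T K s = tr(K S)."""
    n_s, n_t = len(source), len(target)
    samples = list(source) + list(target)
    # s is 1/n_s on source samples and -1/n_t on target samples (assumed form).
    s = [1.0 / n_s] * n_s + [-1.0 / n_t] * n_t
    total = 0.0
    for i, x_i in enumerate(samples):
        for j, x_j in enumerate(samples):
            total += s[i] * s[j] * kernel(x_i, x_j)
    return total


source = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)]
shifted = [(2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
print(mmd_squared(source, source))   # identical sets: discrepancy is (numerically) zero
print(mmd_squared(source, shifted))  # shifted sets: clearly positive discrepancy
```

With several base kernels, the same quantity would be evaluated on the weighted combination of the base kernel matrices.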

With the MMD as the discrepancy measurement, we can now present the final formulation for DA based on MKL, in which a positive parameter controls the balance between the domain adaptation and the classification accuracy. Such a formulation forces the learnt classifier to simultaneously predict the labels accurately and preserve the semantic relations across domains.

By introducing domain adaptation, we can explore the available knowledge on a given source domain to develop a classifier built on the target domain where a priori information is not available. This will significantly reduce the heavy requirement of labeled samples in the target domain. Subsequently, our work can enjoy the capability of transferring the model to different target domains easily.

3.3 Alternating Optimization

There are two variables involved in the above formulation, i.e., the kernel combination coefficients for the MKL in domain adaptation and the dual variable of the classifier. We employ the reduced gradient descent procedure proposed in simplemkl to iteratively update the linear combination coefficients and the dual variable. The optimization consists of two main steps, each of which can be easily solved using simple and efficient existing techniques.

  • Dual-variable step: with the combination coefficients fixed, Problem 8 becomes the standard SVM problem defined in Equation 3, where the combined kernel can be treated as one kernel matrix. We adopt the LIBSVM tool to directly obtain the optimal dual variable.

  • Coefficient step: with the learnt dual variable, we can rewrite Problem 8 with respect to the combination coefficients. Since the resulting problem is convex with respect to the coefficients, we can directly apply the second-order gradient descent method to solve it.

To obtain the global optimization solution, we alternately update the linear combination coefficients and the dual variable for a few iterations. After we have these parameters, the final classifier based on the multiple kernels can be formulated as follows


3.4 Active Learning

The domain adaptation based on MKL helps fully utilize the training data from different domains to obtain a discriminative classifier for HSI. However, as mentioned above, there exists more or less divergence between the source and the target domains, which limits the power of the domain adaptation. To maximally exploit the semantic information from the target domain while relying as little as possible on large numbers of labeled data, active learning serves as a promising solution for improving the classification performance with a small set of selected labeled samples Tong:2002 ; Jain:2010 ; Liu:2016:cvpr ; Samat:2016 .

Specifically, in our proposed framework we alternate between the training and active learning stages. The active learning stage selects the most informative samples from the target domain, labels them, and adds them to the training set; the training stage then retrains the MKL classifier using the updated dataset, which now contains more labeled samples.

In active learning, the selection criterion is quite important for the classification performance. In our framework we adopt the SVM model for HSI classification, which basically pursues the hyperplane that gives the maximum margin. Therefore, in a one-against-all setting for multi-class problems devis2 , a margin sampling (MS) strategy is employed to heuristically select the best points (i.e., the points closest to the hyperplane of the classifier learnt in the last iteration) from the remaining unlabeled dataset according to the following ranking criterion for each candidate:


Here, the ranking score is the distance of the sample to the hyperplane defined for any class, computed with the dual variables and the training data in the labeled set. Note that when the unlabeled set is very large, i.e., there are a huge number of unlabeled samples in the target domain, the above ranking will be quite time-consuming because the distances must be computed for all samples. In this case, we can employ existing speedup techniques such as the popular locality sensitive hashing Jain:2010 ; Liu:2016:cvpr .
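The ranking described above can be sketched as follows, assuming the common form of margin sampling in which each candidate is scored by its smallest absolute decision value over all one-against-all classifiers. The function name `margin_sampling`, the batch size `q`, and the toy decision values are hypothetical and only illustrate the selection rule.

```python
def margin_sampling(decision_values, q):
    """Return the indices of the q candidates closest to any class hyperplane.

    decision_values[i][c] is the signed output of the one-against-all
    classifier for class c on candidate i (assumed to be precomputed).
    """
    # Score each candidate by its smallest absolute distance over all classes.
    scores = [min(abs(v) for v in row) for row in decision_values]
    # Smaller score = closer to a decision boundary = more informative.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:q]


decision_values = [
    [1.8, -2.1, -1.5],
    [0.1, -0.9, -1.2],
    [-1.1, 0.4, -0.3],
    [-2.0, -1.7, 2.5],
]
print(margin_sampling(decision_values, q=2))  # -> [1, 2]
```

Candidates 1 and 2 have the smallest minimum absolute decision values (0.1 and 0.3), so they are queried first.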

Algorithm 1 lists the detailed procedures of our proposed active MKL-MS framework.

1:  Input: the source domain and the target domain;
2:  Initialize the candidate set, the labeled set, and the kernel weights;
3:  Calculate the base kernels according to Equation (7);
4:  Train the multi-kernel classifier using the labeled set and the corresponding labels by solving Problem (3) using LIBSVM;
5:  Optimize the multi-kernel combination by solving Problem (9) using the gradient descent method;
6:  Score the unlabeled samples according to the distance in Equation (11);
7:  Select the top-ranked closest samples from the candidate set;
8:  Annotate the selected samples and update the candidate set;
9:  Until:  The stop criterion is satisfied.
Algorithm 1 Framework of the proposed active MKL-MS
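The overall loop of Algorithm 1 can be illustrated with a generic pool-based active learning skeleton. This is only a sketch under simplifying assumptions: the multi-kernel classifier is replaced by a toy one-dimensional threshold model, the annotator is simulated, and a fixed number of iterations is used as the stop criterion. All function names and data are hypothetical.

```python
def active_learning_loop(labeled, pool, oracle, train, score, q, n_iterations):
    """Pool-based active learning: retrain, rank the pool, query q samples, repeat.

    `train`, `score`, and `oracle` are caller-supplied callables; a smaller
    score means the sample lies closer to the decision boundary.
    """
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_iterations):
        if not pool:
            break  # nothing left to query
        model = train(labeled)
        # Rank the remaining candidates from most to least uncertain.
        ranked = sorted(range(len(pool)), key=lambda i: score(model, pool[i]))
        chosen = ranked[:q]
        # Ask the annotator for the true labels and grow the training set.
        labeled.extend((pool[i], oracle(pool[i])) for i in chosen)
        chosen_set = set(chosen)
        pool = [x for i, x in enumerate(pool) if i not in chosen_set]
    return train(labeled), labeled, pool


# Toy stand-in for the classifier: threshold at the midpoint of the class centroids.
def train_threshold(labeled):
    positives = [x for x, y in labeled if y > 0]
    negatives = [x for x, y in labeled if y < 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2.0


def distance_to_threshold(threshold, x):
    return abs(x - threshold)


source_labeled = [(0.0, -1), (0.2, -1), (0.8, 1), (1.0, 1)]
target_pool = [0.05, 0.45, 0.55, 0.65, 0.95, 0.58]
true_label = lambda x: 1 if x > 0.6 else -1  # simulated annotator

model, labeled, pool = active_learning_loop(
    source_labeled, target_pool, true_label,
    train_threshold, distance_to_threshold, q=2, n_iterations=2,
)
print(model, len(labeled), sorted(pool))
```

In this toy run the source-only threshold starts at 0.5, and the queried target samples near the boundary pull it toward the simulated target boundary at 0.6, which mirrors the intended effect of adding informative target samples.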

4 Experiments

Figure 2: False color composition (left column) and ground-truth map containing five different land-cover classes (right column): (a) Pavia Center and (b) University Area
Figure 3: False color composition (left column) and ground-truth map with the enlarged source regions (right column): (a) Pavia Center and (b) University Area

4.1 Settings and Protocol

In the experiments, we employ two hyperspectral datasets to evaluate the proposed method, which adapts the multi-kernel classifier trained on the area S (considered as the source domain) to the spatially separate area T (considered as the target domain). The first dataset is Pavia center (northern Italy) containing pixels. Fig. 2(a) shows the ground-truth map with the five classes of interest (water, trees, meadow, soil and tile) available for the scene, displayed in the form of a class assignment for each labeled pixel. These classes have been included in a labeled data set of 126,069 samples extracted by visual inspection. The second hyperspectral dataset is the university area, whose image size is in pixels. Similar to the Pavia center, it is also divided into nine classes. Fig. 2(b) presents the five reference classes of interest, i.e., asphalt, meadows, gravel, trees, and bricks. In total, 34,125 labeled samples are available for the university area scene.

On each dataset, 20 samples of each class (i.e., 100 samples in total for the five classes) were randomly selected as the initial training set to obtain a multi-kernel classifier. To suppress the randomness in the evaluation, all results are averaged over ten runs, i.e., we sample ten different initial training sets from the source domain. The active learning process runs with two settings for the number of pixels q selected and added to the training set per iteration. For these two settings, we repeat 40 and 30 iterations, respectively, for the Pavia center image, and 30 iterations for University area in all cases.

As to the multi-kernel classifier, we follow the common practice for multi-class classification problems and train one-versus-all classifiers with four types of kernels: the Gaussian kernel, the Laplacian kernel, the inverse square distance kernel, and the inverse distance kernel. The kernel parameter is set to its default value, which is defined with respect to the mean value of the squared distances between all training samples.
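A minimal sketch of the four base kernels is given below. The exact kernel expressions and the default parameter are assumptions here: the sketch uses the forms commonly adopted in the domain transfer MKL literature, with gamma taken as the reciprocal of the mean squared distance between training samples. The names `BASE_KERNELS`, `default_gamma`, and `combined_kernel` are hypothetical.

```python
import math


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def default_gamma(samples):
    """Assumed default: reciprocal of the mean squared distance over all sample pairs."""
    pairs = [(i, j) for i in range(len(samples)) for j in range(i + 1, len(samples))]
    mean_sq = sum(squared_distance(samples[i], samples[j]) for i, j in pairs) / len(pairs)
    return 1.0 / mean_sq


# Each kernel takes the squared distance d2 and the parameter gamma (assumed forms).
BASE_KERNELS = {
    "gaussian": lambda d2, g: math.exp(-g * d2),
    "laplacian": lambda d2, g: math.exp(-math.sqrt(g * d2)),
    "inverse_square_distance": lambda d2, g: 1.0 / (g * d2 + 1.0),
    "inverse_distance": lambda d2, g: 1.0 / (math.sqrt(g * d2) + 1.0),
}


def combined_kernel(x, y, weights, gamma):
    """Weighted linear combination of the base kernels."""
    d2 = squared_distance(x, y)
    return sum(weights[name] * k(d2, gamma) for name, k in BASE_KERNELS.items())


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
gamma = default_gamma(samples)
weights = {name: 0.25 for name in BASE_KERNELS}  # uniform weights summing to one
print(gamma)
print(combined_kernel(samples[0], samples[1], weights, gamma))
```

Uniform weights are used only as a starting point; in the framework the weights would be learnt by the alternating optimization.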

We compare our method with classic active learning methods using different heuristics, namely margin sampling (MS), random sampling (RS), and MKL-RS (MKL with the random sampling heuristic), in terms of the classification accuracy (%) and the Kappa statistic. The Kappa statistic is a metric that compares an observed accuracy with an expected accuracy (random chance) Viera:2005 . Because it takes into account agreement with a random classifier, it is generally a more robust measure than a simple percent agreement calculation. We adopt it for the comprehensive evaluation of the proposed method. For these active learning methods, a small set of labeled data in the source domain together with the sequentially selected samples in the target domain is treated as the training data. In addition, we adopt traditional methods without the active learning process (non-AL for short), namely the single kernel version of DTMKL (SKV), the classic SVM, and domain transfer multi-kernel learning (DTMKL) zhuo , as baselines, which utilize all the labeled samples in the source domain to train the classifier for the target domain. In those kernel methods we simply choose the Gaussian kernel as the default one. As to the SVM based methods, the popular LIBSVM is applied in the experiments, and the optimal SVM parameters are found using a grid search with tenfold cross-validation.
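To make the two evaluation measures concrete, the sketch below computes the overall accuracy and Cohen's Kappa from a confusion matrix, where the expected accuracy is the chance agreement derived from the row and column totals. The function name and the small two-class confusion matrix are hypothetical and are not results from this paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Compute (OA, Kappa) from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed accuracy: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Expected accuracy: chance agreement from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


toy_confusion = [
    [45, 5],
    [10, 40],
]
oa, kappa = overall_accuracy_and_kappa(toy_confusion)
print(oa, kappa)  # OA is 0.85; Kappa is approximately 0.7
```

Here the chance agreement is 0.5, so an observed accuracy of 0.85 corresponds to a Kappa of about 0.7.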

Method    # target samples added
          0        40       100      200
RS        47.770   86.200   90.445   93.865
MS        47.770   87.405   95.685   96.255
MKL-RS    47.930   92.250   94.275   94.655
MKL-MS    47.945   94.765   98.610   98.235
Table 1: Classification Accuracy (%) with Respect to the Number of Samples Added to the Target Domain on Pavia Center
Method    # source/target samples   OA (%)            Kappa
SKV       2723 / 0                  46.550            0.358
DTMKL     2723 / 0                  46.850            0.360
SVM       2723 / 0                  49.750            0.388
RS        100 / 300                 92.220 ± 5.288    0.882 ± 0.085
MS        100 / 300                 96.140 ± 2.427    0.943 ± 0.036
MKL-RS    100 / 300                 92.960 ± 2.383    0.897 ± 0.035
MKL-MS    100 / 300                 97.930 ± 0.658    0.970 ± 0.010
Table 2: Classification Accuracy (%) and Kappa Statistic on the Pavia Center Dataset
Region size   Method   # source/target samples   OA (%)            Kappa

                       80 / 300                  93.244 ± 0.215    0.914 ± 0.008
                       100 / 300                 92.825 ± 0.153    0.922 ± 0.013

                       80 / 300                  98.301 ± 0.322    0.975 ± 0.010
                       100 / 300                 97.467 ± 0.235    0.963 ± 0.002


                       80 / 300                  90.132 ± 0.056    0.903 ± 0.006
                       100 / 300                 90.098 ± 0.028    0.892 ± 0.016

                       80 / 300                  94.200 ± 0.032    0.923 ± 0.005
                       100 / 300                 94.067 ± 0.005    0.912 ± 0.016
Table 3: Classification Accuracy (%) and Kappa Statistic on the Pavia Center Dataset with Different Sizes of Source Region

4.2 Results and Analysis

4.2.1 Experiments on Pavia Center Dataset

Fig. 4(a) plots the overall accuracy (OA) of the classification obtained by four different AL methods (RS, MS, MKL-RS, and MKL-MS) with respect to the number of labeled training samples for the Pavia center scene. From the figure, we first notice that when the training set is very small, the proposed MKL-MS provides lower OA than MKL-RS. But as the number of training samples increases, e.g., to 140 at the fourth iteration, MKL-MS substantially improves its classification accuracy and achieves the best performance among all methods. We also notice that MKL-MS converges quickly: after about 10 iterations, with 100 target domain pixels selected, its OA curve (in red) reaches 97.4% (a gain of about 4.6% over MKL-RS). This performance is never reached by the three baseline approaches during the 40 active learning iterations.

Table 1 further investigates the OA for different numbers of samples added during the AL process. As expected, adding more samples generally yields a higher classification accuracy. Moreover, the proposed MKL-MS performs better than the other approaches by an obvious margin. Table 2 reports the results of three non-AL methods and four active learning based methods in terms of OA, Kappa, and their standard deviations. Here, the non-AL algorithms (SKV, classic SVM and DTMKL) utilize the whole source domain (i.e., 2723 samples in Pavia Center) as the training set to adapt the learnt model to the target domain. We notice that even with many more training samples, the non-AL methods cannot even reach 50% OA in Table 2 and thus cannot outperform the AL strategies with the DA technique. This is because there exists a huge distribution variation between the source and the target domains. In contrast, our proposed MKL-MS incorporates active learning to select the most informative samples and reduce the distribution discrepancy, and thus reaches 97.4% OA with 400 samples in total, including 100 samples from the source domain and 300 samples from the target domain. This observation clearly verifies the effectiveness of our strategy combining MKL and AL.

To comprehensively evaluate the performance, we randomly select more and different source regions from the two datasets and keep the same target image as in the prior experiments. The results further confirm that our proposed method with the MS strategy (MKL-MS) again obtains the best performance. This means that there exists an obvious domain shift, and that our method can capture the correlations, select the most informative samples for a better classifier, and consistently achieve the best performance. We also compare our method with the state-of-the-art classification methods including SKV, DTMKL, and SVM. With the help of the samples from the target domain, RS, MS, MKL-RS, and MKL-MS can significantly improve the performance. Moreover, by considering the domain shift, the MKL-MS method obtains the best performance in most cases. This further confirms that our method can not only largely leverage the information from the target domain, but also exploit the information from the source domain, and thus shows robust and better performance in practice.

To further investigate the effect of the source region size, we gradually enlarge the source region from 3842 samples to 5338 samples on Pavia Center and keep the target region unchanged. Fig. 3 illustrates the region growth, and Table 3 lists the performance using different numbers of labeled samples in our framework. MKL-MS consistently outperforms MKL-RS and achieves the best performance compared to the other baselines (in Table 2) in all cases.

| Method \ # target samples (q=20) | 0 | 40 | 100 | 200 |
|---|---|---|---|---|
| RS | 62.290 | 87.385 | 89.955 | 91.030 |
| MS | 62.290 | 83.600 | 88.910 | 90.785 |
| MKL-RS | 48.575 | 88.625 | 90.320 | 91.850 |
| MKL-MS | 42.525 | 89.830 | 92.625 | 96.925 |

Table 4: Classification Accuracy (%) with Respect to the Number of Samples Added to the Target Domain on University Area
| Method | # source/target samples | OA (%) | Kappa |
|---|---|---|---|
| SKV | 1907 / 0 | 48.350 | 0.3385 |
| DTMKL | 1907 / 0 | 44.800 | 0.298 |
| SVM | 1907 / 0 | 64.650 | 0.526 |
| RS | 100 / 300 | 91.430 ± 0.4264 | 0.871 ± 0.006 |
| MS | 100 / 300 | 91.610 ± 0.5592 | 0.873 ± 0.009 |
| MKL-RS | 100 / 300 | 93.480 ± 1.214 | 0.901 ± 0.019 |
| MKL-MS | 100 / 300 | 97.070 ± 0.343 | 0.956 ± 0.005 |

Table 5: Classification Accuracy (%) and Kappa Statistic on the University Area Dataset
| Region size | Method | # source/target samples | OA (%) | Kappa |
|---|---|---|---|---|
|  |  | 100/300 | 94.825 ± 0.825 | 0.922 ± 0.013 |
|  |  | 200/300 | 94.553 ± 0.157 | 0.918 ± 0.018 |
|  |  | 100/300 | 97.250 ± 0.235 | 0.956 ± 0.003 |
|  |  | 200/300 | 96.950 ± 0.453 | 0.952 ± 0.004 |
|  |  | 100/300 | 93.550 ± 1.55 | 0.902 ± 0.024 |
|  |  | 200/300 | 93.950 ± 0.05 | 0.909 ± 0.001 |
|  |  | 100/300 | 96.879 ± 0.225 | 0.953 ± 0.004 |
|  |  | 200/300 | 96.825 ± 0.225 | 0.952 ± 0.003 |

Table 6: Classification Accuracy (%) and Kappa Statistic on the University Area Dataset with Different Sizes of Source Region
Figure 4: The learning curves on Pavia center (panels a and b) and University Area (panels c and d), where RS (cyan line) = random sampling, MS (blue line) = margin sampling, MKL-RS (green line) = random sampling combined with MKL, and MKL-MS (red line) = margin sampling combined with MKL; q denotes the number of pixels selected from the target domain.

4.2.2 Experiments on University Area Dataset

To comprehensively evaluate the proposed method, in Fig. 4(b) we further display the learning curves obtained on the University Area dataset using the four active learning methods. The proposed MKL-MS algorithm clearly obtains the best results, which demonstrates its advantage. Similar to the results on Pavia center, at the beginning the proposed MKL-MS lies 15% below the traditional methods (MS and RS) without adaptation. However, after three active iterations the MKL-MS algorithm reaches nearly 90% OA. At the same time, MKL-RS also achieves a promising learning curve, which is the closest to that of MKL-MS among all methods. Comparing the two confirms that selecting samples near the decision boundary helps boost the classification performance and significantly outperforms random sampling. In this figure, all methods converge quickly after adding 150 samples from the target domain, and a larger q achieves higher efficiency, with fewer active iterations, than a smaller one. Moreover, in all cases our proposed method MKL-MS achieves the best performance, i.e., nearly 97.0% OA, compared with only about 95.0% using MKL-RS and 91.5% using MS and RS.
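The selection step behind margin sampling, querying the unlabeled target pixels closest to the decision boundary, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `margin_sampling` is hypothetical, the classifier is assumed to expose one signed one-vs-rest decision value per class, and the distance to the boundary is taken as the smallest absolute decision value, which is one common multi-class formulation of MS.

```python
def margin_sampling(decision_values, q):
    """Return the indices of the q candidates closest to a class boundary.

    decision_values[i] holds one signed decision value per class for
    candidate i (one-vs-rest is assumed here). A small absolute value
    means the candidate lies close to that class's separating hyperplane.
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    # Distance of each candidate to its nearest boundary.
    margins = [min(abs(v) for v in row) for row in decision_values]
    # Rank candidates from most to least uncertain; ties keep index order.
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return ranked[:q]


# Example: four unlabeled candidates, two classes, batch size q = 2.
candidates = [[2.0, -1.5], [0.1, -0.9], [-0.05, 0.7], [1.2, -1.1]]
selected = margin_sampling(candidates, 2)
```

In an active learning loop, the selected indices would be labeled, moved into the training set, and the classifier retrained before the next query; random sampling (RS) replaces the ranking with a uniform random draw of q candidates.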

In Table 4, we also list the classification accuracy of the four methods when varying the number (ranging from 0 to 200) of samples selected from the target domain. From the table, we can see that MKL-MS starts with the lowest OA when no target samples are used, but sustains the highest OA once target samples are added (from 40 samples onward). Table 5 further reports the OA and Kappa results of three non-AL and four AL classification methods. For the non-AL algorithms, we again use the whole source domain (i.e., 1907 samples in University Area) as the training set to train the classifier. Comparing the basic classification methods with the active learning methods with DA shows that a significant domain shift exists between the source and target domains. Without considering the divergence between the source and target domains, all the non-AL methods, which directly transfer the domain knowledge, obtain unsatisfactory performance in both OA and Kappa. Owing to the AL strategy, our proposed approach MKL-MS relies on very few training samples and obtains promising results. For example, in Fig. 4(b) with q = 10 it takes only 15 iterations (150 samples are labeled and added to the training set) to converge at the level of 96.9% accuracy, while with q = 20, 140 samples are selected in total. By comparing its performance to that of the three non-AL classification methods, we can conclude that the active learning strategy is very helpful for discriminative classifier learning when there is a large distribution deviation between the source domain and the target domain. In Table 6 we also investigate the performance with respect to different source region sizes, and reach a similar conclusion: our proposed method robustly achieves promising performance in different scenarios.

5 Conclusion

This paper presents a novel active framework, MKL-AL, based on domain adaptation for the hyperspectral image (HSI) classification problem. It fully utilizes the label information from the auxiliary domains by compensating for domain distribution shifts in a sequential active learning manner, which not only significantly boosts the classification accuracy but also saves expensive computational cost and human labelling effort. The extensive experimental results on two popular HSI datasets demonstrate that when a large bias exists between the source and target domains, the conventional DA classifier cannot guarantee satisfactory performance. Instead, by actively expanding the training set without much effort, our method can efficiently improve the classification accuracy.

6 Acknowledgement

This work was supported by the National Natural Science Foundation of China (61402026 and 61572388), the Key R&D Program - The Key Industry Innovation Chain of Shaanxi (Grant No. 2017ZDCXL-GY-05-04-02 and Grant No. 2017ZDCXL-GY-05-02), Beijing Municipal Science and Technology Commission (Z171100000117022) and the Foundation of State Key Lab of Software Development Environment (SKLSDE-2016ZX-04).


