
Multi-temporal and multi-source remote sensing image classification by nonlinear relative normalization

Remote sensing image classification exploiting multiple sensors is a very challenging problem: data from different modalities are affected by spectral distortions and mis-alignments of all kinds, and this hampers the successful re-use of models built for one image in other scenes. In order to adapt and transfer models across image acquisitions, one must be able to cope with datasets that are not co-registered, acquired under different illumination and atmospheric conditions, by different sensors, and with scarce ground references. Traditionally, methods based on histogram matching have been used. However, they fail when densities have very different shapes or when there is no corresponding band to be matched between the images. An alternative builds upon manifold alignment. Manifold alignment performs a multidimensional relative normalization of the data prior to product generation that can cope with data of different dimensionality (e.g. different number of bands) and possibly unpaired examples. Aligning data distributions is an appealing strategy, since it yields data spaces that are more similar to each other, regardless of the subsequent use of the transformed data. In this paper, we study a methodology that aligns data from different domains in a nonlinear way through kernelization. We introduce the Kernel Manifold Alignment (KEMA) method, which provides a flexible and discriminative projection map, exploits only a few labeled samples (or semantic ties) in each domain, and reduces to solving a generalized eigenvalue problem. We successfully test KEMA in multi-temporal and multi-source very high resolution classification tasks, as well as on the task of making a model invariant to shadowing for hyperspectral imaging.





1 Introduction

Many real-life problems currently exploit heterogeneous sources of remote sensing data: forest ecosystems studies [1, 2], post-catastrophe assessment [3, 4] or land-use updating [5, 6, 7] take advantage of the wide coverage and short revisit time of remote sensing sensors. They typically design specific image processing pipelines to produce maps of a product of interest. Despite the promises of remote sensing to tackle such ambitious problems, two main obstacles prevent this technology from reaching a broader range of applications: on the one hand, there is generally a lack of labeled data present at each acquisition and, on the other hand, the models need to be capable of dealing with images obtained under different conditions and thus potentially with different sensors.

Working under label scarcity has been extensively considered in recent remote sensing image processing literature by means of optimizing the use of the few available labels [8]. In our view, the problem of adapting remote sensing classifiers boils down to compensating for a variety of distortions and mis-alignments: for example, data resolution may differ or seasonal conditions might offer remarkable differences in the spectral signatures observed. When the images cover the same area, registration can be approximate. Moreover, each scene depends on its particular illumination and viewing geometry, which causes spectral signatures to shift among acquisitions [9]. As a consequence, it becomes difficult, often impossible, to re-use field data acquired on a given campaign to process newly acquired images. Transferring models from one remote sensing image acquisition to the other can be a very challenging task.

Adapting classifiers to (even slightly) shifted data distributions is an old problem in remote sensing, which started in the 1970s with the signature extension field [10, 11], and then evolved, due to the technological advances in both sensors and processing routines, into what is generally referred to as the transfer learning problem [12, 13]. By transfer learning, we mean all kinds of methodologies aiming at making models transferable across image/data acquisitions. In recent remote sensing literature, works have mainly considered three research directions [14]: 1) unifying the data representation, for example via atmospheric correction [15], feature selection [16], or feature extraction [17, 18, 19]; 2) incorporating invariances in the classifier, for example via synthetic (‘virtual’) examples [20] or physically-inspired features [21, 22]; and 3) adapting the classifier to cope with the shift among acquisitions, for example via semi-supervised-inspired strategies [23, 24] or active learning.


Most of the methodologies above rely on the fact that all images are acquired by the same sensor (i.e. they share the same $d$-dimensional data space, as well as the nature and physical meaning of the features), or that all information and know-how necessary to convert to surface reflectance is available to the user performing the analysis, which is unfortunately often not the case. Moreover, at the application level there is generally no requirement to stick to a specific sensor (taking the example of post-catastrophe intervention, waiting for the next cloud-free image of a specific sensor can mean the loss of human lives). Since more and more images are currently available to the general public and organizations, new transfer learning approaches must be capable of unifying data from different sensors, at different resolutions, without co-registration, and without being specific to a given end classifier [26]. The recently proposed manifold alignment methods gather all these properties.

Manifold alignment [27] is a machine learning framework aiming at matching, or aligning, a set of domains (the images) of potentially different dimensionality using feature extraction under pairwise proximity constraints [28]. In some sense, manifold alignment performs registration in the feature space and matches corresponding samples, where the correspondence is defined by a series of proximity graphs encoding some prior knowledge of interest (e.g. co-location, class consistency). An intuition of how manifold alignment functions is provided in Fig. 1. Its application to remote sensing data is relatively recent: in [29], the authors presented the semi-supervised manifold alignment method (SSMA), which gathers all properties above, but at the price of requiring labeled pixels in all domains to perform the alignment. The authors of [30] studied issues of spatial consistency, and in [31] they proposed a multi-scale alignment procedure not relying on labels in all domains. Finally, true colour visualization for hyperspectral data was tackled in [32].

In this paper, we study the effectiveness of the nonlinear counterpart of SSMA, the Kernel Manifold Alignment (KEMA [33]), as well as its relevance for remote sensing problems. KEMA is a flexible, scalable, and intuitive method for aligning manifolds. KEMA provides a flexible and discriminative projection function, only exploits a few labeled samples (or semantic ties [34], when images are roughly registered – see Section 3.3) in each domain, and reduces to solving a simple generalized eigenvalue problem.

KEMA is introduced in Section 2. In Section 3, we test it in several real-life scenarios, including multi-temporal and multi-source very high resolution image classification problems, as well as in the challenging task of making a model shadow-invariant in hyperspectral image classification. Section 4 concludes the paper.

Figure 1: Illustration of KEMA aligning data distributions in a multi-sensor setting.

2 Kernel Manifold Alignment (KEMA)

In this section, we detail the KEMA method. We first recall the linear counterpart, the SSMA method [35]. Noting the main problems of this method, we introduce KEMA as a solution to address them. The reader interested in more theoretical details of KEMA can find them in [33]. Code can be found at the URL:

2.1 Notation

To fix notation, we consider a series of $D$ domains. For each one of them, we have a data set $X_i = [x_1^i, \ldots, x_{n_i}^i] \in \mathbb{R}^{d_i \times n_i}$, where $n_i$ is the number of samples issued from domain $i$ with data dimensionality $d_i$, and $i = 1, \ldots, D$. Some of the pixels in $X_i$ are labeled ($l_i$), and most are unlabeled. From one domain to another, the data are not necessarily semantically paired, i.e. $n_i \neq n_j$ in general, nor is it mandatory that all domains have the same dimension, i.e. possibly $d_i \neq d_j$.

2.2 Semi-supervised manifold alignment (SSMA)

The linear SSMA method was originally proposed in [35] and successfully adapted to remote sensing problems in [29]. The SSMA method aligns data from all $D$ domains by projecting them into a common latent space using a set of domain-specific projection functions, $f_i$, collectively grouped into the projection matrix $F$. The latent space has two properties: it is discriminant for classification and respects the original geometry of each manifold. To do so, SSMA tries to find a data projection matrix $F$ that minimizes the following cost function

$$F_{opt} = \arg\min_F \frac{\mu\,\mathrm{GEO} + (1-\mu)\,\mathrm{SIM}}{\mathrm{DIS}},$$

where we aim to minimize a topology/geometry term (GEO) and a class similarity term (SIM) while maximizing a class dissimilarity term (DIS) between all samples, and $\mu$ is a parameter controlling the relative contribution of the similarity and the topology terms. All three terms are sums of squared distances in the latent space, so that small GEO and SIM values and a large DIS value are desirable. The three terms correspond to:

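  1. a geometry-preservation term, GEO, forcing the local geometry of each manifold to remain unchanged, i.e. penalizing projections mapping neighbors in the input space far from each other,

    $$\mathrm{GEO} = \sum_{i=1}^{D} \sum_{k,l=1}^{n_i} W_g^i(k,l)\, \| f_i(x_k^i) - f_i(x_l^i) \|^2, \qquad (1)$$

    where $W_g^i$ is a similarity matrix returning the value $1$ if two pixels of domain $i$ are neighbours in the original feature space and $0$ otherwise. $W_g^i$ is typically a $k$-NN graph. Up to a constant factor, GEO equals $\mathrm{tr}(F^\top Z L_g Z^\top F)$, where $Z$ is the block-diagonal matrix containing the data matrices $X_i$ and $L_g$ is the graph Laplacian issued from the similarity matrices $W_g^i$, stacked in a block-diagonal matrix. All the off-diagonal blocks of $L_g$ are empty, since we do not want to preserve neighbourhood relationships between the images.

  2. a class similarity term, SIM, penalizing projections mapping samples of the same class far from each other,

    $$\mathrm{SIM} = \sum_{i,j=1}^{D} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} W_s^{ij}(k,l)\, \| f_i(x_k^i) - f_j(x_l^j) \|^2, \qquad (2)$$

    where $W_s^{ij}$ is a similarity matrix returning the value $1$ if two pixels from domains $i$ and $j$ belong to the same class and $0$ otherwise. These are the tie points performing registration in the spectral space, and are used to match the images to each other. Up to a constant factor, SIM equals $\mathrm{tr}(F^\top Z L_s Z^\top F)$, with $L_s$ the graph Laplacian of the matrix gathering all $W_s^{ij}$ blocks.

  3. a class dissimilarity term, DIS, penalizing projections mapping pixels of different classes close to each other,

    $$\mathrm{DIS} = \sum_{i,j=1}^{D} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} W_d^{ij}(k,l)\, \| f_i(x_k^i) - f_j(x_l^j) \|^2, \qquad (3)$$

    where $W_d^{ij}$ is a dissimilarity matrix returning the value $1$ if two pixels from domains $i$ and $j$ belong to different classes and $0$ otherwise. These tie points prevent the solution from collapsing into a single point and, together with the SIM term, encourage the latent space to be discriminative. Up to a constant factor, DIS equals $\mathrm{tr}(F^\top Z L_d Z^\top F)$, with $L_d$ the graph Laplacian of the matrix gathering all $W_d^{ij}$ blocks.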
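The three graphs described in the list above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: samples are stored as rows, unlabeled pixels carry the label `-1`, binary 0/1 weights are used, and the function names (`knn_graph`, `laplacian`, `build_laplacians`) are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-NN adjacency for samples stored as rows of X."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]         # indices of the k closest samples
    W = np.zeros((len(X), len(X)))
    W[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)                  # symmetrise the graph

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def build_laplacians(Xs, ys, k=3):
    """Xs: list of (n_i, d_i) arrays; ys: list of label arrays (-1 = unlabeled)."""
    sizes = [len(X) for X in Xs]
    offsets = np.cumsum([0] + sizes)
    n = offsets[-1]
    Wg = np.zeros((n, n))
    for i, X in enumerate(Xs):                 # block-diagonal geometry graph
        a, b = offsets[i], offsets[i + 1]
        Wg[a:b, a:b] = knn_graph(X, k)
    y = np.concatenate(ys)
    labeled = (y >= 0)[:, None] & (y >= 0)[None, :]
    same = y[:, None] == y[None, :]
    Ws = (labeled & same).astype(float)        # same class, within or across domains
    Wd = (labeled & ~same).astype(float)       # different classes
    return laplacian(Wg), laplacian(Ws), laplacian(Wd)
```

Note that the geometry graph only has diagonal blocks, whereas the class graphs also connect pixels of different domains; these cross-domain links are what ties the images together.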

Now, by combining Eqs. (1)-(3), it is straightforward to show that the solution boils down to finding the eigenvectors associated with the smallest eigenvalues of the following generalized eigenproblem [35]:

$$Z \left( \mu L_g + (1-\mu) L_s \right) Z^\top F = \lambda\, Z L_d Z^\top F, \qquad (4)$$

where $Z$ is a block-diagonal matrix containing the data matrices $X_i$ from the different domains to be aligned. $F$ is the sought common projection matrix of size $d \times d$, with $d = \sum_{i=1}^{D} d_i$. The rows of $F$ contain a block of projectors for each domain, of size $d_i \times d$, in a particular block structure:

$$F = [F_1^\top, F_2^\top, \ldots, F_D^\top]^\top, \qquad (5)$$

where $F_i$ is the block of eigenvectors for domain $i$.

Once the projection matrix $F$ is obtained, any sample $x^i$ from domain $i$ (one of the $D$ domains considered) can be projected in the latent space by using the corresponding ($i$-th) block of eigenvectors $F_i$:

$$\mathcal{P}(x^i) = F_i^\top x^i. \qquad (6)$$

As for (k)PCA and other methods based on eigen-decomposition, the data can be projected onto a subspace of dimension lower than $d$ by simply using only the first $N_f < d$ columns of $F_i$. In this sense, SSMA leaves some control on the dimensionality of the latent space for class separation.
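A minimal NumPy sketch of the linear alignment step may help to fix ideas. It assumes that the cost weights the geometry and similarity Laplacians as $\mu L_g + (1-\mu) L_s$, that samples are stored as rows, and that a small ridge is added to the left-hand matrix for numerical stability; the function names are hypothetical and this is not the code released by the authors. Instead of looking for the smallest $\lambda$ in $A f = \lambda B f$, the sketch solves the equivalent problem $B f = (1/\lambda) A f$ for its largest values.

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D arrays on the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks),
                    sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def smallest_generalized_eig(A, B, reg=1e-6):
    """Eigenvectors of A f = lambda B f, ordered by increasing lambda.

    Assumption: A is made positive definite with a small ridge, then
    B f = (1/lambda) A f is solved through a Cholesky whitening of A.
    """
    n = A.shape[0]
    A_reg = A + reg * np.trace(A) / n * np.eye(n)
    C = np.linalg.cholesky(A_reg)                      # A_reg = C C^T
    M = np.linalg.solve(C, np.linalg.solve(C, B).T)    # C^{-1} B C^{-T}
    nu, V = np.linalg.eigh((M + M.T) / 2)              # nu = 1 / lambda
    order = np.argsort(nu)[::-1]                       # largest 1/lambda first
    return nu[order], np.linalg.solve(C.T, V[:, order])

def ssma_fit(Xs, Lg, Ls, Ld, mu=0.5):
    """Xs: list of (n_i, d_i) arrays. Returns one (d_i, d) projector block per domain."""
    Z = block_diag([X.T for X in Xs])                  # (d, n) block-diagonal data
    A = Z @ (mu * Lg + (1 - mu) * Ls) @ Z.T
    B = Z @ Ld @ Z.T
    inv_lam, F = smallest_generalized_eig(A, B)
    cuts = np.cumsum([X.shape[1] for X in Xs])[:-1]
    return inv_lam, np.split(F, cuts, axis=0)

def ssma_project(X_i, F_i, n_components):
    """Project the rows of X_i with the first columns of the domain block F_i."""
    return X_i @ F_i[:, :n_components]
```

Each domain is projected with its own block, so images with a different number of bands end up in the same latent space and a single classifier can be trained on the projected samples.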

2.3 Kernel Manifold Alignment (KEMA)

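The idea behind kernelization is to map the data into a high dimensional Hilbert space $\mathcal{H}$ with the mapping function $\phi(\cdot): x \mapsto \phi(x) \in \mathcal{H}$ such that the mapped data are better suited for solving our problem. This technique has found wide adoption in many remote sensing data analysis problems [36]. In practice, computing this mapping explicitly can be prohibitive due to its high dimensionality. This can be avoided by expressing the problem in terms of dot products within $\mathcal{H}$. We can then define an easy-to-compute kernel function $K(x_k, x_l) = \langle \phi(x_k), \phi(x_l) \rangle_{\mathcal{H}}$ returning similarities between mapped samples without having to compute $\phi$ explicitly.

In the multi-modal setting considered here, we would have to map the $D$ datasets to $D$ Hilbert spaces $\mathcal{H}_i$ of dimension $H_i$, $\phi_i(\cdot): x \mapsto \phi_i(x) \in \mathcal{H}_i$, $i = 1, \ldots, D$. Then, we replace all the samples with their mapped feature vectors. Denoting by $U_i$ the block of projectors acting in $\mathcal{H}_i$, the GEO, SIM and DIS terms become:

$$\mathrm{GEO} = \sum_{i=1}^{D} \sum_{k,l=1}^{n_i} W_g^i(k,l)\, \| U_i^\top \phi_i(x_k^i) - U_i^\top \phi_i(x_l^i) \|^2, \qquad (7)$$

$$\mathrm{SIM} = \sum_{i,j=1}^{D} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} W_s^{ij}(k,l)\, \| U_i^\top \phi_i(x_k^i) - U_j^\top \phi_j(x_l^j) \|^2, \qquad (8)$$

$$\mathrm{DIS} = \sum_{i,j=1}^{D} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} W_d^{ij}(k,l)\, \| U_i^\top \phi_i(x_k^i) - U_j^\top \phi_j(x_l^j) \|^2. \qquad (9)$$

As for the SSMA case, combining Eqs. (7)-(9) leads to a generalized eigendecomposition problem:

$$\Phi \left( \mu L_g + (1-\mu) L_s \right) \Phi^\top U = \lambda\, \Phi L_d \Phi^\top U, \qquad (10)$$

where $\Phi$ is a block diagonal matrix containing the data matrices $\Phi_i = [\phi_i(x_1^i), \ldots, \phi_i(x_{n_i}^i)]$ and $U$ contains the eigenvectors organized in rows for the particular domain defined in Hilbert space $\mathcal{H}_i$, for a total of $H = \sum_{i=1}^{D} H_i$ rows. As stressed above, $\Phi$ and $U$ live in a high dimensional space that might be very costly or even impossible to compute. Therefore, we express the eigenvectors as a linear combination of mapped samples using the representer theorem [37], $U_i = \Phi_i \Lambda_i$ (or $U = \Phi \Lambda$ in matrix notation), which leads to:

$$K \left( \mu L_g + (1-\mu) L_s \right) K \Lambda = \lambda\, K L_d K \Lambda, \qquad (11)$$

where $K$ is a block diagonal matrix containing the kernel matrices $K_i = \Phi_i^\top \Phi_i$. Now the eigenproblem becomes of size $n \times n$ instead of $d \times d$, with $n = \sum_{i=1}^{D} n_i$, and we can extract a maximum of $n$ components.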
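The passage from the primal problem in the Hilbert spaces to its dual (kernel) form can be written out step by step. The derivation below is a sketch under the assumptions that the mapped samples are stored as columns of $\Phi$ (size $H \times n$), that the dual coefficients are gathered in $\Lambda$ (size $n \times n$), and that the geometry and similarity Laplacians are weighted as $\mu L_g + (1-\mu) L_s$; if the weighting is written differently, the same steps apply with the corresponding combination.

```latex
\begin{align*}
% primal problem, size H x H
\Phi \left( \mu L_g + (1-\mu) L_s \right) \Phi^\top U
  &= \lambda\, \Phi L_d \Phi^\top U \\
% representer theorem: U = \Phi \Lambda
\Phi \left( \mu L_g + (1-\mu) L_s \right) \Phi^\top \Phi \Lambda
  &= \lambda\, \Phi L_d \Phi^\top \Phi \Lambda \\
% pre-multiply by \Phi^\top and set K = \Phi^\top \Phi (block diagonal, n x n)
K \left( \mu L_g + (1-\mu) L_s \right) K \Lambda
  &= \lambda\, K L_d K \Lambda
\end{align*}
```

Only kernel evaluations within each domain are needed, because $\Phi$ is block diagonal and so is $K = \Phi^\top \Phi$; the coupling between domains enters exclusively through the Laplacians $L_s$ and $L_d$.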


This dual formulation is advantageous when dealing with very high dimensional datasets, for which the SSMA problem is not well-conditioned. Operating in $Q$-mode endows the method with numerical stability and computational efficiency in current high-dimensional problems, e.g. when using Fisher vectors or deep features for data representation. Problems of this type, with many more dimensions than points, are becoming more and more prominent in remote sensing [38, 39]. In this sense, even KEMA with a linear kernel (which corresponds to the SSMA solution) becomes a valid solution for these problems, as it has all the advantages of methods related to (kernel) Canonical Correlation Analysis ((k)CCA [40]), but can also deal with unpaired data.

Projection of a new test vector $x_*^i$ to the latent space requires first mapping it to its corresponding kernel form $K_{i*}$ and then applying the corresponding projection vectors $\Lambda_i$ defined therein:

$$\mathcal{P}(x_*^i) = U_i^\top \phi_i(x_*^i) = \Lambda_i^\top \Phi_i^\top \phi_i(x_*^i) = \Lambda_i^\top K_{i*}, \qquad (12)$$

where $K_{i*}$ is a vector of kernel evaluations between sample $x_*^i$ and all samples from domain $i$ used to define the projections $\Lambda_i$. Therefore, projection to the kernel latent space is possible through the use of dedicated reproducing kernel functions.
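The dual problem and the out-of-sample projection can be sketched as follows. This is a hedged illustration rather than the released KEMA code: it assumes RBF kernels with one bandwidth per domain, the weighting $\mu L_g + (1-\mu) L_s$, samples stored as rows, and a small ridge on the left-hand matrix so that trivial constant solutions do not dominate; `rbf`, `kema_fit` and `kema_project` are hypothetical names.

```python
import numpy as np

def rbf(A, B, sigma):
    """RBF kernel between the rows of A (n, d) and the rows of B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kema_fit(Xs, sigmas, Lg, Ls, Ld, mu=0.5, reg=1e-6):
    """Xs: list of (n_i, d_i) arrays. Returns dual blocks Lambda_i of shape (n_i, n)."""
    sizes = [len(X) for X in Xs]
    n = sum(sizes)
    K = np.zeros((n, n))                       # block-diagonal kernel matrix
    start = 0
    for X, sigma, m in zip(Xs, sigmas, sizes):
        K[start:start + m, start:start + m] = rbf(X, X, sigma)
        start += m
    A = K @ (mu * Lg + (1 - mu) * Ls) @ K
    B = K @ Ld @ K
    A = A + reg * np.trace(A) / n * np.eye(n)  # assumed ridge regularisation
    C = np.linalg.cholesky(A)                  # A = C C^T
    M = np.linalg.solve(C, np.linalg.solve(C, B).T)
    nu, V = np.linalg.eigh((M + M.T) / 2)      # ascending 1/lambda
    Lam = np.linalg.solve(C.T, V[:, ::-1])     # smallest lambda first
    return np.split(Lam, np.cumsum(sizes)[:-1], axis=0)

def kema_project(x_new, X_i, Lambda_i, sigma_i, n_components):
    """Kernel evaluations against the training samples of domain i, then projection."""
    k_star = rbf(np.atleast_2d(x_new), X_i, sigma_i)   # (m, n_i)
    return k_star @ Lambda_i[:, :n_components]
```

A new pixel is therefore compared only with the training samples of its own domain, using that domain's kernel, before being mapped to the shared latent space.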

3 Experimental Results

In this section, we present experimental results in three challenging remote sensing problems: multi-temporal / multi-source VHR classification, shadow removal in hyperspectral images, and multi-source image alignment without labels.

3.1 Multi-temporal and multi-sensor VHR classification

The first experiment is a direct comparison to the multi-source experiment reported in [29]. We consider three VHR images (Fig. 2) depicting peri-urban settlements:

  • Prilly: the first image was acquired by the WorldView-2 VHR satellite (8 visible and near-infrared bands) over Prilly, a residential neighborhood of Lausanne, Switzerland. The image was acquired on August 2, 2011 and has been pansharpened using the Gram-Schmidt transform to a resolution of approximately 0.7m.

  • Montelly: the second image was also acquired by WorldView-2 over another residential neighborhood of Lausanne, Montelly. The image was acquired on September 29, 2010 and has also been pansharpened using the Gram-Schmidt transform to 0.7m.

  • Zurich: the third image was acquired by the QuickBird satellite (4 bands, RGB-NIR) over a residential neighborhood of Zurich, Switzerland. The image was acquired on October 6, 2006 and pansharpened.

Figure 2: The WorldView-2 (WV2) and QuickBird (QB) images used in the remote sensing semantic classification experiments. Panels: Prilly (WV2), Montelly (WV2), Zurich (QB). Color legend: residential, meadows, trees, roads, shadows, commercial building, railway, bare soil, highway.

Figure 3: Numerical results for the multi-source experiment. Rows (leading training image) indicate the image from which 100 labeled pixels per class are used; performances for an increasing number of labeled pixels per class in the two other images are reported. Columns (image predicted: Prilly, Montelly, Zurich with 4 bands) correspond to the image that has been used for testing. The baseline is the model obtained using 100 pixels per class from the test image only.

For each image, a ground truth consisting of 9 classes is available (see bottom row of Fig. 2). We follow the experimental protocol of [29]: from all the available labeled pixels in each image, 50% are kept apart as the testing set. The remaining 50% are used to extract the labeled and unlabeled pixels composing the training sets. We then extract 100 labeled pixels per class from what we call the leading domain image, which is the image carrying most labeled samples (we take each image in turn as the leading domain image). In our setting, we also need labeled pixels from the two other acquisitions: we tested an increasing number of additional labeled samples per class. As in [29], the unlabeled examples are selected using an iterative clustering algorithm, the bisecting $k$-means [41], which runs $k$-means with two clusters iteratively, each time splitting the current largest cluster in the dataset. This way, we sample the unlabeled examples for each image source. We use the labeled and unlabeled examples to extract both the SSMA and KEMA projections and then project all images in the latent space. Finally, we use all the projected labeled examples to train a single classifier (a linear SVM) in the latent space. This classifier is used to predict all the test pixels of all three images at once (i.e. no specific training is performed for the specific images separately).
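The selection of unlabeled examples by bisecting $k$-means can be sketched as below. The splitting loop follows the description above (2-means applied repeatedly to the largest cluster); the final rule, which keeps the sample closest to each cluster mean, is an assumption made for illustration, as are the function names and the fixed number of Lloyd iterations.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Plain Lloyd iterations with two centres; returns 0/1 labels."""
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def bisecting_kmeans(X, n_clusters, seed=0):
    """Split the largest cluster in two until n_clusters index sets exist."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(largest)
        labels = two_means(X[idx], rng)
        if labels.min() == labels.max():       # degenerate split: halve arbitrarily
            labels = np.arange(len(idx)) % 2
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

def representatives(X, clusters):
    """Index of the sample closest to each cluster mean (assumed sampling rule)."""
    return np.array([
        idx[((X[idx] - X[idx].mean(axis=0)) ** 2).sum(axis=1).argmin()]
        for idx in clusters
    ])
```

Calling `representatives(X, bisecting_kmeans(X, m))` returns `m` pixel indices that cover the spectral variability of the image more evenly than a purely random draw would.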

In KEMA, we use RBF kernels with the bandwidth fixed as half the median distance between the samples of the specific image (labeled or unlabeled). By doing so, we allow different kernels in each domain, thus tailoring the similarity function to the data structure observed [33]. To build the graph Laplacians, we used a series of $k$-NN graphs as in [29]. We validated the optimal number of dimensions, as well as the optimal regularization parameter of the SVM classifier, using the labeled samples in a cross-validation setting. Finally, as in [29] we add a baseline, which is the classifier learned with the original features. Since the Zurich image has a different input space than the two others, only the common bands between QuickBird and WorldView-2 are considered.
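The bandwidth rule stated above (half the median pairwise distance within each image) is easy to reproduce. The sketch below assumes Euclidean distances and the common parameterization $\exp(-\|x - z\|^2 / (2\sigma^2))$ of the RBF kernel, which is not specified in the text; the function names are hypothetical.

```python
import numpy as np

def median_bandwidth(X, factor=0.5):
    """factor times the median Euclidean distance between distinct rows of X."""
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)       # each pair counted once
    return factor * np.median(dist[upper])

def rbf_kernel(X, sigma):
    """RBF kernel matrix of one domain (assumed 2 * sigma^2 normalisation)."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

Because the bandwidth is computed per image, a domain whose samples are tightly packed automatically receives a narrower kernel than a domain with a wide dynamic range.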

The results are reported in Fig. 3. Two distinct behaviours are observed:

  • Diagonal blocks of Fig. 3 (when predicting the leading domain image, which carried most labels): in these cases, the predictions of KEMA are better than those of SSMA and remain consistent when adding samples from the other domains. This means that the images are aligned correctly and the inclusion of labels from other images does not disturb the classifier (as it does in the ‘no adaptation’ case). On the contrary, adding labeled samples from the other images is beneficial, as one can observe by comparing the KEMA results with the case obtained when using only the 100 labeled pixels per class from the leading image (green bars): the final prediction is 5-10% more accurate than in the case where the leading image is used alone (i.e. without extra labeled samples coming from the other acquisitions). This means that the extra labeled samples are aligned correctly, since the classifier trained with the additional aligned examples outperforms the one obtained with the pixels of the leading image alone.

  • Off-diagonal blocks of Fig. 3 (when predicting the two other, scarcely labeled images): in the off-diagonal blocks we can observe a constant improvement over the results obtained by SSMA, which already correspond to a strong improvement over the ‘no adaptation’ case. The improvement of KEMA with respect to the latter is more striking when using few labels from the test images.

3.2 Shadow compensation in hyperspectral image classification

In this experiment, we aim at compensating the reduction in reflectance due to a shadow cast by a large cloud. We consider a hyperspectral image acquired by the CASI sensor over Houston (see Fig. 4a). The data were originally provided to the community for the data fusion contest 2013 [42]. The contest was framed as a land use classification contest, where 15 land use classes were to be detected using two data sources: the hyperspectral image mentioned and a LiDAR DSM. The specificity of the contest is that the test pixels are partly located under a shadow cast by clouds (see Fig. 5d), thus raising the need for compensation algorithms. In our analysis, we compare three strategies for handling the hyperspectral image: using it without further processing (‘Raw’), applying a histogram matching (HM) on the shadowed area (the strategy also used before extracting features in [43]), and the proposed KEMA, aligning the pixels under the shadow and those illuminated. For both HM and KEMA, we define the shadowed pixels through a cloud mask obtained by thresholding band 130 and then applying morphological operators to remove salt and pepper noise within the bigger connected component representing the shadow (cf. the mask in Fig. 4d).
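The mask construction described above can be illustrated with a threshold followed by a morphological closing and opening. The sketch is only indicative: the threshold value, the direction of the comparison (shadowed pixels assumed darker), the 3x3 structuring element and the order of the operators are all assumptions, and the selection of the largest connected component is left out.

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    padded = np.pad(mask, 1)                   # pads with False
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask):
    """Binary erosion, written as the dual of dilation."""
    return ~dilate(~mask)

def shadow_mask(band, threshold):
    """Threshold one band, then close small holes and remove isolated pixels."""
    mask = np.asarray(band) < threshold        # assumption: shadow is darker
    closed = erode(dilate(mask))               # closing fills bright specks
    return dilate(erode(closed))               # opening removes dark specks
```

The closing fills isolated bright pixels inside the shadow and the opening removes isolated dark pixels outside of it, which is the salt and pepper effect mentioned in the text.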

In this experiment, we align the dataset using 20 labeled pixels per class. We use only classes occurring in both domains (shadowed and illuminated). Additionally, we randomly sample 200 unlabeled pixels per class. As for the first example, the kernel used in KEMA is an RBF with bandwidth estimated as half of the median distance between the points of the domain. This is very important in this experiment, since it allows a much narrower bandwidth for the kernel acting on the shadowed domain than for the one used in the illuminated domain. We classify using a support vector machine with RBF kernel, whose parameters are found by cross-validation. We train the classifier on 95% of the available training set and predict on two validation datasets: the entire test set and the test samples under the shadowed area. We consider three feature sets, as detailed in Table 1, and use them in three experiments: the first using only the HSI, the second adding LiDAR-derived features, and the third adding contextual features extracted from the optical bands. A last setting, called MV, uses all features, and also applies a majority voting on the solution. The experiments are repeated 10 times by varying the labeled pixels in KEMA and those picked for classification: we therefore report the average and standard deviation.


Figure 4: Domains reprojected by KEMA. (a): original CASI image. (b): first three dimensions of the latent space (R: 1, G: 2, B: 3). (c): dimensions 4-6. (d): cloud mask defining the two domains.
Feature set | Without KEMA | With KEMA
HSI | Hyperspectral bands (144) | KEMA aligned features (50)
LiDAR | LiDAR band + opening and closing by reconstruction features (7) | same features (7)
AVG | Average filters applied on the 10 first principal component projections (10) | Average filters applied on the 10 first KEMA projections (10)

Table 1: Three feature types used in the experiment. Number in brackets is the number of features involved in each group.

The projections extracted by KEMA are visualized in Fig. 4 (geographical space, for projections 1-3 and 4-6) and Fig. 6 (feature space). At first glance, the aligned features seem to be less dependent on the presence of the shadow than the original image (some artifacts remain at the border, due to the binary nature of the cloud mask). This is confirmed in the feature space, where the two domains seem correctly aligned both in terms of classes and domains.

Figure 5: Classification maps for the three settings (Raw, HM and KEMA). (left) using the spectral bands; (right) performing a majority voting on the map obtained by stacking HSI, LiDAR and AVG features (for averaged numerical results, see Tab. 2). Bottom line shows the test samples and the cloud mask.

Figure 6: Projection per class (a) and per domain (b, shadow is in blue and illuminated in red) for the Houston data.

The classification results reported in Table 2 confirm these intuitions: KEMA is able to provide higher classification performance by working in the aligned latent space. The use of the raw images (‘Raw’ column), even though satisfactory on the global test set (OA of 85.5% in the best case), completely fails under the shadowed area (best OA: 23.8%). This can also be appreciated in the classification maps (first row in Fig. 5): from the maps it is clear that the shadow drains most of the shadowed pixels into the class ‘water’ (in cyan). Even including LiDAR features (right column of Fig. 5) does not entirely solve the problem and basically shifts most of the shadowed pixels into the class ‘highway’ (in beige). Using HM drastically improves the solution under the shadow, since the accuracy goes from 23.8% to 75.1% on average. Histogram matching solves the problem globally and provides the scaling and centering of the histogram necessary to make the images more similar, but still fails at accounting for subtle local variations, thus still leading to heavy misclassifications in the final map, in particular the highway being classified as buildings (see second row of Fig. 5). Finally, KEMA solves the problem locally thanks to the flexibility of the kernel mapping: the accuracies are the highest (also matching those of the winners of the contest, who created an entirely ad-hoc system for this specific image) and reach an average of 94.3%, with an almost identical performance in the shadowed area (91.5%). The alignment has made the two domains more similar and the mismatch between domains becomes almost invisible in the classification maps (third row of Fig. 5).
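For reference, the HM baseline discussed above can be sketched as a per-band quantile mapping from the shadowed pixels to the illuminated ones. The exact variant used in the experiments is not specified in the text, so the implementation below (rank-based quantiles with linear interpolation, applied band by band) and its function name are assumptions.

```python
import numpy as np

def histogram_match(source, reference):
    """Map the values of `source` so that their distribution follows `reference`."""
    src = np.asarray(source, dtype=float).ravel()
    ref = np.sort(np.asarray(reference, dtype=float).ravel())
    ranks = np.argsort(np.argsort(src))                # rank of each source value
    src_q = (ranks + 0.5) / src.size                   # quantile of each source value
    ref_q = (np.arange(ref.size) + 0.5) / ref.size     # quantiles of the reference
    return np.interp(src_q, ref_q, ref).reshape(np.shape(source))

def match_shadowed_pixels(image, mask):
    """image: (rows, cols, bands) array; mask: boolean (rows, cols), True under the shadow."""
    out = np.array(image, dtype=float)
    for b in range(out.shape[2]):                      # one monotone mapping per band
        band = out[:, :, b]
        band[mask] = histogram_match(band[mask], band[~mask])
    return out
```

Such a mapping is monotone and identical for every pixel of a band, which is consistent with the observation that HM corrects the global scaling and centering but cannot account for local, class-dependent variations.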

HSI processing | Raw | HM | KEMA (us)
Entire test set
HSI | 71.0 ± 0.1 | 79.5 ± 0.4 | 83.8 ± 1.9
+ LiDAR | 83.4 ± 0.2 | 86.4 ± 0.7 | 89.4 ± 1.4
+ AVG | 85.1 ± 0.2 | 84.5 ± 0.4 | 93.0 ± 0.8
+ MV | 85.5 ± 0.2 | 86.0 ± 0.3 | 94.3 ± 0.8
Shadowed areas in the test set
HSI | 4.2 ± 0.1 | 67.4 ± 0.7 | 70.0 ± 1.0
+ LiDAR | 22.5 ± 0.3 | 77.1 ± 1.3 | 82.6 ± 5.4
+ AVG | 23.2 ± 1.2 | 73.6 ± 0.8 | 90.4 ± 4.9
+ MV | 23.8 ± 1.2 | 75.1 ± 0.9 | 91.5 ± 4.5

Table 2: Classification results (overall accuracy in %, average ± standard deviation over the 10 runs) for the Houston data.

3.3 Multi-source image classification without labels

In the last experiment, we break the requirement for labeled data in all domains. To do so, we need to reduce the flexibility of KEMA by adding a requirement on partial spatial overlap between the scenes. This can be understood as follows: KEMA is a spectral registration method that uses the labels as anchor points (or ties) to register the domains spectrally. If one of the domains is unlabeled, it is not possible to register them, since the $L_s$ and $L_d$ matrices in Eq. (10) cannot be computed. As a consequence, we can only preserve the inner domain geometry using $L_g$, but there is no way to find the matching between domains.

Figure 7: Setting of the multi-source experiment. The cyan square represents the source domain image (RGB) and the red square the target domain image (NIR-R-G). They share a spatial subset, where the semantic ties are used to align the domains. The dark blue, green and yellow squares are the images detailed in Fig. 8, used for both the semantic ties definition and the numerical assessment.

Figure 8: Images involved in the multi-source experiment (corresponding to the dark blue, green and yellow squares in Fig. 7). (a) Prilly: source domain (RGB); (b) spatially overlapping area with semantic ties; (c) Renens: target domain (unlabeled, NIR-R-G).

When using geographical data (such as remote sensing data), a special case can break this requirement: whenever the domains are (at least partially) co-located in space. In this case, represented in Fig. 7, the two images share a spatial region, where we can co-locate objects, for instance by feature keypoint matching or by manual registration. Once these matches are found, they can be used to build the matrix $W_s$, since, even if we ignore their class, we know that the pixels of the matched objects belong to the same class (they are known as semantic ties [34]). This type of weakly supervised alignment has been recently proposed in [44] and we use it here prior to aligning the data spaces with KEMA. The experiment is set as follows:

  • We use an RGB image (0.6m resolution) over the area of Prilly, a neighbourhood of Lausanne, Switzerland, as source domain. The area is labeled into five classes (roads, buildings, trees, grass and shadows) by manual photo-interpretation, see Fig. 8a.

  • An FCIR (false colour infrared with NIR-R-G bands) ortho-photo of the area of Renens (another neighbourhood of Lausanne), at 0.25 cm resolution, is used as target domain and the labels are this time kept hidden (they are only used for validation), see Fig. 8c.

  • To find the projections with KEMA, we use an overlapping area between the two images. The overlapping areas are neither registered nor at the same spatial resolution: to match them, we provide 40 tie objects by manual drawing in both images (the operation takes less than 5 minutes), see Fig. 8b.

We use the labels in the source and the semantic ties to construct the $W_s$ matrix. For the $L_d$ matrix, we extracted the graph Laplacian from a dissimilarity matrix $W_d$ with a larger value for pixels from different classes in the source and a smaller value for pixels issued from different objects in the semantic ties. We give a smaller penalization in the latter case, since two pixels coming from different objects can still belong to the same class. Once the domains are aligned, we train a linear SVM with 100 labeled pixels per class from the source domain (the RGB image) and test on 400 pixels per class in the target domain (the FCIR image).
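The construction of the two class graphs from source labels and semantic ties can be sketched as follows. The weights `w_class=1.0` and `w_obj=0.5` are placeholders chosen for illustration (the text only states that the penalization is smaller for pixels from different objects), the `-1` code for "no label" or "no object" is a convention of this sketch, and the function names are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def tie_laplacians(src_class, src_obj, tgt_obj, w_class=1.0, w_obj=0.5):
    """Class labels exist in the source only; object ids link the two domains.

    src_class, src_obj: per source pixel, class id and tie-object id (-1 = none).
    tgt_obj: per target pixel, tie-object id (-1 = none).
    """
    cls = np.concatenate([src_class, -np.ones(len(tgt_obj), dtype=int)])
    obj = np.concatenate([src_obj, tgt_obj])
    both_cls = (cls >= 0)[:, None] & (cls >= 0)[None, :]
    both_obj = (obj >= 0)[:, None] & (obj >= 0)[None, :]
    same_cls = both_cls & (cls[:, None] == cls[None, :])
    same_obj = both_obj & (obj[:, None] == obj[None, :])
    Ws = (same_cls | same_obj).astype(float)           # same class or same object
    # Known classes decide first; otherwise different objects get a weaker penalty.
    Wd = np.where(both_cls, np.where(same_cls, 0.0, w_class),
                  np.where(both_obj & ~same_obj, w_obj, 0.0))
    return laplacian(Ws), laplacian(Wd)
```

The only cross-domain links come from pixels that share a tie object, which is what allows the alignment to proceed without any class label in the target image.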

The projections retrieved are illustrated in Fig. 9: as for the previous examples, KEMA yields data spaces that are aligned, but also discriminative in terms of the objects aligned: the bottom line in Fig. 9 illustrates six objects among the 40 semantic ties used to find the alignment. Figure 10 reports the classification performance in the FCIR domain: starting with six dimensions, KEMA outperforms the case where the RGB image is used to predict the FCIR one without any adaptation (to maximize the performance of the ‘no alignment’ case, we use the bands that share comparable wavelengths across domains). When using 13 dimensions, KEMA performs comparably to a model trained on labeled pixels from the target domain itself (green line in the figure). We compare these results to those obtained by applying kCCA [40]. In order to compute the projection, we considered each object (each semantic tie in Fig. 8b) as a sample and used the spectrum of the pixel most similar to the object average to describe it. We then extract the kCCA projections between the 40 pairs of corresponding objects across image acquisitions. Back to the numerical results in Fig. 10, the performance of KEMA is consistently better than that of kCCA. This is probably due to two reasons: 1) KEMA does not need a one-to-one correspondence and thus all the pixels in an object are taken into account for the projection, and 2) class separability is explicitly taken into account by using the labels in the source domain.
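The object descriptor used for the kCCA comparison (the spectrum of the pixel most similar to the object average) can be written in a few lines. The sketch assumes Euclidean distance as the similarity measure, which the text does not specify, and the function name is hypothetical.

```python
import numpy as np

def object_representative(pixels):
    """Return the row of `pixels` (n_pixels, n_bands) closest to the mean spectrum."""
    pixels = np.asarray(pixels, dtype=float)
    mean = pixels.mean(axis=0)
    closest = ((pixels - mean) ** 2).sum(axis=1).argmin()
    return pixels[closest]
```

Applied to each of the 40 tie objects in both images, this yields 40 paired spectra, i.e. the one-to-one correspondences that kCCA requires and that KEMA does not.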

Figure 9: Projections found by KEMA, colored by domain (top) and by object in the semantic ties set (bottom, six objects shown). The left panel shows the unprojected data [x axis: R, y axis: G, z axis: NIR or B]; the right panel shows the projections by KEMA [Projections 1, 3 and 5].

Figure 10: Classification performance of a linear SVM using the labeled samples from the source domain (RGB) as they are (red line), projected by KEMA (blue line), or projected by kCCA (magenta line). The green line is a baseline obtained by training with labeled pixels from the target domain (FCIR).

4 Conclusions

In this paper, we presented a manifold alignment method based on kernels. The presented KEMA method is a feature extractor that finds projections from all the available source domains into a joint latent space, where data is semantically aligned and class separability enhanced. Compared to recent manifold alignment methods, KEMA offers a more flexible framework, going beyond simple linear transformations (scalings and rotations) of the input data. KEMA exploits a few labeled samples (or semantic ties) in each domain along with the wealth of unlabeled samples. KEMA reduces to solving a simple generalized eigenvalue problem, and has very few (and interpretable) hyperparameters to tune. We successfully tested KEMA in multi-temporal and multi-source very high resolution classification tasks, as well as on the task of making a model invariant to shadows for hyperspectral imaging.

KEMA can be seen as a multivariate method for data pre-processing in general applications where multi-sensor, multi-modal data are acquired. The generality of the approach opens a wide field in remote sensing data processing applications. Our next steps with KEMA involve 1) performing semi-automatic atmospheric compensation in multi-temporal settings, 2) reducing the impact of the few labeled examples needed to perform the alignment, and 3) extending KEMA to challenging regression problems.


The authors would like to thank the Hyperspectral Image Analysis group and the NSF Funded Center for Airborne Laser Mapping (NCALM) at the University of Houston for providing the hyperspectral data used in the shadow correction experiment, and the IEEE GRSS Data Fusion Technical Committee for organizing the 2013 Data Fusion Contest. They would also like to thank Swisstopo for making the FCIR orthophotos available for academic use.


  • [1] G. P. Asner, D.E. Knapp, E.N. Broadbent, P.J.C. Oliveira, M. Keller, and J.N. Silva, “Ecology: Selective logging in the Brazilian Amazon,” Science, vol. 310, pp. 480–482, 2005.
  • [2] G. P. Asner, E. N. Broadbent, P. J. C. Oliveira, M. Keller, D. E. Knapp, and J. N. M. Silva, “Condition and fate of logged forests in the Brazilian Amazon,” Proc. Nat. Ac. Science (PNAS), vol. 103, no. 34, pp. 12947–12950, 2006.
  • [3] D. Brunner, G. Lemoine, and L. Bruzzone, “Earthquake damage assessment of buildings using VHR optical and SAR imagery,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2403–2420, 2010.
  • [4] H. Taubenböck, M. Wurm, M. Netzband, H. Zwenzner, A. Roth, A. Rahman, and S. Dech, “Flood risks in urbanized areas - multi-sensoral approaches using remotely sensed data for risk assessment,” Nat. Hazards Earth Sys. Science, vol. 11, pp. 431–444, 2011.
  • [5] L. Bruzzone and D. Fernandez-Prieto, “Unsupervised retraining of a maximum likelihood classifier for the analysis of multitemporal remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 2, pp. 456–460, 2001.
  • [6] A. A. Nielsen, “Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data,” IEEE Trans. Im. Proc., vol. 11, no. 3, pp. 293–305, 2002.
  • [7] J. Amorós-López, L. Gómez-Chova, L. Alonso, L. Guanter, R. Zurita-Milla, J. Moreno, and G. Camps-Valls, “Multitemporal fusion of Landsat/TM and ENVISAT/MERIS for crop monitoring,” Int. J. Appl. Earth Obs. Geoinf., in press.
  • [8] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson, “Advances in hyperspectral image classification,” IEEE Signal Proc. Mag., vol. 31, pp. 45–54, 2014.
  • [9] G. Matasci, N. Longbotham, F. Pacifici, M. Kanevski, and D. Tuia, “Understanding angular effects in VHR imagery and their significance for urban land-cover model portability: a study of two multi-angle in-track image sequences,” ISPRS J. Int. Soc. Photo. Remote Sens., vol. 107, pp. 99–111, 2015.
  • [10] M. D. Fleming, J. S. Berkebile, and R. M. Hoffer, “Computer-aided analysis of LANDSAT-I MSS data: a comparison of three approaches, including a “modified clustering” approach,” LARS information note 072475, Purdue University, 1975.
  • [11] I. Olthof, C. Butson, and R. Fraser, “Signature extension through space for northern landcover classification: A comparison of radiometric correction methods,” Remote Sens. Environ., vol. 95, no. 3, pp. 290–302, 2005.
  • [12] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, 2010.
  • [13] V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, “Visual domain adaptation: a survey of recent advances,” IEEE Signal Proc. Mag., vol. 32, no. 3, pp. 53–69, 2015.
  • [14] D. Tuia, C. Persello, and L. Bruzzone, “Recent advances in domain adaptation for the classification of remote sensing data,” IEEE Geosci. Remote Sens. Mag., in press.
  • [15] L. Guanter, R. Richter, and H. Kaufmann, “On the application of the MODTRAN4 atmospheric radiative transfer code to optical remote sensing,” Int. J. Remote Sens., vol. 30, pp. 1407–1424, 2009.
  • [16] L. Bruzzone and C. Persello, “A novel approach to the selection of spatially invariant features for the classification of hyperspectral images with improved generalization capability,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 9, pp. 3180–3191, 2009.
  • [17] M. Volpi, G. Camps-Valls, and D. Tuia, “Spectral alignment of cross-sensor images with automated kernel canonical correlation analysis,” ISPRS J. Int. Soc. Photo. Remote Sens., vol. 107, pp. 50–63, 2015.
  • [18] H. Sun, S. Liu, S. Zhou, and H. Zou, “Unsupervised cross-view semantic transfer for remote sensing image classification,” IEEE Geosci. Remote Sens. Lett., vol. PP, no. 99, pp. 1–5, 2016.
  • [19] H. Sun, S. Liu, S. Zhou, and H. Zou, “Transfer sparse subspace analysis for unsupervised cross-view scene model adaptation,” IEEE J. Sel. Topics Appl. Earth Observ., vol. PP, no. 99, pp. 1–9, 2016.
  • [20] E. Izquierdo-Verdiguier, V. Laparra, L. Gómez-Chova, and G. Camps-Valls, “Encoding invariances in remote sensing image classification with SVM,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 5, pp. 981–985, 2013.
  • [21] F. Pacifici, N. Longbotham, and W. J. Emery, “The importance of physical quantities for the analysis of multitemporal and multiangular optical very high spatial resolution images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6241–6256, 2014.
  • [22] J. Verrelst, J. G. P. W. Clevers, and M. E. Schaepman, “Merging the Minnaert-k parameter with spectral unmixing to map forest heterogeneity with CHRIS/PROBA data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4014–4022, 2010.
  • [23] S. Rajan, J. Ghosh, and M. Crawford, “Exploiting class hierarchy for knowledge transfer in hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3408–3417, 2006.
  • [24] L. Bruzzone and M. Marconcini, “Domain adaptation problems: A DASVM classification technique and a circular validation strategy,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 5, pp. 770–787, 2010.
  • [25] G. Matasci, D. Tuia, and M. Kanevski, “SVM-based boosting of active learning strategies for efficient domain adaptation,” IEEE J. Sel. Topics Appl. Earth Observ., vol. 5, no. 5, pp. 1335–1343, 2012.
  • [26] L. Gómez-Chova, D. Tuia, G. Moser, and G. Camps-Valls, “Multimodal classification of remote sensing images: A review and future directions,” Proceedings of the IEEE, vol. 103, no. 9, pp. 1560–1584, 2015.
  • [27] C. Wang, P. Krafft, and S. Mahadevan, “Manifold alignment,” in Manifold Learning: Theory and Applications, Y. Ma and Y. Fu, Eds. CRC Press, 2011.
  • [28] J. Ham, D. D. Lee, and L. K. Saul, “Semisupervised alignment of manifolds,” in Proc. Int. Workshop Artificial Intelligence and Statistics, 2005.
  • [29] D. Tuia, M. Volpi, M. Trolliet, and G. Camps-Valls, “Semisupervised manifold alignment of multimodal remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 12, pp. 7708–7720, 2014.
  • [30] H.L. Yang and M.M. Crawford, “Spectral and spatial proximity-based manifold alignment for multitemporal hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 1, pp. 51–64, 2016.
  • [31] H.L. Yang and M.M. Crawford, “Domain adaptation with preservation of manifold geometry for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ., vol. PP, no. 99, pp. 1–13, 2016.
  • [32] D. Liao, D. Qian, J. Zhou, and Y.Y. Tang, “A manifold alignment approach for hyperspectral image visualization with natural color,” IEEE Trans. Geosci. Remote Sens., in press.
  • [33] D. Tuia and G. Camps-Valls, “Kernel manifold alignment for domain adaptation,” PLoS ONE, vol. 11, no. 2, pp. e0148655, 2016.
  • [34] J. Montoya-Zegarra, C. Leistner, and K. Schindler, “Semantic tie points,” in Proc. IEEE WACV, Clearwater Beach, FL, 2013.
  • [35] C. Wang and S. Mahadevan, “Heterogeneous domain adaptation using manifold alignment,” in International Joint Conference on Artificial Intelligence (IJCAI), 2011.
  • [36] G. Camps-Valls and L. Bruzzone, Eds., Kernel methods for Remote Sensing Data Analysis, Wiley & Sons, UK, Dec 2009.
  • [37] S. Yan, D. Xu, B. Zhang, H.J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, 2007.
  • [38] A. Lagrange, B. Le Saux, A. Beaupere, A. Boulch, A. Chan-Hon-Tong, S. Herbin, H. Randrianarivo, and M. Ferecatu, “Benchmarking classification of earth-observation data: From learning explicit features to convolutional networks,” in Proc. IGARSS, Milan, Italy, 2015, pp. 4173–4176.
  • [39] D. Marmanis, M. Datcu, T. Esch, and U. Stilla, “Deep-learning earth observation classification using imagenet pre-trained networks,” IEEE Geosci. Remote Sensing Lett., in press.
  • [40] P. L. Lai and C. Fyfe, “Kernel and nonlinear canonical correlation analysis,” Int. J. Neural Sys., 2000, pp. 365–377.
  • [41] R. Kashef and M.S. Kamel, “Enhanced bisecting k-means clustering using intermediate cooperation,” Pattern Recogn., vol. 42, no. 11, pp. 2557–2569, 2009.
  • [42] C. Debes, A. Merentitis, R. Heremans, J. Hahn, N. Frangiadakis, T. van Kasteren, W. Liao, R. Bellens, A. Pizurica, S. Gautama, W. Philips, S. Prasad, Q. Du, and F. Pacifici, “Hyperspectral and lidar data fusion: Outcome of the 2013 GRSS Data Fusion Contest,” IEEE J. Sel. Topics Appl. Earth Observ. and Remote Sensing, vol. 7, no. 6, pp. 2405–2418, 2014.
  • [43] D. Tuia, N. Courty, and R. Flamary, “Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions,” ISPRS J. Int. Soc. Photo. Remote Sens., vol. 105, pp. 272–285, 2015.
  • [44] D. Marcos, R. Hamid, and D. Tuia, “Geospatial correspondence for multimodal registration,” in Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016.