Effective Spectral Unmixing via Robust Representation and Learning-based Sparsity

09/02/2014, by Feiyun Zhu et al.

Hyperspectral unmixing (HU) plays a fundamental role in a wide range of hyperspectral applications. It is still challenging due to the common presence of outlier channels and the large solution space. To address the above two issues, we propose a novel model by emphasizing both robust representation and learning-based sparsity. Specifically, we apply the ℓ_2,1-norm to measure the representation error, preventing outlier channels from dominating our objective. In this way, the side effects of outlier channels are greatly relieved. Besides, we observe that the mixed level of each pixel varies over image grids. Based on this observation, we exploit a learning-based sparsity method to simultaneously learn the HU results and a sparse guidance map. Via this guidance map, the sparsity constraint in the ℓ_p (0<p≤1)-norm is adaptively imposed according to the learnt mixed level of each pixel. Compared with state-of-the-art methods, our model is better suited to the real situation and is thus expected to achieve better HU results. The resulting objective is highly non-convex and non-smooth, making it hard to optimize. As a profound theoretical contribution, we propose an efficient algorithm to solve it. Meanwhile, the convergence proof and the computational complexity analysis are systematically provided. Extensive evaluations verify that our method is highly promising for the HU task: it achieves very accurate guidance maps and much better HU results compared with state-of-the-art methods.


I Introduction

Hyperspectral unmixing (HU) is one of the most fundamental processes for various hyperspectral applications, such as sub-pixel mapping [1], high-resolution hyperspectral imaging [2], hyperspectral enhancement [3], hyperspectral compression and reconstruction [4], hyperspectral visualization and understanding [5, 6], and the detection and identification of substances in the scene [3, 7], etc. Generally, the goal of HU is to break down each pixel spectrum into a set of “pure” spectra (called endmembers, such as the spectra of water, grass, tree, etc.), weighted by the corresponding proportions, called abundances. Formally, HU methods take in a hyperspectral image $\mathbf{Y}$ with $L$ channels and $N$ pixels, and assume that each pixel $\mathbf{y}_n$ is a composite of $K$ endmembers $\{\mathbf{m}_k\}_{k=1}^{K}$. Specifically, the linear combinatorial model is the most popular one:

$$\mathbf{y}_n \approx \sum_{k=1}^{K} a_{kn}\,\mathbf{m}_k = \mathbf{M}\mathbf{a}_n \tag{1}$$

where $a_{kn}$ is the composite abundance of the $k$-th endmember at pixel $n$, and $\mathbf{a}_n=[a_{1n},\ldots,a_{Kn}]^{\top}$. In the unsupervised setting, both endmembers and abundances are unknown. This makes the objective function non-convex and the solution space very large [6, 8]. Thus, reasonable prior knowledge is required to restrict the solution space, or even to bias the solution toward good stationary points.
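For concreteness, the following toy numpy sketch (our illustration, not the authors' code; all sizes and values are assumed) instantiates the linear mixing model (1):

```python
import numpy as np

# Toy instance of the linear mixing model (1); L, N, K are assumed sizes.
L, N, K = 100, 400, 4
M = np.abs(np.random.randn(L, K))         # endmember matrix: one spectrum per column
A = np.random.dirichlet(np.ones(K), N).T  # abundances: one column per pixel, sums to one
Y = M @ A + 0.01 * np.random.randn(L, N)  # observed image: L channels x N pixels
print(Y.shape)                            # (100, 400)
```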

Figure 1: As the mixed level of each pixel varies over image grids, the sparsity constraint should be individually imposed according to the mixed level of each pixel, rather than roughly imposed at the same strength for all pixels. (a) hyperspectral image of two substances: soil and tree. (b) abundance map. (c) guidance map exhibiting the mixed level of each pixel—darker indicates more mixed. (Best viewed in color)

To shrink the solution space, various constraints are imposed upon abundances [8, 7, 9, 10, 11] as well as upon endmembers [12, 13]. Although these methods work to some extent, they are far from optimal for the following two reasons. First, the side effects of blank or badly degraded channels are ignored by the existing methods. As many degraded channels deviate significantly from the majority of hyperspectral channels, the objective of existing methods is easily dominated by outlier channels, leading to failed learning. Second, almost all existing constraints are roughly imposed at the same strength for all factors. Such an implementation does not match the practical situation; an example is illustrated in Fig. 1, where the sparse level of each pixel varies over image grids. Therefore, it is more reasonable to impose an individual strength of sparsity constraint according to the mixed level of each pixel, rather than roughly impose the same strength of sparsity constraint for all pixels. It is expected that more accurate priors would bias the solution to more satisfactory local minima.

To alleviate the above two problems, we propose a novel method, named effective spectral unmixing via robust representation and learning-based sparsity (RRLbS), for the HU task. Specifically, the ℓ_2,1-norm is employed to measure the representation loss, preventing large errors from dominating our objective. In this way, the robustness against outlier channels is greatly enhanced. Besides, a learning-based sparsity method is exploited to individually impose the sparsity constraint according to the mixed level of each pixel. The main contributions of this work are summarized as follows.

  • The side effects of badly degraded channels are generally ignored by existing methods. To our knowledge, in the HU field, this is the first work to propose an ℓ_2,1-norm based robust model to relieve the side effects of outlier channels.

  • Besides, we propose a novel learning-based sparsity method to simultaneously learn the HU results and a guidance map. Through this guidance map, the mixed level of every pixel is both described and respected, by imposing an individual sparsity constraint according to the mixed level of each pixel. This implementation helps to achieve highly encouraging HU results and guidance maps.

  • We propose an efficient algorithm to solve the joint ℓ_2,1-norm and ℓ_p-norm based objective, which is highly non-convex, non-smooth and challenging to solve. Both theoretical and empirical analyses are conducted to verify its convergence property. Besides, the computational complexity is theoretically analyzed as well.

The rest of this paper is organized as follows: in Section II, the related HU work is systematically reviewed. Section III presents the new model (RRLbS) and its physical motivations. The algorithm, as well as its theoretical convergence proof and computational complexity analysis, is given in Section IV. Then, extensive evaluations are provided in Section V. Finally, the conclusion of this work is drawn in Section VI.

II Previous Work

The existing HU methods are typically categorized into three types: supervised methods [3, 4], weakly supervised methods [14, 15] and unsupervised methods [16, 17, 9, 18, 7, 13]. For the supervised methods, the endmembers are given in advance; only the abundances need to be estimated. Although the HU problem is greatly simplified in this way, it is usually inconvenient or intractable to obtain feasible endmembers in a supervised manner, which hampers the acquisition of good estimations.

Accordingly, the weakly supervised methods [14, 15] have been proposed. A large library of substance spectra is collected by a field spectrometer beforehand. Then, the HU task becomes the problem of finding the optimal subset of spectra in the library that can best represent all pixels in the scene [14]. Unfortunately, the library is far from optimal because the spectra in it are not standardized. First, for different hyperspectral sensors, the spectral shapes of the same substance are greatly inconsistent. Second, for various hyperspectral images, the length of the pixel spectra differs largely as well—different images may be recorded at entirely different numbers of channels. Finally, the recording conditions are highly different as well—some hyperspectral images are captured from outer space, while others are obtained from airplanes or the ground. Due to atmospheric effects etc., different recording conditions lead to different spectral appearances. In short, the weaknesses of the library bring side effects to this kind of method.

More commonly, the endmembers are selected from the image itself to ensure spectral coherence [3], and the unsupervised HU methods are preferred. The unsupervised HU methods can generally be categorized into two types: geometric methods [19, 20, 21, 22, 23, 24] and statistical ones [25, 26, 27, 28, 29]. The geometric methods usually exploit a simplex to model the distribution of spectral pixels, and the vertices are treated as endmembers. Perhaps N-FINDR [30] and Vertex Component Analysis (VCA) [20] are the most typical geometric methods. For N-FINDR, the endmembers are extracted by inflating a simplex inside the hyperspectral pixel distribution and treating the vertices of the simplex with the largest volume as endmembers [30]. VCA [20] projects all pixels onto a direction orthogonal to the simplex spanned by the already chosen endmembers; the new endmember is identified as the extreme point of the projection. Although these methods are simple and fast, they suffer from the requirement of pure pixels, which is usually not satisfied in practice [9, 7, 31].

Accordingly, many statistical methods have been proposed for or applied to the HU problem, among which Nonnegative Matrix Factorization (NMF) [32] and its extensions are the most popular. The goal of NMF is to find two nonnegative matrices whose product approximates the original matrix [11]. There are two valuable reasons to apply the nonnegative constraint on both factor matrices. First, both endmembers and abundances should be nonnegative, which means that the NMF model is physically suited to the HU task. Second, the nonnegative constraint only allows for additive combinations, yielding a parts-based representation [11]. This parts-based property makes the factorization results more intuitive and interpretable, as studies in psychology and physiology have shown that the human brain also works in a parts-based manner [33, 34].

Although NMF is well adapted to applications such as face analysis [35, 36] and document clustering [37, 38], its objective function is non-convex, naturally resulting in a large solution space [39]. Many extensions have been proposed by employing all kinds of priors to restrict the solution space. For the HU task, the priors are imposed either upon abundances [7, 10, 8] or upon endmembers [12, 13]. For example, the Local Neighborhood Weights regularized NMF method [10] (W-NMF) assumes that the hyperspectral pixels are distributed on a manifold structure and exploits appropriate weights in the local neighborhood to enhance the spectral and spatial information [40]. This information can eventually be transferred to the abundance space via the Laplacian graph constraint. However, this constraint has a smoothing effect, which weakens the parts-based property of NMF.

Inspired by MVC-NMF [12], Wang et al. [13] proposed the EDC-NMF method. Its basic assumption originates from two perspectives. First, due to the high spectral resolution of hyperspectral sensors, each endmember signal should itself be smooth. Besides, the endmember signals should possess distinct shapes so that different materials can be separated [3]. However, in their algorithm, they take a derivative along the endmember vector, introducing negative values into the updating rule. To compensate for this drawback, the elements in the endmember matrix have to be projected to a given nonnegative value after each iteration. Consequently, the regularization parameters cannot be chosen freely, which limits the efficacy of this method.

The sparsity-based methods are the most successful methods for the HU task. They assume that most pixels in hyperspectral images are mixed by a subset of the endmembers, rather than all of them, and thus employ various kinds of sparsity constraints on the abundances. Specifically, ℓ_{1/2}-NMF [7] is a state-of-the-art sparsity regularized NMF method derived from Hoyer's lasso regularized NMF [41]. The lasso constraint [42, 43] cannot enforce further sparsity when the full additivity constraint is used, which limits the effectiveness of that method [7]. Thus, Qian et al. exploit the ℓ_{1/2}-norm to regularize the abundances, since Fan et al. [44] have proved that the ℓ_p (0<p<1) constraint obtains sparser solutions than the ℓ_1-norm does.

Although the related methods work to some extent, they are far from optimal. Thus, we propose a new method by emphasizing both robust representation and learning-based sparsity. Through the former, the side effects of outlier channels are greatly relieved; via the latter, the HU solution is more likely to be biased toward well-suited stationary points in the large solution space.

III RRLbS: Robust Representation and Learning-based Sparsity

In this section, we propose a novel model by emphasizing both robust representation and learning-based sparsity. To relieve the side influences of badly degraded channels, the ℓ_2,1-norm, rather than the Euclidean loss, is employed to measure the representation error, preventing large errors from dominating our objective. Then, a learning-based sparsity method is proposed to update a guidance map, by which the sparsity constraint can be individually imposed according to the mixed level of each pixel. This implementation is more reasonable and is thus expected to yield better HU results.

Notation. In this paper, we use boldface uppercase letters to denote matrices and boldface lowercase letters to denote vectors. Given a matrix $\mathbf{X}$, we denote its $i$-th row and its $j$-th column as $\mathbf{x}^{i}$ and $\mathbf{x}_{j}$ respectively; $x_{ij}$ is the $(i,j)$-th entry of $\mathbf{X}$. A nonnegative matrix is denoted as $\mathbf{X}\geq 0$ or $\mathbf{X}\in\mathbb{R}_{+}^{m\times n}$. The ℓ_2,1-norm of a matrix is defined as $\left\|\mathbf{X}\right\|_{2,1}=\sum_{i}\left\|\mathbf{x}^{i}\right\|_{2}$.

Problem formalization. In the HU problem, we are often given a hyperspectral image of $N$ pixels and $L$ channels, denoted by a nonnegative matrix $\mathbf{Y}\in\mathbb{R}_{+}^{L\times N}$. From the linear mixture perspective, the goal of HU is to find two nonnegative matrices whose product well approximates $\mathbf{Y}$. Formally, the discrepancy between $\mathbf{Y}$ and its representation $\mathbf{M}\mathbf{A}$ is modeled as

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \ell\left(\mathbf{Y},\ \mathbf{M}\mathbf{A}\right) \tag{2}$$

where $\mathbf{M}\in\mathbb{R}_{+}^{L\times K}$ is the endmember matrix including $K$ spectral bases, $\mathbf{M}=\left[\mathbf{m}_{1},\ldots,\mathbf{m}_{K}\right]$; $\mathbf{A}\in\mathbb{R}_{+}^{K\times N}$ is the corresponding abundance matrix, whose column vector $\mathbf{a}_{n}$ contains all abundances at pixel $n$; and $\ell(\cdot,\cdot)$ is a loss function measuring the difference between the two terms. When setting $\ell$ as the Euclidean loss, the objective (2) becomes the standard NMF problem [6, 32, 39], which is commonly used in a great number of state-of-the-art HU methods [8, 12, 9, 6, 7, 13]. However, the Euclidean loss is prone to outliers [45, 46]. Accordingly, our initial goal is to propose a robust HU model.

Figure 2: The ℓ_1-norm versus the ℓ_2-norm in shape. Compared with the ℓ_2-norm, the ℓ_1-norm is more capable of preventing large errors from dominating the objective energy. Compared with the ℓ_1 sparse constraint, the ℓ_p (0<p<1) constraint tends to find a sparser solution [44, 47, 7].

III-A Robust Representation in the ℓ_2,1-Norm

Existing HU methods generally use the Euclidean loss to measure the representation error, that is

$$\ell\left(\mathbf{Y},\mathbf{M}\mathbf{A}\right)=\sum_{i=1}^{L}\left\|\mathbf{y}^{i}-\left(\mathbf{M}\mathbf{A}\right)^{i}\right\|_{2}^{2}=\left\|\mathbf{Y}-\mathbf{M}\mathbf{A}\right\|_{F}^{2} \tag{3}$$

where $\mathbf{y}^{i}$ is the $i$-th channel (i.e. row vector) in $\mathbf{Y}$, and $\left(\mathbf{M}\mathbf{A}\right)^{i}$ is the $i$-th channel in the reconstruction $\mathbf{M}\mathbf{A}$. Similar to existing least-squares minimization based models in machine learning and statistics [48, 49], (3) is sensitive to the presence of outlier channels [46]. However, from the perspective of remote sensing, hyperspectral images are very likely to contain outlier channels, for the following two reasons. First, due to the high spectral resolution of hyperspectral sensors, the sensor receives very little energy from a narrow wavelength range when producing each hyperspectral channel; in this way, the imaging information is easily overwhelmed by various kinds of noise. Second, bad imaging conditions are responsible for degraded channels as well—when imaging from outer space or from airplanes, due to water vapor and atmospheric effects etc., hyperspectral channels can easily be blank or badly noised. Many noised channels deviate significantly from the majority of the hyperspectral channels; they are effectively outlier channels.

To identify the outlier channels and to relieve the side effects they cause, a robust loss in the ℓ_2,1-norm is proposed:

$$\ell_{2,1}\left(\mathbf{Y},\mathbf{M}\mathbf{A}\right)=\sum_{i=1}^{L}\left\|\mathbf{y}^{i}-\left(\mathbf{M}\mathbf{A}\right)^{i}\right\|_{2} \tag{4}$$

Considering all the row vectors together, the objective (4) becomes the concise matrix format:

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \left\|\mathbf{Y}-\mathbf{M}\mathbf{A}\right\|_{2,1} \tag{5}$$

In our new model, the ℓ_2,1-norm is applied to the representation loss—the ℓ_2-norm is used over the pixels within each channel, and the ℓ_1-norm is imposed across channels. Since the ℓ_1-norm prevents large representation errors from dominating the objective, as shown in Fig. 2, the side effects of outlier channels are greatly reduced and the robustness of the HU task is enhanced.
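As a minimal sketch (ours, with toy data and hypothetical helper names) of why the ℓ_2,1 loss (5) is more robust than the Euclidean loss (3), consider how a single degraded channel enters each objective:

```python
import numpy as np

def frobenius_loss(Y, M, A):
    # Eq. (3): squared Euclidean error; an outlier channel enters quadratically
    return np.sum((Y - M @ A) ** 2)

def l21_loss(Y, M, A):
    # Eq. (5): l2-norm within each channel (row), l1-sum across channels,
    # so a single outlier channel contributes only linearly
    return np.sum(np.linalg.norm(Y - M @ A, axis=1))

Y = np.random.rand(50, 300)
M, A = np.random.rand(50, 3), np.random.rand(3, 300)
Y[7] += 50.0  # simulate one blank/badly noised channel
print(frobenius_loss(Y, M, A))  # blows up: the outlier channel dominates
print(l21_loss(Y, M, A))        # grows only linearly with the outlier magnitude
```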

III-B Learning-based Sparsity Constraint via the Guidance Map

Existing methods generally impose an identical strength of sparsity constraint on all the pixels, e.g. $\sum_{n=1}^{N}\left\|\mathbf{a}_{n}\right\|_{p}^{p}$ with a single global $p$. However, the mixed level of each pixel varies over image grids, as shown in Fig. 1. It is more reasonable to impose an individual sparsity constraint according to the mixed level of each pixel. To this end, we propose an iterative method to learn a guidance map $\mathbf{g}\in(0,1]^{N}$, by which the sparsity constraint in the ℓ_p (0<p≤1)-norm is individually applied as

$$\sum_{n=1}^{N}\left\|\mathbf{a}_{n}\right\|_{g_{n}}^{g_{n}}=\sum_{n=1}^{N}\sum_{k=1}^{K}a_{kn}^{\,g_{n}} \tag{6}$$

where $g_{n}\in(0,1]$ is the $n$-th entry in the guidance map $\mathbf{g}$, reflecting the mixed level of the $n$-th pixel, and $a_{kn}$ is the $(k,n)$-th element in the matrix $\mathbf{A}$. For each pixel $n$, the choice of the exponent depends solely upon the corresponding guidance value $g_{n}$, which plays a nonlinear weighting role for the sparsity constraint upon $\mathbf{a}_{n}$. Specifically, as the sparsity of the solution increases when $p$ decreases [44, 47, 7], a smaller $g_{n}$ imposes a stronger sparsity constraint on $\mathbf{a}_{n}$; for $p>1$ the sparsity of the solution changes little with respect to $p$, so it is sufficient to restrict $g_{n}$ to $(0,1]$ [44, 7]. In this way, the sparsity constraint is individually imposed for each pixel. The matrix format of (6) is

$$\mathbf{1}_{K}^{\top}\,\mathbf{A}^{\mathbf{g}}\,\mathbf{1}_{N} \tag{7}$$

where $\mathbf{A}^{\mathbf{g}}$ denotes the element-wise exponential operation that raises each entry in the $n$-th column of $\mathbf{A}$ to the power $g_{n}$, and $\mathbf{1}_{K}$ and $\mathbf{1}_{N}$ are column vectors of all ones with lengths $K$ and $N$ respectively.
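The learning-based sparsity term of (6)/(7) is a one-line element-wise operation; the sketch below (ours, toy data) makes the per-pixel exponent explicit:

```python
import numpy as np

def guided_sparsity(A, g):
    # Eq. (7): 1_K^T A^g 1_N, with the n-th column of A raised to the power g_n;
    # a smaller g_n imposes a stronger sparsity pressure on pixel n
    return np.sum(A ** g[None, :])

A = np.random.rand(4, 10)            # toy abundances: K=4 endmembers, N=10 pixels
g = np.random.uniform(0.1, 1.0, 10)  # toy guidance map, one exponent per pixel
print(guided_sparsity(A, g))
```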

The remaining problem is how to learn the optimal guidance map $\mathbf{g}$. Obviously, $\mathbf{g}$ crucially depends on the mixed level of each pixel, i.e. the sparse level of the optimal abundance $\mathbf{a}_{n}^{*}$. In other words, if pixel $n$ is highly mixed (i.e. the abundance $\mathbf{a}_{n}^{*}$ is weakly sparse), $g_{n}$ will be small; once pixel $n$ is highly “pure” (i.e. $\mathbf{a}_{n}^{*}$ is largely sparse), $g_{n}$ will be large. However, the mixed level of each pixel is unavailable because the optimal abundances are unknown.

To achieve an optimal guidance map, here we will propose a two-step strategy. First, a heuristic strategy is used to get an initial guess. Then a learning-based updating rule is exploited to generate a sequence of improved estimates until they reach a stable solution.

We observe that pixels in transition areas or at image edges are very likely to be highly mixed [6]. Accordingly, we propose a heuristic strategy to get an initial guidance map:

$$g_{n}^{(0)}=\frac{1}{\left|\mathcal{N}_{n}\right|}\sum_{m\in\mathcal{N}_{n}}w_{nm},\qquad w_{nm}=\exp\!\left(-\frac{\left\|\mathbf{y}_{n}-\mathbf{y}_{m}\right\|_{2}^{2}}{\sigma}\right) \tag{8}$$

where $\mathcal{N}_{n}$ is the neighborhood of pixel $n$ that includes its four neighbors, $w_{nm}$ is the similarity between the spectra of pixels $n$ and $m$, and $\sigma$ is an easily tuned parameter. In this way, the pixels in transition areas are treated as mixed pixels. However, there is one vital problem with (8): this heuristic strategy can only deal with pixels in areas of sudden change. For the vast smooth areas, the mixed-level information is intractable to obtain. Thus, other strategies are required.

Perhaps the most direct clue for obtaining guidance maps is the intrinsic correlation with the optimal abundances $\left\{\mathbf{a}_{n}^{*}\right\}$, that is $g_{n}=S\left(\mathbf{a}_{n}^{*}\right)$, where $S(\cdot)$ is a sparsity measure that maps real vectors to nonnegative values [50]. Although the optimal abundance is unavailable, the updated abundance $\mathbf{a}_{n}^{(t)}$ is available after the $t$-th iteration. Over the iteration steps, we can therefore generate a sequence of improved estimates toward the optimal guidance map, that is $g_{n}^{(t)}=S\left(\mathbf{a}_{n}^{(t)}\right)$, by using the dependence of $\mathbf{g}$ on $\mathbf{A}$. In turn, the learnt $\mathbf{g}$ helps to impose an improved individual sparsity constraint on each pixel, eventually leading to a more reliable unmixing result. This iterative process is expected to generate a sequence of ever-improved estimates until it converges.

Specifically, due to its good properties [50], we use the Gini index to measure the sparse level of each abundance vector. Given a vector $\mathbf{c}$ with its elements sorted as $c_{(1)}\leq c_{(2)}\leq\cdots\leq c_{(K)}$, the Gini index is defined as

$$GI\left(\mathbf{c}\right)=1-2\sum_{k=1}^{K}\frac{c_{(k)}}{\left\|\mathbf{c}\right\|_{1}}\left(\frac{K-k+\frac{1}{2}}{K}\right) \tag{9}$$

In this way, large elements receive smaller weights than small elements in the sparsity measure, avoiding the situation where smaller elements have negligible (or no) effect on the measure of sparsity [50].
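A small self-contained numpy sketch (ours; the input vectors are toy data) of the Gini index (9):

```python
import numpy as np

def gini_index(c):
    # Gini index of Eq. (9) in the Hurley-Rickard form [50]:
    # sort ascending, then down-weight the large elements
    c = np.sort(np.abs(np.asarray(c, dtype=float)))
    K = c.size
    k = np.arange(1, K + 1)
    return 1.0 - 2.0 * np.sum(c / c.sum() * (K - k + 0.5) / K)

print(gini_index([1, 1, 1, 1]))  # 0.0  -> maximally mixed pixel
print(gini_index([0, 0, 0, 1]))  # 0.75 -> much sparser, nearly "pure" pixel
```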

III-C Robust HU Model via Joint ℓ_2,1-Norm and ℓ_p-Norm

Considering the robust representation loss (5) and the pixel-level sparsity constraint (7) together, the overall HU objective (RRLbS) is given by

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \left\|\mathbf{Y}-\mathbf{M}\mathbf{A}\right\|_{2,1}+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}}\,\mathbf{1}_{N} \tag{10}$$

where $\lambda$ is a nonnegative balancing parameter. Due to the non-convex and non-smooth properties of (10), the above objective is very challenging to solve. The efficient solver, as well as its convergence proof and computational complexity analysis, will be systematically provided in the next section.

IV An Efficient Algorithm for RRLbS

Since (10) is highly non-convex and non-smooth in $\mathbf{M}$ and $\mathbf{A}$, the final objective (10) is challenging to solve. As a profound theoretical contribution, we propose an efficient iterative algorithm to solve the joint ℓ_2,1-norm and ℓ_p-norm based model. We first introduce how to efficiently solve (10) and then give a systematic analysis of the proposed solver, including its convergence and computational complexity.

IV-A Updating Rules for RRLbS

To ensure the Lipschitz condition [51, 6] of the learning-based sparsity constraint, we reformulate our model as

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \left\|\mathbf{Y}-\mathbf{M}\mathbf{A}\right\|_{2,1}+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}+\epsilon}\,\mathbf{1}_{N} \tag{11}$$

where $\epsilon$ is a small positive value added to every entry in $\mathbf{g}$. It is obvious that (11) reduces to (10) when $\epsilon\rightarrow 0$.

Then, to deal with the nonnegative constraints on $\mathbf{M}$ and $\mathbf{A}$, Lagrangian multipliers are used, resulting in the following objective

$$\mathcal{L}=\left\|\mathbf{Y}-\mathbf{M}\mathbf{A}\right\|_{2,1}+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}+\epsilon}\,\mathbf{1}_{N}+\mathrm{Tr}\!\left(\mathbf{\Phi}\mathbf{M}^{\top}\right)+\mathrm{Tr}\!\left(\mathbf{\Psi}\mathbf{A}^{\top}\right) \tag{12}$$

where $\mathbf{\Phi}$ and $\mathbf{\Psi}$ are the Lagrangian multipliers of the inequality constraints $\mathbf{M}\geq 0$ and $\mathbf{A}\geq 0$ respectively. There are two variable matrices in (12); thus, an alternating algorithm is proposed. Specifically, the solution with respect to each variable is given in the following theorem.

Theorem 1.

An updated point $\left(\mathbf{M},\mathbf{A}\right)$ can be achieved via the following updating rules:

$$\mathbf{M}\leftarrow\mathbf{M}\circ\frac{\mathbf{D}\mathbf{Y}\mathbf{A}^{\top}}{\mathbf{D}\mathbf{M}\mathbf{A}\mathbf{A}^{\top}} \tag{13}$$
$$\mathbf{A}\leftarrow\mathbf{A}\circ\frac{\mathbf{M}^{\top}\mathbf{D}\mathbf{Y}}{\mathbf{M}^{\top}\mathbf{D}\mathbf{M}\mathbf{A}+\frac{\lambda}{2}\,\mathbf{A}^{\mathbf{g}+\epsilon-1}\,\mathrm{diag}\!\left(\mathbf{g}+\epsilon\right)} \tag{14}$$

where $\mathbf{D}$ is a positive-definite diagonal matrix with the diagonal entry² $d_{ii}=\frac{1}{2\left\|\mathbf{y}^{i}-\left(\mathbf{M}\mathbf{A}\right)^{i}\right\|_{2}}$; $\circ$ is the Hadamard (element-wise) product between matrices, with the fractions above denoting element-wise division; and $\mathbf{A}^{\mathbf{g}+\epsilon-1}$ is an element-wise exponential operation.

²To avoid singular failures, if $\left\|\mathbf{y}^{i}-\left(\mathbf{M}\mathbf{A}\right)^{i}\right\|_{2}=0$, we obtain $d_{ii}$ by $d_{ii}=\frac{1}{2\sqrt{\left\|\mathbf{y}^{i}-\left(\mathbf{M}\mathbf{A}\right)^{i}\right\|_{2}^{2}+\zeta}}$, where $\zeta$ is typically set to a small positive value.

Proof:

According to constrained optimization theory, a stationary point of (12) can be achieved by differentiating (12), setting the partial derivatives to zero and considering the Karush-Kuhn-Tucker (KKT) optimality conditions [52, 53]. This amounts to a two-step strategy. First, setting the partial derivatives to zero (note that (12) is non-differentiable at $a_{kn}=0$; we use the derivatives in a positive vicinity as an approximation, i.e. $a_{kn}\leftarrow\max\left(a_{kn},\varsigma\right)$ for a small $\varsigma$), we have

$$\frac{\partial\mathcal{L}}{\partial\mathbf{M}}=-2\,\mathbf{D}\left(\mathbf{Y}-\mathbf{M}\mathbf{A}\right)\mathbf{A}^{\top}+\mathbf{\Phi}=0 \tag{15}$$
$$\frac{\partial\mathcal{L}}{\partial\mathbf{A}}=-2\,\mathbf{M}^{\top}\mathbf{D}\left(\mathbf{Y}-\mathbf{M}\mathbf{A}\right)+\lambda\,\mathbf{A}^{\mathbf{g}+\epsilon-1}\,\mathrm{diag}\!\left(\mathbf{g}+\epsilon\right)+\mathbf{\Psi}=0 \tag{16}$$

Then, considering the KKT conditions $\phi_{lk}\,m_{lk}=0$ and $\psi_{kn}\,a_{kn}=0$, we multiply (15) and (16) element-wise by $\mathbf{M}$ and $\mathbf{A}$ respectively, which eliminates the multipliers. Solving the resulting equations, we get the final updating rules (13) and (14). In this manner, we give the solver for the objective (11). ∎

Given any positive diagonal matrix $\mathbf{S}\in\mathbb{R}_{+}^{K\times K}$, the pair $\left(\mathbf{M}\mathbf{S},\,\mathbf{S}^{-1}\mathbf{A}\right)$ constitutes an uncertainty of the solution $\left(\mathbf{M},\mathbf{A}\right)$, since $\mathbf{M}\mathbf{S}\,\mathbf{S}^{-1}\mathbf{A}=\mathbf{M}\mathbf{A}$. To get rid of this uncertainty, we adjust each row of $\mathbf{A}$ to have unit ℓ_1-norm or unit ℓ_2-norm, and scale the columns of $\mathbf{M}$ accordingly:

$$\mathbf{A}\leftarrow\mathbf{S}^{-1}\mathbf{A},\qquad\mathbf{M}\leftarrow\mathbf{M}\mathbf{S},\qquad\mathbf{S}=\mathrm{diag}\!\left(\left\|\mathbf{a}^{1}\right\|,\ldots,\left\|\mathbf{a}^{K}\right\|\right) \tag{17}$$

Input: hyperspectral image $\mathbf{Y}$; the number of endmembers $K$; parameter $\lambda$; the initial guidance map $\mathbf{g}$.

1:  initialize the factor matrices $\mathbf{M}$ and $\mathbf{A}$.
2:  calculate the diagonal weighting matrix $\mathbf{D}$.
3:  repeat
4:     repeat
5:        update $\mathbf{A}$ by the updating rule (14).
6:        update $\mathbf{M}$ by the updating rule (13).
7:        scale $\mathbf{M}$ and $\mathbf{A}$ to get rid of the uncertainty of solutions via (17).
8:     until stable
9:     update $\mathbf{g}$ by the updating rule (18).
10:  until convergence

Output: $\mathbf{M}$ and $\mathbf{A}$ as the final unmixing result.

Algorithm 1: The solver for RRLbS (10)
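For concreteness, here is a compact numpy sketch of Algorithm 1 (our reading, not the authors' released code). The multiplicative rules follow the forms reconstructed in (13), (14), (17) and (18) above; the initialization, the smoothing constants, the loop counts and the placeholder initial guidance map are all assumptions:

```python
import numpy as np

def gini_index(c, tiny=1e-12):
    c = np.sort(np.abs(c)); K = c.size; k = np.arange(1, K + 1)
    return 1.0 - 2.0 * np.sum(c / (c.sum() + tiny) * (K - k + 0.5) / K)

def rrlbs(Y, K, lam=0.1, outer=10, inner=100, eps=1e-9):
    # Hedged sketch of Algorithm 1; constants and loop counts are assumed.
    L, N = Y.shape
    M = np.abs(np.random.rand(L, K))
    A = np.abs(np.random.rand(K, N))
    g = np.full(N, 0.5)                     # placeholder initial guidance map
    for _ in range(outer):
        for _ in range(inner):
            d = 1.0 / (2.0 * np.linalg.norm(Y - M @ A, axis=1) + eps)  # diag(D)
            DY, DM = d[:, None] * Y, d[:, None] * M
            M *= (DY @ A.T) / (DM @ A @ A.T + eps)                     # rule (13)
            sparse = 0.5 * lam * g[None, :] * (A + eps) ** (g[None, :] - 1.0)
            A *= (M.T @ DY) / (M.T @ DM @ A + sparse + eps)            # rule (14)
            s = np.linalg.norm(A, axis=1) + eps                        # scaling (17):
            A /= s[:, None]; M *= s[None, :]                           # unit-norm rows of A
        g = np.clip([gini_index(A[:, n]) for n in range(N)], 0.01, 1.0)  # rule (18)
    return M, A

M, A = rrlbs(np.random.rand(60, 200), K=3)  # toy run: 60 channels, 200 pixels
```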

As mentioned before, the most direct clue for estimating the guidance map is its crucial dependence upon the abundances. Once the stable abundances $\mathbf{A}$ are obtained, $\mathbf{g}$ can be efficiently computed via the Gini index (9) as

$$g_{n}\leftarrow GI\left(\mathbf{a}_{n}\right),\qquad n=1,\ldots,N \tag{18}$$

Note that, to satisfy the sparsity constraint in (6), every value in the guidance map needs to be scaled into the range $(0,1]$. The solver for the RRLbS model (10) is given in Algorithm 1.

IV-B Convergence Analysis

To ensure the reliability of (13) and (14), we would like to analyze their convergence property.

Lemma 2.

The updating rules (13) and (14) are equivalent to the following updating rules

$$\tilde{\mathbf{M}}\leftarrow\tilde{\mathbf{M}}\circ\frac{\tilde{\mathbf{Y}}\mathbf{A}^{\top}}{\tilde{\mathbf{M}}\mathbf{A}\mathbf{A}^{\top}} \tag{19}$$
$$\mathbf{A}\leftarrow\mathbf{A}\circ\frac{\tilde{\mathbf{M}}^{\top}\tilde{\mathbf{Y}}}{\tilde{\mathbf{M}}^{\top}\tilde{\mathbf{M}}\mathbf{A}+\frac{\lambda}{2}\,\mathbf{A}^{\mathbf{g}+\epsilon-1}\,\mathrm{diag}\!\left(\mathbf{g}+\epsilon\right)} \tag{20}$$

where $\tilde{\mathbf{Y}}=\mathbf{D}^{\frac{1}{2}}\mathbf{Y}$ and $\tilde{\mathbf{M}}=\mathbf{D}^{\frac{1}{2}}\mathbf{M}$. This means that the objective (10) can be equivalently solved by (19) and (20).

Proof:

Considering $\tilde{\mathbf{M}}=\mathbf{D}^{\frac{1}{2}}\mathbf{M}$ and that $\mathbf{D}^{\frac{1}{2}}$ is a positive-definite diagonal matrix, the updating rules (19) and (20) become

$$\mathbf{D}^{\frac{1}{2}}\mathbf{M}\leftarrow\mathbf{D}^{\frac{1}{2}}\mathbf{M}\circ\frac{\tilde{\mathbf{Y}}\mathbf{A}^{\top}}{\mathbf{D}^{\frac{1}{2}}\mathbf{M}\mathbf{A}\mathbf{A}^{\top}} \tag{21}$$
$$\mathbf{A}\leftarrow\mathbf{A}\circ\frac{\mathbf{M}^{\top}\mathbf{D}\mathbf{Y}}{\mathbf{M}^{\top}\mathbf{D}\mathbf{M}\mathbf{A}+\frac{\lambda}{2}\,\mathbf{A}^{\mathbf{g}+\epsilon-1}\,\mathrm{diag}\!\left(\mathbf{g}+\epsilon\right)} \tag{22}$$

Since $\tilde{\mathbf{Y}}=\mathbf{D}^{\frac{1}{2}}\mathbf{Y}$ and the common diagonal factors cancel in the element-wise fractions, we have the following derivation

$$\frac{\tilde{\mathbf{Y}}\mathbf{A}^{\top}}{\mathbf{D}^{\frac{1}{2}}\mathbf{M}\mathbf{A}\mathbf{A}^{\top}}=\frac{\mathbf{D}^{\frac{1}{2}}\mathbf{Y}\mathbf{A}^{\top}}{\mathbf{D}^{\frac{1}{2}}\mathbf{M}\mathbf{A}\mathbf{A}^{\top}}=\frac{\mathbf{D}\mathbf{Y}\mathbf{A}^{\top}}{\mathbf{D}\mathbf{M}\mathbf{A}\mathbf{A}^{\top}} \tag{23}$$

Substituting (23) into (21) and cancelling the positive diagonal factor $\mathbf{D}^{\frac{1}{2}}$ on both sides, we have

$$\mathbf{M}\leftarrow\mathbf{M}\circ\frac{\mathbf{D}\mathbf{Y}\mathbf{A}^{\top}}{\mathbf{D}\mathbf{M}\mathbf{A}\mathbf{A}^{\top}} \tag{24}$$

Since the updating rules (24) and (22) are exactly the same as the updating rules (13) and (14), we have proved Lemma 2. ∎

Theorem 3.

The objective (10) is non-increasing by using the updating rules (13) and (14).

Proof:

Based on the partial derivatives (15), (16), it is obvious that the updating rules (13), (14) compute the optimal solution of the following problem,

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \mathrm{Tr}\!\left[\left(\mathbf{Y}-\mathbf{M}\mathbf{A}\right)^{\top}\mathbf{D}\left(\mathbf{Y}-\mathbf{M}\mathbf{A}\right)\right]+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}+\epsilon}\,\mathbf{1}_{N} \tag{25}$$

which is equivalent to the objective

$$\min_{\tilde{\mathbf{M}}\geq 0,\ \mathbf{A}\geq 0}\ \left\|\tilde{\mathbf{Y}}-\tilde{\mathbf{M}}\mathbf{A}\right\|_{F}^{2}+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}+\epsilon}\,\mathbf{1}_{N} \tag{26}$$

where $\tilde{\mathbf{Y}}=\mathbf{D}^{\frac{1}{2}}\mathbf{Y}$ and $\tilde{\mathbf{M}}=\mathbf{D}^{\frac{1}{2}}\mathbf{M}$. It has been proved in [6] that (26) is non-increasing under the updating rules (19), (20). Because (25) is the same problem as (26), and the updating rules (13), (14) are equivalent to (19), (20), as analyzed in Lemma 2, we infer that the objective (25) is non-increasing under the updating rules (13), (14) as well. For each iteration, we denote the variables before and after the update as $\left(\mathbf{M}^{(t)},\mathbf{A}^{(t)}\right)$ and $\left(\mathbf{M}^{(t+1)},\mathbf{A}^{(t+1)}\right)$. Thus, we have the following inequality

$$\mathrm{Tr}\!\left[\left(\mathbf{E}^{(t+1)}\right)^{\top}\mathbf{D}^{(t)}\,\mathbf{E}^{(t+1)}\right]+\lambda\,\Omega\!\left(\mathbf{A}^{(t+1)}\right)\leq\mathrm{Tr}\!\left[\left(\mathbf{E}^{(t)}\right)^{\top}\mathbf{D}^{(t)}\,\mathbf{E}^{(t)}\right]+\lambda\,\Omega\!\left(\mathbf{A}^{(t)}\right)$$

where $\mathbf{E}^{(t)}=\mathbf{Y}-\mathbf{M}^{(t)}\mathbf{A}^{(t)}$ with rows $\mathbf{e}^{i,(t)}$, and $\Omega(\mathbf{A})=\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}+\epsilon}\,\mathbf{1}_{N}$. This amounts to

$$\sum_{i=1}^{L}\frac{\left\|\mathbf{e}^{i,(t+1)}\right\|_{2}^{2}}{2\left\|\mathbf{e}^{i,(t)}\right\|_{2}}+\lambda\,\Omega\!\left(\mathbf{A}^{(t+1)}\right)\leq\sum_{i=1}^{L}\frac{\left\|\mathbf{e}^{i,(t)}\right\|_{2}^{2}}{2\left\|\mathbf{e}^{i,(t)}\right\|_{2}}+\lambda\,\Omega\!\left(\mathbf{A}^{(t)}\right) \tag{27}$$

Given the function $f(x)=\sqrt{x}-\frac{x}{2\sqrt{b}}$, $f(a)\leq f(b)$ holds for any $a,b>0$ [46, 48]. Thus we have

$$\sum_{i=1}^{L}\left(\left\|\mathbf{e}^{i,(t+1)}\right\|_{2}-\frac{\left\|\mathbf{e}^{i,(t+1)}\right\|_{2}^{2}}{2\left\|\mathbf{e}^{i,(t)}\right\|_{2}}\right)\leq\sum_{i=1}^{L}\left(\left\|\mathbf{e}^{i,(t)}\right\|_{2}-\frac{\left\|\mathbf{e}^{i,(t)}\right\|_{2}^{2}}{2\left\|\mathbf{e}^{i,(t)}\right\|_{2}}\right) \tag{28}$$

Combining the inequalities (27) and (28), we have

$$\sum_{i=1}^{L}\left\|\mathbf{e}^{i,(t+1)}\right\|_{2}+\lambda\,\Omega\!\left(\mathbf{A}^{(t+1)}\right)\leq\sum_{i=1}^{L}\left\|\mathbf{e}^{i,(t)}\right\|_{2}+\lambda\,\Omega\!\left(\mathbf{A}^{(t)}\right),$$

which is equivalent to the inequality

$$\left\|\mathbf{Y}-\mathbf{M}^{(t+1)}\mathbf{A}^{(t+1)}\right\|_{2,1}+\lambda\,\mathbf{1}_{K}^{\top}\!\left(\mathbf{A}^{(t+1)}\right)^{\mathbf{g}+\epsilon}\mathbf{1}_{N}\leq\left\|\mathbf{Y}-\mathbf{M}^{(t)}\mathbf{A}^{(t)}\right\|_{2,1}+\lambda\,\mathbf{1}_{K}^{\top}\!\left(\mathbf{A}^{(t)}\right)^{\mathbf{g}+\epsilon}\mathbf{1}_{N}.$$

In this way, we have proven Theorem 3. ∎

IV-C Computational Complexity Analysis

Theoretically, the computational complexity is important for algorithms. In this section, we analyze the additional computational cost of our method compared with the standard NMF. To give a precise comparison, the arithmetic operations (addition, multiplication, division and exponentiation) are counted for each iteration.

Based on the updating rules (13) and (14), it is easy to summarize the operation counts in Table I, where the notations are listed in Table II. For RRLbS, it is important to note that $\mathbf{D}$ is a positive-definite diagonal matrix. This property facilitates the saving of computational costs. For example, it only costs $L$ exponent operations to compute the square root of $\mathbf{D}$, one per diagonal entry, while for a general matrix of the same size it would be far more expensive. The cost of matrix multiplication is greatly reduced as well: computing $\mathbf{D}\mathbf{Y}$ costs $LN$ multiplications in our case, while for a general $L\times L$ matrix it would take $L^{2}N$ multiplications and nearly as many additions.

Table I: Computational operation counts for NMF and RRLbS at each iteration.
Table II: Parameters used in Computational Complexity Analysis.

Apart from the updating costs, RRLbS needs additional operations to compute the initial guidance map and to update it via the Gini index. If the updating process stops after $T$ steps and the learning-based guidance map is updated once every fixed number of inner iterations, the total cost of RRLbS stays within a constant factor of the total cost of standard NMF. Generally, the computational complexities of RRLbS and NMF are of the same magnitude.
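The diagonal structure is easy to exploit in practice; this toy example (ours, with assumed sizes) shows the saving for the product D·Y:

```python
import numpy as np

# Because D is diagonal, D @ Y reduces to scaling each row of Y:
# L*N multiplications instead of a dense L-by-L matrix product.
L, N = 200, 1000
d = np.random.rand(L)      # the L diagonal entries of D
Y = np.random.rand(L, N)

dense = np.diag(d) @ Y     # naive route: O(L^2 * N) work
cheap = d[:, None] * Y     # O(L * N): one multiplication per entry
assert np.allclose(dense, cheap)
```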

V Evaluation

In this section, extensive experiments are conducted to evaluate the effectiveness of our method in the HU task.

V-A Datasets

Three real hyperspectral datasets are used in the experiments. The important information of these datasets is listed below. Specifically, the ground truth is obtained by the method introduced in [17, 9, 18].

Urban is one of the most widely used datasets for HU studies [8, 7, 9]. There are 307×307 pixels, each recorded at 210 channels ranging from 400 nm to 2500 nm. Due to dense water vapor and atmospheric effects, a number of channels are either blank or badly noised. There are four endmembers in this image: ‘#1 Asphalt’, ‘#2 Grass’, ‘#3 Tree’ and ‘#4 Roof’, as shown in Fig. 3(c).

Jasper Ridge, as shown in Fig. 3(b), is a popular dataset used in [54, 18, 6]. Each pixel is recorded at 224 channels ranging from 380 nm to 2500 nm, resulting in a very high spectral resolution of up to 9.46 nm. Since the full image is highly complex, we consider a sub-image of 100×100 pixels. Due to dense water vapor and atmospheric effects, several groups of channels are blank or badly noised. There are four endmembers: ‘#1 Tree’, ‘#2 Soil’, ‘#3 Water’ and ‘#4 Road’.

Samson (cf. Fig. 3(a)) is a simple dataset available from http://opticks.org/confluence/display/opticks/Sample+Data. Each pixel is recorded at 156 channels covering the wavelengths between 401 nm and 889 nm; the spectral resolution is as high as 3.13 nm. As the original image is too large, which is very expensive in terms of computational cost, a region of 95×95 pixels is used. There are three targets in this image: ‘#1 Soil’, ‘#2 Tree’ and ‘#3 Water’.

(a) Samson
(b) Jasper Ridge
(c) Urban
Figure 3: The three benchmark hyperspectral images, i.e. Samson, Jasper Ridge and Urban, used in the experiments.

V-B Compared Algorithms and Parameter Settings

To verify its superior performance, the proposed method is compared with five state-of-the-art methods. The details of all these methods (including ours) are listed as follows:

  1. Our method: Effective Spectral Unmixing via Robust Representation and Learning-based Sparsity (RRLbS) is a new method proposed in this paper.

  2. Vertex Component Analysis [20] (VCA) is the benchmark geometric method. The code is available on http://www.lx.it.pt/bioucas/code.htm.

  3. ℓ_{1/2} sparsity-constrained NMF [7] (ℓ_{1/2}-NMF) is a state-of-the-art method that obtains sparser results than ℓ_1-NMF. Since the code is unavailable, we implement it ourselves.

  4. Local Neighborhood Weights regularized NMF [10] (W-NMF) is a manifold graph based NMF method. It integrates the spectral information and spatial information when constructing the weighted graph. Since this work is an extension of G-NMF [11], we implement the code by referring to the code on http://www.cad.zju.edu.cn/home/dengcai/Data/GNMF.html.

    Figure 4: The average performances (i.e. SAD and RMSE) of six methods on (a) Urban, (b) Jasper Ridge and (c) Samson. Compared with the state-of-the-art methods, our method achieves highly promising HU results.
  5. Endmember Dissimilarity Constrained NMF [13] (EDC-NMF) encourages each endmember to be smooth and the endmembers to be distinct from each other. The code is implemented by ourselves.

  6. Graph-regularized -NMF [31] (GL-NMF) is a new method proposed in 2013. It considers both the sparse characteristic and the intrinsic manifold structure in hyperspectral images. We implement this code based on G-NMF as well.

There is no parameter in VCA. For the other five methods, there is mainly one parameter. To find a good parameter setting, typical procedures consist of two phases: a bracketing phase that finds an interval containing acceptable parameters, and a selection phase that zooms in to locate the final parameter.

V-C Evaluation Metrics for Quantitative Performances

We use two benchmark metrics to measure the quantitative HU results: (a) the Spectral Angle Distance (SAD) [8, 20] and (b) the Root Mean Square Error (RMSE) [8, 7, 55]. Both metrics assess estimation errors; thus, smaller SAD and RMSE correspond to better results. Specifically, SAD evaluates the estimated endmembers, and RMSE assesses the estimated abundance maps. They are defined as

$$\mathrm{SAD}_{k}=\arccos\!\left(\frac{\hat{\mathbf{m}}_{k}^{\top}\mathbf{m}_{k}}{\left\|\hat{\mathbf{m}}_{k}\right\|_{2}\left\|\mathbf{m}_{k}\right\|_{2}}\right) \tag{29}$$

and

$$\mathrm{RMSE}_{k}=\sqrt{\frac{1}{N}\left\|\hat{\mathbf{a}}^{k}-\mathbf{a}^{k}\right\|_{2}^{2}} \tag{30}$$

where $\hat{\mathbf{m}}_{k}$ is the estimated endmember, $\hat{\mathbf{a}}^{k}$ is the estimated abundance map (i.e. the $k$-th row vector in $\mathbf{A}$), $\mathbf{m}_{k}$ and $\mathbf{a}^{k}$ are the corresponding ground truths, and $N$ is the number of pixels in the image.
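A minimal numpy sketch (ours) of the two metrics (29) and (30):

```python
import numpy as np

def sad(m_est, m_true):
    # Spectral Angle Distance, Eq. (29): angle between two endmember spectra
    cos = m_est @ m_true / (np.linalg.norm(m_est) * np.linalg.norm(m_true))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def rmse(a_est, a_true):
    # Root Mean Square Error, Eq. (30), over the N pixels of one abundance map
    return np.sqrt(np.mean((np.asarray(a_est) - np.asarray(a_true)) ** 2))

print(sad(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # pi/4
print(rmse([0.2, 0.8], [0.0, 1.0]))                     # 0.2
```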

Table III: Unmixing performance on Urban. The red value is the best; the blue value is the second best.
Table IV: Unmixing performance on Jasper Ridge. The red value is the best; the blue value is the second best.
Table V: Unmixing performance on Samson. The red value is the best; the blue value is the second best.

V-D Quantitative Performance Comparisons

To verify the performance of the proposed method, we conduct unmixing experiments on three benchmark datasets. Each experiment is repeated eight times to ensure a reliable comparison.

The quantitative performances are summarized in Tables III–V and plotted in Fig. 4. Specifically, Table III summarizes the unmixing results on Urban, where the top sub-table reports SADs and the bottom one reports RMSEs. In each sub-table, each row shows the results for one endmember, i.e. ‘#1 Asphalt’, ‘#2 Grass’, ‘#3 Tree’ and ‘#4 Roof’ respectively; the last row shows the average results. As Table III shows, our method generally achieves the best results. This is better illustrated in Fig. 4a, where the average results are plotted. As we shall see, RRLbS performs the best, clearly reducing both SAD and RMSE relative to the second best results. Such extraordinary improvements rely on two factors. First, due to the atmospheric effects, a number of channels are either blank or badly noised; our method is robust to these channels and relieves their bad effects on the unmixing process. Second, with the help of the guidance map, RRLbS exploits an individual sparsity constraint according to the mixed level of each pixel, which is better suited to the practical situation. Both factors help to achieve a better performance.

In Table IV, there are two sub-tables reporting SADs and RMSEs respectively on Jasper Ridge. In each sub-table, each row shows the results for one endmember, that is ‘#1 Tree’, ‘#2 Soil’, ‘#3 Water’ and ‘#4 Road’ respectively; the last row shows the average performance. The values in red ink are the best, while the blue ones are the second best. As Table IV shows, our method generally achieves the best results, and in the few remaining cases it achieves results comparable with the best of the other methods. This is better illustrated in Fig. 4b. As we shall see, RRLbS is the best method, reducing both SAD and RMSE relative to the second best results. However, compared with the results on Urban, the improvement of our method is not as large, since Jasper Ridge is not as badly noised as Urban. Here the improvement mainly relies on the individual sparsity constraint in RRLbS.

Table V summarizes the HU performances of the six methods on Samson. In this table, the rows display the results for the three endmembers, that is ‘#1 Soil’, ‘#2 Tree’ and ‘#3 Water’ respectively. In general, the sparsity constrained methods, such as ℓ_{1/2}-NMF and GL-NMF, obtain better results than the other methods. This is because sparsity constraints tend to yield expressive endmembers [35], a property that is beneficial for the HU task. In Fig. 4c, the average SAD and RMSE performances are exhibited. As we shall see, our method obtains superior performance, reducing both SAD and RMSE compared with the second best results. Since Samson is only weakly degraded by outliers, the individual sparsity constraint contributes more to the improvements than the robust loss.

(a) Urban. The bottom row shows the estimated error.
(b) Jasper Ridge. The bottom row shows the estimated error.
(c) Samson. The bottom row shows the estimated error.
Figure 5: Abundance maps of five methods on (a) Urban, (b) Jasper Ridge and (c) Samson. In each sub-figure, the top row shows abundance maps in pseudo color; the bottom row shows the estimated error of each method, i.e. the per-pixel ℓ2-norm of the abundance error vector. From the 1st to the 5th column, each column illustrates the results of one method; the last column shows the ground truths. (Best viewed in color)

V-E Visual Performance Comparisons

To give a visual comparison, the abundance maps of the five methods as well as their estimated errors are compared in Fig. 5. To begin, we give the definition of the pseudo-color abundance maps and the estimated errors by taking Fig. 5(a) as an example. There are mainly four color inks in the top row of Fig. 5(a). Via these colors, we display the abundances associated with each pixel by plotting the corresponding pixel using the proportions of red, blue, green and black inks given by its abundances $a_{kn}$ for $k=1,\ldots,4$ respectively. So, for instance, a pixel whose abundance is concentrated on the second endmember will be colored pure blue, whereas one mixed equally between the first and second endmembers will be colored with equal proportions of red and blue inks and so will appear purple. In the bottom row, the error map is displayed. At the $n$-th pixel, the error value is obtained by computing the ℓ2-norm of the corresponding error vector, that is $\left\|\hat{\mathbf{a}}_{n}-\mathbf{a}_{n}\right\|_{2}$.
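The color-mixing rule just described can be sketched as follows (our illustration; the ink-to-endmember assignment follows the description above):

```python
import numpy as np

inks = np.array([[1.0, 0.0, 0.0],   # endmember 1 -> red
                 [0.0, 0.0, 1.0],   # endmember 2 -> blue
                 [0.0, 1.0, 0.0],   # endmember 3 -> green
                 [0.0, 0.0, 0.0]])  # endmember 4 -> black

def abundance_to_rgb(A, H, W):
    # A: 4-by-N abundance matrix whose columns sum to one
    return (A.T @ inks).reshape(H, W, 3)

def error_map(A_est, A_true, H, W):
    # per-pixel l2-norm of the abundance error vector, as in the bottom rows
    return np.linalg.norm(A_est - A_true, axis=0).reshape(H, W)
```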

The visualized abundances on Urban are illustrated in Fig. 5(a), containing two rows and six columns of sub-images. The top row shows the abundance maps in pseudo color, and the bottom row shows the corresponding error maps. From the 1st to the 5th column, each column shows the results of one method; the last column shows the ground truth. As we shall see, our method achieves extraordinary results: it produces the abundance maps most similar to the ground truth, and its error map is very small. The other methods achieve rather poor abundance maps, as clearly demonstrated by the corresponding error maps. As mentioned before, it is the serious noise in Urban that makes the other results poor, while for our method, the robust objective greatly relieves the side effects of outlier channels. Thus, the performance is largely enhanced.

The abundance results on Jasper Ridge and Samson are displayed in Figs. 5(b) and 5(c) respectively. Compared with Fig. 5(a), most methods achieve acceptable results, owing to the lower noise in both datasets. Specifically, our method achieves much better results than the other methods—its abundance maps are highly similar to the ground truth, and the corresponding error maps are very small. Such results verify that the individual sparsity constraint is very reliable and that RRLbS is well suited to the HU task.

V-F Comparison of Guidance Maps: Quantitative & Visual

As a significant characteristic, RRLbS learns a guidance map to model the individual mixed level of each pixel. In turn, based on the learnt guidance map, we impose an individual sparsity constraint according to the mixed level of each pixel. These two phases help each other to reach the optimum. The unmixing results have already been compared; in this section, the learnt guidance maps are systematically compared.

The results of the guidance maps are illustrated in Figs. 6(a) and 6(b), where the former summarizes the quantitative results and the latter shows the visual comparisons. There are three kinds of guidance maps in Fig. 6: 1) ‘Blank’ means the identical guidance map used by traditional methods; 2) ‘Initial’ denotes the initial guidance map obtained by the heuristic strategy; 3) ‘Learnt’ represents the learning-based guidance map obtained by RRLbS. As shown in Fig. 6, ‘Learnt’ is extraordinarily better than the other two. In terms of quantitative comparisons (cf. Fig. 6(a)), the estimated error of ‘Learnt’ is on average half that of the second best. In the visual comparison in Fig. 6(b), our method achieves the appearance most similar to the ground truth. Obviously, the ‘Blank’ map carries no mixed-level information. The ‘Initial’ map achieves acceptable results in areas of sudden change in the scene; however, it cannot recover the mixed information in the vast smooth areas. As a huge improvement, ‘Learnt’ obtains highly satisfactory mixed-level information in all kinds of areas. In short, the comparisons above verify that RRLbS is able to learn satisfactory guidance maps that effectively indicate the mixed level of pixels.

(a) Quantitative results.
(b) Visual results.
Figure 6: Illustrations of guidance maps: (a) quantitative results, (b) visual results. Specifically, ‘Blank’ means the identical guidance map used by traditional methods; ‘Initial’ denotes the guidance map obtained by the heuristic strategy; ‘Learnt’ represents the learning-based guidance map achieved by RRLbS. (a) shows the SAD and RMSE performances. In (b), there are three rows and four columns of sub-images. Each row shows the results on one dataset, i.e. Urban, Jasper Ridge and Samson respectively. From the 1st to the 3rd column, each column illustrates the results of one method; the last column shows the ground truths.

V-G Convergence Study

It has been theoretically proven that the objective (10) converges to a minimum under the updating rules (13) and (14). To verify this conclusion, we conduct experiments to show the empirical convergence of RRLbS. The convergence curves are illustrated in Fig. 7, with three sub-figures, each showing the results on one dataset. In each sub-figure, the X-axis shows the number of iterations, and the Y-axis shows the objective energy defined in (10). As we shall see, the objective energy decreases monotonically over the iteration steps until convergence.

Vi Conclusions

In this paper, we propose a novel robust representation and learning-based sparsity (RRLbS) method for the HU task. The ℓ_2,1-norm is exploited to measure the representation loss, enhancing the robustness against outlier channels. Then, through the learning-based sparsity method, the sparsity constraint is adaptively applied according to the mixed level of each pixel. This not only agrees with the practical situation but also guides each endmember toward spectra resembling highly sparsity-regularized pixels. Extensive experiments on three benchmark datasets verify the advantages of RRLbS: 1) in terms of both quantitative and visual performance, RRLbS achieves extraordinarily better results than all the compared methods; 2) the estimated guidance map is highly promising as well, providing a more accurate sparsity constraint at the pixel level. Moreover, both the theoretical proof and the empirical results verify the convergence of our method.

Figure 7: Convergence curves of RRLbS. (a) Urban, (b) Jasper Ridge and (c) Samson.

In this appendix, we provide an ℓ_{2,p}-norm based robust model to deal with badly degraded channels. Specifically, in Section III-C we proposed the ℓ_2,1-norm based measure for the representation error, leading to the following objective

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \left\|\mathbf{Y}-\mathbf{M}\mathbf{A}\right\|_{2,1}+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}}\,\mathbf{1}_{N} \tag{31}$$

However, there is theoretical and empirical evidence demonstrating that, compared with the ℓ_1 or ℓ_2,1 norms, the ℓ_{2,p}-norm is better able to prevent outliers from dominating the objective, enhancing the robustness [56]. Therefore, we provide another new model that uses the ℓ_{2,p}-norm to measure the representation loss

$$\min_{\mathbf{M}\geq 0,\ \mathbf{A}\geq 0}\ \sum_{i=1}^{L}\left\|\mathbf{y}^{i}-\left(\mathbf{M}\mathbf{A}\right)^{i}\right\|_{2}^{p}+\lambda\,\mathbf{1}_{K}^{\top}\mathbf{A}^{\mathbf{g}}\,\mathbf{1}_{N} \tag{32}$$

where $0<p<1$. In this case, we can control the robustness level of our model (32) by setting the value of $p$: a smaller $p$ leads to stronger robustness under the same setting.
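A minimal sketch (ours) of the ℓ_{2,p} data term in (32):

```python
import numpy as np

def l2p_loss(Y, M, A, p=0.5):
    # Eq. (32), data term only: per-channel l2 residual norms raised to p;
    # a smaller p (0 < p < 1) further damps the influence of outlier channels
    r = np.linalg.norm(Y - M @ A, axis=1)  # one residual norm per channel
    return np.sum(r ** p)
```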

References

  • [1] K. C. Mertens, L. P. C. Verbeke, E. I. Ducheyne, and R. R. D. Wulf, “Using genetic algorithms in sub-pixel mapping,” International Journal of Remote Sensing, vol. 24, no. 21, pp. 4241–4247, 2003.
  • [2] R. Kawakami, J. Wright, Y.-W. Tai, Y. Matsushita, M. Ben-Ezra, and K. Ikeuchi, “High-resolution hyperspectral imaging via matrix factorization,” IEEE CVPR, vol. 0, pp. 2329–2336, 2011.
  • [3] Z. Guo, T. Wittman, and S. Osher, “L1 unmixing and its application to hyperspectral image enhancement,” in Defense, Security, and Sensing, ser. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 7334, no. 1.   The International Society for Optical Engineering., Apr. 2009, p. 73341M.
  • [4] C. Li, T. Sun, K. Kelly, and Y. Zhang, “A compressive sensing and unmixing scheme for hyperspectral data processing,” IEEE TIP, vol. 21, no. 3, pp. 1200–1210, March 2012.
  • [5] S. Cai, Q. Du, and R. Moorhead, “Hyperspectral imagery visualization using double layers,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 10, pp. 3028–3036, Oct 2007.
  • [6] F. Zhu, Y. Wang, B. Fan, G. Meng, S. Xiang, and C. Pan, “Spectral unmixing via data-guided sparsity,” IEEE Transactions on Image Processing (TIP under review), 2014.
  • [7] Y. Qian, S. Jia, J. Zhou, and A. Robles-Kelly, “Hyperspectral unmixing via L_{1/2} sparsity-constrained nonnegative matrix factorization,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4282–4297, Nov 2011.
  • [8] X. Liu, W. Xia, B. Wang, and L. Zhang, “An approach based on constrained nonnegative matrix factorization to unmix hyperspectral data,” IEEE TGRS, vol. 49, no. 2, pp. 757–772, 2011.
  • [9] S. Jia and Y. Qian, “Constrained nonnegative matrix factorization for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 1, pp. 161–173, 2009.
  • [10] J. Liu, J. Zhang, Y. Gao, C. Zhang, and Z. Li, “Enhancing spectral unmixing by local neighborhood weights,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 5, pp. 1545–1552, 2012.
  • [11] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1548 –1560, aug 2011.
  • [12] L. Miao and H. Qi, “Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization,” IEEE TGRS, vol. 45, no. 3, pp. 765–777, 2007.
  • [13] N. Wang, B. Du, and L. Zhang, “An endmember dissimilarity constrained non-negative matrix factorization method for hyperspectral unmixing,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 554–569, 2013.
  • [14] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Sparse unmixing of hyperspectral data.” IEEE TGRS, vol. 49, no. 6, pp. 2014–2039, 2011.
  • [15] M. D. Iordache, J. M. B. Dias, and A. Plaza, “Total variation spatial regularization for sparse hyperspectral unmixing.” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 11, pp. 4484–4502, 2012.
  • [16] J. Bayliss, J. A. Gualtieri, and R. F. Cromp, “Analyzing hyperspectral data with independent component analysis,” in Proc. SPIE, vol. 3240, 1997, pp. 133–143.
  • [17] S. Jia and Y. Qian, “Spectral and spatial complexity-based hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 3867–3879, 2007.
  • [18] F. Zhu, Y. Wang, S. Xiang, B. Fan, and C. Pan, “Structured sparse method for hyperspectral unmixing,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 88, pp. 101–118, 2014.
  • [19] J. M. Boardman, F. A. Kruse, and R. O. Green, “Mapping target signatures via partial unmixing of aviris data,” in Proc. Summ. JPL Airborne Earth Sci. Workshop, vol. 1, 1995, p. 23–26.
  • [20] J. M. P. Nascimento and J. M. B. Dias, “Vertex component analysis: a fast algorithm to unmix hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 4, pp. 898–910, 2005.
  • [21] C.-I. Chang, C.-C. Wu, W. Liu, and Y. C. Ouyang, “A new growing method for simplex-based endmember extraction algorithm,” IEEE TGRS, vol. 44, no. 10, pp. 2804–2819, 2006.
  • [22] J. Li and J. M. Bioucas-Dias, “Minimum volume simplex analysis: A fast algorithm to unmix hyperspectral data.” in IEEE GARSS, vol. 4, 2008, pp. III–250–III–253.
  • [23] J. Bioucas-Dias, “A variable splitting augmented lagrangian approach to linear spectral unmixing,” in 1st WHISPERS, Aug 2009, pp. 1–4.
  • [24] G. Martin and A. Plaza, “Spatial-spectral preprocessing prior to endmember identification and unmixing of remotely sensed hyperspectral data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 380–395, 2012.
  • [25] J. Wang and C.-I. Chang, “Applications of independent component analysis in endmember extraction and abundance quantification for hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 9, pp. 2601–2616, 2006.
  • [26] N. Dobigeon, S. Moussaoui, M. Coulon, J.-Y. Tourneret, and A. O. Hero, “Joint bayesian endmember extraction and linear unmixing for hyperspectral imagery,” IEEE TSP, vol. 57, no. 11, pp. 4355–4368, 2009.
  • [27] J. M. Bioucas-Dias, “A variable splitting augmented lagrangian approach to linear spectral unmixing,” in WHISPERS, 2009, pp. 1–4.
  • [28] J. M. P. Nascimento and J. M. Bioucas-Dias, “Hyperspectral unmixing based on mixtures of dirichlet components,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 3, pp. 863–878, 2012.
  • [29] N. Yokoya, J. Chanussot, and A. Iwasaki, “Nonlinear unmixing of hyperspectral data using semi-nonnegative matrix factorization,” IEEE TGRS, vol. 52, no. 2, pp. 1430–1437, 2014.
  • [30] M. E. Winter, “N-findr: an algorithm for fast autonomous spectral end-member determination in hyperspectral data,” in SPIE Conf. Imaging Spectrometry, 1999, pp. 266–275.
  • [31] X. Lu, H. Wu, Y. Yuan, P. Yan, and X. Li, “Manifold regularized sparse nmf for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5, pp. 2815–2826, 2013.
  • [32] D. D. Lee and H. S. Seung, “Learning the parts of objects with nonnegative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, Oct 1999.
  • [33] S. E. Palmer, “Hierarchical structure in perceptual representation,” Cognitive Psychology, vol. 9, no. 4, pp. 441 – 474, 1977.
  • [34] N. K. Logothetis and D. L. Sheinberg, “Visual object recognition,” Annual Review of Neuroscience, vol. 19, no. 1, pp. 577–621, 1996.
  • [35] S. Z. Li, X. Hou, H. Zhang, and Q. Cheng, “Learning spatially localized, parts-based representation,” in IEEE CVPR, 2001, pp. 207–212.
  • [36] R. Sandler et al., “Nonnegative matrix factorization with earth mover’s distance metric for image analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1590–1602, 2011.
  • [37] W. Xu, X. Liu, and Y. Gong, “Document clustering based on non-negative matrix factorization,” in Int. Conf. on Res. and Development in Inform. Retrieval (SIGIR), 2003, pp. 267–273.
  • [38] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, “Document clustering using nonnegative matrix factorization,” Inform. Proc. & Management, vol. 42, no. 2, pp. 373–386, 2006.
  • [39] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in NIPS.   MIT Press, 2000, pp. 556–562.
  • [40] D. Lunga, S. Prasad, M. M. Crawford, and O. K. Ersoy, “Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 55–66, 2014.
  • [41] P. O. Hoyer, “Non-negative sparse coding,” in IEEE Workshop on Neural Networks for Signal Processing, 2002, pp. 557–565.
  • [42] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267–288, 1996.
  • [43] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [44] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
  • [45] F. Nie, H. Huang, X. Cai, and C. H. Ding, “Efficient and robust feature selection via joint ℓ2,1-norms minimization,” in NIPS, 2010, pp. 1813–1821.
  • [46] H. Wang, F. Nie, H. Huang, and H. Huang, “Robust distance metric learning via simultaneous l1-norm minimization and maximization,” in ICML, 2014, pp. 1836–1844.
  • [47] J. Fan and H. Peng, “Non-concave penalized likelihood with a diverging number of parameters,” Annals of Statistics, vol. 32, no. 3, pp. 928–961, 2004.
  • [48] F. Nie, H. Wang, H. Huang, and C. H. Q. Ding, “Early active learning via robust representation and structured sparsity,” in IJCAI, 2013, pp. 1572–1578.
  • [49] K. Yu, J. Bi, and V. Tresp, “Active learning via transductive experimental design,” in ICML, 2006, pp. 1081–1088.
  • [50] N. Hurley and S. Rickard, “Comparing measures of sparsity,” IEEE Trans. Inf. Theor., vol. 55, no. 10, pp. 4723–4741, Oct. 2009.
  • [51] H. Li, S. Tak, and J. C. Ye, “Lipschitz-Killing curvature based expected Euler characteristics for p-value correction in fNIRS,” Journal of Neuroscience Methods, vol. 204, no. 1, pp. 61–67, February 2012.
  • [52] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed.   New York: Springer, 2006.
  • [53] C.-J. Lin, “Projected gradient methods for nonnegative matrix factorization,” Neural Computation, vol. 19, no. 10, pp. 2756–2779, 2007.
  • [54] Envi-Tutorials, “An overview of hyperspectral remote sensing http://www.cossa.csiro.au/hswww/Overview.htm,” 2013.
  • [55] K. Canham, A. Schlamm, A. Ziemann, B. Basener, and D. W. Messinger, “Spatially adaptive hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4248–4262, 2011.
  • [56] F. Nie, H. Wang, X. Cai, H. Huang, and C. Ding, “Robust matrix completion via joint schatten p-norm and lp-norm minimization,” in IEEE ICDM, 2012, pp. 566–574.