
Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing

by   Alexandre Zouaoui, et al.

In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original hyperspectral image. Our approach leverages an entropic gradient descent strategy, which (i) provides better solutions for hyperspectral unmixing than traditional archetypal analysis algorithms, and (ii) leads to efficient GPU implementations. Since running a single instance of our algorithm is fast, we also propose an ensembling mechanism along with an appropriate model selection procedure that make our method robust to hyper-parameter choices while keeping the computational complexity reasonable. Using six standard real datasets, we show that our approach outperforms state-of-the-art matrix factorization and recent deep learning methods. We also provide an open-source PyTorch implementation.




I Introduction

Hyperspectral imaging (HSI) [landgrebe_hyperspectral_2002, plaza_recent_2009, schaepman_earth_2009, goetz_imaging_1985, green_imaging_1998] consists of measuring the electromagnetic spectrum in a scene by using multiple narrow spectral bands. Thanks to its richer spectral information compared to traditional RGB images, HSI enables more accurate material identification, leading to a broad range of applications including crop monitoring in agriculture [adao_hyperspectral_2017], waste sorting [karaca_automatic_2013], food safety inspection [gowen_hyperspectral_2007], and mineralogy [fox_applications_2017].

Remote sensing [clark_imaging_2003, bioucas-dias_hyperspectral_2013], such as airborne or satellite imagery, yields hyperspectral images whose pixels capture several objects or materials. As such, each pixel can include several pure spectral components (called endmembers), mixed in different proportions [ghamisi_advances_2017]. Any further analysis hence requires identifying and disentangling the endmembers present in a scene before estimating their respective proportions, or fractional abundances, within each pixel of the HSI [parra_unmixing_1999]. Since the endmembers' spectral signatures are not known beforehand and must be estimated from data, this operation is named blind hyperspectral unmixing (HU) [keshava_spectral_2002, bioucas-dias_hyperspectral_2012], owing to its link with blind source separation [comon_handbook_2010].

In this paper, we adopt a linear mixing model since it is often relevant in remote sensing scenes where mixtures occur between macroscopic materials. Therefore, we assume that each observed pixel can be represented as a linear combination of endmembers plus some additive noise. In other words, we are interested in tackling unsupervised linear HU.


Further assumptions on the nature of endmembers are generally needed to estimate meaningful spectra. For instance, it can be assumed that there exists at least one pure pixel for each material present in the scene. The problem then requires finding these pure pixels within the original image. The pure pixel assumption is at the core of several geometrical endmember extraction methods, including the pixel purity index (PPI) [boardman_mapping_1995], N-FINDR [winter_n-findr_1999] and vertex component analysis (VCA) [nascimento_vertex_2005]. Once endmembers have been extracted, abundances can be estimated by minimizing the least squares error between the original input spectra and the linearly reconstructed spectra, as long as the abundance fractions satisfy the two physical constraints stating that they should be non-negative and sum to one for each pixel [heinz_fully_2001]. That being said, pure pixels are often missing in real scenarios. In the absence of pure pixels and in the case of linear models, endmembers and abundances can be simultaneously estimated by solving a constrained or penalized non-negative matrix factorization (NMF) problem [lee_algorithms_2000]. For example, the authors of [zhuang_regularization_2019] have proposed a formulation that involves a data fidelity term and a minimum volume regularization term on endmembers, whose minimization consists in alternating between solving for endmembers and abundances.

In this work, we do not assume the existence of pure pixels as they are often missing in real data, since, for instance, the spectral signatures of endmembers in HSI can be significantly affected by various changes in atmospheric, illumination, and environmental conditions within the scene [borsoi_spectral_2021]. We mitigate the effect of spectral variability by (i) normalizing each pixel by the l2-norm of its spectrum as a pre-processing step and (ii) modeling endmembers as convex combinations of pixels present in the scene. Not only are HSI pixels linear combinations of the estimated endmembers under the linear mixing model, but the estimated endmembers are also convex combinations of pixels. This corresponds to the archetypal analysis (AA) formulation introduced by Cutler and Breiman in [cutler_archetypal_1994]. Compared to plain NMF, AA provides greater interpretability given that each endmember can be traced back to pixels present in the HSI. In addition, since the estimated endmembers' spectral signatures generally correspond to averaging the contributions of several pixels, the resulting estimation appears to be more robust to noise and spectral variability than pure pixel methods.

I-A Contributions and novelties

  1. We propose a new hyperspectral unmixing algorithm relying on entropic gradient descent for archetypal analysis. Our approach (i) provides better solutions for hyperspectral unmixing than traditional alternating optimization schemes based on projected gradient methods or active set algorithms, and (ii) allows more efficient GPU implementations.

  2. The efficiency of our method enables us to make a key practical contribution, consisting of an ensembling mechanism along with an appropriate model selection procedure, which makes our method almost parameter-free and thus easy to use (the only sensitive parameter is the number of endmembers we want to estimate).

  3. Our approach, available in an open-source package (code available online), outperforms state-of-the-art matrix factorization and deep learning methods on six standard real datasets.

The remainder of this paper is organized as follows. Section II presents related works for unsupervised linear hyperspectral unmixing. Section III introduces our method. Section IV presents experimental results highlighting the performance of our proposed approach. Finally, we conclude the article and underline future research directions in Section V.

II Related Work on Unsupervised Linear Hyperspectral Unmixing

In this section, we present the framework of non-negative matrix factorization (NMF) before introducing archetypal analysis (AA). Finally, we mention widely used deep learning architectures that tackle blind hyperspectral unmixing.

Non-negative matrix factorization (NMF)

NMF [paatero_positive_1994, lee_algorithms_2000] is popular for hyperspectral unmixing. It consists of factorizing a matrix representing the HSI signal into the product of two matrices with non-negative entries, where one factor carries the endmembers whereas the other one represents the abundances for each pixel (typically with the constraint that abundances sum to one). Several variants of NMF have been proposed for this task. For instance, the minimum volume constrained non-negative matrix factorization (MVC-NMF) method of [miao_endmember_2007] uses a minimum volume term for endmembers that effectively gets rid of the pure pixel assumption. In minimum dispersion constrained NMF (MiniDisCo) [huck_minimum_2010], the regularization function, called dispersion, encourages endmembers with minimum variance, to avoid degenerate solutions and improve the unmixing robustness to flat spectra. Other approaches design regularization functions for the abundances [zymnis_hyperspectral_2007, yang_blind_2010].

Archetypal analysis (AA)

AA was first introduced in [cutler_archetypal_1994] and consists in modeling individual pixels present in the HSI as a linear mixture of archetypes (here the endmembers). Interestingly, archetypes are defined as convex combinations of the individual pixels present in the HSI. Motivated by the good interpretability of AA, Zhao et al.  [zhao_multiple_2015] have proposed a kernelized variant, which enables greater modeling flexibility at the cost of an extra hyper-parameter (the bandwidth of the RBF kernel functions). Notably, they use the relaxation form introduced in [morup_archetypal_2012] to handle the cases where endmembers are located outside of the convex hull of the data.

Inspired by the robust AA formulation introduced in [chen_fast_2014], which considers the Huber loss instead of the squared loss to reduce the impact of noise and outliers, Sun et al. [sun_pure_2017] have introduced a robust kernel archetypal analysis (RKADA) method for blind hyperspectral unmixing. Their approach refines the standard AA formulation by adding a binary sparse constraint on the pixel contributions. Consequently, their method ensures that each endmember actually corresponds to real pixels rather than a sparse linear mixture of all pixels. The resulting optimization algorithm is a block-coordinate descent scheme that uses an active-set method [nocedal_numerical_1999] to solve the smooth least squares optimization subproblems under a simplicial constraint. As noted in [chen_fast_2014], the active-set algorithm can be seen as an aggressive strategy that can leverage the underlying sparsity of the subproblem solutions.

Recently, Xu et al. [xu_l1_2022] have proposed an l1 sparsity-constrained AA algorithm to increase the sparsity of the abundances. Rather than using an active-set algorithm, they adopt a projected gradient method [lin_projected_2007] to solve the resulting optimization subproblems inside an alternating scheme. However, their formulation leads to abundances that do not sum to one, which reduces the physical interpretability of the unmixing results. Moreover, their approach relies on other algorithms, including PPI [boardman_mapping_1995], for pre-processing and initialization, which increases the complexity of their algorithm.

Autoencoders (AE)

Interest in deep learning has been rapidly growing in many fields, including remote sensing [zhang_deep_2016], thanks to the increasing amount of available data, rising computational power, and the development of suitable algorithms. Autoencoders for hyperspectral unmixing have been widely used since the pioneering work of Licciardi et al. [licciardi_unsupervised_2012]. In a nutshell, the encoder transforms an input into abundance maps that are then linearly decoded by the decoder. The encoder activations correspond to the fractional abundances, while the parameters of the linear decoder layer correspond to the endmember spectra. While the AE described above corresponds to a basic fully connected architecture, refinements have been introduced, such as a loss function that involves the spectral angle distance and a dedicated sparsity penalty term as in [ozkan_endnet_2018], or convolutions to take advantage of the spatial structure in HSI [palsson_convolutional_2020]. See [palsson_blind_2022] for a comprehensive study of the various existing autoencoder architectures for blind hyperspectral unmixing. Finally, it is worth mentioning the recent effort by Rasti et al. [rasti_misicnet_2022] to incorporate a geometrically motivated penalty into deep learning-based approaches. Their method, called minimum simplex convolutional network (MiSiCNet), uses both geometrical and spatial information, by means of a convolutional encoder-decoder architecture, to tackle unsupervised hyperspectral unmixing.

Although the surge of deep learning methods has improved overall unmixing performance, training a single network per image remains costly due to the extensive hyper-parameter search required. Moreover, training such networks requires a GPU, and can still be considerably slower than traditional methods that run on a CPU.

III Methodology

In this section, we present our model formulation before describing its optimization. Next, we mention implementation details required to run our approach. Finally, we explain how to leverage our efficient GPU implementation and propose a procedure to make our model robust to hyper-parameter choices and thus easy to use in practice.

III-A Model formulation

Let X in R^{C×N} be a hyperspectral image (HSI), where C is the number of channels and N is the number of pixels. We assume a linear mixing model such that

X = EA + ε,   (1)

where E in R^{C×p} is the mixing matrix composed of the discretized spectra of the p endmembers over the C channels, and A in R^{p×N} is the abundance matrix that describes, for each pixel, the fraction relative to each endmember. Finally, ε represents some classical noise occurring in hyperspectral imaging.

We are interested in tackling the blind unmixing setting where both the mixing and abundance matrices are unknown. However, we assume that the number of endmembers p is known. Since E represents the endmembers' reflectance over the C channels, its elements should be non-negative, i.e., E ≥ 0. This is also the case for A, whose columns represent the abundances for each pixel and should, in addition, sum to one. In other words, each column a_i of A belongs to the simplex Δ_p defined as:

Δ_p = { a in R^p : a[j] ≥ 0 for all j, and Σ_{j=1}^{p} a[j] = 1 }.   (2)

The previous model and constraints yield the classical optimization problem

min_{E,A} (1/2) ‖X − EA‖_F^2   s.t.   E ≥ 0 and a_i ∈ Δ_p for 1 ≤ i ≤ N,

which is a variant of non-negative matrix factorization.

The archetypal analysis formulation we consider adds a constraint and forces the endmembers to be convex combinations of the pixels present in X. Formally, it simply means that there exists a matrix B in R^{N×p} such that E = XB and the columns of B are in the simplex Δ_N (their entries are non-negative and sum to one). This yields the archetypal analysis problem [cutler_archetypal_1994]:

min_{A,B} (1/2) ‖X − XBA‖_F^2   s.t.   a_i ∈ Δ_p for 1 ≤ i ≤ N and b_j ∈ Δ_N for 1 ≤ j ≤ p,   (III-A)

where A in R^{p×N} and B in R^{N×p}.

III-B Optimization

Minimizing (III-A) is difficult since the objective function is not jointly convex in (A, B). However, it is convex with respect to (w.r.t.) one of the variables when the other one is fixed [morup_archetypal_2012]. Consequently, it is natural to consider an alternating minimization scheme between A and B, which is guaranteed to asymptotically provide a stationary point of the problem [bertsekas_nonlinear_1997]. Yet, because of the non-convexity of the objective, the choice of optimization algorithm is important, as different stationary points may not have the same quality in terms of statistical estimation. In other words, different optimization procedures may have a different "implicit bias", a phenomenon that is currently the focus of a lot of attention in machine learning [pesme_implicit_2021], especially for deep learning models, suggesting that it may also be key for blind HU.

In this paper, we adopt an optimization scheme called entropic gradient descent which has shown better theoretical properties in terms of convergence rates than gradient descent when optimizing over the simplex [beck_mirror_2003]. Our second motivation was the simplicity of the algorithm, which does not require performing orthogonal projections on the simplex, nor dealing with complicated active-set rules as in [chen_fast_2014]. This enables us to take advantage of modern GPUs. We now present mathematical details before discussing implementation.

III-B1 Entropic descent algorithm (EDA)

As noted in [beck_mirror_2003], the entropic descent algorithm is simply a gradient descent method with a particular choice of a Bregman-like distance [bregman_relaxation_1967] generated by a specific function, here the negative entropy. As explained in [teboulle_entropic_1992], the choice of an appropriate distance-like function tailored to the geometry of the constraints, here the simplex, provides theoretical benefits in terms of convergence rates. The minimization over the simplex is precisely the reason why we adopt the negative entropy to derive the updates of the alternating minimization scheme. Formally, the negative entropy function ψ is defined as follows: for u in R_+^m,

ψ(u) = Σ_{j=1}^{m} u[j] log(u[j]),

with the convention that 0 log 0 = 0.

ψ exhibits several desirable properties, including convexity on its domain. This enables us to consider D_ψ, the Bregman divergence [bregman_relaxation_1967] w.r.t. ψ, defined, for u and v in the simplex Δ_m, as

D_ψ(u, v) = ψ(u) − ψ(v) − ⟨∇ψ(v), u − v⟩ = Σ_{j=1}^{m} u[j] log( u[j] / v[j] ),

which is also called the Kullback-Leibler divergence. By convexity of ψ, we naturally have D_ψ(u, v) ≥ 0. We start by considering the following generic optimization problem:

min_{u ∈ Δ_m} f(u),   (5)

where f is a convex Lipschitz continuous function with a gradient at u denoted by ∇f(u), and Δ_m corresponds to the m-dimensional simplex (2).

The algorithm we consider to solve (5) performs the following iterates, given u_0 in Δ_m:

u_{k+1} = argmin_{u ∈ Δ_m} ⟨∇f(u_k), u⟩ + (1/η_k) D_ψ(u, u_k).   (6)

If D_ψ were simply a squared Euclidean norm, we would recover a projected gradient descent algorithm. By using instead the Bregman distance function derived from the negative entropy, we obtain the entropic descent method. Here, D_ψ measures the distance between two vectors in the simplex. As such, the next iterate should aim for the optimal balance between taking a gradient step and moving the least from the current iterate according to the geometry induced by D_ψ, with the step size η_k controlling this trade-off.

We will now see how the negative entropy yields explicit steps that effectively enforce the simplicial constraints. Indeed, after a short classical calculation (see, for instance, [beck_mirror_2003]), it is possible to show that the update (6) is equivalent to the following one: for all j in {1, …, m},

u_{k+1}[j] = u_k[j] e^{−η_k ∇f(u_k)[j]} / ( Σ_{l=1}^{m} u_k[l] e^{−η_k ∇f(u_k)[l]} ),   (7)

where u_k[j] is the j-th entry of the vector u_k and, similarly, ∇f(u_k)[j] is the j-th entry of ∇f(u_k).

It is thus easy to see that the iterates stay in the simplex Δ_m, and it is possible to show (see [beck_mirror_2003]) that the sequence converges to the set of solutions of (5) with appropriate step sizes η_k. Interestingly, the update (7) can be implemented efficiently by using the softmax function, assuming the entries of u_k are positive:

u_{k+1} = softmax( log(u_k) − η_k ∇f(u_k) ),   (8)

where log(u_k) is the vector carrying the logarithm of each entry of u_k. This update immediately suggests a high compatibility with GPUs.
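To make update (8) concrete, here is a minimal NumPy sketch (standing in for the paper's PyTorch implementation; the toy least-squares objective and all names are ours) that runs entropic descent on the simplex:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropic_descent_step(u, grad, eta):
    """One entropic descent step on the simplex:
    u_next = softmax(log(u) - eta * grad), cf. update (8)."""
    return softmax(np.log(u) - eta * grad)

# Toy problem: minimize f(u) = 0.5 * ||M u - y||^2 over the simplex.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 5))
y = M @ np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # target mixture on the simplex
u = np.full(5, 1.0 / 5)                      # uniform (max-entropy) start
eta = 1.0 / np.linalg.norm(M, 2) ** 2        # 1/L with L the Lipschitz constant
for _ in range(200):
    grad = M.T @ (M @ u - y)
    u = entropic_descent_step(u, grad, eta)
# The iterate stays in the simplex by construction: positive, sums to one.
```

Note how no projection step is needed: the multiplicative form of (7) keeps the iterates strictly positive with entries summing to one.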

III-B2 Alternating optimization scheme

We are now in shape to describe an alternating optimization scheme, consisting of performing, alternately, n_A updates of EDA for minimizing w.r.t. A when B is fixed, and vice versa using n_B updates. This strategy is presented in Algorithm 1. Formally, by replacing the generic function f with the functions corresponding to the two optimization subproblems, we obtain the following updates:

A ← softmax( log(A) − η_A (XB)ᵀ(XBA − X) ),
B ← softmax( log(B) − η_B Xᵀ(XBA − X)Aᵀ ),

where log(A) is the matrix carrying the logarithm of each entry of A, while the softmax is applied in parallel on the columns of its matrix argument. Note that when A and B are initialized with positive values, these iterates keep them positive.

1: Input: l2-normalized data X in R^{C×N}; p (number of endmembers); T (number of outer iterations); n_A (number of inner iterations for A); n_B (number of inner iterations for B).
2: Initialize A using (12).
3: Initialize B using (13).
4: Set η_A according to (14).
5: Set η_B according to (15).
6: for t = 1, …, T do
7:     for i = 1, …, n_A do
8:         A ← softmax(log(A) − η_A (XB)ᵀ(XBA − X))   ▷ log is applied element-wise; softmax is applied along the first dimension.
9:     end for
10:    for j = 1, …, n_B do
11:        B ← softmax(log(B) − η_B Xᵀ(XBA − X)Aᵀ)
12:    end for
13: end for
14: Return E = XB, A.   ▷ Estimated endmembers, abundances.
Algorithm 1 Entropic Descent Archetypal Analysis (EDAA)
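The alternating scheme fits in a few lines of array code. The NumPy sketch below is our illustrative reading of the updates above, not the authors' reference implementation; in particular, the B step size is derived from our own Lipschitz-constant computation for the B-subproblem, and the perturbation scale 0.1 is our choice:

```python
import numpy as np

def softmax_cols(Z):
    # Column-wise, numerically stable softmax.
    Z = Z - Z.max(axis=0, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def edaa(X, p, T=40, nA=5, nB=5, gamma=1.0, seed=0):
    """Sketch of Algorithm 1 (EDAA) on l2-normalized pixels X of shape (C, N)."""
    rng = np.random.default_rng(seed)
    C, N = X.shape
    A = np.full((p, N), 1.0 / p)                 # uniform abundances (max entropy)
    B = softmax_cols(0.1 * rng.random((N, p)))   # near-uniform, symmetry broken
    for _ in range(T):
        E = X @ B
        etaA = gamma / np.linalg.norm(E, 2) ** 2              # cf. (14)
        for _ in range(nA):
            A = softmax_cols(np.log(A) - etaA * E.T @ (E @ A - X))
        # Our Lipschitz bound for the B-subproblem gradient: ||X||^2 ||A||^2.
        etaB = gamma / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(A, 2) ** 2)
        for _ in range(nB):
            B = softmax_cols(np.log(B) - etaB * X.T @ (X @ B @ A - X) @ A.T)
    return X @ B, A  # estimated endmembers E = XB and abundances A
```

On synthetic data generated from a few archetype spectra, the objective decreases monotonically from the near-uniform start, and both factors stay on their simplices throughout.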

III-C Implementation details


The input image is l2-normalized for each pixel: for all i in {1, …, N},

x_i ← x_i / ‖x_i‖_2,

where x_i denotes the i-th pixel. This step is important to gain invariance to illumination changes.
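In NumPy, this pre-processing is a one-liner over the pixel axis (the small eps guard against zero pixels is our addition):

```python
import numpy as np

def l2_normalize_pixels(X, eps=1e-12):
    """Divide each pixel (column of X) by its l2 norm."""
    return X / (np.linalg.norm(X, axis=0, keepdims=True) + eps)

X = np.abs(np.random.default_rng(0).standard_normal((4, 6)))  # toy (C, N) image
Xn = l2_normalize_pixels(X)  # every column now has unit l2 norm
```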


We initialize the abundance matrix A uniformly,

A = (1/p) 1_p 1_Nᵀ,   (12)

where 1_p denotes a p-dimensional vector of ones. This corresponds to the maximal-entropy configuration for each pixel. The entropy of each pixel will naturally decrease as a result of the optimization, but the high entropy of the initialization will have a regularizing effect.

The initialization of the pixel contribution matrix B is then also close to uniform. Nevertheless, we introduce a small random perturbation, which is necessary to break the symmetry between the columns of B (otherwise, the updates of A and B would keep them invariant). Concretely, the entries of a matrix C in R^{N×p} are randomly sampled according to the uniform distribution on [0, 1], then rescaled by a small factor. Finally, we apply the softmax function on each column of C so that the columns of B belong to the simplex Δ_N: for j in {1, …, p},

b_j = softmax(c_j),   (13)

where c_j denotes the j-th column of the rescaled matrix C. In practice, we observe that such an initialization leads to a matrix B that is very close to a uniform initialization.
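A possible reading of this initialization in NumPy (the perturbation scale 0.1 is our illustrative stand-in for the paper's rescaling factor):

```python
import numpy as np

def init_B(N, p, scale=0.1, seed=0):
    """Near-uniform init of B: small uniform perturbation to break column
    symmetry, then a column-wise softmax to land on the simplex."""
    rng = np.random.default_rng(seed)
    C = scale * rng.random((N, p))          # entries in [0, scale)
    Z = C - C.max(axis=0, keepdims=True)    # stabilized softmax per column
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

B = init_B(N=1000, p=4)
# Each entry stays within a tiny deviation of the uniform value 1/N.
```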

Step sizes

We use constant step sizes η_A and η_B, for A and B respectively:

η_A = γ / σ_max(XB)²,   (14)

where γ belongs to a small predefined set of values and σ_max(XB) is the largest singular value of the matrix XB. We recover the classical convergence of gradient descent with a fixed step size [nesterov_introductory_2003] up to the factor γ, since σ_max(XB)² corresponds to the Lipschitz constant of the sub-problem related to (III-A) when minimizing w.r.t. A, B being fixed. Letting γ vary allows us to use slightly different step sizes and yields better performance in practice. Note that our model selection procedure, presented later, will automatically choose the right parameter γ, thus removing the burden for the user of having to deal with an extra hyper-parameter. Finally, we set η_B as a rescaled version of η_A, to account for the matrices being of transposed dimensions.   (15)

For all experiments, we use the same fixed numbers of inner iterations, as they provide a good trade-off between convergence speed and unmixing accuracy. Note also that these hyper-parameters are robust across different real datasets, as detailed in Section IV.
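The step-size computation (14) is a single spectral-norm evaluation; a small NumPy sketch (the function name is ours, and `gamma` stands for the scaling factor γ):

```python
import numpy as np

def step_size_A(X, B, gamma=1.0):
    """eta_A = gamma / sigma_max(XB)^2, cf. (14): gamma over the Lipschitz
    constant of the gradient of the A-subproblem."""
    L = np.linalg.norm(X @ B, 2) ** 2   # largest singular value, squared
    return gamma / L
```

For instance, with X = B = I the largest singular value of XB is 1, so the step size reduces to γ itself.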

III-D Model selection procedure

As stated above, the initialization of the matrix B is random, leading to a different solution for each run of the algorithm, since the overall optimization problem is non-convex. Besides, we allow for different step sizes through γ, which we draw randomly from a predefined set in practice. Since the convergence of the algorithm is very fast (see the experimental section for concrete benchmarks), we are able to provide a large diversity of solutions for a given dataset by running our method K times with different random seeds, while keeping the global computational complexity reasonable. A major question, which we address next, is how to optimally select the best solution in terms of unmixing accuracy.

For this, we take inspiration from classical model selection and sparse estimation theory [hastie_elements_2009]. First, we measure the fit of each solution in terms of the residual error ‖X − XBA‖, choosing the l1-norm, which is known to be more robust to outliers than the mean squared error. Second, we select the set of solutions that are in the same ballpark as the best solution we have obtained in terms of fit. This selection process is illustrated by the red dotted line in Figure 4, while the precise criterion is described in Algorithm 2.

From the subset of solutions with a good fit, we then choose the one whose endmembers have the best incoherence, a desirable property that is classical in the theory of sparse estimation [elad_generalized_2002, gribonval_sparse_2003, mairal2014sparse]. Indeed, dictionaries (here, endmembers) with more incoherence benefit from better theoretical guarantees in terms of estimation of the abundances, making incoherence a natural criterion for model selection in the context of unmixing. Formally, the coherence is simply the maximal pairwise spectral correlation between the estimated endmembers. More precisely, for an endmembers matrix E, the coherence is defined as:

μ(E) = max_{1 ≤ i < j ≤ p} |⟨e_i, e_j⟩| / ( ‖e_i‖_2 ‖e_j‖_2 ).   (16)
To the best of our knowledge, this is the first time the coherence is used as a model selection criterion for archetypal analysis. Our experiments, see next section, show that it is highly effective.

In summary, we automatically select the model whose endmembers have the lowest maximal pairwise spectral correlation among those that have a good fit. This strategy is illustrated in Figure 4 and described in Algorithm 2. In the experiments, the number of runs K was set to 50.
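The whole selection step fits in a few lines. The sketch below assumes each run yields an endmember matrix together with its l1 residual; the multiplicative fit threshold `tol` is our illustrative stand-in for the precise criterion of Algorithm 2:

```python
import numpy as np

def coherence(E):
    """Maximal pairwise spectral correlation between endmembers, cf. (16)."""
    En = E / np.linalg.norm(E, axis=0, keepdims=True)  # unit-norm columns
    G = np.abs(En.T @ En)                              # correlation magnitudes
    np.fill_diagonal(G, 0.0)                           # ignore self-correlation
    return G.max()

def select_model(runs, tol=1.05):
    """runs: list of (E, residual) pairs. Keep runs within `tol` times the
    best l1 fit, then return the one with the lowest coherence."""
    best = min(r for _, r in runs)
    candidates = [(E, r) for E, r in runs if r <= tol * best]
    return min(candidates, key=lambda er: coherence(er[0]))
```

For example, given one run with strongly correlated endmember columns and a slightly worse-fitting run with orthogonal columns, the procedure picks the orthogonal (incoherent) one as long as its fit passes the threshold.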

(a) Samson
(b) Jasper Ridge
(c) APEX
Fig. 4: Illustration of the model selection procedure on three datasets using K = 50 runs. Runs are shown as blue dots and the selected one is in black. The selected run is the one with the lowest coherence under the dashed red line representing the fit threshold; see Alg. 2.
1: Input: K (number of runs); l2-normalized data X in R^{C×N}; p (number of endmembers); T (number of outer iterations); n_A (number of inner iterations for A); n_B (number of inner iterations for B).
2: for k = 1, …, K do
3:     Set random seed k.
4:     E_k, A_k ← EDAA(X, p, T, n_A, n_B).   ▷ See Algorithm 1.
5:     Compute the residual r_k = ‖X − E_k A_k‖_1.
6:     Compute the coherence μ(E_k).   ▷ See (16).
7: end for
8: Select the subset S of runs whose residual r_k is close to min_k r_k.   ▷ Subset of models.
9: Return: E_{k*}, A_{k*} with k* = argmin_{k ∈ S} μ(E_k).
Algorithm 2 Model Selection Procedure

IV Experiments

We have performed experiments on six standard real datasets whose descriptions are given below.

IV-A Hyperspectral data description

  1. Samson: The Samson hyperspectral image is a 95×95-pixel sub-region of a larger image captured using 156 bands spanning from 401 to 889 nm. Three main materials have been identified: Tree, Soil and Water. Note that we use a different ground truth from [rasti_misicnet_2022], selected for its sharper details on the abundances.

  2. Jasper Ridge: The Jasper Ridge hyperspectral image is a 100×100-pixel sub-region of a larger image initially captured using 224 bands spanning from 380 to 2500 nm. In total, 198 bands remain, as 26 were removed during pre-processing due to dense water vapor and atmospheric effects. Four main materials have been identified: Tree, Dirt, Water and Road.

  3. Urban4 and Urban6: The Urban hyperspectral image is a 307×307-pixel image collected by the Hyperspectral Digital Image Collection Experiment (HYDICE) sensor [rickard_hydice_1993] using 210 bands spanning from 400 to 2500 nm. In total, 162 bands remain, as 48 were removed during pre-processing due to dense water vapor and atmospheric effects. There exist three versions of this dataset w.r.t. the number of endmembers. In this study, we focus on the two extremes: Urban4 contains 4 endmembers (Asphalt Road, Grass, Tree and Roof), while Urban6 contains two additional materials, Dirt and Metal, making it more challenging.

  4. APEX: The APEX [schaepman_advanced_2015] hyperspectral image that we consider in this paper is a 111×122-pixel cropped region of a larger image captured over 285 bands spanning from 413 to 2420 nm. Four main materials were identified: Road, Tree, Roof and Water.

  5. Washington DC Mall: The Washington DC Mall (WDC) hyperspectral dataset consists of a 319×292-pixel image captured by the HYDICE sensor [rickard_hydice_1993] over 191 bands spanning from 400 to 2400 nm. Six main materials were identified: Grass, Tree, Roof, Road, Water and Trail.

According to [zhu_hyperspectral_2017] (Samson, Jasper Ridge and Urban) and [rasti_misicnet_2022] (APEX and WDC), the endmembers spectra were manually selected from the images and the ground truth abundances were set by FCLSU. Illustrations of the datasets and their ground truth endmembers are available in the supplementary material.

IV-B Experimental setup

We compare our approach to five competitive methods from different unmixing categories:

  • Geometrical unmixing baseline: FCLSU [heinz_fully_2001], using VCA [nascimento_vertex_2005] to extract the endmembers. Our implementation of the FCLSU algorithm uses the DecompSimplex routine implemented in SPAMS. This method relies on the active-set algorithm [nocedal_numerical_1999], which enables significantly faster convergence than generic quadratic programming solvers by leveraging the underlying sparsity of the abundances, as noted by [chen_fast_2014].

  • Deep learning unmixing: Endnet [ozkan_endnet_2018], using VCA [nascimento_vertex_2005] to initialize the endmembers, and MiSiCNet [rasti_misicnet_2022].

  • NMF-based blind unmixing: non-negative matrix factorization with quadratic minimum volume (NMF-QMV) [zhuang_regularization_2019], using the boundary term as the quadratic minimum volume penalty, and AA [cutler_archetypal_1994], using the implementation from [chen_fast_2014] developed in SPAMS.

To quantitatively evaluate the performance of the selected methods, we consider two metrics that are computed both globally and individually for each endmember. On the one hand, we measure the quality of the generated abundances by means of the abundance root mean square error (RMSE), in percent, between the ground truth abundances A and the estimated abundances Â:

RMSE(A, Â) = 100 × sqrt( (1/(pN)) Σ_{k=1}^{p} Σ_{i=1}^{N} (A[k,i] − Â[k,i])² ).

On the other hand, we assess the quality of the estimated endmember spectra by using the spectral angle distance (SAD), in degrees, between the ground truth endmembers and the generated ones:

SAD(e_k, ê_k) = (180/π) arccos( ⟨e_k, ê_k⟩ / ( ‖e_k‖_2 ‖ê_k‖_2 ) ),

where ⟨·, ·⟩ denotes the inner product and e_k denotes the k-th column of E, i.e., the spectrum of the k-th endmember.
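Both metrics are straightforward to implement. The NumPy sketch below transcribes the two definitions above (the exact averaging convention used in the paper's tables may differ):

```python
import numpy as np

def abundance_rmse(A_true, A_est):
    """Abundance RMSE in percent, averaged over all p*N entries."""
    return 100.0 * np.sqrt(np.mean((A_true - A_est) ** 2))

def sad_degrees(e_true, e_est):
    """Spectral angle distance, in degrees, between two endmember spectra."""
    cos = np.dot(e_true, e_est) / (np.linalg.norm(e_true) * np.linalg.norm(e_est))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Note that the SAD is invariant to the scale of the spectra, which is why it complements the RMSE on abundances.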

IV-C Unmixing experiments

FCLSU [nascimento_vertex_2005, heinz_fully_2001] Endnet [ozkan_endnet_2018] MiSiCNet [rasti_misicnet_2022] NMF-QMV [zhuang_regularization_2019] AA EDAA
Samson Soil 11.28 11.61 6.47 13.67 6.16 5.74
Tree 9.13 7.72 5.38 8.4 4.00 3.77
Water 5.05 6.86 3.47 11.61 2.30 2.59
Overall 8.88 8.97 5.25 11.44 4.44 4.24
Jasper Ridge Dirt 21.23 18.26 21.68 19.97 10.24 7.32
Road 24.72 29.40 24.94 26.13 9.79 7.61
Tree 11.20 4.00 3.41 14.55 6.32 6.63
Water 13.61 22.38 7.07 19.81 6.77 5.69
Overall 18.52 20.70 16.98 20.53 8.46 6.85
Urban4 Road 30.54 12.04 10.30 20.25 11.12 8.62
Grass 32.99 17.94 12.35 20.22 11.55 9.28
Roof 15.40 11.28 8.01 13.29 7.28 6.37
Tree 20.02 11.69 8.78 21.56 8.62 6.27
Overall 25.78 13.52 10.00 19.11 9.80 7.75
Urban6 Road 31.61 17.44 19.18 24.81 19.23 11.39
Grass 23.62 31.47 18.84 27.97 9.73 18.61
Roof 13.00 9.35 7.41 16.39 10.92 6.01
Tree 16.14 15.50 11.72 19.95 14.45 9.85
Dirt 25.06 30.67 23.95 24.31 27.76 15.94
Metal 12.99 25.21 33.67 13.40 17.53 15.86
Overall 21.54 23.09 16.27 21.74 17.66 13.63
APEX Road 33.31 29.32 32.66 31.43 34.12 16.54
Tree 20.97 18.18 19.97 24.62 22.49 14.48
Roof 14.15 15.88 18.42 14.73 18.54 11.27
Water 18.03 17.47 16.88 17.99 16.90 16.83
Overall 22.77 20.90 22.86 23.10 23.98 14.94
WDC Grass 30.90 27.35 31.61 34.69 31.92 32.53
Tree 22.42 35.98 23.64 19.90 20.13 11.46
Road 27.90 38.49 34.91 22.49 38.87 13.97
Roof 8.71 27.04 11.98 19.81 20.89 29.31
Water 17.76 12.94 14.72 20.34 14.06 9.63
Trail 12.80 12.63 12.07 12.36 15.24 13.19
Overall 21.57 27.64 23.39 22.60 25.17 20.46
TABLE I: Abundances RMSE on six real datasets. The best results are shown in bold. The second best results are underlined.
| Dataset | Material | FCLSU [nascimento_vertex_2005, heinz_fully_2001] | Endnet [ozkan_endnet_2018] | MiSiCNet [rasti_misicnet_2022] | NMF-QMV [zhuang_regularization_2019] | AA | EDAA |
|---|---|---|---|---|---|---|---|
| Samson | Soil | 2.76 | 0.61 | 1.21 | 4.90 | 0.78 | 1.64 |
| | Tree | 3.05 | 1.93 | 3.38 | 5.34 | 1.80 | 1.98 |
| | Water | 7.15 | 1.48 | 5.36 | 11.14 | 1.38 | 1.31 |
| | Overall | 4.32 | 1.34 | 3.32 | 7.13 | 1.32 | 1.64 |
| Jasper Ridge | Dirt | 13.03 | 1.63 | 4.26 | 12.40 | 2.44 | 2.74 |
| | Road | 40.39 | 32.85 | 20.04 | 45.66 | 8.00 | 3.10 |
| | Tree | 11.16 | 1.39 | 1.27 | 14.46 | 4.34 | 4.23 |
| | Water | 13.24 | 3.21 | 4.18 | 14.53 | 3.29 | 2.80 |
| | Overall | 19.46 | 9.77 | 7.44 | 21.77 | 4.52 | 3.22 |
| Urban4 | Road | 15.40 | 6.40 | 5.73 | 14.51 | 3.73 | 6.01 |
| | Grass | 24.18 | 3.09 | 5.84 | 16.39 | 1.81 | 2.14 |
| | Roof | 47.56 | 3.76 | 16.10 | 36.31 | 15.59 | 10.49 |
| | Tree | 19.82 | 2.32 | 4.60 | 22.48 | 3.51 | 2.81 |
| | Overall | 26.74 | 3.89 | 8.07 | 22.42 | 6.16 | 5.36 |
| Urban6 | Road | 13.43 | 3.26 | 7.47 | 26.60 | 6.74 | 4.85 |
| | Grass | 22.30 | 4.13 | 10.97 | 21.63 | 6.82 | 2.17 |
| | Roof | 15.65 | 17.76 | 13.97 | 15.55 | 18.07 | 13.70 |
| | Tree | 20.70 | 7.72 | 9.99 | 23.31 | 7.18 | 8.92 |
| | Dirt | 69.81 | 17.42 | 19.57 | 23.60 | 11.08 | 13.33 |
| | Metal | 39.35 | 7.04 | 9.75 | 68.73 | 40.97 | 4.52 |
| | Overall | 30.21 | 9.56 | 11.95 | 29.90 | 15.14 | 8.64 |
| APEX | Road | 40.23 | 14.46 | 33.02 | 54.53 | 37.32 | 6.83 |
| | Tree | 14.13 | 7.53 | 3.16 | 16.06 | 2.39 | 7.68 |
| | Roof | 8.25 | 4.36 | 11.31 | 7.98 | 10.20 | 7.50 |
| | Water | 7.15 | 2.83 | 6.02 | 9.71 | 2.95 | 2.21 |
| | Overall | 17.44 | 7.30 | 13.38 | 22.07 | 13.21 | 6.06 |
| WDC | Grass | 17.40 | 3.54 | 17.46 | 34.36 | 4.59 | 8.11 |
| | Tree | 23.73 | 12.85 | 12.36 | 17.70 | 10.93 | 1.81 |
| | Road | 32.56 | 26.76 | 33.20 | 17.28 | 46.73 | 7.67 |
| | Roof | 34.84 | 13.70 | 28.87 | 44.87 | 43.26 | 50.97 |
| | Water | 4.78 | 1.75 | 1.57 | 19.74 | 1.23 | 1.08 |
| | Trail | 9.94 | 1.49 | 3.24 | 9.60 | 5.32 | 4.75 |
| | Overall | 20.54 | 10.02 | 16.11 | 23.93 | 18.68 | 12.40 |
TABLE II: Endmember SAD on six real datasets. The best results are shown in bold. The second best results are underlined.
Fig. 5: Estimated endmembers on the Jasper Ridge dataset. Ground truth endmembers are displayed in blue while their estimates are in orange.
Fig. 6: Estimated abundances on the Jasper Ridge dataset. The ground truth is displayed on the right side.
Fig. 7: Estimated endmembers on the Urban6 dataset. Ground truth endmembers are displayed in blue while their estimates are in orange.
Fig. 8: Estimated abundances on the Urban6 dataset. The ground truth is displayed on the right side.
Fig. 9: Estimated endmembers on the APEX dataset. Ground truth endmembers are displayed in blue while their estimates are in orange.
Fig. 10: Estimated abundances on the APEX dataset. The ground truth is displayed on the right side.
Fig. 11: Estimated endmembers on the Washington DC dataset. Ground truth endmembers are displayed in blue while their estimates are in orange.
Fig. 12: Estimated abundances on the Washington DC dataset. The ground truth is displayed on the right side.

Tables I and II report the unmixing accuracy in terms of abundance RMSE (17) and endmember SAD (18) on six standard real datasets. The datasets are loosely ordered by difficulty. For a fair comparison, all methods were evaluated on the normalized data (11), which induces slight changes compared to the results reported in [rasti_misicnet_2022].
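For concreteness, the two evaluation metrics can be written down directly. Below is a minimal NumPy sketch of the abundance RMSE and the spectral angle distance (SAD); the percent scaling and the conversion to degrees are our assumptions about the reporting convention, and the function names are ours.

```python
import numpy as np

def abundance_rmse(A_true, A_est):
    """Abundance RMSE, reported in percent.

    A_true, A_est: (n_endmembers, n_pixels) arrays with entries in
    [0, 1]. The x100 factor assumes the tables report percentages.
    """
    return 100.0 * float(np.sqrt(np.mean((A_true - A_est) ** 2)))

def spectral_angle_distance(e_true, e_est):
    """Spectral angle (in degrees) between two endmember spectra."""
    cos = np.dot(e_true, e_est) / (np.linalg.norm(e_true) * np.linalg.norm(e_est))
    # Clip to guard against round-off pushing |cos| slightly above 1.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Note that the SAD is scale-invariant: a rescaled estimate of the same spectrum has zero angular distance, which is why it is a natural metric for spectra subject to global intensity changes.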

Overall, EDAA obtains the best abundance RMSE on all six datasets. As the unmixing difficulty increases, the gap between EDAA and the other methods grows larger. In particular, plain AA is not able to tackle more complicated mixing scenarios such as Urban6, APEX and WDC. We argue that our model selection technique is instrumental in avoiding collapsed runs in which endmember spectra are highly correlated. This is underlined by the overall competitive SAD results obtained by EDAA across datasets, especially on Urban6, APEX and WDC, where Endnet is the only contender.

The FCLSU baseline based on VCA obtains rather poor results except on WDC. This is likely due to the pure pixel assumption: VCA selects a single pixel to represent each endmember's spectrum, which is too stringent in real scenarios where spectral variability is ubiquitous.
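To illustrate the baseline, here is a small pure-NumPy sketch of fully constrained least-squares unmixing by projected gradient descent, given a fixed endmember matrix (e.g. extracted by VCA). This is an illustrative stand-in, not the solver used in the experiments; `project_simplex` and `fclsu` are our own names.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def fclsu(E, Y, n_iter=500):
    """Fully constrained least squares by projected gradient (sketch).

    E: (bands, p) endmember matrix, Y: (bands, n) pixel spectra.
    Each abundance column is kept non-negative and sum-to-one by
    projecting onto the simplex after every gradient step.
    """
    p, n = E.shape[1], Y.shape[1]
    A = np.full((p, n), 1.0 / p)
    step = 1.0 / np.linalg.norm(E.T @ E, 2)  # 1/L with L the Lipschitz constant
    for _ in range(n_iter):
        G = E.T @ (E @ A - Y)  # gradient of 0.5 * ||EA - Y||^2
        A = np.apply_along_axis(project_simplex, 0, A - step * G)
    return A
```

The simplex projection enforces the abundance constraints exactly at every iterate, so the returned maps always sum to one per pixel.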

Despite its quadratic minimum volume boundary term, NMF-QMV generally obtains worse results than the FCLSU baseline. Since it performs the unmixing in a subspace, NMF-QMV cannot prevent the endmember spectra from taking negative values, which breaks the physical interpretability of the estimates and subsequently harms the unmixing performance. This phenomenon can be observed in figures 5, 7, 9 and 11 for several endmembers. The associated abundances in figures 6, 8, 10 and 12 show that NMF-QMV produces maps that are too uniform and lack sparsity.

In contrast, Endnet achieves very good results in terms of SAD but tends to produce overly sparse abundances, which hinders its performance in terms of abundance RMSE. For instance, as can be seen in figures 11 and 12, the Road endmember is overlooked by Endnet even though EDAA recovers it neatly. Likewise, in figure 10, Endnet's Road abundance map spreads too much compared to EDAA's, which appears closer to the ground truth.

MiSiCNet gives better unmixing results than Endnet in terms of abundance RMSE, except on APEX, although the SAD results fall in favor of Endnet, except on Jasper Ridge. This is likely because Endnet uses the spectral angle distance on the input data in its loss, which helps it achieve better SAD accuracy. However, a good SAD is not sufficient to obtain good abundance maps, an area where MiSiCNet tends to shine as it incorporates spatial information through convolutional filters and implicitly applies a regularizer on the abundances.

Finally, AA is a very competitive method on the three simplest datasets, yet its performance decreases drastically on more complicated mixtures. For example, in figures 5 and 6, only AA and EDAA are able to uncover the Road endmember in Jasper Ridge, whereas all the other techniques fail. Yet, in figures 7 and 8, AA completely misses the Metal and Dirt endmembers in Urban6, while EDAA correctly identifies all endmembers and produces reasonable abundance maps. Unlike EDAA, the random matrix initialization in AA cannot be leveraged to build an appropriate model selection procedure because running AA many times is too slow.

Additional qualitative results for the Samson and Urban4 datasets can be found in the supplementary material.

IV-D Computational cost

Table III reports the processing times of the different unmixing algorithms on the six real datasets. NMF-QMV was implemented in Matlab (2020b), while FCLSU, Endnet, MiSiCNet and the AA variants were implemented in Python (3.8). NMF-QMV, FCLSU and AA run on CPU, whereas Endnet, MiSiCNet and EDAA run on GPU. The processing times were obtained on a computer with an Intel(R) Xeon(R) Silver 4110 processor (2.10 GHz, 32 cores), 64 GB of memory, and an NVIDIA GeForce RTX 2080 Ti GPU. The table shows that FCLSU is clearly the fastest technique; however, it is a supervised method that relies on an endmember extraction algorithm, here VCA, which is also fast. The deep learning methods are the slowest techniques despite running on GPU. Interestingly, EDAA has a lower computational cost than NMF-QMV and AA, even though our approach aggregates 50 runs computed sequentially. For example, EDAA takes on average 1.5 seconds to perform a single unmixing run on the Urban6 dataset, which is about three times faster than FCLSU. This efficiency is what allows us to apply an adequate model selection procedure over several runs.
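The run-and-select mechanism that this efficiency enables can be sketched as a simple loop. The snippet below is a schematic illustration, not the paper's exact procedure: it ranks candidates by reconstruction error, whereas the actual model selection criterion is also designed to discard collapsed runs with highly correlated endmembers. The name `ensemble_unmix` and the solver interface are ours.

```python
import numpy as np

def ensemble_unmix(solver, Y, n_runs=50, seed=0):
    """Run `solver` n_runs times and keep the best candidate.

    solver(Y, rng) -> (E, A) is any single-run unmixing routine that
    consumes randomness from `rng`. Candidates are ranked here by
    reconstruction error, a simplified stand-in criterion.
    """
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_runs):
        E, A = solver(Y, rng)
        err = np.linalg.norm(Y - E @ A)
        if err < best_err:
            best, best_err = (E, A), err
    return best
```

Since a single run is fast on GPU, evaluating 50 candidates this way can remain cheaper than one run of several baselines, as the processing times in Table III show.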

| Dataset | FCLSU | Endnet | MiSiCNet | NMF-QMV | AA | EDAA |
|---|---|---|---|---|---|---|
| Samson | 0.3 | 560 | 80 | 20.1 | 17.6 | 11.2 |
| JasperRidge | 0.4 | 680 | 90 | 22.3 | 21.9 | 9.6 |
| Urban4 | 3.6 | 720 | 411 | 112.5 | 237.6 | 66.3 |
| Urban6 | 4.4 | 1000 | 417 | 158.4 | 343.0 | 75.2 |
| APEX | 0.6 | 720 | 92 | 27.2 | 44.8 | 15.4 |
| WDC | 4.0 | 1000 | 409 | 174.4 | 361.8 | 81.0 |

TABLE III: Processing times in seconds on six real datasets. The best results are in bold and the second best underlined. EDAA includes the model selection procedure over 50 runs.

IV-E Ablation study

Finally, we study the sensitivity of Algorithms 1 and 2 to their hyper-parameters in figure 16, where the Y-axis corresponds to the overall abundance RMSE. Given a fixed computational budget of 1000 updates, figure 16 (a) shows that EDAA is robust to its hyper-parameters provided that the number of runs in the model selection is large enough (here 100). Only the two extreme configurations are slightly worse, especially on Urban6. In figure 16 (b), we see that performance is quite stable with respect to the number of outer iterations, except on WDC, which requires more updates (1000). Finally, we study in figure 16 (c) the importance of the number of runs from which the best candidate is selected. We observe that the model selection procedure requires at least 50 runs to reach very good performance, hence we use 50 runs in our unmixing experiments. On unknown datasets where real-time unmixing is not required, we advise using a large number of runs (at least 100) to ensure that the model selection procedure yields a good candidate. Detailed results for both the abundance RMSE and SAD metrics are available in the supplementary material.

Fig. 16: Sensitivity to the hyperparameters in Algorithms 1 and 2, measured in overall abundance RMSE: (a) varying the numbers of inner and outer iterations for a constant number of updates (1000) and a fixed number of runs, (b) varying the number of outer iterations, and (c) varying the number of runs.

V Conclusion

We have proposed a new algorithm based on archetypal analysis for blind hyperspectral unmixing. We have shown how to leverage its efficient GPU implementation to develop an adequate model selection procedure and obtain state-of-the-art performance. Remarkably, our simple and easy-to-use approach considerably improves the unmixing results on a comprehensive collection of standard real datasets. Finally, we have made our results reproducible by releasing an open-source codebase, which also includes the plain archetypal analysis variant presented in this study.

Acknowledgments and Funding

This project was supported by the ERC grant number 714381 (SOLARIS project) and by ANR 3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003).


Supplementary material A Detailed datasets description

In this section, we provide illustrations of the unmixing datasets used in the experiments. Each dataset is described with a false-color RGB image alongside the normalized ground truth endmembers.

Fig. 19: Samson dataset: (a) false-color RGB image (Red: band 83, Green: band 43, Blue: band 9); (b) normalized ground truth endmembers.
Fig. 22: JasperRidge dataset: (a) false-color RGB image (Red: band 130, Green: band 50, Blue: band 5); (b) normalized ground truth endmembers.
Fig. 25: Urban dataset: (a) false-color RGB image (Red: band 130, Green: band 70, Blue: band 30); (b) normalized ground truth endmembers for Urban6. Urban4 corresponds to the first 4 materials.
Fig. 28: APEX dataset: (a) false-color RGB image (Red: band 200, Green: band 100, Blue: band 10); (b) normalized ground truth endmembers (#0: Road, #1: Tree, #2: Roof, #3: Water).
Fig. 31: Washington DC Mall dataset: (a) false-color RGB image (Red: band 150, Green: band 75, Blue: band 20); (b) normalized ground truth endmembers (#0: Grass, #1: Tree, #2: Road, #3: Roof, #4: Water, #5: Trail).

Supplementary material B Additional results

Unmixing experiments

We provide qualitative results on the Samson and Urban4 datasets in figures 32, 33, 34 and 35.

Fig. 32: Estimated endmembers on the Samson dataset. Ground truth endmembers are displayed in blue while their estimates are in orange.
Fig. 33: Estimated abundances on the Samson dataset. The ground truth is displayed on the right side.
Fig. 34: Estimated endmembers on the Urban4 dataset. Ground truth endmembers are displayed in blue while their estimates are in orange.
Fig. 35: Estimated abundances on the Urban4 dataset. The ground truth is displayed on the right side.
Ablation study

We report the detailed results obtained in the ablation study. For each dataset, the overall abundance RMSE and SAD are computed for all configurations. Table IV studies the sensitivity of EDAA to the numbers of inner and outer iterations given a fixed computational budget. Table V studies the sensitivity to the number of outer iterations when the computational budget is decreased. Finally, Table VI underlines the importance of the number of runs in the model selection procedure.

| Dataset | Metric | Config. 1 | Config. 2 | Config. 3 | Config. 4 | Config. 5 |
|---|---|---|---|---|---|---|
| Samson | RMSE | 4.42 | 4.24 | 4.15 | 4.48 | 4.34 |
| | SAD | 1.65 | 1.64 | 1.61 | 1.78 | 1.68 |
| Jasper Ridge | RMSE | 6.70 | 6.85 | 7.80 | 7.59 | 8.79 |
| | SAD | 3.48 | 3.22 | 4.28 | 3.12 | 4.79 |
| Urban4 | RMSE | 7.40 | 7.51 | 7.43 | 7.90 | 9.49 |
| | SAD | 5.72 | 5.87 | 5.53 | 6.05 | 5.46 |
| Urban6 | RMSE | 17.92 | 13.52 | 12.35 | 12.71 | 17.92 |
| | SAD | 11.71 | 7.95 | 8.74 | 8.52 | 8.79 |
| APEX | RMSE | 14.25 | 14.66 | 14.20 | 14.15 | 16.46 |
| | SAD | 8.17 | 6.29 | 7.73 | 8.06 | 5.26 |
| WDC | RMSE | 22.91 | 20.47 | 22.16 | 20.92 | 21.20 |
| | SAD | 12.59 | 12.33 | 16.05 | 13.46 | 13.51 |

TABLE IV: Sensitivity to the hyperparameters of EDAA for a constant number of updates (1000). The abundance RMSE and SAD metrics are computed globally. The best results are in bold and the second best are underlined.
| Dataset | Metric | Setting 1 | Setting 2 | Setting 3 |
|---|---|---|---|---|
| Samson | RMSE | 4.24 | 4.34 | 3.97 |
| | SAD | 1.64 | 1.69 | 1.46 |
| Jasper Ridge | RMSE | 6.85 | 5.90 | 6.44 |
| | SAD | 3.22 | 3.06 | 3.19 |
| Urban4 | RMSE | 7.51 | 7.46 | 7.72 |
| | SAD | 5.87 | 5.01 | 4.67 |
| Urban6 | RMSE | 13.52 | 13.54 | 13.99 |
| | SAD | 7.95 | 7.85 | 7.93 |
| APEX | RMSE | 14.66 | 14.38 | 14.12 |
| | SAD | 6.29 | 6.58 | 6.99 |
| WDC | RMSE | 20.47 | 22.81 | 22.89 |
| | SAD | 12.33 | 16.97 | 16.87 |
TABLE V: Sensitivity to the number of outer iterations of EDAA. The abundance RMSE and SAD metrics are computed globally. The best results are in bold and the second best are underlined.
| Dataset | Metric | Setting 1 | Setting 2 | Setting 3 | Setting 4 |
|---|---|---|---|---|---|
| Samson | RMSE | 5.05 | 3.90 | 4.24 | 4.24 |
| | SAD | 1.94 | 1.43 | 1.64 | 1.64 |
| Jasper Ridge | RMSE | 10.27 | 10.27 | 6.85 | 6.85 |
| | SAD | 3.73 | 3.73 | 3.22 | 3.22 |
| Urban4 | RMSE | 8.17 | 8.06 | 7.75 | 7.51 |
| | SAD | 5.43 | 5.41 | 5.36 | 5.87 |
| Urban6 | RMSE | 16.50 | 16.50 | 13.63 | 13.52 |
| | SAD | 8.16 | 8.16 | 7.92 | 7.95 |
| APEX | RMSE | 23.84 | 24.17 | 14.94 | 14.66 |
| | SAD | 12.57 | 12.88 | 6.06 | 6.29 |
| WDC | RMSE | 24.39 | 24.39 | 20.46 | 20.47 |
| | SAD | 11.52 | 11.52 | 12.40 | 12.33 |
TABLE VI: Sensitivity to the number of runs of the model selection procedure. The abundance RMSE and SAD metrics are computed globally. The best results are in bold and the second best are underlined.