I Introduction
Hyperspectral imaging (HSI) [landgrebe_hyperspectral_2002, plaza_recent_2009, schaepman_earth_2009, goetz_imaging_1985, green_imaging_1998] consists of measuring the electromagnetic spectrum of a scene using many narrow spectral bands. Thanks to its richer spectral information compared to traditional RGB images, HSI enables more accurate material identification, leading to a broad range of applications including crop monitoring in agriculture [adao_hyperspectral_2017], waste sorting [karaca_automatic_2013], food safety inspection [gowen_hyperspectral_2007], and mineralogy [fox_applications_2017].
Remote sensing [clark_imaging_2003, bioucasdias_hyperspectral_2013], such as airborne or satellite imagery, yields hyperspectral images whose pixels capture several objects or materials. As such, each pixel can include several pure spectral components (called endmembers), mixed in different proportions [ghamisi_advances_2017]. Any further analysis hence requires identifying and disentangling the endmembers present in a scene before estimating their respective proportions, or fractional abundances, within each pixel of the HSI [parra_unmixing_1999]. Since the endmembers' spectral signatures are not known beforehand and must be estimated from data, this operation is called blind hyperspectral unmixing (HU) [keshava_spectral_2002, bioucasdias_hyperspectral_2012], owing to its link with blind source separation [comon_handbook_2010]. In this paper, we adopt a linear mixing model, since it is often relevant in remote sensing scenes where mixtures occur between macroscopic materials. We therefore assume that each observed pixel can be represented as a linear combination of endmembers plus additive noise. In other words, we are interested in tackling unsupervised linear HU
[parra_unmixing_1999]. Further assumptions on the nature of endmembers are generally needed to estimate meaningful spectra. For instance, it can be assumed that there exists at least one pure pixel for each material present in the scene; the problem then requires finding these pure pixels within the original image. The pure pixel assumption is at the core of several geometrical endmember extraction methods, including the pixel purity index (PPI) [boardman_mapping_1995], N-FINDR [winter_nfindr_1999] and vertex component analysis (VCA) [nascimento_vertex_2005]. Once endmembers have been extracted, abundances can be estimated by minimizing the least-squares error between the original input spectra and the linearly reconstructed spectra, as long as the abundance fractions satisfy the two physical constraints stating that they should be nonnegative and sum to one for each pixel [heinz_fully_2001]. That being said, pure pixels are often missing in real scenarios. In the absence of pure pixels and in the case of linear models, endmembers and abundances can be simultaneously estimated by solving a constrained or penalized nonnegative matrix factorization (NMF) problem [lee_algorithms_2000]. For example, the authors of [zhuang_regularization_2019] have proposed a formulation that involves a data fidelity term and a minimum volume regularization term on endmembers, whose minimization alternates between solving for endmembers and abundances.
In this work, we do not assume the existence of pure pixels, as they are often missing in real data: for instance, the spectral signatures of endmembers in HSI can be significantly affected by changes in atmospheric, illumination, and environmental conditions within the scene [borsoi_spectral_2021]. We mitigate the effect of spectral variability by (i) normalizing each pixel by the norm of its spectrum as a preprocessing step and (ii) modeling endmembers as convex combinations of pixels present in the scene. Not only are HSI pixels linear combinations of the estimated endmembers under the linear mixing model, but the estimated endmembers are also convex combinations of pixels. This corresponds to the archetypal analysis (AA) formulation introduced by Cutler and Breiman in [cutler_archetypal_1994]. Compared to plain NMF, AA provides greater interpretability, given that each endmember can be traced back to pixels present in the HSI. In addition, since the estimated endmember spectral signatures generally correspond to an average of the contributions of several pixels, the resulting estimation is more robust to noise and spectral variability than pure pixel methods.
I-A Contributions and novelties

We propose a new hyperspectral unmixing algorithm relying on entropic gradient descent for archetypal analysis. Our approach (i) provides better solutions for hyperspectral unmixing than traditional alternating optimization schemes based on projected gradient methods or active set algorithms, and (ii) allows more efficient GPU implementations.

The efficiency of our method enables a key practical contribution: an ensembling mechanism along with an appropriate model selection procedure, which makes our method almost parameter-free and thus easy to use (the only sensitive parameter is the number of endmembers we want to estimate).

Our approach, available in an open-source package (code available at https://github.com/inria-thoth/EDAA), outperforms state-of-the-art matrix factorization and deep learning methods on six standard real datasets.
The remainder of this paper is organized as follows. Section II presents related work on unsupervised linear hyperspectral unmixing. Section III introduces our method. Section IV presents experimental results highlighting the performance of our proposed approach. Finally, we conclude the article and outline future research directions in Section V.
II Related work on unsupervised linear hyperspectral unmixing
In this section, we present the framework of nonnegative matrix factorization (NMF) before introducing archetypal analysis (AA). Finally, we mention widely used deep learning architectures that tackle blind hyperspectral unmixing.
Nonnegative matrix factorization (NMF)
NMF [paatero_positive_1994, lee_algorithms_2000] is popular for hyperspectral unmixing. It consists of factorizing a matrix representing the HSI signal into the product of two matrices with nonnegative entries, where one factor carries the endmembers while the other represents the abundances for each pixel (typically with the constraint that abundances sum to one). Several variants of NMF have been proposed for this task. For instance, the minimum volume constrained nonnegative matrix factorization (MVC-NMF) method of [miao_endmember_2007] uses a minimum volume term for endmembers that effectively removes the pure pixel assumption. In minimum dispersion constrained NMF (MiniDisCo) [huck_minimum_2010], the regularization function, called dispersion, encourages endmembers with minimum variance in order to avoid degenerate solutions and improve the unmixing robustness to flat spectra. Other approaches design regularization functions for the abundances [zymnis_hyperspectral_2007, yang_blind_2010].
Archetypal analysis (AA)
AA was first introduced in [cutler_archetypal_1994] and consists in modeling individual pixels present in the HSI as a linear mixture of archetypes (here the endmembers). Interestingly, archetypes are defined as convex combinations of the individual pixels present in the HSI. Motivated by the good interpretability of AA, Zhao et al. [zhao_multiple_2015] have proposed a kernelized variant, which enables greater modeling flexibility at the cost of an extra hyperparameter (the bandwidth of the RBF kernel functions). Notably, they use the relaxation form introduced in [morup_archetypal_2012] to handle the cases where endmembers are located outside of the convex hull of the data.
Inspired by the robust AA formulation introduced in [chen_fast_2014], which considers the Huber loss instead of the squared loss to reduce the impact of noise and outliers, Sun et al. [sun_pure_2017] have introduced a robust kernel archetypal analysis (RKADA) method for blind hyperspectral unmixing. Their approach refines the standard AA formulation by adding a binary sparse constraint on the pixel contributions. Consequently, their method ensures that each endmember corresponds to actual pixels rather than a sparse linear mixture of all pixels. The resulting optimization algorithm is a block-coordinate descent scheme that uses an active-set method [nocedal_numerical_1999] to solve the smooth least-squares optimization subproblems with a simplicial constraint. As noted in [chen_fast_2014], the active-set algorithm can be seen as an aggressive strategy that can leverage the underlying sparsity of the subproblem solutions. Recently, Xu et al. [xu_l1_2022] have proposed an $\ell_1$ sparsity-constrained AA algorithm to increase the sparsity of the abundances. Rather than using an active-set algorithm, they adopt a projected gradient method [lin_projected_2007] to solve the resulting optimization subproblems inside an alternating scheme. However, their formulation leads to abundances that do not sum to one, which reduces the physical interpretability of the unmixing results. Moreover, their approach relies on other algorithms, including PPI [boardman_mapping_1995] for preprocessing and initialization, which increases its complexity.
Autoencoders (AE)
Interest in deep learning has been rapidly growing in many fields, including remote sensing [zhang_deep_2016], thanks to the increasing amount of available data, rising computational power, and the development of suitable algorithms. Autoencoders for hyperspectral unmixing have been widely used since the pioneering work of Licciardi et al. [licciardi_unsupervised_2012]. In a nutshell, the encoder transforms an input into abundance maps that are then linearly decoded by the decoder. The encoder activations correspond to the fractional abundances, while the parameters of the linear decoder layer correspond to the endmember spectra. While the AE described above corresponds to a basic fully connected architecture, refinements have been introduced, such as a loss function that involves the spectral angle distance and a dedicated sparsity penalty term as in [ozkan_endnet_2018], or convolutions to take advantage of the spatial structure in HSI [palsson_convolutional_2020]. See [palsson_blind_2022] for a comprehensive study of the various existing autoencoder architectures for blind hyperspectral unmixing. Finally, it is worth mentioning the recent effort by Rasti et al. [rasti_misicnet_2022] to incorporate a geometrically motivated penalty into deep learning-based approaches. Their method, called minimum simplex convolutional network (MiSiCNet), uses both geometrical and spatial information, by means of a convolutional encoder-decoder architecture, to tackle unsupervised hyperspectral unmixing.
Although the surge of deep learning methods has improved overall unmixing performance, training a single network per image remains costly due to the extensive hyperparameter search it requires. Moreover, training such networks requires a GPU, and yet it can be considerably slower than traditional methods that run on CPU.
III Methodology
In this section, we present our model formulation before describing its optimization. Next, we mention implementation details required to run our approach. Finally, we explain how to leverage our efficient GPU implementation and propose a procedure to make our model robust to hyperparameter choices and thus easy to use in practice.
III-A Model formulation
Let $\mathbf{X} \in \mathbb{R}^{C \times N}$ be a hyperspectral image (HSI), where $C$ is the number of channels and $N$ is the number of pixels. We assume a linear mixing model such that

$$\mathbf{X} = \mathbf{E}\mathbf{A} + \boldsymbol{\varepsilon}, \qquad (1)$$

where $\mathbf{E} \in \mathbb{R}^{C \times p}$ is the mixing matrix composed of the discretized spectra of the $p$ endmembers over the $C$ channels, and $\mathbf{A} \in \mathbb{R}^{p \times N}$ is the abundance matrix that describes, for each pixel, the fraction relative to each endmember. Finally, $\boldsymbol{\varepsilon}$ represents some classical noise occurring in hyperspectral imaging.
We are interested in tackling the blind unmixing setting where both the mixing and abundance matrices are unknown. However, we assume that the number of endmembers $p$ is known. Since $\mathbf{E}$ represents the endmembers' reflectance over the $C$ channels, its elements should be nonnegative, i.e., $\mathbf{E} \geq 0$. This is also the case for $\mathbf{A}$, whose columns represent the abundances for each pixel and, in addition, should sum to one. In other words, each column of $\mathbf{A}$ belongs to the simplex $\Delta_p$ defined as:

$$\Delta_p = \Big\{ \mathbf{a} \in \mathbb{R}^p : a_j \geq 0 \ \text{for all } j \ \text{and} \ \textstyle\sum_{j=1}^{p} a_j = 1 \Big\}. \qquad (2)$$
The previous model and constraints yield the classical optimization problem

$$\min_{\mathbf{E},\mathbf{A}} \ \frac{1}{2}\|\mathbf{X} - \mathbf{E}\mathbf{A}\|_F^2 \quad \text{s.t.} \quad \mathbf{E} \geq 0 \ \text{and} \ \mathbf{a}_i \in \Delta_p \ \text{for} \ 1 \leq i \leq N,$$

which is a variant of nonnegative matrix factorization.
The archetypal analysis formulation we consider adds a constraint and forces the endmembers to be convex combinations of the pixels present in $\mathbf{X}$. Formally, it simply means that there exists a matrix $\mathbf{B}$ in $\mathbb{R}^{N \times p}$ such that $\mathbf{E} = \mathbf{X}\mathbf{B}$ and the columns of $\mathbf{B}$ are in the simplex $\Delta_N$ (their entries are nonnegative and sum to one). This yields the archetypal analysis formulation [cutler_archetypal_1994]:

$$\min_{\mathbf{A},\mathbf{B}} \ \frac{1}{2}\|\mathbf{X} - \mathbf{X}\mathbf{B}\mathbf{A}\|_F^2 \quad \text{s.t.} \quad \mathbf{a}_i \in \Delta_p \ \text{for} \ 1 \leq i \leq N \ \text{and} \ \mathbf{b}_j \in \Delta_N \ \text{for} \ 1 \leq j \leq p, \qquad (\text{III-A})$$

where $\mathbf{a}_i$ denotes the $i$-th column of $\mathbf{A}$ and $\mathbf{b}_j$ the $j$-th column of $\mathbf{B}$.
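To make the formulation concrete, the objective can be evaluated as in the following minimal NumPy sketch (the function name and variable shapes follow our notation; this is an illustration, not the released implementation):

```python
import numpy as np

def aa_objective(X, A, B):
    """Evaluate 0.5 * ||X - X B A||_F^2 for archetypal analysis.

    X: (C, N) image with one pixel spectrum per column.
    A: (p, N) abundances, each column in the simplex Delta_p.
    B: (N, p) pixel contributions, each column in the simplex Delta_N.
    """
    E = X @ B                 # endmembers as convex combinations of pixels
    residual = X - E @ A      # linear reconstruction error
    return 0.5 * np.sum(residual ** 2)
```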
III-B Optimization
Minimizing (III-A) is difficult since the objective function is not jointly convex in $(\mathbf{A}, \mathbf{B})$. However, it is convex with respect to (w.r.t.) one of the variables when the other one is fixed [morup_archetypal_2012]. Consequently, it is natural to consider an alternating minimization scheme between $\mathbf{A}$ and $\mathbf{B}$, which is guaranteed to asymptotically provide a stationary point of the problem [bertsekas_nonlinear_1997]. Yet, because of the nonconvexity of the objective, the choice of optimization algorithm is important, as different stationary points may not have the same quality in terms of statistical estimation. In other words, different optimization procedures may have a different “implicit bias”, a phenomenon that is currently the focus of much attention in machine learning [pesme_implicit_2021], especially for deep learning models, suggesting that it may also be key for blind HU.

In this paper, we adopt an optimization scheme called entropic gradient descent, which has shown better theoretical properties in terms of convergence rates than gradient descent when optimizing over the simplex [beck_mirror_2003]. Our second motivation is the simplicity of the algorithm, which requires neither orthogonal projections onto the simplex nor complicated active-set rules as in [chen_fast_2014]. This enables us to take advantage of modern GPUs. We now present the mathematical details before discussing implementation.
III-B1 Entropic descent algorithm (EDA)
As noted in [beck_mirror_2003], the entropic descent algorithm is simply a gradient descent method with a particular choice of a Bregman-like distance [bregman_relaxation_1967] generated by a specific function, here the negative entropy. As explained in [teboulle_entropic_1992], the choice of an appropriate distance-like function tailored to the geometry of the constraints, here the simplex, provides theoretical benefits in terms of convergence rates. The minimization over the simplex is precisely the reason why we adopt the negative entropy to derive the updates of the alternating minimization scheme. Formally, the negative entropy function $\psi$ is defined as follows: for $\mathbf{x}$ in $\Delta_p$,

$$\psi(\mathbf{x}) = \sum_{j=1}^{p} x_j \log x_j, \qquad (3)$$

with the convention that $0 \log 0 = 0$.
The function $\psi$ exhibits several desirable properties, including convexity on its domain. This enables us to consider $D_\psi$, the Bregman divergence [bregman_relaxation_1967] w.r.t. $\psi$, defined, for $\mathbf{x}$ and $\mathbf{y}$ in $\Delta_p$, as

$$D_\psi(\mathbf{x}, \mathbf{y}) = \psi(\mathbf{x}) - \psi(\mathbf{y}) - \langle \nabla\psi(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle = \sum_{j=1}^{p} x_j \log\frac{x_j}{y_j}, \qquad (4)$$

which is also called the Kullback-Leibler divergence. By convexity of $\psi$, we naturally have $D_\psi(\mathbf{x}, \mathbf{y}) \geq 0$. We start by considering the following generic optimization problem:

$$\min_{\mathbf{x} \in \Delta_p} \ f(\mathbf{x}), \qquad (5)$$

where $f$ is a convex Lipschitz continuous function with a gradient at $\mathbf{x}$ denoted by $\nabla f(\mathbf{x})$, and $\Delta_p$ corresponds to the $p$-dimensional simplex (2).
The algorithm we consider to solve (5) performs the following iterates: given an initial $\mathbf{x}^{(0)}$ in $\Delta_p$,

$$\mathbf{x}^{(k+1)} = \operatorname*{arg\,min}_{\mathbf{x} \in \Delta_p} \ \langle \nabla f(\mathbf{x}^{(k)}), \mathbf{x} \rangle + \frac{1}{\eta_k} D_\psi(\mathbf{x}, \mathbf{x}^{(k)}). \qquad (6)$$
If $D_\psi$ were simply a squared Euclidean norm, we would recover a projected gradient descent algorithm. By using instead the Bregman distance function derived from the negative entropy, we obtain the entropic descent method. Here, $D_\psi$ measures the distance between two vectors in $\Delta_p$. As such, the next iterate should strike the optimal balance between taking a gradient step and moving the least from the current iterate according to the geometry induced by $D_\psi$, with the step size $\eta_k$ controlling this tradeoff.

We will now see how the negative entropy yields explicit steps that effectively enforce the simplicial constraints. Indeed, after a short classical calculation (see, for instance, [beck_mirror_2003]), it is possible to show that the update (6) is equivalent to the following one: for all $j$ in $\{1, \dots, p\}$,
$$x_j^{(k+1)} = \frac{x_j^{(k)}\, \exp\!\big(-\eta_k\, [\nabla f(\mathbf{x}^{(k)})]_j\big)}{\sum_{l=1}^{p} x_l^{(k)}\, \exp\!\big(-\eta_k\, [\nabla f(\mathbf{x}^{(k)})]_l\big)}, \qquad (7)$$

where $x_j^{(k)}$ is the $j$-th entry of the vector $\mathbf{x}^{(k)}$ and, similarly, $[\nabla f(\mathbf{x}^{(k)})]_j$ is the $j$-th entry of $\nabla f(\mathbf{x}^{(k)})$.
It is thus easy to see that the iterates stay in the simplex $\Delta_p$, and it is possible to show (see [beck_mirror_2003]) that the sequence converges to the set of solutions of (5) with appropriate step sizes $\eta_k$. Interestingly, the update (7) can be implemented efficiently by using the softmax function, assuming the entries of $\mathbf{x}^{(k)}$ are positive:

$$\mathbf{x}^{(k+1)} = \operatorname{softmax}\!\big(\log(\mathbf{x}^{(k)}) - \eta_k \nabla f(\mathbf{x}^{(k)})\big), \qquad (8)$$

where $\log(\mathbf{x}^{(k)})$ is the vector carrying the logarithm of each entry of $\mathbf{x}^{(k)}$. This update immediately suggests a high compatibility with GPUs.
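In code, one entropic descent step thus reduces to a single softmax evaluation, as in this PyTorch sketch (the names are ours):

```python
import torch

def eda_step(x, grad, eta):
    """Entropic descent update (8) on the simplex.

    x:    current iterate (positive entries summing to one).
    grad: gradient of the objective f at x.
    eta:  step size.
    """
    return torch.softmax(torch.log(x) - eta * grad, dim=0)
```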
III-B2 Alternating optimization scheme
We are now in shape to describe an alternating optimization scheme, which alternates between several EDA updates for minimizing w.r.t. $\mathbf{A}$ while $\mathbf{B}$ is fixed, and vice versa. This strategy is presented in Algorithm 1. Formally, by replacing the generic function $f$ with the functions corresponding to the two optimization subproblems, we obtain the following updates:

$$\mathbf{A}^{(k+1)} = \operatorname{softmax}\!\Big(\log(\mathbf{A}^{(k)}) - \eta_A\, (\mathbf{X}\mathbf{B})^\top (\mathbf{X}\mathbf{B}\mathbf{A}^{(k)} - \mathbf{X})\Big), \qquad (9)$$

$$\mathbf{B}^{(k+1)} = \operatorname{softmax}\!\Big(\log(\mathbf{B}^{(k)}) - \eta_B\, \mathbf{X}^\top (\mathbf{X}\mathbf{B}^{(k)}\mathbf{A} - \mathbf{X})\,\mathbf{A}^\top\Big), \qquad (10)$$

where $\log(\cdot)$ is applied entrywise and the softmax is applied in parallel on the columns of its matrix argument. Note that when $\mathbf{A}$ and $\mathbf{B}$ are initialized with positive values, these iterates keep them positive.
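A minimal PyTorch sketch of the resulting alternating scheme follows; the iteration counts are illustrative defaults, not the values used in our experiments:

```python
import torch

def edaa(X, A, B, eta_A, eta_B, n_outer=20, n_inner=5):
    """Alternating entropic descent updates (9)-(10) for X ~ X B A.

    X: (C, N) normalized image; A: (p, N) and B: (N, p) are assumed
    to have positive entries and columns on the simplex.
    """
    for _ in range(n_outer):
        E = X @ B                                # current endmembers
        for _ in range(n_inner):                 # update A, B fixed (9)
            grad_A = E.T @ (E @ A - X)
            A = torch.softmax(torch.log(A) - eta_A * grad_A, dim=0)
        for _ in range(n_inner):                 # update B, A fixed (10)
            grad_B = X.T @ ((X @ B) @ A - X) @ A.T
            B = torch.softmax(torch.log(B) - eta_B * grad_B, dim=0)
    return A, B
```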
III-C Implementation details
Normalization
The input image is normalized pixel-wise: for all $i$ in $\{1, \dots, N\}$,

$$\mathbf{x}_i \leftarrow \frac{\mathbf{x}_i}{\|\mathbf{x}_i\|_2}, \qquad (11)$$

where $\mathbf{x}_i$ denotes the $i$-th pixel. This step is important to gain invariance to illumination changes.
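A sketch of this preprocessing step, assuming the $\ell_2$ norm as in (11):

```python
import torch

def normalize_pixels(X, eps=1e-12):
    """Divide each pixel spectrum (column of X) by its l2 norm."""
    return X / (torch.linalg.norm(X, dim=0, keepdim=True) + eps)
```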
Initialization
We initialize the abundance matrix uniformly:

$$\mathbf{A}^{(0)} = \frac{1}{p}\,\mathbf{1}_p \mathbf{1}_N^\top, \qquad (12)$$

where $\mathbf{1}_d$ denotes a $d$-dimensional vector of ones. This corresponds to the maximal-entropy configuration for each pixel. The entropy of each pixel naturally decreases during the optimization, but the high entropy of the initialization has a regularizing effect.
The initialization of the pixel contribution matrix $\mathbf{B}$ is then also close to uniform. Nevertheless, we introduce a small random perturbation, which is necessary to break the symmetry between the columns of $\mathbf{B}$ (otherwise, the updates of $\mathbf{A}$ and $\mathbf{B}$ would keep them identical). Concretely, the entries of a perturbation matrix $\mathbf{C}$ in $\mathbb{R}^{N \times p}$ are randomly sampled according to the uniform distribution on $[0, 1)$ and rescaled by a small factor. Finally, we apply the softmax function on each column so that the columns of $\mathbf{B}^{(0)}$ belong to the simplex $\Delta_N$: for $j$ in $\{1, \dots, p\}$,

$$\mathbf{b}_j^{(0)} = \operatorname{softmax}(\mathbf{c}_j), \qquad (13)$$

where $\mathbf{c}_j$ denotes the $j$-th column of $\mathbf{C}$. In practice, we observe that such an initialization leads to a matrix $\mathbf{B}^{(0)}$ that is very close to the uniform initialization $\frac{1}{N}\,\mathbf{1}_N \mathbf{1}_p^\top$.
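A sketch of this initialization; the perturbation scale below is an illustrative value, not the one from our experiments:

```python
import torch

def init_A_B(N, p, scale=1.0):
    """Initialize A uniformly (12) and B close to uniform (13)."""
    A = torch.full((p, N), 1.0 / p)    # maximal-entropy abundances
    C = scale * torch.rand(N, p)       # uniform perturbation, breaks symmetry
    B = torch.softmax(C, dim=0)        # columns of B on the simplex Delta_N
    return A, B
```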
Step sizes
We use constant step sizes $\eta_A$ and $\eta_B$ for $\mathbf{A}$ and $\mathbf{B}$, respectively:

$$\eta_A = \frac{\gamma}{\sigma_{\max}(\mathbf{X}\mathbf{B})^2}, \qquad (14)$$

where $\gamma$ is a positive scaling value and $\sigma_{\max}(\mathbf{X}\mathbf{B})$ is the largest singular value of the matrix $\mathbf{X}\mathbf{B}$. We recover the classical convergence of gradient descent with fixed step size [nesterov_introductory_2003] up to the factor $\gamma$, since $\sigma_{\max}(\mathbf{X}\mathbf{B})^2$ corresponds to the Lipschitz constant of the subproblem related to (III-A) when minimizing w.r.t. $\mathbf{A}$, $\mathbf{B}$ being fixed. Allowing different values of $\gamma$ lets us use slightly different step sizes and yields better performance in practice. Note that our model selection procedure, presented later, automatically chooses the right parameter $\gamma$, thus removing the burden for the user of having to deal with an extra hyperparameter. Finally, $\eta_B$ is simply a rescaled version of $\eta_A$ that accounts for the matrices being of transposed dimensions:

$$\eta_B = \frac{N}{p}\,\eta_A. \qquad (15)$$
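A sketch of the step-size computation; the $N/p$ rescaling of $\eta_B$ reflects our reconstruction of (15) and should be treated as an assumption:

```python
import torch

def step_sizes(X, B, gamma):
    """Constant step sizes (14)-(15) from the Lipschitz constant of the A-subproblem."""
    N, p = B.shape
    sigma_max = torch.linalg.matrix_norm(X @ B, ord=2)  # largest singular value
    eta_A = gamma / sigma_max ** 2
    eta_B = eta_A * N / p                               # assumed rescaling
    return eta_A, eta_B
```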
Hyperparameters
For all experiments, we use the same fixed numbers of inner and outer iterations, as they provide a good tradeoff between convergence speed and unmixing accuracy. Note also that these hyperparameters are robust across different real datasets, as detailed in Section IV.
III-D Model selection procedure
As stated above, the initialization of the matrix $\mathbf{B}$ is random, leading to different solutions for each run of the algorithm since the overall optimization problem is nonconvex. Besides, we allow for different step-size factors $\gamma$, which we draw randomly from a small predefined set in practice. Since the convergence of the algorithm is very fast (see the experimental section for concrete benchmarks), we are able to provide a large diversity of solutions for a given dataset by running our method $K$ times with different random seeds, while keeping the global computational complexity reasonable. A major question we address next is how to optimally select the best solution in terms of unmixing accuracy.
For this, we take inspiration from classical model selection and sparse estimation theory [hastie_elements_2009]. First, we measure the fit of each solution in terms of the residual error $\|\mathbf{X} - \mathbf{X}\mathbf{B}\mathbf{A}\|_1$, choosing the $\ell_1$ norm, which is known to be more robust to outliers than the mean squared error. Second, we select the set of solutions that are in the same ballpark as the best solution we have obtained in terms of fit. This selection process is illustrated by the red dotted line in Figure 4, while the precise criterion is described in Algorithm 2.
From the subset of solutions with good fit, we then choose the one whose endmembers have the best incoherence, a desirable property that is classical in the theory of sparse estimation [elad_generalized_2002, gribonval_sparse_2003, mairal2014sparse]. Indeed, dictionaries (here, endmembers) with more incoherence benefit from better theoretical guarantees in terms of abundance estimation, making incoherence a natural criterion for model selection in the context of unmixing. Formally, the coherence is simply the maximal pairwise spectral correlation between the estimated endmembers. More precisely, for the endmembers matrix $\mathbf{E}$, the coherence is defined as:

$$\mu(\mathbf{E}) = \max_{1 \leq i < j \leq p} \ \frac{|\langle \mathbf{e}_i, \mathbf{e}_j \rangle|}{\|\mathbf{e}_i\|_2\, \|\mathbf{e}_j\|_2}. \qquad (16)$$
To the best of our knowledge, this is the first time the coherence is used as a model selection criterion for archetypal analysis. Our experiments (see next section) show that it is highly effective.
In summary, we automatically select the model whose endmembers have the lowest maximal pairwise spectral correlation among the ones that have a good fit. This strategy is illustrated in Figure 4 and described in Algorithm 2. In the experiments, the number of runs was set to $K = 50$.
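The selection criterion of Algorithm 2 can be sketched as follows; the tolerance defining the set of near-best fits is an illustrative assumption:

```python
import torch

def select_run(runs, X, tol=1.05):
    """Among runs with near-best l1 fit, pick the least coherent endmembers.

    runs: list of (A, B) pairs from K independent restarts.
    tol:  keep runs whose l1 residual is within tol * best (illustrative).
    """
    residuals = [torch.abs(X - X @ B @ A).sum() for A, B in runs]
    best = min(residuals)
    kept = [i for i, r in enumerate(residuals) if r <= tol * best]

    def coherence(E):
        En = E / torch.linalg.norm(E, dim=0, keepdim=True)  # unit-norm columns
        G = torch.abs(En.T @ En)                            # pairwise correlations
        return (G - torch.eye(G.shape[0], device=G.device)).max()

    return min(kept, key=lambda i: coherence(X @ runs[i][1]))  # index of chosen run
```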
IV Experiments
We have performed experiments on six standard real datasets whose descriptions are given below.
IV-A Hyperspectral data description

Samson: The Samson hyperspectral image (downloaded from https://rslab.ut.ac.ir/data) is a 95×95-pixel subregion of a larger image captured using 156 bands spanning from 401 to 889 nm. Three main materials have been identified: Tree, Soil and Water. Note that we use a different ground truth from [rasti_misicnet_2022], which we selected for its sharper details in the abundances.

Jasper Ridge: The Jasper Ridge hyperspectral image (same source as Samson) is a 100×100-pixel subregion of a larger image initially captured using 224 bands spanning from 380 to 2500 nm. In total, 198 bands remain, as 26 were removed during preprocessing due to dense water vapor and atmospheric effects. Four main materials have been identified: Tree, Dirt, Water and Road.

Urban4 and Urban6: The Urban hyperspectral image (same source as Samson) is a 307×307-pixel image collected by the Hyperspectral Digital Image Collection Experiment (HYDICE) sensor [rickard_hydice_1993] using 210 bands spanning from 400 to 2500 nm. In total, 162 bands remain, as 48 were removed during preprocessing due to dense water vapor and atmospheric effects. Three versions of this dataset exist, differing in the number of endmembers. In this study, we focus on the two extremes: Urban4 contains 4 endmembers (Asphalt Road, Grass, Tree and Roof), while Urban6 contains two additional materials, Dirt and Metal, making it more challenging.

APEX: The APEX [schaepman_advanced_2015] hyperspectral image that we consider in this paper is a 111×122-pixel cropped region (downloaded from https://github.com/BehnoodRasti/MiSiCNet) of a larger image captured over 285 bands spanning from 413 to 2420 nm. Four main materials were identified: Road, Tree, Roof and Water.

Washington DC Mall: The Washington DC Mall (WDC) hyperspectral dataset (same source as APEX) consists of a 319×292-pixel image captured by the HYDICE sensor [rickard_hydice_1993] over 191 bands spanning from 400 to 2400 nm. Six main materials were identified: Grass, Tree, Roof, Road, Water and Trail.
According to [zhu_hyperspectral_2017] (Samson, Jasper Ridge and Urban) and [rasti_misicnet_2022] (APEX and WDC), the endmember spectra were manually selected from the images and the ground truth abundances were obtained by FCLSU. Illustrations of the datasets and their ground truth endmembers are available in the supplementary material.
IV-B Experimental setup
We compare our approach to five competitive methods from different unmixing categories:

Geometrical unmixing baseline: FCLSU [heinz_fully_2001] using VCA [nascimento_vertex_2005] to extract endmembers. Our implementation of the FCLSU algorithm uses the DecompSimplex routine implemented in SPAMS (http://thoth.inrialpes.fr/people/mairal/spams/). This method relies on the active-set algorithm [nocedal_numerical_1999], which enables significantly faster convergence than generic quadratic programming solvers by leveraging the underlying sparsity of the abundances, as noted in [chen_fast_2014].

Deep learning unmixing: Endnet [ozkan_endnet_2018] (implementation at https://github.com/burknipalsson/hu_autoencoders), using VCA [nascimento_vertex_2005] to initialize the endmembers, and MiSiCNet [rasti_misicnet_2022] (implementation at https://github.com/BehnoodRasti/MiSiCNet).

NMF-based blind unmixing: nonnegative matrix factorization with quadratic minimum volume (NMF-QMV) [zhuang_regularization_2019] (implementation at https://github.com/LinaZhuang/NMF-QMV_demo), using the boundary term as the quadratic minimum volume penalty, and AA [cutler_archetypal_1994] using the implementation from [chen_fast_2014] provided in SPAMS.
To quantitatively evaluate the performance of the selected methods, we consider two metrics, computed both globally and individually for each endmember. On the one hand, we measure the quality of the estimated abundances by means of the abundance root mean square error (RMSE), in percent, between the ground truth $\mathbf{A}$ and the estimate $\hat{\mathbf{A}}$:

$$\text{RMSE}(\mathbf{A}, \hat{\mathbf{A}}) = 100 \times \sqrt{\frac{1}{pN}\,\|\mathbf{A} - \hat{\mathbf{A}}\|_F^2}. \qquad (17)$$

On the other hand, we assess the quality of the estimated endmember spectra by using the spectral angle distance (SAD), in degrees, between the ground truth $\mathbf{e}_j$ and the estimate $\hat{\mathbf{e}}_j$:

$$\text{SAD}(\mathbf{e}_j, \hat{\mathbf{e}}_j) = \arccos\left( \frac{\langle \mathbf{e}_j, \hat{\mathbf{e}}_j \rangle}{\|\mathbf{e}_j\|_2\, \|\hat{\mathbf{e}}_j\|_2} \right), \qquad (18)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $\mathbf{e}_j$ denotes the $j$-th column of $\mathbf{E}$, i.e., the spectrum of the $j$-th endmember.
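Both metrics can be computed as in the following NumPy sketch (endmembers are assumed to be matched to the ground truth beforehand):

```python
import numpy as np

def abundance_rmse(A_true, A_hat):
    """Abundance RMSE (17), in percent; matrices are (p, N)."""
    return 100.0 * np.sqrt(np.mean((A_true - A_hat) ** 2))

def sad_degrees(e_true, e_hat):
    """Spectral angle distance (18), in degrees, between two endmember spectra."""
    cos = e_true @ e_hat / (np.linalg.norm(e_true) * np.linalg.norm(e_hat))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```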
IV-C Unmixing experiments
Table I: Abundance RMSE (in percent) on the six real datasets; lower is better.
FCLSU [nascimento_vertex_2005, heinz_fully_2001]  Endnet [ozkan_endnet_2018]  MiSiCNet [rasti_misicnet_2022]  NMF-QMV [zhuang_regularization_2019]  AA  EDAA
Samson  Soil  11.28  11.61  6.47  13.67  6.16  5.74 
Tree  9.13  7.72  5.38  8.4  4.00  3.77  
Water  5.05  6.86  3.47  11.61  2.30  2.59  
Overall  8.88  8.97  5.25  11.44  4.44  4.24  
Jasper Ridge  Dirt  21.23  18.26  21.68  19.97  10.24  7.32 
Road  24.72  29.40  24.94  26.13  9.79  7.61  
Tree  11.20  4.00  3.41  14.55  6.32  6.63  
Water  13.61  22.38  7.07  19.81  6.77  5.69  
Overall  18.52  20.70  16.98  20.53  8.46  6.85  
Urban4  Road  30.54  12.04  10.30  20.25  11.12  8.62 
Grass  32.99  17.94  12.35  20.22  11.55  9.28  
Roof  15.40  11.28  8.01  13.29  7.28  6.37  
Tree  20.02  11.69  8.78  21.56  8.62  6.27  
Overall  25.78  13.52  10.00  19.11  9.80  7.75  
Urban6  Road  31.61  17.44  19.18  24.81  19.23  11.39 
Grass  23.62  31.47  18.84  27.97  9.73  18.61  
Roof  13.00  9.35  7.41  16.39  10.92  6.01  
Tree  16.14  15.50  11.72  19.95  14.45  9.85  
Dirt  25.06  30.67  23.95  24.31  27.76  15.94  
Metal  12.99  25.21  33.67  13.40  17.53  15.86  
Overall  21.54  23.09  16.27  21.74  17.66  13.63  
APEX  Road  33.31  29.32  32.66  31.43  34.12  16.54 
Tree  20.97  18.18  19.97  24.62  22.49  14.48  
Roof  14.15  15.88  18.42  14.73  18.54  11.27  
Water  18.03  17.47  16.88  17.99  16.90  16.83  
Overall  22.77  20.90  22.86  23.10  23.98  14.94  
WDC  Grass  30.90  27.35  31.61  34.69  31.92  32.53 
Tree  22.42  35.98  23.64  19.90  20.13  11.46  
Road  27.90  38.49  34.91  22.49  38.87  13.97  
Roof  8.71  27.04  11.98  19.81  20.89  29.31  
Water  17.76  12.94  14.72  20.34  14.06  9.63  
Trail  12.80  12.63  12.07  12.36  15.24  13.19  
Overall  21.57  27.64  23.39  22.60  25.17  20.46 
Table II: Endmember SAD (in degrees) on the six real datasets; lower is better.
FCLSU [nascimento_vertex_2005, heinz_fully_2001]  Endnet [ozkan_endnet_2018]  MiSiCNet [rasti_misicnet_2022]  NMF-QMV [zhuang_regularization_2019]  AA  EDAA
Samson  Soil  2.76  0.61  1.21  4.90  0.78  1.64 
Tree  3.05  1.93  3.38  5.34  1.80  1.98  
Water  7.15  1.48  5.36  11.14  1.38  1.31  
Overall  4.32  1.34  3.32  7.13  1.32  1.64  
Jasper Ridge  Dirt  13.03  1.63  4.26  12.40  2.44  2.74 
Road  40.39  32.85  20.04  45.66  8.00  3.10  
Tree  11.16  1.39  1.27  14.46  4.34  4.23  
Water  13.24  3.21  4.18  14.53  3.29  2.80  
Overall  19.46  9.77  7.44  21.77  4.52  3.22  
Urban4  Road  15.40  6.40  5.73  14.51  3.73  6.01 
Grass  24.18  3.09  5.84  16.39  1.81  2.14  
Roof  47.56  3.76  16.10  36.31  15.59  10.49  
Tree  19.82  2.32  4.60  22.48  3.51  2.81  
Overall  26.74  3.89  8.07  22.42  6.16  5.36  
Urban6  Road  13.43  3.26  7.47  26.60  6.74  4.85 
Grass  22.30  4.13  10.97  21.63  6.82  2.17  
Roof  15.65  17.76  13.97  15.55  18.07  13.70  
Tree  20.70  7.72  9.99  23.31  7.18  8.92  
Dirt  69.81  17.42  19.57  23.60  11.08  13.33  
Metal  39.35  7.04  9.75  68.73  40.97  4.52  
Overall  30.21  9.56  11.95  29.90  15.14  8.64  
APEX  Road  40.23  14.46  33.02  54.53  37.32  6.83 
Tree  14.13  7.53  3.16  16.06  2.39  7.68  
Roof  8.25  4.36  11.31  7.98  10.20  7.50  
Water  7.15  2.83  6.02  9.71  2.95  2.21  
Overall  17.44  7.30  13.38  22.07  13.21  6.06  
WDC  Grass  17.40  3.54  17.46  34.36  4.59  8.11 
Tree  23.73  12.85  12.36  17.70  10.93  1.81  
Road  32.56  26.76  33.20  17.28  46.73  7.67  
Roof  34.84  13.70  28.87  44.87  43.26  50.97  
Water  4.78  1.75  1.57  19.74  1.23  1.08  
Trail  9.94  1.49  3.24  9.60  5.32  4.75  
Overall  20.54  10.02  16.11  23.93  18.68  12.40 
Tables I and II report the unmixing accuracy in terms of abundance RMSE (17) and endmember SAD (18) on the six standard real datasets. The datasets are roughly ordered by difficulty. For a fair comparison, all methods were evaluated on the normalized data (11), which induces slight changes compared to the results reported in [rasti_misicnet_2022].
Overall, EDAA obtains the best abundance RMSE on all six datasets. As the unmixing difficulty increases, the gap between EDAA and the other methods grows larger. In particular, plain AA is not able to tackle more complicated mixing scenarios such as Urban6, APEX and WDC. We argue that our model selection technique is instrumental in avoiding collapsed runs in which endmember spectra are highly correlated. This is underlined by the overall competitive SAD results obtained by EDAA across datasets, especially on Urban6, APEX and WDC, where Endnet is the only contender.
The FCLSU baseline based on VCA obtains rather poor results, except on WDC. This is likely due to the pure pixel assumption: VCA selects a single pixel to represent each endmember spectrum, which is too stringent in real scenarios where spectral variability is ubiquitous.
Despite its quadratic minimum volume boundary term, NMF-QMV generally obtains worse results than the FCLSU baseline. Since it operates in a subspace, NMF-QMV cannot prevent the endmember spectra from having negative values, which breaks the physical interpretability of the estimates and subsequently harms the unmixing performance. This phenomenon can be observed in Figures 5, 7, 9 and 11 for several endmembers. The associated abundances in Figures 6, 8, 10 and 12 show that NMF-QMV produces maps that are too uniform and lack sparsity.
In contrast, Endnet achieves very good results in terms of SAD but tends to produce overly sparse abundances, which hinders its performance in terms of abundance RMSE. For instance, as can be seen in Figures 11 and 12, the Road endmember is overlooked by Endnet, whereas EDAA recovers it neatly. Likewise, in Figure 10, the Road abundances estimated by Endnet spread too much compared to EDAA, which appears closer to the ground truth.
MiSiCNet gives better unmixing results than Endnet in terms of abundance RMSE, except on APEX, although the SAD results fall in favor of Endnet, except on Jasper Ridge. This is likely due to Endnet using the spectral angle distance on the input data in its loss, which helps in achieving better SAD accuracy. However, a good SAD is not sufficient to obtain good abundance maps, an area where MiSiCNet tends to shine, as it incorporates spatial information by using convolutional filters and implicitly applies a regularizer on the abundances.
Finally, AA is a very competitive method on the three simplest datasets, yet its performance decreases drastically when dealing with more complicated mixtures. For example, in Figures 5 and 6, only AA and EDAA are able to uncover the Road endmember in Jasper Ridge, whereas all the other techniques fail. Yet, in Figures 7 and 8, AA completely misses the Metal and Dirt endmembers in Urban6, while EDAA correctly identifies all endmembers and produces reasonable abundance maps. Unlike EDAA, the random matrix initialization in AA cannot be leveraged to create an appropriate model selection procedure because the method is too slow.
Additional qualitative results for the Samson and Urban4 datasets can be found in the supplementary material.
IV-D Computational cost
Table III reports the processing times of the different unmixing algorithms on the six real datasets. NMF-QMV was implemented in Matlab (2020b), while FCLSU, Endnet, MiSiCNet and the AA variants were implemented in Python (3.8). NMF-QMV, FCLSU and AA run on CPU, whereas Endnet, MiSiCNet and EDAA run on GPU. The processing times were obtained using a computer with an Intel(R) Xeon(R) Silver 4110 processor (2.10 GHz), 32 cores, 64 GB of memory, and an NVIDIA GeForce RTX 2080 Ti graphics processing unit. The table shows that FCLSU is clearly faster than the other unmixing techniques; however, it is a supervised method that relies on an endmember extraction algorithm, in this case VCA, which is also fast. The deep learning methods are the slowest techniques despite running on GPU. Interestingly, EDAA has a lower computational cost than NMF-QMV and AA, although our approach aggregates 50 runs computed sequentially. For example, it takes on average 1.5 seconds for EDAA to perform a single unmixing run on the Urban6 dataset, which is three times faster than FCLSU. This demonstrates the efficiency of EDAA, which allows us to use an adequate model selection procedure over several runs.
IV-E Ablation study
Finally, we study the sensitivity of Algorithms 1 and 2 to hyperparameters in Figure 16, where the Y-axis corresponds to the overall abundance RMSE. Given a fixed computational budget of 1000 updates, Figure 16 (a) shows that the hyperparameters of EDAA are robust provided that the number of runs in the model selection is large enough (here 100). Only the two extreme configurations are slightly worse, especially on Urban6; for the remaining experiments, we use an intermediate configuration. In Figure 16 (b), we see that performance is quite stable with respect to the number of outer iterations, except for WDC, which requires more updates (1000 in total). Finally, we study the importance of the number of runs $K$ from which to select the best candidate in Figure 16 (c). We observe that the model selection procedure requires at least 50 runs to obtain very good performance, hence we use $K = 50$ in our unmixing experiments. On unknown datasets where real-time unmixing is not required, it is advisable to use a large number of runs (at least 100) to ensure that the model selection procedure yields a good candidate. Detailed results for both the abundance RMSE and SAD metrics are available in the supplementary material.
V Conclusion
We have proposed a new algorithm based on archetypal analysis for blind hyperspectral unmixing. We have shown how to take advantage of its efficient GPU implementation in order to develop an adequate model selection procedure and obtain state-of-the-art performance. Remarkably, our simple and easy-to-use approach considerably improves the unmixing results on a comprehensive collection of standard real datasets. Finally, we have made our results reproducible by releasing an open-source codebase, which also includes the plain archetypal analysis variant presented in this study.
Acknowledgments and Funding
This project was supported by the ERC grant number 714381 (SOLARIS project) and by ANR 3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003).
References
Supplementary Material A: Detailed datasets description
In this section, we provide illustrations of the unmixing datasets used in the experiments. Each dataset is described with a falsecolor RGB image alongside the normalized ground truth endmembers.
Supplementary Material B: Additional results
Unmixing experiments
Ablation study
We report the detailed results obtained in the ablation study. For each dataset, the overall abundance RMSE and SAD are computed for all configurations. Table IV studies the sensitivity of EDAA to the numbers of inner and outer iterations given a fixed computational budget. Table V studies the sensitivity to the number of outer iterations when the computational budget is decreased. Finally, Table VI underlines the importance of the number of runs $K$ in the model selection procedure.
Samson  RMSE  4.24  4.34  3.97 

SAD  1.64  1.69  1.46  
Jasper Ridge  RMSE  6.85  5.90  6.44 
SAD  3.22  3.06  3.19  
Urban4  RMSE  7.51  7.46  7.72 
SAD  5.87  5.01  4.67  
Urban6  RMSE  13.52  13.54  13.99 
SAD  7.95  7.85  7.93  
APEX  RMSE  14.66  14.38  14.12 
SAD  6.29  6.58  6.99  
WDC  RMSE  20.47  22.81  22.89 
SAD  12.33  16.97  16.87  
Samson  RMSE  5.05  3.90  4.24  4.24 

SAD  1.94  1.43  1.64  1.64  
Jasper Ridge  RMSE  10.27  10.27  6.85  6.85 
SAD  3.73  3.73  3.22  3.22  
Urban4  RMSE  8.17  8.06  7.75  7.51 
SAD  5.43  5.41  5.36  5.87  
Urban6  RMSE  16.50  16.50  13.63  13.52 
SAD  8.16  8.16  7.92  7.95  
APEX  RMSE  23.84  24.17  14.94  14.66 
SAD  12.57  12.88  6.06  6.29  
WDC  RMSE  24.39  24.39  20.46  20.47 
SAD  11.52  11.52  12.40  12.33  