1 Introduction
Spectral data provides comprehensive knowledge about the Earth’s surface in the form of narrowband spectra. The limitations of spectral sensors induce pure materials in a scene to be mixed in different fractions. Indeed, the foremost concern is that pure materials (i.e., endmembers) and their fractions have to be estimated blindly from spectral data . Furthermore, nonlinear relations and spectral uncertainty are ubiquitous with severe scattering conditions (see Appendix A).
Even if the mixture of data can be represented with linear/nonlinear formulations, these assumptions underestimate the solution by considering pure materials as a finite set towards spectral uncertainty. To overcome the limitations, the mathematical formulation is redefined as follows in Zhou et al. (2018):
(1) 
Here is the number of endmembers, and is the spectral bands of a pixel. Also, is the additive Gaussian noise to account for possible noise sources. Note that varies for each pixel and material. Hence, this term virtually defines the uncertainty of the data.
In the scope of this paper, we will present a novel framework that broadly uses this equation by computing fractions from precomputed material signatures (i.e., only fractions are estimated from data). Specifically, we introduce a multinomial mixture kernel based on a neural network (NN) structure that obtains the fractions perpixel. The learnable parameters of this kernel are optimized with WGAN for the first time. This step achieves theoretically more accurate and stable solutions.
2 Related Work
As expected, the initial methods are based on linear models due to their simplicity. In particular, sumtoone and nonnegative constraints are conditioned for leastsquare solutions Heylen and others (2011). However, these models are insufficient to resolve complex cases such as multiple scattering effects and microscopiclevel material mixtures. Hence, bilinear models are used to handle nonlinear scenarios in which bilinear interactions of linear relations are considered in the models Halimi et al. (2011); Altmann and others (2011).
Even if the nonlinearity is retained, spectral uncertainty is still discarded in these models due to the finite set of materials. Distributionbased models represent endmember signatures as they are sampled from some distributions. In the literature, various distributions are exploited Zhang et al. (2014); Du et al. (2014)
. Notice that these models approximate each fraction to a unimodal distribution. Ultimately, this undermines the representation capacity as expected. Furthermore, the Gaussian Mixture Model (GMM) is adopted
Zhou et al. (2018). However, the complexity and assumptions limit the applicability of this method for largescale data.Recently, AutoEncoder (AE) network is redefined with additional layers and a spectral angle distance metric
Ozkan et al. (2018) to unmix data that improves robustness for spectral uncertainty. Moreover, Su et al. (2019) uses a stacked deeper AE network and a robust initialization is achieved by matrix factorization.3 Proposed Model
We first describe the basics of the proposed method (see Appendix B). The architecture is equivalent to the standard AE formulation where it encodes the input spectra to a hidden latent representation and then decodes to recompute the reconstructed version of the original spectra by optimizing the learnable parameters and
with some loss function.
3.1 Encoder Part
The dimensionality of data is one of the drawbacks, especially for unsupervised learning. The main reason is that a highdimensional space causes irregular distributions of data in the learning process. Inevitably, overfitting occurs for the parameters. For additional background, we recommend
Norouzi and Blei (2011); Jegou et al. (2011).For this purpose, highdimensional input data is mapped to a lowdimensional space where denotes the dimension of latent representations. Hence, a latent representation is computed with a set of learnable parameters as . Observe that value should be selected as
to extract sparse representations. In particular, 1D convolution layers are utilized with batch normalization and sparse activation layers in this function. Since fewer parameters are optimized, the stack of convolutions boosts the representation capacity with deep hierarchies. Details can found in Appendix
C for the NN structure with input/output relations.Later, fractions are estimated by using these latent representations with a multinomial function where
is the learnable parameters. A NN multinomial mixture model is utilized for this case. As explained, distributionbased methods reproduce materials as they are sampled from probability distributions. Initial methods employ a unimodal distribution as:
(2) 
Fractions are estimated from
by using a Gaussian distribution where
and indicate the mean and covariance matrix, respectively. In fact, this formulation underestimates the solution by fitting a unimodal distribution for each material. Therefore, Eq. 2 is restored with the Gaussian Mixture Model (GMM) by computing fractions with the mixture of Gaussian distributions:(3) 
Here, is the total number of mixture components by constraining and . Multinomial distributions are adopted in our method because fractions can be overly represented so that spectral uncertainty is adequately handled. Furthermore, should be selected as for effective representations. To this end, we mimic this approach with a set of NN models. We prefer to use NN models since the overall framework can be trained in an endtoend manner. Note that similar attempts are also made in the literature Jalali et al. (2019).
If we extend Eq. 3 to separate the term into two learnable parts, this equation is expressed as where and . As noticed, at least two sets of parameters must be optimized, thus two separate singlelayer subnetworks are employed as and .
To model these terms, the quadratic form of
is renounced in our formation. Precisely, multivariate Mahalanobis distance is normalized with a sigmoid function to calculate the probabilistic relations. To this end, the normal distribution resembles a cumulative distribution function. Similarly,
term is calculated with a NN model, and the sumtoone constraint is retained with softmax activation. In the end, all subparts of the encoder can be trained with differentiable layers.3.1.1 Parameter Updates for Multinomial Mixture Kernel
ExpectationMaximization (EM) is the most popular technique to estimate GMM parameters. This approach guarantees a stable solution by optimizing the negative loglikelihood function. Note that our loss function exploits the negative loglikelihood term for parameter updates. However, the studies show that EM approach suffers to achieve a true solution for several reasons, such as the random initialization of parameters and discontinuity. For further details, we recommend to read Kolouri et al. (2017); Jin et al. (2016).
To overcome these limitations, WGAN is employed, which provides several theoretical benefits over the classical approaches Kolouri et al. (2017). Specifically, Wasserstein distance is continuous and differentiable almost everywhere. Also, since it has a smooth loss function, the initialization is not critical. Formally, Wasserstein distance measures the distance between two distributions and where they generate variables as and . KantorovichRubinstein duality minimizes the distance as:
(4) 
where is a 1Lipschitz function. Inspired by the structure of Variational AutoEncoder (VAE) Kingma and Welling (2013), this duality can be adopted to our formulation in Eqs. 1 and 3 as:
(5) 
Here is omitted for the simplicity and value is accepted as not random. Note that this simplification is resolved in the decoder part. In addition, the generated sample from multinomial components is denoted as . With these observations, the formulation is simplified as:
(6) 
To this end, this notion mathematically proves that WGAN loss can be used for the parameter updates of terms. Furthermore, an additional gradient penalty term is used for stable solutions Gulrajani et al. (2017):
(7) 
Here, is the discriminative module and it is a 1Lipschitz function. Furthermore, is uniformly sampled data from real and reconstructed data. scales the contribution of penalty term that is set to . The details for the NN structure of discriminative module can be found in Appendix C.
3.2 Decoder Part
As stated, in Eq. 1, the formulation defines value for spectral uncertainty. However, we omit this term in Eq. 6 for the simplicity. Here, we remodel this term with extra NN layers.
For this purpose, we add an uncertainty NN model that takes the output of multinomial mixture kernel and random noise as input. This model computes an additive fullsize spectral signature. Observe that there is random noise to mimic the spectral uncertainty for different pixels in the formulation. Lastly, WGAN loss is also used for these parameters. Moreover, we utilize a refinement NN model that allows updates and nonlinearity for the precomputed endmembers.
The final output of the decoder part is reformulated by rescaling the contributions of refinement and uncertainty NN models with and , respectively:
(8) 
3.3 Implementation Details
4 Experiments
Experiments to compare the performance of the proposed method (DSCN++) with the baseline techniques are conducted on two real datasets (Urban Zhu and others (2014a) and Jasper Zhu and others (2014b) datasets) (see Appendix D for additional experiments). For this purpose, Linear Mixture Model (LMM) Heylen and Scheunders (2016), Generalized Bilinear Model (GBM) Halimi et al. (2011), PostNonlinear Mixing Model (PPNM) Altmann et al. (2012), Multilinear Mixing Model (MLM) Heylen and Scheunders (2016) and Normal Compositional Model (NCM) Stein (2003) are selected that are extensively used nonlinear and distributionbased methods in the literature. Scores of the baselines are reproduced.
Furthermore, Spatial Compositional Model (SCM) Zhou et al. (2016), DistanceMaxD (DMaxD) Heylen et al. (2011) and Sparse AutoEncoder Network (EndNet) Ozkan et al. (2018) are exploited to estimate endmember spectra from the scenes. Scores are reported in terms of Root Mean Square Error (RMSE) and tests are repeated times.
First, we test all individual steps and parameter configurations for different datasets. For clarity, EU stands for NN models that correspond to uncertainty and refinement models in the decoder part. Furthermore, WGAN indicates the use of Wasserstein GAN in parameter optimization. denotes the total number of mixture components. Experimental results are reported in Tab. 1. An important observation is that all these contributions ultimately improve performance. In particular, both WGAN and EU steps improve the performance of DmaxD methods for both datasets. A similar observation can be made for SCM on the Urban dataset and Endnet on the Jasper dataset. Lastly, the selection of value larger than the material number ultimately advances the unmixing performance.
Second, we also compare our method with the baselines. Results are demonstrated in Tab. 2. Observe that no single baseline method can continuously yield accurate results for different scenes, even if these baselines cover both nonlinear and distributionbased methods. This observation shows the superiority of our method that can capture both nonlinearity and uncertainty conditions perfectly. Of course, our method obtains stateoftheart performance compared to baselines by a large margin.
Urban Dataset  Jasper Dataset  
DMaxD  SCM  EndNet  DMaxD  SCM  EndNet  
N 
15.96 2.6  13.26 0.6  8.27 0.4  12.03 0.7  18.10 0.9  6.38 0.7 
N  15.47 1.9  13.29 0.8  8.05 0.5  12.29 0.9  18.33 1.1  6.61 0.7 
N  16.32 3.2  13.24 0.5  8.16 0.4  12.12 0.7  17.92 0.8  6.31 0.7 
N  16.56 2.1  13.23 0.8  8.04 0.3  12.27 0.6  18.03 1.1  6.62 1.1 
wo EU 
34.14 0.3  15.91 0.3  8.38 0.4  14.78 0.4  20.26 0.4  8.36 0.5 
wo WGAN  31.43 1.6  15.76 1.0  8.52 0.5  15.63 0.3  19.03 0.4  10.15 0.5 
Urban Dataset  Jasper Dataset  
DMaxD  SCM  EndNet  DMaxD  SCM  EndNet  
LMM 
31.04  28.00  27.59  15.31  19.78  22.60 
GBM  31.04  27.00  27.08  15.98  19.95  21.91 
PPNM  36.54  28.17  15.06  12.99  19.78  14.72 
MLM  31.43  16.34  9.48  16.59  20.73  8.13 
NCM  31.04  28.00  27.51  15.29  19.56  22.42 
DSCN++  15.47  13.23  8.04  12.03  17.92  6.31 

5 Conclusion
In this paper, we introduce a novel neural network (NN) framework for spectral unmixing. It is composed of 1D convolutions by addressing spectral uncertainty, which is one of the severe physical conditions frequently observed in real data. For this purpose, a novel NN layer is proposed to estimate the fractions perpixel by leveraging the multinomial mixture kernel. Wasserstein GAN is exploited for the parameter optimization, which leads to more accurate and stable performance than other loss functions. The experimental results validate that the proposed method outperforms baselines by a large margin.
References
 [1] (2012) Supervised nonlinear spectral unmixing using a postnonlinear mixing model for hyperspectral imagery. IEEE Transactions on Image Processing 21 (6), pp. 3017–3025. Cited by: §4.
 [2] (2011) Bilinear models for nonlinear unmixing of hyperspectral images. In IEEE WHISPERS, pp. 1–4. Cited by: §2.
 [3] (2014) Spatial and spectral unmixing using the beta compositional model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6), pp. 1994–2003. Cited by: §2.
 [4] (2017) Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pp. 5769–5779. Cited by: §3.1.1.
 [5] (2011) Nonlinear unmixing of hyperspectral images using a generalized bilinear model. IEEE Transactions on Geoscience and Remote Sensing 49 (11), pp. 4153–4162. Cited by: §2, §4.
 [6] (2011) Nonlinear spectral unmixing by geodesic simplex volume maximization. IEEE Journal of Selected Topics in Signal Processing 5 (3), pp. 534–542. Cited by: §4.
 [7] (2011) Fully constrained least squares spectral unmixing by simplex projection. IEEE TGRS, pp. 4112–4122. Cited by: §2.
 [8] (2016) A multilinear mixing model for nonlinear spectral unmixing. IEEE transactions on geoscience and remote sensing 54 (1), pp. 240–251. Cited by: §4.
 [9] (2019) Efficient deep approximation of gmms. In Advances in Neural Information Processing Systems, pp. 4552–4560. Cited by: §3.1.
 [10] (2011) Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33 (1), pp. 117–128. Cited by: §3.1.
 [11] (2016) Local maxima in the likelihood of gaussian mixture models: structural results and algorithmic consequences. In Advances in Neural Information Processing Systems, pp. 4116–4124. Cited by: §3.1.1.
 [12] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.3.
 [13] (2013) Autoencoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §3.1.1.
 [14] (2017) Sliced wasserstein distance for learning gaussian mixture models. arXiv preprint arXiv:1711.05376. Cited by: §3.1.1, §3.1.1.

[15]
(2011)
Minimal loss hashing for compact binary codes.
In
Proceedings of the 28th international conference on machine learning (ICML11)
, pp. 353–360. Cited by: §3.1.  [16] (2018) Endnet: sparse autoencoder network for endmember extraction and hyperspectral unmixing. IEEE Transactions on Geoscience and Remote Sensing 57 (1), pp. 482–496. Cited by: §2, §3.3, §4.
 [17] (2003) Application of the normal compositional model to the analysis of hyperspectral imagery. In Advances in techniques for analysis of remotely sensed data, 2003 IEEE Workshop on, pp. 44–51. Cited by: §4.
 [18] (2019) DAEN: deep autoencoder networks for hyperspectral unmixing. IEEE Transactions on Geoscience and Remote Sensing 57 (7), pp. 4309–4321. Cited by: §2.
 [19] (2014) PSOem: a hyperspectral unmixing algorithm based on normal compositional model. IEEE Transactions on Geoscience and Remote Sensing 52 (12), pp. 7782–7792. Cited by: §2.
 [20] (2016) A spatial compositional model for linear unmixing and endmember uncertainty estimation. IEEE Transactions on Image Processing 25 (12), pp. 5987–6002. Cited by: §4.
 [21] (2018) A gaussian mixture model representation of endmember variability in hyperspectral unmixing. IEEE Transactions on Image Processing 27 (5), pp. 2242–2256. Cited by: Appendix D, §1, §2.
 [22] (2014) Spectral unmixing via dataguided sparsity. IEEE TIP, pp. 5412–5427. Cited by: §4.
 [23] (2014) Structured sparse method for hyperspectral unmixing. ISPRS Journal of Photogrammetry and Remote Sensing, pp. 101–118. Cited by: §4.
Appendix A Appendix For Spectral Uncertainty
As explained, spectral images can be significantly affected by variations in atmospheric, illumination, or environmental conditions. Hence, spectral uncertainty ultimately influences the mixture of materials, especially for real data. Fig. 1 illustrates the pure material samples from real datasets. Observe that this uncertainty can be tedious for spectral data.
Appendix B Appendix For Model Flow
The overall flow of the proposed method is illustrated in Fig. 2. Our model takes spectral data as input and computes a latent representation . Later, this representation is modeled with a multinomial mixture model to estimate fractions . These fractions are fed to two NN models with the matrix multiplication of precomputed endmembers in the decoder part. In the end, individual outputs are summed, and the input is reconstructed.
Appendix C Appendix For NN Structures
In our method, two deep NN models are primarily utilized. For the encoder part, a stack of 1D convolution layers is used to transform highdimensional spectral data to a lowdimensional space, as illustrated in Tab. 3
. Observe that the order of convolution and normalization layers is modified. This modification enhances the selectivity of representations by feeding more statistical knowledge to the convolution layers. Furthermore, Inception module that has multiple kernels of different sizes is also integrated. Hence, a rich representation from multiple kernel sizes can be computed.
Index  Index of Inputs  Operation(s)  Output Shape 

(1)    Input Spectra  D 
(2)  (1)  Conv1D(10, 21, 1)  D 10 
(3)  (2)  PReLU + AvgP(5) + Nrm  D/5 10 
(4)  (3)  Conv1D(10, 3, 1)  D/5 10 
(5)  (3)  Conv1D(10, 5, 1)  D/5 10 
(6)  (3)  Conv1D(10, 7, 1)  D/5 10 
(7)  (4),(5),(6)  Concat  D/5 30 
(8)  (7)  PReLU + AvgP(2) + Nrm  D/10 30 
(9)  (8)  Conv1D(5, 3, 1)  D/10 10 
(10)  (9)  PReLU + AvgP(2) + Nrm  D/20 10 
(11)  (10)  Linear(M) + LReLU  M 
The NN Structure of the encoder part. The notation of Conv1D(W,K,S) indicates 1D convolution whose output dimension, kernel size and stride are W, K and S, respectively. Furthermore, Nrm and PReLU denote normalization and activation layers. AvgP(K) corresponds to the average pooling layer with kernel size and stride K.
To demonstrate the superiority of this model, Fig. 3 visualizes the distributions of input data and latent representation transformed by our method. Note that colors are discriminative based on their real fractions. Our method can blindly achieve more separable space for spectral data.
Moreover, we utilize a deep NN model for the discriminative part of WGAN. This structure is also given in Tab. 4. In particular, PatchGAN is adopted in our method. Hence, local patches are fed to the discriminative model than fulldimensional data. Local characteristics (i.e., local spectral uncertainty) are practically captured.
Index  Index of Inputs  Operation(s)  Output Shape 

(1)    Input Spectra  D 
(2)  (1)  Conv1D(5, 21, 5)  D/5 5 
(3)  (2)  Nrm + PReLU  D/5 5 
(4)  (3)  Conv1D(10, 5, 2)  D/10 10 
(5)  (4)  Nrm + PReLU  D/10 10 
(6)  (5)  Conv1D(20, 5, 2)  D/20 20 
(7)  (6)  Nrm + PReLU  D/20 20 
(8)  (7)  Linear(5)  D/20 5 
Appendix D Appendix For Additional Experiments on Synthetic Data
Our method is also tested on a synthetic dataset released for the experiments in [21]. The dataset consists of four constituent materials from the ASTER spectral library: limestone, basalt, concrete and asphalt. For the scene, limestone is set as the background material; the rest is randomly placed to the image by maintaining the Gaussianshaped distributions.
Even if synthetic data is conventional to evaluate the methods, it is not suitable to have all adverse cases exhibited on real data. Hence, it is hard to simulate nonlinearity and spectral uncertainty on the data. In particular, additive noise is inserted to simulate spectral uncertainty, which cannot reflect the actual physical conditions. Fig. 4 illustrates samples from two pure materials on this synthetic dataset. Unlike real data, these samples lead to corrupted signatures.
Experimental results are reported in Tab. 5. If we do not use EU and WGAN steps, the model achieves the best performance. This result makes sense since spectral uncertainty is simulated as random additive noise. Hence, WGAN loss eventually learns this noise characteristics instead of the real spectral uncertainty. This characteristic drastically reduces the performance and undermines the model. However, as explained, the actual capacity of these steps on real data is obvious.
We also compare our method with baselines. In Tab. 6, these results show that the proposed method performs better even if the full capacity of model is not used.
DMaxD  SCM  EndNet  
N 
8.32 1.4  4.47 0.7  9.20 1.4 
N  8.15 1.4  4.36 0.9  9.14 1.0 
N  8.18 1.0  3.97 0.5  9.06 1.1 
N  7.55 1.1  4.42 0.6  9.51 1.2 
wo EU  5.43 0.3  3.81 0.9  6.43 0.4 
wo WGAN  5.23 0.4  3.22 1.2  8.88 0.3 

DMaxD  SCM  EndNet  
LMM 
5.17  3.36  17.62 
GBM  5.62  3.26  17.39 
PPNM  7.66  4.01  9.34 
MLM  6.52  3.85  6.71 
NCM  4.09  3.85  11.17 
DSCN++  5.23  3.22  6.43 

Comments
There are no comments yet.