Spectral unmixing concerns the techniques used to decompose the spectrum of a mixed signal into a set of source spectra and their corresponding activations. The activations give the proportion of each source spectrum present in the mixed spectrum. More specifically, blind spectral unxming consists in estimating the source spectra with limited prior information; usually, the only known information is the number of sources. Spectral unmixing techniques are applied in many fields such as in audio and image processing. In this paper, we introduce a flexible framework to perform spectral unmixing by fusing the information coming from multiple data with different resolutions. We showcase its efficiency on two major applications: audio spectral unmixing and fusion of hyperspectral and multispectral images. For these applications, the input data usually results from a trade-off between two adversarial dimensions. Let us illustrate this assertion in the particular case of audio spectral unmixing that commonly uses the simultaneous time-frequency representation of an input mixed signal. The simultaneous time-frequency representation is here computed with the short-time Fourier transform (STFT). The STFT consists in dividing the time signal into short segments of the same length, in multiplying the segments element-wise by a window function of size, and then in computing the Fourier transform of each windowed segment (only half of the frequency coefficients can be retained thanks to the Hermitian symmetry). Therefore, from an input signal , we obtain a complex matrix called spectrogram. The number of rows corresponds to the frequency resolution. Letting be the sampling rate, consecutive rows correspond to frequency bands that are Hz apart. Choosing a particular value for the window length is equivalent to fixing the frequency and the time resolutions. A larger window implies a higher frequency resolution but it comes at the cost of lower temporal resolution. Moreover, the trade-off between detailed frequency and temporal information is due to the fundamental physical limit known as the Heisenberg uncertainty principle. A natural solution is to consider multiple audio spectrograms and fuse them into a product with both high frequency and high temporal resolutions. A similar idea has been studied in the hyperspectral imaging community; see for example [1, 2, 3, 4, 5, 6, 7, 8, 9]
. Indeed, hyperspectral (HS) images have high spectral resolution (typically between 100 and 200 spectral bands) but low spatial resolution, whereas the opposite is true for multispectral (MS) images. The fusion of HS and MS data, which we refer to as the HS-MS fusion problem, gives the possibility to produce fused data with both high spectral and high spatial resolutions, called the super-resolution (SR) image. The SR image can improve the precision of the unmixing.
Over the last two decades, nonnegative matrix factorization (NMF)  has emerged as a useful method to decompose the spectrogram of a mixed audio signal [12, 13, 14, 15]. Indeed, power or amplitude spectrograms of common audio signals, denoted here by such that or , show two fundamental properties: sparsity and redundancy (due to the repetition of the spectral signature of the elementary sources that compose the signal). Mathematically, we say that the spectrogram has a low-rank structure . Similar observations can be made for multi-bands images as the spectral signature of each pixel can be seen as a mixture of a few spectral signatures of pure materials present in the image and referred to as endmembers. These two fundamental properties led spectral unmixing problems to integrate low-rank approximation techniques such as NMF, which is well known to extract sparse and meaningful estimates for the source spectra [17, 18]. Given the nonnegative matrix and a positive integer (the factorization rank), NMF aims to compute a nonnegative matrix with columns and a nonnegative matrix with rows such that . The matrix is referred to as the dictionary matrix whose columns correspond to the spectral content of a source estimate. The matrix is the activation matrix specifying if a source is active at a certain time frame (or at a certain pixel location in the image for the HS-MS fusion prolem) and in which intensity.
Contribution and outline
In this paper, we propose multi-resolution -NMF (MR--NMF) for fusing the information coming from multiple audio amplitude spectrograms with different frequency resolutions. As far as we know, it is the first time such an approach is used in this context. High-frequency-resolution data and high-temporal-resolution data are jointly factorized by MR--NMF, taking into account the linear mixture model (3). Based on these audio spectrograms, we are able to generate a solution that exploits the spectral accuracy from the high-frequency-resolution data and a solution for exploiting the temporal accuracy from the high-temporal-resolution data. Both frequency and temporal reconstruction qualities are evaluated by numerical simulation using synthetic audio signals. We also show that MR--NMF is flexible and can be used in other applications. In particular we motivate and show its efficiency to deal with the HS-MS fusion problem. As far as we know, it is the first time that a HS-MS fusion model and algorithm tackles any -divergence. Most previous works focused on the case , that is, least squares, which assumes Gaussian noise as a prior. As we will see, considering -divergences for allows to obtain much better solutions in the presence of non-Gaussian noise. In particular, we show that in the presence of Poisson noise, using
(Kullback-Leibler divergence) outperforms standard approaches.
This paper is organized as follows. Section II details the problem formulation, in particular the mixture model, the motivation for using NMF, and our proposed MR--NMF approach. Section III describes the algorithm developed to tackle this problem. Section IV (resp. Section V) presents numerical results on audio datasets (resp. on the HS-MS fusion problem). MR--NMF is shown to be competitive with state-of-the-art techniques, and allows to obtain solutions with both high spectral resolution and high temporal (resp. spatial) resolution.
Ii Problem Formulation
The aim of multi-resolution unmixing, or more generally data fusion, is to estimate non-observable data with high resolutions in adversarial dimensions from observable data that show high resolution in one dimension only. In this paper, we propose a flexible framework that can be easily adapted to many applications. More particularly, we consider the blind audio spectral unmixing and the HS-MS fusion problem.
In the case of the audio spectral unmixing, the multi-resolution unmixing is based on high-frequency-resolution (HRF) data and low-frequency-resolution (LRF) data. In this paper we limit the discussion to the use of two input audio amplitude spectrograms and . We assume they are computed with STFTs based on a common input audio signals . The windows lengths are respectively and such that with where is usually referred to as the frequency downsampling ratio. Sizes and denote the number of time frames of LRF and HRF spectrograms, respectively, with as per the trade-off between frequency and temporal resolutions. Given and , we are searching for an amplitude audio spectrogram that has both high frequency and high temporal resolutions. We suppose in this paper that the observed LRF spectrogram is a frequency downsampled version of , that is,
where is the frequency downsampling operator. Similarly, the observed HRF spectrogram is a temporally downsampled version of , that is,
where is the temporal downsampling operator. For the HS-MS fusion problem, we assume that a high spatial resolution image and a high spectral resolution image are available to reconstruct the target image with high-spectral and high-spatial resolutions , the SR image. These images result from the linear spectral and spatial degradations of the SR image , given by the same equations (1) and (2). In this context, the operator from (1) is the relative spectral bandpass responses from the super-resolution image to the MS image, while the operator introduced in (2) specifies the spatial blurring and down-sampling responses that result in the HS image. In the context of HS-MS fusion, the operators and can be acquired either by cross-calibration  or by estimations from the HS and MS images [10, 20]. As far as we know, in the context of audio spectral unmixing, it is unknown how to estimate and , and we will propose an optimization strategy to do so.
Ii-1 Linear Spectral Mixture Model
A linear spectral mixture model is commonly used for the audio spectral unmixing or HS unmixing due to its physical meaning and its mathematical simplicity; see [21, 22, 23] for detailed reviews. Under this model the input data matrix has the form
where is the dictionary matrix and
is the activation matrix. In the case of audio spectral unmixing, a column of an amplitude audio spectrogram is then supposed to be a nonnegative linear combination of the amplitude spectrograms of the sources. For HS-MS data fusion, the linear mixture model makes senses as it exploits a natural feature of multi-band images: each spectral vector representing a pixel can be seen as a linear mixture of several spectral signatures, called endmembers, that correspond to the reflectance of a material at different wavelengths of light. The matrixis referred to as the endmember matrix whose columns are spectral signatures, and is the abundance matrix whose entries are the abundances of the endmembers in the pixels.
Ii-2 Multi-Resolution -Nmf
In this section, we present a new approach for spectral unmixing based on the minimization of -divergences. To solve the multi-resolution problem and estimate the signal , we need to estimate and . From (4) and (5), we propose to solve the following optimization problem, which we refer to as MR--NMF:
where means that is component-wise nonnegative, is a positive penalty parameter, and with an appropriate error measure between the scalars and . In the general case, MR--NMF is also able to estimate the downsampling operators and , which is a contribution. Note that when the downsampling operators and are known, the objective function is minimized over and only, see section III for more details. Note also that in general and have a particular structure where some entries are fixed to zero; see Section II-3. As our algorithm will rely on multiplicative updates, entries initialized at zero remain zero in the course of the optimization process. In audio spectral unmixing, a common measure of fit is the -divergence, denoted , and equal to
where and are nonnegative scalars. For , this amounts to the standard squared Euclidean distance since . For and , the -divergence corresponds to the Kullback-Leibler (KL) divergence and the Itakura-Saito (IS) divergence, respectively. The error measure should be chosen depending on the noise statistic assumed on the data. The Euclidean distance assumes i.i.d. Gaussian noise, KL divergence assumes Poisson noise, and the IS divergence assumes multiplicative Gamma noise . KL and IS divergences are usually considered for amplitude spectrogram and power spectrogram, respectively. Both KL and IS divergences are more adapted to audio spectral unmixing than Euclidean distance; see  and . The Euclidian distance is the most widely used to tackle the HS unmixing problem as well as the HS-MS data fusion problem. However, when no obvious choice of a specific divergence is available, finding the right measure of fit, namely the value for , can be seen as a model selection problem . Therefore an objective function with an adjustable
is fully justified. Moreover, divergences are often log-likelihoods in disguise and therefore choosing a divergence boils down to choosing a noise statistic as mentioned earlier. For example, sensors embedded in cameras can be seen as photon counters, and the Poisson distribution makes particular sense for count data. This assumption supports once again our motivation to consideran adjustable, in this case with . Based on numerical experiments, we will show that the KL-divergence is also well suited for the HS-MS fusion problem.
Ii-3 Downsampling operators
As mentioned earlier, for HS-MS data fusion, downsampling operators can usually be estimated and hence are assumed to be known. In the context of audio spectral unmixing, the downsampling operators in (6) are unknown. Different structures for downsampling operators and have been tested, and we report here the form for that shows the best results in practice, while is obtained in the same way. Let us illustrate this on the simple example of the frequency downsampling of a matrix with a downsampling ratio . A possible structure for the matrix is as follows:
This downsampling operator performs a weighted arithmetic mean over a set of rows of the matrix it is applied on; here, is downsampled as . The structure of the matrix relies on two parameters: and . As mentioned earlier, corresponds to the downsampling ratio. Each row of has at least non-zero values that correspond to the rows in that are combined to form the rows of ; see the underlined entries of above. The parameter controls the overlap between the linear combinations of the rows of . In the example above, and one positive value is added to the left and the right end of the non-zero entries corresponding to the downsampling parameter; see the bold entries in matrix above. These positive values allow an overlap (or coupling) within the downsampling process. If we consider two consecutive frequency bins that result from a downsampling operation, it is reasonable to consider that they share common frequency bins in the original frequency space. We imposed to avoid too much non-physical coupling. This limitation is also based on numerical experiments that show a degradation of the results when exceeds . When , the downsampling operator performs a weighted arithmetic mean over rows without overlapping. Note that these downsampling operators are sparse nonnegative matrices.
Iii Algorithm for Mr--Nmf
Most NMF algorithms are based on an iterative scheme that alternatively update for fixed and vice versa, and we adopt this approach in this paper. Note that the -divergence is only convex with respect to its second argument when . The goal in this section is to derive an algorithm to solve MR--NMF (6).
For and fixed, let us consider the subproblem in :
As we will see, the subproblems in , and for the other variables fixed are similar. To tackle this problem, we follow the standard majorization-minimization framework . We start by constructing an auxiliary function denoted which is a tight upper-bound for the objective at the current iterate. An auxiliary function for at point is defined as follows.
The function is an auxiliary function for at if the conditions for all and are satisfied.
The optimization problem with is then replaced by a sequence of simpler problems for which the objective is .The new iterate is computed by minimizing the auxiliary function at the previous iterate , either approximately or exactly. This guarantees to decrease at each iteration.
Let , and let be an auxiliary function for at . Then is non-increasing under the update .
By definition, . ∎
The most difficult part in using the majorization-minimization framework is to design an auxiliary function that is easy to optimize. Usually such auxiliary functions are separable (that is, there is no interaction between the variables so that each entry of can be updated independently) and convex. We will construct an auxiliary function for from (7) by a positive linear combination of two auxiliary functions, one for each term of .
Iii-1 Separable auxiliary function for the first term of
The function separates into , where and are the th column of and respectively. Therefore we only consider the optimization over one specific column of and of . To simplify notation, we denote the current iterate as . We now use the separable auxiliary function presented in  which consists in majorizing the convex part of the -divergence using Jensen’s inequality and majorizing the concave part by its tangent (first-order Taylor approximation). Note that the divergence can always be expressed as the sum of a convex, concave, and constant part, such that:
where is convex function of , is a concave function of and is a constant of , see  for the definition of these terms for different values of .
By denoting by and by with entries for , the auxiliary function for at is given by:
Therefore the function
is an auxiliary function (convex and separable) for at where is given by (8).
Iii-2 Separable auxiliary function for the second term of
: Let and let us use a result from :
Iii-3 Auxiliary function for multi-resolution -Nmf
Iii-4 Algorithm for MR--Nmf
Given the convexity and the separability of the auxiliary function, the optimum is obtained by canceling the gradient. The derivative of the auxiliary function with respect to a specific coefficient , with index identifying the same column specified by in (8) and specified by in (10), is given by:
For , (11) becomes:
We set (12) to zero and get the following closed-form solution for the coefficient of :
The generalization of the closed-form solution (13) for any for is given in Table I in matrix forms. Table I gives also the closed-form solution for which is derived with the same rationale. As mentioned in Section II-2, in the general case, operators and are unknown. We propose here to derive updates for and so that these operators can be learned from the data and sensible estimates for and during the optimization scheme. The updates for and have been derived in a similar fashion as for matrices and . For the update of for instance, one has simply to note it corresponds to the update of where we only keep the terms multiplied by and where the roles of , , and are exchanged with , , and , respectively.
where (resp. ) is the Hadamard product (resp. division) between and , is the element
|-wise exponent of , for , for and for .|
Loop 1: matrices and are alternatively updated with downsampling operators and kept fixed so that we obtain good estimates for and . The updates are performed for a maximum number of iterations imposed by the parameter MAXITERL1.
Loop 2: matrices , , and are alternatively updated so that the algorithm learns the downsampling operators during the optimization process. The maximum number of iterations for loop 2 is controlled by parameter MAXITERL2.
For the HS-MS fusion problem, the operators and are usually known and therefore the parameter MAXITERL2 is set to zero. In this paper, the second optimization loop is considered only for the audio spectral unmixing application since the operators and are unknown.
After and are updated, we normalize such that for all , and we normalize accordingly so that
remains unchanged. This normalization is commonly used for NMF-based methods and is mainly performed to remove the scaling degree of freedom. As a convergence condition, we consider the relative change ratio of the cost functionfrom (6), namely where is a given threshold, and is the iteration counter. We also stop the optimization process if the number of iterations exceeds the predefined maximum number of iterations.
It can be verified that the computational complexity of the MR--NMF is asymptotically equivalent to the standard MU for -NMF, that is, it requires operations per iteration.
Iii-5 Parallel computing
We remark that some of the most computationally intensive steps of the proposed algorithm can be easily ran onto a parallel computation platform. Indeed, the complexity of our multiplicative updates detailed in Table I is mainly driven by the matrix products in which matrix is involved. On Matlab for example, we can easily take of advantage of a GPU compatible with CUDA libraries by simply transforming usual arrays into GPU arrays. In our case, on a desktop equipped with a Intel Core™ i7-8700 CPU and a GeForce RTX 2070 Super GPU, the runtime can be up to 5 times shorter.
Iv Numerical experiments on audio data sets
In this section, we perform numerical experiments to validate the effectiveness of MR--NMF on two synthetic audio data sets.
Iv-a Experimental setup and evaluation
The proposed technique for joint-factorization of amplitude audio spectrograms is applied to two synthetic audio samples. A dedicated test procedure is presented in Section IV-A2 in order to evaluate the performance of MR--NMF based on quantitative criteria detailed in subsection IV-A3. The first audio sample is the first measure of “Mary had a little lamb” and composed of three notes; , and . The signal is 5 seconds long and has a sampling frequency Hz yielding samples.
The second audio sample, inspired from , is a piano sequence played from the score given in Figure 2. The piano sequence is composed of four notes; , , and , played all at once in the first measure and then played by pairs in all possible combinations in the remaining measures. The signal is 14.6 seconds long and has a sampling frequency Hz yielding samples.
Iv-A2 Experimental comparison
This section describes the test procedure elaborated to evaluate the quality of the results obtained with MR--NMF (6) that jointly factorizes two audio spectrograms and . In the following, matrices and stand for the solutions computed with Algorithm 1 that solves MR--NMF (6). We aim at showing that the factor has a high frequency resolution whereas the matrix has a high temporal resolution. To achieve this goal, we compare to a dictionary matrix denoted computed with a baseline -NMF approach that factorizes the high frequency spectrogram only. The baseline -NMF applied on solves the following optimization problem:
Due to the trade-off between detailed frequency and temporal information, the activation matrix shows a low temporal resolution. To compare the accuracy of the solutions and , we need to have access to an oracle matrix that is the reference for the comparison. For instance, for the dataset 1, each column of is supposedly the ”true” spectral signature of each of the three notes; , and . We estimated as follows:
We synthetically generate three audio signals and each one contains the sequence of one note in particular.
Based on the three audio signals, we generate three amplitude spectrograms that have high frequency resolution with the same window size as the one used to generate .
For each amplitude spectrogram, we perform a rank-1 NMF. The resulting -dimensional vectors are concatenated to form the oracle matrix .
We show the accuracy of with a similar procedure; is compared to an activation matrix obtained by solving
using multiplicative updates. The oracle matrix , that is, the reference for the comparison, is computed by performing three independent rank-1 NMF on three amplitude spectrograms that have high temporal resolution, all generated with the same window size as the one used to generate .
Iv-A3 Performance Evaluation
This section presents the qualitative criteria for evaluating the performance of the solutions obtained with Algorithm 1. We compute the following measures of reconstruction.
Activation matrices: in order to avoid the scaling and permutation ambiguities inherent to the considered NMF models, we first normalize in L-1 norm the rows of the actications matrices and solve an assignment problem w.r.t. the oracle matrix . The quality of the activation matrix is compared to w.r.t.
by computing the following signal-to-noise ratios (SNR): for all,
where and , and
The higher the SNRs (16) and (17), the better is the estimation for the activation matrix.
Dictionary matrices: The quality of the dictionary matrix is evaluated in the same fashion, except that the normalization is performed by columns.
In this section, we use the following setting:
100 random initializations for and for each NMF.
the window lengths are set to 1024 (23ms) and 4096 (93ms), then the downsampling ratio is equal to 4. For the generation of and , parameter is set to 2.
, and we consider the amplitude spectrograms as the input data.
we use in all our experiments.
Iv-B1 Dataset 1: ”Mary had a little lamb”
reports the average SNR, the standard deviation and the best SNR computed for the activations and dictionary vectors obtained with the models described in SectionIV-A2 over the 100 initializations. As it can be observed, activations are slightly better than activations , and with a significant lower standard deviation for each note. The results for the dictionary are even more conclusive; MR--NMF outperforms baseline NMF (14) for which the SNR (best case) can be up to two times larger. Moreover, the standard deviations of MR--NMF are significantly lower than those obtained with baseline NMF (14). It appears that the second term in the objective function in (6) acts as a regularizer so that MR--NMF is more robust to different initializations.
Figure 3 shows the dictionary matrices , , and . For more clarity, the frequency range is limited to 2 kHz. This limited range includes all the most significant peaks in terms of magnitude. We observe that all the frequency peaks are accurately estimated by MR--NMF for each note. Figure 3 also integrates the dictionary matrix to highlight the impact of using baseline NMF (15) that uses a higher temporal resolution.
We conclude that MR--NMF is able to obtain more robust and more accurate results than baseline -NMFs that factorize a single spectrogram.
|Note||Activation SNR’s (dB)||Basis SNR’s (dB)|
|average std||best||average std||best||average std||best||average std||best|
|12.33 0.17||12.74||3.89 8.99||12.19||21.35 1.77||22.66||7.95 7.84||12.38|
|14.50 0.08||14.62||8.57 6.44||14.38||21.25 0.35||21.61||14.71 6.06||18.23|
|19.68 0.04||19.82||15.28 5.06||19.74||22.71 0.36||23.02||19.36 2.02||20.66|
Iv-B2 Dataset 2
In this section we report the numerical results obtained for dataset 2, using MAXITERL1=500 and MAXITERL2=1500 for Algorithm 1.
Table III reports the average SNR, the standard deviation and the best SNR computed for activations and dictionary vectors obtained with the methods described in IV-A2 over 100 initializations. We observe that:
MR--NMF provides results that show high resolutions in both frequency and temporal domains,
the regularization effect of MR--NMF w.r.t. baseline NMFs is less stunning than observed for dataset 1. However the standard deviations obtained with MR--NMF for the dictionary are significantly lower than those obtained with the baseline NMFs.
by looking more accurately at the results for the dictionary, MR--NMF globally performs better than baseline NMFs. For the activations, baseline NMFs perform slightly better than MR--NMF for three scores, with an improvement of at most 1.9% (for the score).
|Note||Activation SNR’s (dB)||dictionary SNR’s (dB)|
|average std||best||average std||best||average std||best||average std||best|
|11.98 0.01||12.03||12.17 0.01||12.17||16.24 0.02||16.43||16.29 0.26||16.42|
|9.54 0.02||9.57||9.43 0.01||9.43||9.41 0.02||9.42||8.61 0.72||8.73|
|14.81 0.01||14.82||14.92 0.01||14.92||16.20 0.06||16.33||15.24 2.37||15.64|
|11.23 0.01||11.32||11.52 0.01||11.54||16.47 0.05||16.50||16.76 0.99||16.93|
V Numerical experiments on HS-MS fusion
In this section, we perform numerical experiments to validate the effectiveness of MR--NMF on the HS-MS fusion problem.
V-a Test setup and criteria
V-A1 Test data
The proposed MR--NMF algorithm is tested on semi-real datasets against several methods and algorithms widely used to tackle the HS-MS data fusion problem, namely GSA , CNMF , HySure , FUMI , GLP , MAPSMM , SFIM  and Lanaras’s method . In a nutshell: GSA, SFIM and GLP are pansharpening-based methods, the remaining methods belong to subspace-based methods that can be splitted into unmixing methods (CNMF, Lanaras’s method and HySure) and Bayesian-based approaches (FUMI, MAPSMM) .
All the algorithms are implemented and tested on a desktop computer with Intel Core firstname.lastname@example.orgGHz CPU, Geforce RTX 2070 Super GPU and 32GB memory. The codes111https://naotoyokoya.com/Download.html are written in MATLAB R2018a. The implementation for benchmarked algorithms comes from the comparative review of the recent literature for HS and MS data fusion detailed in .
We consider the following real HS datasets:
HYDICE Urban: The Urban dataset222http://lesun.weebly.com/hyperspectral-data-set.html consists of 307307 pixels and 162 spectral reflectance bands in the wavelength range 400 nm to 2500 nm. We extract a 120120 subimage from this dataset.
HYDICE Washington DC Mall: this dataset333https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html has been acquired with HYDICE HS sensor over the Washington DC Mall and consists of 1208307 pixels and 191 spectral reflectance bands in the wavelength range 400 nm to 2500 nm. We extract a 240240 subimage from this dataset.
AVIRIS Indian Pines: this dataset has been acquired with NASA Airborne Visible/Infrared Imaging (AVIRIS) Spectrometer  over the Indian Pines test site in North-western Indiana and consists of 145145 pixels and 200 spectral reflectance bands in the wavelength range 400 nm to 2500 nm. We extract a 120120 subimage from this dataset.
Note that entries of the datasets are uncalibrated relative values, also referred as Digital Numbers (DN). As the goal is to fuse data and not to perform HS unmixing and classification, we do not convert these values into reflectances.
V-A2 Test procedure
In this paper we consider semi-real data by conducting the numerical experiments based on the widely used Wald’s protocol . This protocol consists in simulating input MS and HS images from a reference high-resolution HS image. In this paper, MS image and HS image have been derived from high-resolution HS image through the models (4) and (5) respectively. Let us recall that the operator from (1) designates the relative spectral responses from the super-resolution image to the MS image. In other words, it defines how the satellite instruments measure the intensity of the wavelengths (colors) of light. We generate a six-band MS image by filtering the reference image with the Landsat 4 TM-like reflectance spectral responses444https://landsat.usgs.gov/spectral-characteristics-viewer. The Landsat 4 TM sensor  has a spectral coverage from 400 nm to 2500 nm so that it is consistent with the spectral coverage of the datasets.
The operator (5) corresponds to the process of spatial blurring and downsampling. The high spectral low spatial resolution HS image is generated by applying a 1111 Gaussian spatial filter with a standard deviation of 1.7 on each band of the reference image and downsampling every 4 pixels, both horizontally and vertically. The HS and MS images are finally both contaminated with noise. The level of noise is usually characterized by the SNR expressed in dB. Here, and refer to the noise level for the MS and HS images respectively. In this paper, we apply the same level of noise for each spectral band. Let us give more insights on the last step of the MS image generation: where the noise matrix is constructed as follows: we introduce for , some binary coefficients, and
Each entry of is generated using the Poisson distribution of parameter for all , where is a noiseless low-rank approximation of that is computed separately. More precisely, by setting where
is all-zero matrix, a solutionfor MR--NMF (6) is first computed with Algorithm 1, and the parameter for the Poisson distribution is defined as .
Each entry of
We set with . For example, if we fix , is a MS image contaminated with 5.62% of noise (that is, ) and projected onto the nonnegative orthant. The noise matrix is obtained in the same way.
The benchmarked algorithms listed in V-A1 are configured as recommended in the comparative review  with the following variations:
The number of endmembers is a key parameter for unmixing-based methods. For MR--NMF, CNMF, Lanaras’s method and HySure, is set to the 5 and 6 for HYDICE Urban and HYDICE Washington DC Mall datasets respectively as done in . For the Indian Pine dataset, as in .
The benchmarked algorithms are stopped when the relative change of the objective function is below or when the number of iterations exceeds 500. For algorithms such as CNMF that include outer and inner loops, we contacted the authors to set up the best balance for the maximum number of inner () and outer () loop iterations to fairly compare the methods, the following couples of values are considered: and and and . The couple of values that gives the best results for each dataset is considered in section V-B, that is and .
The matrix is known for all algorithms that make use of it. For MR--NMF, it means we use MAXITERL1=500 and MAXITERL2=0.
Finally, let us summarize the initialization strategy:
MR--NMF uses random nonnegative initializations for and .
CNMF starts by unmixing the HS image using VCA  to initialize the endmember signatures,
SISAL  is used to initialize the endmembers for Lanaras’s method.
Four variants of the MR--NMF are considered, namely , , and . We test the algorithms under a scenario where no noise is added (that is, = 0), and a scenario where noise is added so that the SNRs for the noise terms in and are and .
V-A3 Performance evaluation
In order to assess the fusion quantitatively, we use the following five complementary and widely used quality measurements:
Peak SNR (PSNR): the PSNR is used to assess the spatial reconstruction quality of each band. It corresponds to the ratio between the maximum power of a signal and the power of residual errors. A larger PSNR value indicates a higher quality of spatial reconstruction.
The root-mean-square error (RMSE): RMSE is a similarity measure between the super-resolution image and the fused image . The smaller the RMSE is, the better the fusion quality is.
Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS): ERGAS provides a macroscopic statistical measure of the quality of the fused data. More precisely, ERGAS calculates the amount of spectral distortion in the image . The best value is at 0.
Spectral Angle Mapper (SAM): SAM is used to quantify the spectral information preservation at each pixel. More precisely, SAM determines the spectral distance by computing the angle between two vectors of the estimated and reference spectra. The overall SAM is obtained by averaging the SAMs computed for all image pixels. The smaller the absolute value of SAM is, the better the fusion quality is.
The universal image quality index (UIQI) introduced in : UIQI evaluates the similarity between two single-band images. It is related to the correlation, luminance distortion, and contrast distortion of the estimated image w.r.t. reference image. UIQI indicator is in the range . For multiband images, the overall UIQI is computed by averaging the UIQI computed band by band. The best value for UIQI is at 1.
V-B Experimental Results
We ran 20 independent trials for each dataset detailed in V-A1. The average performance of each algorithm is shown in Tables IVto VI. Except for runtimes, MR--NMF generally rank in the fifth first for all the quality measurements. For Urban dataset with noise added, MR--NMF with , and respectively rank first, second and third for all the metrics except for SAM for which CNMF ranks first. For the condition with no noise added, MR--NMF with , ranks first and second for all metrics. MR--NMF with , FUMI and HySure give similar results. For Washington DC Mall without noise added, MR--NMF with , ranks first and second for all metrics. For Indian Pines dataset without noise added, MR--NMF with ranks second while HySure ranks first. When noise is added, Lanaras’s method ranks first while MR--NMF with , rank second and third for most criteria. In order to give more insights on the performance comparison between algorithms, Figure 4 displays the SAM maps obtained for one trial for the Urban, Washington DC Mall and Indian Pines datasets. Visually, the proposed method performs competitively with other state-of-the-art methods. Indeed, as already observed with the SAM comparison in Tables IV to VI, the variants of MR--NMF show in general lower values for SAM errors across the images. For the Urban dataset, the highest SAM errors obtained with the variants of MR--NMF are less widespread and localized at some specific spots which correspond to the edges of the roofs and trees. This observation makes sense as those regions show more atypic reflectance angles and therefore more non-linear effects in terms of spectral mixture. The same observations apply for the Washington DC Mall dataset with and without noise added. For the Indian Pines dataset without noise added, HySure and FUMI algorithms show lower SAM errors accross images, we visually confirm that MR--NMF with rank third to fifth. When the noise is added, Lanaras’s method gives the lowest SAM errors and is less widespread, while MR--NMF with appear to provide less accurate estimates than CNMF that visually looks better.
|Method||Runtime (seconds)||PSNR (dB)||RMSE||ERGAS||SAM||UIQI|
|Dataset - HYDICE Urban -|
|MR--NMF||52.25 2.45||33.88 0.10||16.26 0.19||2.48 0.03||4.13 0.06||0.97 0.00|
|MR--NMF||54.46 2.31||34.54 0.06||14.92 0.09||2.28 0.01||3.65 0.04||0.98 0.00|
|MR--NMF||52.20 2.03||34.85 0.10||14.51 0.14||2.22 0.03||3.49 0.06||0.98 0.00|
|MR--NMF||54.47 1.96||34.81 0.10||14.65 0.15||2.24 0.02||3.52 0.06||0.98 0.00|
|GSA||0.72 0.05||32.52 0.00||19.41 0.00||2.87 0.00||5.63 0.00||0.96 0.00|
|CNMF||9.73 1.84||34.33 0.50||15.45 0.85||2.37 0.17||3.64 0.27||0.98 0.00|
|HySure||31.57 2.93||33.90 0.00||16.44 0.00||2.57 0.00||4.17 0.00||0.97 0.00|
|FUMI||0.39 0.03||32.92 0.00||20.30 0.00||2.85 0.00||4.92 0.00||0.96 0.00|
|GLP||6.05 0.42||27.24 0.00||34.37 0.00||5.10 0.00||6.27 0.00||0.91 0.00|
|MAPSMM||44.12 2.60||25.57 0.00||41.95 0.00||6.15 0.00||6.82 0.00||0.87 0.00|
|SFIM||0.24 0.03||26.32 0.00||37.89 0.00||5.71 0.00||5.90 0.00||0.90 0.00|
|Lanaras’s method||8.12 8.71||29.33 0.29||26.84 0.85||4.39 0.23||4.88 0.26||0.94 0.00|
|Dataset - HYDICE Urban - No added noise|
|MR--NMF||49.55 0.31||38.10 0.40||10.94 0.31||1.67 0.07||3.28 0.10||0.99 0.00|
|MR--NMF||51.54 0.52||40.01 0.50||8.82 0.32||1.35 0.09||2.60 0.10||0.99 0.00|
|MR--NMF||49.71 0.12||41.53 0.56||7.86 0.28||1.19 0.07||2.27 0.10||0.99 0.00|
|MR--NMF||52.09 0.35||41.69 0.64||7.81 0.35||1.19 0.08||2.23 0.12||0.99 0.00|
|GSA||0.67 0.04||32.93 0.00||22.17 0.00||2.87 0.00||5.25 0.00||0.97 0.00|
|CNMF||10.56 2.02||35.35 0.64||13.91 1.81||2.18 0.32||3.26 0.53||0.98 0.00|
|HySure||28.51 1.09||40.27 0.00||9.67 0.00||1.46 0.00||2.50 0.00||0.99 0.00|
|FUMI||0.36 0.02||41.01 0.00||14.14 0.00||1.67 0.00||2.71 0.00||0.99 0.00|
|GLP||5.61 0.09||27.97 0.00||31.97 0.00||4.65 0.00||4.78 0.00||0.94 0.00|
|MAPSMM||42.19 0.84||25.92 0.00||40.56 0.00||5.89 0.00||5.66 0.00||0.89 0.00|
|SFIM||0.21 0.03||27.05 0.00||35.19 0.00||5.21 0.00||4.21 0.00||0.93 0.00|
|Lanaras’s method||4.72 4.72||29.50 0.35||26.54 0.69||4.26 0.23||4.57 0.21||0.95 0.00|
|Method||Runtime (seconds)||PSNR (dB)||RMSE||ERGAS||SAM||UIQI|
|Dataset - HYDICE Washington DC Mall -|
|MR--NMF||57.59 0.32||26.77 0.25||202.02 3.59||18.21 0.13||3.38 0.11||0.90 0.01|
|MR--NMF||60.04 0.39||26.37 0.32||194.40 6.38||18.07 0.23||3.05 0.18||0.87 0.01|
|MR--NMF||57.95 0.24||26.29 0.20||188.42 11.18||18.50 0.25||2.83 0.28||0.86 0.01|
|MR--NMF||60.38 0.20||25.68 0.28||201.62 14.05||19.46 0.41||3.06 0.30||0.83 0.01|
|GSA||0.79 0.04||23.00 0.00||235.64 0.00||32.25 0.00||4.20 0.00||0.74 0.00|
|CNMF||7.25 1.26||27.60 0.09||192.67 6.50||17.37 0.10||2.55 0.14||0.89 0.00|
|HySure||34.14 0.94||24.01 0.00||351.13 0.00||33.51 0.00||6.15 0.00||0.75 0.00|
|FUMI||0.42 0.02||24.67 0.00||243.06 0.00||19.73 0.00||4.04 0.00||0.80 0.00|
|GLP||6.42 0.24||19.85 0.00||423.89 0.00||33.64 0.00||5.28 0.00||0.67 0.00|
|MAPSMM||40.91 0.46||19.34 0.00||494.39 0.00||32.18 0.00||5.91 0.00||0.65 0.00|
|SFIM||0.24 0.01||18.08 0.00||892.35 0.00||42.23 0.00||5.45 0.00||0.64 0.00|
|Lanaras’s method||3.11 1.94||25.95 0.06||235.62 2.67||17.36 0.02||2.78 0.03||0.90 0.00|
|Dataset - HYDICE Washington DC Mall - No added noise|
|MR--NMF||58.55 1.50||32.61 0.28||128.50 5.87||5.54 0.13||2.59 0.12||0.97 0.00|
|MR--NMF||60.95 1.58||35.36 0.38||104.11 5.89||2.41 0.22||1.89 0.12||0.98 0.00|
|MR--NMF||59.01 2.02||37.80 0.75||89.20 5.43||1.76 0.27||1.47 0.07||0.99 0.00|
|MR--NMF||61.21 1.05||38.27 0.83||90.88 6.26||1.55 0.20||1.48 0.10||0.99 0.00|
|GSA||0.81 0.08||29.93 0.00||262.27 0.00||3.11 0.00||3.84 0.00||0.97 0.00|
|CNMF||7.90 2.67||31.46 1.07||152.95 14.25||5.93 8.92||2.01 0.49||0.96 0.03|
|HySure||35.85 2.19||31.23 0.00||190.57 0.10||3.21 0.00||3.21 0.00||0.96 0.00|
|FUMI||0.43 0.03||36.52 0.00||142.92 0.00||2.32 0.00||1.76 0.00||0.98 0.00|