Hyperspectral (HS) imaging devices are non-conventional sensors that sample the observed electromagnetic spectra at hundreds of contiguous wavelength intervals. This spectral resolution makes HS imaging an important tool for identification of the materials present in a scene, what has attracted interest in the past two decades. Hyperspectral images (HSIs) are now at the core of a vast and increasing number of remote sensing applications such as land use analysis, mineral detection, environment monitoring, and field surveillance [1, 2].
However, the high spectral resolution of HSIs does not come without a compromise. Since the radiated light observed at the sensor must be divided into a large number of spectral bands, the size of each HSI pixel must be large enough to attain a minimum signal to noise ratio. This leads to images with low spatial resolution . Multispectral (MS) sensors, on the other hand, provide images with much higher spatial resolution, albeit with a small number of spectral bands.
One approach to obtain images with high spatial and spectral resolution consists in combining HS and MS images (MSI) of the same scene, resulting in the so-called HS-MS image fusion problem . Several algorithms have been proposed to solve this problem. Early approaches were based on component substitution or on multiresolution analysis. In the former, a component of the HS image in some feature space is replaced by a corresponding component of the MS image . In the latter, high-frequency spatial details obtained from the MS image are injected into the HS image [6, 7]. These techniques generalize Pansharpening algorithms, which combine an HSI with a single complementary monochromatic image .
More recently, subspace-based formulations have become the leading approach to solve this problem, exploring the natural low-dimensional representation of HSIs as a linear combination of a small set of basis vectors or spectral signatures[4, 9, 10, 11]
. Many approaches have been proposed to perform image fusion under this formulation. For instance, Bayesian formulations have been proposed to solve this problem as a maximum a posteriori estimation problem, as the solution to a Sylvester equation , or by considering different forms of sparse representations on learned dictionaries [14, 15]. Other works propose a matrix factorization formulation, using for instance sparse [16, 17] or spatial regularizations , or processing local image patches individually . Other approaches use unsupervised formulations which jointly estimate the set of basis vectors and their coefficients [20, 21]
. More recently, tensor factorization methods have also been proposed to solve the image fusion problem by exploring the natural representation of HSIs as 3-dimensional tensors[22, 23].
Although platforms carrying both HS and MS imaging sensors are becoming more common in recent years, their number is still considerably limited [24, 25]. However, the increasing number of optical satellites orbiting the Earth (e.g. Sentinel, Orbview, Landsat and Quickbird missions) provides a large amount of MS data, which can be exploited to perform image fusion using HS images acquired on board of different missions . This scenario provides the greatest potential for the use of HS-MS fusion in practical applications. Furthermore, the short revisit cycles of these satellites provide MS images at short subsequent time instants. These MS images can be used to generate HS image sequences with a high temporal resolution (what has been already done with Landsat and MODIS data [26, 27, 28, 29, 30]).
Though combining images from different sensors provides a great opportunity for applying image fusion algorithms, it introduces a challenge that has been largely neglected. Since the HS and MS images are observed at different time instants, their acquisition conditions (e.g. atmospheric and illumination) can be significantly different from one another. Furthermore, seasonal changes can make the spectral signature of the materials in the HS image change significantly from those in the corresponding MSI. These factors have been ignored in image fusion algorithms proposed to date, and can have a significant impact on the solution of practical problems.
In this paper, we propose a new image fusion method accounting for spectral variability between the images, which can be caused by both acquisition (e.g. atmospheric, illumination) or seasonal changes. Differently from existing approaches, we allow the high-resolution images underlying the observed HS and MS images to be different from one another. Employing a subspace/unmixing-based formulation, we use a unique set of endmembers for each image, employing a parametric model to represent the variability of the spectral signatures. The proposed algorithm, which is called HS-MS image Fusion with spectral Variability (FuVar), estimates the subspace components (endmembers and abundance maps) of the high-resolution images using an alternating optimization approach, making use of the Alternating Direction Method of Multipliers to solve each subproblem. Simulation results with synthetic and real data show that the proposed approach performs similarly or better than state-of-the-art methods for images acquired under the same conditions, and much better when there is spectral variability between them.
The paper is organized as follows. In Section II, the HS and MS observation process is presented, and a new model to represent spectral variability between these images is proposed. The proposed FuVar algorithm is introduced in Section III. Experimental results with synthetic and real remote sensing images are presented in Section IV. Finally, Section V concludes the paper.
Ii Spectral Variability in Image Fusion
Let be an observed low spatial resolution HSI with bands and pixels, and an observed low spectral resolution MSI with bands and pixels, with and . The image fusion problem consists of estimating an underlying image with high spatial and spectral resolutions, given and .
A common approach to solve this problem is based on spectral unmixing using the linear mixture model (LMM). The LMM assumes that the spectral representation of an observed image can be decomposed as a convex combination of a small number of spectral signatures (called endmembers) of pure materials in the scene . Let
be the LMM for the underlying image . In (1), the columns of are the spectral signatures of the endmembers, and each column of is the vector of fractional abundances for the corresponding pixel of .
Unmixing-based formulations successfully exploit the reduced dimensionality of the problem since only matrices and (which are much smaller than ) need to be estimated. Furthermore, a good estimate of for the fusion problem can be obtained directly from due to its high spectral resolution . Other subspace estimation techniques, such as SVD or PCA, can also be employed to represent . However, the LMM is usually preferred for its connection with the physical mixing process and for the better performance empirically verified in practice [4, 18].
The HS-MS image fusion literature assumes that all observed images are acquired at exactly the same conditions. This consideration allows both the HS and the MS images to be directly related to a unique underlying image . This, however, is a major limitation of existing methodologies since a vast amount of data available for processing may come from sensors on board of different instruments or missions (e.g. AVIRIS and Sentinel images). In such scenarios, the acquisition conditions can be very different for HS and MS images, especially when there is a significant time interval between their capture. Such situation is commonly encountered when data provided by several different MS sensors is used to generate high temporal resolution sequences. The use of different sensors introduces variability in the spectral signatures of the observed images due to, for instance, varying illumination, different atmospheric conditions, or seasonal spectral variability of the materials in the scene (e.g. in the case of vegetation analysis) [31, 32]. These effects cannot be adequately modeled by unmixing or subspace formulations that assume a single underlying image for both the HS and MS images. These methods assume that the underlying materials have the same spectral signatures in both images, and thus can be decomposed using the same subspace components [20, 18, 33]. In the following, we propose a new model that accounts for the effects of spectral variability in HS-MS image fusion.
Ii-a The proposed model
Consider distinct high-resolution images and , both with bands and pixels, corresponding to the observed HSI and MSI, respectively. These high resolution (HR) images can be different due to seasonal variability effects. We then decompose and as
where is the fractional abundance matrix, assumed common to both images, and , are the endmember matrices of the HSI and MSI, respectively. and can be different to account for spectral variability. For simplicity we have assumed the number of endmembers to be the same for both the hyperspectral and multispectral images. However, the materials represented in and can be completely different from one another.
Using (2), we assume that the observed HSI and MSI are generated according to
where accounts for optical blurring due to the sensor point spread function, is a spatial downsampling matrix, is a matrix containing the spectral response functions (SRF) of each band of the MS instrument. and represent additive noise.
Matrix can be estimated from the observed HSI using endmember extraction or subspace decomposition. However, the same is not true for the estimation of since the MSI has low spectral resolution. To address this issue, we propose to write as a function of using a specific model for spectral variability.
Different parametric models have been recently proposed to account for spectral variability in hyperspectral unmixing. The extended LMM proposed in  allows the scaling of each endmember spectral signature by a constant. This model effectively represents illumination and topographic variations, but is limited in capturing more complex variability effects. The perturbed LMM proposed in  adds an arbitrary variation matrix to a reference endmember matrix. This model lacks a clear connection to the underlying physical phenomena. The generalized LMM (GLMM), recently proposed in , accounts for more complex spectral variability effects by considering an individual scaling factor for each spectral band of the endmembers. This model introduces a connection between the amount of spectral variability and the amplitude of the reference spectral signatures, agreeing with practical observations.
Considering the GLMM, we propose to model the multispectral endmember matrix as a function of the endmembers extracted from the HSI as
where is a matrix of positive scaling factors and denotes the Hadamard product. This model can represent spectral variability caused by seasonal changes [31, 32], global illumination [34, 37] or global atmospheric variations [38, 39]. Then, the image fusion problem can finally be formulated as the problem of recovering the matrices , and from the observed HS and MS images and .
Iii The image fusion problem
Considering the observation model in (3) and assuming that matrices , , and are known or estimated previously, the image fusion problem reduces to estimating the variability and abundance matrices and . Therefore, we propose to solve the image fusion problem through the following optimization problem:
where parameters , and balance the contributions of the different regularization terms to the cost function. The last two terms are regularizations over . The first of them controls the amount of spectral variability by constraining to be close to unity. The second enforces spectral smoothness by making a differential operator. is a regularization over to enforce spatial smoothness, and is given by
where the linear operators and compute the first-order horizontal and vertical gradients of a bidimensional signal, acting separately for each material of . The spatial regularization in the abundances is promoted by a mixed norm of their gradient, where . This norm is used to promote sparsity of the gradient across different materials (i.e. to force neighboring pixels to be homogeneous in all constituent endmembers). The norm can also be used, leading to the Total Variation regularization, where .
Although the problem defined in (III) is non-convex with respect to both and , it is convex with respect to each of the variables individually. Thus, we propose to use an alternating least squares strategy by minimizing the cost function iteratively with respect to each variable in order to find a local stationary point of (III). This solution is presented in Algorithm 1, and the individual steps are detailed in the following subsections.
Iii-a Optimizing w.r.t.
where is the indicator function of (i.e. if and if ) acting component-wise on its input, and enforces the abundance positivity constraint.
Since problem (III-A) is not separable w.r.t. the pixels neither differentiable due to the presence of the norm we resort to a distributed optimization strategy, namely the alternating direction method of multipliers (ADMM). Specifically, the cost function in (III-A) must be represented in the following form 
where and are closed, proper and convex functions, and , , , , .
with the scaled dual variable, and . The ADMM consists of updating the variables , and sequentially. Denoting the primal and dual variables at the -th iteration of the algorithm by , and , the ADMM can be expressed as
The optimization problem (III-A) is then rewritten as
where is a permutation matrix that performs the transposition of the vectorized HSI, that is, , and
where are block diagonal matrices with repeats of matrix in the main diagonal, that is,
Iii-B Optimizing w.r.t.
The optimization problem with respect to and considering fixed can be recast from (III) by considering only terms that depend on :
Then, the optimization problem (III-B) can be expressed as
which is equivalent to the ADMM problem.
The scaled augmented Lagrangian for this problem can be written as
The optimization of with respect to each of the variables , and is described in detail in Appendix B.
Iv Experimental Results
Three experiments were devised to illustrate the performance of the proposed method. The first one uses synthetically generated images with controlled spectral variability in order to assess the estimation performance. The second compares the performances of the proposed method with those of other techniques when there is no spectral variability between the images (i.e. acquired at the same time instant). Finally, the third example depicts the fusion of real HS and MS images acquired at different time instants, thus containing spectral variability. Details of the three examples and their experimental setups are described next.
Iv-a Experimental Setup
We compare the proposed method to three other techniques, namely: the unmixing-based HySure  and the CNMF  algorithms, and the multiresolution analysis-based GLP-HS algorithm . These methods provided the best overall fusion performance in extensive experiments reported in survey .
For experiments 2 and 3, which are based on real hyperspectral and multispectral images acquired at the same spatial resolution, two preprocessing steps were performed based on the same procedure described in 
. First, spectral bands with low SNR or corresponding to water absorption spectra were manually removed. Next, all bands of the hyperspectral and multispectral images were normalized such that the 0.999 intensity quantile corresponded to a value of 1. Finally, the HSI was denoised using the method described in to yield high-SNR (hyperspectral) reference images .
For all experiments, the observed HSIs were generated by blurring the reference HSI image with a Gaussian filter with unity variance, decimating it and adding noise to obtain a 30dB SNR. The observed MSIs were obtained from the reference MSI image by adding noise to obtain a 40dB SNR.
We fixed the number of endmembers at for the CNMF, HySure and FuVar methods according to . The blurring kernel was assumed to be known a priori for the HySure and FuVar methods, and was estimated from the HSI using the VCA algorithm . The regularization parameters for the proposed method were empirically set at , and . We selected a large value for to have spectrally smooth variability, and a small value of to allow the scaling factors to adequately fit the data. Nevertheless, the proposed method showed to be fairly insensitive to the choice of parameters. We ran the alternating optimization process in Algorithm 1 for at most 10 iterations or until the relative change of and was less than .
The visual assessment of the reconstructed images is performed using one true-color image at the visible spectrum (with red, green and blue corresponding to , and ) and one pseudo-color image at the infrared spectrum (with red, green and blue corresponding to , and ).
Iv-B Quality measures:
We use four quality metrics previously considered in [4, 18] to evaluate the quality of the reconstructed images, denoted by , by comparing them to the original HR images . The first one is the peak signal to noise ratio (PSNR), computed bandwise and defined as
where is the expected value operator and denotes the -th band of . Larger PSNR values indicate higher quality of spatial reconstruction, tending to infinity when the bandwise reconstruction error tends to zero.
The second metric is the Spectral Angle Mapper (SAM):
The third metric is the Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) , which provides a global statistical measure of the quality of the fused data with the best value at 0:
where computes the sample mean value of a signal.
Finally, the fourth metric is the Universal Image Quality Index (UIQI) , extended to multiband images as
where computes the sample variance of a signal, computes the sample covariance between and , and is a projection matrix which extracts the -th patch with pixels from the images and , with being the total number of image patches. The UIQI is restricted to the interval , corresponding to a perfect reconstruction when equal to one.
Iv-C Example 1
This example was designed to evaluate the algorithms in a controlled environment. The true abundance matrix , with pixels and containing spatial correlation was generated using a Gaussian Random Fields algorithm. Three endmembers with bands were selected from the USGS spectral library and used to construct matrix . The MS endmember matrix was obtained by multiplying with a matrix of scaling factors to introduce spectral variability. Matrix was constructed by random piecewise linear functions centered around a unitary gain. The high-resolution HS and MS images and
were generated from this data using the linear mixing model. The MS imagewith bands was then generated by applying a piecewise uniform SRF to . The low-resolution HS image was generated by blurring with a Gaussian filter with unitary variance and decimating it by a factor of . Finally, white Gaussian noise was added to both images, resulting in SNRs of 30dB for and 40dB for . Both the blurring kernel and the SRF were assumed to be known a priori for all algorithms.
The quantitative results for the quality of the reconstructed images with respect to both and are shown in Table I. Figs. 1 and 2 provide a visual assessment of the results for the super resolved HS and MS images and .
|Hyperspectral HR image||Multispectral HR image|
The results presented in Table I clearly show the performance gains obtained in all metrics using the proposed FuVar method. In terms of PSNR, the FuVar algorithm yielded gains larger than 5dB and 9dB for and , respectively. Substantial performance improvements were also obtained for all other metrics, indicating that FuVar yields more accurate spatial and spectral results, resulting in better image reconstruction. Visual inspection of Figs. 1 and 2 shows that all methods yield reasonable results for both the visible and infrared regions of the spectrum. However, the most accurate reconstruction is clearly provided by FuVar, which keeps a good trade-off between the smooth parts and sharp discontinuities in the image without introducing spatial artifacts or color distortions. Such defects can be seen in the results from all other methods such as the CNMF and, especially, the HySure algorithm.
Iv-D Example 2
This example compares the performance of the different methods when there is no spectral variability between the HS and MS images. We consider a dataset consisting of HS and MS images acquired above Paris with a spatial resolution of 30m. These images were initially presented in  and captured by the Hyperion and the Advanced Land Imager instruments on board the Earth Observing-1 Mission satellite.
The reference HSI, with pixels and bands was blurred by a Gaussian filter with unity variance and decimated by a factor of . Finally, WGN was added to generate the observed HSI with a 30dB SNR. The observed MSI with bands was obtained from the reference MSI by adding WGN to achieve a 40dB SNR. The spectral response function was estimated from the observed images using the method described in .
The quantitative results for the quality of the reconstructed images with respect to are displayed in Table II. A visual assessment of the results for the super-resolved HS image is presented in Fig. 3.
Although the proposed method estimates the matrix to cope with the spectral variability often existing between the HS and MS images, its performance was still comparable with state of the art image fusion methods when there was no spectral variability. In fact, the results in Table II show that FuVar yielded slightly better than the competing methods for all metrics but the SAM, for which it yielded the second best and very close to the best result. The slight improvement obtained by the FuVar algorithm can be attributed to the fact that the matrix , originally devised to capture spectral variability, can also compensate for errors introduced by the estimation of the spectral response function . Visually, the images displayed in Fig. 3 also indicate that the results obtained with all methods are very similar. This reconstruction similarity is in agreement with the quantitative results presented in Table II.
Iv-E Example 3
This example illustrates the performance of the proposed method when fusing HS and MS images acquired at different time instants, thus presenting seasonal spectral variability. We consider four pairs of reference hyperspectral and multispectral images with a spatial resolution of 20m. The HS images were acquired by the AVIRIS instrument and contained bands after pre-processing, while the MS images containing bands was acquired by the Sentinel-2A instrument.
We divided this experiment in two parts using four different pairs of images111All HS and MS images used in this experiment are available at https://aviris.jpl.nasa.gov/alt_locator/ and https://earthexplorer.usgs.gov.. The first two pairs of images were acquired with a relatively small time difference between the HS and MS images (smaller than three months), and illustrate the performance of the methods for small spectral variability. The other two pairs of images were acquired with a larger time difference (more than one year), and illustrate a scenario with considerable spectral variability and seasonal changes.
In all cases, the observed image pairs and were generated as follows. The reference hyperspectral images were blurred by a Gaussian filter with unity variance and decimated by a factor of . Finally, WGN was added to generate the observed HSIs with a 30dB SNR. The observed MS images was obtained by adding WGN to the reference MSI to obtain a 40dB SNR. The spectral response function obtained from calibration measurements and was thus known a priori222Available at https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi/document-library/-/asset_publisher/Wk0TKajiISaR/content/sentinel-2a-spectral-responses.
Iv-E1 HS and MS images with a small acquisition time difference
The first pair of images, with pixels, was acquired over the Lake Isabella area and can be seen in Fig. 4. The HS and MS images were acquired on 2018-06-27 and on 2018-08-27, respectively. The second pair of images, with pixels, was acquired near Lockwood and is also shown in Fig. 4. The HS and MS images were acquired on 2018-08-20 and on 2018-10-19, respectively. Although the HS and MS images are similar, there are slight differences in the color hues of the images. This is most clearly seen in the right part of the Lake Isabella image, and in the central ground stripe of the Lockwood image.
|Lake Isabella image||Lockwood image|
The results in Table III show a superior performance of the FuVar algorithm when compared to those of the other algorithms. Although the SAM results for both images and the ERGAS results for the Lockwood image were similar between the FuVar and CNMF algorithms, the FuVar shows a considerable performance improvement in the remaining metrics for both data sets. Visual inspection of the recovered super-resolved HSIs in Figs. 6 and 7 shows an accuracy improvement obtained by using FuVar over the competing algorithms for both the visible and infrared portions of the spectrum. Although the HySure algorithm resulted in coherent images without large color differences from the reference images in Fig. 6, slight differences can be found throughout the whole scene. The GLP-HS and CNMF showed even clearer color aberrations in the visible spectrum in Fig. 7. The FuVar algorithm results, on the other hand, are very close to the reference images for both examples.
Iv-E2 HS and MS images with a large acquisition time difference
The third pair of images, with pixels, was acquired over the Lake Tahoe area and can be seen in Fig. 5. The HS and MS images were acquired at 2014-10-04 and at 2017-10-24, respectively. The fourth pair of images, with pixels, was acquired over the Ivanpah Playa and are also shown in Fig. 5. The HS and MS images were acquired at 2015-10-26 and at 2017-12-17, respectively. A significant spectral variability between the images can be readily verified in both cases. For the Lake Tahoe image, it is evidenced by the different color hues of the ground, and by the appearence of the crop circles, which are younger and have a lighter color in the MSI. In the Ivanpah Playa image, a significant overall difference is observed between the sand colors in both images.
|Lake Tahoe image||Ivanpah Playa image|
The results in Table IV show a clear advantage of the FuVar algorithm over the competing methods, with significant performance gains in all metrics for both datasets. Visual inspection of the recovered super resolved HSIs in Figs. 8 and 9 reveals a clear accuracy improvement obtained by using FuVar over the competing algorithms for both the visible and infrared portions of the spectrum. Although the GLP-HS algorithm yielded sharper contours around the circular plantations in Fig. 8 and the road in Fig. 9, it clearly introduced many visual artifacts that do not exist in the reference images. Furthermore, the colors in the GLP-HS results are significantly different from those in the reference image. The FuVar algorithm results show a good compromise between the sharp discontinuities and the smoothness of the more continuous parts of the image.
Iv-F Execution Times
The execution times for all tested algorithms are shown in Table V. The higher FuVar execution times are expected since the other fusion methods do not account for spectral variability between the HS and MS images, thus leading to much simpler optimization problems.
Iv-G Parameter Sensitivity
In this section we explore the sensitivity of the proposed method to the choice of values for parameters , , . For the experimental setup of Example 1, we varied each parameter individually, while keeping the remaining ones fixed at the values described in Section IV-A, and computed the PSNR of the HSI reconstructed by the FuVar algorithm. The results are shown in Fig. 10 as a function of the ratio , where is the empirically selected value of the corresponding parameter. The PSNRs obtained using competing methods are also shown for reference. It is clear that even variations of parameter values by various orders of magnitude had only a relatively small effect on the resulting PSNR, which remains significantly higher than that of the remaining methods. This indicates that the performance of the proposed method is not overly sensitive to the choice of the regularization parameters.
We presented a novel hyperspectral and multispectral image fusion strategy accounting for spectral mismatches between the two images, which are often related to illumination and/or seasonal changes. The proposed strategy combines the image fusion paradigm with a parametric model for material spectra, resulting in a new algorithm named FuVar. Simulations with synthetic and real data illustrated the performance improvement obtained by incorporating spectral variability in the fusion problem. The proposed strategy was shown to be quite insensitive to the choice of hyperparameters, and to be competitive with state of the art algorithms even when spectral variability is not present. This allows for broad application of the proposed method.
Appendix A Optimization of the ADMM for the subproblem
In this section, we describe the optimization of the scaled augmented Lagrangian in (III-A) with respect to each one of the variables in .
A-a Optimizing w.r.t.
The subproblem of minimizing with respect to can be extracted from (III-A) by selecting the terms that depend on :
Alternatively, since is the vectorization of the abundance matrix , we can re-write this problem for each pixel as
Taking the gradients w.r.t. and setting them equal to zero leads to
and the solution is given by
A-B Optimizing w.r.t.
Equivalently to Section A-A, the subproblem for can be written as
Representing the last term of (A-B) in matrix form leads to
where . Problem (A-B) is equivalent to
and can be re-written individually for each pixel as
Thus, taking the derivative of (A-B) for each pixel and setting it equal to zero, we have
A-C Optimizing w.r.t.
Analogously to Section A-A, the subproblem for can be written as
Representing the variables in matrix form, problem (A-C) is equivalent to
and can be recast to each HS band as
Taking the derivative w.r.t. each band and setting it equal to zero gives
for . Since matrices and are sparse, problem (A-C) can be solved efficiently using iterative methods such as the Conjugate Gradient method, where the large matrices involved can be computed implicitly.
A-D Optimizing w.r.t.
This subproblem is equivalent to
whose solution w.r.t. leads to
The dimensionality of the matrices and in this problem makes the direct inversion of the terms involving them in (A-D) impractical. However, assuming these differential operators are block circulant, and following the same approach as in  and , these operations can be computed efficiently. This is done by performing the computations with the differential operators and their adjoint in the Fourier domain, independently for each endmember. Denoting the convolution masks associated with and by and , respectively (see details in 
), assuming circular boundary conditions and using properties of the Fourier transform, the solution in (A-D) can be computed equivalently as