I. Introduction
Owing to the dense sampling in the spectral domain, hyperspectral (HS) images can provide more accurate and faithful measurements of real-world scenes/objects than traditional RGB images. Such rich spectral information is beneficial to various vision-based applications, such as tracking [60], segmentation [42], and detection [38, 46]. However, the acquisition of HS images is costly, which severely limits the wide deployment of HS image-based applications.
Instead of relying on the development of hardware, many computational methods, such as compressive sensing-based reconstruction [51, 53, 52, 12, 72, 21, 40], HS and RGB image fusion [45, 11, 64, 35, 62, 58], single RGB image-based reconstruction [15, 5, 44, 61], and spatial super-resolution [32, 23, 33, 31, 68], have been proposed to acquire HS images in an affordable and convenient manner. Particularly, reconstructing HS images from single RGB images, which does not require specially-designed acquisition hardware, is a promising direction. Owing to their strong ability to learn representations, deep neural network (DNN)-based methods have recently been proposed to address this challenging task [47, 70]. For example, Zhang et al. [70] proposed a pixel-aware deep learning framework for spectral upsampling. Li et al. [29, 30] introduced spectral and spatial attention mechanisms into the reconstruction process. See Sec. II for more details. However, most existing DNN-based spectral reconstruction methods adopt general-purpose architectures and neglect the unique characteristics of this task, e.g., the specific relationship between HS and RGB images, which may compromise their performance. Second, the majority of them, trained with RGB images acquired via a typical spectral response function (SRF), cannot handle RGB images generated via a different SRF during inference, which limits their practical use to some extent. In addition, existing DNN-based methods are usually trained with pixel-wise loss functions, which fail to capture the global structure of HS images, i.e., the relationship among spectral bands.
In this paper, we propose a novel DNN-based framework, highlighted by its compact, efficient, interpretable, and effective characteristics, for the reconstruction of HS images from single RGB images in an end-to-end fashion. Specifically, based on the specific relationship between RGB and HS images, we first explicitly formulate the problem as an amended gradient descent (AGD) process, which boils down to determining an initialization, a basic gradient, and an incremental gradient. Then, we propose AGD-Net with a multi-stage structure to mimic the AGD process: with the initialization learned, the basic and incremental gradients are adaptively and progressively learned at each stage by embedding the spatial-spectral information of input RGB images via memory- and computation-efficient convolution and a novel spectral zero-mean normalization. To exploit the global structure of HS images, we also propose a novel rank loss, which is optimized via a singular value weighting strategy during training. Thanks to the interpretable architecture, we extend AGD-Net so that a single network, after one-time training, can handle input RGB images generated with different SRFs. Extensive experimental results demonstrate the significant superiority of AGD-Net over state-of-the-art methods, i.e., AGD-Net reconstructs HS images with much higher quality at lower memory and computational costs.
The rest of this paper is organized as follows. Sec. II briefly reviews existing methods for HS image reconstruction. Sec. III formulates the problem. Sec. IV presents the proposed framework, followed by extensive experimental results as well as analyses in Sec. V. Finally, Sec. VI concludes this paper.
II. Related Work
In the following, we briefly review the existing works on the reconstruction of HS images from single RGB images.
II-A. Traditional Methods
Many traditional methods assume that HS images lie in a low-dimensional subspace and explore the mapping between RGB images and subspace coordinates. For example, Nguyen et al. [43] leveraged RGB white-balancing to normalize the scene illumination and recover the scene reflectance. Arad et al. [4] proposed a sparse coding-based method, which learns an over-complete dictionary of HS images to describe novel RGB images. Aeschbacher et al. [1] further improved it through a shallow A+-based method [49]. Jia et al. [24] exploited the 3-D embedded space where natural scene spectra reside and learned an accurate nonlinear mapping from RGB images to 3-D embeddings. Heikkinen et al. [22] estimated the spectral subspace coordinates via scalar-valued Gaussian process regression with anisotropic or combination kernels. Gao et al. [16] proposed a joint sparse and low-rank dictionary learning method for the reconstruction of HS images from single RGB images.
II-B. DNN-based Methods
On the basis of the impressive representation ability of DNNs, many DNN-based methods have been proposed to reconstruct HS images from single RGB images. For example, Xiong et al. [61] proposed a DNN-based method, namely HSCNN, for the reconstruction of HS images from RGB images or measurements obtained via compressive sensing, which mainly aims to enhance the spectral signatures constructed by simple interpolation or CS reconstruction. Shi et al. [47] further improved HSCNN by replacing all predefined upsampling operators with residual blocks and introducing dense connections with a cross-scale fusion scheme to facilitate the feature extraction process. Gewali et al. [17] utilized DNNs to optimize multispectral bands and hyperspectral recovery simultaneously to achieve more accurate HS image reconstruction. Fu et al. [14] modeled HS image reconstruction by exploring non-negative structured information and utilized multiple sparse dictionaries to learn a more compact basis representation. Berk et al. [26] trained multiple models to reconstruct HS images from RGB images captured with different SRFs, together with an additional model to select among them in real-world applications. Li et al. [29] proposed an attention-based method which utilizes both channel attention and spatial non-local attention. Based on the assumption that pixels in an HS image belong to different categories or spatial positions and often require distinct mapping functions, Zhang et al. [70] proposed a pixel-aware deep function-mixture network, which learns different basis functions and then linearly mixes them according to pixel-level weights. Aitor et al. [2] treated HS image reconstruction as an image-to-image mapping problem and applied a generative adversarial network to capture spatial semantics. Yan et al. [63] introduced prior category information to generate distinct spectral data of objects via a U-Net-based architecture. Zhao et al. [71] presented a hierarchical regression network with a pixel shuffle layer. Fu et al. [13] developed an SRF selection layer to retrieve the optimal response function for HS image reconstruction. Peng et al. [44] introduced a pixel-wise attention module to boost reconstruction performance. Galliani et al. [15] utilized a densely connected U-Net-based architecture for HS image reconstruction. However, the performance of the above-mentioned methods is still limited due to insufficient modeling of the problem. Besides, although these methods attempt to build reconstruction processes with physical meaning, their general-purpose architectures seriously restrict their interpretability.

II-C. Algorithm Unrolling-based Methods
As our deep learning-based framework is driven by model-based optimization, we also briefly review related works in this stream. Since Gregor and LeCun [19] developed the sparse coding-based algorithm unrolling technique, a number of unrolled iterative algorithms with DNNs have been proposed for various image reconstruction tasks, such as single RGB image super-resolution [69, 8], compressive sensing [48, 37], and image fusion [59]. Generally, this kind of method solves inverse problems by unfolding optimization steps and applying DNNs to solve them in a data-driven manner. The main differences among these methods lie in the formulation of the inverse problem as well as the adopted optimization algorithms, which result in various network architectures. For example, Lohit et al. [36] unrolled a projected gradient descent algorithm for HS image pan-sharpening. Wen et al. [58] utilized a deep coupled analysis and synthesis dictionary-based network for HS image super-resolution. Wang et al. [51, 50] unfolded a half quadratic splitting algorithm using DNNs for coded aperture snapshot spectral imaging. We refer readers to [41] for a comprehensive survey of algorithm unrolling.
III. Problem Formulation
Denote by $\mathbf{Y} \in \mathbb{R}^{3 \times HW}$ the vectorial representation of an RGB image of spatial dimensions $H \times W$, and by $\mathbf{X} \in \mathbb{R}^{B \times HW}$ the corresponding HS image with $B$ ($B \gg 3$) spectral bands to be reconstructed. The relationship between $\mathbf{Y}$ and $\mathbf{X}$ can be generally formulated as

$\mathbf{Y} = \mathbf{\Phi}\mathbf{X} + \mathbf{N},$  (1)

where $\mathbf{\Phi} \in \mathbb{R}^{3 \times B}$ is the spectral response function (SRF), and $\mathbf{N}$ is the noise. Under the assumption that the noise is normally distributed, we can recover $\mathbf{X}$ from $\mathbf{Y}$ by optimizing the following problem formulated from Eq. (1):

$\min_{\mathbf{X}} f(\mathbf{X}) := \frac{1}{2}\|\mathbf{Y} - \mathbf{\Phi}\mathbf{X}\|_F^2,$  (2)

where $\|\cdot\|_F$ is the Frobenius norm of a matrix. Moreover, with an initial guess $\mathbf{X}^{(0)}$, we can solve Eq. (2) with the classic gradient descent (GD) algorithm, and at the $t$-th ($t \geq 1$) step, we have

$\mathbf{X}^{(t)} = \mathbf{X}^{(t-1)} - \eta \nabla f(\mathbf{X}^{(t-1)}),$  (3)

where $\eta$ is the step size and $\nabla f(\cdot)$ is the operator computing the derivative of $f(\cdot)$, i.e.,

$\nabla f(\mathbf{X}^{(t-1)}) = \mathbf{\Phi}^\top (\mathbf{\Phi}\mathbf{X}^{(t-1)} - \mathbf{Y}).$  (4)

Unfortunately, it is almost impossible to obtain a feasible solution by means of such a simple optimization process, due to the severely ill-posed nature of the problem in Eq. (2), i.e., there are numerous trivial solutions. In addition, the performance of such a scheme highly depends on the initialization. From the perspective of the gradient space, the reason can be interpreted as follows: during the iteration process, the gradient cannot decrease along an optimal path, or from an appropriate starting point, toward the global minimum or a good local minimum. Therefore, to make the gradient descent process effective, an intuitive thought is to find an appropriate initialization and amend the gradient at each step of the iteration process. That is, instead of Eq. (4), we can generally express the gradient at the $t$-th step as

$\widetilde{\nabla} f(\mathbf{X}^{(t-1)}) = \nabla f(\mathbf{X}^{(t-1)}) + \Delta \mathbf{G}^{(t)},$  (5)

where $\widetilde{\nabla} f(\cdot)$ is the amended gradient, and $\Delta \mathbf{G}^{(t)}$ is the incremental gradient. Accordingly, we obtain the amended gradient descent process as

$\mathbf{X}^{(t)} = \mathbf{X}^{(t-1)} - \eta \widetilde{\nabla} f(\mathbf{X}^{(t-1)}).$  (6)
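As a sanity check, the amended gradient descent process of Eqs. (3)-(6) can be sketched in a few lines of NumPy. The function and variable names below are illustrative; the incremental gradient is left as a pluggable callable, since in AGD-Net it is learned by a sub-network rather than specified in closed form.

```python
import numpy as np

def amended_gradient_descent(Y, Phi, steps=10, eta=0.1, increment=None):
    """Sketch of the AGD process of Eqs. (3)-(6).

    Y:   RGB measurement, shape (3, N) with N = H*W pixels.
    Phi: spectral response function (SRF), shape (3, B).
    increment: optional callable returning the incremental gradient of
               Eq. (5); when absent, this reduces to plain GD.
    """
    X = Phi.T @ Y                # naive start (AGD-Net learns X^(0) instead)
    for _ in range(steps):
        basic = Phi.T @ (Phi @ X - Y)               # Eq. (4): basic gradient
        delta = increment(X) if increment else 0.0  # Eq. (5): incremental part
        X = X - eta * (basic + delta)               # Eq. (6): amended update
    return X
```

With `increment=None`, each step strictly decreases the quadratic objective of Eq. (2) whenever the step size is small enough, which is what the learned amendment and initialization are meant to accelerate.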
IV. Proposed Method
Motivated by the intuitive and explicit formulation in Sec. III, as illustrated in Fig. 1, we propose a novel end-to-end and lightweight DNN-based framework, namely AGD-Net, which mimics the amended gradient descent process to achieve the reconstruction of HS images from single RGB images. To be specific, with the initialization learned, we progressively learn the basic gradient and the incremental gradient via a multi-stage architecture, in which the spatial-spectral information of the input RGB image is effectively and efficiently embedded. Besides, we propose a global structure-aware loss function to train AGD-Net end-to-end. In what follows, we detail each module.
IV-A. Learning Initialization
This module aims to learn an appropriate initialization $\mathbf{X}^{(0)}$ as the starting point of the gradient descent process. We adopt a densely-connected convolutional neural network (CNN) to extract the spatial-spectral information of $\mathbf{Y}$ and regress $\mathbf{X}^{(0)}$. Specifically, to learn feature representations efficiently and effectively, we adopt a series of memory- and computation-efficient spectral-spatial separable convolutions [73], each of which applies two kinds of sequentially connected convolution, namely 1-D spectral convolution and 2-D spatial convolution, with an in-between activation function. The former applies kernels of size 1×1 in the 1-D spectral/channel space to embed spectral information, while the latter applies kernels of size 3×3 in the 2-D spatial space to embed spatial information. Moreover, to emphasize high-frequency spectral information and regularize the intermediate features against over-fitting, we propose spectral zero-mean normalization (SZM-norm), which enforces the vector formed by the features from different channels but at an identical spatial position to have zero mean, i.e.,

$\mathrm{SZM}(f_{c,i}) = f_{c,i} - \frac{1}{C}\sum_{c'=1}^{C} f_{c',i},$  (7)

where $\mathrm{SZM}(\cdot)$ denotes SZM-norm, $f_{c,i}$ is the $i$-th element of the feature map of the $c$-th channel, and $C$ is the number of channels. We experimentally validate the effectiveness of this initialization module and the SZM-norm in Table V.
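The SZM-norm of Eq. (7) is simply a per-pixel mean subtraction across channels, and the 1-D spectral convolution is a per-pixel channel mixing. A minimal NumPy sketch (the function names are ours, not taken from any released code):

```python
import numpy as np

def szm_norm(feat):
    """Spectral zero-mean normalization (Eq. (7)).

    feat: feature map of shape (C, H, W). At every spatial position, the
    vector formed across the C channels is shifted to have zero mean.
    """
    return feat - feat.mean(axis=0, keepdims=True)

def spectral_conv_1x1(feat, weight):
    """1-D spectral (1x1) convolution: mixes channels independently per pixel.

    weight: (C_out, C_in) matrix, i.e., a 1x1 convolution kernel.
    """
    return np.einsum('oc,chw->ohw', weight, feat)
```

After `szm_norm`, the channel-wise mean at every spatial position is exactly zero, which removes the flat (low-frequency) spectral component and keeps the band-to-band variation.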
IV-B. Learning the Amended Gradient
In this module, we aim to learn an amended gradient, which is the sum of a basic gradient and an incremental gradient.
IV-B1. Basic Gradient
As formulated in Eq. (4), the SRF $\mathbf{\Phi}$ and its transpose $\mathbf{\Phi}^\top$ actually act as pixel-wise linear projections. We thus simulate $\mathbf{\Phi}$ with a convolutional layer $\mathcal{C}_{\theta}(\cdot)$ and $\mathbf{\Phi}^\top$ with a corresponding deconvolutional layer $\mathcal{D}_{\vartheta}(\cdot)$ for the back projection, where $\theta$ and $\vartheta$ are the sets of parameters to be learned. Accordingly, the scaled basic gradient (i.e., the product of the step size $\eta$ and the basic gradient) is derived as

$\eta \nabla f(\mathbf{X}^{(t-1)}) = \mathcal{D}_{\vartheta}(\mathbf{E}^{(t)}), \quad \mathbf{E}^{(t)} = \mathcal{C}_{\theta}(\mathbf{X}^{(t-1)}) - \mathbf{Y},$  (8)

where $\mathbf{X}^{(t-1)}$ is the intermediate HS image reconstructed at the $(t-1)$-th stage and $\mathbf{E}^{(t)}$ is the corresponding projection error. Note that these two convolutional layers are not followed by an activation function, in order to preserve the linearity of these transformations.

In addition, considering that the linear projection layers in all stages serve the same purpose, i.e., adaptively learning the SRF, and that we only explicitly supervise $\mathcal{C}_{\theta}(\cdot)$ during training, we share the parameters of these layers across all stages to guarantee that the error $\mathbf{E}^{(t)}$ is correctly calculated at every stage. We experimentally validate the effectiveness of this weight sharing strategy in Table V.
IV-B2. Incremental Gradient
Considering that both the basic gradient and the incremental gradient are distributed in the gradient space, we directly learn the incremental gradient from the projection error $\mathbf{E}^{(t)}$ between the projected intermediate reconstruction and the input RGB image (cf. Eq. (8)) by using a sub-network $\mathcal{G}_{\theta_t}(\cdot)$, i.e.,

$\Delta \mathbf{G}^{(t)} = \mathcal{G}_{\theta_t}(\mathbf{E}^{(t)}),$  (9)

where $\theta_t$ is the set of parameters at the $t$-th stage to be learned. For simplicity, we adopt the same network architecture as that in Sec. IV-A, but with different parameters, to realize $\mathcal{G}_{\theta_t}(\cdot)$, whose architecture details are summarized in Table I.

According to Eq. (5), we can derive the scaled amended gradient at the $t$-th stage as

$\eta \widetilde{\nabla} f(\mathbf{X}^{(t-1)}) = \mathcal{D}_{\vartheta}(\mathbf{E}^{(t)}) + \mathcal{G}_{\theta_t}(\mathbf{E}^{(t)}),$  (10)

where $\mathcal{D}_{\vartheta}(\cdot)$ is the back-projection layer of Eq. (8). It can be seen that Eq. (10) has the same form as residual learning [20], and thus the advantages of residual learning are inherited. Note that we remove all the biases of the convolutional layers in $\mathcal{G}_{\theta_t}(\cdot)$. The reason is that the error $\mathbf{E}^{(t)}$ also measures the difference between the reconstructed and ground-truth HS images, and when it reaches zero, the optimization process has found an appropriate reconstructed HS image with respect to Eq. (1). At that point, the updating of the HS image should terminate, requiring the amended gradient to be zero, which is equivalent to requiring that the sub-network pass through the origin:

$\mathcal{G}_{\theta_t}(\mathbf{0}) = \mathbf{0},$  (11)

where $\mathbf{0}$ is a matrix with all elements equal to zero.
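Under these assumptions, one stage of the update can be sketched with plain matrices. Here the shared projection and back-projection layers are stood in for by an SRF matrix and its transpose, and the incremental-gradient sub-network is a toy bias-free two-layer map, so the pass-through-origin property of Eq. (11) holds by construction; all names are illustrative.

```python
import numpy as np

def agd_stage(X_prev, Y, Phi, W1, W2, eta=1.0):
    """One stage of the amended gradient descent update (Eqs. (8)-(10)).

    X_prev: intermediate HS image, shape (B, N); Y: RGB input, shape (3, N).
    Phi stands in for the shared (de)convolutional projection layers; the
    bias-free pair (W1, W2) with a ReLU stands in for the sub-network, so
    a zero error yields a zero incremental gradient, as in Eq. (11).
    """
    E = Phi @ X_prev - Y                      # projection error at this stage
    basic = Phi.T @ E                         # Eq. (8): scaled basic gradient
    delta = W2 @ np.maximum(W1 @ E, 0.0)      # Eq. (9): incremental gradient
    return X_prev - eta * (basic + delta)     # Eqs. (10) and (6): amended update
```

Because no bias terms appear, an input satisfying the observation model exactly (zero error) is a fixed point of the stage, matching the termination argument above.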
Layer | Kernel shape | # Input channels | # Output channels | Output shape | ReLU | SZM-norm

The t-th spectral-spatial separable convolutional layer:
Spectral convolution | 62×62×1×1 | 62 | 62 | 128×128×62 | ✓ | ✓
Spatial convolution | 62×1×3×3 | 62 | 62 | 128×128×62 | ✓ | ✓

The last spectral-spatial separable convolution (without activation):
Spectral convolution | 31×310×1×1 | 310 | 31 | 128×128×31 | – | –
Spatial convolution | 31×1×3×3 | 31 | 31 | 128×128×31 | – | –
IV-C. Global Structure-aware Loss Function
To train AGD-Net, we basically adopt the following pixel-wise loss function:

$\mathcal{L}_{pixel} = \|\widehat{\mathbf{X}} - \mathbf{X}\|_1 + \beta \|\mathcal{C}_{\theta}(\widehat{\mathbf{X}}) - \mathbf{Y}\|_1,$  (12)

where $\|\cdot\|_1$ is the $\ell_1$ norm of a matrix, which computes the sum of the absolute values of all its elements, $\widehat{\mathbf{X}}$ and $\mathbf{X}$ are the reconstructed and ground-truth HS images, respectively, $\beta$ is a penalty parameter, which is empirically set to 1, and $\mathcal{C}_{\theta}(\cdot)$ is the convolutional layer projecting an HS image to the RGB image space. Many previous works have experimentally demonstrated that the matrix formed from an HS image is approximately low-rank [67, 9, 10, 7, 39, 34], i.e., there is strong correlation among spectral bands. However, such a global structure of HS images cannot be captured by the pixel-wise loss in Eq. (12). To this end, we propose a rank loss $\mathcal{L}_{rank}$. Specifically, we adopt a singular value weighting strategy to enforce the singular values of reconstructed HS images to be exactly the same as those of the ground-truth HS images within a certain range, based on the following two considerations:

- Relatively larger singular values correspond to more principal components (or low-frequency components of an image). However, for image reconstruction, the challenging issue lies in the recovery of high-frequency components, e.g., sharp details. Thus, we set an upper bound to promote the ability of the network to learn those details.

- The accuracy of the eigenvectors corresponding to relatively small eigenvalues decreases. Thus, we set a lower bound to avoid utilizing the inaccurate eigenvectors.

Algorithms 1 and 2 provide the forward and backward propagation for optimizing the rank loss during training, respectively.

The overall loss function for training AGD-Net is finally written as

$\mathcal{L} = \mathcal{L}_{pixel} + \gamma \mathcal{L}_{rank},$  (13)

where the parameter $\gamma$ is set to 1 to balance the two terms.
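The forward pass of such a rank loss can be sketched in NumPy. This is only a plausible reading of the singular value weighting strategy, in which the range is interpreted as an index window over the sorted singular values; the window bounds below are illustrative, and the custom backward pass (Algorithm 2) is omitted.

```python
import numpy as np

def rank_loss(X_hat, X_gt, lo=1, hi=8):
    """Forward pass of a rank loss matching singular values in a range.

    X_hat, X_gt: HS images flattened to (B, H*W) matrices. The largest
    singular values (low-frequency principal components) and the smallest
    ones (whose singular vectors are numerically unreliable) are excluded
    via the illustrative index window [lo, hi).
    """
    s_hat = np.linalg.svd(X_hat, compute_uv=False)  # sorted descending
    s_gt = np.linalg.svd(X_gt, compute_uv=False)
    return float(np.abs(s_hat[lo:hi] - s_gt[lo:hi]).sum())
```

The loss vanishes when the selected singular values of the reconstruction match those of the ground truth, which is exactly the global spectral-correlation structure the pixel-wise loss in Eq. (12) cannot see.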
IV-D. Flexible AGD-Net
In this section, we further extend AGD-Net to increase its practicality and propose flexible AGD-Net (FAGD-Net), a single network that can handle data captured with various SRFs after only one-time training. Such an extension is enabled by the interpretable architecture of AGD-Net.

Specifically, to adapt to various SRFs, we replace the learnable parameters of the linear projection layers in AGD-Net with the explicit SRFs specified by the data, while keeping the remaining settings unchanged. We train FAGD-Net with RGB images acquired with various SRFs to augment its generalization ability. We carry out experiments to validate the effectiveness of FAGD-Net in Sec. V-D.
Methods  # Params  # FLOPs  PSNR  ASSIM  SAM  RMSE 

BI  –  –  23.71  0.6945  42.54  0.0835 
HSCNND [26]  3.61 M  5.22 T  40.55  0.9836  5.59  0.0110 
HIRNet [13]  2.10 M  2.94 T  39.80  0.9861  5.70  0.0397 
3DCNN [28]  0.78 M  8.32 T  42.25  0.9872  5.24  0.0093 
FMNet [70]  11.79 M  17.07 T  41.34  0.9881  6.09  0.0101 
AWAN [29]  17.45 M  24.63 T  43.35  0.9919  4.93  0.0089 
Ours  0.22 M  0.76 T  43.97  0.9922  4.82  0.0077 
Methods  # Params  # FLOPs  PSNR  ASSIM  SAM  RMSE 

BI  –  –  23.73  0.8278  33.81  0.0877 
HSCNND [26]  3.61 M  0.95 T  35.63  0.9733  9.63  0.0194 
HIRNet [13]  2.10 M  0.53 T  33.97  0.9456  9.40  0.0263 
3DCNN [28]  0.78 M  1.53 T  35.98  0.9739  8.89  0.0182 
FMNet [70]  11.47 M  3.09 T  36.84  0.9644  8.54  0.0179 
AWAN [29]  17.45 M  4.57 T  38.41  0.9904  8.08  0.0170 
Ours  0.26 M  0.14 T  39.68  0.9894  6.60  0.0138 
Methods  # Params  # FLOPs  PSNR  ASSIM  SAM  RMSE 

BI  –  –  30.85  0.9075  8.48  0.0394 
HSCNND [26]  3.61 M  0.890 T  41.42  0.9946  3.17  0.0120 
HIRNet [13]  2.01 M  0.532 T  35.26  0.9862  4.27  0.0190 
3DCNN [28]  0.78 M  1.440 T  40.81  0.9938  3.12  0.0124 
FMNet [70]  11.47 M  2.955 T  42.36  0.9950  3.10  0.0118 
AWAN [29]  17.45 M  4.300 T  41.99  0.9948  3.22  0.0112 
Ours  0.51 M  0.258 T  43.39  0.9953  2.75  0.0101 
V. Experiments
V-A. Experiment Settings and Implementation Details
We used 3 widely-used benchmark datasets, i.e., HARVARD (http://vision.seas.harvard.edu/hyperspec/) [6], CAVE (http://www.cs.columbia.edu/CAVE/databases/) [65], and NTIRE 2020 (http://www.vision.ee.ethz.ch/ntire20/) [5]:

- The CAVE dataset consists of 32 HS images of spatial dimensions 512×512 with 31 spectral bands, captured by a generalized assorted pixel camera at a wavelength interval of 10 nm in the range of 400–700 nm. We randomly selected 20 HS images as the training set and the remaining 12 as the testing set. Following [57], [70], we generated input RGB images using the camera spectral response function of the Nikon D700.

- The HARVARD dataset contains 50 indoor and outdoor HS images with 31 spectral bands covering 420–720 nm, captured under daylight illumination. We utilized the first 30 HS images as the training set and the remaining 20 as the testing set. Following [57], [70], we generated input RGB images using the camera spectral response function of the Nikon D700.

- The NTIRE 2020 dataset contains 450 HS/RGB image pairs for training, 10 pairs for validation, and 20 pairs for testing. The HS images have 31 spectral bands covering 400–700 nm. As the ground-truth images of the testing set are unavailable, we adopted the validation set for evaluation.
We adopted the ADAM [27] optimizer with exponential decay rates for the first and second moment estimates. We initialized the learning rate of our AGD-Net and employed the cosine annealing decay strategy to gradually decrease it. We empirically set the four hyper-parameters in Algorithm 1 to 48, 48, 1e-3, and 1, respectively. During training, we fixed the number of training epochs to 500 for all experiments. We implemented the model with PyTorch, and set the batch size to 8 for CAVE and HARVARD, and 6 for NTIRE 2020.
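The optimizer setup described above can be written as a short PyTorch fragment. The stand-in model and the initial learning rate are illustrative placeholders, not the values used in the paper; only the 500-epoch cosine annealing schedule is taken from the text.

```python
import torch

# Stand-in model; AGD-Net itself is defined by the architecture in Table I.
model = torch.nn.Conv2d(3, 31, kernel_size=1)

# ADAM optimizer; the initial learning rate here is purely illustrative.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Cosine annealing gradually decays the learning rate over the 500 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

for epoch in range(500):
    # ... one training epoch over the batches would go here ...
    scheduler.step()
```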
For a comprehensive quantitative evaluation, we adopted 4 commonly-used quantitative metrics, i.e., Peak Signal-to-Noise Ratio (PSNR), Average Structural Similarity Index (ASSIM) [56], Spectral Angle Mapper (SAM) [66], and Root Mean Squared Error (RMSE), which are respectively defined as:

$\mathrm{PSNR} = \frac{1}{B}\sum_{b=1}^{B} 10\log_{10}\frac{1}{\mathrm{MSE}(\widehat{\mathbf{X}}_b, \mathbf{X}_b)},$  (14)

where $\widehat{\mathbf{X}}_b$ and $\mathbf{X}_b$ are the $b$-th ($1 \leq b \leq B$) spectral bands of the reconstructed HS image $\widehat{\mathbf{X}}$ and the ground-truth HS image $\mathbf{X}$, respectively, and $\mathrm{MSE}(\cdot,\cdot)$ computes the mean squared error between the inputs;

$\mathrm{ASSIM} = \frac{1}{B}\sum_{b=1}^{B}\mathrm{SSIM}(\widehat{\mathbf{X}}_b, \mathbf{X}_b),$  (15)

where $\mathrm{SSIM}(\cdot,\cdot)$ [55] computes the SSIM value of a typical spectral band;

$\mathrm{SAM} = \frac{1}{HW}\sum_{i=1}^{HW}\arccos\frac{\langle \widehat{\mathbf{x}}_i, \mathbf{x}_i\rangle}{\|\widehat{\mathbf{x}}_i\|_2 \|\mathbf{x}_i\|_2},$  (16)

where $\widehat{\mathbf{x}}_i$ and $\mathbf{x}_i$ are the spectral signatures of the $i$-th ($1 \leq i \leq HW$) pixels of $\widehat{\mathbf{X}}$ and $\mathbf{X}$, respectively, $\|\cdot\|_2$ is the $\ell_2$ norm of a vector, and $\langle\cdot,\cdot\rangle$ calculates the inner product of two vectors; and

$\mathrm{RMSE} = \sqrt{\frac{1}{BHW}\sum_{j=1}^{BHW}(\widehat{x}_j - x_j)^2},$  (17)

where $\widehat{x}_j$ and $x_j$ are the $j$-th elements of $\widehat{\mathbf{X}}$ and $\mathbf{X}$, respectively.
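Three of the four metrics follow directly from Eqs. (14), (16), and (17) and can be sketched in NumPy; ASSIM is omitted here since SSIM [55] involves a windowed computation. The PSNR sketch assumes pixel values normalized to [0, 1], so the peak value in the numerator is 1.

```python
import numpy as np

def psnr(X_hat, X_gt):
    """Band-wise PSNR (Eq. (14)), assuming values normalized to [0, 1]."""
    mse = ((X_hat - X_gt) ** 2).reshape(X_gt.shape[0], -1).mean(axis=1)
    return float(np.mean(10 * np.log10(1.0 / mse)))

def sam(X_hat, X_gt):
    """Spectral Angle Mapper (Eq. (16)) in degrees, averaged over pixels."""
    a = X_hat.reshape(X_hat.shape[0], -1)   # (B, H*W): one spectrum per column
    b = X_gt.reshape(X_gt.shape[0], -1)
    cos = (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())

def rmse(X_hat, X_gt):
    """Root mean squared error (Eq. (17)) over all elements."""
    return float(np.sqrt(((X_hat - X_gt) ** 2).mean()))
```

Note that SAM is scale-invariant per pixel (it only compares spectral shapes), while PSNR and RMSE measure absolute intensity errors, which is why the four metrics together give a more complete picture.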
In addition, we also report the number of neural network parameters (# Params) and the number of floating-point operations per inference (# FLOPs) of the DNN-based methods to compare their efficiency.
Initialization  Incremental gradient  Parameter sharing  SZM-norm  PSNR  ASSIM  SAM  RMSE  

42.85  0.9950  3.01  0.0116  
42.40  0.9945  3.29  0.0115  
42.90  0.9953  2.98  0.0106  
42.93  0.9956  2.98  0.0111  
41.69  0.9952  3.18  0.0122  
43.02  0.9956  2.97  0.0110  
43.39  0.9953  2.75  0.0101 
V-B. Comparison with State-of-the-Art Methods
We compared AGD-Net with 6 methods, including bicubic interpolation (BI) over the spectral dimension as a baseline and 5 recent DNN-based methods, i.e., HSCNN-D [47], 3D-CNN [28], HIRNet [13], AWAN [29], and FMNet [70]. Note that HSCNN-D and AWAN are the champion models of the NTIRE 2018 [3] and NTIRE 2020 [5] challenges on spectral reconstruction from an RGB image, respectively. For fair comparisons, we applied the same data pre-processing to all the methods, trained all the DNN-based methods on the same training data using the released codes with the suggested parameters, and adopted the same protocol as [9, 54] to evaluate the experimental results of all the methods.
Tables II, III, and IV list quantitative comparisons of the different methods on the three benchmark datasets, where it can be seen that AGD-Net consistently surpasses all the compared methods in terms of all four metrics, while consuming far fewer network parameters and FLOPs. In particular, compared with the second-best method, AGD-Net improves PSNR by 1.27 dB (resp. 1.4 dB) and reduces SAM on the CAVE (resp. NTIRE 2020) dataset, while using more than 67× (resp. 32×) fewer parameters and 32× (resp. 16×) fewer FLOPs.
Figs. 2 and 3 visually compare the different methods by showing their pseudo-color images and spectral curves, which further validate the significant superiority of our AGD-Net. In particular, the compared methods cannot well handle regions with either high-frequency details (e.g., the branches and the flower patterns in Fig. 2, and the seeds of the strawberries in Fig. 3) or smooth textures (e.g., the wall in Fig. 2, and the strawberries in Fig. 3). By contrast, our AGD-Net produces much better results in these regions. Besides, the spectral curves produced by our method are closer to the ground-truth ones, e.g., in the range of 600–720 nm in Fig. 2 and the range of 500–720 nm in Fig. 3. Such advantages are credited to the fact that AGD-Net, built on an explicit observation model, is able to easily distinguish high-frequency from low-frequency regions and reconstruct them separately according to the projection errors.
V-C. Ablation Study
We conducted extensive ablation studies to obtain a comprehensive understanding of AGD-Net.

First, we experimentally validated the effectiveness of the initialization module, the learning of the incremental gradient, the sharing of the projection parameters, the SZM-norm operation, and the loss function. As listed in Table V, compared with the complete model, the reconstruction quality decreases after removing any one of these modules/operations, convincingly validating their effectiveness. In particular, the PSNR drops by about 1 dB without learning the incremental gradient, which demonstrates the rationality of our formulation of the amended gradient descent. In addition, we observe that the self-supervised loss makes a significant contribution to the reconstruction process. The reason is that such a loss not only regularizes the output HS images but also forces the network to regress the SRF for the correct calculation of the error maps in each module.
We also investigated how the number of stages affects the reconstruction performance. Note that the initialization module is also counted as one stage. As shown in Fig. 5, the performance of AGD-Net in terms of all four metrics gradually improves as the number of stages increases and saturates at around 6 stages. Thus, in all experiments, we set the number of stages to 5, 6, and 12 for the HARVARD, CAVE, and NTIRE datasets, respectively.
V-D. Evaluation of FAGD-Net
We used the spectral response functions (SRFs) of 15 different cameras [25] to construct the training set, i.e., Canon 1D Mark III, Canon 5D Mark II, Nikon D300s, Nikon D50, Nokia N900, Canon 40D, Canon 600D, Nikon D3X, Nikon D80, Phase One, Canon 500D, Hasselblad H2, Nikon D40, Nikon D90, and Point Grey Grasshopper2 14S5C. We generated the testing RGB images using the SRFs of the Canon 20D, Canon 50D, Nikon D200, Nikon D5100, Pentax Q, Canon 300D, Canon 60D, Nikon D3, and Nikon D700 cameras. The first 30 HS images from the HARVARD dataset were used for training, and the remaining 20 for testing. As the compared methods cannot utilize an SRF in an explicit manner, we projected the 30 HS images with respect to the 15 training SRFs to generate 450 pairs of HS and RGB images to train them. Note that only a single network was trained for each method.
Fig. 4 shows the quantitative comparison of the different methods, where it can be seen that our FAGD-Net consistently exceeds the other methods by a significant margin on all 9 SRFs, e.g., an improvement in PSNR of up to 5.5 dB and a reduction in SAM of about 3° on the Canon 300D, validating the strong flexibility and generalization ability of FAGD-Net to different SRFs, which is credited to its interpretable network architecture.
VI. Conclusion
We have presented AGD-Net, a novel end-to-end learning framework for the reconstruction of HS images from single RGB images. As a neural network built upon an explicit formulation using the gradient descent algorithm, AGD-Net is interpretable and compact. In addition to blind reconstruction, i.e., when SRFs are unknown, AGD-Net can also be adapted to non-blind reconstruction by explicitly utilizing known SRFs, distinguishing itself from its deep learning peers in flexibility: trained only once, a single AGD-Net network is able to handle input RGB images obtained via different SRFs. We demonstrated the significant advantages of AGD-Net over state-of-the-art methods by conducting extensive experiments as well as comprehensive ablation studies: AGD-Net improves PSNR by up to 5.5 dB and reduces SAM by up to 3°, while using up to 67× fewer parameters and 32× fewer FLOPs. We believe our new perspective will bring insights to other inverse problems in image processing, such as image super-resolution, deblurring, and compressive sensing.
References

[1]
(2017)
In defense of shallow learned spectral reconstruction from rgb images.
In
Proceedings of the IEEE International Conference on Computer Vision Workshops
, pp. 471–479. Cited by: §IIA.  [2] (2017) Adversarial networks for spatial contextaware spectral image reconstruction from rgb. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 480–490. Cited by: §IIB.

[3]
(2018)
NTIRE 2018 challenge on spectral reconstruction from rgb images.
In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
, Vol. , pp. 1042–104209. Cited by: §VB.  [4] (2016) Sparse recovery of hyperspectral signal from natural rgb images. In Proceedings of the European Conference on Computer Vision, pp. 19–34. Cited by: §IIA.
 [5] (2020) Ntire 2020 challenge on spectral reconstruction from an rgb image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 446–447. Cited by: §I, §VA.
 [6] (2011) Statistics of realworld hyperspectral images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 193–200. Cited by: §VA.

[7]
(2020)
Weighted lowrank tensor recovery for hyperspectral image restoration
. IEEE Transactions on Cybernetics. Cited by: §IVC.  [8] (2019) Deep coupled ista network for multimodal image superresolution. IEEE Transactions on Image Processing 29, pp. 1683–1698. Cited by: §IIC.
 [9] (2019) Learning a low tensortrain rank representation for hyperspectral image superresolution. IEEE Transactions on Neural Networks and Learning Systems 30 (9), pp. 2672–2683. Cited by: §IVC, §VB.
 [10] (2019) Hyperspectral image superresolution via subspacebased low tensor multirank regularization. IEEE Transactions on Image Processing 28 (10), pp. 5135–5146. Cited by: §IVC.
 [11] (2016) Hyperspectral image superresolution via nonnegative structured sparse representation. IEEE Transactions on Image Processing 25 (5), pp. 2337–2352. Cited by: §I.
 [12] (2018) Snapshot multiplexed imaging based on compressive sensing. In Pacific Rim Conference on Multimedia, pp. 465–475. Cited by: §I.
 [13] (2018) Joint camera spectral sensitivity selection and hyperspectral image recovery. In Proceedings of the European Conference on Computer Vision, pp. 788–804. Cited by: §IIB, TABLE II, TABLE III, TABLE IV, §VB.
 [14] (2018) Spectral reflectance recovery from a single rgb image. IEEE Transactions on Computational Imaging 4 (3), pp. 382–394. Cited by: §IIB.
 [15] (2017) Learned spectral superresolution. arXiv preprint arXiv:1703.09470. Cited by: §I, §IIB.
 [16] (2020) Spectral superresolution of multispectral imagery with joint sparse and lowrank learning. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §IIA.
 [17] (2019) Spectral superresolution with optimized bands. Remote Sensing 11 (14), pp. 1648. Cited by: §IIB.
 [18] (2013) Matrix computations, 4th. Johns Hopkins. Cited by: 3.

[19]
(2010)
Learning fast approximations of sparse coding.
In
Proceedings of International Conference on Machine Learning
, pp. 399–406. Cited by: §IIC.  [20] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §IVB2.
 [21] (2021) Fast hyperspectral image recovery via non-iterative fusion of dual-camera compressive hyperspectral imaging. IEEE Transactions on Image Processing. Cited by: §I.
 [22] (2018) Spectral reflectance estimation using Gaussian processes and combination kernels. IEEE Transactions on Image Processing 27 (7), pp. 3358–3373. Cited by: §IIA.
 [23] (2017) Hyperspectral image super-resolution by spectral difference learning and spatial error correction. IEEE Geoscience and Remote Sensing Letters 14 (10), pp. 1825–1829. Cited by: §I.
 [24] (2017) From RGB to spectrum for natural scenes via manifold-based mapping. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4705–4713. Cited by: §IIA.
 [25] (2013) What is the space of spectral sensitivity functions for digital color cameras? In IEEE Workshop on Applications of Computer Vision, pp. 168–179. Cited by: §VD.
 [26] (2019) Towards spectral estimation from a single RGB image in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 3546–3555. Cited by: §IIB, TABLE II, TABLE III, TABLE IV.
 [27] (2015) Adam: a method for stochastic optimization. In Proceedings of 3rd International Conference on Learning Representations (ICLR). Cited by: §VA.
 [28] (2018) 2D-3D CNN based architectures for spectral reconstruction from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 844–851. Cited by: TABLE II, TABLE III, TABLE IV, §VB.
 [29] (2020) Adaptive weighted attention network with camera spectral sensitivity prior for spectral reconstruction from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 462–463. Cited by: §I, §IIB, TABLE II, TABLE III, TABLE IV, §VB.
 [30] (2020) Hybrid 2D-3D deep residual attentional network with structure tensor constraints for spectral super-resolution of RGB images. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
 [31] (2021) Exploring the relationship between 2D/3D convolution for hyperspectral image super-resolution. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
 [32] (2017) Hyperspectral image super-resolution using deep convolutional neural network. Neurocomputing 266, pp. 29–41. Cited by: §I.
 [33] (2021) A spectral grouping and attention-driven residual dense network for hyperspectral image super-resolution. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
 [34] (2021) Global-local balanced low-rank approximation of hyperspectral images for classification. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §IVC.
 [35] (2021) Hyperspectral restoration and fusion with multispectral imagery via low-rank tensor approximation. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
 [36] (2019) Unrolled projected gradient descent for multispectral image fusion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7725–7729. Cited by: §IIC.
 [37] (2019) Deep tensor ADMM-net for snapshot compressive imaging. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10223–10232. Cited by: §IIC.
 [38] (2003) Object detection by using "whitening/dewhitening" to transform target signatures in multitemporal hyperspectral and multispectral imagery. IEEE Transactions on Geoscience and Remote Sensing 41 (5), pp. 1136–1142. Cited by: §I.
 [39] (2018) Simultaneous spatial and spectral low-rank representation of hyperspectral images for classification. IEEE Transactions on Geoscience and Remote Sensing 56 (5), pp. 2872–2886. Cited by: §IVC.
 [40] (2020) End-to-end low cost compressive spectral imaging with spatial-spectral self-attention. In Proceedings of the European Conference on Computer Vision, pp. 187–204. Cited by: §I.
 [41] (2021) Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine 38 (2), pp. 18–44. Cited by: §IIC.
 [42] (2019) Validating hyperspectral image segmentation. IEEE Geoscience and Remote Sensing Letters 16 (8), pp. 1264–1268. Cited by: §I.
 [43] (2014) Training-based spectral reconstruction from a single RGB image. In Proceedings of the European Conference on Computer Vision, pp. 186–201. Cited by: §IIA.
 [44] (2020) Residual pixel attention network for spectral reconstruction from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 486–487. Cited by: §I, §IIB.
 [45] (2018) Unsupervised sparse Dirichlet-net for hyperspectral image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2511–2520. Cited by: §I.
 [46] (2020) Hyperspectral reconstruction from RGB images for vein visualization. In Proceedings of the 11th ACM Multimedia Systems Conference, pp. 77–87. Cited by: §I.
 [47] (2018) HSCNN+: advanced CNN-based hyperspectral recovery from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 939–947. Cited by: §I, §IIB, §VB.
 [48] (2016) Deep ADMM-net for compressive sensing MRI. In Advances in Neural Information Processing Systems, pp. 10–18. Cited by: §IIC.
 [49] (2014) A+: adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pp. 111–126. Cited by: §IIA.
 [50] (2019) Hyperspectral image reconstruction using a deep spatial-spectral prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8032–8041. Cited by: §IIC.
 [51] (2020) DNU: deep non-local unrolling for computational spectral imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1661–1671. Cited by: §I, §IIC.
 [52] (2018) High-speed hyperspectral video acquisition by combining Nyquist and compressive sampling. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (4), pp. 857–870. Cited by: §I.
 [53] (2016) Adaptive non-local sparse representation for dual-camera compressive hyperspectral imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (10), pp. 2104–2111. Cited by: §I.
 [54] (2019) Deep blind hyperspectral image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4150–4159. Cited by: §VB.
 [55] (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §VA.
 [56] (2002) A universal image quality index. IEEE Signal Processing Letters 9 (3), pp. 81–84. Cited by: §VA.
 [57] (2020) Boosting one-shot spectral super-resolution using transfer learning. IEEE Transactions on Computational Imaging 6, pp. 1459–1470. Cited by: 1st item, 2nd item.
 [58] (2018) DeepCASD: an end-to-end approach for multispectral image super-resolution. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6503–6507. Cited by: §I, §IIC.
 [59] (2019) Multispectral and hyperspectral image fusion by MS/HS fusion net. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1585–1594. Cited by: §IIC.
 [60] (2020) Material based object tracking in hyperspectral videos. IEEE Transactions on Image Processing 29, pp. 3719–3733. Cited by: §I.
 [61] (2017) HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 518–525. Cited by: §I, §IIB.
 [62] (2019) Non-local coupled tensor CP decomposition for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing 58 (1), pp. 348–362. Cited by: §I.
 [63] (2020) Reconstruction of hyperspectral data from RGB images with prior category information. IEEE Transactions on Computational Imaging 6, pp. 1070–1081. Cited by: §IIB.
 [64] (2020) Cross-attention in coupled unmixing nets for unsupervised hyperspectral super-resolution. In Proceedings of the European Conference on Computer Vision, pp. 208–224. Cited by: §I.
 [65] (2010) Generalized assorted pixel camera: post-capture control of resolution, dynamic range, and spectrum. IEEE Transactions on Image Processing 19 (9), pp. 2241–2253. Cited by: §VA.
 [66] (1992) Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proc. Summaries 3rd Annu. JPL Airborne Geosci. Workshop, Vol. 1, pp. 147–149. Cited by: §VA.
 [67] (2013) Hyperspectral image restoration using low-rank matrix recovery. IEEE Transactions on Geoscience and Remote Sensing 52 (8), pp. 4729–4743. Cited by: §IVC.
 [68] (2012) A super-resolution reconstruction algorithm for hyperspectral images. Signal Processing 92 (9), pp. 2082–2096. Cited by: §I.
 [69] (2019) Deep plug-and-play super-resolution for arbitrary blur kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1671–1681. Cited by: §IIC.
 [70] (2020) Pixel-aware deep function-mixture network for spectral super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 12821–12828. Cited by: §I, §IIB, TABLE II, TABLE III, TABLE IV, 1st item, 2nd item, §VB.
 [71] (2020) Hierarchical regression network for spectral reconstruction from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 422–423. Cited by: §IIB.
 [72] (2021) Deep plug-and-play priors for spectral snapshot compressive imaging. Photonics Research 9 (2), pp. B18–B29. Cited by: §I.
 [73] (2021) Hyperspectral image super-resolution via deep progressive zero-centric residual learning. IEEE Transactions on Image Processing 30, pp. 1423–1438. Cited by: §IVA.