Deep Amended Gradient Descent for Efficient Spectral Reconstruction from Single RGB Images

08/12/2021 ∙ by Zhiyu Zhu et al. ∙ City University of Hong Kong, Shenzhen University

This paper investigates the problem of recovering hyperspectral (HS) images from single RGB images. To tackle such a severely ill-posed problem, we propose a physically-interpretable, compact, efficient, and end-to-end learning-based framework, namely AGD-Net. Precisely, by taking advantage of the imaging process, we first formulate the problem explicitly based on the classic gradient descent algorithm. Then, we design a lightweight neural network with a multi-stage architecture to mimic the resulting amended gradient descent process, in which efficient convolutions and a novel spectral zero-mean normalization are proposed to effectively extract spatial-spectral features for regressing an initialization, a basic gradient, and an incremental gradient. Besides, based on the approximate low-rank property of HS images, we propose a novel rank loss to promote the similarity between the global structures of reconstructed and ground-truth HS images, which is optimized with our singular value weighting strategy during training. Moreover, after one-time training, a single AGD-Net can flexibly handle reconstruction with various spectral response functions. Extensive experiments over three commonly-used benchmark datasets demonstrate that AGD-Net can improve the reconstruction quality by more than 1.0 dB on average while saving 67× parameters and 32× FLOPs, compared with state-of-the-art methods. The code will be publicly available at https://github.com/zbzhzhy/GD-Net.


I Introduction

Owing to the dense sampling in the spectral domain, hyperspectral (HS) images provide more accurate and faithful measurements of real-world scenes/objects than traditional RGB images. Such rich spectral information is beneficial to various vision-based applications, such as tracking [60], segmentation [42], and detection [38, 46]. However, the acquisition of HS images is costly, which severely limits the wide deployment of HS image-based applications.

Instead of relying on the development of hardware, many computational methods, such as compressive sensing-based reconstruction [51, 53, 52, 12, 72, 21, 40], HS and RGB image fusion [45, 11, 64, 35, 62, 58], single RGB image-based reconstruction [15, 5, 44, 61], and spatial super-resolution [32, 23, 33, 31, 68], have been proposed to acquire HS images in an affordable and convenient manner. Particularly, reconstructing HS images from single RGB images, which does not require specially-designed acquisition hardware, is a promising direction. Owing to their strong ability to learn representations, deep neural network (DNN)-based methods have recently been proposed to address this challenging task [47, 70]. For example, Zhang et al. [70] proposed a pixel-aware deep learning framework for spectral upsampling. Li et al. [29, 30] introduced spectral and spatial attention mechanisms into the reconstruction process. See Sec. II for more details.

However, most existing DNN-based spectral reconstruction methods suffer from the following limitations. First, they adopt general-purpose architectures and neglect the unique characteristics of this task, e.g., the specific relationship between HS and RGB images, which may compromise their performance. Second, the majority of them, trained with RGB images acquired via a typical spectral response function (SRF), cannot handle RGB images generated with a different SRF during inference, which limits their use in practice to some extent. In addition, existing DNN-based methods were usually trained with pixel-wise loss functions, which fail to capture the global structure of HS images, i.e., the relationship among spectral bands.

In this paper, we propose a novel DNN-based framework, which is compact, efficient, interpretable, and effective, for the reconstruction of HS images from single RGB images in an end-to-end fashion. Specifically, based on the specific relationship between RGB and HS images, we first explicitly formulate the problem as an amended gradient descent (AGD) process, which boils down to determining an initialization, a basic gradient, and an incremental gradient. Then, we propose AGD-Net with a multi-stage structure to mimic the AGD process: with the initialization learned, the basic and incremental gradients are adaptively and progressively learned at each stage by embedding the spatial-spectral information of the input RGB image via memory- and computationally-efficient convolutions and a novel spectral zero-mean normalization. To exploit the global structure of HS images, we also propose a novel rank loss, which is optimized via a singular value weighting strategy during training. Thanks to the interpretable architecture, we extend AGD-Net so that a single network, after one-time training, can handle input RGB images generated with different SRFs. Extensive experimental results demonstrate the significant superiority of AGD-Net over state-of-the-art methods, i.e., AGD-Net reconstructs HS images with much higher quality but at lower memory and computational costs.

The rest of this paper is organized as follows. Sec. II briefly reviews existing methods for HS image reconstruction. Sec. III formulates the problem. Sec. IV presents the proposed framework, followed by extensive experimental results as well as analyses in Sec. V. Finally, Sec. VI concludes this paper.

II Related Work

In the following, we briefly review the existing works on the reconstruction of HS images from single RGB images.

II-A Traditional Methods

Many traditional methods assume that HS images lie in a low-dimensional subspace and explore the mapping between RGB images and subspace coordinates. For example, Nguyen et al. [43] leveraged RGB white-balancing to normalize the scene illumination and recover the scene reflectance. Arad et al. [4] proposed a sparse coding-based method, which learns an over-complete dictionary of HS images to describe novel RGB images. Aeschbacher et al. [1] further improved it through a shallow A+-based method [49]. Jia et al. [24] exploited the 3D embedded space where natural scene spectra reside and learned an accurate non-linear mapping from RGB images to 3D embeddings. Heikkinen [22] estimated the spectral subspace coordinates via scalar-valued Gaussian process regression with anisotropic or combination kernels. Gao et al. [16] proposed a joint sparse and low-rank dictionary learning method for the reconstruction of HS images from single RGB images.

II-B DNN-based Methods

On the basis of the impressive representation ability of DNNs, many DNN-based methods have been proposed to reconstruct HS images from single RGB images. For example, Xiong et al. [61] proposed a DNN-based method, namely HSCNN, for the reconstruction of HS images from RGB images or measurements obtained via compressive sensing, which mainly aims to enhance the spectral signatures constructed by a simple interpolation or CS reconstruction. Shi et al. [47] further improved HSCNN by replacing all predefined upsampling operators with residual blocks and introduced dense connections with a cross-scale fusion scheme to facilitate the feature extraction process. Gewali et al. [17] utilized DNNs to optimize multispectral bands and hyperspectral recovery simultaneously to achieve more accurate HS image reconstruction. Fu et al. [14] modeled HS image reconstruction by exploring non-negative structured information and utilized multiple sparse dictionaries to learn a more compact basis representation. Kaya et al. [26] trained multiple models to reconstruct HS images from RGB images captured with different SRFs, along with an additional model to select among them in real-world applications. Li et al. [29] also proposed an attention-based method which utilizes both channel attention and spatial non-local attention. Based on the assumption that pixels in an HS image belong to different categories or spatial positions and often require distinct mapping functions, Zhang et al. [70] proposed a pixel-aware deep function-mixture network, which learns different basis functions and then linearly mixes them according to pixel-level weights. Alvarez-Gila et al. [2] treated HS image reconstruction as an image-to-image mapping problem and applied a generative adversarial network to capture spatial semantics. Yan et al. [63] introduced prior category information to generate distinct spectral data of objects via a U-Net-based architecture. Zhao et al. [71] presented a hierarchical regression network with a pixel shuffle layer. Fu et al. [13] developed an SRF selection layer to retrieve the optimal response function for HS image reconstruction. Peng et al. [44] introduced a pixel-wise attention module to boost reconstruction performance. Galliani et al. [15] utilized a densely connected U-Net-based architecture for HS image reconstruction. However, the performance of the above-mentioned methods is still limited, due to insufficient modeling of the problem. Besides, although these methods attempt to build reconstruction processes with physical meaning, the adopted general-purpose architectures seriously restrict their interpretability.

II-C Algorithm Unrolling-based Methods

As our deep learning-based framework is driven by model-based optimization, we also briefly review related works in this stream. Since Gregor and LeCun [19] developed the sparse coding-based algorithm unrolling technique, a number of methods that unroll iterative algorithms with DNNs have been proposed for various image reconstruction tasks, such as single RGB image super-resolution [69, 8], compressive sensing [48, 37], and image fusion [59]. Generally, this kind of method solves an inverse problem by unfolding the steps of an optimization algorithm and applying DNNs to realize them in a data-driven manner. The main differences among these methods lie in the formulation of the inverse problem as well as the adopted optimization algorithm, which result in various network architectures. For example, Lohit et al. [36] unrolled a projected gradient descent algorithm for HS image pan-sharpening. Wen et al. [58] utilized a deep coupled analysis and synthesis dictionary-based network for HS image super-resolution. Wang et al. [51, 50] unfolded a half quadratic splitting algorithm using DNNs for coded aperture snapshot spectral imaging. We refer readers to [41] for a comprehensive survey on algorithm unrolling.

Fig. 1: Illustration of the flowchart of the proposed AGD-Net, a compact, interpretable, and end-to-end neural network with a multi-stage architecture, for the reconstruction of HS images from single RGB images. AGD-Net mimics an amended gradient descent process to solve the formed ill-posed inverse problem. Each stage consists of two modules, namely learning initialization and learning amended gradient.

III Problem Formulation

Denote by $\mathbf{Y}\in\mathbb{R}^{3\times HW}$ the vectorial representation of an RGB image of spatial dimensions $H\times W$, and by $\mathbf{X}\in\mathbb{R}^{B\times HW}$ the corresponding HS image with $B$ ($B\gg 3$) spectral bands to be reconstructed. The relationship between $\mathbf{X}$ and $\mathbf{Y}$ can be generally formulated as

$\mathbf{Y}=\mathbf{C}\mathbf{X}+\mathbf{N}$,   (1)

where $\mathbf{C}\in\mathbb{R}^{3\times B}$ is the spectral response function (SRF), and $\mathbf{N}$ is the noise. Simply, under the assumption that the noise is normally distributed, we can recover $\mathbf{X}$ from $\mathbf{Y}$ by optimizing the following problem formulated from Eq. (1):

$\min_{\mathbf{X}}\ f(\mathbf{X}):=\frac{1}{2}\left\|\mathbf{Y}-\mathbf{C}\mathbf{X}\right\|_{F}^{2}$,   (2)

where $\|\cdot\|_{F}$ is the Frobenius norm of a matrix. Moreover, with an initial guess $\mathbf{X}^{(0)}$, we can solve Eq. (2) with the classic gradient descent (GD) algorithm, and at the $t$-th ($t\geq 1$) step, we have

$\mathbf{X}^{(t)}=\mathbf{X}^{(t-1)}-\eta\,\nabla f\!\left(\mathbf{X}^{(t-1)}\right)$,   (3)

where $\eta$ is the step size and $\nabla f(\cdot)$ is the operator computing the derivative of $f(\cdot)$, i.e.,

$\nabla f(\mathbf{X})=\mathbf{C}^{\mathsf{T}}\left(\mathbf{C}\mathbf{X}-\mathbf{Y}\right)$.   (4)

Unfortunately, it is almost impossible to obtain a feasible solution by means of such a simple optimization process, due to the severely ill-posed nature of the problem in Eq. (2), i.e., there are numerous trivial solutions. In addition, the performance of such a scheme highly depends on the initialization. From the perspective of the gradient space, the reason could be interpreted as that the gradient cannot decrease, either along an optimal path or from an appropriate starting point, toward the global minimum or a good local minimum during the iteration process. Therefore, to make the gradient descent process effective, an intuitive thought is that we can find an appropriate initialization and amend the gradient at each step of the iteration process. That is, instead of Eq. (4), we can generally express the gradient at the $t$-th step as

$\widetilde{\nabla} f\!\left(\mathbf{X}^{(t)}\right)=\nabla f\!\left(\mathbf{X}^{(t)}\right)+\Delta\mathbf{G}^{(t)}$,   (5)

where $\widetilde{\nabla} f(\cdot)$ is the amended gradient, and $\Delta\mathbf{G}^{(t)}$ is the incremental gradient. Accordingly, we obtain the amended gradient descent process as

$\mathbf{X}^{(t)}=\mathbf{X}^{(t-1)}-\eta\left(\nabla f\!\left(\mathbf{X}^{(t-1)}\right)+\Delta\mathbf{G}^{(t-1)}\right)$.   (6)
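To make the formulation concrete, the following minimal NumPy sketch contrasts the plain gradient descent step of Eqs. (3)-(4) with the amended step of Eqs. (5)-(6). The toy SRF matrix, the image sizes, and the placeholder incremental term are our own illustrative assumptions; in AGD-Net itself, the initialization, the projections, and the incremental gradient are all learned.

```python
import numpy as np

def plain_gd_step(X, Y, C, eta):
    # Eqs. (3)-(4): X <- X - eta * C^T (C X - Y)
    return X - eta * C.T @ (C @ X - Y)

def amended_gd_step(X, Y, C, eta, incremental_fn):
    # Eqs. (5)-(6): the basic gradient is corrected by an increment
    basic = C.T @ (C @ X - Y)                 # Eq. (4)
    amended = basic + incremental_fn(basic)   # Eq. (5); a DNN in AGD-Net
    return X - eta * amended                  # Eq. (6)

# Toy example with a hypothetical 3x31 SRF matrix and random data.
rng = np.random.default_rng(0)
C = np.abs(rng.standard_normal((3, 31)))      # hypothetical SRF
X_gt = rng.random((31, 64))                   # 31-band HS "image", 64 pixels
Y = C @ X_gt                                  # simulated RGB observation
X = np.zeros_like(X_gt)                       # naive initialization
for _ in range(100):
    # with a zero increment this reduces to plain gradient descent
    X = amended_gd_step(X, Y, C, eta=1e-2, incremental_fn=lambda g: 0.0 * g)
```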

IV Proposed Method

Motivated by the intuitive and explicit formulation in Sec. III, as illustrated in Fig. 1, we propose a novel end-to-end and lightweight DNN-based framework, namely AGD-Net, which mimics the amended gradient descent process, to achieve the reconstruction of HS images from single RGB images. To be specific, with the initialization learned, we progressively learn the basic gradient and the incremental gradient via a multi-stage architecture, in which the spatial-spectral information of the input RGB image is effectively and efficiently embedded. Besides, we propose a global structure-aware loss function to train AGD-Net end-to-end. In what follows, we detail each module.

IV-A Learning Initialization

This module aims to learn an appropriate initialization $\mathbf{X}^{(0)}$ as the starting point of the gradient descent process. We adopt a densely-connected convolutional neural network (CNN) to extract the spatial-spectral information of the input RGB image $\mathbf{Y}$ to regress $\mathbf{X}^{(0)}$. Specifically, to learn feature representations efficiently and effectively, we adopt a series of memory- and computation-efficient spectral-spatial separable convolutions [73], each of which applies two kinds of sequentially connected convolution, namely a 1D spectral convolution and a 2D spatial convolution, with an in-between activation function. Specifically, the former applies kernels of size $1\times 1$ in the 1D spectral/channel space for embedding spectral information, while the latter applies kernels of size $3\times 3$ in the 2D spatial space for embedding spatial information. Moreover, to emphasize high-frequency spectral information and regularize the intermediate features against overfitting, we propose spectral zero-mean normalization (SZM-norm), which enforces the vector formed by the features from different channels but at an identical spatial position to have a zero mean, i.e.,

$\mathrm{SZM}\left(\mathbf{F}\right)_{c,i}=\mathbf{F}_{c,i}-\frac{1}{C}\sum_{c'=1}^{C}\mathbf{F}_{c',i}$,   (7)

where $\mathrm{SZM}(\cdot)$ denotes SZM-norm, and $\mathbf{F}_{c,i}$ is the $i$-th element of the feature map of the $c$-th channel ($C$ channels in total). We will experimentally validate the effectiveness of this initialization module and the SZM-norm in Table V.
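As a concrete illustration (not the authors' released implementation), the following PyTorch sketch shows a spectral-spatial separable convolution in the spirit described above together with the SZM-norm of Eq. (7); the channel counts and the depthwise spatial convolution are assumptions loosely modeled on Table I.

```python
import torch
import torch.nn as nn

class SpectralSpatialSepConv(nn.Module):
    """Sketch of a spectral-spatial separable convolution: a 1x1 (spectral)
    convolution followed by a 3x3 (spatial) convolution, with an in-between
    activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.spectral = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.act = nn.ReLU(inplace=True)
        # depthwise spatial convolution (assumed from the kernel shapes in Table I)
        self.spatial = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1,
                                 groups=out_ch, bias=False)

    def forward(self, x):
        return self.spatial(self.act(self.spectral(x)))

def szm_norm(feat: torch.Tensor) -> torch.Tensor:
    # SZM-norm (Eq. (7)): remove, at every spatial position, the mean taken
    # over the channel dimension, emphasizing high-frequency spectral variation.
    return feat - feat.mean(dim=1, keepdim=True)

x = torch.randn(2, 62, 128, 128)
out = szm_norm(SpectralSpatialSepConv(62, 62)(x))
print(out.mean(dim=1).abs().max())  # close to 0: zero-mean along channels
```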

IV-B Learning the Amended Gradient

In this module, we aim to learn an amended gradient, which is the sum of a basic gradient and an incremental gradient.

IV-B1 Basic gradient

As formulated in Eq. (4), the SRF $\mathbf{C}$ and its transpose $\mathbf{C}^{\mathsf{T}}$ actually act as pixel-wise linear projections, and we thus simulate $\mathbf{C}$ with a convolutional layer denoted as $\mathcal{P}_{\omega}(\cdot)$, and $\mathbf{C}^{\mathsf{T}}$ with a corresponding deconvolutional layer denoted as $\mathcal{Q}_{\psi}(\cdot)$ for the back projection, where $\omega$ and $\psi$ are the sets of parameters to be learned. Accordingly, the scaled basic gradient (i.e., the product of the step size $\eta$ and the basic gradient) is derived as

$\mathbf{G}^{(t)}=\mathcal{Q}_{\psi}\!\left(\mathcal{P}_{\omega}\!\left(\mathbf{X}^{(t)}\right)-\mathbf{Y}\right)$,   (8)

where $\mathbf{X}^{(t)}$ is the intermediate HS image reconstructed at the $t$-th stage. Note that these two convolutional layers are not followed by an activation function in order to preserve the linear property of these transformations.

In addition, considering that the linear projection layers in all stages serve the same purpose, i.e., adaptively learning the SRF, and that only the projection onto the RGB space is explicitly supervised during training, we share parameters across these layers, i.e., all stages use the same $\omega$ and $\psi$, to guarantee that the error can be correctly calculated at all stages. We experimentally validate the effectiveness of such a weight-sharing strategy in Table V.
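A minimal PyTorch sketch of how the scaled basic gradient of Eq. (8) could be realized with a 1×1 convolution standing in for the SRF and a 1×1 back-projection layer shared across stages; the module name, channel counts, and layer choices are our assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BasicGradient(nn.Module):
    """Scaled basic gradient of Eq. (8): back-project the RGB-space error."""
    def __init__(self, num_bands=31):
        super().__init__()
        # 1x1 convolutions act as pixel-wise linear projections; no bias and
        # no activation, so the transforms stay linear. The same two layers
        # are shared by all stages.
        self.project = nn.Conv2d(num_bands, 3, kernel_size=1, bias=False)       # mimics the SRF
        self.back_project = nn.Conv2d(3, num_bands, kernel_size=1, bias=False)  # mimics its transpose

    def forward(self, x_t, y):
        error = self.project(x_t) - y      # RGB-space residual
        return self.back_project(error)    # scaled basic gradient

grad_fn = BasicGradient(num_bands=31)
x_t = torch.rand(1, 31, 128, 128)          # intermediate HS estimate
y = torch.rand(1, 3, 128, 128)             # input RGB image
g = grad_fn(x_t, y)                        # same shape as x_t
```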

IV-B2 Incremental gradient

Considering that both the basic gradient and the incremental gradient are distributed in the gradient space, we directly learn the incremental gradient from the scaled basic gradient $\mathbf{G}^{(t)}$ by using a sub-network denoted as $g_{\theta^{(t)}}(\cdot)$, i.e.,

$\Delta\mathbf{G}^{(t)}=g_{\theta^{(t)}}\!\left(\mathbf{G}^{(t)}\right)$,   (9)

where $\theta^{(t)}$ is the set of parameters at the $t$-th stage to be learned. For simplicity, we adopt the same network architecture as that in Sec. IV-A but different parameters to realize $g_{\theta^{(t)}}(\cdot)$, whose architecture details are summarized in Table I.

According to Eq. (5), we can derive the amended gradient at the $t$-th stage as

$\widetilde{\mathbf{G}}^{(t)}=\mathbf{G}^{(t)}+g_{\theta^{(t)}}\!\left(\mathbf{G}^{(t)}\right)$.   (10)

It can be seen that Eq. (10) has the same form as residual learning [20], and thus the advantages of residual learning will be inherited. Note that we remove all the bias terms of the convolutional layers in $g_{\theta^{(t)}}(\cdot)$. The reason is that the error $\mathcal{P}_{\omega}\!\left(\mathbf{X}^{(t)}\right)-\mathbf{Y}$ also measures the difference between the reconstructed and ground-truth HS images, and when it reaches zero, the optimization process has found an appropriate reconstructed HS image with respect to Eq. (1). Then, the updating of the HS image should be terminated, requiring the amended gradient to be zero, which is equivalent to requiring that the sub-network pass through the origin:

$g_{\theta^{(t)}}\!\left(\mathbf{0}\right)=\mathbf{0}$,   (11)

where $\mathbf{0}$ is a matrix with all elements equal to zero.
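The role of Eq. (11) can be checked directly: a purely convolutional, bias-free sub-network with ReLU activations maps an all-zero input to an all-zero output, so the amended gradient of Eq. (10) vanishes exactly when the basic gradient does. A small sketch with hypothetical layer sizes:

```python
import torch
import torch.nn as nn

# Bias-free sub-network g_theta: all convolutions have bias=False, so g(0) = 0
# (Eq. (11)) and the amended gradient vanishes once the basic gradient does.
g_theta = nn.Sequential(
    nn.Conv2d(31, 62, kernel_size=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(62, 62, kernel_size=3, padding=1, groups=62, bias=False),
    nn.Conv2d(62, 31, kernel_size=1, bias=False),
)

basic = torch.zeros(1, 31, 128, 128)       # zero basic gradient
amended = basic + g_theta(basic)           # Eq. (10): residual-style correction
assert torch.all(amended == 0)             # zero basic gradient -> zero update
```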

Layer | Kernel shape | # Input channels | # Output channels | Output shape
The t-th spectral-spatial separable convolutional layer:
 Spectral convolution | 62×62×1×1 | 62 | 62 | 128×128×62
 Spatial convolution | 62×1×3×3 | 62 | 62 | 128×128×62
Spectral-spatial separable convolution (without activation):
 Spectral convolution | 31×310×1×1 | 310 | 31 | 128×128×31
 Spatial convolution | 31×1×3×3 | 31 | 31 | 128×128×31
TABLE I: The architecture details of $g_{\theta^{(t)}}(\cdot)$.

IV-C Global Structure-aware Loss Function

To train AGD-Net, we basically adopt the following pixel-wise loss function:

$\mathcal{L}_{\mathrm{pixel}}=\left\|\widehat{\mathbf{X}}-\mathbf{X}\right\|_{1}+\lambda\left\|\mathcal{P}_{\omega}\!\left(\widehat{\mathbf{X}}\right)-\mathbf{Y}\right\|_{1}$,   (12)

where $\|\cdot\|_{1}$ is the $\ell_1$ norm of a matrix, which computes the sum of the absolute values of all its elements, $\widehat{\mathbf{X}}$ and $\mathbf{X}$ are the reconstructed and ground-truth HS images, respectively, $\lambda$ is the penalty parameter, which is empirically set to 1, and $\mathcal{P}_{\omega}(\cdot)$ is the convolutional layer projecting an HS image to the RGB image space. Many previous works have experimentally demonstrated that the matrix formed from an HS image is approximately low-rank [67, 9, 10, 7, 39, 34], i.e., there is a strong correlation among spectral bands. However, such a global structure of HS images cannot be captured by the pixel-wise loss in Eq. (12). To this end, we propose a rank loss $\mathcal{L}_{\mathrm{rank}}$. Specifically, we adopt a singular value weighting strategy to enforce the singular values of reconstructed HS images lying within a certain range to be exactly the same as those of the ground-truth HS images, based on the following two considerations:

  • relatively larger singular values correspond to more principal components (or low-frequency components of an image). However, for image reconstruction, the challenging issue lies in the recovery of high-frequency components, e.g., sharp details. Thus, we set an upper bound to promote the ability of the network in learning those details; and

  • the singular vectors corresponding to relatively small singular values are less accurate. Thus, we set a lower bound to avoid utilizing the inaccurate singular vectors.

Algorithms 1 and 2 provide the forward and backward propagation procedures for optimizing the rank loss during training, respectively.

The overall loss function for training AGD-Net is finally written as

$\mathcal{L}=\mathcal{L}_{\mathrm{pixel}}+\beta\,\mathcal{L}_{\mathrm{rank}}$,   (13)

where the parameter $\beta$ is set to 1 to balance the two terms.

Input: the reconstructed HS image $\widehat{\mathbf{X}}$, the ground-truth HS image $\mathbf{X}$, the patch size $p$, and the lower and upper bounds of the singular value range
Output: the rank loss $\mathcal{L}_{\mathrm{rank}}$
1:  Partition $\widehat{\mathbf{X}}$ and $\mathbf{X}$ into patches of spatial dimensions $p\times p$, denoted as $\{\widehat{\mathbf{X}}_j\}$ and $\{\mathbf{X}_j\}$ ($j=1,\dots,J$), respectively.
2:  for $j=1,\dots,J$ do
3:     Compute $[\mathbf{U}_j,\mathbf{S}_j,\mathbf{V}_j]=\mathrm{SVD}(\widehat{\mathbf{X}}_j)$ and $[\bar{\mathbf{U}}_j,\bar{\mathbf{S}}_j,\bar{\mathbf{V}}_j]=\mathrm{SVD}(\mathbf{X}_j)$, where $\mathrm{SVD}(\cdot)$ performs the singular value decomposition [18]. Note that $\mathbf{U}_j$, $\mathbf{S}_j$, $\mathbf{V}_j$, and $\bar{\mathbf{S}}_j$ will be saved for reuse in the backward propagation in Algorithm 2.
4:     Initialize the singular value weighting coefficients.
5:     for each singular value index $k$ do
6:        if the $k$-th singular value of $\widehat{\mathbf{X}}_j$ lies within the lower and upper bounds then
7:           Compute the $k$-th weighting coefficient from $\sigma_{j,k}$ and $\bar{\sigma}_{j,k}$, where $\sigma_{j,k}$ is the $k$-th diagonal entry of $\mathbf{S}_j$, and $\bar{\sigma}_{j,k}$ is the $k$-th diagonal entry of $\bar{\mathbf{S}}_j$. Note that the weighting coefficient will be saved for reuse in the backward propagation.
8:        else
9:           Keep the $k$-th weighting coefficient at its initialized value.
10:        end if
11:     end for
12:  end for
13:  Compute $\mathcal{L}_{\mathrm{rank}}$ by aggregating the weighted singular value discrepancies over all patches.
Algorithm 1 Forward Propagation
Input: the saved $\mathbf{U}_j$, $\mathbf{S}_j$, $\mathbf{V}_j$, $\bar{\mathbf{S}}_j$, and weighting coefficients ($j=1,\dots,J$)
Output: the gradient of $\mathcal{L}_{\mathrm{rank}}$ with respect to $\widehat{\mathbf{X}}$
1:  for $j=1,\dots,J$ do
2:     Compute the gradient with respect to $\widehat{\mathbf{X}}_j$ from $\mathbf{U}_j$, $\mathbf{V}_j$, and the saved weighting coefficients, where $\circ$ denotes the Hadamard product operator.
3:     Accumulate the patch-wise gradient into the gradient with respect to $\widehat{\mathbf{X}}$.
4:  end for
Algorithm 2 Backward Propagation
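For intuition, the following PyTorch sketch implements a rank-style loss in the spirit of Algorithm 1: patches of the reconstructed and ground-truth HS images are flattened into band-by-pixel matrices, and the singular values of the reconstruction that fall inside a prescribed range are pulled toward the ground-truth ones. It relies on autograd instead of the hand-derived backward pass of Algorithm 2, and the patching and weighting details are our assumptions rather than the authors' exact procedure.

```python
import torch

def rank_loss(x_hat, x_gt, patch=48, lower=1e-3, upper=1.0):
    """Sketch of a singular-value-weighted rank loss.

    x_hat, x_gt: HS images of shape (B, H, W) with B spectral bands.
    Patches are flattened into B x (patch*patch) matrices; singular values of
    the reconstruction lying in [lower, upper] are matched to those of the
    ground truth, the others are left untouched.
    """
    loss = x_hat.new_zeros(())
    count = 0
    bands, height, width = x_hat.shape
    for i in range(0, height - patch + 1, patch):
        for j in range(0, width - patch + 1, patch):
            a = x_hat[:, i:i + patch, j:j + patch].reshape(bands, -1)
            b = x_gt[:, i:i + patch, j:j + patch].reshape(bands, -1)
            s_hat = torch.linalg.svdvals(a)          # singular values (descending)
            s_gt = torch.linalg.svdvals(b)
            mask = (s_hat >= lower) & (s_hat <= upper)
            loss = loss + torch.abs(s_hat[mask] - s_gt[mask]).sum()
            count += 1
    return loss / max(count, 1)

x_gt = torch.rand(31, 96, 96)
x_hat = (x_gt + 0.05 * torch.randn_like(x_gt)).requires_grad_(True)
# bounds here are chosen for this toy data, not the paper's settings
rank_loss(x_hat, x_gt, lower=1e-3, upper=1e3).backward()  # autograd supplies the backward pass
```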

IV-D Flexible AGD-Net

In this section, we further extend AGD-Net to increase its practicality and propose the flexible AGD-Net (FAGD-Net), a single network that can handle data captured with various SRFs after only one-time training. Such an extension is enabled by the interpretable architecture of AGD-Net.

Specifically, to adapt to various SRFs, we replace the learnable parameters of the linear projection layers in AGD-Net, i.e., the involved $\omega$ and $\psi$, with explicit SRFs specified by the data, while keeping the remaining settings unchanged. We train FAGD-Net with RGB images acquired with various SRFs to augment its generalization ability. We carry out experiments to validate the effectiveness of FAGD-Net in Sec. V-D.
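As an illustration of this replacement, a known 3×B SRF matrix can simply be copied into the 1×1 projection layer and its transpose into the back-projection layer, turning the learnable projections into fixed, data-specified ones. The construction below is a hypothetical sketch, not the released implementation.

```python
import torch
import torch.nn as nn

num_bands = 31
srf = torch.rand(3, num_bands)  # an explicit, camera-specific SRF (placeholder values)

project = nn.Conv2d(num_bands, 3, kernel_size=1, bias=False)       # HS -> RGB
back_project = nn.Conv2d(3, num_bands, kernel_size=1, bias=False)  # RGB -> HS

with torch.no_grad():
    project.weight.copy_(srf.view(3, num_bands, 1, 1))              # plug in the SRF
    back_project.weight.copy_(srf.t().view(num_bands, 3, 1, 1))     # and its transpose
for p in (project.weight, back_project.weight):
    p.requires_grad_(False)   # the projections are fixed, not learned
```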

Methods | # Params | # FLOPs | PSNR↑ | ASSIM↑ | SAM↓ | RMSE↓
BI | – | – | 23.71 | 0.6945 | 42.54 | 0.0835
HSCNN-D [26] | 3.61 M | 5.22 T | 40.55 | 0.9836 | 5.59 | 0.0110
HIR-Net [13] | 2.10 M | 2.94 T | 39.80 | 0.9861 | 5.70 | 0.0397
3D-CNN [28] | 0.78 M | 8.32 T | 42.25 | 0.9872 | 5.24 | 0.0093
FM-Net [70] | 11.79 M | 17.07 T | 41.34 | 0.9881 | 6.09 | 0.0101
AWAN [29] | 17.45 M | 24.63 T | 43.35 | 0.9919 | 4.93 | 0.0089
Ours | 0.22 M | 0.76 T | 43.97 | 0.9922 | 4.82 | 0.0077
TABLE II: Quantitative comparisons of different methods on the HARVARD dataset. "↑" (resp. "↓") indicates that the larger (resp. smaller), the better. For # Params and # FLOPs, the smaller, the more compact and efficient. The best results are highlighted in bold.
Methods | # Params | # FLOPs | PSNR↑ | ASSIM↑ | SAM↓ | RMSE↓
BI | – | – | 23.73 | 0.8278 | 33.81 | 0.0877
HSCNN-D [26] | 3.61 M | 0.95 T | 35.63 | 0.9733 | 9.63 | 0.0194
HIR-Net [13] | 2.10 M | 0.53 T | 33.97 | 0.9456 | 9.40 | 0.0263
3D-CNN [28] | 0.78 M | 1.53 T | 35.98 | 0.9739 | 8.89 | 0.0182
FM-Net [70] | 11.47 M | 3.09 T | 36.84 | 0.9644 | 8.54 | 0.0179
AWAN [29] | 17.45 M | 4.57 T | 38.41 | 0.9904 | 8.08 | 0.0170
Ours | 0.26 M | 0.14 T | 39.68 | 0.9894 | 6.60 | 0.0138
TABLE III: Quantitative comparisons of different methods on the CAVE dataset. "↑" (resp. "↓") indicates that the larger (resp. smaller), the better. For # Params and # FLOPs, the smaller, the more compact and efficient. The best results are highlighted in bold.
Methods | # Params | # FLOPs | PSNR↑ | ASSIM↑ | SAM↓ | RMSE↓
BI | – | – | 30.85 | 0.9075 | 8.48 | 0.0394
HSCNN-D [26] | 3.61 M | 0.890 T | 41.42 | 0.9946 | 3.17 | 0.0120
HIR-Net [13] | 2.01 M | 0.532 T | 35.26 | 0.9862 | 4.27 | 0.0190
3D-CNN [28] | 0.78 M | 1.440 T | 40.81 | 0.9938 | 3.12 | 0.0124
FM-Net [70] | 11.47 M | 2.955 T | 42.36 | 0.9950 | 3.10 | 0.0118
AWAN [29] | 17.45 M | 4.300 T | 41.99 | 0.9948 | 3.22 | 0.0112
Ours | 0.51 M | 0.258 T | 43.39 | 0.9953 | 2.75 | 0.0101
TABLE IV: Quantitative comparisons of different methods on the NTIRE 2020 dataset. "↑" (resp. "↓") indicates that the larger (resp. smaller), the better. For # Params and # FLOPs, the smaller, the more compact and efficient. The best results are highlighted in bold.

V Experiments

V-A Experiment Settings and Implementation Details

We used 3 widely-used benchmark datasets, i.e., HARVARD (http://vision.seas.harvard.edu/hyperspec/) [6], CAVE (http://www.cs.columbia.edu/CAVE/databases/) [65], and NTIRE 2020 (http://www.vision.ee.ethz.ch/ntire20/) [5]:

  • The CAVE dataset consists of 32 HS images of spatial dimensions 512×512 with 31 spectral bands, captured by a generalized assorted pixel camera at a wavelength interval of 10 nm in the range of 400-700 nm. We randomly selected 20 HS images as the training set and the remaining 12 as the testing set. Following [57], [70], we generated the input RGB images using the camera spectral response function of the Nikon D700 (a sketch of this simulation is given after this list).

  • The HARVARD dataset contains 50 indoor and outdoor HS images with 31 spectral bands covering 420-720 nm, captured under daylight illumination. We utilized the first 30 HS images as the training set and the remaining 20 as the testing set. Following [57], [70], we generated the input RGB images using the camera spectral response function of the Nikon D700.

  • The NTIRE 2020 dataset contains 450 HS/RGB image pairs for training, 10 pairs for validation, and 20 pairs for testing. The HS images have 31 spectral bands covering 400-700 nm. As the ground-truth images of the test set are unavailable, we adopted the validation set for evaluation.
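For reference, the following NumPy sketch shows the standard way such RGB/HS training pairs are simulated: each pixel's spectrum is integrated against the camera's three SRF curves. The SRF values below are random placeholders rather than the Nikon D700 curves, and the normalization is a simplification.

```python
import numpy as np

def hs_to_rgb(hs: np.ndarray, srf: np.ndarray) -> np.ndarray:
    """Simulate an RGB image from an HS image.

    hs:  HS image of shape (H, W, B) with B spectral bands.
    srf: camera spectral response function of shape (3, B).
    """
    rgb = np.tensordot(hs, srf, axes=([2], [1]))   # (H, W, 3)
    return rgb / rgb.max()                         # simple normalization

hs = np.random.rand(64, 64, 31)     # toy HS image
srf = np.random.rand(3, 31)         # placeholder SRF (not a real camera)
rgb = hs_to_rgb(hs, srf)
```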

We adopted the ADAM [27] optimizer with exponential decay rates for the first and second moment estimates. We initialized the learning rate of our AGD-Net and employed the cosine annealing decay strategy to gradually decrease it during training. We empirically set the hyper-parameters in Algorithm 1 to 48, 48, 1e-3, and 1, respectively. During training, we fixed the number of training epochs to 500 for all experiments. We implemented the model with PyTorch, and set the batch size to 8 for CAVE and HARVARD and 6 for NTIRE 2020.

For a comprehensive quantitative evaluation, we adopted 4 commonly-used quantitative metrics, i.e., Peak Signal-to-Noise Ratio (PSNR), Average Structural Similarity Index (ASSIM) [56], Spectral Angle Mapper (SAM) [66], and Root Mean Squared Error (RMSE), which are respectively defined as:

$\mathrm{PSNR}=\frac{1}{B}\sum_{b=1}^{B}10\log_{10}\frac{V_{\max}^{2}}{\mathrm{MSE}\left(\widehat{\mathbf{X}}_{b},\mathbf{X}_{b}\right)}$,   (14)

where $\widehat{\mathbf{X}}_{b}$ and $\mathbf{X}_{b}$ are the $b$-th ($1\leq b\leq B$) spectral bands of $\widehat{\mathbf{X}}$ and $\mathbf{X}$, respectively, $V_{\max}$ is the peak intensity value, and $\mathrm{MSE}(\cdot,\cdot)$ computes the mean squared error between the inputs.

$\mathrm{ASSIM}=\frac{1}{B}\sum_{b=1}^{B}\mathrm{SSIM}\left(\widehat{\mathbf{X}}_{b},\mathbf{X}_{b}\right)$,   (15)

where $\mathrm{SSIM}(\cdot,\cdot)$ [55] computes the SSIM value of a typical spectral band.

$\mathrm{SAM}=\frac{1}{HW}\sum_{i=1}^{HW}\arccos\frac{\left\langle\widehat{\mathbf{x}}_{i},\mathbf{x}_{i}\right\rangle}{\left\|\widehat{\mathbf{x}}_{i}\right\|_{2}\left\|\mathbf{x}_{i}\right\|_{2}}$,   (16)

where $\widehat{\mathbf{x}}_{i}$ and $\mathbf{x}_{i}$ are the spectral signatures of the $i$-th ($1\leq i\leq HW$) pixels of $\widehat{\mathbf{X}}$ and $\mathbf{X}$, respectively, $\|\cdot\|_{2}$ is the $\ell_2$ norm of a vector, and $\langle\cdot,\cdot\rangle$ calculates the inner product of two vectors.

$\mathrm{RMSE}=\sqrt{\frac{1}{BHW}\sum_{j=1}^{BHW}\left(\widehat{X}_{j}-X_{j}\right)^{2}}$,   (17)

where $\widehat{X}_{j}$ and $X_{j}$ are the $j$-th elements of $\widehat{\mathbf{X}}$ and $\mathbf{X}$, respectively.
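A compact NumPy sketch of how three of these metrics can be computed from a reconstructed and a ground-truth HS cube, assuming intensities normalized to [0, 1] and SAM reported in degrees; ASSIM would additionally average a per-band SSIM computed with an existing SSIM implementation and is omitted here for brevity.

```python
import numpy as np

def psnr(x_hat, x, peak=1.0):
    # Band-wise PSNR (Eq. (14)), averaged over the B spectral bands.
    mse = ((x_hat - x) ** 2).mean(axis=(1, 2))
    return float(np.mean(10 * np.log10(peak ** 2 / mse)))

def sam(x_hat, x, eps=1e-12):
    # Spectral Angle Mapper (Eq. (16)), averaged over all pixels, in degrees.
    a = x_hat.reshape(x_hat.shape[0], -1)
    b = x.reshape(x.shape[0], -1)
    cos = (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + eps)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())

def rmse(x_hat, x):
    # Root mean squared error (Eq. (17)) over all elements.
    return float(np.sqrt(((x_hat - x) ** 2).mean()))

x_gt = np.random.rand(31, 128, 128)                     # (bands, H, W)
x_rec = np.clip(x_gt + 0.01 * np.random.randn(*x_gt.shape), 0, 1)
print(psnr(x_rec, x_gt), sam(x_rec, x_gt), rmse(x_rec, x_gt))
```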

In addition, we also report the number of neural network parameters (# Params) and the number of floating-point operations per inference (# FLOPs) of the DNN-based methods to compare their efficiency.

Fig. 2: Visual comparison of 3 HS images from the HARVARD dataset reconstructed by 7 different methods against the ground-truth images. To visualize HS images, we extracted three spectral bands from an HS image as the red, green, and blue channels, respectively, to generate a pseudo-color image. (a) Ground-truth HS images, (b) Bicubic interpolation, (c) 3D-CNN, (d) HSCNN, (e) HIR-Net, (f) FM-Net, (g) AWAN, (h) Ours. For each subfigure, the bottom-left is the zoomed-in patch indicated by the green frame in the pseudo-color image, and the bottom-right shows the spectral curves of a typical pixel (red line) and its ground truth (green line), with the position of the selected pixel marked by the red rectangle in the pseudo-color image.
Fig. 3: Visual comparison of 3 HS images reconstructed by 7 different methods against the ground-truth HS images. To visualize HS images, we extracted three spectral bands from an HS image as the red, green, and blue channels, respectively, to generate a pseudo-color image. (a) Ground-truth HS images, (b) Bicubic interpolation, (c) 3D-CNN, (d) HSCNN, (e) HIR-Net, (f) FM-Net, (g) AWAN, (h) Ours. For each subfigure, the bottom-left is the zoomed-in patch indicated by the green frame in the pseudo-color image, and the bottom-right shows the spectral curves of a typical pixel (red line) and its ground truth (green line), with the position of the selected pixel marked by the red rectangle in the pseudo-color image. The top two and bottom one testing images are from the CAVE and NTIRE 2020 datasets, respectively.
Row | PSNR↑ | ASSIM↑ | SAM↓ | RMSE↓
1st | 42.85 | 0.9950 | 3.01 | 0.0116
2nd | 42.40 | 0.9945 | 3.29 | 0.0115
3rd | 42.90 | 0.9953 | 2.98 | 0.0106
4th | 42.93 | 0.9956 | 2.98 | 0.0111
5th | 41.69 | 0.9952 | 3.18 | 0.0122
6th | 43.02 | 0.9956 | 2.97 | 0.0110
7th | 43.39 | 0.9953 | 2.75 | 0.0101
TABLE V: Results of ablation studies on the NTIRE 2020 dataset, where in each of the first six rows one component (the learned initialization, the incremental gradient, the sharing of the projection parameters, the SZM-norm, or a loss term) was removed when training AGD-Net. The bottom row corresponds to the complete AGD-Net. 1st row: we utilized a linear convolutional layer of kernel size 1×1 to replace the learned initialization module to learn the mapping from the input RGB image to the initialization; 2nd row: we removed all the sub-networks learning the incremental gradient; 3rd row: we learned the parameters of the projection layers involved in different stages independently; 4th row: we removed all SZM-norm layers; 5th row: we removed one loss term during training; 6th row: we removed the other loss term during training; 7th row: the full model.
Fig. 4: Quantitative comparison of different methods in terms of (a) PSNR↑, (b) ASSIM↑, (c) SAM↓, and (d) RMSE↓, tested on RGB images obtained with 9 different SRFs listed on the horizontal axis to evaluate their flexibility. For each method, a single network was trained with RGB images obtained with 15 different SRFs. The four subfigures share the same legend shown in (d). "↓" (resp. "↑") indicates the lower (resp. the higher), the better.

V-B Comparison with State-of-the-Art Methods

We compared AGD-Net with 6 methods, including bicubic interpolation (BI) over the spectral dimension as a baseline and 5 recent DNN-based methods, i.e., HSCNN-D [47], 3D-CNN [28], HIR-Net [13], AWAN [29], and FM-Net [70]. Note that HSCNN-D and AWAN are the champion models of the NTIRE 2018 [3] and NTIRE 2020 [28] challenges on spectral reconstruction from an RGB image, respectively. For fair comparisons, we applied the same data pre-processing to all the methods, trained all the DNN-based methods on the same training data using the released codes with the suggested parameters, and adopted the same protocol as [9, 54] to evaluate the experimental results of all the methods.

Tables II, III and IV list the quantitative comparisons of different methods on the three benchmark datasets, where it can be seen that AGD-Net consistently surpasses all the compared methods in terms of all four metrics, while consuming much fewer network parameters and FLOPs. Especially, AGD-Net improves the PSNR by 1.27 dB (resp. 1.4 dB) and also reduces the SAM on the CAVE (resp. NTIRE 2020) dataset, while saving more than 67× (resp. 32×) parameters and 32× (resp. 16×) FLOPs, compared with the second-best method.

Figs. 2 and 3 visually compare different methods by showing their pseudo-color images and spectral curves, which further validate the significant superiority of our AGD-Net. Particularly, the compared methods cannot well handle regions either with high-frequency details (e.g., the branches and the flower patterns in Fig. 2, and the seeds of the strawberries in Fig. 3) or with smooth textures (e.g., the wall in Fig. 2, and the strawberries in Fig. 3). By contrast, our AGD-Net produces much better results in these regions. Besides, the spectral curves produced by our method are closer to the ground-truth ones, e.g., in the range of 600-720 nm in Fig. 2 and the range of 500-720 nm in Fig. 3. Such advantages are credited to the fact that AGD-Net, built upon an explicit observation model, is able to easily distinguish high-frequency and low-frequency regions and reconstruct them separately according to the projection errors.

V-C Ablation Study

We conducted extensive ablation studies to have a comprehensive understanding of AGD-Net.

First, we experimentally validated the effectiveness of the initialization module, the learning of the incremental gradient, the manner of sharing the projection parameters, the SZM-norm operation, and the loss function. As listed in Table V, we can see that compared with the complete model, the reconstruction quality decreases after removing any one of these modules/operations, convincingly validating their effectiveness. Particularly, the PSNR drops by about 1 dB without learning the incremental gradient, which demonstrates the rationality of our formulation of the amended gradient descent. In addition, we observe that the self-supervised loss makes significant contributions to the reconstruction process. The reason is that such a loss not only regularizes the output HS images but also forces the network to regress the SRF for the correct calculation of the error maps in each module.

We also investigated how the number of stages affects the reconstruction performance. Note that the initialization module is also counted as one stage. As shown in Fig. 5, the performance of AGD-Net in terms of all four metrics gradually improves as the number of stages increases and gets saturated at 6 stages. Thus, in all experiments, we set the number of stages to 5, 6, and 12 for the HARVARD, CAVE, and NTIRE datasets, respectively.

Fig. 5: Investigation of the performance of AGD-Net with different numbers of stages on the CAVE dataset.

V-D Evaluation of the FAGD-Net

We used spectral response functions (SRFs) of 15 different cameras [25] to construct the training set, i.e., Canon1DMarkIII, Canon5DMarkII, NikonD300s, NikonD50, NokiaN900, Canon40D, Canon600D, NikonD3X, NikonD80, PhaseOne, Canon 500D, HasselbladH2, NikonD40, NikonD90, and PointGreyGrasshopper214S5C. We generated the testing RGB images using SRFs of cameras Canon20D, Canon50D, NikonD200, NikonD5100, PentaxQ, Canon300D, Canon60D, NikonD3, and NikonD700. The first 30 HS images from the HARVARD dataset were used for training, and the remaining 20 ones for testing. As the compared methods cannot utilize an SRF in an explicit manner, we projected the 30 HS images with respect to the 15 training SRFs to generate 450 pairs of HS and RGB images to train them. Note only a single network was trained for each method.

Fig. 4 shows the quantitative comparison of different methods, where it can be seen that our FAGD-Net consistently exceeds the other methods by a significant margin on all 9 SRFs, e.g., the improvement in PSNR reaches 5.5 dB and the reduction in SAM reaches about 3 on Canon300D, validating the strong flexibility and generalization ability of FAGD-Net to different SRFs, which is credited to the interpretable network architecture.

VI Conclusion

We have presented AGD-Net, a novel end-to-end learning framework for the reconstruction of HS images from single RGB images. As a neural network built upon an explicit formulation of the problem and the gradient descent algorithm, AGD-Net is interpretable and compact. In addition to blind reconstruction, i.e., when the SRFs are unknown, AGD-Net is also adapted to non-blind reconstruction by explicitly utilizing known SRFs, distinguishing itself from its deep learning peers in flexibility: once trained, a single AGD-Net is able to handle input RGB images obtained via different SRFs well. We demonstrated the significant advantages of AGD-Net over state-of-the-art methods by conducting extensive experiments as well as comprehensive ablation studies: AGD-Net improves the PSNR by up to 5.5 dB and reduces the SAM by up to about 3, while saving up to 67× parameters and 32× FLOPs. We believe our new perspective will bring insights to other inverse problems in image processing, such as image super-resolution, deblurring, and compressive sensing.

References

  • [1] J. Aeschbacher, J. Wu, and R. Timofte (2017) In defense of shallow learned spectral reconstruction from rgb images. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 471–479. Cited by: §II-A.
  • [2] A. Alvarez-Gila, J. Van De Weijer, and E. Garrote (2017) Adversarial networks for spatial context-aware spectral image reconstruction from rgb. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 480–490. Cited by: §II-B.
  • [3] B. Arad, O. Ben-Shahar, and R. Timofte (2018) NTIRE 2018 challenge on spectral reconstruction from rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1042–104209. Cited by: §V-B.
  • [4] B. Arad and O. Ben-Shahar (2016) Sparse recovery of hyperspectral signal from natural rgb images. In Proceedings of the European Conference on Computer Vision, pp. 19–34. Cited by: §II-A.
  • [5] B. Arad, R. Timofte, O. Ben-Shahar, Y. Lin, and G. D. Finlayson (2020) Ntire 2020 challenge on spectral reconstruction from an rgb image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 446–447. Cited by: §I, §V-A.
  • [6] A. Chakrabarti and T. Zickler (2011) Statistics of real-world hyperspectral images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 193–200. Cited by: §V-A.
  • [7] Y. Chang, L. Yan, X. Zhao, H. Fang, Z. Zhang, and S. Zhong (2020) Weighted low-rank tensor recovery for hyperspectral image restoration. IEEE Transactions on Cybernetics. Cited by: §IV-C.
  • [8] X. Deng and P. L. Dragotti (2019) Deep coupled ista network for multi-modal image super-resolution. IEEE Transactions on Image Processing 29, pp. 1683–1698. Cited by: §II-C.
  • [9] R. Dian, S. Li, and L. Fang (2019) Learning a low tensor-train rank representation for hyperspectral image super-resolution. IEEE Transactions on Neural Networks and Learning Systems 30 (9), pp. 2672–2683. Cited by: §IV-C, §V-B.
  • [10] R. Dian and S. Li (2019) Hyperspectral image super-resolution via subspace-based low tensor multi-rank regularization. IEEE Transactions on Image Processing 28 (10), pp. 5135–5146. Cited by: §IV-C.
  • [11] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, and X. Li (2016) Hyperspectral image super-resolution via non-negative structured sparse representation. IEEE Transactions on Image Processing 25 (5), pp. 2337–2352. Cited by: §I.
  • [12] Y. Fu, C. Sun, L. Wang, and H. Huang (2018) Snapshot multiplexed imaging based on compressive sensing. In Pacific Rim Conference on Multimedia, pp. 465–475. Cited by: §I.
  • [13] Y. Fu, T. Zhang, Y. Zheng, D. Zhang, and H. Huang (2018) Joint camera spectral sensitivity selection and hyperspectral image recovery. In Proceedings of the European Conference on Computer Vision, pp. 788–804. Cited by: §II-B, TABLE II, TABLE III, TABLE IV, §V-B.
  • [14] Y. Fu, Y. Zheng, L. Zhang, and H. Huang (2018) Spectral reflectance recovery from a single rgb image. IEEE Transactions on Computational Imaging 4 (3), pp. 382–394. Cited by: §II-B.
  • [15] S. Galliani, C. Lanaras, D. Marmanis, E. Baltsavias, and K. Schindler (2017) Learned spectral super-resolution. arXiv preprint arXiv:1703.09470. Cited by: §I, §II-B.
  • [16] L. Gao, D. Hong, J. Yao, B. Zhang, P. Gamba, and J. Chanussot (2020) Spectral superresolution of multispectral imagery with joint sparse and low-rank learning. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §II-A.
  • [17] U. B. Gewali, S. T. Monteiro, and E. Saber (2019) Spectral super-resolution with optimized bands. Remote Sensing 11 (14), pp. 1648. Cited by: §II-B.
  • [18] G. H. Golub and C. F. Van Loan (2013) Matrix computations, 4th. Johns Hopkins. Cited by: 3.
  • [19] K. Gregor and Y. LeCun (2010) Learning fast approximations of sparse coding. In Proceedings of International Conference on Machine Learning, pp. 399–406. Cited by: §II-C.
  • [20] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §IV-B2.
  • [21] W. He, N. Yokoya, and X. Yuan (2021) Fast hyperspectral image recovery via non-iterative fusion of dual-camera compressive hyperspectral imaging. IEEE Transactions on Image Processing. Cited by: §I.
  • [22] V. Heikkinen (2018) Spectral reflectance estimation using gaussian processes and combination kernels. IEEE Transactions on Image Processing 27 (7), pp. 3358–3373. Cited by: §II-A.
  • [23] J. Hu, Y. Li, and W. Xie (2017) Hyperspectral image super-resolution by spectral difference learning and spatial error correction. IEEE Geoscience and Remote Sensing Letters 14 (10), pp. 1825–1829. Cited by: §I.
  • [24] Y. Jia, Y. Zheng, L. Gu, A. Subpa-Asa, A. Lam, Y. Sato, and I. Sato (2017) From rgb to spectrum for natural scenes via manifold-based mapping. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4705–4713. Cited by: §II-A.
  • [25] J. Jiang, D. Liu, J. Gu, and S. Süsstrunk (2013) What is the space of spectral sensitivity functions for digital color cameras?. In IEEE Workshop on Applications of Computer Vision, pp. 168–179. Cited by: §V-D.
  • [26] B. Kaya, Y. B. Can, and R. Timofte (2019) Towards spectral estimation from a single rgb image in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 3546–3555. Cited by: §II-B, TABLE II, TABLE III, TABLE IV.
  • [27] D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In Proceedings of 3rd International Conference on Learning Representations (ICLR), Cited by: §V-A.
  • [28] S. Koundinya, H. Sharma, M. Sharma, A. Upadhyay, R. Manekar, R. Mukhopadhyay, A. Karmakar, and S. Chaudhury (2018) 2d-3d cnn based architectures for spectral reconstruction from rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 844–851. Cited by: TABLE II, TABLE III, TABLE IV, §V-B.
  • [29] J. Li, C. Wu, R. Song, Y. Li, and F. Liu (2020) Adaptive weighted attention network with camera spectral sensitivity prior for spectral reconstruction from rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 462–463. Cited by: §I, §II-B, TABLE II, TABLE III, TABLE IV, §V-B.
  • [30] J. Li, C. Wu, R. Song, W. Xie, C. Ge, B. Li, and Y. Li (2020) Hybrid 2-d-3-d deep residual attentional network with structure tensor constraints for spectral super-resolution of rgb images. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
  • [31] Q. Li, Q. Wang, and X. Li (2021) Exploring the relationship between 2d/3d convolution for hyperspectral image super-resolution. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
  • [32] Y. Li, J. Hu, X. Zhao, W. Xie, and J. Li (2017) Hyperspectral image super-resolution using deep convolutional neural network. Neurocomputing 266, pp. 29–41. Cited by: §I.
  • [33] D. Liu, J. Li, and Q. Yuan (2021) A spectral grouping and attention-driven residual dense network for hyperspectral image super-resolution. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
  • [34] H. Liu, Y. Jia, J. Hou, and Q. Zhang (2021) Global-local balanced low-rank approximation of hyperspectral images for classification. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §IV-C.
  • [35] N. Liu, L. Li, W. Li, R. Tao, J. E. Fowler, and J. Chanussot (2021) Hyperspectral restoration and fusion with multispectral imagery via low-rank tensor-approximation. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I.
  • [36] S. Lohit, D. Liu, H. Mansour, and P. T. Boufounos (2019) Unrolled projected gradient descent for multi-spectral image fusion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7725–7729. Cited by: §II-C.
  • [37] J. Ma, X. Liu, Z. Shou, et al. (2019) Deep tensor admm-net for snapshot compressive imaging. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10223–10232. Cited by: §II-C.
  • [38] R. Mayer, F. Bucholtz, and D. Scribner (2003) Object detection by using "whitening/dewhitening" to transform target signatures in multitemporal hyperspectral and multispectral imagery. IEEE Transactions on Geoscience and Remote Sensing 41 (5), pp. 1136–1142. Cited by: §I.
  • [39] S. Mei, J. Hou, J. Chen, et al. (2018) Simultaneous spatial and spectral low-rank representation of hyperspectral images for classification. IEEE Transactions on Geoscience and Remote Sensing 56 (5), pp. 2872–2886. Cited by: §IV-C.
  • [40] Z. Meng, J. Ma, and X. Yuan (2020) End-to-end low cost compressive spectral imaging with spatial-spectral self-attention. In Proceedings of the European Conference on Computer Vision, pp. 187–204. Cited by: §I.
  • [41] V. Monga, Y. Li, and Y. C. Eldar (2021) Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine 38 (2), pp. 18–44. Cited by: §II-C.
  • [42] J. Nalepa, M. Myller, and M. Kawulok (2019) Validating hyperspectral image segmentation. IEEE Geoscience and Remote Sensing Letters 16 (8), pp. 1264–1268. Cited by: §I.
  • [43] R. M. Nguyen, D. K. Prasad, and M. S. Brown (2014) Training-based spectral reconstruction from a single rgb image. In Proceedings of the European Conference on Computer Vision, pp. 186–201. Cited by: §II-A.
  • [44] H. Peng, X. Chen, and J. Zhao (2020) Residual pixel attention network for spectral reconstruction from rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 486–487. Cited by: §I, §II-B.
  • [45] Y. Qu, H. Qi, and C. Kwan (2018) Unsupervised sparse dirichlet-net for hyperspectral image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2511–2520. Cited by: §I.
  • [46] N. Sharma and M. Hefeeda (2020) Hyperspectral reconstruction from rgb images for vein visualization. In Proceedings of the 11th ACM Multimedia Systems Conference, pp. 77–87. Cited by: §I.
  • [47] Z. Shi, C. Chen, Z. Xiong, D. Liu, and F. Wu (2018) Hscnn+: advanced cnn-based hyperspectral recovery from rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 939–947. Cited by: §I, §II-B, §V-B.
  • [48] J. Sun, H. Li, Z. Xu, et al. (2016) Deep admm-net for compressive sensing mri. In Advances in neural information processing systems, pp. 10–18. Cited by: §II-C.
  • [49] R. Timofte, V. De Smet, and L. Van Gool (2014) A+: adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pp. 111–126. Cited by: §II-A.
  • [50] L. Wang, C. Sun, Y. Fu, et al. (2019) Hyperspectral image reconstruction using a deep spatial-spectral prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8032–8041. Cited by: §II-C.
  • [51] L. Wang, C. Sun, M. Zhang, Y. Fu, and H. Huang (2020) DNU: deep non-local unrolling for computational spectral imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1661–1671. Cited by: §I, §II-C.
  • [52] L. Wang, Z. Xiong, H. Huang, G. Shi, F. Wu, and W. Zeng (2018) High-speed hyperspectral video acquisition by combining nyquist and compressive sampling. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (4), pp. 857–870. Cited by: §I.
  • [53] L. Wang, Z. Xiong, G. Shi, F. Wu, and W. Zeng (2016) Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (10), pp. 2104–2111. Cited by: §I.
  • [54] W. Wang, W. Zeng, Y. Huang, X. Ding, and J. Paisley (2019) Deep blind hyperspectral image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4150–4159. Cited by: §V-B.
  • [55] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §V-A.
  • [56] Z. Wang and A. C. Bovik (2002) A universal image quality index. IEEE Signal Processing Letters 9 (3), pp. 81–84. Cited by: §V-A.
  • [57] W. Wei, Y. Sun, L. Zhang, J. Nie, and Y. Zhang (2020) Boosting one-shot spectral super-resolution using transfer learning. IEEE Transactions on Computational Imaging 6, pp. 1459–1470. Cited by: 1st item, 2nd item.
  • [58] B. Wen, U. S. Kamilov, D. Liu, H. Mansour, and P. T. Boufounos (2018) DeepCASD: an end-to-end approach for multi-spectral image super-resolution. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6503–6507. Cited by: §I, §II-C.
  • [59] Q. Xie, M. Zhou, Q. Zhao, et al. (2019) Multispectral and hyperspectral image fusion by ms/hs fusion net. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1585–1594. Cited by: §II-C.
  • [60] F. Xiong, J. Zhou, and Y. Qian (2020) Material based object tracking in hyperspectral videos. IEEE Transactions on Image Processing 29, pp. 3719–3733. Cited by: §I.
  • [61] Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu (2017) Hscnn: cnn-based hyperspectral image recovery from spectrally undersampled projections. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 518–525. Cited by: §I, §II-B.
  • [62] Y. Xu, Z. Wu, J. Chanussot, P. Comon, and Z. Wei (2019) Nonlocal coupled tensor cp decomposition for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing 58 (1), pp. 348–362. Cited by: §I.
  • [63] L. Yan, X. Wang, M. Zhao, M. Kaloorazi, J. Chen, and S. Rahardja (2020) Reconstruction of hyperspectral data from rgb images with prior category information. IEEE Transactions on Computational Imaging 6, pp. 1070–1081. Cited by: §II-B.
  • [64] J. Yao, D. Hong, J. Chanussot, D. Meng, X. Zhu, and Z. Xu (2020) Cross-attention in coupled unmixing nets for unsupervised hyperspectral super-resolution. In Proceedings of the European Conference on Computer Vision, pp. 208–224. Cited by: §I.
  • [65] F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar (2010) Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum. IEEE Transactions on Image Processing 19 (9), pp. 2241–2253. Cited by: §V-A.
  • [66] R. H. Yuhas, A. F. Goetz, and J. W. Boardman (1992) Discrimination among semi-arid landscape endmembers using the spectral angle mapper (sam) algorithm. In Proc. Summaries 3rd Annu. JPL Airborne Geosci. Workshop, Vol. 1, pp. 147–149. Cited by: §V-A.
  • [67] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan (2013) Hyperspectral image restoration using low-rank matrix recovery. IEEE Transactions on Geoscience and Remote Sensing 52 (8), pp. 4729–4743. Cited by: §IV-C.
  • [68] H. Zhang, L. Zhang, and H. Shen (2012) A super-resolution reconstruction algorithm for hyperspectral images. Signal Processing 92 (9), pp. 2082–2096. Cited by: §I.
  • [69] K. Zhang, W. Zuo, and L. Zhang (2019) Deep plug-and-play super-resolution for arbitrary blur kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1671–1681. Cited by: §II-C.
  • [70] L. Zhang, Z. Lang, P. Wang, W. Wei, S. Liao, L. Shao, and Y. Zhang (2020) Pixel-aware deep function-mixture network for spectral super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 12821–12828. Cited by: §I, §II-B, TABLE II, TABLE III, TABLE IV, 1st item, 2nd item, §V-B.
  • [71] Y. Zhao, L. Po, Q. Yan, W. Liu, and T. Lin (2020) Hierarchical regression network for spectral reconstruction from rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 422–423. Cited by: §II-B.
  • [72] S. Zheng, Y. Liu, Z. Meng, M. Qiao, Z. Tong, X. Yang, S. Han, and X. Yuan (2021) Deep plug-and-play priors for spectral snapshot compressive imaging. Photonics Research 9 (2), pp. B18–B29. Cited by: §I.
  • [73] Z. Zhu, J. Hou, J. Chen, H. Zeng, and J. Zhou (2021) Hyperspectral image super-resolution via deep progressive zero-centric residual learning. IEEE Transactions on Image Processing 30, pp. 1423–1438. Cited by: §IV-A.