Hierarchical Regression Network for Spectral Reconstruction from RGB Images

05/10/2020
by Yuzhi Zhao, et al.
City University of Hong Kong

Capturing visual images with a hyperspectral camera has been successfully applied in many areas due to its narrow-band imaging technology. Hyperspectral reconstruction from RGB images is the reverse process of hyperspectral imaging: it recovers an inverse response function. Current works mainly map RGB images directly to the corresponding spectrum but do not consider context information explicitly. Moreover, the use of an encoder-decoder pair in current algorithms leads to loss of information. To address these problems, we propose a 4-level Hierarchical Regression Network (HRNet) with PixelShuffle layers as inter-level interaction. Furthermore, we adopt a residual dense block to remove artifacts of real world RGB images and a residual global block to build an attention mechanism that enlarges the receptive field. We evaluate the proposed HRNet against other architectures and techniques by participating in the NTIRE 2020 Challenge on Spectral Reconstruction from RGB Images. HRNet is the winning method of track 2 - real world images and ranks 3rd on track 1 - clean images. Please visit the project web page https://github.com/zhaoyuzhi/Hierarchical-Regression-Network-for-Spectral-Reconstruction-from-RGB-Images to try our codes and pre-trained models.


1 Introduction

Hyperspectral (HS) imaging technology densely samples the spectral signature into many narrow bands. It combines imaging with spectroscopy to capture the two-dimensional geometric space and one-dimensional spectral information of a target, yielding continuous, narrow-band images with high spectral resolution. Normally, most consumer cameras capture only three primary colors, whereas HS spectrometers obtain the spectrum of each pixel in the scene and collect the information into a set of images. To visualize HS images, a response function is adopted to transform them into RGB format. Conversely, we can acquire HS images from the visible format by learning the inverse function. In this paper, we propose a general hierarchical regression network (HRNet) for spectral reconstruction from RGB images.

HS imaging technology has many advantages and particular characteristics, and it underpins many applications, e.g., remote sensing [25], pedestrian detection [17, 23], food processing [29], and medical imaging [2]. However, in recent years the development of HS imaging has hit a bottleneck because it mainly depends on spectrometers. Traditional spectrometers save images of huge volume and need long operation times, which keeps HS imaging away from portable platforms and high-speed moving scenes [28]. Although researchers have continuously optimized the traditional pipeline [7, 35], these hardware devices remain expensive and highly complex. Thus, we present a low-cost, automatic approach based only on RGB cameras: an HRNet that learns the mapping from RGB images to their corresponding HS projections.

In general, spectral reconstruction is an ill-posed problem. Moreover, unknown environmental noise degrades the RGB images. However, there is dense correspondence between RGB images and HS images, making it possible to exploit the correlation from many RGB-HS pairs. Since an RGB image carries much less information than an HS image, many plausible HS images may correspond to the same RGB input. The algorithm therefore needs to learn a reasonable mapping function that produces high-quality HS images. With the development of deep convolutional neural networks (CNNs), learning this blind mapping for spectral reconstruction becomes feasible.

Previous methods [32, 21, 33, 6, 36] mainly utilize an auto-encoder structure with residual blocks [14]. Such networks often perform convolution at low spatial resolution since the features are more compact and the computation is more efficient. However, as the network goes deeper, it fails to retain the original pixel information because down-sampling is performed by strided convolutions. To address this problem, we introduce a lossless and learnable sampling operator, PixelShuffle [31]. To further boost the quality of generated images, we propose a hierarchical architecture that extracts features at different scales. At each level, the input is obtained by the inverse of PixelShuffle (PixelUnShuffle), so that no pixel is lost. Moreover, we use a residual dense block and a residual global block in HRNet for removing artifacts and noise and for modeling long-range pixel correlations, respectively.
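To make the contrast with strided down-sampling concrete, the following minimal PyTorch sketch verifies that the PixelUnShuffle/PixelShuffle pair is an exact, invertible rearrangement of pixels (the layer names follow the PyTorch API; the sketch is illustrative and not taken from the authors' code):

```python
# PixelUnShuffle rearranges each 2x2 spatial block into 4 channels, and
# PixelShuffle inverts it exactly, so no pixel information is discarded.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)        # an RGB feature map
down = F.pixel_unshuffle(x, 2)     # -> (1, 12, 4, 4): half resolution, 4x channels
up = F.pixel_shuffle(down, 2)      # -> (1, 3, 8, 8): exact inverse
assert torch.equal(x, up)          # the round trip loses nothing
```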

In general, there are three main contributions of this paper:

(1) We propose an HRNet that utilizes PixelUnShuffle and PixelShuffle layers for downsampling and upsampling without information loss. We also propose a residual dense block and a residual global block to enlarge the receptive field and boost generation quality;

(2) We propose an 8-setting ensemble strategy to further enhance the generalization of HRNet;

(3) We evaluate the proposed HRNet on the NTIRE 2020 HS dataset. HRNet is the winning method of track 2 - real world images and ranks 3rd on track 1 - clean images.

2 Related work

Hyperspectral image acquisition. Conventional methods for hyperspectral image acquisition often adopt a spectrograph with spatial-scanning or spectral-scanning technology. Several types of scanner are used for capturing images, including pushbroom, whiskbroom, and band-sequential scanners. They have served applications such as detection, environmental monitoring, and remote sensing for decades; for instance, pushbroom and whiskbroom scanners are used for photogrammetry and remote sensing on satellite sensors [28, 5]. However, these devices must capture the spectral information of single points or bands separately and then scan the whole scene to obtain a full HS image, which makes scenes with moving objects difficult to capture. In addition, they are physically large and unsuitable for portable platforms. To address these problems, many kinds of non-scanning spectrometers have been developed for dynamic scenes [10, 7, 35].

Hyperspectral image reconstruction from RGB images. Since traditional methods for hyperspectral image acquisition are either not portable or too time-consuming for many applications, current methods attempt to reconstruct hyperspectral images from RGB images. By learning the mapping from RGB images to hyperspectral images on a large RGB-HS dataset, obtaining many HS images becomes far more convenient. Recent years have witnessed various studies based on sparse coding and deep learning. In 2008, Parmar et al. [27] proposed a data-sparsity expanding method to recover the spatio-spectral data cube. Arad et al. [3] first leveraged HS priors to create a sparse dictionary of HS signatures and their corresponding RGB projections, while Aeschbacher et al. [1] pushed the performance of Arad et al.'s method to better accuracy and runtime based on the A+ framework [34].

Beyond the dataset provided by Arad et al. [3], many approaches introduced their own datasets. For instance, Yasuma et al. [37] used a CCD camera (Apogee Alta U260) to capture 31-band multispectral images (400-700 nm, at 10 nm intervals) of several static scenes. Nguyen et al. [26] captured a dataset of 64 images with a Specim PFD-CL-65-V10E (400 nm to 1000 nm) spectral camera. Chakrabarti et al. [8] explored a statistical model based on 55 HS images of indoor and outdoor scenes. As the scale and resolution of natural HS datasets improved, training deep learning methods became more feasible, and a number of CNN-based algorithms were proposed [21, 33]. Jégou et al. [20] proposed the fully convolutional, densely connected one-hundred-layer "Tiramisu" network for semantic segmentation; Galliani et al. [11] enhanced it for spectral image super-resolution, and Can et al. [6] improved it to avoid overfitting the training data and to obtain faster inference. Moreover, Xiong et al. [36] proposed a unified HSCNN framework for hyperspectral recovery from both RGB and compressive measurements. To boost performance, they developed a deep residual network named HSCNN-R and a distinct architecture that replaces the residual block by a dense block with a novel fusion scheme, named HSCNN-D, collectively called HSCNN+ [32].

Convolutional neural networks. Convolutional neural networks have been successfully applied to many low-level vision tasks, e.g., colorization [39, 19], inpainting [18, 38], deblurring [22], denoising [13, 9], and demosaicking [9, 40]. Hyperspectral reconstruction, as a low-level task, has recently gained great performance improvements from deep CNNs. To facilitate convergence and extract features effectively, these frameworks use well-known basic blocks such as the residual block and the dense block. He et al. [14] proposed the residual network, initially for image classification; it clearly improves accuracy compared with a plain cascade of convolutions. The residual block has since been widely used in image enhancement to preserve low-level features via the shortcut connection. It was enhanced by the DenseNet of Huang et al. [16] to improve feature fusion. Moreover, Hu et al. [15] strengthened these blocks with the squeeze-and-excitation network, a feature attention mechanism implemented with MLP layers that models connections between pixels at different spatial locations. In general, our HRNet combines the advantages of the above methods and provides a more effective and accurate solution for HS reconstruction.

Figure 1: Visualization of NTIRE 2020 HS dataset. For each group, from top to bottom and left to right, they represent clean RGB images, real world RGB images, HS images with 400 nm, 410 nm, 420 nm, 500 nm, 600 nm, and 700 nm channels, respectively.
Figure 2: Illustration of the architecture of HRNet. Please visit the project web page https://github.com/zhaoyuzhi/Hierarchical-Regression-Network-for-Spectral-Reconstruction-from-RGB-Images to try our codes and pre-trained models.
Figure 3: Illustration of the architecture of residual dense block (ResDB) and residual global block (ResGB).

3 Methodology

3.1 Dataset

We train our approach on the HS dataset provided by the NTIRE 2020 Challenge. The dataset consists of three parts: spectral images, clean RGB images (for track 1), and real world RGB images (for track 2). There are 450 RGB-HS pairs in the training set of each track, covering different scenes. Each spectral image contains 31 bands in the range 400 nm to 700 nm at a fixed spatial resolution. To generate the corresponding RGB image, a fixed response function is applied to the HS bands. The rendering process can be defined as:

I_c(p) = \sum_{\lambda=1}^{31} \Phi_{c,\lambda} S_\lambda(p), \quad c \in \{R, G, B\},   (1)

where S is the spectral image, I is the rendered RGB image, and p indexes pixels. The RGB images and HS images have 3 and 31 channels, respectively. The response function \Phi maps the HS bands to the visible channels R, G, and B with 3 x 31 = 93 parameters. The clean RGB images are constructed by a known response function and saved in an uncompressed format, whereas the real world RGB images are produced by an unknown response function with additional blind noise and a demosaicking operation. Some examples are illustrated in Figure 1 (e.g., the 1st band approximately covers the 395-405 nm range).
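As a hedged NumPy sketch of Eq. (1), the projection from a 31-band cube to RGB is a per-pixel linear map through a 3 x 31 response matrix; the random `Phi` and the toy image size below are illustrative assumptions (the challenge uses a fixed, known response function for track 1):

```python
# Project a 31-band HS cube to RGB with a 3x31 response matrix (93 parameters).
import numpy as np

S = np.random.rand(31, 64, 64)           # toy HS cube: 31 bands, 64x64 pixels
Phi = np.random.rand(3, 31)              # stand-in response function
rgb = np.einsum('cl,lhw->chw', Phi, S)   # per-pixel linear projection to R, G, B
print(rgb.shape)                         # (3, 64, 64)
```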

3.2 HRNet architecture

Generally, we propose a 4-level network architecture for high-quality spectral reconstruction from RGB images, as shown in Figure 2. PixelUnShuffle layers [31] downsample the input to each level without adding parameters, so the number of pixels is preserved while the spatial resolution decreases. Conversely, learnable PixelShuffle layers upsample feature maps and reduce channels for the inter-level connection. PixelShuffle only reshapes feature maps and, unlike bilinear upsampling, introduces no interpolation; it allows the network to learn the upsampling operation adaptively.

For each level, the process decomposes into inter-level integration, artifact reduction, and global feature extraction. For inter-level learning, the output features of the subordinate level are pixel shuffled, then concatenated to the current level, and finally processed by an additional convolutional layer to unify the channel number. To effectively reduce artifacts, we adopt the residual dense block [14, 16], containing 5 densely connected convolutional layers and a residual connection. Moreover, the residual global block [14, 15], with a shortcut connection from the input, uses MLP layers to extract attention over remote pixels.
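The following PyTorch sketches follow the textual description of the two blocks; the growth rate, channel-reduction ratio, and activation placement are our assumptions, not the authors' exact configuration:

```python
# Minimal sketches of the residual dense block (ResDB) and the SE-style
# residual global block (ResGB) described above.
import torch
import torch.nn as nn

class ResDB(nn.Module):
    """5 densely connected 3x3 convs plus a residual shortcut."""
    def __init__(self, ch, growth=32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(ch + i * growth, growth if i < 4 else ch, 3,
                      padding=1, padding_mode='reflect')
            for i in range(5)])
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))   # dense connectivity
            if i < 4:
                feats.append(self.act(out))
        return x + out                            # residual connection

class ResGB(nn.Module):
    """Global channel attention via an MLP, with a residual shortcut."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(                 # MLP over global statistics
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.mlp(x.mean(dim=(2, 3)))          # global average pool -> weights
        return x + x * w[:, :, None, None]        # re-weight channels, add shortcut
```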

Since the features are most compact at the bottom level, a convolutional layer is attached to the end of the bottom level to enhance tone mapping by weighting all channels. The two middle levels process features at different scales, and the top level uses the most blocks to integrate features and reduce artifacts effectively, thus producing high-quality spectral images. These blocks are illustrated in Figure 3.
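A sketch of the inter-level interaction for one level, under the structure described above: the subordinate level's features are pixel shuffled, concatenated with the current level, and fused by a convolution that unifies the channel count (the 1x1 kernel is our assumption):

```python
# One inter-level fusion step: lossless upsampling, concatenation, channel fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LevelFuse(nn.Module):
    def __init__(self, cur_ch, low_ch):
        super().__init__()
        # after 2x pixel shuffle, the lower level contributes low_ch // 4 channels
        self.fuse = nn.Conv2d(cur_ch + low_ch // 4, cur_ch, 1)

    def forward(self, cur_feat, low_feat):
        up = F.pixel_shuffle(low_feat, 2)       # lossless 2x upsampling
        x = torch.cat([cur_feat, up], dim=1)    # inter-level concatenation
        return self.fuse(x)                     # unify the channel number

# Toy usage: a 64-channel level fused with a 128-channel subordinate level.
out = LevelFuse(64, 128)(torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16))
print(out.shape)                                # (1, 64, 32, 32)
```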

3.3 Implementation details

We use only the L1 loss in training, which is a PSNR-oriented optimization of the system. The L1 loss is defined as:

\mathcal{L}_1 = \| G(x) - y \|_1,   (2)

where x and y are the input RGB image and the ground-truth spectral image, respectively, and G is the proposed HRNet. Note that we train on local patches for efficiency; the input RGB image and output spectral image are cropped in the same spatial region.

For the network architecture, all layers are LeakyReLU-activated [24] except the output layer. We do not use any normalization in HRNet, in order to maintain the data distribution. Reflection padding is adopted for each convolutional layer to reduce border effects. The weights of HRNet are initialized by the Xavier algorithm [12].

For training details, we use the entire NTIRE 2020 HS dataset (450 HS-RGB pairs for each track). The whole HRNet is trained for 10000 epochs overall, with the learning rate halved every 3000 epochs. For optimization, we use the Adam optimizer with a batch size of 8. The image pairs are randomly cropped to 256 x 256 regions and normalized to the range [0, 1]. All experiments run on 2 NVIDIA Titan Xp GPUs, and the whole training process takes approximately 7 days.

3.4 Ensemble strategy

Since the solution space of spectral reconstruction is large, multiple settings may achieve the same performance on the training set. A single network may therefore generalize poorly, since it tends to fall into a local minimum. We can reduce this risk by combining multiple network settings to enhance generalization and fuse their knowledge. To build the ensemble, we train HRNet from scratch under 4 additional hyper-parameter settings for both tracks. These settings are summarized as:

  • Re-train the HRNet using the baseline training setting.

  • Exchange the positions of the residual dense block and the residual global block in HRNet, keeping the baseline training setting.

  • Train the network with a different batch size (2 or 4), keeping all other hyper-parameters and the network architecture.

  • Train the network with a different cropping patch size (320 x 320 or 384 x 384), keeping all other hyper-parameters and the network architecture.

Figure 4: The MRAE between ground truth spectral images and the generated images of different hyper-parameter settings for ensemble.

In total, therefore, there are 8 training settings, each trained for 10000 epochs. We record the MRAE (the mean relative absolute error between all bands of the generated spectral images and the ground truth) every 1000 epochs, as shown in Table 1 and Figure 4. Finally, we average the predictions of the 8 settings, each taken at its epoch with the best MRAE value.
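The fusion itself is a plain average of the member predictions; a minimal sketch, assuming each member is loaded from its best-MRAE checkpoint (the stand-in members below are only for illustration):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ensemble_predict(models, rgb):
    """Average the spectral predictions of all ensemble members."""
    preds = [m(rgb) for m in models]        # one best checkpoint per setting
    return torch.stack(preds).mean(dim=0)   # pixel-wise mean over settings

# Toy usage with stand-in members; in practice these would be the 8 HRNets.
members = [nn.Conv2d(3, 31, 1) for _ in range(8)]
out = ensemble_predict(members, torch.rand(1, 3, 64, 64))  # -> (1, 31, 64, 64)
```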

Setting track 1 track 2
Baseline 0.042328 0.068245
Re-train baseline (1st) 0.043408 0.071044
Re-train baseline (2nd) 0.043487 0.070668
Exchange position of blocks 0.042418 0.071798
Change batch size 8 to 4 0.041936 0.071259
Change batch size 8 to 2 0.041507 0.072797
Change patch size 256 to 320 0.042810 0.070502
Change patch size 256 to 384 0.042166 0.072313
Ensemble 0.039893 0.068081
Table 1: The best MRAE value of both tracks for HRNet settings used for ensemble.

4 Experiment

4.1 Experimental settings

We evaluate the proposed HRNet by comparing it with other network architectures and conducting an ablation study on the NTIRE 2020 HS dataset. Each track provides 10 validation RGB images. The evaluation metrics are defined as:

  • MRAE. It computes the pixel-wise mean relative absolute error between all bands of the generated spectral images S^{gen} and the ground truth S^{gt}, which explicitly represents the reconstruction quality of the network. It is defined as:

    MRAE = \frac{1}{N} \sum_{p=1}^{N} \frac{| S^{gen}_p - S^{gt}_p |}{S^{gt}_p},   (3)

    where N denotes the overall number of pixels of the spectral images.

  • RMSE. It computes the root mean square error between the generated and ground-truth spectral images over all 31 bands. It is defined as:

    RMSE = \sqrt{ \frac{1}{N} \sum_{p=1}^{N} ( S^{gen}_p - S^{gt}_p )^2 }.   (4)
  • Back Projection MRAE (BPMRAE). It evaluates the colorimetric accuracy of the RGB images recovered from the generated and ground-truth spectral images by a fixed camera response function. It is defined as:

    BPMRAE = MRAE( \Phi S^{gen}, \Phi S^{gt} ),   (5)

    where \Phi denotes the camera response function of Eq. (1). (NumPy sketches of all three metrics follow this list.)
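Hedged NumPy sketches of the three metrics as defined above; the small `eps` guarding against division by zero is our addition, and `phi` stands for the fixed 3 x 31 camera response function:

```python
import numpy as np

def mrae(gen, gt, eps=1e-8):
    """Mean relative absolute error over all bands and pixels (Eq. 3)."""
    return np.mean(np.abs(gen - gt) / (gt + eps))

def rmse(gen, gt):
    """Root mean square error over the 31-band cubes (Eq. 4)."""
    return np.sqrt(np.mean((gen - gt) ** 2))

def bpmrae(gen, gt, phi):
    """MRAE between the RGB back projections of the two cubes (Eq. 5)."""
    to_rgb = lambda s: np.einsum('cl,lhw->chw', phi, s)
    return mrae(to_rgb(gen), to_rgb(gt))
```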

4.2 Comparison with other architectures

Method U-Net U-ResNet HRNet
MRAE track 1 0.047507 0.045242 0.042328
track 2 0.074230 0.078892 0.068245
RMSE track 1 0.014154 0.013927 0.013537
track 2 0.018647 0.020630 0.017859
BPMRAE track 1 0.007926 0.007171 0.006064
track 2 0.044966 0.055876 0.042105
Table 2: The quantitative comparison results of different architectures and HRNet on NTIRE 2020 HS validation set.

We utilize two common network architectures for comparison: U-Net [30] and U-ResNet [30, 14]. Both have been widely used in previous low-level tasks [19, 18, 38, 22, 9, 40]. In both, the first and last convolutional layers use stride-1 convolutions that keep the spatial resolution unchanged, and the training scheme is the same for all methods. Other details are as follows: (1) U-Net. The encoder layers perform convolution with a stride of 2 down to the bottom feature map, and there are skip concatenations between each encoder layer and the decoder layer of the same resolution; (2) U-ResNet. The encoder and decoder have half as many layers as U-Net; instead, 4 residual blocks are attached to the last layer of the encoder, and the concatenations are retained.

We train both networks with the same hyper-parameters as HRNet until convergence, without the ensemble strategy, and generate the reconstructed spectral images with each model's best epoch. The results are summarized in Table 2. We also visualize each method in Figures 5 and 6 with pseudo-color maps; the first three rows show the outputs of the 3 methods and the last row the ground truth. We recommend readers compare the textures of the background.

The proposed HRNet outperforms the other two methods for two reasons. First, HRNet connects its levels with PixelShuffle. Traditional nearest or bilinear upsampling introduces redundant information into the features, which is unnecessary for feature extraction; by combining PixelUnShuffle and PixelShuffle, HRNet processes high-level features more efficiently. Second, HRNet adopts two residual-based blocks, which facilitate convergence and help each level exploit features at different scales. The residual learning in these blocks helps remove artifacts, and the residual global block enhances context information since it models the correlation between every two pixels.

Figure 5: Visualization of generated results from U-ResNet, U-Net, and proposed HRNet on NTIRE 2020 HS validation set track 1.
Figure 6: Visualization of generated results from U-ResNet, U-Net, and proposed HRNet on NTIRE 2020 HS validation set track 2.
Method w/o ResDB w/o ResGB w/o both HRNet
MRAE 0.042448 0.042565 0.048033 0.042328
RMSE 0.014216 0.014092 0.015740 0.013537
BPMRAE 0.009507 0.007669 0.015502 0.006064
Table 3: The comparison results of ablation study on NTIRE 2020 HS validation set track 1 - clean images.
Method HRNet (1/2) HRNet (1/4) HRNet (1/8)
MACs (G) 46.413 12.017 3.212
Params (M) 8.185 2.176 0.6088
Weights (MB) 32.006 8.532 2.410
MRAE 0.042457 0.046424 0.048443
RMSE 0.015147 0.015459 0.015659
BPMRAE 0.006886 0.007806 0.009891
Table 4: The comparison results of the compressed HRNet models (the number of channels decreased to 1/2, 1/4, and 1/8 of the original) on NTIRE 2020 HS validation set track 1 - clean images.
Team MRAE Runtime / Image (seconds) Compute Platform
Deep-imagelab 0.03010476377 0.56 2x NVIDIA 2080Ti
ppplang 0.03075687151 16 NVIDIA 1080Ti
HRNet 0.03231183605 3.748 2x NVIDIA Titan Xp
ZHU_zy 0.03475963089 ~1 Unknown
sunnyvick 0.03516495956 0.7 Tesla K80 12GB
Table 5: The final testing results of NTIRE 2020 Spectral Reconstruction from RGB Images Challenge track 1 - clean images.
Team MRAE Runtime / Image (seconds) Compute Platform
HRNet 0.06200744887 3.748 2x NVIDIA Titan Xp
ppplang 0.06212710705 16 NVIDIA 1080Ti
Deep-imagelab 0.06216655487 0.56 2x NVIDIA 2080Ti
PARASITE 0.06514769779 ~30 NVIDIA Titan Xp
Tasti 0.06732598306 Unknown NVIDIA 2080Ti
Table 6: The final testing results of NTIRE 2020 Spectral Reconstruction from RGB Images Challenge track 2 - real world images.

4.3 Ablation study

To demonstrate the effectiveness of the residual dense block (ResDB) and the residual global block (ResGB), we replace them with plain convolutional layers of similar FLOPs. The results on track 1 - clean images are shown in Table 3; the HRNet baseline (Table 1) outperforms all ablation settings. Removing both ResDB and ResGB degrades the MRAE the most, which demonstrates that the combination of the two blocks is significant for spectral reconstruction.

We conduct another experiment that shrinks the HRNet model by decreasing the channels of each convolutional layer to one half, one fourth, and one eighth of the original numbers. This compresses the model size greatly at the cost of pixel fidelity. To compare these settings, we report the multiply-accumulate operations (MACs), the total network parameters (Params), the model size on disk (Weights), and the 3 quantitative metrics in Table 4. The MACs, Params, and Weights of the baseline HRNet are 182.347 G, 31.705 M, and 123.879 MB, respectively. Users can choose the full HRNet for high pixel fidelity (MRAE 0.042328) or a high-efficiency HRNet of small size (Weights 2.410 MB).
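A sketch of the width-compression idea: scale every convolution's channel count by a factor and count the parameters, which shrink roughly quadratically with width. The toy stack below is not the real HRNet, only an illustration of the mechanism:

```python
import torch.nn as nn

def conv_stack(width):
    """A toy RGB-to-31-band conv stack whose hidden width is scaled."""
    chs = [3] + [int(64 * width)] * 4 + [31]
    return nn.Sequential(*[nn.Conv2d(chs[i], chs[i + 1], 3, padding=1)
                           for i in range(len(chs) - 1)])

for w in (1, 0.5, 0.25, 0.125):
    n = sum(p.numel() for p in conv_stack(w).parameters())
    print(f'width x{w}: {n / 1e6:.3f} M params')   # params shrink ~quadratically
```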

4.4 Testing result on NTIRE 2020 challenge

The proposed HRNet ranks 3rd and 1st on track 1 and track 2, respectively, of the NTIRE 2020 Spectral Reconstruction from RGB Images Challenge [4]. The results on the testing set are summarized in Tables 5 and 6. HRNet performs particularly well on track 2 because it adopts two effective blocks for removing artifacts while using the learnable PixelShuffle upsampling operator. The ensemble strategy is clearly effective on both tracks, improving the validation MRAE from 0.042328 to 0.039893 by preventing HRNet from falling into a local minimum. In conclusion, both the HRNet architecture and the ensemble strategy contribute to spectral reconstruction performance.

5 Conclusion

In this paper, we presented a 4-level HRNet for automatically generating spectra from RGB images. Each level adopts both a residual dense block and a residual global block for effective feature extraction, while PixelShuffle is utilized for the inter-level connections. We then proposed a novel 8-setting ensemble strategy to further enhance the quality of the predicted spectral images. We validated that HRNet outperforms well-known low-level vision frameworks such as U-Net and U-ResNet on the NTIRE 2020 HS dataset, and we presented 3 compressed HRNets with an analysis of their reconstruction performance and computing efficiency. The proposed HRNet is the winning method of track 2 - real world images and ranks 3rd on track 1 - clean images.

References

  • [1] J. Aeschbacher, J. Wu, and R. Timofte (2017) In defense of shallow learned spectral reconstruction from RGB images. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 471–479. Cited by: §2.
  • [2] P. Andersson, S. Montan, and S. Svanberg (1987) Multispectral system for medical fluorescence imaging. IEEE Journal of Quantum Electronics 23 (10), pp. 1798–1805. Cited by: §1.
  • [3] B. Arad and O. Ben-Shahar (2016) Sparse recovery of hyperspectral signal from natural RGB images. In Proceedings of the European Conference on Computer Vision, pp. 19–34. Cited by: §2, §2.
  • [4] B. Arad, R. Timofte, O. Ben-Shahar, Y. Lin, G. Finlayson, et al. (2020) NTIRE 2020 challenge on spectral reconstruction from an RGB image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. Cited by: §4.4.
  • [5] M. Breuer and J. Albertz (2000) Geometric correction of airborne whiskbroom scanner imagery using hybrid auxiliary data. International Archives of Photogrammetry and Remote Sensing 33 (B3/1; PART 3), pp. 93–100. Cited by: §2.
  • [6] Y. B. Can and R. Timofte (2018) An efficient CNN for spectral reconstruction from RGB images. arXiv preprint arXiv:1804.04647. Cited by: §1, §2.
  • [7] X. Cao, H. Du, X. Tong, Q. Dai, and S. Lin (2011) A prism-mask system for multispectral video acquisition. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (12), pp. 2423–2435. Cited by: §1, §2.
  • [8] A. Chakrabarti and T. Zickler (2011) Statistics of real-world hyperspectral images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 193–200. Cited by: §2.
  • [9] C. Chen, Q. Chen, J. Xu, and V. Koltun (2018) Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3291–3300. Cited by: §2, §4.2.
  • [10] M. Descour and E. Dereniak (1995) Computed-tomography imaging spectrometer: experimental calibration and reconstruction results. Applied Optics 34 (22), pp. 4817–4826. Cited by: §2.
  • [11] S. Galliani, C. Lanaras, D. Marmanis, E. Baltsavias, and K. Schindler (2017) Learned spectral super-resolution. arXiv preprint arXiv:1703.09470. Cited by: §2.
  • [12] X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. Cited by: §3.3.
  • [13] S. Gu, Y. Li, L. V. Gool, and T. Radu (2019) Self-guided network for fast image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2511–2520. Cited by: §2.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §1, §2, §3.2, §4.2.
  • [15] J. Hu, L. Shen, and G. Sun (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §2, §3.2.
  • [16] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §2, §3.2.
  • [17] S. Hwang, J. Park, N. Kim, Y. Choi, and I. S. Kweon (2015) Multispectral pedestrian detection: benchmark dataset and baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §1.
  • [18] S. Iizuka, E. Simo-Serra, and H. Ishikawa (2017) Globally and locally consistent image completion. ACM Transactions on Graphics (ToG) 36 (4), pp. 1–14. Cited by: §2, §4.2.
  • [19] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134. Cited by: §2, §4.2.
  • [20] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio (2017) The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 11–19. Cited by: §2.
  • [21] S. Koundinya, H. Sharma, M. Sharma, A. Upadhyay, R. Manekar, R. Mukhopadhyay, A. Karmakar, and S. Chaudhury (2018) 2D-3D CNN based architectures for spectral reconstruction from RGB images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 844–851. Cited by: §1, §2.
  • [22] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas (2018) Deblurgan: blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8183–8192. Cited by: §2, §4.2.
  • [23] J. Liu, S. Zhang, S. Wang, and D. N. Metaxas (2016) Multispectral deep neural networks for pedestrian detection. arXiv preprint arXiv:1611.02644. Cited by: §1.
  • [24] A. L. Maas, A. Y. Hannun, and A. Y. Ng (2013) Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the International Conference on Machine Learning, Vol. 30, pp. 3. Cited by: §3.3.
  • [25] F. Melgani and L. Bruzzone (2004) Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing 42 (8), pp. 1778–1790. Cited by: §1.
  • [26] R. M. Nguyen, D. K. Prasad, and M. S. Brown (2014) Training-based spectral reconstruction from a single RGB image. In Proceedings of the European Conference on Computer Vision, pp. 186–201. Cited by: §2.
  • [27] M. Parmar, S. Lansel, and B. A. Wandell (2008) Spatio-spectral reconstruction of the multispectral datacube using sparse recovery. In IEEE International Conference on Image Processing, pp. 473–476. Cited by: §2.
  • [28] D. Poli and T. Toutin (2012) Review of developments in geometric modelling for high resolution satellite pushbroom sensors. The Photogrammetric Record 27 (137), pp. 58–73. Cited by: §1, §2.
  • [29] J. Qin, K. Chao, M. S. Kim, R. Lu, and T. F. Burks (2013) Hyperspectral and multispectral imaging for evaluating food safety and quality. Journal of Food Engineering 118 (2), pp. 157–171. Cited by: §1.
  • [30] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Cited by: §4.2.
  • [31] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883. Cited by: §1, §3.2.
  • [32] Z. Shi, C. Chen, Z. Xiong, D. Liu, and F. Wu (2018) HSCNN+: advanced CNN-based hyperspectral recovery from RGB images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 939–947. Cited by: §1, §2.
  • [33] T. Stiebel, S. Koppers, P. Seltsam, and D. Merhof (2018) Reconstructing spectral images from RGB images using a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 948–953. Cited by: §1, §2.
  • [34] R. Timofte, V. De Smet, and L. Van Gool (2014) A+: adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pp. 111–126. Cited by: §2.
  • [35] L. Wang, Z. Xiong, G. Shi, F. Wu, and W. Zeng (2016) Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (10), pp. 2104–2111. Cited by: §1, §2.
  • [36] Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu (2017) HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 518–525. Cited by: §1, §2.
  • [37] F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar (2010) Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum. IEEE Transactions on Image Processing 19 (9), pp. 2241–2253. Cited by: §2.
  • [38] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang (2019) Free-form image inpainting with gated convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4471–4480. Cited by: §2, §4.2.
  • [39] R. Zhang, P. Isola, and A. A. Efros (2016) Colorful image colorization. In Proceedings of the European Conference on Computer Vision, pp. 649–666. Cited by: §2.
  • [40] Y. Zhao, L. Po, T. Zhang, Z. Liao, X. Shi, Y. Zhang, W. Ou, P. Xian, J. Xiong, C. Zhou, et al. (2019) Saliency map-aided generative adversarial network for RAW to RGB mapping. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3449–3457. Cited by: §2, §4.2.