Spectral Super-resolution from Single RGB Image Using Multiscale CNN
Different from traditional hyperspectral super-resolution approaches that focus on improving the spatial resolution, spectral super-resolution aims at producing a high-resolution hyperspectral image from an RGB observation by super-resolving in the spectral domain. However, it is challenging to accurately reconstruct a high-dimensional continuous spectrum from three discrete intensity values at each pixel, since too much information is lost when the latent hyperspectral image is downsampled (e.g., with a 10x scaling factor) in the spectral domain to produce the RGB observation. To address this problem, we present a multiscale deep convolutional neural network (CNN) that explicitly maps the input RGB image into a hyperspectral image. By symmetrically downsampling and upsampling the intermediate feature maps in a cascading paradigm, local and non-local image information can be jointly encoded for spectral representation, ultimately improving the spectral reconstruction accuracy. Extensive experiments on a large hyperspectral dataset demonstrate the effectiveness of the proposed method.
Hyperspectral imaging encodes the reflectance of a scene into hundreds or thousands of bands with a narrow wavelength interval (e.g., 10nm), producing a hyperspectral image. Different from conventional images, each pixel in a hyperspectral image contains a continuous spectrum, thus providing abundant spectral information. Such information has proven quite useful for distinguishing different materials, so hyperspectral images have been widely exploited to facilitate various applications in the computer vision community, such as visual tracking [20], image segmentation [18][14][5], and anomaly detection [10].

The acquisition of spectral information, however, comes at the cost of decreasing the spatial resolution of hyperspectral images. This is because fewer photons are captured by each detector due to the narrower width of the spectral bands. In order to maintain a reasonable signal-to-noise ratio (SNR), the instantaneous field of view (IFOV) needs to be increased, which makes it difficult to produce hyperspectral images with high spatial resolution. To address this problem, many efforts have been devoted to hyperspectral imagery super-resolution.
Most existing methods focus on enhancing the spatial resolution of an observed hyperspectral image. According to their input images, they can be divided into two categories: fusion-based methods, where a high-resolution conventional image (e.g., an RGB image) and a low-resolution hyperspectral image are fused together to produce a high-resolution hyperspectral image [22, 11]; and single image super-resolution methods, which directly increase the spatial resolution of a hyperspectral image [12, 24, 27, 25]. Although these methods have shown effective performance, acquiring the input hyperspectral image often requires specialized hyperspectral sensors as well as extensive imaging cost. To mitigate this problem, some recent literature [4, 2, 13, 7] has turned to a novel hyperspectral imagery super-resolution scheme, termed spectral super-resolution, which aims at improving the spectral resolution of a given RGB image. Since the input image can be easily captured by conventional RGB sensors, the imaging cost is greatly reduced.
However, it is challenging to accurately reconstruct a hyperspectral image from a single RGB observation, since mapping three discrete intensity values to a continuous spectrum is a highly ill-posed linear inverse problem. To address this problem, we propose to learn a complicated nonlinear mapping function for spectral super-resolution with a deep convolutional neural network (CNN). It has been shown that the 3-dimensional color vector at a specific pixel can be viewed as a downsampled observation of the corresponding spectrum. Moreover, for a candidate pixel, there often exist abundant locally and non-locally similar pixels (i.e., pixels exhibiting similar spectra) in the spatial domain. As a result, the color vectors corresponding to those similar pixels can be viewed as a group of downsampled observations of the latent spectrum of the candidate pixel. Therefore, accurate spectral reconstruction requires explicitly considering both the local and non-local information in the input RGB image. To this end, we develop a novel multiscale CNN. Our method jointly encodes the local and non-local image information by symmetrically downsampling and upsampling the intermediate feature maps in a cascading paradigm, thus enhancing the spectral reconstruction accuracy. We experimentally show that the proposed method can be easily trained in an end-to-end scheme and beats several state-of-the-art methods on a large hyperspectral image dataset with respect to various evaluation metrics.
Our contributions are twofold:
We design a novel CNN architecture that is able to encode both local and non-local information for spectral reconstruction.
We perform extensive experiments on a large hyperspectral dataset and obtain state-of-the-art performance.
This section gives a brief review of the existing spectral super-resolution methods, which can be divided into the following two categories.
Statistics-based methods This line of research focuses on exploiting the inherent statistical distribution of the latent hyperspectral image as a prior to guide super-resolution [26, 21]. Most of these methods involve building overcomplete dictionaries and learning sparse coding coefficients to linearly combine the dictionary atoms. For example, in [4], Arad et al. leveraged image priors to build a dictionary using K-SVD [3]. At test time, orthogonal matching pursuit [15] was used to compute a sparse representation of the input RGB image. [2] proposed a new method inspired by A+ [19], where sparse coefficients are computed by explicitly solving a sparse least squares problem. These methods exploit the whole image to build the prior, ignoring local and non-local structure information. Moreover, since the image prior is often handcrafted or heuristically designed with a shallow structure, these methods fail to generalize well in practice.
Learning-based methods These methods directly learn a mapping function from the RGB image to a corresponding hyperspectral image. For example, [13] proposed a training-based method using a radial basis function network. The input data is preprocessed with a white balancing function to alleviate the influence of different illumination, so the total reconstruction accuracy is affected by the performance of this preprocessing stage. Recently, witnessing the great success of deep learning in many other ill-posed inverse problems such as image denoising [23] and single image super-resolution [6], it is natural to consider using deep networks (especially convolutional neural networks) for spectral super-resolution. In [7], Galliani et al. exploited a variant of fully convolutional DenseNets (FC-DenseNets [9]) for spectral super-resolution. However, this method is sensitive to its hyperparameters, and its performance can still be further improved.

In this section, we introduce the proposed multiscale convolutional neural network in detail. First, we introduce the building blocks utilized in our network; then, we illustrate the architecture of the proposed network.


There are three basic building blocks in our network. Their structures are shown in Table 1.
Double convolution (Double Conv) block consists of two convolutional layers, each followed by batch normalization, leaky ReLU, and dropout. We exploit batch normalization and dropout to combat overfitting.
Downsample block contains a regular max-pooling layer. It reduces the spatial size of the feature maps and enlarges the receptive field of the network.
Upsample block is utilized to upsample the feature maps in the spatial domain. Much of the previous literature adopts the transposed convolution for this purpose; however, it is prone to generating checkerboard artifacts. To address this problem, we use the pixel shuffle operation [17], which has been shown to alleviate checkerboard artifacts. In addition, since it introduces no learnable parameters, pixel shuffle also helps improve robustness against overfitting.
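As a concrete illustration, the three building blocks can be sketched in PyTorch as follows. The kernel sizes, dropout rate, and the channel-adjusting convolution placed before the pixel shuffle are our own illustrative assumptions, not values taken from Table 1.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Sequential):
    """Two conv layers, each followed by batch norm, leaky ReLU and dropout.

    The 3x3 kernel and 0.1 dropout rate are illustrative assumptions.
    """
    def __init__(self, in_ch, out_ch, p_drop=0.1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2),
            nn.Dropout2d(p_drop),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2),
            nn.Dropout2d(p_drop),
        )

class Downsample(nn.MaxPool2d):
    """Regular 2x2 max-pooling; halves the spatial size."""
    def __init__(self):
        super().__init__(kernel_size=2)

class Upsample(nn.Sequential):
    """2x spatial upsampling via pixel shuffle.

    PixelShuffle itself has no learnable weights; the 1x1 convolution in
    front of it (our addition) only sets the channel count so that the
    shuffled output has out_ch channels.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__(
            nn.Conv2d(in_ch, out_ch * 4, kernel_size=1),
            nn.PixelShuffle(2),
        )
```

Note how the pixel shuffle rearranges a (4C, H, W) tensor into (C, 2H, 2W), trading channels for spatial resolution instead of learning deconvolution weights.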
Our method is inspired by the well-known U-Net architecture for image segmentation [16]. The overall architecture of the proposed multiscale convolutional neural network is depicted in Figure 1. The network follows an encoder-decoder pattern. In the encoder, each downsampling step consists of a "Double Conv" block followed by a downsample block; the spatial size is progressively reduced, and the number of feature channels is doubled at each step. The decoder is symmetric to the encoder path: every step consists of an upsampling operation followed by a "Double Conv" block, so the spatial size of the features is recovered while the number of channels is halved at every step. Finally, a convolution maps the output features to the reconstructed 31-channel hyperspectral image. In addition to the feedforward path, skip connections concatenate the corresponding feature maps of the encoder and decoder.
Our method naturally fits the task of spectral reconstruction. The encoder can be interpreted as extracting features from the RGB image. By downsampling in a cascaded manner, the receptive field of the network is steadily increased, allowing the network to "see" more pixels in an increasingly larger field of view. In doing so, both local and non-local information can be encoded to better represent the latent spectra. The symmetric decoder then reconstructs the latent hyperspectral image from these deep and compact features. The skip connections with concatenation are essential for introducing multiscale information and yielding better estimates of the spectra.
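To make the encoder-decoder pattern concrete, here is a minimal PyTorch sketch of such a multiscale network. The depth (three scales), the base channel width of 32, and the omission of dropout are our own simplifying assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two conv + batch norm + leaky ReLU stages (dropout omitted for brevity).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2),
    )

class MultiscaleSSR(nn.Module):
    """Symmetric encoder-decoder with skip connections, in the spirit of the
    proposed multiscale CNN. Channel widths double going down and halve
    coming up; upsampling uses parameter-light pixel shuffle."""
    def __init__(self, bands=31, base=32):
        super().__init__()
        self.enc1 = double_conv(3, base)
        self.enc2 = double_conv(base, base * 2)
        self.enc3 = double_conv(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        # 1x1 conv sets channels so PixelShuffle(2) emits the desired width.
        self.up2 = nn.Sequential(nn.Conv2d(base * 4, base * 8, 1), nn.PixelShuffle(2))
        self.dec2 = double_conv(base * 4, base * 2)   # concat doubles channels
        self.up1 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 1), nn.PixelShuffle(2))
        self.dec1 = double_conv(base * 2, base)
        self.out = nn.Conv2d(base, bands, 1)          # map to 31 spectral bands

    def forward(self, x):
        e1 = self.enc1(x)                 # full resolution
        e2 = self.enc2(self.pool(e1))     # 1/2 resolution
        e3 = self.enc3(self.pool(e2))     # 1/4 resolution (bottleneck)
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.out(d1)
```

Because the network is fully convolutional, a whole image of any size divisible by 4 can be fed in at test time and reconstructed in one forward pass.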
In this study, all experiments are performed on the NTIRE2018 dataset [1], which extends the ICVL dataset [4]. The ICVL dataset consists of images captured with a Specim PS Kappa DX4 hyperspectral camera, each with high spatial resolution and densely sampled spectral bands. In our experiments, 31 successive bands ranging from 400nm to 700nm at a 10nm interval are extracted from each image for evaluation. In the NTIRE2018 challenge, this dataset is further extended with additional images of the same spatial and spectral resolution, which serve as training data, and five further hyperspectral images are introduced as the test set. The NTIRE2018 dataset also provides the corresponding RGB rendition for each image. In the following, we employ these RGB-hyperspectral image pairs to evaluate the proposed method.
| Method | BGU_00257 | BGU_00259 | BGU_00261 | BGU_00263 | BGU_00265 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| Interpolation | 1.8622 | 1.7198 | 2.8419 | 1.3657 | 1.9376 | 1.9454 |
| Arad et al. | 1.7930 | 1.4700 | 1.6592 | 1.8987 | 1.2559 | 1.6154 |
| A+ | 1.3054 | 1.3572 | 1.3659 | 1.4884 | 0.9769 | 1.2988 |
| Galliani et al. | 0.7330 | 0.7922 | 0.8606 | 0.5786 | 0.8276 | 0.7584 |
| Ours | 0.6172 | 0.6865 | 0.9425 | 0.5049 | 0.8375 | 0.7177 |
| Method | BGU_00257 | BGU_00259 | BGU_00261 | BGU_00263 | BGU_00265 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| Interpolation | 3.0774 | 2.9878 | 4.1453 | 2.0874 | 3.9522 | 3.2500 |
| Arad et al. | 3.4618 | 2.3534 | 2.6236 | 2.5750 | 2.0169 | 2.6061 |
| A+ | 2.1911 | 1.9572 | 1.9364 | 2.0488 | 1.3344 | 1.8936 |
| Galliani et al. | 1.2381 | 1.2077 | 1.2577 | 0.8381 | 1.6810 | 1.2445 |
| Ours | 0.9768 | 1.3417 | 1.6035 | 0.7396 | 1.7879 | 1.2899 |
| Method | BGU_00257 | BGU_00259 | BGU_00261 | BGU_00263 | BGU_00265 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| Interpolation | 0.0658 | 0.0518 | 0.0732 | 0.0530 | 0.0612 | 0.0610 |
| Arad et al. | 0.0807 | 0.0627 | 0.0624 | 0.0662 | 0.0560 | 0.0656 |
| A+ | 0.0580 | 0.0589 | 0.0612 | 0.0614 | 0.0457 | 0.0570 |
| Galliani et al. | 0.0261 | 0.0268 | 0.0254 | 0.0237 | 0.0289 | 0.0262 |
| Ours | 0.0235 | 0.0216 | 0.0230 | 0.0205 | 0.0278 | 0.0233 |
| Method | BGU_00257 | BGU_00259 | BGU_00261 | BGU_00263 | BGU_00265 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| Interpolation | 0.1058 | 0.0933 | 0.1103 | 0.0759 | 0.1338 | 0.1038 |
| Arad et al. | 0.1172 | 0.0809 | 0.0819 | 0.0685 | 0.0733 | 0.0844 |
| A+ | 0.0580 | 0.0589 | 0.0612 | 0.0614 | 0.0457 | 0.0610 |
| Galliani et al. | 0.0453 | 0.0372 | 0.0331 | 0.0317 | 0.0562 | 0.0407 |
| Ours | 0.0357 | 0.0413 | 0.0422 | 0.0280 | 0.0598 | 0.0414 |
SAM (degree)
| Method | BGU_00257 | BGU_00259 | BGU_00261 | BGU_00263 | BGU_00265 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| Interpolation | 3.9620 | 3.0304 | 4.2962 | 3.1900 | 3.9281 | 3.6813 |
| Arad et al. | 4.2667 | 3.7279 | 3.4726 | 3.3912 | 3.3699 | 3.6457 |
| A+ | 3.2952 | 3.5812 | 3.2952 | 3.0256 | 3.2952 | 3.2985 |
| Galliani et al. | 1.4725 | 1.5013 | 1.4802 | 1.4844 | 1.8229 | 1.5523 |
| Ours | 1.3305 | 1.2458 | 1.7197 | 1.1360 | 1.9046 | 1.4673 |
To demonstrate the effectiveness of the proposed method, we compare it with four spectral super-resolution methods: spline interpolation, the sparse recovery method in [4] (Arad et al.), A+ [2], and the deep learning method in [7] (Galliani et al.). [4, 2] are run with the code released by the authors. Since no code has been released for [7], we reimplement it in this study. Below, we give the implementation details for each method.
Spline interpolation The interpolation algorithm serves as the most primitive baseline in this study. Specifically, for each RGB pixel, we use spline interpolation to upsample its three intensity values to a 31-dimensional spectrum. According to the visible spectrum^{1}, the R, G, and B values of an RGB pixel are assigned to the corresponding red, green, and blue wavelengths, respectively. (^{1} http://www.gamonline.com/catalog/colortheory/visible.php)
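A minimal sketch of this baseline follows. The anchor wavelengths assigned to the B, G, and R values are hypothetical placeholders; the actual assignments come from the visible-spectrum chart and are not reproduced here.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical anchor wavelengths for the blue, green, and red values;
# illustrative assumptions, not the chart values used in the paper.
ANCHORS_NM = np.array([470.0, 540.0, 610.0])   # blue, green, red
TARGET_NM = np.arange(400.0, 701.0, 10.0)      # 31 bands, 400-700nm

def interpolate_pixel(rgb):
    """Upsample one (r, g, b) pixel to a 31-dimensional spectrum by
    fitting a spline through the three anchor points and evaluating it
    on the target wavelength grid (extrapolating beyond the anchors)."""
    r, g, b = rgb
    spline = CubicSpline(ANCHORS_NM, [b, g, r])  # values sorted by wavelength
    return spline(TARGET_NM)
```

At the anchor wavelengths the spectrum reproduces the input channel values exactly; everywhere else it is a smooth polynomial guess, which is why this baseline trails the learned methods by a wide margin.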
Arad et al. and A+ The low spectral resolution image is assumed to be a directly downsampled version of the corresponding hyperspectral image under some specific linear projection matrix. In [4, 2], this matrix is required to be perfectly known. In our experiments, we fit the projection matrix on the training data with conventional linear regression.
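The projection-matrix fit can be sketched as an ordinary least-squares problem; the function name and array shapes below are our own conventions.

```python
import numpy as np

def fit_projection(hs_pixels, rgb_pixels):
    """Least-squares fit of the 31x3 spectral downsampling matrix M
    such that rgb ~= hs @ M, from paired training pixels.

    hs_pixels:  (N, 31) hyperspectral samples (one spectrum per row)
    rgb_pixels: (N, 3)  corresponding RGB samples
    Returns M with shape (31, 3).
    """
    M, *_ = np.linalg.lstsq(hs_pixels, rgb_pixels, rcond=None)
    return M
```

With tens of thousands of training pixels and only 93 unknowns, the system is heavily overdetermined and the fit is stable.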
Galliani et al. and our method We experimentally find the optimal set of hyperparameters, including the dropout rates, for both methods. All leaky ReLU activation functions use a negative slope of 0.2. We train the networks for 100 epochs using the Adam optimizer with weight regularization. Weight initialization and learning rate schedules differ between the methods: for Galliani et al., the weights are initialized via He-Uniform [8], and the learning rate is decayed once after the first 50 epochs; for our method, we use He-Normal initialization [8], and the initial learning rate is multiplied by 0.93 every 10 epochs. We perform data augmentation by extracting patches with a stride of 40 pixels from the training data, yielding a large number of training samples. At test time, we directly feed the whole image to the network and obtain the estimated hyperspectral image in a single forward pass.

To quantitatively evaluate the performance of the proposed method, we adopt the following two categories of evaluation metrics.
Pixel-level reconstruction error We follow [2] in using the absolute and relative root-mean-square error (RMSE and rRMSE) as quantitative measures of reconstruction accuracy. Let $I_i$ and $\tilde{I}_i$ denote the $i$-th element of the real and estimated hyperspectral images, $\bar{I}$ the average of $I$, and $N$ the total number of elements in one hyperspectral image. The two errors are then

$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(I_i - \tilde{I}_i\right)^2}, \qquad \mathrm{rRMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{I_i - \tilde{I}_i}{\bar{I}}\right)^2}.$
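With those definitions, the two error measures can be computed as in this short NumPy sketch:

```python
import numpy as np

def rmse(real, est):
    """Absolute root-mean-square error over all elements of the image."""
    return np.sqrt(np.mean((real - est) ** 2))

def rrmse(real, est):
    """Relative RMSE: per-element errors normalized by the average
    intensity of the real image before the mean-square is taken."""
    return np.sqrt(np.mean(((real - est) / np.mean(real)) ** 2))
```

Both functions accept arrays of any shape (e.g., H x W x C hyperspectral cubes), since the definitions treat the image as a flat collection of elements.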
Spectral similarity Since the key to spectral super-resolution is reconstructing the spectra, we also use the spectral angle mapper (SAM) to evaluate the different methods. SAM calculates the average spectral angle between the spectra of the real and estimated hyperspectral images. Let $\mathbf{s}_p, \tilde{\mathbf{s}}_p \in \mathbb{R}^{C}$ denote the spectra of the $p$-th pixel in the real and estimated hyperspectral images ($C$ is the number of bands), and let $P$ be the total number of pixels within an image. The SAM value is computed as

$\mathrm{SAM} = \frac{1}{P}\sum_{p=1}^{P} \arccos\left(\frac{\mathbf{s}_p^{\top}\tilde{\mathbf{s}}_p}{\left\lVert\mathbf{s}_p\right\rVert_2 \left\lVert\tilde{\mathbf{s}}_p\right\rVert_2}\right).$
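A corresponding NumPy sketch of the SAM computation (the small epsilon guard against zero-norm spectra is our own addition):

```python
import numpy as np

def sam_degrees(real, est, eps=1e-12):
    """Average spectral angle, in degrees, between real and estimated
    spectra. Inputs have shape (H, W, C) with C spectral bands."""
    r = real.reshape(-1, real.shape[-1])   # one spectrum per row
    e = est.reshape(-1, est.shape[-1])
    cos = np.sum(r * e, axis=1) / (
        np.linalg.norm(r, axis=1) * np.linalg.norm(e, axis=1) + eps
    )
    # Clip to [-1, 1] so floating-point noise cannot break arccos.
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles.mean()
```

Note that SAM is invariant to per-pixel scaling of the spectra, which is why it complements the magnitude-sensitive RMSE metrics.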
Convergence Analysis We plot the training loss curve and the curves of the five evaluation metrics computed on the test set in Figure 3. Both the training loss and the metric values gradually decrease and ultimately converge as training proceeds, demonstrating that the proposed multiscale convolutional neural network converges well.
Quantitative Results Table 2 provides the quantitative results of our method and all baseline methods. Our model outperforms all competitors on several of the reconstruction-error metrics and produces results comparable to Galliani et al. on the remaining ones. More importantly, our method surpasses all the others with respect to the spectral angle mapper, which clearly shows that our model reconstructs spectra more accurately than its competitors. It is worth pointing out that reconstruction error (absolute and relative RMSE) is not necessarily positively correlated with the spectral angle mapper (SAM): for example, redistributing the same set of per-element errors across different bands leaves RMSE and rRMSE unchanged but can change SAM completely. According to the results in Table 2, our finely designed network enhances spectral super-resolution from both aspects, i.e., yielding better results on both average root-mean-square error and spectral angle similarity.
Visual Results To further demonstrate the superiority in reconstruction accuracy, we show the absolute reconstruction error of the test images in Figure 4, summed over all bands of the hyperspectral image. Since A+ outperforms Arad et al. on every evaluation metric, we use A+ to represent the sparse coding methods. Our method yields smoother reconstructed images as well as lower reconstruction error than the other competitors.
In addition, we randomly choose three test images and plot the real and reconstructed spectra for four pixels in Figure 2. Only slight differences exist between the reconstructed spectra and the ground truth, further demonstrating the effectiveness of the proposed method in spectrum reconstruction.
From the results above, we conclude that the proposed method is effective for spectral super-resolution and outperforms several state-of-the-art competitors.
In this study, we show that leveraging both the local and non-local information of the input image is essential for accurate spectral reconstruction. Following this idea, we design a novel multiscale convolutional neural network that employs a symmetrically cascaded downsampling-upsampling architecture to jointly encode the local and non-local image information for spectral reconstruction. In extensive experiments on a large hyperspectral image dataset, the proposed method clearly outperforms several state-of-the-art methods in terms of reconstruction accuracy and spectral similarity.
This work was supported in part by the National Natural Science Foundation of China (No. 61671385, 61571354), Natural Science Basis Research Plan in Shaanxi Province of China (No. 2017JM6021, 2017JM6001) and China Postdoctoral Science Foundation under Grant (No. 158201).
[2] Aeschbacher, J., Wu, J., Timofte, R.: In defense of shallow learned spectral reconstruction from RGB images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 471–479 (2017)
[8] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)