1 Introduction
Hyperspectral imaging is a technique that captures the reflectance of scenes with extremely high spectral resolution (e.g., 10 nm) [7]. The captured hyperspectral image (HSI) often contains hundreds or thousands of spectral bands, and each pixel has a complete spectrum [7, 41]. Profiting from the abundant spectral information, HSIs have been widely applied to various tasks, e.g., classification [3], detection [26] and tracking [35]. However, the expense of obtaining such spectral information is an increase in the pixel size on the sensor, which inevitably limits the spatial resolution of HSIs [25]. Thus, it is crucial to investigate how to generate high-spatial-resolution (HSR) HSIs.
Different from conventional HSI super-resolution [27, 40], which directly improves the spatial resolution of a given HSI, spectral super-resolution (SSR) [5, 37] adopts an alternative way and attempts to produce an HSR HSI by increasing the spectral resolution of a given RGB image with satisfactory spatial resolution. Early SSR methods [5, 1, 20] often formulate SSR as a linear inverse problem and exploit the inherent low-level statistics of HSR HSIs as priors. However, due to the limited expressive capacity of their handcrafted prior models, these methods fail to generalize well to challenging cases. Recently, witnessing the great success of deep convolutional neural networks (DCNNs) in a wide range of tasks [33, 17, 16], increasing effort has been invested in learning a DCNN based mapping function to directly transform the RGB image into an HSI [4, 6, 32, 13]. These methods essentially map the RGB context within a size-specific receptive field centered at each pixel to its spectrum in the HSI, as shown in Figure 1. The focus thereon is to appropriately determine the receptive field size and establish the mapping function from RGB context to the corresponding spectrum. Due to differences in category or spatial position, pixels in HSIs often necessitate collecting different RGB information and adopting different recovery schemes for SSR. Therefore, to obtain an accurate DCNN based SSR approach, it is crucial to adaptively determine the receptive field size and the RGB-to-spectrum mapping function for each pixel. However, most existing DCNN based SSR methods treat all pixels in HSIs equally and learn a universal mapping function with a fixed-size receptive field, as shown in Figure 1.
In this study, we present a pixel-aware deep function-mixture network for SSR, which is flexible enough to pixel-wisely determine the receptive field size and the mapping function. Specifically, we first develop a new module, termed the function-mixture (FM) block. Each FM block consists of several parallel DCNN based subnets, among which one is termed the mixing function and the remaining ones are termed basis functions. The basis functions take different-sized receptive fields and learn distinct mapping schemes, while the mixing function generates pixel-wise weights to linearly mix the outputs of the basis functions. In this way, the pixel-wise weights determine a specific information flow for each pixel and consequently help the network choose appropriate RGB context as well as an appropriate mapping function for spectrum recovery. Then, we stack several such FM blocks to further improve the flexibility of the network in learning the pixel-wise mapping. Furthermore, to encourage feature reuse, the intermediate features generated by the FM blocks are fused at a late stage, which proves to be effective for boosting the SSR performance. Experimental evaluation on three benchmark HSI datasets shows the superiority of the proposed approach for SSR.
In summary, we mainly contribute in three aspects. i) We present an effective pixel-aware deep function-mixture network for SSR, which is flexible enough to learn the pixel-wise RGB-to-spectrum mapping. To the best of our knowledge, this is the first attempt to explore this in SSR. ii) We design a new FM module, which can be flexibly plugged into any modern DCNN architecture. iii) We demonstrate new state-of-the-art performance on three benchmark SSR datasets.
2 Related Work
We first review the existing approaches for SSR and then introduce some techniques related to this work.
Spectral Super-resolution
Early methods mainly focus on exploiting appropriate image priors to regularize the linear inverse SSR problem. For example, Arad et al. [5] and Aeschbacher et al. [1] investigated the sparsity of the latent HSI on a pre-trained over-complete spectral dictionary. Jia et al. [20] considered the manifold structure of HSIs in a low-dimensional space. Recently, most methods turn to learning a deep mapping function from the RGB image to an HSI. For example, Alvarez-Gila et al. [4] implemented the mapping function using a U-Net architecture [29] and trained it based on both the mean-square-error (MSE) loss and the adversarial loss [14]. Shi et al. [32] developed a deep residual network consisting of residual blocks to learn the mapping function. Despite obtaining impressive performance for SSR, these methods are limited by learning a universal RGB-to-spectrum mapping function for all pixels in HSIs. This leaves space for learning a more flexible and adaptive mapping function.
Receptive Field in DCNNs
The receptive field is an important concept in a DCNN, which determines the sensing space of a convolutional neuron. There have been many efforts dedicated to adjusting the size or shape of the receptive field [39, 36, 10] to meet the requirements of the specific task at hand. Thereinto, dilated convolution [39] or kernel separation [31] is often utilized to enlarge the receptive field. Recently, Wei et al. [36] changed the receptive field by inflating or shrinking the feature maps using two affine transformation layers. Dai et al. [10] proposed to adaptively determine the context within the receptive field by estimating the offsets of pixels to the central pixel using an additional convolutional layer. In contrast, we take a totally different direction and learn the pixel-wise receptive field size by mixing some basis functions with different receptive field sizes.
Multi-column Network
The multi-column network [8] is a special type of network that feeds the input into several parallel DCNNs (i.e., columns) and then aggregates their outputs for the final prediction. With the ability to use more context information, the multi-column network (MCNet) often shows better generalization capacity than its single-column counterpart in various tasks, e.g., classification [8], image processing [2] and counting [43]. Although we also adopt a similar multi-column structure in our module design, the proposed network is obviously different from these existing multi-column networks [8, 43, 2]. First, MCNet employs a separation-and-aggregation architecture that processes the input with parallel columns and then aggregates the outputs of all columns for the final output. In contrast, we adopt a recursive separation-and-aggregation architecture by stacking multiple FM modules, each of which can be viewed as an enhanced multi-column module, as shown in Figures 1 and 3. Second, when applied to SSR, MCNet still learns a universal mapping function and fails to flexibly handle each pixel in an explicit way. In contrast, the proposed FM block incorporates a mixing function to generate pixel-wise weights and mix the outputs of all basis functions. This makes it possible to flexibly customize the pixel-wise mapping function. In addition, we fuse the intermediate features generated by the FM blocks in the network for feature reuse.
3 Proposed Network
In this section, we present the technical details of the proposed pixel-aware deep function-mixture network, as shown in Figure 2. The proposed network adopts a global residual architecture as in [22]. Its backbone is constructed by stacking multiple FM blocks and fusing the intermediate features generated by the preceding FM blocks with skip connections. In the following, we first introduce the basic FM block. Then, we describe how multiple FM blocks and the intermediate feature fusion are incorporated into the proposed network for performance enhancement.
3.1 Function-mixture Block
The proposed network essentially establishes an end-to-end mapping function from an RGB image to its HSI counterpart, and thus each FM block plays the role of a mapping sub-function. In this study, we attempt to utilize the FM block to adaptively determine the receptive field size and the mapping function for each pixel, i.e., to obtain a pixel-dependent mapping sub-function. To this end, a direct solution is to introduce an additional hypernetwork [15, 19] to adaptively generate the sub-function parameters for each pixel. However, this would greatly increase the computational complexity as well as the training difficulty [15]. To avoid this problem, we borrow the idea of function approximation [9] and assume that all pixel-dependent sub-functions can be accurately approximated by mixing some basis functions with pixel-wise weights. Since they are shared by all sub-functions, these basis functions can be learned by DCNNs. Meanwhile, the pixel-wise mixing weights can be viewed as pixel-wise channel attention [30], which can also be directly generated by a DCNN.
Following this idea, we construct the FM block with a separation-and-aggregation structure, as shown in Figure 3. First, a convolutional block, i.e., a convolutional layer followed by a rectified linear unit (ReLU) [28], is utilized for initial feature representation. Then, the obtained features are fed into multiple parallel subnets. Thereinto, one subnet is utilized to generate the pixel-wise mixing weights; for simplicity, we term it the mixing function. The remaining subnets represent the basis functions. Finally, the outputs of all basis functions are linearly mixed based on the generated pixel-wise weights. Let $x_{m-1}$ denote the input to the $m$-th FM block $\mathcal{F}_m$ and $n$ denote the number of basis functions in $\mathcal{F}_m$. The output of $\mathcal{F}_m$ can be formulated as

$$x_m = \mathcal{F}_m\left(x_{m-1}\right) = \sum_{i=1}^{n} f_i\left(\bar{x}_{m-1}; \theta_i\right) \odot w_i, \quad \left[w_1, \ldots, w_n\right] = g\left(\bar{x}_{m-1}; \vartheta\right), \quad \text{s.t.}\; w_i \geq 0,\; \sum_{i=1}^{n} w_i = \mathbf{1}, \tag{1}$$

where $f_i$ denotes the $i$-th basis function parameterized by $\theta_i$ and $g$ represents the mixing function parameterized by $\vartheta$. When $x_{m-1}$ is of size $c \times h \times w$ (i.e., channel $\times$ height $\times$ width), the output of $g$ is of size $n \times h \times w$, and $w_i$ represents the mixing weights of size $h \times w$ generated for all pixels corresponding to the $i$-th basis function. $\odot$ denotes the point product, and $\bar{x}_{m-1} = \sigma\left(W_m \ast x_{m-1}\right)$ denotes the features output by the convolutional block in $\mathcal{F}_m$, where $W_m$ represents its convolutional filters and $\sigma(\cdot)$ the ReLU. Inspired by [12], we also require the mixing weights to be non-negative and to sum to 1 across all basis functions, as shown in Eq. (1).

In this study, we implement the basis functions as well as the mixing function by stacking consecutive convolutional blocks, as shown in Figure 3. Moreover, we equip these basis functions with different-sized convolutional filters so that they take different-sized receptive fields and learn distinct mapping schemes. For the mixing function, we introduce a Softmax unit at the end to comply with the constraints in Eq. (1). Profiting from such a pixel-wise mixture architecture, the proposed FM block is able to determine the appropriate receptive field size and mapping function for each pixel.
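The pixel-wise mixture above can be sketched numerically. The following is a minimal NumPy sketch of the mixing step only: the basis-function outputs and the mixing logits are assumed to be precomputed by the respective subnets, and the function names are ours, not from the authors' code.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable Softmax along one axis."""
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mix_basis_outputs(basis_outputs, mixing_logits):
    """Pixel-wise mixture of basis-function outputs.

    basis_outputs: list of n arrays, each of shape (C, H, W),
                   one per basis function.
    mixing_logits: array of shape (n, H, W) produced by the mixing
                   subnet, before the Softmax unit.
    Returns the mixed (C, H, W) feature map.
    """
    # Softmax over the basis-function axis makes the weights
    # non-negative and sum to 1 at every pixel, as required by Eq. (1).
    w = softmax(mixing_logits, axis=0)
    out = np.zeros_like(basis_outputs[0])
    for i, f in enumerate(basis_outputs):
        out += w[i][None, :, :] * f  # broadcast the weight map over channels
    return out
```

With all-zero logits the weights are uniform, so the mixture reduces to averaging the basis outputs; a trained mixing subnet instead produces a distinct weight vector per pixel.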
3.2 Multiple FM Blocks
As shown in Figure 2, in the proposed network, we first introduce an individual convolutional block and then stack multiple FM blocks for intermediate feature representation and the ultimate output. For an input RGB image $x$, the output of the network with $M$ FM blocks can be given as

$$y = \mathcal{F}_M\left(\mathcal{F}_{M-1}\left(\cdots \mathcal{F}_1\left(x_0\right)\right)\right) + \tilde{x}, \quad x_0 = \sigma\left(W_0 \ast x\right), \tag{2}$$

where $y$ denotes the generated HSI and $x_0$ represents the output of the first convolutional block parameterized by $W_0$. It is worth noting that in this study we increase the spectral resolution of $x$ to the same as that of $y$ by bilinear interpolation, which yields the residual term $\tilde{x}$. In addition, all FM blocks share the same architecture, while the output of the last block is adjusted according to the number of spectral bands in $y$.

It has been shown that the layers in a DCNN from bottom to top take increasingly larger receptive fields and extract different levels of features from the input signal [44]. Therefore, by stacking multiple FM blocks, we can further increase the flexibility of the proposed network in learning the pixel-wise mapping, viz., adjust the receptive field size and the mapping function for each pixel at multiple levels. In addition, considering that each FM block defines a mapping sub-function for each pixel, the ultimate mapping function obtained by stacking $M$ FM blocks can be viewed as a composition of $M$ sub-functions. Since each sub-function is approximated by a mixture of basis functions, the ultimate mapping function can be viewed as a mixture of compositions of basis functions, which shows much larger expressive capacity than a single FM block in pixel-wisely fitting an appropriate mapping function.
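The residual path described above upsamples the 3-channel RGB input to the spectral resolution of the target HSI by interpolation along the band axis. A minimal NumPy sketch, assuming linear interpolation between band positions (the function name and normalized band coordinates are our own choices):

```python
import numpy as np

def spectral_interp(rgb, n_bands=31):
    """Interpolate an (H, W, 3) image to (H, W, n_bands) along the
    spectral axis, treating the three channels as samples at
    normalized positions 0, 0.5 and 1 of the spectral range."""
    h, w, c = rgb.shape
    src = np.linspace(0.0, 1.0, c)        # positions of the input channels
    dst = np.linspace(0.0, 1.0, n_bands)  # positions of the output bands
    flat = rgb.reshape(-1, c).astype(float)
    # interpolate each pixel's 3-sample "spectrum" to n_bands samples
    out = np.stack([np.interp(dst, src, px) for px in flat])
    return out.reshape(h, w, n_bands)
```

This is also what the BI baseline in Section 4 amounts to; in the proposed network the interpolated cube only serves as the residual base that the FM blocks refine.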
3.3 Intermediate Features Fusion
As previously mentioned, the FM blocks in the proposed network extract different levels of features from the input. Inspired by [23, 42], to reuse these intermediate features for performance enhancement, we employ skip connections to aggregate the intermediate features generated by each FM block before the ultimate output block with a concatenation operation, as shown in Figure 2. To better utilize all of these features for pixel-wise representation, we introduce an extra FM block to fuse the concatenation result. With such an intermediate feature fusion operation, the output of the proposed network can be reformulated as

$$y = \mathcal{F}_M\left(\mathcal{F}_c\left(\left[x_1, x_2, \ldots, x_{M-1}\right]\right)\right) + \tilde{x}, \tag{3}$$

where $[x_1, x_2, \ldots, x_{M-1}]$ denotes the concatenation of the intermediate features generated by the preceding FM blocks and $\mathcal{F}_c$ denotes the extra FM block introduced for fusion.
Table 1: Numerical results of all methods on the NTIRE2018, CAVE and Harvard datasets. Each cell lists RMSE / PSNR / SAM / SSIM.

Methods      | NTIRE2018                      | CAVE                           | Harvard
BI [18]      | 15.41 / 25.73 / 15.30 / 0.8397 | 26.60 / 21.49 / 34.38 / 0.7382 | 30.86 / 19.44 / 39.04 / 0.5887
Arad [5]     |  4.46 / 35.63 /  5.90 / 0.9082 | 10.09 / 28.96 / 19.54 / 0.8695 |  7.85 / 31.30 /  8.32 / 0.8490
Aitor [4]    |  1.97 / 43.30 /  1.80 / 0.9907 |  6.80 / 32.53 / 17.50 / 0.8768 |  3.29 / 39.21 /  4.93 / 0.9671
HSCNN+ [37]  |  1.55 / 45.38 /  1.63 / 0.9931 |  4.97 / 35.66 /  8.73 / 0.9529 |  2.87 / 41.05 /  4.28 / 0.9741
DCNN         |  1.23 / 47.40 /  1.30 / 0.9939 |  5.77 / 34.09 / 11.35 / 0.9275 |  2.88 / 40.83 /  4.24 / 0.9724
MCNet        |  1.11 / 48.43 /  1.13 / 0.9951 |  4.84 / 35.92 /  8.98 / 0.9555 |  2.83 / 40.70 /  4.26 / 0.9689
Ours         |  1.03 / 49.29 /  1.05 / 0.9955 |  4.54 / 36.33 /  7.07 / 0.9611 |  2.54 / 41.54 /  3.76 / 0.9796
4 Experiment
In this section, we conduct extensive comparison experiments and carry out an ablation study to demonstrate the effectiveness of the proposed method for SSR.
4.1 Experimental Setting
Datasets
In this study, we adopt three benchmark HSI datasets: NTIRE2018 [34], CAVE [38] and Harvard [7]. The NTIRE2018 dataset is the benchmark for the SSR challenge in NTIRE2018. It contains 255 paired HSIs and RGB images which have the same spatial resolution, i.e., 1392×1300. Each HSI consists of 31 successive spectral bands ranging from 400 nm to 700 nm with a 10 nm interval. The CAVE dataset contains 32 HSIs of indoor objects. Similar to NTIRE2018, each HSI contains 31 spectral bands ranging from 400 nm to 700 nm with a 10 nm interval, but with a spatial resolution of 512×512. The Harvard dataset is another common benchmark for HSIs. It consists of 50 HSIs with a spatial resolution of 1392×1040. Each image contains 31 spectral bands captured from 420 nm to 720 nm with a 10 nm interval. For the CAVE and Harvard datasets, inspired by [11, 40], we adopt the spectral response function of the Nikon D700 camera [11] to generate the corresponding RGB image for each HSI. In the following experiments, we randomly select 200 paired images from the NTIRE2018 dataset as the training set and use the remaining 55 paired images for testing. For the CAVE dataset, we randomly choose 22 paired images for training and the remaining 10 paired images for testing. For the Harvard dataset, 30 paired images are randomly chosen as the training set and the remaining 20 paired images are utilized for testing.
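Simulating the RGB input from an HSI with a camera spectral response function, as done above for CAVE and Harvard, is a per-pixel linear projection. A minimal NumPy sketch, assuming the Nikon D700 response is given as a bands-by-3 sensitivity matrix (the function name and the matrix layout are our assumptions):

```python
import numpy as np

def hsi_to_rgb(hsi, srf):
    """Project a hyperspectral cube to RGB with a camera response.

    hsi: array of shape (H, W, B), one spectrum per pixel.
    srf: array of shape (B, 3), the camera's R, G, B spectral
         sensitivities sampled at the same B bands.
    Returns an (H, W, 3) RGB image.
    """
    h, w, b = hsi.shape
    # each pixel's spectrum is integrated against the three sensitivities
    rgb = hsi.reshape(-1, b) @ srf
    return rgb.reshape(h, w, 3)
```

In practice the response curves are resampled to the dataset's band centers before the projection, and the result may be normalized to the usual 8-bit range.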
Comparison Methods
In this study, we compare the proposed method with six existing methods, including bilinear interpolation (BI) [18], Arad [5], Aitor [4], HSCNN+ [37], a deep convolutional neural network (DCNN) and the multi-column network (MCNet). Among them, BI utilizes bilinear interpolation to increase the spectral resolution of the input RGB image. Arad is a sparsity induced conventional SSR method. Aitor and HSCNN+ are two recent DCNN based state-of-the-art SSR methods. DCNN and MCNet are two baselines for the proposed method. DCNN is a variant of the proposed method implemented by replacing each FM block in the proposed method with a convolutional block. We implement MCNet following the basic architecture in [8, 43] with convolutional blocks. Moreover, its column number is set to 3 and the convolutions in the columns are equipped with 3 kinds of different-sized filters, similar to the proposed method. We further control the depth of each column to make sure the model complexity of MCNet is comparable to that of the proposed method. By doing this, the only difference between MCNet and the proposed network is the network architecture. For a fair comparison, all of these DCNN based competitors and the spectral dictionary in Arad [5] are retrained on the training sets utilized in the experiments.
Evaluation Metrics
To objectively evaluate the SSR performance of each method, we employ four commonly utilized metrics: the root-mean-square error (RMSE), peak signal-to-noise ratio (PSNR), spectral angle mapper (SAM) and structural similarity index (SSIM). The RMSE and PSNR measure the numerical difference between the reconstructed image and the reference image. The SAM computes the average spectral angle between two spectra from the reconstructed image and the reference image at the same spatial position to indicate the reconstruction accuracy of the spectrum. The SSIM is often utilized to measure the spatial structure similarity between two images. In general, a larger PSNR or SSIM and a smaller RMSE or SAM indicate better performance.
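The first three metrics above admit short closed-form implementations. A minimal NumPy sketch (function names and the degree convention for SAM are our choices; the papers' exact normalizations, e.g., the PSNR peak value, may differ):

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error between two images of the same shape."""
    return np.sqrt(np.mean((x - y) ** 2))

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB, for a given peak intensity."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def sam_degrees(x, y, eps=1e-12):
    """Mean spectral angle (degrees) over all pixels.

    x, y: arrays of shape (H, W, B); the angle is computed between the
    B-dimensional spectra at each spatial position, then averaged.
    """
    dot = (x * y).sum(axis=-1)
    denom = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps
    ang = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return np.degrees(ang).mean()
```

SSIM is omitted here since it involves local windowed statistics; standard image-processing libraries provide it directly.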
Implementation Details
In the proposed method, we adopt 4 FM blocks (i.e., $M$=3 plus one extra block for feature fusion), and each block contains $n$=3 basis functions. The basis functions and the mixing function each consist of 2 convolutional blocks, and each convolutional block contains 64 filters. In each FM block, the three basis functions are equipped with three different-sized convolutional filters, i.e., 3×3, 7×7 and 11×11, while the filter size in all other convolutional blocks is fixed as 3×3.
In this study, we implement the proposed method on the PyTorch platform [21] and train the network using the following model:

$$\min_{\Theta} \frac{1}{N} \sum_{i=1}^{N} \left\| \Phi\left(x_i; \Theta\right) - y_i \right\|_1, \tag{4}$$

where $(x_i, y_i)$ denotes the $i$-th paired RGB image and HSI, $N$ denotes the number of training pairs, $\Phi$ denotes the ultimate mapping function defined by the proposed network, and $\Theta$ represents all involved parameters. $\|\cdot\|_1$ represents the $\ell_1$ norm based loss. In the training stage, we employ the Adam optimizer [24] with a weight decay of 1e-6. The learning rate is initially set as 1e-4 and halved every 20 epochs. The batch size is 128. We terminate the optimization after a fixed number of epochs.

(Figure 4: Visual super-resolution results of the 31st band and the reconstruction error maps of an example image from the NTIRE2018 dataset for different methods. The reconstruction error is obtained by computing the mean-square error between the two spectrum vectors from the super-resolution result and the ground truth at each pixel. Best viewed on screen.)
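The training objective and the step learning-rate schedule described above can be sketched as follows. This is a NumPy sketch of the quantities only, not the actual PyTorch training loop; the function names are ours, and the schedule parameters come from the settings stated in the text (initial rate 1e-4, halved every 20 epochs).

```python
import numpy as np

def ssr_objective(pred_hsis, gt_hsis):
    """Mean l1 reconstruction loss over the N training pairs (Eq. (4))."""
    return float(np.mean([np.abs(p - g).mean()
                          for p, g in zip(pred_hsis, gt_hsis)]))

def lr_at_epoch(epoch, base_lr=1e-4, halve_every=20):
    """Step schedule: the learning rate is halved every `halve_every` epochs."""
    return base_lr * 0.5 ** (epoch // halve_every)
```

In the actual implementation the $\ell_1$ loss and the schedule correspond to the standard L1 loss and a step decay of the Adam learning rate in PyTorch.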
4.2 Performance Evaluation
Performance comparison
Under the same experimental settings, we evaluate all methods on the testing set of each benchmark dataset. The numerical results are reported in Table 1. It can be seen that the DCNN based comparison methods often produce more accurate results than the interpolation based or sparsity induced SSR methods. For example, on the NTIRE2018 dataset, the RMSE of Aitor and HSCNN+ is less than 2.0 while that of BI and Arad is higher than 4.0. Nevertheless, the proposed method obviously outperforms these DCNN based competitors. For example, compared with the state-of-the-art HSCNN+, the proposed method reduces the RMSE by 0.43 and improves the PSNR by 0.67 dB on the CAVE dataset. On the NTIRE2018 dataset, the decrease in RMSE is even up to 0.52 and the improvement in PSNR is up to 3.91 dB. This profits from the ability of the proposed method to adaptively determine the receptive field size and the mapping function for each pixel. With such an ability, the proposed method is able to handle each pixel more flexibly. Moreover, since various mapping functions can be approximated by the mixture of the learned basis functions, the proposed method generalizes better to unseen pixels.
In addition, as shown in Table 1, the proposed method also performs better than the two baselines, i.e., DCNN and MCNet. For example, on the NTIRE2018 dataset, the PSNR obtained by the proposed method is higher than that of DCNN by 1.89 dB and higher than that of MCNet by 0.86 dB. Since the only difference between the proposed method and DCNN lies in the discrepancy between the convolutional block and the proposed FM block, the superiority of the proposed method demonstrates that the proposed FM block is much more powerful than the convolutional block for SSR. Similarly, the advantage of the proposed method over MCNet clarifies that the proposed network architecture is more effective than the multi-column architecture in SSR.
To further clarify the above conclusions, we plot some visual super-resolution results of different methods on the three datasets in Figure 4, Figure 5 and Figure 6. As can be seen, the super-resolution results of the proposed method contain more details and show less reconstruction error than those of the other competitors. In addition, we also sketch the recovered spectrum curves of the proposed method in Figure 7. It can be seen that the spectra produced by the proposed method are very close to the ground truth.
Pixel-wise mixing weights
In this study, we mix the outputs of the basis functions with pixel-wise weights to adaptively learn the pixel-wise mapping. To validate that the proposed method can effectively produce the pixel-wise weights as expected, we choose an example image from NTIRE2018 and visualize the produced pixel-wise weights in each FM block, as shown in Figure 8. We find that: i) pixels from different categories or spatial positions are often given different weights. For example, in the second weight map, the weights for the pixels from 'road' are obviously smaller than those for the pixels from 'tree'. ii) Pixels from the same category are prone to be given similar weights. For example, pixels from 'road' are given similar weights in each weight map in Figure 8 (a)(b). To further clarify these two observations, we visualize the weight maps generated for some other images in Figure 9, where a similar phenomenon can be observed. iii) Among the intermediate FM blocks, the higher-level block can distinguish finer differences between pixels than the lower-level block, viz., only highly similar pixels will be assigned similar weights. iv) Due to being forced to match the output, in the weight maps generated by the ultimate output block, the weight difference between pixels from various categories is not as obvious as that in the previous FM blocks, as shown in Figure 8(a)(b)(d).
According to the above observations, we can conclude that the proposed network can effectively generate the pixel-wise mixing weights and thus is able to pixel-wisely determine the receptive field size and the mapping function.
4.3 Ablation study
In this part, we carry out an ablation study on the NTIRE2018 dataset to demonstrate the effects of the different ingredients, the number of basis functions and the number of FM blocks in the proposed network.
Table 2: Effect of the different ingredients on the NTIRE2018 dataset.

Methods         | RMSE | PSNR  | SAM  | SSIM
Ours w/o mix    | 1.10 | 48.44 | 1.16 | 0.9950
Ours w/o fusion | 1.05 | 48.97 | 1.09 | 0.9953
Ours            | 1.03 | 49.29 | 1.05 | 0.9955
Table 3: Effect of the number of basis functions $n$ on the NTIRE2018 dataset.

Methods      | RMSE | PSNR  | SAM  | SSIM
Ours ($n$=1) | 1.47 | 45.82 | 1.57 | 0.9913
Ours ($n$=2) | 1.08 | 48.76 | 1.10 | 0.9952
Ours ($n$=3) | 1.03 | 49.29 | 1.05 | 0.9955
Ours ($n$=5) | 0.98 | 49.87 | 1.00 | 0.9958
Table 4: Effect of the number of FM blocks $M$ on the NTIRE2018 dataset.

Methods      | RMSE | PSNR  | SAM  | SSIM
Ours ($M$=2) | 1.05 | 48.95 | 1.09 | 0.9954
Ours ($M$=3) | 1.03 | 49.29 | 1.05 | 0.9955
Ours ($M$=4) | 1.05 | 49.42 | 1.05 | 0.9954
Ours ($M$=6) | 1.00 | 49.59 | 1.02 | 0.9956
Effect of Different Ingredients
In the proposed FM network, there are two important ingredients, namely the pixel-wise mixture and the intermediate feature fusion. To demonstrate their effect, we compare the proposed method with two variants. One (i.e., 'Ours w/o mix') disables the pixel-wise mixture in the proposed network, which amounts to mixing the outputs of the basis functions with equal weights; the other (i.e., 'Ours w/o fusion') disables the intermediate feature fusion, i.e., removes the skip connections as well as the fusion FM block. We plot the training loss curves and the testing PSNR curves of these three methods in Figure 10. As can be seen, the proposed method obtains the smallest training loss and the highest testing PSNR. More numerical results are reported in Table 2, where the proposed method still obviously outperforms the two variants. This demonstrates that both the pixel-wise mixture and the intermediate feature fusion are crucial for the proposed network.
Effect of the Number of Basis Functions
In the above experiments, we fixed the number of basis functions in each FM block as $n$=3. Intuitively, increasing $n$ will enlarge the expressive capacity of the basis functions and thus lead to better performance, and vice versa. To validate this, we evaluate the proposed method on the NTIRE2018 dataset using different $n$, i.e., 1, 2, 3 and 5. The obtained numerical results are provided in Table 3. As can be seen, the reconstruction accuracy gradually increases as the number of basis functions increases. When $n$=1, the proposed method degenerates to a network of plain convolutional blocks, which shows the lowest reconstruction accuracy in Table 3. When $n$ increases to 5, the obtained RMSE is even lower than 1.0 and the PSNR is close to 50 dB. However, there is no free lunch in our case, and a larger $n$ often results in higher computational complexity. Therefore, we strike a balance between accuracy and efficiency by tuning $n$. This makes it possible to customize the proposed network for a specific device.
Effect of the Number of FM Blocks
In addition to the number of basis functions, the model complexity of the proposed method also depends on the number of FM blocks $M$. To demonstrate the effect of $M$ on the proposed method, we evaluate the proposed method on the NTIRE2018 dataset using different numbers of FM blocks, i.e., $M$=2, 3, 4 and 6. The obtained numerical results are reported in Table 4. Similar to the case of $n$, the performance of the proposed method gradually improves as the number of FM blocks increases. We also find it interesting that increasing $n$ may be more effective than increasing $M$ in terms of boosting the performance of the proposed method.
5 Conclusion
In this study, to flexibly handle the pixels from different categories or spatial positions in HSIs and consequently improve the performance, we present a pixel-aware deep function-mixture network for SSR, which is composed of multiple FM blocks. Each FM block consists of one mixing function and some basis functions, which are implemented as parallel DCNN based subnets. Thereinto, the basis functions take different-sized receptive fields and learn distinct mapping schemes, while the mixing function generates the pixel-wise weights to linearly mix the outputs of all these basis functions. This enables the network to pixel-wisely determine the receptive field size and the mapping function. Moreover, we stack several such FM blocks in the network to further increase its flexibility in learning the pixel-wise mapping. To boost the SSR performance, we also fuse the intermediate features generated by the FM blocks for feature reuse. With extensive experiments on three benchmark SSR datasets, the proposed method shows superior performance over several existing state-of-the-art competitors.
It is worth noting that this study employs a linear mixture to approximate the pixel-wise mapping function. In the future, it would be interesting to exploit a non-linear mixture. In addition, it is promising to generalize the idea in this study to other tasks requiring pixel-wise modelling, e.g., semantic segmentation and colorization.
References

[1]
J. Aeschbacher, J. Wu, and R. Timofte.
In defense of shallow learned spectral reconstruction from rgb
images.
In
Proceedings of the IEEE International Conference on Computer Vision
, pages 471–479, 2017.  [2] F. Agostinelli, M. R. Anderson, and H. Lee. Adaptive multicolumn deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems, pages 1493–1501, 2013.

[3]
N. Akhtar and A. Mian.
Nonparametric coupled bayesian dictionary and classifier learning for hyperspectral classification.
IEEE transactions on neural networks and learning systems, 29(9):4038–4050, 2018.  [4] A. AlvarezGila, J. Van De Weijer, and E. Garrote. Adversarial networks for spatial contextaware spectral image reconstruction from rgb. In Proceedings of the IEEE International Conference on Computer Vision, pages 480–490, 2017.
 [5] B. Arad and O. BenShahar. Sparse recovery of hyperspectral signal from natural rgb images. In European Conference on Computer Vision, pages 19–34. Springer, 2016.
 [6] B. Arad and O. BenShahar. Filter selection for hyperspectral estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 3153–3161, 2017.
 [7] A. Chakrabarti and T. Zickler. Statistics of realworld hyperspectral images. In CVPR 2011, pages 193–200. IEEE, 2011.
 [8] D. Cireşan, U. Meier, and J. Schmidhuber. Multicolumn deep neural networks for image classification. arXiv preprint arXiv:1202.2745, 2012.

[9]
G. Cybenko.
Approximation by superpositions of a sigmoidal function.
Mathematics of control, signals and systems, 2(4):303–314, 1989.  [10] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 764–773, 2017.
 [11] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, and X. Li. Hyperspectral image superresolution via nonnegative structured sparse representation. IEEE Transactions on Image Processing, 25(5):2337–2352, 2016.
 [12] B. S. Everitt. Finite mixture distributions. Encyclopedia of statistics in behavioral science, 2005.
 [13] Y. Fu, T. Zhang, Y. Zheng, D. Zhang, and H. Huang. Joint camera spectral sensitivity selection and hyperspectral image recovery. In Proceedings of the European Conference on Computer Vision (ECCV), pages 788–804, 2018.
 [14] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
 [15] D. Ha, A. Dai, and Q. V. Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
 [16] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask rcnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.

[17]
K. He, X. Zhang, S. Ren, and J. Sun.
Deep residual learning for image recognition.
In
Proceedings of the IEEE conference on computer vision and pattern recognition
, pages 770–778, 2016.  [18] H. Hou and H. Andrews. Cubic splines for image interpolation and digital filtering. IEEE Transactions on acoustics, speech, and signal processing, 26(6):508–517, 1978.
 [19] X. Jia, B. De Brabandere, T. Tuytelaars, and L. V. Gool. Dynamic filter networks. In Advances in Neural Information Processing Systems, pages 667–675, 2016.
 [20] Y. Jia, Y. Zheng, L. Gu, A. SubpaAsa, A. Lam, Y. Sato, and I. Sato. From rgb to spectrum for natural scenes via manifoldbased mapping. In Proceedings of the IEEE International Conference on Computer Vision, pages 4705–4713, 2017.
 [21] N. Ketkar. Introduction to pytorch. In Deep learning with python, pages 195–208. Springer, 2017.
 [22] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image superresolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016.
 [23] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1645, 2016.
 [24] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [25] C. Lanaras, E. Baltsavias, and K. Schindler. Hyperspectral super-resolution by coupled spectral unmixing. In Proceedings of the IEEE International Conference on Computer Vision, pages 3586–3594, 2015.
 [26] D. Manolakis and G. Shaw. Detection algorithms for hyperspectral imaging applications. IEEE signal processing magazine, 19(1):29–43, 2002.
 [27] S. Mei, X. Yuan, J. Ji, Y. Zhang, S. Wan, and Q. Du. Hyperspectral image spatial super-resolution via 3D full convolutional neural network. Remote Sensing, 9(11):1139, 2017.
 [28] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010.
 [29] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
 [30] M. F. Stollenga, J. Masci, F. Gomez, and J. Schmidhuber. Deep networks with internal selective attention through feedback connections. In Advances in Neural Information Processing Systems, pages 3545–3553, 2014.
 [31] G. Seif and D. Androutsos. Large receptive field networks for high-scale image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 763–772, 2018.
 [32] Z. Shi, C. Chen, Z. Xiong, D. Liu, and F. Wu. HSCNN+: Advanced CNN-based hyperspectral recovery from RGB images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 939–947, 2018.
 [33] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 [34] R. Timofte, S. Gu, J. Wu, and L. Van Gool. NTIRE 2018 challenge on single image super-resolution: methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 852–863, 2018.
 [35] H. Van Nguyen, A. Banerjee, and R. Chellappa. Tracking via object reflectance using a hyperspectral video camera. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 44–51. IEEE, 2010.
 [36] Z. Wei, Y. Sun, J. Wang, H. Lai, and S. Liu. Learning adaptive receptive fields for deep image parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2434–2442, 2017.
 [37] Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu. HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections. In Proceedings of the IEEE International Conference on Computer Vision, pages 518–525, 2017.
 [38] F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar. Generalized assorted pixel camera: post-capture control of resolution, dynamic range, and spectrum. IEEE transactions on image processing, 19(9):2241–2253, 2010.
 [39] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
 [40] L. Zhang, W. Wei, C. Bai, Y. Gao, and Y. Zhang. Exploiting clustering manifold structure for hyperspectral imagery super-resolution. IEEE Transactions on Image Processing, 27(12):5969–5982, 2018.
 [41] L. Zhang, W. Wei, Y. Zhang, C. Shen, A. van den Hengel, and Q. Shi. Cluster sparsity field: An internal hyperspectral imagery prior for reconstruction. International Journal of Computer Vision, 126(8):797–821, 2018.
 [42] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2472–2481, 2018.
 [43] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 589–597, 2016.
 [44] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016.