A spatial-spectral prior deep network for hyperspectral image super-resolution
Recently, the single gray/RGB image super-resolution reconstruction task has been extensively studied, and significant progress has been made by leveraging advanced machine learning techniques based on deep convolutional neural networks (DCNNs). However, there has been limited technical development focusing on single hyperspectral image super-resolution, due to the high-dimensional and complex spectral patterns in hyperspectral images. In this paper, we make a step forward by investigating how to adapt state-of-the-art residual learning based single gray/RGB image super-resolution approaches for computationally efficient single hyperspectral image super-resolution; the resulting method is referred to as SSPSR. Specifically, we introduce a spatial-spectral prior network (SSPN) to fully exploit the spatial information and the correlation between the spectra of the hyperspectral data. Considering that hyperspectral training samples are scarce and the spectral dimension of hyperspectral image data is very high, it is nontrivial to train a stable and effective deep network. Therefore, a group convolution (with shared network parameters) and progressive upsampling framework is proposed. This not only alleviates the difficulty in feature extraction caused by the high dimensionality of the hyperspectral data, but also makes the training process more stable. To exploit the spatial and spectral prior, we design a spatial-spectral block (SSB), which consists of a spatial residual module and a spectral attention residual module. Experimental results on some hyperspectral images demonstrate that the proposed SSPSR method enhances the details of the recovered high-resolution hyperspectral images, and outperforms state-of-the-art methods. The source code is available at <https://github.com/junjun-jiang/SSPSR>.
Unlike human eyes, which can only perceive visible light, hyperspectral imaging is an imaging technique for collecting and processing information across the entire range of the electromagnetic spectrum. The most important feature of hyperspectral imaging is the combination of imaging technology and spectral detection technology. While imaging the spatial features of the target, each spatial pixel in a hyperspectral image is dispersed to form dozens or even hundreds of narrow spectral bands for continuous spectral coverage. Therefore, hyperspectral images have a strong spectral diagnostic capability to distinguish materials that look similar to humans.
However, the hyperspectral imaging system is often compromised by the limited amount of incident energy. There is always a tradeoff between the spatial and spectral resolution in the real imaging process. As the number of spectral bands increases, if all other factors are kept constant to ensure a high signal-to-noise ratio (SNR), the spatial resolution inevitably suffers. Therefore, how to obtain a reliable hyperspectral image with high spatial resolution remains a very challenging problem.
Super-resolution reconstruction can infer a high-resolution image from one or several sequentially observed low-resolution images. It is a post-processing technique that does not require hardware modifications, and thus can break through the limitations of the imaging system. According to whether auxiliary information (such as a panchromatic, RGB, or multispectral image) is utilized, hyperspectral image super-resolution techniques can be divided into two categories: fusion based hyperspectral image super-resolution (sometimes called hyperspectral image pansharpening) and single hyperspectral image super-resolution. The former merges the observed low-resolution hyperspectral image with an auxiliary image of higher spatial resolution to improve the spatial resolution of the observed hyperspectral image. These fusion approaches, based on Bayesian inference, matrix factorization, sparse representation, or recently advanced deep learning techniques, have flourished in recent years and achieved considerable performance [48, 58, 4]. However, most of these methods assume that the input low-resolution hyperspectral image and the high-resolution auxiliary image are well co-registered. In practical applications, obtaining such well co-registered auxiliary images would be difficult, if not impossible [8, 35, 64].
Compared with fusion based hyperspectral image super-resolution, single hyperspectral image super-resolution has received less attention and there has been limited advancement due to the spectral patterns in hyperspectral images and no additional auxiliary information. To exploit the abundant spectral correlations among successive spectral bands, several single hyperspectral image super-resolution approaches based on sparse and dictionary learning or low-rank approximation have been developed [20, 17, 46, 21]. However, these hand-crafted priors can only reflect the characteristics of one aspect of the hyperspectral data.
Recently, deep convolutional neural network (DCNN) has shown extraordinary capability of modelling the relationship between the low-resolution images and high-resolution ones, i.e., single gray/RGB image super-resolution task [13, 30, 62]. The practiced rationale in these schemes can be summarized as follows: given a very large number of example pairs of original images and their corrupted versions, a deep network can be learned to restore the degraded image to its source.
Compared with deep learning based single gray/RGB image super-resolution, however, it is nontrivial to train a computationally efficient and effective deep network for the single hyperspectral image super-resolution task. This is mainly due to two reasons. On the one hand, hyperspectral images are not as common as natural images, so the number of training samples in available hyperspectral image datasets is extremely small. Even if many images can be collected, they may be obtained by different hyperspectral cameras, and the differences in the number of spectral bands and imaging conditions make it more difficult to establish a unified deep network. On the other hand, the spectral dimensionality of hyperspectral image data is very high. Unlike traditional gray/RGB images, hyperspectral images often have hundreds of contiguous spectral bands, which calls for a larger dataset to support the training process; otherwise, the network easily over-fits.
To deal with the above problems, caused by the lack of data and the inability to fully exploit the spatial information and spectral correlation characteristics of hyperspectral data, this paper proposes a group convolution (with shared network parameters) and progressive upsampling framework, which greatly reduces the size of the model and makes it feasible to obtain stable training results under small data conditions. To exploit the spatial and spectral correlation characteristics of hyperspectral data, we carefully design the spatial-spectral prior network (SSPN), which cascades multiple spatial-spectral blocks (SSBs). Each SSB contains a spatial residual module and a spectral attention residual module. The former consists of a standard residual block, which is used to exploit the spatial information of the hyperspectral data, while the latter consists of a spectral attention residual block, which is used to extract spectral correlations. Through short and long skip connections, a residual-in-residual architecture is formed, which makes the spatial-spectral feature extraction more efficient.
Figure 1 shows the network architecture of our spatial-spectral prior network based super-resolution network (SSPSR). The input low-resolution hyperspectral image is first divided into several overlapping groups. For each group, a branch network is applied to extract the spatial-spectral features of the input grouped hyperspectral images (a subset of the bands of the entire hyperspectral image) and upscale them with a smaller upsampling factor (compared with the final target). Then, the output features of all branches are concatenated and fed to the following global spatial-spectral feature extraction and upsampling networks. Note that in order to let the SSPN in the branch networks and the global network share the same structure, we insert a “reconstruction” layer after each branch upsampling module. Similar to many previous super-resolution networks, we also adopt a global residual structure to facilitate the prediction of the target. Therefore, in the proposed SSPSR network, the information flow is very flexible thanks to the short (within the residual spatial/spectral blocks), long (within the spatial-spectral prior network), and global skip links. During the training phase, we share the network parameters of each branch across all groups, which avoids heavy computational cost and simplifies the complex optimization process. Comprehensive ablation studies demonstrate the effectiveness of each component and the fusion strategy used in the proposed method. Comparison results with state-of-the-art single hyperspectral image super-resolution methods on two public datasets demonstrate the effectiveness of the proposed SSPSR network.
We summarize the main contributions of this paper as follows. Considering the limited hyperspectral training samples and the high dimensionality of spectral bands, it is difficult to learn the mapping relationship from low-resolution space to high-resolution space in one-step upsampling. Inspired by the idea of some general image super-resolution methods, which conduct super-resolution progressively, we apply the progressive upsampling scheme to the single hyperspectral image super-resolution task and verify its effectiveness. In addition, we propose a spectral grouping and parameter sharing strategy to greatly reduce the parameters of the model and alleviate the difficulty in feature extraction. Inspired by the efficient residual learning and attention mechanism, we develop a spatial-spectral feature extraction network to fully exploit the spatial-spectral prior of hyperspectral images.
The rest of this paper is organized as follows: Section II presents the related work of hyperspectral image super-resolution. In Section III, we give the details of our SSPSR network architecture and the SSB. Then, the network configuration and experimental results including ablation analysis are reported in Section IV. Finally, some conclusions are drawn in Section V.
In this section, we briefly review some methods that are most relevant to our work, which include fusion based hyperspectral image super-resolution, single hyperspectral image super-resolution, and single gray/RGB image super-resolution. A list of hyperspectral image super-resolution resources collected by Jiang can be found at .
Remote sensing image fusion is a very challenging problem with a long history. Generally speaking, this problem can be classified into two categories: pansharpening and super-resolution. In order to improve the spatial resolution of multispectral images, some previous works cast the fusion problem as a variational reconstruction task that blends in a panchromatic image of higher resolution. This is often referred to as pansharpening. A taxonomy of pansharpening based fusion methods can be found in the literature [2, 18, 28, 33].
Recently, improving spatial resolution by fusing a low-resolution hyperspectral image with a high-resolution multispectral image, which is often referred to as hyperspectral image super-resolution, has received extensive attention. For example, Yokoya et al. proposed a coupled nonnegative matrix factorization (CNMF) based approach to infer the high-resolution hyperspectral image from a pair consisting of a high-resolution multispectral image and a low-resolution hyperspectral image. To exploit the redundancy and correlation in the spectral domain, some approaches have been proposed that exploit sparsity, non-local similarity [14, 53], superpixel-guided self-similarity, clustering manifold structure, and tensor and low-rank constraints [44, 12]. Most recently, deep learning based methods have gradually become popular due to their superior performance and fewer assumptions regarding the image prior [54, 11, 37, 7]. Inspired by iterative optimization based on the observation model, deep unfolding networks for fusion based hyperspectral image super-resolution have also become popular in recent years [51, 49, 10]. The common idea of the above fusion based methods is to borrow high-frequency spatial information from the high-resolution auxiliary image and fuse this information into the target high-resolution hyperspectral image. Though these approaches have achieved very good performance, their major drawback is that a well co-registered auxiliary image with a higher resolution is needed. Obtaining such a well co-registered auxiliary image would be arduous, if not impossible, in practical applications [8, 35, 64].
Even without a co-registered auxiliary image, single hyperspectral image super-resolution methods have attracted considerable attention. The pioneering work was proposed by Akgun et al., in which a hyperspectral image acquisition model and the projection onto convex sets (POCS) algorithm are applied to reconstruct the high-resolution hyperspectral image. By incorporating low-rank and group-sparse constraints, Huang et al. developed a novel method to tackle the unknown blurring problem. Recently, variants of sparse representation and dictionary learning based approaches have been widely studied [46, 27]. However, these methods have some drawbacks. First, they usually need to solve complex and time-consuming optimization problems in the testing phase. Second, the image priors are often hand-crafted and based on internal examples, without considering any information from external samples. Due to their superior performance in many computer vision problems, deep learning techniques have also been introduced into the single hyperspectral image super-resolution task very recently. For example, Yuan et al. and Xie et al. first super-resolved the hyperspectral image with DCNNs, and then applied nonnegative matrix factorization (NMF) to preserve the spectral characteristics of the intermediate results. Essentially, they utilized DCNNs and matrix factorization to exploit the spatial and spectral features separately, in a non-end-to-end manner. Mei et al. introduced a 3D fully convolutional neural network to extract the features of hyperspectral images. Although 3D convolution can exploit the spectral correlation well, its computational complexity is very large. Li et al. proposed a grouped deep recursive residual network (GDRRN) by designing a group recursive module and embedding it into a global residual structure. This group-wise convolution and recursive structure allows it to yield very good performance. In our previous work, a feature pyramid block was designed to extract multi-scale features of the hyperspectral images. Most recently, inspired by the observation that the image prior can be found within a CNN itself, Sidorov et al. developed an effective single hyperspectral image restoration algorithm. In general, these deep methods achieve better results than traditional methods.
However, due to the limited hyperspectral training samples and the high dimensionality of spectral bands, it is difficult to fully exploit the spatial information and the correlation among the spectra of the hyperspectral data.
Recently, DCNN based approaches have achieved excellent performance on the single gray/RGB image super-resolution problem. The seminal work by Dong et al. proposed a three-layer convolutional neural network for end-to-end image super-resolution (SRCNN) and achieved much better performance than conventional non-deep-learning methods. Benefiting from residual learning, Kim et al. introduced very deep networks for image super-resolution in VDSR and DRCN and achieved better results than the three-layer SRCNN. The residual structure was then adopted in LapSRN, DRRN, and EDSR. By simply attaching residual blocks, introducing feedback, or incorporating non-local operations into a recurrent neural network, RDN, DBPN, and NLRN were proposed. Inspired by the SE block, Zhang et al. developed a very deep network named RCAN by incorporating a channel attention module. Most recently, Dai et al. introduced the non-local block and presented a second-order attention network (SAN) to capture long-range dependencies. Although fascinating results have been achieved, these methods are designed for gray/RGB images, which have only one or three channels. When directly applied to hyperspectral images, they neglect the correlations among the spectra of the hyperspectral data, which hinders the representation capacity of the network. In addition, for single gray/RGB image super-resolution, a one- or three-channel network input is usually expanded to a feature map of 64 (or more) channels for feature extraction. If the same 20-fold (or more) channel expansion were applied to hyperspectral images, which have hundreds of channels, the number of parameters would increase sharply. However, unlike for gray/RGB images, there is not enough hyperspectral data to support training such a model.
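The parameter growth described above can be made concrete with a rough count. The sketch below counts the weights of a single 3×3 convolution layer; the 128-band input and the literal 20-fold channel expansion are illustrative assumptions, not settings taken from the paper.

```python
def conv_params(in_ch, out_ch, k=3, bias=True):
    """Number of learnable parameters in one k x k convolution layer."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


# RGB case: 3 input channels expanded to 64 feature channels (about 20x).
rgb_first = conv_params(3, 64)      # first layer: 3 -> 64
rgb_hidden = conv_params(64, 64)    # any hidden layer: 64 -> 64

# Hypothetical hyperspectral case: 128 bands with the same ~20x expansion.
bands = 128                         # assumed band count, for illustration only
width = 20 * bands                  # 2560 feature channels
hsi_first = conv_params(bands, width)
hsi_hidden = conv_params(width, width)

print(rgb_hidden, hsi_hidden, hsi_hidden / rgb_hidden)
```

Under these assumptions a single hidden layer grows from tens of thousands of weights to tens of millions, which illustrates why grouping the bands and sharing parameters across groups is attractive when training data are scarce.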
In Fig. 1, we show the network architecture of the proposed SSPSR method. It mainly consists of two parts: the branch networks and the global network. Each branch network, as well as the global network, includes shallow feature extraction, spatial-spectral deep feature extraction, an upsampling module, and a reconstruction part. We denote by $I_{LR}$ the input low-resolution hyperspectral image, by $I_{SR}$ the corresponding output high-resolution hyperspectral image, and by $I_{HR}$ the ground truth (original high-resolution hyperspectral image) of the input image. Our goal is to predict the high-resolution hyperspectral image $I_{SR}$ from the input low-resolution hyperspectral image $I_{LR}$ by the proposed end-to-end super-resolution reconstruction network,
$$I_{SR} = H_{Net}(I_{LR}),$$
where $H_{Net}(\cdot)$ denotes the function of the proposed SSPSR method.
Different from previous methods, which treat the hyperspectral image either as multiple single-channel images (reconstructing them separately) or as a whole, we divide the whole hyperspectral image into several groups. In this way, we can not only exploit the correlations among neighboring spectral bands of hyperspectral images, but also reduce the dimensionality of the features of each group. Inspired by the success of the recently proposed residual network structure, which has achieved very good performance in the field of image restoration, we specifically design an SSB based on the residual network structure. As shown in Fig. 1, the proposed SSPSR network contains several branch networks and a global network. Each branch network and the global network first extract the shallow features and feed them to the SSPN, then upscale the outputs of the SSPN with an intermediate upsampling factor. By cascading the parallel branch networks with the global network, we can super-resolve the input low-resolution hyperspectral image in a coarse-to-fine manner. In the following, we give details of the branch network and the global network, respectively.
Specifically, the input low-resolution hyperspectral image $I_{LR}$ is first divided into $S$ groups, $I_{LR} = \{I_{LR}^{(1)}, I_{LR}^{(2)}, \ldots, I_{LR}^{(S)}\}$. It should be noted that, in our settings, neighboring groups may overlap. More details about the settings can be found in the experiment section. For each group $I_{LR}^{(s)}$, we directly apply one convolutional layer to obtain its shallow features $F_0^{(s)}$, as investigated in previous work [30, 62],
$$F_0^{(s)} = H_{FE}(I_{LR}^{(s)}),$$
where $H_{FE}(\cdot)$ denotes the convolution operation, i.e., the feature extraction layer. $F_0^{(s)}$ is then used for deep feature extraction with the proposed SSPN. Consequently, we further have
$$F_B^{(s)} = H_{SSPN}(F_0^{(s)}),$$
where $H_{SSPN}(\cdot)$ denotes the function of the proposed SSPN, which contains $R$ SSBs; its details are presented in the following.
The output of the SSPN can be treated as the deep features of one group of hyperspectral bands. In order to alleviate the burden of the final super-resolution reconstruction, we adopt a strategy of progressive super-resolution reconstruction. In particular, we add an upsampling module in the middle of the network (before feeding the output of the branch SSPN to the global SSPN), which has proven to be a very effective technique, especially when the magnification is very large. Thus, by the upsampling module we obtain the upscaled feature maps,
$$F_{\uparrow}^{(s)} = H_{\uparrow}(F_B^{(s)}),$$
where $H_{\uparrow}(\cdot)$ and $F_{\uparrow}^{(s)}$ denote an upsampling module and the upscaled features, respectively. In this paper, we leverage the PixelShuffle operator to conduct the upsampling procedure.
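To illustrate what the PixelShuffle rearrangement does, the following minimal sketch moves groups of r×r channels into r×r spatial neighborhoods. It is written with plain nested lists so that it runs without dependencies, and the function name `pixel_shuffle` is hypothetical; the channel ordering follows the convention of PyTorch's `torch.nn.PixelShuffle`, which a real implementation would use directly.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r cell
            for j in range(r):      # horizontal offset inside each r x r cell
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


# Four 1x1 channels become one 2x2 channel for an upscaling factor of 2.
features = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
upscaled = pixel_shuffle(features, 2)
```

Because the operator only rearranges values, the convolution that precedes it is what actually learns the upsampling filters.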
Before feeding the upscaled features to the following global SSPN, we add one Conv layer after each branch upsampling module to reduce the number of feature channels to the number of spectral bands of each input group. Therefore, the output of the branch network has the same number of channels as the input grouped hyperspectral images, and we call this layer a “reconstruction” layer,
$$F_{rec}^{(s)} = H_{rec}(F_{\uparrow}^{(s)}), \tag{5}$$
where $H_{rec}(\cdot)$ denotes the “reconstruction” layer (here we use the lowercase term “rec” to represent a pseudo-reconstruction operation). With this Conv layer, each branch can be seen as a super-resolution reconstruction subnetwork. Another purpose of this layer is to give the branch SSPN and the global SSPN the same network structure.
After extracting features from the different groups with the branch networks, we concatenate the outputs of all branches (as shown by the “concatenation operator” in Fig. 1), i.e., $F_{rec} = [F_{rec}^{(1)}, F_{rec}^{(2)}, \ldots, F_{rec}^{(S)}]$. It should be noted that if neighboring groups overlap, the integrated feature maps are generated according to the original spectral band positions, averaging the feature values in the overlapping bands. Similar to the local branch, before feeding the concatenated features into the global SSPN, we apply one Conv layer to extract the “shallow features”,
$$F_0 = H_{FE}^{g}(F_{rec}),$$
where $H_{FE}^{g}(\cdot)$ is similar to $H_{FE}(\cdot)$ and is used to extract the corresponding “shallow features” of the concatenated features of all branch networks.
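The merging of overlapping groups described above can be sketched as follows. This is a minimal illustration with plain nested lists, assuming that each branch output has one map per band of its group and that the start index of every group is known; the function name `merge_groups` and its arguments are hypothetical.

```python
def merge_groups(groups, starts, num_bands):
    """Place each group's band maps at their original band positions and
    average the values wherever neighboring groups overlap.

    groups[s][k] is the 2D map of the k-th band of group s,
    starts[s] is the index of the first band covered by group s.
    """
    h, w = len(groups[0][0]), len(groups[0][0][0])
    total = [[[0.0] * w for _ in range(h)] for _ in range(num_bands)]
    count = [0] * num_bands
    for group, start in zip(groups, starts):
        for k, band in enumerate(group):
            b = start + k
            count[b] += 1
            for y in range(h):
                for x in range(w):
                    total[b][y][x] += band[y][x]
    if 0 in count:
        raise ValueError("some bands are not covered by any group")
    return [[[v / count[b] for v in row] for row in total[b]]
            for b in range(num_bands)]


# Two groups of two bands each over three bands; band 1 is shared and averaged.
group_a = [[[1.0]], [[2.0]]]   # covers bands 0 and 1
group_b = [[[4.0]], [[6.0]]]   # covers bands 1 and 2
merged = merge_groups([group_a, group_b], starts=[0, 1], num_bands=3)
```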
Then, we further feed $F_0$ into the global SSPN, whose structure is the same as the local one,
$$F_B = H_{SSPN}^{g}(F_0),$$
where $H_{SSPN}^{g}(\cdot)$ refers to the global version of $H_{SSPN}(\cdot)$. In this way, we extract the spatial-spectral features of the input hyperspectral images.
To upscale the obtained features to the target size, we apply the upsampling module once more (progressive reconstruction) to generate the upscaled spatial-spectral feature maps,
$$F_{\uparrow} = H_{\uparrow}^{g}(F_B),$$
where $H_{\uparrow}^{g}(\cdot)$ refers to the global version of $H_{\uparrow}(\cdot)$.
The final super-resolved hyperspectral image is then obtained via one reconstruction layer, which is fed with the upscaled spatial-spectral features and the upscaled input hyperspectral image,
$$I_{SR} = H_{REC}\big(F_{\uparrow} + H_{FE}^{b}(I_{LR}^{\uparrow})\big),$$
where $I_{LR}^{\uparrow}$ refers to the Bicubic-upsampled version of the input low-resolution hyperspectral image, $H_{FE}^{b}(\cdot)$ is similar to $H_{FE}(\cdot)$ and is used to extract shallow features of the Bicubic-upscaled hyperspectral image for residual learning, and $H_{REC}(\cdot)$ is the reconstruction operation, which has one layer. Here, “$+$” denotes residual learning.
Image super-resolution is a highly ill-posed problem, which calls for additional priors (regularization) to constrain the reconstruction procedure. Traditional approaches try to design sophisticated regularization terms by hand, such as total variation (TV), sparsity, and low-rank constraints [5, 14, 15, 61, 44]. Therefore, the performance of these algorithms is highly dependent on whether the designed prior can characterize the observed data well. As for the hyperspectral image super-resolution problem, it is crucial to effectively exploit the intrinsic properties of hyperspectral images, i.e., the non-local self-similarity in the spatial domain and the high correlation across spectra. Previous manually designed constraints are insufficient for accurate hyperspectral image restoration.
In this paper, we advocate a spatial-spectral feature extraction network (SSPN) to exploit the spatial and spectral prior. In particular, SSPN cascades $R$ spatial-spectral blocks (SSBs) and can be formulated as
$$F_r^{(s)} = H_{SSB}^{r}(F_{r-1}^{(s)}), \quad r = 1, 2, \ldots, R,$$
where $H_{SSB}^{r}(\cdot)$ refers to the function of the $r$-th SSB, $F_{r-1}^{(s)}$ is the input of the $r$-th SSB, and $F_r^{(s)}$ is the extracted features. Note that we use the notation of the local branch network to describe the detailed design of the local SSPN; the global SSPN is the same as the local one.
To facilitate the prediction of the target, a long skip connection is further introduced in the SSPN. It passes the low-frequency content of the input features directly to the end, and lets the residual body pay more attention to the high-frequency information. Therefore, the output of the SSPN is obtained by
$$F_B^{(s)} = F_0^{(s)} + F_R^{(s)}.$$
Here, “$+$” denotes residual learning (same as below). This residual-in-residual structure enables fast as well as stable training.
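The cascade with a long skip connection can be sketched in a few lines. In this toy illustration the features are flat lists and each block is a stand-in that adds a scaled copy of its input (a short skip); the names `ssb_stub` and `sspn` are hypothetical, and the real blocks are the SSBs described in the text.

```python
def ssb_stub(scale):
    """Return a stand-in block: input plus a scaled copy of itself (short skip)."""
    def block(features):
        return [v + scale * v for v in features]
    return block


def sspn(f0, blocks):
    """Cascade the blocks, then add the long skip connection from the input."""
    f = f0
    for block in blocks:
        f = block(f)
    return [a + b for a, b in zip(f0, f)]


blocks = [ssb_stub(0.5) for _ in range(3)]   # three blocks in the cascade
out = sspn([1.0, 2.0], blocks)
```

The long skip lets the input reach the output unchanged, so the cascaded blocks only have to model the residual.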
In this paper, we specifically design the SSB to exploit the spatial-spectral information of hyperspectral images. In particular, each SSB has two parts, i.e., a spatial residual module and a spectral attention residual module. The architecture of the SSB is illustrated in Fig. 2. For the spatial residual module, we leverage the standard residual block with 3×3 convolutions to extract the spatial features,
$$F_{r,spa}^{(s)} = H_{spa}^{r}(F_{r-1}^{(s)}),$$
where $H_{spa}^{r}(\cdot)$ refers to the function of the spatial residual module of the $r$-th SSB, and $F_{r,spa}^{(s)}$ is the spatial feature of the $r$-th SSB. The standard residual block can extract the spatial information of a hyperspectral image well.
However, standard residual convolutional networks cannot effectively extract the strong dependencies between the spectra of a hyperspectral image. The spectral correlation, i.e., the strong correlation among neighboring spectral bands of a hyperspectral image, has been widely used for hyperspectral image reconstruction and analysis [58, 50]. To exploit this correlation, we can use all $N$ spectral bands $x_1, \ldots, x_N$ to obtain a newly reconstructed spectral band $\hat{x}_i$, i.e., $\hat{x}_i = \sum_{j=1}^{N} w_{ij} x_j$, where $w_{ij}$ are the linear combination (reconstruction) weights. If similar spectral bands share similar weights, the correlation information is embedded in the reconstructed spectral band, thus exploiting the correlation among neighboring spectral bands of the hyperspectral image. If we relax the weights to arbitrary learnable parameters, this is equivalent to learning a set of weight vectors, and thus obtaining a new representation of the hyperspectral image. Mathematically, this can be achieved by 1×1 filters (a bottleneck layer) whose weights are $w_{ij}$. By designing a spectral network with 1×1 filters, we can expect to fully exploit the correlations between different spectral bands. It is worth noting that we further apply a ReLU layer to enhance its representation ability. Therefore, the SSB is designed as the combination of a spatial residual module and a spectral attention residual module, as shown in Fig. 2. Thus, we have
$$F_{r,spc}^{(s)} = H_{spc}^{r}(F_{r,spa}^{(s)}),$$
where $H_{spc}^{r}(\cdot)$ denotes the spectral network of the $r$-th SSB.
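The equivalence between a 1×1 filter and a per-pixel linear combination of spectral bands can be checked with a small sketch. The code below uses plain nested lists and omits biases; the function name `conv1x1_relu` and the toy weights are hypothetical and only illustrate the mechanism, not the trained network.

```python
def conv1x1_relu(x, weights):
    """Apply a bank of 1x1 filters followed by ReLU.

    x is an (N, H, W) nested list of N bands; weights is an (M, N) matrix,
    so each output band is a linear combination of all input bands,
    computed independently at every pixel.
    """
    n, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for w_row in weights:
        band = [[max(0.0, sum(w_row[j] * x[j][y][z] for j in range(n)))
                 for z in range(w)]
                for y in range(h)]
        out.append(band)
    return out


# Two input bands of size 1x2; the first filter averages them,
# the second takes their difference (clipped to zero by the ReLU).
bands = [[[1.0, 2.0]], [[3.0, 4.0]]]
mixed = conv1x1_relu(bands, [[0.5, 0.5], [1.0, -1.0]])
```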
To further improve the representation ability for spectral information as well as that of the entire network, we follow Zhang et al. and introduce the channel attention mechanism to adaptively rescale each channel-wise feature by modeling the interdependencies across feature spectra. Specifically, a global average pooling layer is applied to the feature maps extracted by the preceding spectral network to obtain a global context embedding vector. Then, two thin fully connected layers with a simple gating mechanism (a sigmoid function) are applied to learn nonlinear interactions between spectra. This yields the final channel scaling coefficient vector, which is used to reweight the extracted feature maps. The output of the spectral attention residual module is then obtained from the reweighted feature maps through residual learning.
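The channel attention steps described above (global average pooling, two thin fully connected layers, sigmoid gating, and reweighting) can be sketched as follows. This is a minimal illustration with plain nested lists; the ReLU between the two fully connected layers follows the common squeeze-and-excitation design and is an assumption here, biases are omitted, and the function name and weight shapes are hypothetical.

```python
import math


def channel_attention(x, w_down, w_up):
    """Rescale each channel of x by a learned, input-dependent coefficient.

    x: (C, H, W) nested list; w_down: (C_mid, C) matrix; w_up: (C, C_mid) matrix.
    """
    # Global average pooling: one context value per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # First thin fully connected layer with ReLU (assumed non-linearity).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w_down]
    # Second fully connected layer with sigmoid gating.
    scale = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w_up]
    # Reweight every feature map by its channel coefficient.
    return [[[s * v for v in row] for row in ch] for ch, s in zip(x, scale)]


feats = [[[2.0, 4.0]], [[1.0, 1.0]]]
# With zero weights in the second layer, every gate equals sigmoid(0) = 0.5.
rescaled = channel_attention(feats, w_down=[[1.0, 1.0]], w_up=[[0.0], [0.0]])
```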
Table I: Average quantitative performance of different loss functions over the four testing images of the Chikusei dataset with respect to six PQIs, for an upsampling factor of 4.
In order to measure the super-resolution performance, several cost functions have been investigated to make the super-resolution results approximate the ground truth high-resolution images. In the current literature, $\ell_1$, $\ell_2$, perceptual, and adversarial losses are the most commonly used loss functions. Compared with perceptual and adversarial losses, which may restore details that do not exist in the original images (which is undesirable in the remote sensing field), the $\ell_1$ and $\ell_2$ losses are more credible. The $\ell_2$ loss, however, encourages finding pixel-wise averages of plausible solutions, which are typically overly smooth. Since the $\ell_1$ loss can effectively penalize small errors and maintain better convergence throughout the training phase, we adopt the $\ell_1$ loss to measure the reconstruction accuracy of the network. Specifically, the $\ell_1$ loss is defined by the mean absolute error (MAE) between all the reconstructed images and the ground truth:
$$\mathcal{L}_1(\Theta) = \frac{1}{N} \sum_{n=1}^{N} \left\| I_{HR}^{n} - I_{SR}^{n} \right\|_1,$$
where $I_{SR}^{n}$ and $I_{HR}^{n}$ are the $n$-th reconstructed high-resolution hyperspectral image and ground truth hyperspectral image, respectively, $N$ denotes the number of images in one training batch, and $\Theta$ refers to the parameter set of our network.
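The remark that a squared-error loss favors pixel-wise averages can be illustrated with a toy experiment: for one pixel with several equally plausible target values, the constant that minimizes the squared error is their mean, whereas the constant that minimizes the absolute error is their median. The values and helper names below are purely illustrative.

```python
def mae(pred, target):
    """Mean absolute error between two equally long lists of values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error between two equally long lists of values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# One pixel with three plausible ground-truth values (illustrative numbers).
plausible = [0.0, 0.0, 1.0]
grid = [i / 100 for i in range(101)]   # candidate predictions in [0, 1]

best_l1 = min(grid, key=lambda v: mae([v] * 3, plausible))
best_l2 = min(grid, key=lambda v: mse([v] * 3, plausible))
print(best_l1, best_l2)
```

The squared-error minimizer lands between the plausible values, which is the blurring effect mentioned in the text, while the absolute-error minimizer stays on one of them.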
However, the above-mentioned losses are primarily designed for general image restoration tasks. Although they can preserve the spatial information of the super-resolution results well, the reconstructed spectral information may be distorted because the correlations among spectral features are ignored. In order to simultaneously ensure the spatial and spectral credibility of the reconstruction results, we introduce the spatial-spectral total variation (SSTV). It extends the conventional total variation model and accounts for both the spatial and the spectral correlation. In this paper, we add the SSTV to the $\ell_1$ loss to impose spatial and spectral smoothness simultaneously,
$$\mathcal{L}_{SSTV}(\Theta) = \frac{1}{N} \sum_{n=1}^{N} \left( \left\| \nabla_h I_{SR}^{n} \right\|_1 + \left\| \nabla_w I_{SR}^{n} \right\|_1 + \left\| \nabla_c I_{SR}^{n} \right\|_1 \right),$$
where $\nabla_h$, $\nabla_w$, and $\nabla_c$ are functions that compute the horizontal, vertical, and spectral gradients of $I_{SR}^{n}$.
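A spatial-spectral total variation term of this kind can be sketched as the sum of absolute forward differences along the width, height, and band axes. The code below is a minimal illustration with plain nested lists; the use of forward differences, the absence of any normalization, and the function name `sstv` are assumptions rather than details taken from the paper.

```python
def sstv(img):
    """Sum of absolute forward differences of a (C, H, W) nested list
    along the horizontal, vertical, and spectral directions."""
    c, h, w = len(img), len(img[0]), len(img[0][0])
    total = 0.0
    for b in range(c):
        for y in range(h):
            for x in range(w):
                v = img[b][y][x]
                if x + 1 < w:   # horizontal gradient
                    total += abs(img[b][y][x + 1] - v)
                if y + 1 < h:   # vertical gradient
                    total += abs(img[b][y + 1][x] - v)
                if b + 1 < c:   # spectral gradient
                    total += abs(img[b + 1][y][x] - v)
    return total


# Two 2x2 bands: a horizontal step of 1 inside each band, a spectral step of 2.
cube = [[[0.0, 1.0], [0.0, 1.0]],
        [[2.0, 3.0], [2.0, 3.0]]]
penalty = sstv(cube)
```

A perfectly constant cube gives a penalty of zero, so the term pushes the reconstruction towards smoothness in both the spatial and the spectral direction.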
In summary, the final objective loss for the proposed model is a weighted sum of the two losses:
$$\mathcal{L}_{total}(\Theta) = \mathcal{L}_1 + \alpha \mathcal{L}_{SSTV},$$
where $\alpha$ is used to balance the contributions of the different losses. In our experiments, we set $\alpha$ to a fixed constant.
In Table I, we report the reconstruction results (in terms of objective measurements) obtained with different losses (more details regarding the experimental settings can be found in the experiment section). Clearly, the $\ell_1$ loss is much more suitable for our task, because it can effectively penalize small errors and maintain better convergence throughout the training phase. By introducing the SSTV constraint, slightly better results can be achieved.
We use PyTorch (https://pytorch.org) to implement and train the proposed SSPSR network. We train separate models, with random initialization, to super-resolve the hyperspectral images for scale factors 4 and 8. We use the ADAM optimizer with an initial learning rate of 1e-4, which decays by a factor of 10 when training reaches 30 epochs. In our experiments, we find that it takes 40 epochs to achieve stable performance. The models are trained with a batch size of 32. As in many previous works, we apply Bicubic interpolation to downsample the high-resolution hyperspectral images to obtain the corresponding low-resolution hyperspectral images.
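The learning rate schedule described above amounts to a single step decay. A minimal sketch is given below, assuming zero-based epoch indices and that the decay takes effect from epoch 30 onward; the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, decay_epoch=30, factor=10.0):
    """Step schedule: base_lr before decay_epoch, base_lr / factor afterwards."""
    return base_lr / factor if epoch >= decay_epoch else base_lr


# Learning rate for each of the 40 training epochs.
schedule = [learning_rate(e) for e in range(40)]
```

In PyTorch, a comparable behavior over 40 epochs could be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`, stepping the scheduler once per epoch.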
Unless otherwise specified, in the following experiments we set the spectral band number () of each group to 8 and the overlap () between neighboring groups to 2. To efficiently process the “edge” spectral bands, we adopt a so called “fallback” dividing strategy. When the last group has less than spectral bands, we select the last bands as the last group. Therefore, the number of groups can be obtained by the following equation,
where the rounding function is the ceiling, i.e., it rounds its argument up to the nearest integer. In the SSPN, the number of spatial-spectral blocks () is set to 3. We set the kernel size of all Conv layers to 3×3, except for those in the spectral residual modules, where the kernel size is set to 1×1. To keep the size of the feature maps unchanged, zero-padding is applied to the Conv layers with kernel size 3×3. The Conv layers in shallow feature extraction and SSPN have filters, except for that in the channel-downscaling, i.e., the reconstruction network after the upscaled features at the branch networks (please refer to Eq. (5)).
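The overlapped grouping with the “fallback” strategy can be sketched as follows. This is a minimal sketch under stated assumptions rather than the released implementation: the function name `group_band_ranges` and its parameter names are hypothetical, groups are assumed to advance with a stride equal to the group size minus the overlap, and the group count is computed as the ceiling of (number of bands minus overlap) divided by that stride, which we derive from the stride and not from the original equation.

```python
def group_band_ranges(num_bands, group_size, overlap):
    """Return half-open (start, end) band ranges of overlapping groups.

    Assumption: consecutive groups start `group_size - overlap` bands
    apart; a last group that would run past the final band falls back
    to the last `group_size` bands.
    """
    if not 0 <= overlap < group_size <= num_bands:
        raise ValueError("require 0 <= overlap < group_size <= num_bands")
    stride = group_size - overlap
    # Integer ceiling of (num_bands - overlap) / stride.
    num_groups = -(-(num_bands - overlap) // stride)
    ranges = []
    for index in range(num_groups):
        start = index * stride
        end = start + group_size
        if end > num_bands:  # "fallback": reuse the last group_size bands
            start, end = num_bands - group_size, num_bands
        ranges.append((start, end))
    return ranges
```

With 128 bands, a group size of 8, and an overlap of 2, this sketch produces 21 groups that end exactly at the last band. With 102 bands and the same setting, it produces 17 groups, and the fallback shifts the last one to bands 94 through 101.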
|Variant||Scale||CC||SAM||RMSE||ERGAS||PSNR||SSIM|
|Our - w/o GS||4||0.9548||2.4048||0.0116||5.0399||40.1901||0.9424|
|Our - w/o PU||4||0.9520||2.5239||0.0119||5.2329||39.9185||0.9388|
|Our - w/o PS||4||0.9537||2.4152||0.0118||5.0991||40.0712||0.9410|
|Our - w/o SA||4||0.9563||2.3597||0.0115||4.9443||40.3408||0.9438|
|Our - w/o GS||8||0.8622||4.5121||0.0199||8.8459||35.3857||0.8427|
|Our - w/o PU||8||0.8585||4.5542||0.0202||9.0285||35.2489||0.8358|
|Our - w/o PS||8||0.8732||4.0587||0.0194||8.4621||35.7074||0.8522|
|Our - w/o SA||8||0.8760||4.0198||0.0192||8.3650||35.8144||0.8538|
GS: Grouping Strategy, PU: Progressive Upsampling, PS: Parameter Sharing, SA: Spectral Attention
|overlaps ()||groups ()||params||FLOPs||CC||SAM||RMSE||ERGAS||PSNR||SSIM|
In this section, we present a detailed analysis and evaluation of our approach on three public hyperspectral image datasets: two remote sensing hyperspectral image datasets, i.e., the Chikusei dataset (https://www.sal.t.u-tokyo.ac.jp/hyperdata/) and the Pavia Centre dataset (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes), and one natural hyperspectral image dataset, i.e., the CAVE dataset (https://www.cs.columbia.edu/CAVE/databases/multispectral/). We compare the proposed method with eight methods: four state-of-the-art deep single gray/RGB image super-resolution methods, VDSR, EDSR, RCAN, and SAN, and four representative and closely related deep single hyperspectral image super-resolution methods, TLCNN, 3DCNN, GDRRN, and DeepPrior. We carefully tune the hyperparameters of these comparison methods to achieve their best performance. Bicubic interpolation is included as the baseline.
Evaluation measures. Six widely used quantitative picture quality indices (PQIs) are employed to evaluate the performance of our method: cross correlation (CC), spectral angle mapper (SAM), root mean squared error (RMSE), erreur relative globale adimensionnelle de synthèse (ERGAS), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). For the PSNR and SSIM of the reconstructed hyperspectral images, we report their mean values over all spectral bands. CC, SAM, and ERGAS are three widely adopted quality indices in the HS fusion task, while the remaining three are commonly used quantitative image restoration quality indices. The best values for these indices are 1, 0, 0, 0, +∞, and 1, respectively.
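For reference, five of these indices can be sketched in NumPy as follows (SSIM is omitted for brevity). These follow common textbook definitions and are an illustrative sketch, not necessarily the exact evaluation code behind the reported numbers. The assumptions are ours: images are stored as (height, width, bands) arrays, intensities are normalized so that `data_range` is 1.0, SAM is reported in degrees, and all function names are hypothetical.

```python
import numpy as np

def rmse(ref, est):
    """Root mean squared error over all pixels and bands."""
    return float(np.sqrt(np.mean((ref - est) ** 2)))

def mean_psnr(ref, est, data_range=1.0):
    """PSNR computed per band, then averaged over bands."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))
    with np.errstate(divide="ignore"):  # identical bands give +inf
        psnr = 10.0 * np.log10(data_range ** 2 / mse)
    return float(np.mean(psnr))

def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle between per-pixel spectra, in degrees."""
    dot = np.sum(ref * est, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

def cross_correlation(ref, est):
    """Pearson correlation per band, averaged over bands.

    Note: a constant band has zero variance and yields NaN.
    """
    r = ref.reshape(-1, ref.shape[-1])
    e = est.reshape(-1, est.shape[-1])
    r = r - r.mean(axis=0)
    e = e - e.mean(axis=0)
    num = np.sum(r * e, axis=0)
    den = np.sqrt(np.sum(r ** 2, axis=0) * np.sum(e ** 2, axis=0))
    return float(np.mean(num / den))

def ergas(ref, est, scale):
    """ERGAS with resolution ratio 1/scale (common definition)."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))
    mean_sq = np.mean(ref, axis=(0, 1)) ** 2
    return float(100.0 / scale * np.sqrt(np.mean(mse / mean_sq)))
```

A perfect reconstruction attains the best values listed above: zero RMSE, SAM, and ERGAS, unit CC, and infinite PSNR.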
The proposed SSPSR method contains four main components including Grouping Strategy (GS), Progressive Upsampling (PU), Parameter Sharing (PS), and Spectral Attention (SA). In order to validate the effectiveness of these components, we modify our model and compare their variants. We use the training images from Chikusei dataset as a training set, and evaluate the super-resolution performance (in terms of average objective results) on the four testing images from Chikusei dataset (more details regarding the experimental settings on Chikusei dataset can be found in the following subsection). Table II tabulates the four variants of the proposed method, in which denotes the upsampling scale. In the following, we will give the detailed analysis about them.
Grouping Strategy (GS). To effectively exploit the correlation among neighboring spectral bands of the hyperspectral image and to reduce the parameters of the model, we design a grouping strategy that divides the input hyperspectral image into overlapping groups. To verify the effectiveness of this strategy, we remove the grouping and treat all bands as one group. As shown in Table II, the variant “Our - w/o GS”, in which the grouping strategy is discarded, performs worse. The grouping strategy leads to a considerable PSNR improvement, e.g., +0.17 dB at scale factor 4 and +0.45 dB at scale factor 8. The gains on the other objective indicators are also considerable.
In addition to the above with/without GS comparison, we also report the number of parameters and FLOPs as well as the six PQIs of our method under some typical settings for the spectral band number () of each group and the overlap () between neighboring groups. The group number is calculated by Eq. (18). As shown in Table II, when and , our method considers all the spectral bands as a whole group () and there is no grouping strategy, i.e., the case of “Our - w/o GS”. When , , and our method will treat each spectral band as a group and this can be seen as a special case, i.e., the band-wise grouping. From the results, we can see that neither treating all spectra as a whole nor treating them separately matches the performance of our proposed grouping strategy. Of these two schemes, the band-wise one performs better owing to the combination of grouping and parameter sharing. However, it also greatly increases the computational overhead (please refer to the FLOPs), because the more branches the model has, the more computation is required.
We also report the performance of the proposed method with different settings for the overlap between neighboring groups, i.e., and . As the overlap increases (from to ), the performance of our method gradually improves, but the computational cost of the model also grows. It is worth noting that, because we adopt parameter sharing, the number of model parameters stays the same when we fix the spectral band number and change the overlap . To balance the number of parameters, the FLOPs, and the objective results, in this paper we set the spectral band number of each group and the overlap to 8 and 2, respectively.
Progressive Upsampling (PU). To learn the end-to-end relationship between the low-resolution input and the high-resolution output, there are two commonly used upsampling frameworks: pre-upsampling super-resolution and post-upsampling super-resolution. They either increase the parameters of the network or increase the difficulty of training. Inspired by the Laplacian pyramid super-resolution network, we adopt a progressive upsampling super-resolution framework. It decomposes a difficult task into several easier ones, which not only greatly reduces the learning difficulty but also yields better performance. In Table II, we report the performance of the proposed SSPSR method without the PU strategy, i.e., “Our - w/o PU”, a variant obtained by removing the upsampling module in the branch networks. Our method with PU achieves better performance on all six indices, covering both the spatial reconstruction fidelity (e.g., RMSE, PSNR, and SSIM) and the spectral consistency (CC, SAM, and ERGAS). The strategy is especially important when the upsampling factor is large. For example, the improvements in CC and PSNR at scale factor 8 are greater than those at scale factor 4: +0.045 and +0.45 dB for scale factor 4, versus +0.181 and +0.58 dB for scale factor 8.
Parameter Sharing (PS). In the proposed SSPSR method, to make the training process more efficient, we share the network parameters of each branch across all groups. In Table II, we tabulate the comparison results of the proposed SSPSR method with and without the parameter sharing strategy. Parameter sharing greatly reduces the complexity of the model. Although the parameter sharing strategy reduces the parameters of the model, it does not weaken the representation ability of the model. Through the parameter sharing strategy, we can make full use of the training samples provided by different branches (training “more” data with the parameters of only one branch network), so that we obtain a more stable model. (Since the network parameters are mainly dominated by module of SSPN, we can deduce that the parameter ratio between the models with and without parameter sharing is .) From the results, we can see that the overall performance with parameter sharing is even better than that without it on all six PQIs under and .
Spectral Attention (SA). To exploit the spatial-spectral prior, we apply a bottleneck network (with 1×1 filters) to extract the correlations among neighboring spectral bands of the hyperspectral image. In addition, an attention module is introduced to model the interdependencies between the spectra of the hyperspectral data. To verify the effectiveness of the SA module, we compare the performance of our method with and without it. As shown in Table II, with the SA mechanism, our method achieves a slight performance gain over “Our - w/o SA”, the variant without the SA mechanism. Although the improvement of each objective index from adding the SA module is relatively small, the improvement in spectral fidelity (i.e., SAM) is more pronounced than that in spatial reconstruction fidelity (i.e., PSNR), 2.2% vs. 0.43% for and 11% vs. 1.3% for . This suggests that the introduction of SA is more conducive to the representation of spectral features.
The Chikusei dataset was acquired by a Headwall Hyperspec-VNIR-C imaging sensor over an urban area in Chikusei, Ibaraki, Japan, on 29 July 2014. It has 128 spectral bands in the spectral range from 363 nm to 1018 nm and 2517×2335 pixels in total.
Due to missing information at the edges, we first crop the center region of the image to obtain a subimage with 2304×2048×128 pixels, which is further divided into training and test data. Specifically, the top region of this image is extracted to form the testing data, which consists of four non-overlapping hyperspectral images with 512×512×128 pixels. From the remaining region of the subimage, we extract overlapping patches as reference high-resolution hyperspectral images for training (10% of the training data is held out as a validation set). When the upsampling factor is 4, the extracted patches are 64×64 pixels (with 32 pixels overlap); when the upsampling factor is 8, the extracted patches are 128×128 pixels (with 64 pixels overlap). We use different patch sizes for different factors for the following reason: if the factor is large and the patch size is small, the input information is very limited, which hinders the training of the network. Therefore, we use a larger patch size for the larger factor. Note that the low-resolution hyperspectral images are generated by Bicubic downsampling of the ground truth (the Matlab function imresize) with a factor of 4 or 8.
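The overlapped patch extraction can be sketched as a sliding window over the training region. This is a minimal illustration under our own assumptions: the function name `patch_corners` is hypothetical, the window advances by the patch size minus the overlap, and any trailing border narrower than a full patch is skipped.

```python
def patch_corners(height, width, patch, overlap):
    """Top-left (row, col) corners of overlapping square patches.

    Assumption: the stride is `patch - overlap`, and border remainders
    that cannot hold a full patch are dropped.
    """
    stride = patch - overlap
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(row, col) for row in rows for col in cols]

# Example on a hypothetical 256x192 region with the scale-4 setting.
corners = patch_corners(256, 192, patch=64, overlap=32)
```

With the settings above, the low-resolution input has the same size for both factors, since 64/4 and 128/8 both equal 16 pixels per side.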
Table IV reports the average objective performance of all comparison algorithms over the four testing images, where bold marks the best result and underline the second best. The proposed SSPSR method significantly outperforms the other algorithms with respect to all objective evaluation indices. The average PSNR value of our method is more than 0.30 dB higher than that of the second best method. As a two-step method (it first super-resolves the hyperspectral images and then conducts decomposition), TLCNN can reconstruct the target hyperspectral images well. Similar to our method, GDRRN also adopts a grouping strategy and can thus exploit the spectral information well (it achieves the second best results in terms of SAM). DeepPrior is a very novel method; however, it takes considerable time to adjust the results, and there is no principled strategy to determine when to stop the iteration. RCAN and SAN obtain similar results and are slightly better than EDSR. This may be because the former two use channel attention and can thus capture the spectral features of the hyperspectral data well.
Fig. 3 and Fig. 4 show the reconstructed composite images of one test hyperspectral image in the Chikusei dataset for the different comparison methods with upsampling factors 4 and 8, respectively. The proposed SSPSR method performs better than the other algorithms, with better recovery of both finer-grained textures and coarser-grained structures (please refer to the regions marked with red boxes). At the bottom of these visual comparison results, we also report the PSNR and SSIM values of the reconstructed composite images, where our approach still has considerable advantages.
The Pavia Centre dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight campaign over the center area of Pavia, northern Italy, in 2001. It has 102 spectral bands (the water vapor absorption and noisy spectral bands have been removed from the initial 115 spectral bands) and 1096×1096 pixels in total. It should be noted that in the Pavia Centre scene, regions that contain no information are removed, leaving a meaningful region with 1096×715 pixels.
To evaluate the proposed SSPSR method, we crop the center region of the image to obtain a subimage with 1096×715×102 pixels, which is further divided into training and testing data. Specifically, the left part of this image is extracted to form the testing data, which consists of four non-overlapping hyperspectral images with 223×223 pixels. From the remaining region of the subimage, we extract overlapping patches as reference high-resolution hyperspectral images for training (10% of the training data is held out as a validation set). As in the previous settings, the patch size is chosen and the low-resolution hyperspectral images are generated accordingly.
Table V tabulates the average performance in terms of the six PQIs over the four testing images for all competing approaches. The proposed SSPSR method significantly outperforms the other algorithms with respect to almost all objective evaluation indices. The average PSNR value of our method is 0.3 dB (scale factor 4) and 0.2 dB (scale factor 8) higher than that of the second best method. As the most competitive general gray/RGB image super-resolution methods, EDSR, RCAN, and SAN achieve quite good results. However, their SAM indices are relatively poor compared with those of the single hyperspectral image super-resolution methods, i.e., 3DCNN and GDRRN.
Fig. 5 and Fig. 6 show the reconstructed composite images and error maps of one test hyperspectral image in the Pavia Centre dataset for the six most competitive approaches with upsampling factors 4 and 8, respectively. The results of EDSR, 3DCNN, and GDRRN are very blurry, while RCAN and SAN seem to introduce some noise. The proposed SSPSR method maintains the main structural information. In the error maps, the proposed method shows no obvious contour information of the image, which indicates that our method recovers this information well. It should be noted that the visual results with upsampling factor 8 are worse than those with upsampling factor 4. In addition, when we compare the visual results of Fig. 4 and Fig. 6, we notice that the reconstructed results on the Pavia Centre dataset are worse than those on the Chikusei dataset. We think this is mainly due to the limited number of training samples in the Pavia Centre dataset. This is also a major drawback of deep learning based methods: they require a large number of training samples, otherwise it is difficult to train a model with promising generalization ability.
The previous experiments are conducted on the Chikusei and Pavia Centre datasets, both of which are remotely sensed hyperspectral images. To further verify the effectiveness of the proposed SSPSR method, we also conduct comparison experiments on hyperspectral images of natural scenes. Specifically, we use the CAVE multispectral image database because it is widely used in many multispectral image recovery tasks. The database consists of 32 scenes of everyday objects with a spatial size of 512×512 and 31 spectral bands ranging from 400 nm to 700 nm in 10 nm steps. To prepare samples for training, we randomly select 20 hyperspectral images from the database (10% of the samples are randomly selected for evaluation). When the upsampling factor is 4, we extract patches with 64×64 pixels (with 32 pixels overlap) for training; when the upsampling factor is 8, the extracted patches are 128×128 pixels (with 64 pixels overlap). The corresponding low-resolution hyperspectral images are generated by Bicubic downsampling with a factor of 4 or 8. The remaining 12 hyperspectral images of the database are used for testing, where the original images are treated as ground truth high-resolution hyperspectral images, and the low-resolution hyperspectral inputs are generated in the same way as for the training samples. For this dataset, we set the spectral band number () of each group to 4 and the overlap () between neighboring groups to 1. Since the CAVE dataset can provide more training samples, we use a larger to design our network.
We compare the proposed SSPSR method with some very competitive approaches: EDSR, RCAN, 3DCNN, and GDRRN. The average CC, SAM, RMSE, ERGAS, PSNR, and SSIM results of the competing methods for different upsampling factors on the CAVE dataset are reported in Table VI. From these results, we notice that the 3DCNN method performs worse than the other methods. The proposed SSPSR method outperforms all other competing methods. In particular, it performs much better than EDSR and RCAN, which focus on exploiting the spatial prior. On average, the PSNR and SSIM values of the proposed SSPSR method for upsampling factors 4/8 are 0.3/0.4 dB and 0.002/0.012 higher than those of the second best method, respectively.
Fig. 7 and Fig. 8 show the reconstructed HR hyperspectral images and the corresponding error maps at 480 nm, 580 nm, and 680 nm produced by the competing methods for the test images stuffed_toys and real_and_fake_apples with upsampling factors 4 and 8, respectively. From the visual reconstruction results, we can see that all the comparison methods reconstruct the high-resolution spatial structures of the hyperspectral images well. The error maps show that the proposed method and the RCAN method achieve the best reconstruction fidelity in recovering the details of the original hyperspectral images, for example, the edges of the checkerboards and the contours of the dog’s ears and the apples (please refer to the regions marked with red boxes). In subfigure (g), we also report the RMSE, PSNR, and SSIM results of each spectral band for the competing methods. The proposed SSPSR method performs best in most cases. 3DCNN and GDRRN, which are designed for hyperspectral images, achieve favorable results in some cases, but their performance seems to be unstable across different spectral bands.
In this paper, a novel deep neural network based on a spatial-spectral prior network (SSPN) is introduced to address the single hyperspectral image super-resolution problem. In particular, to discover the spatial and spectral correlation characteristics of hyperspectral data, we carefully design the SSPN to fully exploit the spatial information and the correlation among the different spectral features. In addition, to cope with the limited number of hyperspectral training samples and the high dimensionality of the data, a group convolution (with shared network parameters) and progressive upsampling framework is proposed. In this way, we can greatly reduce the parameters of the model and make it possible to obtain stable training results with small data and a large number of spectral bands. In the proposed network, the short, long, and global skip links introduced via residual learning make the transmission of the information flow very flexible. To regularize the network outputs, we adopt a spatial-spectral total variation (SSTV) based constraint to preserve the edge sharpness and spectral correlations of the super-resolved high-resolution hyperspectral image. Evaluations on three public hyperspectral datasets demonstrate that our model not only achieves the best performance in terms of some commonly used objective indicators, but also generates clear high-resolution images that are perceptually closer to the ground truth than those of state-of-the-art methods.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3631–3640.
Pan-GAN: an unsupervised learning method for pan-sharpening in remote sensing image fusion using a generative adversarial network. Information Fusion.
Hyperspectral image superresolution by transfer learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (5), pp. 1963–1974.