Source code for paper "Progressive Multi-Scale Residual Network for Single Image Super-Resolution"
Super-resolution is a classical problem in the image restoration field. In recent years, deep learning methods have achieved significant success on the super-resolution task, concentrating on various elaborate network designs to exploit image features more effectively. However, most networks increase depth or width for greater capacity at the cost of a large number of parameters and high computational complexity, and seldom focus on the inherent correlations among different features. This paper proposes a progressive multi-scale residual network (PMRN) for single image super-resolution that sequentially exploits features with a restricted number of parameters. Specifically, we design a progressive multi-scale residual block (PMRB) to progressively explore multi-scale features with different layer combinations, aiming to account for the correlations between different scales. The combinations for feature exploitation are defined in a recursive fashion, introducing non-linearity and better feature representation with limited parameters. Furthermore, we investigate a joint channel-wise and pixel-wise attention mechanism for comprehensive correlation exploration, termed CPA, which is utilized in PMRB and considers both scale and bias factors for features in parallel. Experimental results show that the proposed PMRN recovers structural textures more effectively, with superior PSNR/SSIM results compared with other lightweight works. The extended model PMRN+ with self-ensemble achieves competitive or better results than large networks with far fewer parameters and lower computational complexity.
Single image super-resolution (SISR), a traditional image restoration problem, aims to recover the high-resolution (HR) image corresponding to a given low-resolution (LR) image, and has attracted increasing attention from researchers. Super-resolution methods are widely used in video coding, view synthesis, facial analysis, and other computer vision tasks. Many SISR methods, especially learning-based ones, have been proposed in recent years to find the mapping between LR and HR images.
In learning-based methods, features of LR images play a critical role in restoring HR images, and convolutional neural networks (CNNs) have shown remarkable performance on SISR tasks due to their efficiency in feature representation and exploration. SRCNN is the first CNN-based work for the SISR problem, with three convolutional layers that perform feature extraction, non-linear mapping, and restoration separately. After SRCNN, VDSR, DRRN, and DRCN were developed with deeper networks. These works require HR-sized inputs upscaled by bicubic interpolation, which causes high computational complexity. To our best knowledge, FSRCNN is the first work without bicubic pre-processing; one deconvolutional layer is utilized at the end of the network to upscale the feature maps to high resolution. Since residual connections alleviate the gradient vanishing problem and make deeper networks possible, EDSR and MDSR, which adopt residual blocks in deeper structures, achieved remarkable super-resolution performance. Besides residual designs, dense connections can also provide efficient gradient flow and information preservation. With dense blocks, SRDenseNet and RDN recovered high-frequency information and structural textures efficiently. Recently, some works have concentrated on effective network modules specifically investigated for the SISR problem. RCAN, SAN, and DBPN demonstrated superior performance with elaborate blocks. However, these works require a large number of parameters and high computational complexity, which is challenging for practical applications.
There are also works with lightweight designs for fast super-resolution. Ahn et al. introduced cascading blocks with shared parameters in CARN, which strikes a good balance between speed and recovery capacity. Information distillation is applied in IDN and IMDN for favorable performance with fewer parameters. In fact, the restricted number of parameters limits network capacity. A multi-scale feature extraction block with more parameters was designed in MSRN to recover more structural textures than other lightweight works.
To improve feature exploration and representation capacity, multi-scale structures have been designed for various computer vision tasks. Features from different scales contain various information, which enables a comprehensive view for better exploitation. To our best knowledge, MSRN is the first work with a multi-scale structure for the SISR problem, where convolutional layers with different kernel sizes are stacked in parallel and cross-connected for feature extraction. In fact, the parallel design makes it challenging to find the relationships between different scales, while layers with larger kernel sizes require a large number of parameters, which increases the computational complexity.
Among elaborate block designs, classical SISR works improve feature representation capacity by building deeper or wider networks, and seldom consider the inherent relationships among features. The attention mechanism, which learns such correlations, has turned out to be an efficient component for better representation. A channel-wise feature attention mechanism for image processing was proposed in SENet, which allocates importance to different channels. SAN, RCAN, IMDN, and other recent works integrated channel attention into effective block designs for the SISR problem and achieved superior performance. However, most channel-attention-based methods simply extract feature information from different channels by global average pooling, without considering the pixel-wise diversity of feature maps. Non-local attention is a pixel-wise mechanism for image restoration. Dai et al. applied both pixel-wise and channel-wise attention in SAN and achieved better performance than other SISR works. However, non-local attention requires large memory and high computational complexity, which makes flexible use challenging.
In this paper, we propose a progressive multi-scale residual network (PMRN) for the SISR problem. To restore HR images, PMRN extracts and exploits features from LR images sequentially with limited parameters. Specifically, a progressive multi-scale residual block (PMRB) is investigated for feature exploration across different scales. In PMRB, multi-scale features are progressively exploited by different layer combinations, designed in a recursive fashion, to find their inherent correlations. To preserve information and stabilize the training phase, residual connections are utilized between different layer combinations. After exploitation, the multi-scale features are jointly fused to learn adaptive information. Holistically, local residual learning is introduced into PMRB to preserve the information from the sequential exploration of different blocks and to stabilize training.
In particular, we design an attention mechanism termed CPA for both channel-wise and pixel-wise inherent correlations. In CPA, channel-wise and pixel-wise attentions are adaptively exploited by point-wise and depth-wise convolutional layers on two dual processing paths, which explore scale and bias factors for the features in parallel. After extraction, the attentions are allocated to the feature maps jointly. Experimental results show that the proposed PMRN not only achieves better PSNR/SSIM results, but also restores more structural information than other lightweight SR works. An example visual quality comparison is shown in Fig. 8. Furthermore, the extended model PMRN+ with the self-ensemble operation achieves competitive or better results than large networks with fewer parameters and lower computational complexity.
The contributions of this paper are summarized as follows:
We propose a progressive multi-scale residual network (PMRN) for the SISR problem. Specifically, a progressive multi-scale residual block (PMRB) is investigated for information extraction from different scales. In PMRB, the layer combinations for multi-scale feature extraction are built in a recursive fashion for better representation with fewer parameters. The features are exploited progressively to find the inherent correlations between different scales more effectively.
We propose an attention mechanism, termed CPA, that jointly considers channel-wise and pixel-wise features. In CPA, adaptive scale and bias factors are learned in parallel from two dual paths with point-wise and depth-wise convolutions, and are allocated to the features collectively.
Experimental results show that the proposed PMRN achieves better PSNR/SSIM results than other lightweight works on all testing benchmarks, with superior capacity for recovering complex structural textures. The extended model PMRN+ achieves competitive or better performance than large networks with fewer parameters and lower computational complexity.
There are many CNN-based works for the SISR problem. To our best knowledge, SRCNN, investigated by Dong et al., is the first work using a CNN for super-resolution, with three convolutional layers that perform feature extraction, non-linear mapping, and restoration separately. After SRCNN, Dong et al. proposed FSRCNN with a faster and deeper network, in which one deconvolutional layer is utilized to upscale the features. Shi et al. first replaced the deconvolution with a sub-pixel operation in ESPCN, which has turned out to be an effective block for upscaling features. Since residual learning can improve network representation efficiently, Kim et al. utilized a very deep network with global residual learning to achieve good performance. Furthermore, considering that residual structures can relieve the gradient vanishing problem and make deeper networks possible, EDSR and MDSR performed better on the SISR problem by building deeper networks with residual blocks. To deliver the information and gradient flow more effectively, a residual-in-residual structure was utilized in RCAN to establish a very deep network with remarkable performance. Besides residual designs, dense connections also provide an effective way for gradient transmission and information preservation; SRDenseNet, investigated by Tong et al., and RDN, proposed by Zhang et al., demonstrated good capacity with dense blocks. Recently, some works based on different mathematical models were proposed for the SISR problem. Lai et al. introduced a progressive super-resolution network motivated by the Laplacian pyramid in LapSRN. In DBPN, Haris et al. investigated an elaborate block based on iterative back-projection. These methods achieved good PSNR/SSIM results with a large number of parameters and high computational complexity, which is challenging for practical applications.
Meanwhile, some lightweight works were designed for fast super-resolution. Ahn et al. introduced a cascading block design with shared parameters to balance performance and speed in CARN. Hui et al. utilized a lightweight network with an information distillation mechanism for better recovery performance in IDN. Based on IDN, IMDN applied channel separation and an advanced channel-wise attention mechanism to improve the network representation. However, the performance of these methods is limited by the restricted number of parameters and computational budget. MSRN, investigated by Li et al., conducted super-resolution with multi-scale feature extraction blocks, balancing performance and parameters.
Besides effective block designs, the attention mechanism has proved to be an effective component for deep learning, concentrating on the inherent correlations of features [14, 38, 17]. Channel-wise attention was first introduced to image processing by Hu et al. in SENet, where the importance of different channels is evaluated by global average pooling and fully connected layers. Since it is a simple but effective component, several works apply channel-wise attention to the SISR problem. To our best knowledge, RCAN, proposed by Zhang et al., is the first super-resolution method with channel-wise attention; SAN and IMDN also utilize attention to improve the network representation. Non-local attention, proposed by Wang et al., introduced a pixel-wise attention method that considers the global information of features. Inspired by non-local attention, Dai et al. investigated a second-order attention in SAN that considers both pixel-wise and channel-wise attentions, and achieved better performance than other SISR works. In fact, non-local attention requires a large cost in memory and computational complexity, which limits its flexible use in different networks.
As shown in Fig. 9, there are three modules in PMRN: feature extraction, non-linear feature exploration, and restoration. These modules extract features from the LR image and sequentially restore the corresponding HR image from the explored features. Let's denote $I_{LR}$ and $I_{SR}$ as the input LR image and the restored output HR image of PMRN. Features are extracted from the LR image as $F_0 = H_{FE}(I_{LR})$, where $H_{FE}$ denotes the feature extraction module and $F_0$ denotes the extracted features.
After feature extraction, the non-linear feature exploration module builds the mapping from LR features to the corresponding HR ones; it is composed of several PMRBs and a padding structure. A global residual learning structure is designed in the module for better gradient transmission and effective representation. Suppose there are $N$ PMRBs; for the $n$-th block, there is $F_n = H_{PMRB}^{n}(F_{n-1})$, where $H_{PMRB}^{n}$ denotes the $n$-th PMRB and $F_n$ denotes its output ($F_0$ is the feature from the feature extraction module). After the PMRBs, the features pass through the padding structure with residual learning. The output of the non-linear feature exploration module is $F_E = H_{PAD}(F_N) + F_0$, where $H_{PAD}$ denotes the padding structure.
Finally, the HR image is restored from the features after non-linear feature exploration. The restoration step can be demonstrated as $I_{SR} = H_{REC}(F_E)$, where $H_{REC}$ denotes the restoration module and $F_E$ is the output of the non-linear feature exploration module.
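As an illustration of the three-module pipeline, here is a minimal PyTorch skeleton; the block internals are placeholders (plain convolutions stand in for the actual PMRB), and all sizes (channel count, block count, kernel sizes) are assumptions for the sketch, not the authors' configuration.

```python
import torch
import torch.nn as nn

class PMRNSkeleton(nn.Module):
    """Sketch: feature extraction (H_FE), non-linear feature exploration
    (stacked blocks + padding structure, wrapped by global residual
    learning), and restoration (H_REC)."""
    def __init__(self, channels=64, n_blocks=4, scale=2):
        super().__init__()
        self.extract = nn.Conv2d(3, channels, 3, padding=1)          # H_FE
        self.blocks = nn.ModuleList(                                  # stand-ins for PMRBs
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_blocks)]
        )
        self.padding = nn.Sequential(                                 # H_PAD
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.restore = nn.Sequential(                                 # H_REC
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        f0 = self.extract(x)                 # F_0
        f = f0
        for block in self.blocks:
            f = block(f)                     # F_1 ... F_N
        f = self.padding(f) + f0             # global residual learning
        return self.restore(f)               # I_SR

lr = torch.randn(1, 3, 24, 24)
sr = PMRNSkeleton(scale=2)(lr)
print(sr.shape)  # torch.Size([1, 3, 48, 48])
```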
This section introduces the proposed PMRB in detail. We hold the hypothesis that there are inherent correlations between features from different scales, and that progressive extraction makes full use of these correlations. As shown in Fig. 9, different layer combinations are progressively connected for multi-scale feature extraction. Different from other multi-scale methods, the layer combinations in PMRB are defined in a recursive way with non-linearity, increasing the network depth with restricted parameters. Residual connections are designed between the combinations to preserve the information from different scales and improve the gradient flow. After multi-scale feature extraction, one convolutional layer fuses the multi-scale features, adaptively learning information from various scales. Finally, a shortcut for local residual learning is designed in PMRB for information preservation and for stabilizing the training phase.
The main idea of PMRB is to convert multi-scale exploration into a deeper representation with restricted parameters. Multi-scale structures have turned out to be effective designs for feature exploitation, containing various information from different scales. Vanilla multi-scale designs for the SISR problem utilize layers with different kernel sizes and exploit the multi-scale information in parallel with different receptive fields. However, layers with larger kernel sizes require more parameters and higher computational complexity, and the parallel design considers information from different scales separately, failing to explore the correlations among multi-scale features. To handle these issues, the layer combinations for multi-scale feature extraction in PMRB are defined in a recursive way with limited parameters and computational complexity. Meanwhile, the progressive feature extraction focuses on the correlations between features from different scales for adaptive information learning.
As shown in Fig. 9, PMRB can be separated into three steps. Firstly, progressive multi-scale processing (PMP) step extracts the multi-scale features progressively. After PMP, the multi-scale features will be concatenated and fused in multi-scale feature fusion (MFF) step. Finally, local residual learning (LRL) step utilizes a shortcut to preserve the information and stabilize the training phase.
The progressive multi-scale processing step extracts multi-scale features by utilizing combinations of convolutional layers with non-linear activation. Let's denote $H_s$ and $F_s$ as the layer combination and the extracted features for scale $s$ separately; then $F_1 = H_1(F_{in})$ and $F_s = C_s(\delta(F_{s-1})) + F_{s-1}$ for $s > 1$, where $F_{in}$ is the input of PMRB, $C_s$ is a convolutional layer, and $\delta$ is the ReLU activation.
As shown in Fig. 10, the combinations are designed in a recursive fashion. For scale 1, there is one convolutional layer for feature extraction. For the other scales, each combination is composed of a structure identical to the previous scale's combination, plus a convolutional layer with ReLU activation. Notice that there is no explicit residual connection for scale 1. On one hand, the invariant information is delivered by the local residual learning in PMRB. On the other hand, there is no activation in $H_1$, so the identical addition can be implied by the convolution operation.
The main idea of the multi-scale design is to exploit features with different receptive fields. Traditional multi-scale works apply layers with different kernel sizes or dilated convolutions for feature extraction. In fact, layers with larger kernel sizes require more parameters and higher computational complexity, while dilated convolutions may lose feature information. In this paper, combinations of layers with a small kernel size are utilized to substitute for convolutional layers with different receptive fields. We perform the substitution based on the fact that one convolutional layer with kernel size (2k+1)×(2k+1) holds an identical receptive field to the combination of one layer with kernel size (2k-1)×(2k-1) and one with 3×3. For any n, the combination of n layers with kernel size 3×3 can substitute for a layer with receptive field (2n+1)×(2n+1). From this perspective, the proposed combinations are composed of layers with an identical small kernel size, which decreases the parameters.
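The substitution arithmetic can be checked with a few lines of Python; the channel count C = 64 is only an illustrative assumption.

```python
def stacked_receptive_field(n, k=3):
    """Receptive field of n stacked k x k convolutions (stride 1)."""
    rf = 1
    for _ in range(n):
        rf += k - 1   # each layer grows the field by (k - 1)
    return rf

def conv_params(c_in, c_out, k):
    """Weight count of a single k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

C = 64
for n in (2, 3):
    big_k = stacked_receptive_field(n)        # 5, 7, ...
    single = conv_params(C, C, big_k)         # one large-kernel layer
    stacked = n * conv_params(C, C, 3)        # n small-kernel layers
    print(n, big_k, single, stacked)
# n=2: a 5x5 layer needs 102400 weights, two 3x3 layers need 73728
# n=3: a 7x7 layer needs 200704 weights, three 3x3 layers need 110592
```

The stacked version also interleaves non-linearities, which a single large kernel cannot.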
There are three benefits of the recursive design. First, the combination of layers increases the network depth, which helps improve expressive power. Second, the substitution with layer combinations limits the number of parameters and the computational complexity. Third, different from multi-scale layers with different kernel sizes, which can be regarded as linear operations, the recursive design introduces non-linearity and improves the network representation.
Besides the recursive design, a progressive way is applied for multi-scale feature extraction. We hold the notion that there are inherent correlations among multi-scale features, and that information from small scales is favorable for feature exploration on larger ones. From this point of view, larger-scale features are extracted from smaller ones. With the progressive feature extraction, multi-scale features are sequentially explored and their inherent correlations are fully considered. According to the progressive design, features with larger scale factors are processed by more convolutional layers and may contain more complex structural information.
However, there are two critical issues. The progressive design makes the network deeper, which may suffer from the vanishing gradient problem. Meanwhile, information from small scales is lost as the number of layers increases. To handle these issues, residual connections are introduced into the multi-scale combinations. As shown in Eq. (5), shortcuts are applied in every processing step. On one hand, the shortcuts provide better gradient transmission and alleviate the gradient vanishing problem. On the other hand, with the residual connections, information from small scales is identically delivered to larger ones, maintaining the information from all scales.
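A hedged PyTorch sketch of the progressive multi-scale processing step under the recursive reading above (scale 1 is a single convolution without activation; each further scale adds a Conv+ReLU on top of the previous scale's output, with a shortcut from the previous scale); the layer names, channel counts, and exact shortcut placement are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PMPSketch(nn.Module):
    """Sketch of PMP + MFF + LRL:
    F_1 = H_1(F_in), F_s = conv_s(relu(F_{s-1})) + F_{s-1},
    then 1x1 fusion over the concatenated scales and a local residual."""
    def __init__(self, channels=64, n_scales=4):
        super().__init__()
        self.first = nn.Conv2d(channels, channels, 3, padding=1)  # H_1, no activation
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_scales - 1)]
        )
        self.relu = nn.ReLU(inplace=False)   # not in-place: F_{s-1} is reused in the shortcut
        self.fuse = nn.Conv2d(channels * n_scales, channels, 1)   # MFF: 1x1 fusion

    def forward(self, x):
        feats = [self.first(x)]                                   # scale 1
        for conv in self.convs:                                   # scales 2..S, progressively
            feats.append(conv(self.relu(feats[-1])) + feats[-1])  # residual shortcut
        fused = self.fuse(torch.cat(feats, dim=1))
        return fused + x                                          # local residual learning

x = torch.randn(1, 64, 16, 16)
print(PMPSketch()(x).shape)  # torch.Size([1, 64, 16, 16])
```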
The multi-scale feature fusion step concatenates and fuses the multi-scale features with one convolutional layer. After fusion, a CPA block is utilized for the attention mechanism. The operation can be demonstrated as $F_M = H_{CPA}(H_{MFF}([F_1, F_2, \dots, F_S]))$, where $H_{MFF}$ denotes the fusion layer of the MFF step, $[\cdot]$ denotes concatenation of the multi-scale features, and $F_M$ is the output feature.
Local residual learning is devised to preserve the information and improve the gradient flow. Finally, the output of PMRB is $F_{out} = F_M + F_{in}$, where $F_M$ is the fused feature from the MFF step and $F_{in}$ is the input of PMRB.
In this section, the proposed CPA block is introduced in detail; it considers attentions from two perspectives jointly. As shown in Fig. 11, CPA can be separated into three parts. First, the space transformation (ST) step converts the input features into a specific space for attention exploration. After ST, the factor extraction (FE) step exploits the scale and bias factors from the converted features on two parallel paths, which consider pixel-wise and channel-wise attention separately. Finally, the attention allocation (AA) step distributes the learned adaptive attentions onto the features.
The space transformation step transforms the input features into a specific space with one convolutional layer. The operation of ST can be demonstrated as $F_T = H_{ST}(F_{in})$, where $H_{ST}$ denotes the transformation layer, $F_{in}$ is the input features, and $F_T$ is the features after transformation.
The factor extraction step exploits the scale and bias factors after ST, jointly considering channel-wise and pixel-wise attentions. Channel-wise attentions are first extracted by one point-wise convolutional layer (P-Conv), then pixel-wise attentions are extracted by one depth-wise convolutional layer (D-Conv); the two layers explore attentions from different perspectives orthogonally, with one ReLU activation between them for non-linearity. The operations of the FE step can be demonstrated as $B = H_B(F_T)$ and $S = \sigma(H_S(F_T))$, where $H_B$ and $H_S$ denote the extraction layers of the bias and scale paths, and $\sigma$ denotes the sigmoid activation. $B$ and $S$ are the bias and scale factors separately. The sigmoid activation after $H_S$ introduces the non-negativity of the learned scales.
The attention allocation step allocates the attentions to the features via the learned scale and bias factors. The output of the AA step is $F_{out} = (1 + S) \odot F_{in} + B$, where $\odot$ denotes element-wise multiplication, $F_{in}$ is the input of CPA, and $S$ and $B$ are the learned scale and bias factors.
In CPA, $S$ and $B$ hold the same shape as the input $F_{in}$, performing adaptive attention over all areas of the features. Although channel-wise and pixel-wise attentions are performed jointly, they are explored orthogonally. Channel-wise attention is considered first: a P-Conv explores the inherent correlations among channels, treating different pixels of the same channel equally. After channel-wise attention, pixel-wise attention is explored by one D-Conv; since D-Conv treats features from different channels separately, it does not influence the correlations of channels. A ReLU activation between the two convolutions provides non-linearity. The orthogonal design concentrates on different kinds of attention specifically with limited parameters and computational complexity, which makes CPA a flexible component for various network designs. Different from other attention mechanisms, there are two parallel paths for finding both scale and bias factors. The scale path is similar to other methods for finding weights, while the bias path finds a shift on the features, providing another perspective on feature correlations.
CPA holds a similar representation to batch normalization (BN). From Eq. (11), if $S$ and $B$ were replaced with fixed parameters, the operation would be identical to a BN step with batch size 1. In CPA, $S$ and $B$ are adaptively learned from the features, considering a local window for better correlation exploration with larger receptive fields. With CPA, different pixels and channels acquire different scale and bias factors, and more attention is paid to complex textures and information. Since different patches of a minibatch are processed independently, the information fusion problem that occurs in BN does not arise. From Eq. (11), there is a residual structure in CPA: the scale branch contains the self-adaptive scale factors plus an identical addition of the input features, which preserves information and improves gradient transmission.
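Under these definitions, a minimal PyTorch sketch of CPA might look as follows; the layer widths, the 3×3 depth-wise kernel, and the residual form (1 + S)·x + B are assumptions read off the description above, not the authors' verified implementation.

```python
import torch
import torch.nn as nn

class CPASketch(nn.Module):
    """Sketch of joint channel-wise and pixel-wise attention (CPA).
    ST: one conv; FE: two dual paths, each P-Conv -> ReLU -> D-Conv,
    producing a sigmoid-gated scale map S and a bias map B; AA: allocate both."""
    def __init__(self, channels=64):
        super().__init__()
        self.st = nn.Conv2d(channels, channels, 3, padding=1)   # space transformation

        def path():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 1),               # point-wise: channel attention
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1,
                          groups=channels),                     # depth-wise: pixel attention
            )
        self.scale_path = path()                                # H_S
        self.bias_path = path()                                 # H_B

    def forward(self, x):
        t = self.st(x)
        s = torch.sigmoid(self.scale_path(t))                   # non-negative scale factors
        b = self.bias_path(t)                                   # bias factors
        return (1 + s) * x + b                                  # allocation with shortcut

x = torch.randn(1, 64, 16, 16)
print(CPASketch()(x).shape)  # torch.Size([1, 64, 16, 16])
```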
In PMRN, all convolutional layers have a 3×3 kernel size except for the MFF step in PMRB, which is designed with a 1×1 kernel. The number of filters is kept the same for all convolutional layers. Several PMRBs are stacked in the non-linear feature exploration module, and the padding structure is composed of two convolutional layers with a ReLU activation.
We introduce an efficient restoration module design with restricted parameters. The upscale module contains only one convolutional layer followed by a sub-pixel layer, which corresponds to the feature extraction module and can easily be extended to other scaling factors. We utilize one convolution to restore the image and upscale the resolution simultaneously. In other words, there is no convolution after the sub-pixel layer, which decreases the parameters and computational complexity.
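A minimal sketch of such a restoration module: one convolution maps the features straight to 3·r² channels and a sub-pixel (PixelShuffle) layer rearranges them into the r-times-larger RGB image, with nothing after it. The 3×3 kernel and 64-channel input are assumptions for the sketch.

```python
import torch
import torch.nn as nn

def make_restoration(channels=64, scale=2):
    """One conv restores and prepares the upscale; PixelShuffle finishes it.
    No convolution follows the sub-pixel layer."""
    return nn.Sequential(
        nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),  # features -> 3*r^2 channels
        nn.PixelShuffle(scale),                             # rearrange to HR resolution
    )

feat = torch.randn(1, 64, 32, 32)
hr = make_restoration(scale=3)(feat)
print(hr.shape)  # torch.Size([1, 3, 96, 96])
```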
To our best knowledge, MSRN is the first work for the SISR problem with a multi-scale mechanism. MSRN introduced a multi-scale block termed MSRB with 3×3 and 5×5 convolutional layers. In MSRB, features from the two kinds of convolutional layers are crossly concatenated and explored, and a 1×1 convolutional layer is utilized to fuse the multi-scale features. Different from MSRB, PMRB extracts features from four different scales and concatenates them with one convolutional layer for fusion. Features from different scales are explored sequentially, and residual connections are utilized for information preservation and better gradient flow. Multi-scale information is extracted by layers with different kernel sizes in MSRB, while PMRB designs the multi-scale structure in a recursive way, which decreases the parameters and computational complexity. Besides the multi-scale design, a novel attention mechanism, CPA, is designed in PMRB. In MSRN, features from different MSRBs are collected and concatenated with a convolutional layer for global feature fusion. Different from the global feature fusion, blocks in PMRN are stacked with global residual learning. With the elaborate design, PMRN achieves better PSNR/SSIM results on all testing benchmarks than MSRN with fewer parameters and lower computational complexity.
There is an effective channel-wise attention design in SENet, which has been widely utilized for different image restoration problems. In SENet, information from different channels is evaluated by global average pooling; two fully connected layers with a ReLU activation explore the attentions, and a sigmoid activation introduces non-negativity. In PMRN, CPA is devised for joint channel-wise and pixel-wise attention. Different from pure channel-wise attention, features are extracted and explored by convolutional layers, which concentrate more on complex textures and information. The squeezing step in SENet shrinks the channel number, which may cause information loss; in CPA, the number of filters is invariant for all convolutional layers. Besides scale factors for attention, bias factors are also explored in CPA to shift the features and find a better attention representation. Finally, a shortcut is designed in CPA to maintain the original information.
LapSRN is a progressive network for image super-resolution. In LapSRN, the progressive structure is designed for image restoration at multiple resolutions using one network; residual maps are learned sequentially as the resolution increases. In PMRN, an end-to-end network is proposed for image super-resolution with a specific scaling factor. The progressive structure is mainly designed in PMRB to extract multi-scale features: information from multi-scale features is sequentially extracted with different layer combinations and fused with one convolutional layer.
The proposed PMRN is trained on the DIV2K dataset. DIV2K is a high-quality dataset of 2K-resolution images from the real world, containing 800 training images, 100 validation images, and 100 test images. In this paper, the 800 training images are used for training and 5 images for validation. For testing, five benchmarks widely used in image super-resolution works are chosen: Set5, Set14, B100, Urban100, and Manga109. The training images are randomly flipped and rotated for data augmentation, and LR patches are cropped from them for training. PMRN is trained for 1000 iterations with the L1 loss, and the parameters are updated with an Adam optimizer whose learning rate is halved every 200 iterations. The degradation model is bicubic downsampling (BI) with scaling factors ×2, ×3, and ×4. PSNR and SSIM are chosen as the indicators for quantitative comparison with other works. A self-ensemble strategy is used to improve performance, and the extended model is termed PMRN+.
To make a quantitative comparison, we compare the PSNR/SSIM results with several lightweight works: bicubic, SRCNN, VDSR, LapSRN, MemNet, SRMDNF, CARN, and MSRN. For a fair comparison, the extended model PMRN+ is compared with large networks: EDSR, D-DBPN, and SRFBN. The results are shown in Table I. From Table I, PMRN achieves better performance than other lightweight works on all five benchmarks, and PMRN+ achieves competitive or better performance than the large networks.
Meanwhile, we compare the computational complexity and the number of parameters with other works. The total number of parameters is calculated as

$$P = \sum_{l} \left( \frac{C_{in}^{l} \times C_{out}^{l} \times K_{w}^{l} \times K_{h}^{l}}{G^{l}} + B^{l} \right),$$

where $C_{in}^{l}$ and $C_{out}^{l}$ denote the input and output number of filters of the $l$-th convolutional layer, $K_{w}^{l}$ and $K_{h}^{l}$ denote the width and height of the kernel, $G^{l}$ denotes the number of groups, and $B^{l}$ represents the bias.
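The parameter formula can be implemented directly; the example layer shapes (a standard 3×3 convolution and a depth-wise 3×3 convolution, both 64-channel) are illustrative only.

```python
def conv_layer_params(c_in, c_out, k_w, k_h, groups=1, bias=True):
    """Parameters of one convolutional layer:
    (c_in / groups) * c_out * k_w * k_h weights, plus c_out bias terms."""
    weights = (c_in // groups) * c_out * k_w * k_h
    return weights + (c_out if bias else 0)

def total_params(layers):
    """Sum over all convolutional layers, as in the formula above."""
    return sum(conv_layer_params(**layer) for layer in layers)

layers = [
    dict(c_in=64, c_out=64, k_w=3, k_h=3),             # standard 3x3 conv
    dict(c_in=64, c_out=64, k_w=3, k_h=3, groups=64),  # depth-wise 3x3 conv
]
print(total_params(layers))  # 36928 + 640 = 37568
```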
Computational complexity is modeled as the number of multiply-accumulate operations (MACs). Since it is a software- and hardware-independent factor, MACs describe the computational complexity purely from a mathematical perspective. Comparisons of MACs are conducted by producing a 720P (1280×720) resolution image from the corresponding LR image with different scaling factors. The experimental results are shown in Table I. From the results, PMRN achieves better PSNR/SSIM results than other lightweight works with competitive parameters and MACs, which shows that PMRN holds a more efficient network design for super-resolution. Comparing with other large networks, PMRN+ achieves competitive or better PSNR/SSIM results with much fewer parameters and MACs. Visual comparisons of parameters and MACs are shown in Fig. 12 and Fig. 13, and a running time comparison is given in Fig. 14. The time cost and performance are evaluated on Manga109 with BI degradation.
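A per-layer MACs count consistent with this definition can be computed as follows; the example uses an illustrative 64-channel 3×3 layer producing a 720P (1280×720) feature map.

```python
def conv_macs(c_in, c_out, k_w, k_h, out_w, out_h, groups=1):
    """Multiply-accumulate operations of one convolutional layer:
    each output pixel of each output channel costs
    (c_in / groups) * k_w * k_h multiply-adds."""
    return (c_in // groups) * c_out * k_w * k_h * out_w * out_h

# MACs of a single 3x3 conv (64 -> 64 channels) at 720P output resolution.
print(conv_macs(64, 64, 3, 3, 1280, 720))  # 33973862400 (~34 GMACs)
```

Summing this quantity over all layers of a network gives the MACs figure reported in the comparison tables.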
| Scale | Model | Params | MACs | Set5 | Set14 | B100 | Urban100 | Manga109 |
Besides quantitative comparisons, we also analyze the qualitative restoration performance via visual comparisons. Three images from the Urban100 benchmark with BI degradation are chosen for comparison, as shown in Fig. 42. These real-world images contain abundant high-frequency textures and are challenging to restore at large scaling factors. From the results, the proposed PMRN recovers lines and other complex textures more effectively.
Besides Urban100, we also conduct experiments on Manga109, which is composed of comic book covers with plentiful line structures. The results are shown in Fig. 51. From the visual comparison, PMRN recovers more lines and structural textures.
In PMRB, residual connections are introduced to preserve the information from small scales. Feature fusion with convolution is also used to concatenate information from different scales. To show the effect of information preservation and feature fusion, we perform comparisons without the residual connection and without the fusion convolution. The results are shown in Table II, where Res and Fuse denote the residual connection and the concatenation, respectively. Three benchmarks covering different kinds of textures are used for testing with scaling factor . From Table II, the residual connection and feature fusion are both effective across benchmarks. For Set5, the residual structure performs better than fusion, achieving around 0.1 dB improvement. For B100 and Urban100, feature fusion recovers texture more effectively. Set5 contains less high-frequency information than the other benchmarks, while B100 and Urban100 are composed of real-world images with abundant textures. From this perspective, the residual connection is suitable for simple images, while feature fusion performs better on complex structural textures.
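The two mechanisms compared above can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's implementation: the branch names and shapes are hypothetical, and the 1×1 fusion convolution is written as a channel-mixing `einsum` rather than a trained layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_1x1(features, weights):
    """1x1-convolution fusion: a weighted mix over the concatenated
    channel axis, leaving spatial positions untouched.
    features: (C_in, H, W), weights: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', weights, features)

# Two hypothetical scale branches, each producing 8-channel feature maps.
scale_a = rng.standard_normal((8, 16, 16))
scale_b = rng.standard_normal((8, 16, 16))

concat = np.concatenate([scale_a, scale_b], axis=0)  # (16, 16, 16)
w = rng.standard_normal((8, 16)) * 0.1
fused = fuse_1x1(concat, w)                          # back to 8 channels

# Residual connection: the small-scale branch is preserved additively.
residual_out = scale_a + fused
print(residual_out.shape)  # (8, 16, 16)
```

The residual path keeps `scale_a` intact regardless of what the fusion learns, which matches the information-preservation role described above.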
PMRB contains a multi-scale structure that extracts information from different scales. To show the effect of the multi-scale design, comparisons are conducted by removing the different combinations of convolutional layers: every combination is replaced by a single convolutional layer, i.e., all the scales in PMRB are identical to . The results are shown in Table III on four benchmarks with scaling factor . From Table III, the model with the multi-scale design achieves better PSNR/SSIM results than the other one. There are two reasons for the improvement. On one hand, features from different scales carry more information, which helps to recover complex structural textures. On the other hand, the multi-scale structures are built in a recursive way; with the combination of convolutional layers, the depth of PMRN increases, which may improve the network's representational capacity.
Furthermore, we analyze the features exploited at different scales, as shown in Fig. 58. The multi-scale features come from different layer combinations. As the scale factor increases, the structural information becomes sharper and clearer, while tiny textures are flattened. This accords with the notion that multi-scale features contain different information.
In PMRN, recursive layer combinations are proposed to substitute for convolutional layers with different kernel sizes. To show the effect of this substitution, PSNR/SSIM comparisons are made on five benchmarks with scaling factor . To ensure the same receptive field, the network without combinations is built with layers whose kernel sizes are , and , respectively. The results are shown in Table IV. From Table IV, the model built with layer combinations achieves better PSNR/SSIM results on all five testing benchmarks, demonstrating the effectiveness of the recursive design. Meanwhile, parameters and MACs are reduced by around 40.2% when utilizing the recursive combinations.
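The parameter saving from the recursive substitution can be illustrated with a small calculation. The channel width of 64 and the kernel sizes 3/5/7 here are assumptions for illustration; the paper's reported 40.2% reduction is measured on the full network, so the exact number below differs, but the principle is the same: stacked 3×3 layers that reuse each other's outputs cover the receptive fields of 3×3, 5×5 and 7×7 kernels with far fewer weights.

```python
def kxk_params(c, k):
    """Parameters of a c->c convolution with a k x k kernel, plus bias."""
    return c * k * k * c + c

C = 64  # assumed channel width, for illustration only

# Separate branches with 3x3, 5x5 and 7x7 kernels (receptive fields 3, 5, 7):
separate = sum(kxk_params(C, k) for k in (3, 5, 7))

# Recursive combination: three stacked 3x3 layers, where deeper branches
# reuse the shallower outputs, so receptive fields 3, 5 and 7 are obtained
# from only three 3x3 layers in total (two stacked 3x3 ~ one 5x5, etc.).
recursive = 3 * kxk_params(C, 3)

print(f"separate:  {separate:,}")   # 340,160
print(f"recursive: {recursive:,}")  # 110,784
print(f"saving:    {1 - recursive / separate:.1%}")
```

The saving grows with the largest kernel size replaced, since the cost of a k×k kernel grows quadratically in k while the recursive stack grows only linearly in depth.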
In PMRN, the largest scale of PMRB is chosen as and the number of PMRBs is chosen as . To show the effect of different and , models are trained with different scales and block numbers for 200 epochs. Quantitative comparisons are made on B100 with scaling factor . The results are visualized in Fig. 59. From Fig. 59, both and affect the network performance. In general, as and increase, the network achieves better results. Compared with , contributes more to the performance: on one hand, a larger makes the network deeper; on the other hand, as increases, features from more scales are considered.
In PMRN, CPA is investigated as a joint attention mechanism. To show the effect of the proposed CPA, comparisons are designed on three testing benchmarks. We compare models with CPA, with channel-wise attention (CA), and with no attention. The results are shown in Table V. From the table, the model with CPA achieves the best performance on all testing benchmarks, and the model with channel-wise attention achieves better PSNR/SSIM results than the one without attention. The results demonstrate that the attention mechanism is effective for image super-resolution.
To analyze the operation of CPA, the attention factors and the feature maps before and after attention are visualized in Fig. 64. From the illustrations, the learned attentions concentrate on structural textures, and the factors vary sharply in areas of edges and complex textures. After attention, the features are more discriminative on structural textures, which is convincing evidence of the effectiveness of the attention mechanism.
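The scale-and-bias structure of such a joint attention can be sketched as below. This is a deliberately simplified stand-in, not the paper's CPA: the real CPA uses point-wise and depth-wise convolutions for the two branches, whereas here the channel branch is a global-average-pooling plus a small matrix, and the pixel branch is a per-pixel statistic; all names and shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_attention(features, w_ch, w_px):
    """Simplified sketch of a joint channel-wise / pixel-wise attention:
    - channel branch: global average pooling -> per-channel scale factor
    - pixel branch:   per-pixel statistic    -> per-pixel bias map
    Scale and bias are computed in parallel and applied as
    out = scale * x + bias.
    features: (C, H, W); w_ch: (C, C); w_px: scalar weight."""
    pooled = features.mean(axis=(1, 2))             # (C,) channel descriptor
    scale = sigmoid(w_ch @ pooled)[:, None, None]   # (C, 1, 1) scale factors
    bias = np.tanh(w_px * features.mean(axis=0))    # (H, W) pixel-wise bias
    return scale * features + bias[None, :, :]

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16, 16))
out = joint_attention(x, rng.standard_normal((8, 8)) * 0.1, 0.5)
print(out.shape)  # (8, 16, 16)
```

The point of the parallel design is that the scale factor re-weights whole channels while the bias adjusts individual spatial positions, so the two branches address complementary correlations.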
In this paper, we proposed a progressive multi-scale residual network (PMRN) with limited parameters and computation complexity for the single image super-resolution (SISR) problem. Specifically, a novel progressive multi-scale residual block (PMRB) was introduced in PMRN for information exploration across various scales. Different layer combinations for multi-scale feature extraction were designed in a recursive way to decrease parameters and computation complexity while progressively exploiting the features. After feature extraction, the multi-scale features were concatenated and fused for adaptive information exploration, and local residual learning was introduced into PMRB for a stable training phase and information preservation. Besides the structural designs, we also proposed a joint channel-wise and pixel-wise attention mechanism named CPA, which learns channel-wise and pixel-wise attentions by point-wise and depth-wise convolutions, respectively. Different from other attention works, scale and bias factors were explored in parallel. Experimental results showed that PMRN not only achieves better PSNR/SSIM results than other lightweight works on five testing benchmarks, but also recovers more complex structural textures. Meanwhile, our extension model PMRN+ achieves competitive or better PSNR/SSIM results than other deep networks with much fewer parameters and lower computation complexity.