Progressive Multi-Scale Residual Network for Single Image Super-Resolution

07/19/2020 · Yuqing Liu, et al. · Dalian University of Technology, Peking University

Super-resolution is a classical problem in the image restoration field. In recent years, deep learning methods have achieved significant success on the super-resolution task, concentrating on elaborate network designs that exploit image features more effectively. However, most networks pursue greater capacity by increasing depth or width at the cost of a large number of parameters, which incurs high computation complexity, and they seldom consider the inherent correlations among different features. This paper proposes a progressive multi-scale residual network (PMRN) for the single image super-resolution problem that sequentially exploits features with restricted parameters. Specifically, we design a progressive multi-scale residual block (PMRB) to progressively explore multi-scale features with different layer combinations, aiming to model the correlations between different scales. The combinations for feature exploitation are defined in a recursive fashion, introducing non-linearity and better feature representation with limited parameters. Furthermore, we investigate a joint channel-wise and pixel-wise attention mechanism, termed CPA, for comprehensive correlation exploration; it is utilized in PMRB and considers both scale and bias factors for the features in parallel. Experimental results show that the proposed PMRN recovers structural textures more effectively, with better PSNR/SSIM results than other lightweight works. The extension model PMRN+ with self-ensemble achieves competitive or better results than large networks with much fewer parameters and lower computation complexity.

I Introduction

Single image super-resolution (SISR), a traditional image restoration problem, aims to recover the high resolution (HR) image corresponding to a given low resolution (LR) image and has attracted increasing attention from researchers. Super-resolution methods are widely used in video coding [22], view synthesis [32], facial analysis [4], and other computer vision tasks. Many SISR methods, especially learning-based methods, have been proposed in recent years to find the mapping relationship between LR and HR images.

In learning-based methods, features of LR images play a critical role in restoring HR images, and convolutional neural networks (CNNs) have shown amazing performance on SISR tasks due to their high efficiency in feature representation and exploration. SRCNN [7] is the first CNN-based work for the SISR problem, with three convolutional layers that perform feature extraction, non-linear mapping, and restoration separately. After SRCNN, VDSR [20], DRRN [34], and DRCN [21] were developed with deeper networks. These works require HR-sized inputs upscaled by bicubic interpolation, causing high computation complexity. To the best of our knowledge, FSRCNN [8] is the first work without bicubic pre-processing, where one deconvolutional layer upscales the feature maps to high resolution at the end of the network. Since residual connections [11] alleviate the gradient vanishing problem and make deeper networks possible, EDSR and MDSR [27] acquired amazing super-resolution performance with deeper structures built from residual blocks. Besides the residual design, dense connections [15] can also provide efficient gradient flow and information preservation; with dense blocks, SRDenseNet [35] and RDN [40] recover high-frequency information and structural textures efficiently. Recently, some works have concentrated on effective network modules specially investigated for the SISR problem: RCAN [41], SAN [6], and DBPN [10] demonstrated superior performance with elaborate blocks. However, these works require a large number of parameters and high computation complexity, which is challenging for practical applications.

There are also works with lightweight designs for fast super-resolution. Ahn et al. introduced cascading blocks with shared parameters in CARN [2], which strikes a good balance between speed and recovery capacity. Information distillation is applied in IDN [18] and IMDN [19] for favorable performance with fewer parameters. In fact, a restricted number of parameters limits the capacity of a network. A multi-scale feature extraction block with more parameters was designed in MSRN [25] to recover more structural textures than other lightweight works.

To improve feature exploration and representation capacity, multi-scale structures have been designed for various computer vision tasks. Features from different scales contain various information, which provides a comprehensive view for better exploitation. To the best of our knowledge, MSRN [25] is the first work with a multi-scale structure for the SISR problem, where convolutional layers with different kernel sizes are stacked in parallel and cross-connected for feature extraction. In fact, the parallel design makes it challenging to find the relationship between different scales, while layers with larger kernel sizes result in a large number of parameters, which increases the computation complexity.

Among elaborate block designs, classical SISR works improve the feature representation capacity by building deeper or wider networks, and seldom consider the inherent relationships of features. The attention mechanism is utilized for correlation learning and turns out to be an efficient component for better representation. Channel-wise feature attention for image processing was proposed in SENet [13], which allocates importance to different channels. SAN [6], RCAN [41], IMDN [19], and other recent works integrated channel attention into effective block designs for the SISR problem and achieved superior performance. However, most channel attention based methods simply extract feature information from different channels by global average pooling, without considering the pixel-wise diversity of the feature maps. Non-local attention [36] is one of the pixel-wise mechanisms for image restoration. Dai et al. applied both pixel-wise and channel-wise attention in SAN [6] and achieved better performance than other SISR works. However, non-local attention requires a large memory cost and high computation complexity, which is challenging for flexible use.

(a) image_024 from Urban100 [16]
(b) HR (PSNR/SSIM)
(c) Bicubic (16.94/0.5539)
(d) LapSRN [24] (18.10/0.6714)
(e) CARN [2] (18.84/0.7132)
(f) MSRN [25] (18.81/0.7224)
(g) PMRN(19.09/0.7340)
Fig. 8: Visual quality comparisons of various image SR methods with scaling factor ×4.

In this paper, we propose a progressive multi-scale residual network (PMRN) for the SISR problem. To restore HR images, PMRN extracts and exploits features from LR images sequentially with limited parameters. Specifically, a progressive multi-scale residual block (PMRB) is investigated for feature exploration across different scales. In PMRB, multi-scale features are progressively exploited by different layer combinations, designed in a recursive fashion, to find the inherent correlations. To preserve information and stabilize the training phase, residual connections are utilized between the layer combinations. After exploitation, the multi-scale features are jointly fused to learn adaptive information. Holistically, local residual learning is introduced into PMRB to preserve information across the sequential exploration of different blocks and to stabilize training.

In particular, we design an attention mechanism termed CPA for both channel-wise and pixel-wise inherent correlations. In CPA, channel-wise and pixel-wise attentions are adaptively exploited by point-wise and depth-wise convolutional layers on two dual processing paths, which explore scale and bias factors for the features in parallel. After extraction, the attentions are allocated to the feature maps jointly. Experimental results show that the proposed PMRN not only achieves better PSNR/SSIM results but also restores more structural information than other lightweight SR works. An example visual quality comparison is shown in Fig. 8. Furthermore, the extension model PMRN+ with self-ensemble achieves competitive or better results than large networks with fewer parameters and lower computation complexity.

The contributions of this paper are summarized as follows:

  • We propose a progressive multi-scale residual network (PMRN) for the SISR problem. Specifically, a progressive multi-scale residual block (PMRB) is investigated for information extraction from different scales. In PMRB, the layer combinations for multi-scale feature extraction are built in a recursive fashion for better representation with fewer parameters. The features are exploited progressively to find the inherent correlations between different scales more effectively.

  • We propose an attention mechanism, termed CPA, that jointly considers channel-wise and pixel-wise features. In CPA, adaptive scale and bias factors are learned in parallel on two dual paths with point-wise and depth-wise convolutions, and are allocated to the features collectively.

  • Experimental results show that the proposed PMRN achieves better PSNR/SSIM results than other lightweight works on all testing benchmarks, with superior capacity for recovering complex structural textures. The extension model PMRN+ achieves competitive or better performance than large networks with fewer parameters and lower computation complexity.

Fig. 9: Illustration of the proposed PMRN. Three modules in PMRN sequentially restore the HR image from the corresponding LR image. In each PMRB, layer combinations explore features at different scales, and a CPA block is utilized for joint channel-wise and pixel-wise attention.

II Related Work

II-A Deep Learning for SISR

Many CNN-based works address the SISR problem. To the best of our knowledge, SRCNN [7], investigated by Dong et al., is the first work using a CNN for super-resolution, with three convolutional layers that perform feature extraction, non-linear mapping, and restoration separately. After SRCNN, Dong et al. proposed FSRCNN [8] with a faster and deeper network, in which one deconvolutional layer upscales the features. Shi et al. first replaced the deconvolution with a sub-pixel operation in ESPCN [31], which has turned out to be an effective block for upscaling features. Since residual learning improves network representation efficiently, Kim et al. utilized a very deep network [20] with global residual learning to achieve good performance. Furthermore, considering that the residual structure [11] successfully relieves the gradient vanishing problem and makes deeper networks possible, EDSR and MDSR [27] performed better on the SISR problem by building deeper networks with residual blocks. To deliver information and gradient flow more effectively, a residual-in-residual structure was utilized in RCAN [41] to establish a very deep network with remarkable performance. Besides the residual design, dense connections [15] also provide an effective way for gradient transmission and information preservation; SRDenseNet [35], investigated by Tong et al., and RDN [40], proposed by Zhang et al., demonstrated good capacity with dense blocks. Recently, some works based on different mathematical models were proposed for the SISR problem. Lai et al. introduced a progressive image super-resolution network motivated by the Laplacian pyramid in LapSRN [24]. In DBPN [10], Haris et al. investigated an elaborate block based on iterative back-projection. These methods achieve good PSNR/SSIM results but require a large number of parameters and high computation complexity, which is challenging for practical applications.

Meanwhile, some lightweight works were designed for fast super-resolution. Ahn et al. introduced a cascading block design with shared parameters to balance performance and speed in CARN [2]. Hui et al. utilized a lightweight network with an information distillation mechanism for better recovery performance in IDN [18]. Based on IDN, IMDN [19] applied channel separation and an advanced channel-wise attention mechanism to improve the network representation. However, the performance of these works is limited by the restricted number of parameters and computation complexity. MSRN [25], investigated by Li et al., conducted super-resolution with multi-scale feature extraction blocks, balancing performance and parameters.

II-B Attention Mechanism

Besides effective block designs, the attention mechanism has proved to be an effective component for deep learning, concentrating on the inherent correlations of features [14, 38, 17]. Channel-wise attention was first introduced to image processing by Hu et al. in SENet [13], where the importance of different channels is evaluated by global average pooling and fully connected layers. Since it is a simple but effective component, several SISR works adopt channel-wise attention. To the best of our knowledge, RCAN [41], proposed by Zhang et al., is the first super-resolution method with channel-wise attention; SAN [6] and IMDN [19] also utilize attention to improve the network representation. Non-local attention [36], proposed by Wang et al., introduced a pixel-wise attention method considering the global information of features. Inspired by non-local attention, Dai et al. investigated a second-order attention in SAN [6] that considers both pixel-wise and channel-wise attention and achieves better performance than other SISR works. In fact, non-local attention requires a large cost in memory and computation complexity, which limits its flexible use in different networks.

III Progressive Multi-Scale Residual Network

III-A Network Design

As shown in Fig. 9, there are three modules in PMRN: feature extraction, non-linear feature exploration, and restoration. These modules extract features from the LR image and sequentially restore the corresponding HR image from the explored features. Let us denote $I_{LR}$ and $I_{SR}$ as the input LR image and the restored output HR image of PMRN. Features are extracted from the LR image as

$F_0 = H_{FE}(I_{LR})$,   (1)

where $H_{FE}(\cdot)$ denotes the feature extraction module and $F_0$ denotes the extracted features.

After feature extraction, the non-linear feature exploration module builds the mapping from LR features to the corresponding HR ones; it is composed of several PMRBs and a padding structure, with a global residual learning structure designed into the module for better gradient transmission and effective representation. Suppose there are $N$ PMRBs; for the $n$-th block,

$F_n = H_{PMRB}^{n}(F_{n-1})$,   (2)

where $H_{PMRB}^{n}(\cdot)$ denotes the $n$-th PMRB and $F_n$ denotes its output. After the PMRBs, the features pass through the padding structure with residual learning. The output of the non-linear feature exploration module is

$F_{NL} = H_{Pad}(F_N) + F_0$,   (3)

where $H_{Pad}(\cdot)$ denotes the padding structure.

Finally, the HR image is restored from the features after non-linear feature exploration. The restoration step can be formulated as

$I_{SR} = H_{R}(F_{NL})$,   (4)

where $H_{R}(\cdot)$ denotes the restoration module.
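To make the three-module pipeline of Eqs. (1)-(4) concrete, the following is a minimal PyTorch sketch of the PMRN skeleton. The channel width, block count, and scaling factor are illustrative assumptions (the paper elides some of these values), and the PMRB internals are injected as a class; a matching PMRB sketch appears in Section III-B.

```python
import torch.nn as nn

class PMRN(nn.Module):
    """Skeleton of PMRN following Eqs. (1)-(4); channel width and block
    count are illustrative assumptions, not the paper's exact values."""
    def __init__(self, pmrb_cls, n_blocks=8, channels=64, scale=4):
        super().__init__()
        # Feature extraction module H_FE: one conv from RGB to feature space
        self.fe = nn.Conv2d(3, channels, 3, padding=1)
        # Non-linear feature exploration: N PMRBs followed by a padding structure
        self.blocks = nn.ModuleList([pmrb_cls(channels) for _ in range(n_blocks)])
        self.pad = nn.Sequential(  # two convs with a ReLU, as described in Sec. IV-A
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Restoration module H_R: one conv + sub-pixel layer (Sec. IV-A)
        self.restore = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        f0 = self.fe(lr)                # Eq. (1)
        f = f0
        for block in self.blocks:
            f = block(f)                # Eq. (2)
        f = self.pad(f) + f0            # Eq. (3), global residual learning
        return self.restore(f)          # Eq. (4)
```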

III-B Progressive Multi-scale Residual Block

This section introduces the proposed PMRB in detail. We hold the hypothesis that there are inherent correlations between features from different scales, and that progressive extraction makes full use of the correlations of multi-scale features. As shown in Fig. 9, different layer combinations are progressively connected for multi-scale feature extraction. Different from other multi-scale methods, the layer combinations in PMRB are defined in a recursive way with non-linearity, increasing the network depth with restricted parameters. Residual connections are designed between the combinations to preserve the information from different scales and improve the gradient flow. After multi-scale feature extraction, one convolutional layer fuses the multi-scale features, adaptively learning information from various scales. Finally, a shortcut for local residual learning is designed in PMRB for information preservation and for stabilizing the training phase.

The main idea of PMRB is to convert the multi-scale exploration into a deeper representation with restricted parameters. Multi-scale structures have turned out to be effective designs for feature exploitation, containing various information from different scales. Vanilla multi-scale designs for the SISR problem utilize layers with different kernel sizes and exploit the multi-scale information in parallel with different receptive fields. However, layers with larger kernel sizes require more parameters and higher computation complexity, and the parallel design considers information from different scales separately, failing to explore the correlations among multi-scale features. To handle these issues, the layer combinations for multi-scale feature extraction in PMRB are defined in a recursive way with limited parameters and computation complexity. Meanwhile, the progressive feature extraction focuses on the correlations between features from different scales for adaptive information learning.

As shown in Fig. 9, PMRB can be separated into three steps. First, the progressive multi-scale processing (PMP) step extracts the multi-scale features progressively. After PMP, the multi-scale features are concatenated and fused in the multi-scale feature fusion (MFF) step. Finally, the local residual learning (LRL) step utilizes a shortcut to preserve the information and stabilize the training phase.

The progressive multi-scale processing step extracts multi-scale features through combinations of convolutional layers with non-linear activation. Let us denote $C_s(\cdot)$ and $M_s$ as the combination and the features for scale $s$, respectively; then

$M_3 = C_3(F_{in}) = W_3 * F_{in}$,
$M_s = C_s(F_{in}) = \delta(W_s * M_{s-2}) + M_{s-2}$, for $s \in \{5, 7, 9\}$,   (5)

where $F_{in}$ is the input of the block, $W_s$ denotes a 3×3 convolution, and $\delta(\cdot)$ is the ReLU activation.
Fig. 10: Illustration of the layer combination designs. The combinations are defined in a recursive way for larger scales.

As shown in Fig. 10, the combinations are designed in a recursive fashion. For scale 3, there is one convolutional layer for feature extraction. For larger scales, each combination is composed of the identical structure of the previous scale's combination followed by a convolutional layer with ReLU activation. Notice that there is no explicit residual connection for scale 3. On one hand, the invariant information is delivered by the local residual learning in PMRB. On the other hand, there is no activation in $C_3$, and the identical addition can be implied by the convolution operation.

The main idea of the multi-scale design is to exploit features with different receptive fields. Traditional multi-scale works apply layers with different kernel sizes or dilated convolutions for feature extraction. In fact, layers with larger kernel sizes require more parameters and higher computation complexity, while dilated convolutions may lose information from the features. In this paper, combinations of layers with a small kernel size substitute for convolutional layers with different receptive fields. We perform the substitution based on the fact that one convolutional layer with kernel size $(2k+1) \times (2k+1)$ holds an identical receptive field to the combination of one layer with kernel size $(2k-1) \times (2k-1)$ and one 3×3 layer. For any $n \geq 1$, a combination of $n$ layers with kernel size 3×3 can substitute for a single layer with receptive field $(2n+1) \times (2n+1)$. From this perspective, the proposed combinations are composed of layers with an identical small kernel size, decreasing the parameters.
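As a concrete check (a worked example with assumed channel counts, not taken from the paper): with $C$ input and $C$ output channels, a single 9×9 layer needs $81C^2$ weights, while four stacked 3×3 layers covering the same 9×9 receptive field need only $4 \times 9C^2 = 36C^2$ weights, roughly a 55% reduction, while additionally inserting three ReLU non-linearities.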

The recursive design brings three benefits. First, the layer combinations increase the network depth, which helps improve the expressive power. Second, substituting layer combinations for large kernels limits the number of parameters and the computation complexity. Third, different from multi-scale layers with different kernel sizes, which can be regarded as linear operations, the recursive design introduces non-linearity and improves the network representation.

Besides the recursive design, a progressive scheme is applied for multi-scale feature extraction. We hold the notion that there are inherent correlations among multi-scale features, and that information from small scales is favorable for feature exploration on larger ones. From this point of view, larger-scale features are extracted from smaller-scale ones. With the progressive feature extraction, multi-scale features are sequentially explored and the inherent correlations are fully considered. Features with larger scale factors are processed by more convolutional layers under the progressive design and may therefore contain more complex structural information.

However, there are two critical issues. The progressive design makes the network deeper, which suffers from the vanishing gradient problem, and information from small scales is lost as layers accumulate. To handle these issues, residual connections are introduced into the multi-scale combinations. As shown in Eq. (5), shortcuts are applied in every processing step. On one hand, the shortcuts provide better gradient transmission and alleviate the gradient vanishing problem. On the other hand, with the residual connections, information from small scales is identically delivered to larger ones, which maintains the information from all scales.

The multi-scale feature fusion step concatenates and fuses the multi-scale features with one 1×1 convolutional layer. After fusion, a CPA block is utilized for the attention mechanism. The operation can be formulated as

$F_{MFF} = H_{CPA}(H_{MFF}([M_3, M_5, M_7, M_9]))$,   (6)

where $H_{MFF}(\cdot)$ denotes the fusion layer, $[\cdot]$ denotes concatenation, $H_{CPA}(\cdot)$ denotes the CPA block, and $F_{MFF}$ is the output feature.

Local residual learning is devised to preserve the information and improve the gradient flow. Finally, the output of PMRB is

$F_{out} = F_{MFF} + F_{in}$.   (7)
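A minimal PyTorch sketch of PMRB under the reconstruction above (four scales 3/5/7/9, one 3×3 convolution per recursion step, 1×1 fusion, and a CPA block injected as a class; a matching CPA sketch appears in Section III-C). Details the paper does not fully specify are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PMRB(nn.Module):
    """Sketch of the progressive multi-scale residual block, Eqs. (5)-(7)."""
    def __init__(self, channels, cpa_cls):
        super().__init__()
        # One 3x3 conv per scale step: C3 is a plain conv; the combinations
        # for scales 5/7/9 each add one conv + ReLU on top of the previous scale.
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4)]
        )
        # MFF: 1x1 conv fusing the concatenated four scales, then CPA
        self.fuse = nn.Conv2d(4 * channels, channels, 1)
        self.cpa = cpa_cls(channels)

    def forward(self, f_in):
        m3 = self.convs[0](f_in)                     # scale 3, no activation
        feats = [m3]
        for conv in self.convs[1:]:                  # scales 5, 7, 9
            prev = feats[-1]
            feats.append(F.relu(conv(prev)) + prev)  # Eq. (5): conv + ReLU with shortcut
        fused = self.fuse(torch.cat(feats, dim=1))   # Eq. (6): concatenation + 1x1 fusion
        return self.cpa(fused) + f_in                # Eq. (6)-(7): CPA, then local residual
```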

III-C Channel-wise and Pixel-wise Attention

Fig. 11: Illustration of the proposed CPA. Scale factor $\alpha$ and bias factor $\beta$ are adaptively learned by the attention mechanism. In CPA, point-wise (P-Conv) and depth-wise (D-Conv) convolutional layers exploit the channel-wise and pixel-wise relations separately.

In this section, the proposed CPA block is introduced in detail; it considers attentions from two perspectives jointly. As shown in Fig. 11, CPA can be separated into three parts. First, the space transformation (ST) step converts the input features into a specific space for attention exploration. After ST, the factor extraction (FE) step exploits the scale and bias factors from the converted features on two parallel paths, which consider pixel-wise and channel-wise features separately. Finally, the attention allocation (AA) step distributes the learned adaptive attentions onto the features.

The space transformation step transforms the input features into a specific space using one convolutional layer. The operation of ST can be formulated as

$F_T = W_T * F_{in}$,   (8)

where $F_{in}$ is the input features and $F_T$ denotes the features after transformation.

The factor extraction step exploits the scale and bias factors after ST, jointly considering channel-wise and pixel-wise attentions. Channel-wise attentions are first extracted by one point-wise convolutional layer (P-Conv); then pixel-wise attentions are extracted by one depth-wise convolutional layer (D-Conv). The two layers explore attentions from different perspectives orthogonally, and one ReLU activation between the two convolutional layers introduces non-linearity. The operations of the FE step can be formulated as

$\beta = H_D^{b}(\delta(H_P^{b}(F_T)))$,   (9)
$\alpha = \sigma(H_D^{s}(\delta(H_P^{s}(F_T))))$,   (10)

where $H_P(\cdot)$ and $H_D(\cdot)$ denote the point-wise and depth-wise extraction layers, $\delta(\cdot)$ is the ReLU activation, and $\sigma(\cdot)$ denotes the sigmoid activation. $\beta$ and $\alpha$ are the bias and scale factors, respectively. The sigmoid activation on the scale path introduces the non-negativity of the learned scales.

The attention allocation step allocates the attentions to the features via the learned scale and bias factors. The output of the AA step is

$F_{out} = (1 + \alpha) \odot F_{in} + \beta$,   (11)

where $\odot$ denotes element-wise multiplication.

In CPA, $\alpha$ and $\beta$ hold the same shape as the input $F_{in}$, performing adaptive attention over the whole feature volume. Although channel-wise and pixel-wise attentions are performed jointly, they are explored orthogonally. Channel-wise attention is considered first: a P-Conv explores the inherent correlations among channels while treating different pixels within the same channel equally. After channel-wise attention, pixel-wise attention is explored by one D-Conv; since D-Conv treats features from different channels separately, it does not influence the correlations between channels. A ReLU activation between the two convolutions provides non-linearity. This orthogonal design concentrates on different kinds of attention specifically, with limited parameters and computation complexity, making CPA a flexible component for various network designs. Different from other attention mechanisms, two parallel paths find both scale and bias factors: the scale path is similar to other methods that find weights, while the bias path finds a shift on the features, providing another perspective on feature correlations.

CPA holds a similar representation to batch normalization (BN). From Eq. (11), if $\alpha$ and $\beta$ are replaced with fixed parameters, the operation becomes identical to a BN step with batch size 1. In CPA, $\alpha$ and $\beta$ are adaptively learned from $F_T$, which considers a window on the features for better correlation exploration with larger receptive fields. Under CPA, different pixels and channels acquire different scale and bias factors, so more attention is paid to complex textures and information. Since different patches of a minibatch are processed independently, the cross-sample information fusion problem that occurs in BN does not arise. From Eq. (11), there is also a residual structure in CPA: the factor $(1 + \alpha)$ combines the self-adaptive scales with an identical addition of the input features, which preserves information and improves gradient transmission.
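A minimal PyTorch sketch of CPA following Eqs. (8)-(11). The 3×3 kernel of the space-transformation and depth-wise convolutions is an assumption, and the residual form of Eq. (11) follows the reconstruction above.

```python
import torch
import torch.nn as nn

class CPA(nn.Module):
    """Sketch of channel-wise and pixel-wise attention, Eqs. (8)-(11).
    The filter number is kept invariant across all layers, as stated in the paper."""
    def __init__(self, channels, dw_kernel=3):
        super().__init__()
        # ST: one conv transforming the input features (Eq. (8))
        self.st = nn.Conv2d(channels, channels, 3, padding=1)

        # Two dual paths, each P-Conv -> ReLU -> D-Conv (Eqs. (9)-(10))
        def path():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 1),                    # point-wise: channel mixing
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, dw_kernel,
                          padding=dw_kernel // 2, groups=channels),  # depth-wise: per-channel spatial
            )
        self.scale_path = path()
        self.bias_path = path()

    def forward(self, f_in):
        f_t = self.st(f_in)                           # Eq. (8)
        beta = self.bias_path(f_t)                    # Eq. (9): bias factor
        alpha = torch.sigmoid(self.scale_path(f_t))   # Eq. (10): non-negative scale factor
        return (1 + alpha) * f_in + beta              # Eq. (11): allocation with residual
```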

IV Implementation and Discussion

IV-A Implementation Details

In PMRN, all convolutional layers use a 3×3 kernel except for the MFF step in PMRB, which uses 1×1 convolutions. The filter number is kept constant across the intermediate convolutional layers. There are $N$ PMRBs stacked in the non-linear feature exploration module, and the padding structure is composed of two convolutional layers with a ReLU activation.

We introduce an efficient restoration module design with restricted parameters. The upscale module contains only one convolutional layer with a sub-pixel layer, which corresponds to the feature extraction module and can easily be extended to other scaling factors. We utilize this single convolution to restore the image and upscale the resolution simultaneously. In other words, there is no convolution after the sub-pixel layer, which decreases the parameters and computation complexity.
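For illustration (with assumed values, not the paper's exact configuration): with 64-channel features and scaling factor ×4, the single restoration convolution produces 3·4² = 48 channels and costs 64·48·3·3 ≈ 27.6K weights, whereas the common alternative of a sub-pixel stage that keeps 64 feature channels (64·64·4²·3·3 ≈ 590K weights) followed by a final 64-to-3 convolution would be over twenty times heavier.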

IV-B Discussion

Difference to MSRN [25]

To the best of our knowledge, MSRN is the first work for the SISR problem with a multi-scale mechanism. MSRN introduced a multi-scale block, termed MSRB, with 3×3 and 5×5 convolutional layers. In MSRB, features from the two kinds of convolutional layers are crossly concatenated and explored, and a 1×1 convolutional layer is utilized to fuse the multi-scale features. Different from MSRB, PMRB extracts features from four different scales, which are concatenated and fused with one 1×1 convolutional layer. Features from different scales are explored sequentially, and residual connections are utilized for information preservation and better gradient flow. Multi-scale information is extracted by layers with different kernel sizes in MSRB, while PMRB designs the multi-scale structure in a recursive way, which decreases the parameters and computation complexity. Besides the multi-scale design, a novel attention mechanism, CPA, is designed in PMRB. In MSRN, features from different MSRBs are collected and concatenated with a 1×1 convolutional layer for global feature fusion; different from this global feature fusion, the blocks in PMRN are stacked with global residual learning. With this elaborate design, PMRN achieves better PSNR/SSIM results than MSRN on all testing benchmarks with fewer parameters and lower computation complexity.

Difference to Channel-wise Attention [13]

SENet contains an effective channel-wise attention design, which has been widely utilized for different image restoration problems. In SENet, information from different channels is evaluated by global average pooling; two fully connected layers with a ReLU activation explore the attentions, and a sigmoid activation introduces non-negativity. In PMRN, CPA is devised for joint channel-wise and pixel-wise attention. Different from channel-wise attention, features are extracted and explored by convolutional layers, which concentrate more on complex textures and information. The squeezing step in SENet shrinks the channel number, which may cause information loss; in CPA, the number of filters is invariant for all convolutional layers. Besides scale factors for attention, bias factors are also explored in CPA to shift the features and find a better attention representation. Finally, a shortcut is designed in CPA to maintain the original information.

Difference to LapSRN [24]

LapSRN is a progressive network for image super-resolution. In LapSRN, the progressive structure is designed to restore images at multiple resolutions with one network; residual maps are learned sequentially as the resolution increases. In PMRN, an end-to-end network is proposed for image super-resolution with a specific scaling factor. The progressive structure is mainly designed within PMRB to extract multi-scale features: information from the multi-scale features is sequentially extracted by different layer combinations and fused with one convolutional layer.

V Experiments

The proposed PMRN is trained on the DIV2K [1] dataset. DIV2K is a high-quality dataset of 2K-resolution images from the real world, containing 800 training images, 100 validation images, and 100 test images. In this paper, the 800 training images are used for training and 5 images for validation. For testing, five benchmarks widely used in image super-resolution works are chosen: Set5 [3], Set14 [37], B100 [28], Urban100 [16], and Manga109 [29]. The training images are randomly flipped and rotated for data augmentation, and training is performed on cropped LR patches. PMRN is trained for 1000 iterations with the ℓ1 loss, and the parameters are updated with the Adam [23] optimizer; the learning rate is halved every 200 iterations. The degradation model is bicubic downsampling (BI) with scaling factors ×2, ×3, and ×4. PSNR and SSIM are chosen as the indicators for quantitative comparison with other works. A self-ensemble strategy is used to further improve performance, and the resulting extension model is termed PMRN+.
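Self-ensemble here refers to the geometric ensemble popularized by EDSR [27]: the LR input is flipped and rotated, each variant is super-resolved, the outputs are mapped back to the original orientation, and the results are averaged. A minimal PyTorch sketch, assuming the usual eight flip/rotation variants (the paper does not spell out its exact transform set):

```python
import torch

def self_ensemble(model, lr):
    """Geometric self-ensemble (PMRN+): average the SR outputs over the
    eight flip/rotation variants of the input image tensor (B, C, h, w)."""
    outputs = []
    for hflip in (False, True):
        for rot in range(4):  # 0, 90, 180, 270 degrees
            x = torch.flip(lr, dims=[-1]) if hflip else lr
            x = torch.rot90(x, rot, dims=[-2, -1])
            y = model(x)
            # invert the transforms on the output
            y = torch.rot90(y, -rot, dims=[-2, -1])
            if hflip:
                y = torch.flip(y, dims=[-1])
            outputs.append(y)
    return torch.stack(outputs).mean(dim=0)
```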

V-A Results

For quantitative comparison, we compare PSNR/SSIM results with several lightweight works: bicubic, SRCNN [7], VDSR [20], LapSRN [24], MemNet [33], SRMDNF [39], CARN [2], and MSRN [25]. For a fair comparison, the extension model PMRN+ is compared with large networks: EDSR [27], D-DBPN [10], and SRFBN [26]. The results are shown in Table I. From Table I, PMRN achieves better performance than the other lightweight works on all five benchmarks, and PMRN+ achieves competitive or better performance than the large networks.

Meanwhile, we compare the computation complexity and parameters with other works. The total number of parameters is calculated as

$\text{Params} = \sum_{l} \left( \frac{C_{in}^{(l)} \times C_{out}^{(l)} \times K_w^{(l)} \times K_h^{(l)}}{G^{(l)}} + B^{(l)} \right)$,   (12)

where $C_{in}^{(l)}$ and $C_{out}^{(l)}$ denote the input and output numbers of filters of the $l$-th convolutional layer, $K_w^{(l)}$ and $K_h^{(l)}$ denote the width and height of the kernel, $G^{(l)}$ denotes the number of groups, and $B^{(l)}$ represents the bias term.
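As an illustration of Eq. (12) and the MACs convention described below (a helper sketch, not the authors' measurement script):

```python
def conv_params(c_in, c_out, k_w, k_h, groups=1, bias=True):
    """Parameter count of one convolutional layer, following Eq. (12)."""
    weights = c_in * c_out * k_w * k_h // groups
    return weights + (c_out if bias else 0)

def conv_macs(c_in, c_out, k_w, k_h, out_w, out_h, groups=1):
    """Multiply-accumulate operations of one conv layer on an out_w x out_h
    output map; the 720p convention of Table I sets the target to 1280x720."""
    return conv_params(c_in, c_out, k_w, k_h, groups, bias=False) * out_w * out_h

# Example: a 3x3 conv with 64 input/output channels on a 1280x720 map
print(conv_params(64, 64, 3, 3))           # 36,928 parameters
print(conv_macs(64, 64, 3, 3, 1280, 720))  # ~34.0 GMACs
```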

Computation complexity is modeled as the number of multiply-accumulate operations (MACs). Since MACs are independent of software and hardware, they describe the computation complexity purely from a mathematical perspective. MACs are compared for producing a 720p (1280×720) image from the corresponding LR image at different scaling factors. The experimental results are shown in Table I. From the results, PMRN achieves better PSNR/SSIM results than other lightweight works with competitive parameters and MACs, which shows that PMRN holds a more efficient network design for super-resolution. Compared with the large networks, PMRN+ achieves competitive or better PSNR/SSIM results with much fewer parameters and MACs. Visual comparisons of parameters and MACs are shown in Fig. 12 and Fig. 13, and a running-time comparison in Fig. 14; the time cost and performance are evaluated on Manga109 with BI degradation.

Fig. 12: An illustration comparison of performance and parameters.
Fig. 13: An illustration comparison of performance and MACs.
Fig. 14: An illustration comparison of performance and running time.
Model Params MACs Set5 [3] Set14 [37] B100 [28] Urban100 [16] Manga109 [29]
(all benchmark entries are PSNR/SSIM)

Scale ×2:
SRCNN [7] 57K 52.7G 36.66/0.9542 32.42/0.9063 31.36/0.8879 29.50/0.8946 35.74/0.9661
FSRCNN [8] 12K 6.0G 37.00/0.9558 32.63/0.9088 31.53/0.8920 29.88/0.9020 36.67/0.9694
VDSR [20] 665K 612.6G 37.53/0.9587 33.03/0.9124 31.90/0.8960 30.76/0.9140 37.22/0.9729
DRCN [21] 1,774K 17,974.3G 37.63/0.9588 33.04/0.9118 31.85/0.8942 30.75/0.9133 37.63/0.9723
CNF [30] 337K 311.0G 37.66/0.9590 33.38/0.9136 31.91/0.8962 - -
LapSRN [24] 813K 29.9G 37.52/0.9590 33.08/0.9130 31.80/0.8950 30.41/0.9100 37.27/0.9740
DRRN [34] 297K 6,796.9G 37.74/0.9591 33.23/0.9136 32.05/0.8973 31.23/0.9188 37.92/0.9760
BTSRN [9] 410K 207.7G 37.75/- 33.20/- 32.05/- 31.63/- -
MemNet [33] 677K 2,662.4G 37.78/0.9597 33.28/0.9142 32.08/0.8978 31.31/0.9195 37.72/0.9740
SelNet [5] 974K 225.7G 37.89/0.9598 33.61/0.9160 32.08/0.8984 - -
CARN [2] 1,592K 222.8G 37.76/0.9590 33.52/0.9166 32.09/0.8978 31.92/0.9256 38.36/0.9765
MSRN [25] 5,930K 1367.5G 38.08/0.9607 33.70/0.9186 32.23/0.9002 32.29/0.9303 38.69/0.9772
OISR-RK2 [12] 4,970K 1145.7G 38.12/0.9609 33.80/0.9193 32.26/0.9006 32.48/0.9317 -
PMRN 3,577K 824.2G 38.13/0.9609 33.85/0.9204 32.28/0.9010 32.59/0.9328 38.91/0.9775
EDSR [27] 40,729K 9,388.8G 38.11/0.9602 33.92/0.9195 32.32/0.9013 32.93/0.9351 39.10/0.9773
D-DBPN [10] 5,953K 3,746.2G 38.09/0.9600 33.85/0.9190 32.27/0.9000 32.55/0.9324 38.89/0.9775
SRFBN [26] 2,140K 5,043.5G 38.11/0.9609 33.82/0.9196 32.29/0.9010 32.62/0.9328 39.08/0.9779
PMRN+ 3,577K 3,296.8G 38.22/0.9612 33.90/0.9205 32.34/0.9015 32.78/0.9342 39.15/0.9781
Scale ×3:
SRCNN [7] 57K 52.7G 32.75/0.9090 29.28/0.8209 28.41/0.7863 26.24/0.7989 30.59/0.9107
FSRCNN [8] 12K 5.0G 33.16/0.9140 29.43/0.8242 28.53/0.7910 26.43/0.8080 30.98/0.9212
VDSR [20] 665K 612.6G 33.66/0.9213 29.77/0.8314 28.82/0.7976 27.14/0.8279 32.01/0.9310
DRCN [21] 1,774K 17,974.3G 33.82/0.9226 29.76/0.8311 28.80/0.7963 27.15/0.8276 32.31/0.9328
CNF [30] 337K 311.0G 33.74/0.9226 29.90/0.8322 28.82/0.7980 - -
DRRN [34] 297K 6,796.9G 34.03/0.9244 29.96/0.8349 28.95/0.8004 27.53/0.8378 32.74/0.9390
BTSRN [9] 410K 176.2G 34.03/- 29.90/- 28.97/- 27.75/- -
MemNet [33] 677K 2,662.4G 34.09/0.9248 30.00/0.8350 28.96/0.8001 27.56/0.8376 32.51/0.9369
SelNet [5] 1,159K 120.0G 34.27/0.9257 30.30/0.8399 28.97/0.8025 - -
CARN [2] 1,592K 118.8G 34.29/0.9255 30.29/0.8407 29.06/0.8034 28.06/0.8493 33.49/0.9440
MSRN [25] 6,114K 626.6G 34.46/0.9278 30.41/0.8437 29.15/0.8064 28.33/0.8561 33.67/0.9456
OISR-RK2 [12] 5,640K 578.6G 34.55/0.9282 30.46/0.8443 29.18/0.8075 28.50/0.8597 -
PMRN 3,586K 366.6G 34.57/0.9284 30.43/0.8444 29.19/0.8075 28.51/0.8601 33.85/0.9465
EDSR [27] 43,680K 4,471.5G 34.65/0.9280 30.52/0.8462 29.25/0.8093 28.80/0.8653 34.17/0.9476
SRFBN [26] 2,832K 6,023.8G 34.70/0.9292 30.51/0.8461 29.24/0.8084 28.73/0.8641 34.18/0.9481
PMRN+ 3,586K 1,466.4G 34.65/0.9289 30.54/0.8461 29.24/0.8087 28.71/0.8630 34.10/0.9480
Scale ×4:
SRCNN [7] 57K 52.7G 30.48/0.8628 27.49/0.7503 26.90/0.7101 24.52/0.7221 27.66/0.8505
FSRCNN [8] 12K 4.6G 30.71/0.8657 27.59/0.7535 26.98/0.7150 24.62/0.7280 27.90/0.8517
VDSR [20] 665K 612.6G 31.35/0.8838 28.01/0.7674 27.29/0.7251 25.18/0.7524 28.83/0.8809
DRCN [21] 1,774K 17,974.3G 31.53/0.8854 28.02/0.7670 27.23/0.7233 25.14/0.7510 28.98/0.8816
CNF [30] 337K 311.0G 31.55/0.8856 28.15/0.7680 27.32/0.7253 - -
LapSRN [24] 813K 149.4G 31.54/0.8850 28.19/0.7720 27.32/0.7280 25.21/0.7560 29.09/0.8845
DRRN [34] 297K 6,796.9G 31.68/0.8888 28.21/0.7720 27.38/0.7284 25.44/0.7638 29.46/0.8960
BTSRN [9] 410K 207.7G 31.85/- 28.20/- 27.47/- 25.74/- -
MemNet [33] 677K 2,662.4G 31.74/0.8893 28.26/0.7723 27.40/0.7281 25.50/0.7630 29.42/0.8942
SelNet [5] 1,417K 83.1G 32.00/0.8931 28.49/0.7783 27.44/0.7325 - -
SRDenseNet [35] 2,015K 389.9G 32.02/0.8934 28.50/0.7782 27.53/0.7337 26.05/0.7819 -
CARN [2] 1,592K 90.9G 32.13/0.8937 28.60/0.7806 27.58/0.7349 26.07/0.7837 30.40/0.9082
MSRN [25] 6,373K 368.6G 32.26/0.8960 28.63/0.7836 27.61/0.7380 26.22/0.7911 30.57/0.9103
OISR-RK2 [12] 5,500K 412.2G 32.32/0.8965 28.72/0.7843 27.66/0.7390 26.37/0.7953 -
PMRN 3,598K 207.2G 32.34/0.8971 28.71/0.7850 27.66/0.7392 26.37/0.7953 30.71/0.9107
EDSR [27] 43,089K 2,895.8G 32.46/0.8968 28.80/0.7876 27.71/0.7420 26.64/0.8033 31.02/0.9148
D-DBPN [10] 10,426K 5,213.0G 32.47/0.8980 28.82/0.7860 27.72/0.7400 26.38/0.7946 30.91/0.9137
SRFBN [26] 3,631K 7,466.1G 32.47/0.8983 28.81/0.7868 27.72/0.7409 26.60/0.8015 31.15/0.9160
PMRN+ 3,598K 828.8G 32.47/0.8984 28.81/0.7870 27.72/0.7405 26.55/0.7995 31.07/0.9144
TABLE I: Average PSNR/SSIM, parameters, and MACs under the BI degradation model at ×2, ×3, and ×4 on five benchmarks. The best and second-best results are shown in bold and underline.

Besides quantitative comparisons, we also analyze the qualitative restoration performance through visual comparisons. Three images from the Urban100 benchmark are chosen for comparison under BI ×4 degradation, shown in Fig. 42. These real-world images contain abundant high-frequency textures and are challenging to restore at large scaling factors. From the results, the proposed PMRN recovers lines and other complex textures more effectively.

Besides Urban100, we also conduct experiments on Manga109, which is composed of comic book covers with plentiful line structures. The results are shown in Fig. 51. From the visual comparison, PMRN recovers more lines and structural textures.

(a) image_059 from Urban100
(b) HR (PSNR/SSIM)
(c) LR (18.96/0.7246)
(d) Bicubic (19.21/0.7331)
(e) VDSR [20] (19.94/0.7910)
(f) LapSRN [24] (19.92/0.7894)
(g) CARN [2] (20.82/0.8234)
(h) MSRN [25] (21.11/0.8369)
(i) Ours (21.44/0.8447)
(j) image_067 from Urban100
(k) HR (PSNR/SSIM)
(l) LR (14.95/0.7116)
(m) Bicubic (15.80/0.7490)
(n) VDSR [20] (17.30/0.8474)
(o) LapSRN [24] (17.34/0.8577)
(p) CARN [2] (18.12/0.8882)
(q) MSRN [25] (18.58/0.8950)
(r) Ours (18.84/0.9035)
(s) image_078 from Urban100
(t) HR (PSNR/SSIM)
(u) LR (23.74/0.7624)
(v) Bicubic (24.49/0.7866)
(w) VDSR [20] (25.49/0.8401)
(x) LapSRN [24] (25.41/0.8395)
(y) CARN [2] (25.88/0.8536)
(z) MSRN [25] (26.12/0.8598)
(aa) Ours (26.45/0.8658)
Fig. 42: Visual comparisons on Urban100 with BI ×4 degradation.
(a) HR (PSNR/SSIM)
(b) LR (20.16/0.8521)
(c) MSRN [25] (26.31/0.9296)
(d) PMRN (26.46/0.9637)
(e) HR (PSNR/SSIM)
(f) LR (21.34/0.8693)
(g) MSRN [25] (27.49/0.9690)
(h) PMRN (27.76/0.9637)
Fig. 51: Visual comparisons on Manga109 with BI ×4 degradation.

V-B Ablation Study

Study on Network Design

In PMRB, residual connections are introduced to preserve information from small scales, and feature fusion with a 1×1 convolution is used to combine information from different scales. To evaluate the contribution of information preservation and feature fusion, we compare models without the residual connections and without the fusion convolution. The results are shown in Table II, where Res and Fuse denote the residual connections and the fusion, respectively. Three benchmarks covering different kinds of textures are used for testing with scaling factor ×4. From Table II, the residuals and the feature fusion are both effective across benchmarks. On Set5, the residual structure performs better than fusion, achieving around a 0.1 dB improvement; on B100 and Urban100, feature fusion recovers textures more effectively. Set5 contains less high-frequency information than the other benchmarks, while B100 and Urban100 are composed of abundant real-world images. From this perspective, residual connections suit simple images, while feature fusion performs better on complex structural textures.

Res Fuse Set5 B100 Urban100
✓ ✓ 32.34/0.8971 27.66/0.7392 26.37/0.7953
✓ ✗ 32.35/0.8971 27.64/0.7384 26.34/0.7942
✗ ✓ 32.24/0.8963 27.65/0.7388 26.36/0.7955
TABLE II: Investigation of residual connections and feature fusion in PMRB with scaling factor ×4 on different benchmarks.

There is a multi-scale structure in PMRB that extracts information from different scales. To evaluate the multi-scale design, we compare against a model without the layer combinations, in which every combination is replaced by a single convolutional layer; in other words, all scales in PMRB are reduced to 3. The results on four benchmarks with scaling factor ×4 are shown in Table III. From Table III, the model with the multi-scale design achieves better PSNR/SSIM results. There are two reasons for the improvement. On one hand, features from different scales contain more information, which helps recover complex structural textures. On the other hand, the multi-scale structures are built in a recursive way; the combination of convolutional layers increases the depth of PMRN, which may help improve the network representation.

Furthermore, we analyze the exploited features from different scales, shown in Fig. 58. The multi-scale features are exploited by different layer combinations. As the scale factor increases, the structural information becomes sharper and clearer, while tiny textures are flattened. This accords with the notion that multi-scale features contain different information.

Fig. 58: Illustrations of multi-scale features. (a) and (f) denote the input and output features; (b)-(e) denote the features with scale factors 3, 5, 7, and 9.
Multi Set5 Set14 B100 Urban100
w 32.34/0.8971 28.71/0.7850 27.66/0.7392 26.37/0.7953
w/o 32.03/0.8932 28.51/0.7799 27.53/0.7348 25.90/0.7803
TABLE III: Investigation of the multi-scale mechanism in PMRB with scaling factor ×4 on different benchmarks.

In PMRN, recursive layer combinations substitute for convolutional layers with different kernel sizes. To evaluate the substitution, PSNR/SSIM comparisons are made on five benchmarks with scaling factor ×4. To ensure the same receptive fields, the network without combinations is built with layers holding kernel sizes of 5×5, 7×7, and 9×9, respectively. The results are shown in Table IV. The model built with layer combinations achieves better PSNR/SSIM results on all five testing benchmarks, demonstrating the effectiveness of the recursive design. Meanwhile, the recursive combinations cut parameters and MACs by around 40.2%.

Comb Param MACs Set5 Set14 B100 Urban100 Manga109
w 3,598K 207.2G 32.34/0.8971 28.71/0.7850 27.66/0.7392 26.37/0.7953 30.71/0.9107
w/o 6,020K 346.7G 32.07/0.8932 28.53/0.7804 27.53/0.7350 25.93/0.7819 30.16/0.9043
TABLE IV: Investigation of the recursive combinations in PMRB with scaling factor ×4 on different benchmarks.

In PMRN, the largest scale $S$ of PMRB is chosen as 9. To show the effect of different $S$ and block numbers $N$, models are trained with different scales and block numbers for 200 epochs, and quantitative comparisons are made on B100 with scaling factor ×4. The results are shown in Fig. 59. From Fig. 59, both $S$ and $N$ affect the network performance; in general, with the increase of $S$ and $N$, the networks achieve better results. Compared with $N$, $S$ counts more for the performance. On one hand, when $S$ is larger, the network is deeper. On the other hand, with the increase of $S$, features from more scales are considered.

Fig. 59: Investigation of different $S$ and $N$ with scaling factor ×4.

Study on Attentions

In PMRN, CPA is investigated as a joint attention mechanism. To evaluate the proposed CPA, comparisons are designed on three testing benchmarks between models with CPA, with channel-wise attention (CA) [13], and without attention. The results are shown in Table V. The model with CPA achieves the best performance on all testing benchmarks, and the model with channel-wise attention achieves better PSNR/SSIM results than the one without attention. The results demonstrate that the attention mechanism is effective for image super-resolution.

Method Set5 Set14 Urban100
CPA 32.34/0.8971 28.71/0.7850 26.37/0.7953
CA [13] 32.31/0.8968 28.69/0.7844 26.34/0.7940
w/o 32.29/0.8965 28.68/0.7851 26.29/0.7940
TABLE V: Investigation of different attention mechanisms with scaling factor ×4 on different benchmarks.

To analyze the operation of CPA, the attention factors $\alpha$ and $\beta$ and the feature maps before and after attention are visualized in Fig. 64. From the illustrations, the learned attentions concentrate on structural textures: $\alpha$ and $\beta$ vary sharply around edges and complex textures. After attention, the features become more discriminative on structural textures, which is convincing evidence for the attention mechanism.

(a) Feature before attention
(b) Feature after attention
(c) $\alpha$ from CPA
(d) $\beta$ from CPA
Fig. 64: Visualization of the attention factors and feature maps of CPA.

VI Conclusion

In this paper, we proposed a progressive multi-scale residual network (PMRN) with limited parameters and computation complexity for the single image super-resolution (SISR) problem. Specifically, a novel progressive multi-scale residual block (PMRB) was introduced in PMRN for information exploration across various scales. The layer combinations for multi-scale feature extraction were designed in a recursive way to decrease the parameters and computation complexity while exploiting the features progressively. After feature extraction, the multi-scale features were concatenated and fused for adaptive information exploration, and local residual learning was introduced into PMRB for a stable training phase and information preservation. Besides the structural designs, we also proposed a joint channel-wise and pixel-wise attention mechanism named CPA, which learns channel-wise and pixel-wise attentions by point-wise and depth-wise convolutions, respectively. Different from other attention works, scale and bias factors are explored in parallel. Experimental results showed that PMRN not only achieves better PSNR/SSIM results than other lightweight works on five testing benchmarks but also recovers more complex structural textures. Meanwhile, our extension model PMRN+ achieves competitive or better PSNR/SSIM results than other deep networks with much fewer parameters and lower computation complexity.

References

  • [1] E. Agustsson and R. Timofte (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In

    2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

    ,
    Vol. , pp. 1122–1131. Cited by: §V.
  • [2] N. Ahn, B. Kang, and K. Sohn (2018) Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 256–272. Cited by: (e)e, §I, §II-A, (g)g, (p)p, (y)y, §V-A, TABLE I.
  • [3] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding. Cited by: TABLE I, §V.
  • [4] L. Chen, J. Pan, R. Hu, Z. Han, C. Liang, and Y. Wu (2019) Modeling and optimizing of the multi-layer nearest neighbor network for face image super-resolution. IEEE Transactions on Circuits and Systems for Video Technology (), pp. 1–1. Cited by: §I.
  • [5] J. Choi and M. Kim (2017) A deep convolutional neural network with selection units for super-resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vol. , pp. 1150–1156. Cited by: TABLE I.
  • [6] T. Dai, J. Cai, Y. Zhang, S. Xia, and L. Zhang (2019) Second-order attention network for single image super-resolution. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 11057–11066. Cited by: §I, §I, §II-B.
  • [7] C. Dong, C. C. Loy, K. He, and X. Tang (2014) Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199. Cited by: §I, §II-A, §V-A, TABLE I.
  • [8] C. Dong, C. C. Loy, and X. Tang (2016) Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 391–407. Cited by: §I, §II-A, TABLE I.
  • [9] Y. Fan, H. Shi, J. Yu, D. Liu, W. Han, H. Yu, Z. Wang, X. Wang, and T. S. Huang (2017) Balanced two-stage residual networks for image super-resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vol. , pp. 1157–1164. Cited by: TABLE I.
  • [10] M. Haris, G. Shakhnarovich, and N. Ukita (2018) Deep back-projection networks for super-resolution. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 1664–1673. Cited by: §I, §II-A, §V-A, TABLE I.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 770–778. Cited by: §I, §II-A.
  • [12] X. He, Z. Mo, P. Wang, Y. Liu, M. Yang, and J. Cheng (2019) ODE-inspired network design for single image super-resolution. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 1732–1741. Cited by: TABLE I.
  • [13] J. Hu, L. Shen, and G. Sun (2018) Squeeze-and-excitation networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 7132–7141. Cited by: §I, §II-B, §IV-B, §V-B, TABLE V.
  • [14] Y. Hu, J. Li, Y. Huang, and X. Gao (2019) Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Transactions on Circuits and Systems for Video Technology (), pp. 1–1. Cited by: §II-B.
  • [15] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 2261–2269. Cited by: §I, §II-A.
  • [16] J. Huang, A. Singh, and N. Ahuja (2015) Single image super-resolution from transformed self-exemplars. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 5197–5206. Cited by: (a)a, TABLE I, §V.
  • [17] Y. Huang, S. Lian, S. Zhang, H. Hu, D. Chen, and T. Su (2020) Three-dimension transmissible attention network for person re-identification. IEEE Transactions on Circuits and Systems for Video Technology (), pp. 1–1. Cited by: §II-B.
  • [18] Z. Hui, X. Wang, and X. Gao (2018) Fast and accurate single image super-resolution via information distillation network. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 723–731. Cited by: §I, §II-A.
  • [19] Z. Hui, X. Gao, Y. Yang, and X. Wang (2019) Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, pp. 2024–2032. Cited by: §I, §I, §II-A, §II-B.
  • [20] J. Kim, J. K. Lee, and K. M. Lee (2016) Accurate image super-resolution using very deep convolutional networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 1646–1654. Cited by: §I, §II-A, (e)e, (n)n, (w)w, §V-A, TABLE I.
  • [21] J. Kim, J. K. Lee, and K. M. Lee (2016) Deeply-recursive convolutional network for image super-resolution. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 1637–1645. Cited by: §I, TABLE I.
  • [22] Y. Kim, J. Choi, and M. Kim (2019) A real-time convolutional neural network for super-resolution on fpga with applications to 4k uhd 60 fps video services. IEEE Transactions on Circuits and Systems for Video Technology 29 (8), pp. 2521–2534. Cited by: §I.
  • [23] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §V.
  • [24] W. Lai, J. Huang, N. Ahuja, and M. Yang (2017) Deep laplacian pyramid networks for fast and accurate super-resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 5835–5843. Cited by: (d)d, §II-A, §IV-B, (f)f, (o)o, (x)x, §V-A, TABLE I.
  • [25] J. Li, F. Fang, K. Mei, and G. Zhang (2018) Multi-scale residual network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 527–542. Cited by: (f)f, §I, §I, §II-A, §IV-B, (h)h, (q)q, (z)z, (c)c, (g)g, §V-A, TABLE I.
  • [26] Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, and W. Wu (2019) Feedback network for image super-resolution. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 3862–3871. Cited by: §V-A, TABLE I.
  • [27] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee (2017) Enhanced deep residual networks for single image super-resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vol. , pp. 1132–1140. Cited by: §I, §II-A, §V-A, TABLE I.
  • [28] D. Martin, C. Fowlkes, D. Tal, and J. Malik (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, pp. 416–423 vol.2. Cited by: TABLE I, §V.
  • [29] Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa (2017) Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications 76 (20), pp. 21811–21838. Cited by: TABLE I, §V.
  • [30] H. Ren, M. El-Khamy, and J. Lee (2017) Image super resolution based on fusing multiple convolution neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vol. , pp. 1050–1057. Cited by: TABLE I.
  • [31] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 1874–1883. Cited by: §II-A.
  • [32] X. Song, Y. Dai, and X. Qin (2019) Deeply supervised depth map super-resolution as novel view synthesis. IEEE Transactions on Circuits and Systems for Video Technology 29 (8), pp. 2323–2336. Cited by: §I.
  • [33] Y. Tai, J. Yang, X. Liu, and C. Xu (2017) MemNet: a persistent memory network for image restoration. In 2017 IEEE International Conference on Computer Vision (ICCV), Vol. , pp. 4549–4557. Cited by: §V-A, TABLE I.
  • [34] Y. Tai, J. Yang, and X. Liu (2017) Image super-resolution via deep recursive residual network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 2790–2798. Cited by: §I, TABLE I.
  • [35] T. Tong, G. Li, X. Liu, and Q. Gao (2017) Image super-resolution using dense skip connections. In 2017 IEEE International Conference on Computer Vision (ICCV), Vol. , pp. 4809–4817. Cited by: §I, §II-A, TABLE I.
  • [36] X. Wang, R. Girshick, A. Gupta, and K. He (2018) Non-local neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 7794–7803. Cited by: §I, §II-B.
  • [37] R. Zeyde, M. Elad, and M. Protter (2010) On single image scale-up using sparse-representations. In International conference on curves and surfaces, pp. 711–730. Cited by: TABLE I, §V.
  • [38] M. Zhai, X. Xiang, R. Zhang, N. Lv, and A. El Saddik (2019)

    Optical flow estimation using dual self-attention pyramid networks

    .
    IEEE Transactions on Circuits and Systems for Video Technology (), pp. 1–1. Cited by: §II-B.
  • [39] K. Zhang, W. Zuo, and L. Zhang (2018) Learning a single convolutional super-resolution network for multiple degradations. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 3262–3271. Cited by: §V-A.
  • [40] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu (2018) Residual dense network for image super-resolution. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vol. , pp. 2472–2481. Cited by: §I, §II-A.
  • [41] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu (2018) Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 294–310. Cited by: §I, §I, §II-A, §II-B.