Single-image super-resolution (SISR) refers to the transformation of an image from low-resolution (LR) to high-resolution (HR). SISR is a long-standing problem in computer graphics and vision. Higher-resolution images often provide more of the desired information and are valuable in many domains, such as security and surveillance imaging, medical imaging, and satellite imaging. Therefore, it is necessary to explore the reconstruction performance of image super-resolution with larger upscaling factors.
Various algorithms have been introduced to solve the super-resolution (SR) problem, beginning with initial work by Freeman et al. . Currently, deep-learning-based methods, especially convolutional neural networks (CNNs), are widely used to handle image SR owing to the powerful learning ability of CNNs. Super-Resolution Convolutional Neural Network (SRCNN)  pioneered the use of three-layer CNNs to learn the mapping relationship between an interpolated image and HR image and significantly outperformed traditional non-deep learning methods. After that, Kumar et al.  tapped into the ability of polynomial neural networks to hierarchically learn refinements of a function that maps LR to HR patches. Shi et al.  developed a contextualized multitask learning framework to address the SR problem. Kim et al. proposed two neural network structures with 20-layer convolutions, termed VDSR  and DRCN  respectively, and achieved state-of-the-art performance. Lim et al. built a wide-network EDSR  using residual blocks. To generate photo-realistic natural images, Ledig et al.  presented a generative adversarial network for SR. Lai et al.  proposed a deep convolutional network within a Laplacian pyramid framework, which progressively predicts high-frequency residuals in a coarse-to-fine manner.
However, as the upscaling factor becomes higher, these methods exhibit strong visual artifacts (Figure 1) caused by their network design philosophy. Current approaches possess three inherent limitations. First, most existing methods apply interpolation strategies such as bicubic interpolation to first process the input image to the desired size and then use CNNs to extract features and learn LR/HR mapping relations. This pre-processing step often results in visible reconstruction artifacts. Second, several methods extract raw features directly from input LR images and replace the pre-defined upsampling operator with transposed convolution or sub-pixel convolution. These methods, however, use relatively small networks and cannot learn complicated mappings well due to limited network capacity. Moreover, these approaches reconstruct HR images in one upsampling step at the end of the network, which increases the difficulty of training for large scaling factors (e.g., ×8). Third, in the reconstruction stage, many algorithms have only one reconstruction level and cannot fully leverage the underlying information, including original and complementary information, among different recovery stages. Additionally, images reconstructed by a single-level structure lack many realistic texture details.
To address the above limitations, in this paper, we propose a deep recurrent fusion framework (DRFN) for the large-factor SR problem. As illustrated in Figure 2, we jointly extract and upsample raw features from an input LR image by putting the transposed convolution in front of the network. This design does not require a pre-defined upsampling operator (e.g., bicubic interpolation) as the pre-processing step and allows the following convolutional layers to focus on mapping in the HR feature space. After that, we use recurrent residual blocks to gradually recover high-frequency information of the HR image using fewer parameters. Then, three convolutional layers are used to extract features with different receptive field sizes at each recovery stage. In doing so, we can make full use of complementary information among three different level features. Finally, we use a convolutional layer to fuse feature maps and reconstruct HR images.
In summary, we propose a novel DRFN end-to-end framework for single-image super-resolution with high upscaling factors (×4 and ×8). Without extraneous steps, DRFN trained from scratch can produce HR images with more texture details and better visual quality. As demonstrated through extensive experiments, the proposed DRFN significantly outperforms existing deep learning methods in terms of accuracy and visual effects, especially when dealing with large scaling factors.
II Related Work
Extensive research has investigated the SR problem. In this section, we summarize the main related works with respect to conventional methods, learning-based methods, and deep-learning-based methods.
Conventional Methods. Early methods were mainly based on image interpolation, namely linear, bicubic, or Lanczos interpolation. Later, prior information was introduced to improve results, such as edge priors and edge statistics. Michaeli et al. utilized the recurrent property of image patches to recover an SR blur kernel. Efrat et al. combined accurate reconstruction constraints and gradient regularization to improve reconstruction results. Although most conventional approaches are fast and generate smooth HR images, they struggle to recover high-frequency information and tend toward overly smooth solutions.
Learning-based Methods. More approaches focus on recovering complex mapping relations between LR and HR images. These mapping relations can be established by external or internal databases.
Several methods can learn LR/HR mapping relations from external databases using different models and strategies. Yang et al.  introduced sparse representations of LR/HR patch pairs. Freeman et al.  presented dictionaries of LR/HR patch pairs and reconstructed HR patches with the corresponding nearest neighbors from the LR space. Timofte et al. 
assumed all LR/HR patches lie on the manifold in the LR/HR space, so outputs were reconstructed by the retrieved patches. Additionally, K-means
and random forest algorithms were proposed to seek mapping by partitioning the image database. Methods based on external databases can obtain a mass of different prior knowledge to achieve good performance. Nevertheless, the efficiency of these approaches is rather poor given the cost of matching HR patches.
Methods based on internal databases create LR/HR patch pairs and utilize the self-similarity property of input images. Freedman et al. used image pyramids to seek the local self-similarity property. Singh et al. used directional frequency sub-bands to compose patches. Cui et al. conducted image SR layer by layer; in each layer, they elaborately integrated a non-local self-similarity search and a collaborative local auto-encoder. Huang et al. warped the LR patch to find matching patches in the LR image and unwarped the matching patch as the HR patch. Methods based on internal databases incur high computational costs when searching for patches, resulting in slow runtimes.
Liu et al.  proposed a group-structured sparse representation approach to make full use of internal and external dependencies to facilitate image SR. Xu et al.  proposed an integration model based on Gaussian conditional random fields, which learns the probabilistic distribution of the interaction between patch-based and deep-learning-based SR methods.
Deep-learning-based Methods. Deep learning methods have achieved great success in SR. Dong et al.  successfully pioneered a CNN to solve the SR problem. Shi et al.  presented an efficient sub-pixel convolution (ESPCN) layer to upscale LR feature maps into HR output. By doing so, ESPCN achieves a stunning average speed. Dong et al.  presented an hourglass-shaped CNN to accelerate SRCNN. Motivated by SRCNN, Kim et al. 
presented a very deep convolutional network (VDSR) to obtain a larger receptive field. VDSR achieves fast convergence via proposed residual learning and gradient clipping. Moreover, VDSR can handle multi-scale SR using a single network. Because deeper networks often introduce more parameters, recurrent learning strategies were applied in this study to reduce the number of parameters along with skip connections to accelerate convergence.
Kim et al.  presented an approach that used more layers to increase the receptive field of the network and proposed a very deep recursive layer to avoid excessive parameters. Zhang et al.  proposed an effective and fast SISR algorithm by combining clustering and collaborative representation. Tai et al. introduced recursive blocks in DRRN  and memory blocks in Memnet  for deeper networks, but each method must interpolate the original LR image to the desired size. Yang et al.  utilized the LR image and its edge map to infer sharp edge details of an HR image during the recurrent recovery process. Lai et al.  presented a Laplacian pyramid network for SR. The proposed model can predict high-frequency information with coarse feature maps. LapSRN is accurate and fast for SISR. Unlike most deep-learning methods, we adopted transposed convolution to replace bicubic interpolation to extract raw features and promote reconstruction performance. Furthermore, a multi-level structure was designed to obtain better reconstruction performance, including visual effects.
III Proposed Method
In this section, we describe the design methodology of our proposed DRFN. Figure 2 presents the DRFN for image reconstruction, which consists of three main parts: joint feature extraction and upsampling, recurrent mapping in the HR feature space, and multi-level fusion reconstruction. First, the proposed method uses multiple transposed convolution operations to jointly extract and upsample raw features from the input image. Second, two recurrent residual blocks are used for mapping in the HR feature space. Finally, DRFN uses three convolutional layers to extract features with different receptive field sizes at each recovery stage and uses one convolutional layer for multi-level fusion reconstruction. Now, we will present additional technical details of each part of our model.
III-A Joint Feature Extraction and Upsampling
In this subsection, we show how to jointly extract and upsample raw features. The transposed convolutional layer consists of diverse, automatically learned upsampling kernels, from which raw features can be extracted simultaneously from the input image to achieve upsampling. For higher SR factors, compared with bicubic interpolation, transposed convolution can alleviate training difficulties while effectively suppressing artifacts. Therefore, we use transposed convolutions to amplify the original LR input image $I_{LR}$ to the desired size.
FSRCNN  extracts feature maps in the LR space and replaces the bicubic upsampling operation with one transposed convolution layer at the end of the network, and LapSRN progressively reconstructs an HR image throughout the structure. By contrast, the proposed method first puts the transposed convolution at the forefront of the network for joint feature extraction and upsampling. This setting is conducive to allowing the rest of the network to extract features in the HR feature space to further improve performance. Moreover, extracting raw features from the original image enhances the reconstruction details, which is beneficial in generating large-scale and visually satisfying images. The upsampling process can be formulated as
$$F_t = \sigma\left(D(F_{t-1})\right), \qquad (1)$$
where $D(\cdot)$ represents the transposed convolution operation that doubles the size of the image, and $\sigma(\cdot)$ is a non-linear operation achieved using a parametric rectified linear unit (PReLU). $F_t$ denotes the feature maps extracted from the transposed convolution of step $t$; when $t = 0$, $F_0$ is the input LR image $I_{LR}$. The size of the input image is then magnified by iterating Eq. 1. The number of iterations $n$ determines the SR scale factor $2^{n}$; for example, the scale factor is 4 when $n = 2$.
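To make the doubling behavior of Eq. 1 concrete, the following is a minimal single-channel sketch of a stride-2 transposed convolution implemented by scatter-adding a weighted kernel stamp for every input pixel. The kernel size of 4, padding of 1, and the fixed (rather than learned) kernel are illustrative assumptions; the paper only fixes the stride so that each layer magnifies the image twice.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2, pad=1):
    """Single-channel transposed convolution: every input pixel scatter-adds
    a kernel-weighted stamp into a larger output grid; the padding border is
    then cropped away."""
    h, w = x.shape
    k = kernel.shape[0]
    full_h = (h - 1) * stride + k
    full_w = (w - 1) * stride + k
    out = np.zeros((full_h, full_w))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k,
                j * stride:j * stride + k] += x[i, j] * kernel
    return out[pad:full_h - pad, pad:full_w - pad]

# With kernel 4, stride 2, pad 1 the size doubles: (h-1)*2 + 4 - 2*1 = 2h.
# Iterating the layer n times therefore yields a 2**n upscaling factor.
lr = np.random.rand(8, 8)
kernel = np.full((4, 4), 0.25)      # fixed kernel stands in for learned weights
f1 = transposed_conv2d(lr, kernel)  # 16 x 16 (n = 1, scale 2)
f2 = transposed_conv2d(f1, kernel)  # 32 x 32 (n = 2, scale 4)
```

Applying the same layer repeatedly, as in Eq. 1, is what lets the iteration count control the overall scale factor.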
III-B Recurrent Mapping in HR Feature Space
In this subsection, we show the structure of our recurrent residual block and how recurrent blocks gradually recover high-frequency information in HR images. In ResNet, the basic residual unit applies batch normalization (BN) and ReLU activation after the weight layers. BN layers normalize features, which limits the network's flexibility and occupies considerable GPU memory; we therefore removed the batch normalization layers from the proposed network, as Nah et al. did in their image-deblurring work. In addition, we replaced ReLU with parametric ReLU (PReLU) to avoid "dead features" caused by zero gradients in ReLU. Our basic unit for mapping in the HR feature space is illustrated in Figure 3. Deep models are prone to overfitting and demand considerable storage, hence our adoption of a recurrent learning strategy to reduce the number of parameters. Skip connections were used in the recurrent blocks to provide fast, improved convergence.
The convolution kernels used for feature extraction were 3×3, with padding set to 1 so that the size of the feature maps does not change. We used two recurrent residual blocks, each of which looped 10 times. Our network therefore has a much larger receptive field, which benefits large upscaling factors. Unlike other methods that map from an LR feature space to the HR feature space, the proposed DRFN uses recurrent blocks to map within the HR feature space. Output feature maps of the recurrent residual block are progressively updated as follows:
$$H_k = H_{k-1} + \sigma\left(W * H_{k-1}\right), \qquad (2)$$
where $H_k$ represents the $k$-th cyclically generated feature maps, and $W$ and $\sigma(\cdot)$ denote the convolution and PReLU operations, respectively.
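The key property of the recurrent block is that one set of weights is reused across all iterations while the skip connection adds each iteration's input back in. The following is a minimal single-channel sketch under assumed simplifications (one 3×3 convolution per unit, a fixed PReLU slope of 0.33 as in our initialization); the real block operates on multi-channel feature maps.

```python
import numpy as np

def conv_same(x, w):
    """3x3 zero-padded 'same' convolution on a single-channel map."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def prelu(x, a=0.33):
    """Parametric ReLU with a (here fixed) negative slope."""
    return np.where(x > 0, x, a * x)

def recurrent_residual_block(h0, w, loops=10):
    """Apply one conv + PReLU residual unit `loops` times with SHARED
    weights; the skip connection adds each iteration's input back in:
    H_k = H_{k-1} + sigma(W * H_{k-1})."""
    h = h0
    for _ in range(loops):
        h = h + prelu(conv_same(h, w))
    return h

x = np.random.rand(5, 5)
y = recurrent_residual_block(x, np.zeros((3, 3)))  # zero weights: identity
```

Because the loop reuses `w`, depth (receptive field) grows with the number of iterations while the parameter count stays constant.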
III-C Multi-level Fusion Reconstruction
In this subsection, we show how to fuse different-level features and perform HR image reconstruction. A larger SR factor requires more diverse feature information; to meet this need, we propose fusing different-level features to recover the final HR image. As shown in Figure 2, three convolutional layers were used to automatically extract features at different levels. Then, we concatenated these features and ultimately apply one convolutional layer to integrate features with different receptive field sizes. Each recurrent residual block gradually refined the rough image information from the front block but may lose original feature information for reconstruction in this process. Therefore, different-level feature information must be integrated, including refined information and easy-to-lose information (i.e., original feature information), to make full use of complementary information among three different level features. Corresponding experiments demonstrated that fusion networks with three levels improve reconstruction accuracy and visual performance compared with single and double levels, especially for larger-scale factors.
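The fusion step above can be sketched as concatenating the per-level feature maps along a channel axis and collapsing them with a 1×1-convolution-style weighted sum. This is a simplified single-channel stand-in; the weights and level contents below are purely illustrative, whereas DRFN learns the fusion convolution.

```python
import numpy as np

def fuse_levels(levels, weights, bias=0.0):
    """Stack per-level feature maps (channel axis first) and collapse them
    with a weighted sum over levels, mimicking a learned 1x1 fusion conv."""
    stacked = np.stack(levels)                            # (n_levels, H, W)
    return np.tensordot(weights, stacked, axes=1) + bias  # (H, W)

# Stand-ins for the three levels' feature maps and learned fusion weights.
level1 = np.ones((4, 4))
level2 = 2 * np.ones((4, 4))
level3 = 3 * np.ones((4, 4))
fused = fuse_levels([level1, level2, level3], np.array([0.5, 0.3, 0.2]))
```

Fusing all three levels lets complementary information (refined and original features alike) contribute to the reconstructed HR image.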
We chose mean square error (MSE) as the loss function. Let $x$ denote the input LR image patch and $y$ indicate the corresponding HR image patch. Given a training dataset $\{x_i, y_i\}_{i=1}^{N}$ containing $N$ patches, the goal is to minimize the following formula:
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| f(x_i; \theta) - y_i \right\|^2, \qquad (3)$$
where $f(\cdot)$ represents the feed-forward neural network parameterized by $\theta$. Our model is implemented using Caffe [19].
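The objective above is ordinary mean squared error. As a sanity-check sketch (here averaged over every pixel of the batch, which is proportional to the per-patch sum in Eq. 3):

```python
import numpy as np

def mse_loss(preds, targets):
    """Mean squared error between predicted and ground-truth HR patches,
    averaged over all pixels of the batch (proportional to Eq. 3)."""
    return np.mean((preds - targets) ** 2)

loss = mse_loss(np.array([[1.0, 2.0], [3.0, 4.0]]), np.zeros((2, 2)))
```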
IV Experiments
In this section, we first describe the training and test datasets of our method. We then introduce the implementation details of the algorithm. Next, we compare our proposed method with several state-of-the-art SISR methods and demonstrate the superiority of DRFN. Finally, the contributions of different components are analyzed.
Training dataset: RFL  and VDSR  use a training dataset of 291 images, containing 91 images from Yang et al.  and 200 images from the Berkeley Segmentation Dataset . We also used 291 images to ensure fair comparison with other methods. In addition, we rotated the original images by 90°, 180°, and 270° and flipped them horizontally. After this process, each image had eight versions, for a total training set of 2,328 images.
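The eight versions per image arise from the four rotations combined with a horizontal flip of each. A minimal sketch of this augmentation:

```python
import numpy as np

def augment_eightfold(img):
    """Produce the eight training versions of an image: rotations by
    0/90/180/270 degrees, each with and without a horizontal flip."""
    versions = []
    for k in range(4):
        rot = np.rot90(img, k)
        versions.append(rot)
        versions.append(np.fliplr(rot))
    return versions

versions = augment_eightfold(np.arange(9.0).reshape(3, 3))
```

For a generic (asymmetric) image, all eight versions are distinct, multiplying the effective training set size by eight.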
| Method | Scale | Set5 PSNR/SSIM/IFC | Set14 PSNR/SSIM/IFC | BSDS100 PSNR/SSIM/IFC | Urban100 PSNR/SSIM/IFC | ImageNet400 PSNR/SSIM/IFC |
|---|---|---|---|---|---|---|
| Bicubic | ×2 | 33.64 / 0.9293 / 5.714 | 30.31 / 0.8693 / 5.699 | 29.55 / 0.8432 / 5.256 | 26.88 / 0.8409 / 6.191 | 30.03 / 0.8667 / 5.970 |
| A+ | ×2 | 36.55 / 0.9545 / 8.465 | 32.40 / 0.9064 / 8.001 | 31.23 / 0.8868 / 7.282 | 29.24 / 0.8944 / 8.246 | 32.05 / 0.8998 / 6.913 |
| JOR | ×2 | 36.58 / 0.9543 / 8.511 | 32.38 / 0.9063 / 8.052 | 31.22 / 0.8867 / 7.321 | 29.25 / 0.8951 / 8.301 | 32.05 / 0.8998 / 6.969 |
| SRCNN | ×2 | 36.35 / 0.9521 / 7.522 | 32.29 / 0.9046 / 7.227 | 31.15 / 0.8851 / 6.653 | 29.10 / 0.8900 / 7.446 | 31.98 / 0.8970 / 6.374 |
| FSRCNN | ×2 | 37.00 / 0.9557 / 8.047 | 32.75 / 0.9095 / 7.727 | 31.51 / 0.8910 / 7.068 | 29.88 / 0.9015 / 8.005 | 32.52 / 0.9031 / 6.712 |
| VDSR | ×2 | 37.53 / 0.9587 / 8.580 | 33.15 / 0.9132 / 8.159 | 31.90 / 0.8960 / 7.494 | 30.77 / 0.9143 / 8.605 | 33.22 / 0.9106 / 7.096 |
| LapSRN_x2 | ×2 | 37.44 / 0.9581 / 8.400 | 33.06 / 0.9122 / 8.011 | 31.78 / 0.8944 / 7.293 | 30.39 / 0.9096 / 8.430 | 32.98 / 0.9082 / 6.912 |
| DRFN_x2 | ×2 | 37.71 / 0.9595 / 8.927 | 33.29 / 0.9142 / 8.492 | 32.02 / 0.8979 / 7.721 | 31.08 / 0.9179 / 9.076 | 33.42 / 0.9123 / 8.002 |
| Bicubic | ×3 | 30.39 / 0.8673 / 3.453 | 27.62 / 0.7756 / 3.327 | 27.20 / 0.7394 / 3.003 | 24.46 / 0.7359 / 3.604 | 27.91 / 0.7995 / 3.363 |
| A+ | ×3 | 32.59 / 0.9077 / 4.922 | 29.24 / 0.8208 / 4.491 | 28.30 / 0.7844 / 3.971 | 26.05 / 0.7984 / 4.812 | 29.42 / 0.8351 / 4.034 |
| JOR | ×3 | 32.55 / 0.9067 / 4.892 | 29.19 / 0.8204 / 4.485 | 28.27 / 0.7837 / 3.966 | 25.97 / 0.7972 / 4.766 | 29.34 / 0.8343 / 4.028 |
| SRCNN | ×3 | 32.39 / 0.9026 / 4.315 | 29.11 / 0.8167 / 4.027 | 28.22 / 0.7809 / 3.608 | 25.87 / 0.7889 / 4.240 | 29.27 / 0.8294 / 3.617 |
| FSRCNN | ×3 | 33.16 / 0.9132 / 4.963 | 29.55 / 0.8263 / 4.551 | 28.52 / 0.7901 / 4.025 | 26.43 / 0.8076 / 4.841 | 29.78 / 0.8376 / 4.078 |
| VDSR | ×3 | 33.66 / 0.9213 / 5.203 | 29.88 / 0.8330 / 4.692 | 28.83 / 0.7976 / 4.151 | 27.14 / 0.8284 / 5.163 | 30.37 / 0.8509 / 4.251 |
| LapSRN_x4¹ | ×3 | 33.78 / 0.9209 / 5.079 | 29.87 / 0.8328 / 4.552 | 28.81 / 0.7972 / 3.946 | 27.06 / 0.8269 / 5.019 | 30.32 / 0.8497 / 4.085 |
| DRFN_x3 | ×3 | 34.01 / 0.9234 / 5.421 | 30.06 / 0.8366 / 4.897 | 28.93 / 0.8010 / 4.281 | 27.43 / 0.8359 / 5.481 | 30.59 / 0.8539 / 4.582 |
| Bicubic | ×4 | 28.42 / 0.8099 / 2.342 | 26.00 / 0.7025 / 2.259 | 25.96 / 0.6692 / 2.021 | 23.15 / 0.6592 / 2.355 | 26.70 / 0.7530 / 2.137 |
| A+ | ×4 | 30.28 / 0.8587 / 3.248 | 27.32 / 0.7497 / 2.962 | 26.82 / 0.7100 / 2.551 | 24.34 / 0.7201 / 3.180 | 27.92 / 0.7877 / 2.660 |
| JOR | ×4 | 30.19 / 0.8563 / 3.190 | 27.27 / 0.7479 / 2.923 | 26.79 / 0.7083 / 2.534 | 24.29 / 0.7181 / 3.113 | 27.87 / 0.7865 / 2.630 |
| SRCNN | ×4 | 30.48 / 0.8618 / 2.991 | 27.50 / 0.7517 / 2.751 | 26.90 / 0.7115 / 2.396 | 24.16 / 0.7066 / 2.769 | 28.18 / 0.7903 / 2.492 |
| FSRCNN | ×4 | 30.70 / 0.8646 / 2.986 | 27.59 / 0.7539 / 2.707 | 26.96 / 0.7174 / 2.359 | 24.62 / 0.7281 / 2.907 | 28.16 / 0.7895 / 2.412 |
| VDSR | ×4 | 31.35 / 0.8838 / 3.542 | 28.02 / 0.7678 / 3.106 | 27.29 / 0.7252 / 2.679 | 25.18 / 0.7534 / 3.462 | 28.77 / 0.8056 / 2.820 |
| LapSRN_x4 | ×4 | 31.54 / 0.8852 / 3.515 | 28.19 / 0.7716 / 3.089 | 27.32 / 0.7275 / 2.618 | 25.21 / 0.7554 / 3.448 | 28.82 / 0.8082 / 2.785 |
| DRFN_x4 | ×4 | 31.55 / 0.8861 / 3.693 | 28.30 / 0.7737 / 3.250 | 27.39 / 0.7293 / 2.766 | 25.45 / 0.7629 / 3.693 | 28.99 / 0.8106 / 2.954 |

¹ Due to the network design of LapSRN, scale factors for training are limited to powers of 2 (e.g., 2, 4, or 8). LapSRN performs SR to other scales by first upsampling input images to a larger scale and then downsampling the output to the desired resolution. As mentioned in their paper, we tested the results for ×3 SR using their ×4 model.
(Table II: quantitative comparison for upscaling factor ×8.)
Test dataset: We evaluated the results on five test sets—Set5 , Set14 , BSDS100 , Urban100, and ImageNet400—which contain 5, 14, 100, 100, and 400 images, respectively. Among these datasets, Set5, Set14, and BSDS100 are composed of natural scenes and have been used often in other studies. Urban100 was created by Huang et al. 
and includes 100 images of various real-world structures, which presents challenges for many methods. Images in ImageNet400 are randomly selected from ImageNet.
IV-B Implementation Details
We converted the original RGB color images into grayscale and performed training and testing on the luminance channel. We generated LR training images using bicubic downsampling and cut them into patches with a stride of four; a large number of patches were generated by this operation. We set the mini-batch size of SGD with momentum to 32, set the momentum parameter to 0.9, and applied weight decay.
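Patch extraction with a stride of four can be sketched as a sliding window. The patch size of 32 below is an illustrative assumption (the paper's exact value is not shown here); only the stride of four comes from our setup.

```python
import numpy as np

def extract_patches(img, patch_size=32, stride=4):
    """Slide a patch_size x patch_size window over a grayscale image with
    the given stride, collecting every fully contained patch."""
    h, w = img.shape
    return [img[i:i + patch_size, j:j + patch_size]
            for i in range(0, h - patch_size + 1, stride)
            for j in range(0, w - patch_size + 1, stride)]

patches = extract_patches(np.zeros((40, 40)))  # 3 x 3 window positions
```

A small stride produces heavily overlapping patches, which is how a modest image set yields a very large number of training samples.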
All PReLU slopes were initially set to 0.33. The stride of each transposed convolution was 2, ensuring that each transposed convolution magnifies the image twice. The kernel size of each convolution was 3×3. We used the same strategy as He et al.  for convolution weight initialization. The initial learning rate was set to 0.1 and then reduced by a factor of 10 every 10 epochs. We also adopted adjustable gradient clipping  to ease the difficulty of training the network: the gradient of each iteration update was limited to $[-\theta/\gamma, \theta/\gamma]$, where $\theta$ is the maximum value of each update step size and $\gamma$ is the current learning rate. We stopped training when the loss ceased to fall. It took approximately 3 days to train the network using an NVIDIA GTX Ti graphics card.
IV-C Benchmark Results
Quantitative Evaluation. Here, we provide quantitative comparisons for ×2, ×3, and ×4 SR results in Table I and ×8 SR results in Table II. We compare our proposed method with bicubic interpolation and the following six state-of-the-art SR methods: A+ , JOR , SRCNN , FSRCNN , VDSR , and LapSRN . We evaluated SR images based on three commonly used image quality metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM) , and the information fidelity criterion (IFC) . In particular, IFC has been shown to correlate with human visual perception . For a fair comparison at the ×8 factor, we followed LapSRN  in using their datasets generated by retrained models of A+ , SRCNN , FSRCNN , and VDSR . For scale factors of four and eight, our approach was superior to the other SR methods on all datasets.
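For reference, PSNR, the primary metric above, is a simple function of the mean squared error against the ground truth. A minimal sketch (SSIM and IFC are more involved and omitted here):

```python
import math
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    Identical images give infinite PSNR; larger values are better."""
    mse = np.mean((reference.astype(np.float64) -
                   estimate.astype(np.float64)) ** 2)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Note that PSNR is computed on the luminance channel in our evaluation, matching common SR practice.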
Qualitative Evaluation. As shown in Figures 4–7, the proposed DRFN estimated better visual details. Images generated by other SR methods exhibited visible artifacts, whereas the proposed method generated more visually pleasant images with clean details and sharp edges. For example, in Figure 5, the results of other methods are completely blurred, and only our result demonstrates clear textures. These experimental results show that the proposed method achieves good visual performance.
IV-D Model Analysis
In this subsection, we first compare the ×2 and ×3 SR results of the proposed DRFN with existing methods. Then, we study the contributions of different components of the proposed DRFN to SR reconstruction and explore the effect of the cycle count of the recurrent blocks on reconstruction performance.
Comparisons of ×2 and ×3 SR Results. Although our work is intended for large-factor SR problems, the proposed DRFN also performs ×2 and ×3 SR well. The quantitative results for ×2 and ×3 SR are presented in Table I. The proposed DRFN significantly outperformed existing methods on ×2 and ×3 SR, suggesting that our methodology is reasonable and effective. DRFN is hence powerful enough to handle different scaling factors.
Transposed Convolution. First, to verify the superiority of transposed convolution over bicubic interpolation, we used an interpolated image as input and replaced the transposed convolutions with general convolutions. Second, we used the original small image as input and placed the transposed convolution at the final part of the network to enlarge the image. This lets us verify that the location of the transposed convolution affects reconstruction results. We carried out experiments on the ×4 and ×8 scale factors; results are shown in Table III. In the table, ×4-prebic and ×4(×8)-posttransconv represent the two contrast experiments mentioned above, and ×4(×8)-pretransconv is our DRFN_x4 (DRFN_x8). Visual results are shown in Figure 9 (the three-level result is DRFN_x4). Quantitative and qualitative results indicate that replacing bicubic interpolation with transposed convolution and placing the transposed convolution at the forefront of the network both boost performance. For example, ×4-pretransconv outperformed ×4-prebic on Set5, and ×8-pretransconv outperformed ×8-posttransconv on BSDS100. Although pretransconv and posttransconv had the same number of model parameters, pretransconv was more computationally intensive in the prediction phase because it requires the subsequent convolution layers to recover high-frequency information in the HR space. The average inference times of posttransconv and pretransconv for each image in Set5  with a scale factor of ×4 were 0.584 s and 1.187 s, respectively. For the greater performance improvement, we chose pretransconv as our model.
Recurrent residual learning. Kim et al.  demonstrated that network performance improves as depth increases. However, deeper networks require training more parameters. Our recurrent learning strategy greatly reduces model complexity: a recurrent residual block shares one set of convolution weights across all iterations, so, for a block with three convolution layers, looping five times avoids the extra parameters that five distinct units would require. Overall, our DRFN contains a moderate total number of parameters. Figure 10 shows the PSNR performance of several recent CNN models for SR versus the number of parameters. The proposed method achieved better performance with an appropriate number of model parameters.
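The parameter savings from weight sharing can be made concrete with a small count. The channel width of 64 and the two-convolutions-per-unit layout below are illustrative assumptions, not values from our configuration:

```python
def conv_params(c_in, c_out, k=3):
    """Parameter count of one k x k convolution layer, biases included."""
    return c_in * c_out * k * k + c_out

# A recurrent block stores ONE set of weights no matter how many times it
# loops; an unrolled (non-recurrent) stack of equal depth pays per copy.
shared = 2 * conv_params(64, 64)   # two shared 3x3 convs, any loop count
unrolled = 10 * shared             # ten distinct units of the same depth
```

With these assumed widths, looping 10 times keeps the stored parameters at `shared` while matching the receptive field of the 10x larger `unrolled` stack.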
Multi-level structure. To verify that the multi-level structure improves image reconstruction, we (1) removed the first two levels from DRFN (denoted as one-level) and (2) removed Level 2 from DRFN (denoted as two-level) for experimental comparison. As shown in Table IV, the three-level network performed best. As displayed in Figure 9, the three-level network reconstructed images with richer texture details than the one-level and two-level networks; each level had a positive effect on the result. Taking the image "butterfly" in Set5 as input, feature maps of the different levels appear in Figure 8. These results suggest that the features at each recovery stage have different contextual characteristics. The multi-level structure renders our model more robust and more accurate.
Network depth. We also studied the effect of the number of cycles of the recurrent blocks. Different cycle counts give the network different depths. We varied the number of cycles while keeping the depth of the two recurrent blocks the same; we did not train deeper networks due to GPU memory limitations. The experimental results in Table V show that increasing the number of cycles boosts performance but also increases time consumption. To balance accuracy and cost, we chose 10 cycles as our benchmark.
V Conclusion
In this paper, we propose a DRFN for accurate large-factor SISR. Our DRFN uses transposed convolution to jointly extract and upsample raw features, and the following convolution layers focus on mapping in the HR feature space. High-frequency information is gradually recovered by recurrent residual blocks. Multi-level fusion makes full use of potential information for HR image reconstruction. The proposed DRFN extends quantitative and qualitative SR performance to a new state-of-the-art level. Extensive benchmark experiments and analyses indicate that DRFN is a superior SISR method, especially for large factors.
As DRFN has achieved outstanding performance at ×4 and ×8, we intend to apply it to more challenging upscaling factors such as ×12. We also plan to generalize our method to other applications, such as denoising and deblurring.
-  (2011) Contour detection and hierarchical image segmentation. IEEE transactions on pattern analysis and machine intelligence 33 (5), pp. 898–916. Cited by: Fig. 6, Fig. 7, §IV-A, TABLE III, TABLE IV.
-  (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding. Cited by: §IV-A, §IV-D, TABLE III, TABLE IV.
-  Deep network cascade for image super-resolution. In European Conference on Computer Vision, pp. 49–64. Cited by: §II.
-  (2015) Jointly optimized regressors for image super-resolution. In Computer Graphics Forum, Vol. 34, pp. 95–104. Cited by: §IV-C, TABLE I.
-  Soft edge smoothness prior for alpha channel super resolution. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Cited by: §II.
-  (1962) Bicubic spline interpolation. Studies in Applied Mathematics 41 (1-4), pp. 212–218. Cited by: Fig. 4, Fig. 5, Fig. 6, Fig. 7, TABLE I, TABLE II.
-  (2009) Imagenet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §IV-A.
-  (2016) Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence 38 (2), pp. 295–307. Cited by: §I, §I, §II, Fig. 4, Fig. 5, Fig. 6, Fig. 7, §IV-C, TABLE I, TABLE II.
-  (2016) Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pp. 391–407. Cited by: §I, §II, §III-A, Fig. 4, Fig. 5, Fig. 6, Fig. 7, §IV-C, TABLE I, TABLE II.
-  (1979) Lanczos filtering in one and two dimensions. Journal of Applied Meteorology 18 (8), pp. 1016–1022. Cited by: §II.
-  (2013) Accurate blur models vs. image priors in single image super-resolution. In IEEE International Conference on Computer Vision, pp. 2832–2839. Cited by: §II.
-  (2007) Image upsampling via imposed edge statistics. In ACM Transactions on Graphics, Vol. 26, pp. 95. Cited by: §II.
-  (2011) Image and video upscaling from local self-examples. ACM Transactions on Graphics 30 (2), pp. 12. Cited by: §II.
-  (2002) Example-based super-resolution. IEEE Computer graphics and Applications 22 (2), pp. 56–65. Cited by: §I, §II.
-  (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In IEEE International Conference on Computer Vision, pp. 1026–1034. Cited by: §IV-B.
-  (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §III-B.
-  (2015) Single image super-resolution from transformed self-exemplars. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5197–5206. Cited by: Fig. 1, §II, Fig. 4, §IV-A.
-  Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456. Cited by: §III-B.
-  (2014) Caffe: convolutional architecture for fast feature embedding. In ACM international conference on Multimedia, pp. 675–678. Cited by: §III-C.
-  (2016) Accurate image super-resolution using very deep convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654. Cited by: §I, §I, §II, Fig. 4, Fig. 5, Fig. 6, Fig. 7, §IV-A, §IV-B, §IV-C, §IV-D, TABLE I, TABLE II.
-  (2016) Deeply-recursive convolutional network for image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645. Cited by: §I, §I, §II.
-  (2016) Fast learning-based single image super-resolution. IEEE Transactions on Multimedia 18 (8), pp. 1504–1515. Cited by: §I.
-  (2017) Deep laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §I, §II, Fig. 4, Fig. 5, Fig. 6, Fig. 7, §IV-C, TABLE I, TABLE II.
-  (2016) Photo-realistic single image super-resolution using a generative adversarial network. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690. Cited by: §I.
-  (2017-07) Enhanced deep residual networks for single image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §I.
-  (2017) Retrieval compensated group structured sparsity for image super-resolution. IEEE Transactions on Multimedia 19 (2), pp. 302–316. Cited by: §II.
-  (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In IEEE International Conference on Computer Vision, Vol. 2, pp. 416–423. Cited by: §IV-A.
-  (2013) Nonparametric blind super-resolution. In IEEE International Conference on Computer Vision, pp. 945–952. Cited by: §II.
-  (2016) Deep multi-scale convolutional neural network for dynamic scene deblurring. arXiv preprint arXiv:1612.02177. Cited by: §III-B.
-  (2015) Fast and accurate image upscaling with super-resolution forests. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3791–3799. Cited by: §II, §IV-A.
-  (2005) An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Transactions on Image Processing 14 (12), pp. 2117–2128. Cited by: §IV-C.
-  (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883. Cited by: §I, §II.
-  (2017) Structure-preserving image super-resolution via contextualized multi-task learning. IEEE Transactions on Multimedia. Cited by: §I.
-  (2014) Super-resolution using sub-band self-similarity. In Asian Conference on Computer Vision, pp. 552–568. Cited by: §II.
-  (2017) MemNet: a persistent memory network for image restoration. In Proceedings of International Conference on Computer Vision, Cited by: §II.
-  (2017) Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §II.
-  (2014) A+: adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pp. 111–126. Cited by: §II, Fig. 4, Fig. 5, Fig. 6, Fig. 7, §IV-C, TABLE I, TABLE II.
-  (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §IV-C.
-  (2018) Efficient image super-resolution integration. The Visual Computer 34 (6-8), pp. 1065–1076. Cited by: §II.
-  (2014) Single-image super-resolution: a benchmark. In European Conference on Computer Vision, pp. 372–386. Cited by: §IV-C.
-  (2013) Fast direct super-resolution by simple functions. In IEEE International Conference on Computer Vision, pp. 561–568. Cited by: §II.
-  (2010) Image super-resolution via sparse representation. IEEE Transactions on Image Processing 19 (11), pp. 2861–2873. Cited by: §II, §IV-A.
-  (2017) Deep edge guided recurrent residual learning for image super-resolution. IEEE Transactions on Image Processing 26 (12), pp. 5895–5907. Cited by: §II.
-  (2010) On single image scale-up using sparse-representations. In International conference on curves and surfaces, pp. 711–730. Cited by: Fig. 10, §IV-A, TABLE III, TABLE IV.
-  (2016) CCR: clustering and collaborative representation for fast single image super-resolution. IEEE Transactions on Multimedia 18 (3), pp. 405–417. Cited by: §II.