1 Introduction
Single image super-resolution (SISR) is a classical image restoration problem which aims to recover a high-resolution (HR) image from the corresponding low-resolution (LR) image. In SISR problems, the given image is usually assumed to be a low-pass filtered and downsampled version of an HR image. Hence, recovering the HR image is an ill-posed problem, since multiple HR images can correspond to one LR image. That is, SISR is a challenging one-to-many problem that has attracted many researchers, and numerous algorithms and applications have been proposed so far.
Recently, convolutional neural networks (CNNs) have shown great success in most computer vision areas, including SISR. Typical CNN-based SISR methods use distortion-oriented loss functions. Specifically, the CNNs attempt to achieve a higher peak signal-to-noise ratio (PSNR), i.e., low distortion in terms of mean squared error (MSE). There have been lots of distortion-oriented CNNs for SISR [5, 18, 30, 19, 21, 32, 36, 23, 42, 15, 11], and the performance of SISR keeps increasing as researchers create innovative architectures and as the feasible depth and connectivity of the networks grow. However, they yield somewhat blurry results and do not recover fine details even with very deep and complex networks. This is because the output of a distortion-oriented model is the average of the possible HR images.
To resolve the above-stated issues, perception-oriented models have also been proposed for obtaining HR images with better perceptual quality. For example, the perceptual loss was introduced in [16], which is defined as the distance in the feature domain. More recently, SRGAN [22] and EnhanceNet [29] have been proposed for producing better perceptual quality. SRGAN employed generative models, particularly the generative adversarial nets (GAN) [8], and adopted the perceptual loss. EnhanceNet added a texture loss [7] for better texture reconstruction. However, they sometimes generate unpleasant and unnatural artifacts along with the reconstructed details.
There have also been some methods that consider the naturalness of super-resolved images. One approach is to supervise the naturalness implicitly through a refined dataset. Specifically, as CNNs are very sensitive to the training dataset, several methods [23, 42] train on a refined dataset. For example, patches with low gradient magnitudes are discarded from the training dataset, which implicitly provides better naturalness. This approach might increase the PSNR performance by constraining the possible HR space to the rich-textured one. Another approach is to provide explicit supervision by conditioning the feature spaces. For example, the recently developed SFT-GAN [37] has shown great perceptual quality by constraining the features with high-level semantics while adopting the adversarial loss. However, its practical usage is limited because it requires a categorical prior and is restricted to the categories included in the training process. For out-of-category inputs, this framework is the same as SRGAN [22]. Moreover, SFT-GAN strongly relies on the adopted semantic segmentation method, because a wrong designation of semantics might cause worse perceptual quality.
To obtain realistic and natural HR images with good perceptual quality, we propose a new SISR approach which constrains the low-level domain prior instead of high-level semantics. For this, we first investigate the process and the domain knowledge of SISR. By exploiting the domain knowledge, we explicitly model the HR space of the corresponding LR image, and build a discriminator which determines the decision boundary between the natural manifold and the unnatural manifold. By constraining the output image to the natural manifold, our generative model can target only one of the multi-modal outputs in the desired target space. As a result, our method shows fewer artifacts than other perception-oriented methods, as shown in Figure 1.
In summary, the main contributions of this paper are as follows.

We model the SISR problem explicitly and investigate the desirable HR space.

We design a CNN-based natural manifold discriminator and show that our model is reasonable.

We adopt a CNN structure with fractal residual learning (FRL) and demonstrate a distortion-oriented model named fractal residual super-resolution (FRSR), which achieves comparable results to recent CNNs.

We propose a perception-oriented SISR method named natural and realistic super-resolution (NatSR), which effectively generates realistic textures and natural details while achieving high perceptual quality.
The rest of this paper is organized as follows. In Sec. 3, we explicitly model the LR-HR space and the SISR problem, and investigate its inherent properties. Then in Sec. 4, we divide the target HR space into three disjoint sets, where two sets are in the unnatural manifold and one is in the natural manifold. In Sec. 5, we present our main method, NatSR, and in Sec. 6 we discuss and analyze its feasibility in several ways. The experimental results are shown in Sec. 7.
2 Related Work
2.1 Single Image Super-Resolution
The conventional non-CNN methods mainly focused on the domain and feature priors. Early methods explored the domain priors to predict missing pixels. For example, interpolation methods such as bicubic and Lanczos generate the HR pixels by the weighted average of neighboring LR pixels. Later, priors such as edge features, gradient features [33, 31], and internal non-local similarity [14] were investigated. Also, dictionary learning and sparse coding methods were exploited for SISR [40, 6, 39, 35]. Recently, it has been shown that CNN-based methods outperform the earlier non-CNN algorithms, showing a great breakthrough in accuracy. These CNN-based methods implicitly adopt image and domain priors which are inscribed in the training datasets. SRCNN [5] was the first CNN-based method, using three convolution layers, and many other works with deeper and heavier structures have been proposed afterward [18, 30, 19, 32, 21, 36, 23, 42, 15, 11]. All these methods are discriminative and distortion-oriented approaches, which aim to achieve higher PSNR.
2.2 Perception-Oriented Super-Resolution
Distortion-oriented models have recently drawn criticism because their super-resolved results often lack high-frequency details and are not perceptually satisfying. Also, Blau et al. [4] showed that there is a trade-off between perceptual quality and distortion, and some perception-oriented models have been proposed accordingly. For example, Johnson et al. [16] have shown that the loss in the pixel domain is not optimal for perceptual quality, and that the loss in the feature space might instead be closer to the human perception model. Then, Ledig et al. [22] introduced SRGAN, which adopted a generative model with GAN [8] and employed the perceptual loss as in [16]. Hence, unlike the distortion-oriented methods that produce the average of possible HR images, SRGAN generates one of the candidates in the multi-modal target HR space. EnhanceNet [29] goes one step further by exploiting the texture loss [7] to better produce image details. However, due to the inherent one-to-many property of the inverse problem, it is necessary to consider the semantics of the generated pixels. In this respect, SFT-GAN [37] restricts the feature space by conditioning on the semantic categories of target pixels.
3 Modeling the SISR
Figure 2: A simple explanation of the LR-HR relationship and SISR in the frequency domain.
In this section, we explicitly define and model the LR-HR space and the SISR problem. First of all, let us define the LR image $I_{LR}$ as the low-pass filtered and downsampled HR image $I_{HR}$. Formally, the LR-HR relation is described as
$$I_{LR} = h(I_{HR})^{\downarrow} \tag{1}$$
where $h(\cdot)$ denotes a low-pass filter and $\downarrow$ denotes downsampling. Figure 2(a) and Figure 2(b) show a simple explanation of the HR and LR correspondence in the frequency domain, where we assume that the spatial domain is infinite. Both Figure 2(c) and Figure 2(d) are possible HRs for the corresponding LR in Figure 2(b), and moreover, there can be an infinite number of possible HRs that have the same low-frequency components but different high-frequency parts (denoted as noisy in Figure 2(d)). As SISR is to find an HR for the given LR, it is usually modeled as finding the conditional likelihood $p(I_{HR} \mid I_{LR})$. Due to its one-to-many property, it is better to model it as a generative model rather than a discriminative one.
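The one-to-many property can be checked numerically. The following is a minimal sketch, assuming an ideal (brick-wall) low-pass filter implemented with the FFT as a stand-in for the low-pass filter of the model; the helper names `ideal_lowpass` and `degrade` are hypothetical and not taken from the authors' code. It builds two HR images that differ only in their high-frequency content and checks that both are degraded to the same LR image.

```python
import numpy as np


def ideal_lowpass(img, scale):
    """Brick-wall low-pass filter: zero every frequency at or above 0.5 / scale."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]  # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(img.shape[1])[None, :]  # horizontal frequencies
    keep = (np.abs(fy) < 0.5 / scale) & (np.abs(fx) < 0.5 / scale)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * keep))


def degrade(hr, scale):
    """LR = low-pass filtering followed by downsampling (keep every scale-th pixel)."""
    return ideal_lowpass(hr, scale)[::scale, ::scale]


rng = np.random.default_rng(0)
scale = 4
hr_a = rng.random((32, 32))

# Purely high-frequency perturbation: the part of the noise that the filter removes.
noise = rng.normal(0.0, 0.1, size=hr_a.shape)
hr_b = hr_a + (noise - ideal_lowpass(noise, scale))

lr_a = degrade(hr_a, scale)
lr_b = degrade(hr_b, scale)
same_lr = bool(np.allclose(lr_a, lr_b))
different_hr = not np.allclose(hr_a, hr_b)
print(same_lr, different_hr)
```

With a practical filter such as bicubic, the equality would hold only approximately, since such a filter does not remove the high-frequency band completely.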
4 Natural Manifold Discrimination
4.1 Designing Natural Manifold
We now go into the real situation to find the natural manifold. Figure 3 shows our LR-HR image space modeling, where $\mathcal{U}$ is the overall set of images with height $H$, width $W$, and $C$ channels with normalized pixel values. For a certain $I_{LR}$, $\mathcal{V}$ is the space whose elements all result in the same $I_{LR}$ by low-pass filtering and downsampling. Conversely, an LR image is mapped to an element in $\mathcal{V}$ by any SR method. We may also interpret the early CNNs with our LR-HR model. For the distortion-oriented models, the output is the average of the elements in the HR space, i.e., $\sum_i w_i I_{HR}^{i}$ where $I_{HR}^{i} \in \mathcal{V}$, for some $i$ and weights $w_i$, and thus the result is blurry. To alleviate this problem, some methods [23, 42] refined the training set. Specifically, they discarded the training patches with low gradient magnitudes, which gives implicit constraints on the candidate $I_{HR}^{i}$'s to keep the resulting outputs away from blurry images.
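The blurring effect of averaging can be illustrated with a small numeric sketch. Under squared error, the estimate that minimizes the expected distortion over a set of candidate HR signals is their weighted mean. The example below uses two hypothetical 1D step edges as equally likely candidates; the data and the helper name `expected_squared_error` are illustrative assumptions, not part of the original method.

```python
import numpy as np


def expected_squared_error(estimate, candidates, weights):
    """Weighted average of the squared error between one estimate and every candidate."""
    errors = np.sum((candidates - estimate) ** 2, axis=1)
    return float(np.sum(weights * errors))


# Two equally plausible sharp edges at different positions (hypothetical data).
candidates = np.array([
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
])
weights = np.array([0.5, 0.5])

mean_estimate = weights @ candidates  # weighted average of the candidates
risk_mean = expected_squared_error(mean_estimate, candidates, weights)
risk_sharp = expected_squared_error(candidates[0], candidates, weights)
print(mean_estimate)
print(risk_mean, risk_sharp)
```

The averaged signal has a lower expected squared error than either sharp candidate, but its edge is smeared into intermediate values, which is the blur described above.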
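To model the natural manifold, we divide the HR space $\mathcal{V}$ into three disjoint sets as illustrated in Figure 3. The first one is the blurry set $\mathcal{A}$, the elements of which are modeled as the convex combination of the interpolated LR image and the original HR image. Specifically, the set $\mathcal{A}$ is defined as
$$\mathcal{A} = \{\, I_A \mid I_A = (1-\alpha)\, h(I_{LR}^{\uparrow}) + \alpha I_{HR} \,\} \tag{2}$$
where $h(\cdot)$ is the same low-pass filter as in eq. (1), and $\uparrow$ denotes upsampling with zero insertion between original values. Hence, $h(I_{LR}^{\uparrow})$ corresponds to Figure 2(c), which also means the interpolation of $I_{LR}$ to the size of $I_{HR}$. Also, $\alpha$ is a hyperparameter which decides the decision boundary between the set $\mathcal{A}$ and the natural set $\mathcal{N}$, i.e., between Figure 2(c) and Figure 2(a). We can easily show that the $I_A$ defined above is also an element of $\mathcal{V}$, i.e., $I_A \in \mathcal{V}$. To be specific, if we apply low-pass filtering and downsampling to $I_A$, it becomes $I_{LR}$ as follows:
$$\begin{aligned}
h(I_A)^{\downarrow} &= h\big((1-\alpha)\, h(I_{LR}^{\uparrow}) + \alpha I_{HR}\big)^{\downarrow} && (3)\\
&= \big((1-\alpha)\, h(h(I_{LR}^{\uparrow})) + \alpha\, h(I_{HR})\big)^{\downarrow} && (4)\\
&= (1-\alpha)\, h(h(I_{LR}^{\uparrow}))^{\downarrow} + \alpha\, h(I_{HR})^{\downarrow} && (5)\\
&= (1-\alpha)\, h(I_{LR}^{\uparrow})^{\downarrow} + \alpha I_{LR} && (6)\\
&= (1-\alpha)\, I_{LR} + \alpha I_{LR} && (7)\\
&= I_{LR} && (8)
\end{aligned}$$
Here, (4) and (5) follow from the linearity of filtering and downsampling, while (6) and (7) treat $h$ as an ideal low-pass filter, so that filtering twice equals filtering once and downsampling the interpolated LR image returns $I_{LR}$. Hence, from eq. (1), it is shown that $I_A \in \mathcal{V}$. In other words, the weighted sum of Figure 2(c) and Figure 2(a) is of course in $\mathcal{V}$.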
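The membership of a blurry sample in the HR space can be verified numerically. The following is a minimal sketch, assuming an ideal FFT-based low-pass filter in place of the actual filter, an interpolation gain of `scale ** 2` to compensate for zero insertion, and an arbitrary mixing weight of 0.5; all function names are hypothetical. It forms a convex combination of the interpolated LR image and the HR image, then checks that filtering and downsampling it gives back the original LR image.

```python
import numpy as np


def ideal_lowpass(img, scale):
    """Brick-wall low-pass filter: zero every frequency at or above 0.5 / scale."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    keep = (np.abs(fy) < 0.5 / scale) & (np.abs(fx) < 0.5 / scale)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * keep))


def downsample(img, scale):
    """Keep every scale-th pixel in both directions."""
    return img[::scale, ::scale]


def zero_insert(lr, scale):
    """Upsample by inserting zeros between the original values."""
    up = np.zeros((lr.shape[0] * scale, lr.shape[1] * scale))
    up[::scale, ::scale] = lr
    return up


def blurry_sample(hr, alpha, scale):
    """Convex combination of the interpolated LR image and the HR image."""
    lr = downsample(ideal_lowpass(hr, scale), scale)
    # Assumption: gain scale**2 restores the amplitude lost by zero insertion.
    interpolated = scale ** 2 * ideal_lowpass(zero_insert(lr, scale), scale)
    return (1.0 - alpha) * interpolated + alpha * hr


rng = np.random.default_rng(0)
scale = 4
hr = rng.random((32, 32))
lr = downsample(ideal_lowpass(hr, scale), scale)

i_a = blurry_sample(hr, alpha=0.5, scale=scale)  # alpha is an arbitrary example value
lr_from_blurry = downsample(ideal_lowpass(i_a, scale), scale)
print(np.allclose(lr_from_blurry, lr))
```

The check relies on the filter being idempotent; with a bicubic kernel the two LR images would agree only approximately.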
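The second set to consider is the noisy set $\mathcal{B}$, which contains images like Figure 2(d). Specifically, we can model the set $\mathcal{B}$ as:
$$\mathcal{B} = \{\, I_B \mid I_B = I_{HR} + n \,\} \tag{9}$$
where $n$ is the noise in the high-frequency band, with standard deviation $\sigma$. We can also see that $\mathcal{B} \subset \mathcal{V}$, because
$$\begin{aligned}
h(I_B)^{\downarrow} &= h(I_{HR} + n)^{\downarrow} && (10)\\
&= \big(h(I_{HR}) + h(n)\big)^{\downarrow} && (11)\\
&= h(I_{HR})^{\downarrow} + h(n)^{\downarrow} && (12)\\
&= I_{LR} + 0 && (13)\\
&= I_{LR} && (14)
\end{aligned}$$
where $h(n) = 0$ because the noise lies entirely in the band removed by the low-pass filter.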
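Also, $I_B$ can be interpreted as the convex combination of $I_{HR}$ and a noisy image $I_{HR} + n_0$ (weighted sum of Figure 2(a) and Figure 2(d)), because
$$\begin{aligned}
I_B &= I_{HR} + n && (15)\\
&= \beta I_{HR} + (1-\beta)\, I_{HR} + (1-\beta)\, n_0 && (16)\\
&= \beta I_{HR} + (1-\beta)\,(I_{HR} + n_0) && (17)
\end{aligned}$$
where $n = (1-\beta)\, n_0$.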
The blurry and noisy images are used for training our natural manifold discriminator, which will be explained in the next subsection. In practice, we perform the noise injection in the frequency domain using the 2D discrete cosine transform (DCT). We set the low-pass filter for up/downsampling in eq. (1) and eq. (2) as the bicubic filter, and its DCT is shown in Figure 4. To generate a wide range of noisy images, we inject the noise into the last column and row of the DCT coefficients. In the experiments, we use the 2D-DCT for brevity.
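The noise injection in the DCT domain can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a block-wise orthonormal DCT-II with a block size of 8 (the block size is not stated here), Gaussian noise, and image dimensions divisible by the block size; the names `dct_matrix` and `inject_hf_noise` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the 1D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def inject_hf_noise(img, sigma, block=8, rng=None):
    """Add Gaussian noise to the last row and column of each block's DCT coefficients."""
    rng = np.random.default_rng() if rng is None else rng
    c = dct_matrix(block)
    mask = np.zeros((block, block), dtype=bool)
    mask[-1, :] = True  # highest vertical frequency
    mask[:, -1] = True  # highest horizontal frequency
    out = np.empty_like(img, dtype=np.float64)
    # Assumes both image dimensions are multiples of `block`.
    for y in range(0, img.shape[0], block):
        for x in range(0, img.shape[1], block):
            coeffs = c @ img[y:y + block, x:x + block] @ c.T  # forward 2D DCT
            coeffs = coeffs + rng.normal(0.0, sigma, size=(block, block)) * mask
            out[y:y + block, x:x + block] = c.T @ coeffs @ c  # inverse 2D DCT
    return out


rng = np.random.default_rng(0)
img = rng.random((16, 16))
noisy = inject_hf_noise(img, sigma=0.1, rng=rng)  # sigma is an example value

# The difference should only occupy the last row/column of the DCT coefficients.
c8 = dct_matrix(8)
diff_coeffs = c8 @ (noisy - img)[:8, :8] @ c8.T
print(np.allclose(diff_coeffs[:-1, :-1], 0.0))
```

Restricting the noise to the highest-frequency coefficients keeps the low-frequency content, and therefore the corresponding LR image, essentially unchanged.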
4.2 Natural Manifold Discriminator
To narrow the target space to the natural manifold, we design a discriminator that differentiates the natural images (the elements that belong to $\mathcal{N}$ as in Figure 3) from the blurry/noisy ones ($\mathcal{A}$ or $\mathcal{B}$). For this, we design a CNN-based classifier that discriminates $\mathcal{N}$ (natural manifold) and $\mathcal{A} \cup \mathcal{B}$ (unnatural manifold), which will be called the natural manifold discriminator (NMD). The training is performed with the sigmoid binary cross entropy loss function defined as
$$\mathcal{L}_{NMD} = -\mathbb{E}_{x \in \mathcal{A} \cup \mathcal{B}}\big[\log(1 - D_{NM}(x))\big] - \mathbb{E}_{x \in \mathcal{N}}\big[\log D_{NM}(x)\big] \tag{18}$$
where $D_{NM}(\cdot)$ denotes the output sigmoid value of the NMD. For the expectation, we use the empirical mean of the training dataset. The network architecture of our NMD is shown in Figure 5, which is a simple VGG-style CNN. Fully-connected layers are not used for the last stage in our case. Instead, one convolution layer and global average pooling are used.
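The binary cross entropy objective of the discriminator can be written compactly. The following is a minimal NumPy sketch that assumes the sigmoid outputs for a batch of natural samples and a batch of blurry/noisy samples are already available; the function name `nmd_bce_loss` and the clipping constant are hypothetical choices for numerical safety.

```python
import numpy as np


def nmd_bce_loss(scores_natural, scores_unnatural, eps=1e-12):
    """Sigmoid binary cross entropy: natural samples target 1, blurry/noisy samples target 0."""
    s_nat = np.clip(np.asarray(scores_natural, dtype=np.float64), eps, 1.0 - eps)
    s_unn = np.clip(np.asarray(scores_unnatural, dtype=np.float64), eps, 1.0 - eps)
    # Expectations are replaced by empirical means over each batch.
    return float(-np.mean(np.log(s_nat)) - np.mean(np.log(1.0 - s_unn)))


confident = nmd_bce_loss([0.99, 0.98], [0.02, 0.01])
undecided = nmd_bce_loss([0.5, 0.5], [0.5, 0.5])
print(confident, undecided)
```

A discriminator that separates the two classes well yields a loss near zero, while an undecided one (all outputs 0.5) yields 2 log 2.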
For the training, we start from and . We update both hyperparameters according to the average of 10 validation accuracies (AVA). When it reaches above , we update and following the rules below:
if  (19)  
(20)  
if  (21)  
(22) 
We stop training with the final and equal to and , respectively.
5 Natural and Realistic Super-Resolution
In this section, we explain the proposed natural and realistic super-resolution (NatSR) generator model and the training loss function.
5.1 Network Architecture
The overall architecture of our NatSR is shown in Figure 6, which takes the LR image $I_{LR}$ as the input and generates the SR output. As shown in the figure, our network is based on residual learning, which has long been used as a basic technique to mitigate the degradation problem in very deep networks. Typically, two types of residual learning are used: local residual learning (LRL), which bypasses the input to the output in a local range [12], and global residual learning (GRL), which provides a skip-connection between the input and the output at the global scale of the network [18]. Former approaches [18, 10] have shown that learning the sparse features is much more effective than learning the pixel domain values directly. Hence, recent models adopt both local residual learning (short-path) and global residual learning (long-path) [22, 23, 42].
Inspired by former studies, we adopt the connection scheme shown in Figure 6, named the fractal residual learning (FRL) structure because the connections have a fractal pattern. Also, as a basic building block of our NatSR, we employ the residual dense block (RDBlock) [42] shown in Figure 7, and adopt residual scaling [23] in our RDBlock. By using the FRL and RDBlock, skip-connections of all ranges, from short-path to long-path, can be employed.
As a discriminator for GAN, we apply a network architecture similar to that of the NMD. Instead of using only convolution layers, we adopt spectral normalization [28] to make the discriminator satisfy the Lipschitz condition. Also, we use strided convolutions instead of max-pooling layers. Specific architecture details are provided in the supplementary material.
5.2 Training Loss Function
5.2.1 Reconstruction Loss
To model the $p(I_{HR} \mid I_{LR})$, we adopt the pixel-wise reconstruction loss, specifically the mean absolute error (MAE) between the ground-truths and the super-resolved images:
$$\mathcal{L}_{Recon} = \mathbb{E}\big[\lVert I_{HR} - I_{SR} \rVert_1\big] \tag{23}$$
where $I_{SR}$ denotes the super-resolved output. Although all the perception-oriented models apply perceptual losses, we do not adopt such losses, because we found in our experiments that the perceptual loss causes undesirable artifacts. To boost high-frequency details, we instead use our NMD as a solution.
5.2.2 Naturalness Loss
We design the naturalness loss based on our pre-trained natural manifold discriminator (NMD). To concentrate the target manifold within the natural manifold, the output of the NMD should be nearly 1. We may use the negative of the sigmoid output as the loss, but we use its log-scale to boost the gradients:
$$\mathcal{L}_{Natural} = \mathbb{E}\big[-\log D_{NM}(I_{SR})\big] \tag{24}$$
where $D_{NM}(\cdot)$ denotes the output sigmoid value of the NMD.
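The effect of the log-scale can be seen by comparing gradient magnitudes. The sketch below is illustrative only and assumes the NMD sigmoid outputs for the super-resolved images are given as plain numbers; the function names are hypothetical.

```python
import numpy as np


def naturalness_loss(nmd_scores, eps=1e-12):
    """Mean of -log(score), where the scores are NMD sigmoid outputs in (0, 1]."""
    scores = np.clip(np.asarray(nmd_scores, dtype=np.float64), eps, 1.0)
    return float(np.mean(-np.log(scores)))


def gradient_magnitudes(score):
    """Absolute derivative of the two candidate losses with respect to the score."""
    linear = 1.0              # loss = -score
    log_scale = 1.0 / score   # loss = -log(score)
    return linear, log_scale


loss_natural = naturalness_loss([1.0, 1.0])
loss_unnatural = naturalness_loss([0.1])
grad_linear, grad_log = gradient_magnitudes(0.1)
print(loss_unnatural, grad_linear, grad_log)
```

For an output scored as unnatural (a score of 0.1 in this example), the log-scale loss has a ten times larger derivative than the plain negative score, which is the gradient boost mentioned above.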
5.2.3 Adversarial Loss
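As it is well known that GANs are hard to train and unstable, there have been lots of variations of GANs [43, 2, 9, 25, 17]. Recently, a GAN with a relativistic discriminator has been proposed [17], which shows quite robust results compared with the standard GAN [8] in generating fake images in terms of Fréchet Inception Distance [13]. Thus, we employ the relativistic average GAN (RaGAN) for our adversarial training, which is described as:
$$\mathcal{L}_{G} = -\mathbb{E}_{x_r \sim P_r}\big[\log(1 - \tilde{D}(x_r))\big] - \mathbb{E}_{x_f \sim P_g}\big[\log \tilde{D}(x_f)\big] \tag{25}$$
$$\mathcal{L}_{D} = -\mathbb{E}_{x_r \sim P_r}\big[\log \tilde{D}(x_r)\big] - \mathbb{E}_{x_f \sim P_g}\big[\log(1 - \tilde{D}(x_f))\big] \tag{26}$$
where $P_r$ and $P_g$ are the distributions of HR and SR respectively, $x_r$ and $x_f$ mean real and fake data respectively, and
$$\tilde{D}(x_r) = \mathrm{sigmoid}\big(C(x_r) - \mathbb{E}_{x_f \sim P_g}[C(x_f)]\big) \tag{27}$$
$$\tilde{D}(x_f) = \mathrm{sigmoid}\big(C(x_f) - \mathbb{E}_{x_r \sim P_r}[C(x_r)]\big) \tag{28}$$
where $C(\cdot)$ denotes the output logit of the discriminator. In our case, the motivation of the RaGAN discriminator is to measure “the probability that the given image is closer to real HR images than the generated SR images on average.”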
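The relativistic average losses can be sketched directly from discriminator logits. This is a minimal NumPy sketch following the relativistic average formulation of [17], assuming the logits of a batch of real (HR) and fake (SR) images are given; the function name `ragan_losses` and the stabilizing constant `eps` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ragan_losses(logits_real, logits_fake, eps=1e-12):
    """Return (discriminator loss, generator loss) of a relativistic average GAN."""
    logits_real = np.asarray(logits_real, dtype=np.float64)
    logits_fake = np.asarray(logits_fake, dtype=np.float64)
    # Each logit is compared with the average logit of the opposite class.
    d_real = sigmoid(logits_real - np.mean(logits_fake))
    d_fake = sigmoid(logits_fake - np.mean(logits_real))
    loss_d = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    loss_g = -np.mean(np.log(1.0 - d_real + eps)) - np.mean(np.log(d_fake + eps))
    return float(loss_d), float(loss_g)


# Example: the discriminator clearly separates real from fake.
loss_d, loss_g = ragan_losses([3.0, 3.0], [-3.0, -3.0])
# Example: real and fake are indistinguishable.
tie_d, tie_g = ragan_losses(np.zeros(4), np.zeros(4))
print(loss_d, loss_g, tie_d, tie_g)
```

Unlike the standard GAN, the generator loss also depends on the real samples, since the generator is rewarded for making fake logits exceed the average real logit.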
5.2.4 Overall Loss
The overall loss term to train our NatSR is defined as the weighted sum of the loss terms defined above:
$$\mathcal{L} = \lambda_1 \mathcal{L}_{Recon} + \lambda_2 \mathcal{L}_{Natural} + \lambda_3 \mathcal{L}_{G} \tag{29}$$
As our baseline, we train the distortion-oriented model where $\lambda_2 = \lambda_3 = 0$, which means that the overall loss is just the reconstruction loss $\mathcal{L}_{Recon}$. We name our baseline model the fractal residual super-resolution network (FRSR). For our NatSR, which is perception-oriented, we use the full loss above with , and .
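The weighted combination of the loss terms can be sketched as below. The weights used for the perception-oriented case are placeholder values chosen only for illustration, not the values used by the authors, and the naturalness and adversarial loss values are made-up numbers; the function names are hypothetical.

```python
import numpy as np


def reconstruction_loss(hr, sr):
    """Mean absolute error between the ground truth and the super-resolved image."""
    return float(np.mean(np.abs(hr - sr)))


def overall_loss(l_recon, l_natural, l_adv, lambdas):
    """Weighted sum of the reconstruction, naturalness, and adversarial losses."""
    lam1, lam2, lam3 = lambdas
    return lam1 * l_recon + lam2 * l_natural + lam3 * l_adv


hr = np.array([[0.2, 0.4], [0.6, 0.8]])
sr = np.array([[0.1, 0.4], [0.6, 1.0]])
l_recon = reconstruction_loss(hr, sr)

# Distortion-oriented baseline: only the reconstruction term is active.
baseline = overall_loss(l_recon, 0.7, 1.2, lambdas=(1.0, 0.0, 0.0))
# Perception-oriented setting with placeholder weights (assumption).
perceptual = overall_loss(l_recon, 0.7, 1.2, lambdas=(1.0, 0.01, 0.01))
print(l_recon, baseline, perceptual)
```

Setting the second and third weights to zero recovers the distortion-oriented baseline, so the same training code can serve both models.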
6 Discussion and Analysis
6.1 Effectiveness of Proposed Discriminator
To demonstrate the meaning and effectiveness of our NMD, we test the NMD scores for the perception-oriented methods such as the SRGAN variants [22], EnhanceNet, and NatSR, and also for the distortion-oriented methods including our FRSR. Table 1 shows the results on BSD100 [26], where the NMD is designed to output a score of 1 when the input image is close to the natural original image, and to output a lower score when the input is blurry or noisy. We can see that previous perception-oriented methods score between and which means that they lie near the boundary of the natural and unnatural manifold in our LR-HR model. Also, the original HR scores and bicubic interpolation scores , which means that our NMD discriminates HR and LR with high confidence. Additionally, SRResNet, EDSR, and our FRSR, which are distortion-oriented, score almost . We may interpret this result as showing that the distortion-oriented methods produce images which also lie on the blurry manifold. On the other hand, our NatSR results in scores close to which is much higher than the other perception-oriented algorithms. In summary, we believe that our model of the natural manifold and the NMD are reasonable, and that the NMD discriminates the natural and unnatural manifolds well.
Method  NMD Score 

HR  
Bicubic  
SRResNet  
EDSR  
FRSR (Ours)  
SRGAN-MSE  
SRGAN-VGG22  
SRGAN-VGG54  
EnhanceNet-PAT  
NatSR (Ours) 
6.2 Study on the Plausibility of SR Images
As we approach SISR by interpreting the input and output images in our LR-HR space model, we analyze the plausibility of the super-resolved images of various methods according to our model. The super-resolved images must lie in the set $\mathcal{V}$ in Figure 3, which means that the downsampling of a super-resolved image must be in the LR space, i.e., it must be similar to the input LR image as
$$I_{LR} \approx h(I_{SR})^{\downarrow} \tag{30}$$
For the analysis, we show the RGB-PSNR between $I_{LR}$ and $h(I_{SR})^{\downarrow}$ in Table 2, tested on Set5 [3]. The results are in the ascending order of SRGAN, EnhanceNet, and our NatSR. Even though we do not impose any constraints on the LR space, our NatSR results mostly lie in the feasible set $\mathcal{V}$. On the other hand, SRGAN result is about dB, which means that the SRGAN barely reflects the LR-HR properties.
Method  RGB-PSNR (dB) 

SRGAN  
ENet-PAT  
NatSR 
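The LR-consistency check can be sketched as follows. This is a simplified illustration, not the evaluation code behind Table 2: it assumes a box filter (block averaging) as a stand-in for the bicubic low-pass filter and downsampling, pixel values normalized to [0, 1], and synthetic data; the names `psnr` and `box_downsample` are hypothetical.

```python
import numpy as np


def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def box_downsample(img, scale):
    """Average non-overlapping scale x scale blocks (stand-in for bicubic downsampling)."""
    h = img.shape[0] // scale * scale
    w = img.shape[1] // scale * scale
    blocks = img[:h, :w].reshape(h // scale, scale, w // scale, scale, *img.shape[2:])
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)
scale = 4
hr = rng.random((16, 16, 3))
lr = box_downsample(hr, scale)

# A super-resolved image that is consistent with the LR input (pixel replication).
sr_consistent = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
# A super-resolved image with a global brightness shift, inconsistent with the LR input.
sr_shifted = sr_consistent + 0.1

psnr_consistent = psnr(lr, box_downsample(sr_consistent, scale))
psnr_shifted = psnr(lr, box_downsample(sr_shifted, scale))
print(psnr_consistent, psnr_shifted)
```

A high PSNR between the LR input and the re-degraded output indicates that the output lies close to the feasible HR set, regardless of how sharp or blurry it looks.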
7 Experimental Results
7.1 Implementation details
We train both NMD and NatSR (including FRSR) with the recently released DIV2K [34] dataset, which consists of high-quality (2K resolution) training images, validation images, and test images. The size of the input LR patch is set to , and we only train with scaling factor . The ADAM optimizer [20] is used for training with the initial learning rate of , and halved once during the training. We implement our code with Tensorflow [1]. For the test, we evaluate our model with famous SISR benchmarks: Set5 [3], Set14 [41], BSD100 [26], and Urban100 [14].
7.2 Evaluation Metrics and Comparisons
For the evaluation of distortion-oriented models, the popular FR-IQA (full-reference image quality assessment) measures PSNR and SSIM (structural similarity) [38] are used. But since these measures are not appropriate for measuring the quality of perceptual models, we use one of the recently proposed NR-IQA (no-reference image quality assessment) measures called NQSR [24], which is designed for SISR and well known as Ma et al.’s score. Additionally, another NR-IQA measure, NIQE [27], is used to measure the naturalness of images. Higher NQSR and lower NIQE mean better perceptual quality. However, it is questionable whether the many variants of NR-IQA methods perfectly reflect human perceptual quality. Hence, we use the NR-IQA results only as a rough reference.
7.3 FR-IQA Results
Dataset  Scale  Bicubic  LapSRN  SRDenseNet  DSRN  EDSR  FRSR  SRGAN  ENet  NatSR 
Set5  4  28.42/0.8104  31.54/0.8850  32.02/0.8934  31.40/0.8830  32.46/0.8976  32.20/0.8939  29.41/0.8345  28.56/0.8093  30.98/0.8606 
Set14  4  26.00/0.7027  28.19/0.7720  28.50/0.7782  28.07/0.7700  28.71/0.7857  28.54/0.7808  26.02/0.6934  25.67/0.6757  27.42/0.7329 
BSD100  4  25.96/0.6675  27.32/0.7280  27.53/0.7337  27.25/0.7240  27.72/0.7414  27.60/0.7366  25.18/0.6401  24.93/0.6259  26.44/0.6827 
Urban100  4  23.14/0.6577  25.21/0.7560  26.05/0.7819  25.08/0.7470  26.64/0.8029  26.21/0.7904    23.54/0.6926  25.46/0.7602 
Parameters  4    0.8 M  2.0 M  1.2 M  43 M  4.9 M  1.5 M  0.8 M  4.9 M 
In this subsection, we discuss the distortion-oriented methods and their results. The overall average PSNR/SSIM results are listed in Table 3, which shows that our FRSR achieves comparable or better results compared to the others. EDSR [23] shows the best result; however, considering the number of parameters shown in the last row of Table 3, our FRSR is also a competitive method. As a sub-experiment, we also evaluate the FR-IQA results of the perception-oriented methods. Of course, the results are worse than those of the distortion-oriented algorithms, and sometimes even worse than the bicubic interpolated images. Nonetheless, ours are nearer to the original image in the pixel domain than SRGAN and EnhanceNet.
7.4 NR-IQA Results
We assess the methods with the NR-IQA measures, and the results are summarized in Figure 8, which shows the average NIQE and NQSR tested on BSD100. As can be observed, our NatSR is not the best but yields measures comparable to those of other perception-oriented methods and the original HR. As expected, EDSR, one of the state-of-the-art distortion-oriented methods, scores the worst in both metrics except for bicubic interpolation. For NIQE, besides the ground-truth HR, SRGAN scores the best, and our NatSR scores the second best. For NQSR, SRGAN scores the best among all methods including the HR. Our NatSR ranks lower than SRGAN and ENet, but the scores of all the methods including the HR differ only slightly. Although NatSR is not the best in either score, we believe that NatSR shows results quite consistent with human visual perception, as shown in Figures 1 and 9, by suppressing noisy and blurry outputs through the NMD cost.
8 Subjective Assessments
8.1 Mean Opinion Score (MOS)
To better assess the perceptual quality of the results, we conduct a mean opinion score (MOS) test with the DIV2K validation set [34]. For a fair comparison with recent perception-oriented methods, SFT-GAN [37] is evaluated with a proper semantic segmentation mask so that it performs at its best. The details are in the supplementary material.
8.2 Visual Comparisons
We visualize some results in Figures 1 and 9. As shown in Figure 1, our NatSR shows the least distortion compared to other perception-oriented methods. Also, Figure 9 shows that distortion-oriented methods produce blurry results while perception-oriented ones produce better image details. However, ENet produces unnatural cartoony scenes, and SFT-GAN fails to produce natural details in buildings. More results can be found in the supplementary material.
9 Conclusion
In this paper, we have proposed a new approach for SISR which hallucinates natural and realistic textures. First, we started from the modeling of the LR-HR space and the SISR process. From this work, we developed a CNN-based natural manifold discriminator, which makes it possible to narrow the target space to the natural manifold. We have also proposed an SR generator based on residual dense blocks and fractal residual learning. The loss function is designed such that our network works either as a distortion-oriented or a perception-oriented model. From the experiments, it is shown that our distortion-oriented network (FRSR) shows considerable gain compared to models with a similar number of parameters. Also, our perception-oriented network (NatSR) shows perceptually plausible results compared to others. We expect that, with a deeper and heavier network for generating better super-resolved images and with a better classifier as the NMD, our method would bring more naturalness and realistic details. The codes are publicly available at https://github.com/JWSoh/NatSR.
Acknowledgments
This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the “Regional Specialized Industry Development Program (R&D, P0002072)” supervised by the Korea Institute for Advancement of Technology (KIAT).
References

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
[2] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
[3] M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
[4] Y. Blau and T. Michaeli. The perception-distortion tradeoff. In Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, pages 6228–6237, 2018.
[5] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In European conference on computer vision. Springer, 2014.
[6] X. Gao, K. Zhang, D. Tao, and X. Li. Image super-resolution with sparse neighbor embedding. IEEE Transactions on Image Processing, 21(7):3194–3205, 2012.
[7] L. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, pages 262–270, 2015.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[9] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
[10] T. Guo, H. S. Mousavi, T. H. Vu, and V. Monga. Deep wavelet prediction for image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.
[11] W. Han, S. Chang, D. Liu, M. Yu, M. Witbrock, and T. S. Huang. Image super-resolution via dual-state recurrent networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
[12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[13] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
[14] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.
[15] Z. Hui, X. Wang, and X. Gao. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 723–731, 2018.
[16] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.
[17] A. Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard gan. arXiv preprint arXiv:1807.00734, 2018.
[18] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016.
[19] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1645, 2016.
[20] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[21] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
[22] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
[23] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE conference on computer vision and pattern recognition (CVPR) workshops, 2017.
[24] C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang. Learning a no-reference quality metric for single-image super-resolution. Computer Vision and Image Understanding, 158:1–16, 2017.
[25] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 2813–2821. IEEE, 2017.
[26] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 416–423. IEEE, 2001.
[27] A. Mittal, R. Soundararajan, and A. C. Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Process. Lett., 20(3):209–212, 2013.
[28] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[29] M. S. Sajjadi, B. Schölkopf, and M. Hirsch. Enhancenet: Single image super-resolution through automated texture synthesis. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 4501–4510. IEEE, 2017.
[30] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1874–1883, 2016.
[31] J. Sun, Z. Xu, and H.-Y. Shum. Image super-resolution using gradient profile prior. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
[32] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
[33] Y.-W. Tai, S. Liu, M. S. Brown, and S. Lin. Super resolution using edge prior and single image detail synthesis. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2400–2407. IEEE, 2010.
[34] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, et al. Ntire 2017 challenge on single image super-resolution: Methods and results. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pages 1110–1121. IEEE, 2017.
[35] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pages 111–126. Springer, 2014.
[36] T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In 2017 IEEE international conference on computer vision. IEEE, 2017.
[37] X. Wang, K. Yu, C. Dong, and C. C. Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
[38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
[39] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang. Coupled dictionary training for image super-resolution. IEEE transactions on image processing, 21(8):3467–3478, 2012.
[40] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE transactions on image processing, 19(11):2861–2873, 2010.
[41] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International conference on curves and surfaces, pages 711–730. Springer, 2010.
[42] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
[43] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.