Single image super-resolution (SISR) aims to restore a high-resolution (HR) image from its single low-resolution (LR) counterpart, and has been successfully used in many computer vision applications (e.g., medical imaging, security monitoring, and image enhancement). Generally, a low-resolution image can be modeled as
$I^{LR} = (I^{HR} \otimes k)\downarrow_s + n$,
where $\otimes$ is the convolution operation between the HR image $I^{HR}$ and the blur kernel $k$, $\downarrow_s$ represents the operation of down-sampling the image with a scale factor of $s$, and $n$ denotes the Gaussian white noise.
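The classical degradation model above can be illustrated with a short NumPy sketch; the kernel size, sigma, and noise level below are illustrative assumptions, not values used in the paper.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.6):
    """Isotropic Gaussian blur kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(hr, kernel, scale=4, noise_sigma=0.01, seed=0):
    """Blur via direct 2-D convolution (edge padding), down-sample by
    striding with the scale factor, then add Gaussian white noise."""
    pad = kernel.shape[0] // 2
    padded = np.pad(hr, pad, mode="edge")
    blurred = np.zeros_like(hr)
    h, w = hr.shape
    for i in range(h):
        for j in range(w):
            blurred[i, j] = np.sum(padded[i:i + 2*pad + 1, j:j + 2*pad + 1] * kernel)
    lr = blurred[::scale, ::scale]          # down-sampling with stride `scale`
    rng = np.random.default_rng(seed)
    return lr + rng.normal(0.0, noise_sigma, lr.shape)

hr = np.random.default_rng(1).random((64, 64))
lr = degrade(hr, gaussian_kernel(), scale=4)
print(lr.shape)  # (16, 16)
```

This is the synthetic pipeline that CNN-based SR methods typically train on; the rest of the paper argues that real-world LR images do not follow such a fixed pipeline.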
Dong et al. propose a CNN-based image SR framework (SRCNN), which directly learns an end-to-end mapping to restore the HR image from a LR input that is first upsampled with bicubic interpolation. Kim et al. cascade pairs of convolutional and nonlinear layers, using gradient clipping to speed up training; their network outperforms SRCNN by a large margin thanks to stacked small filters and residual learning. Lim et al. present an enhanced residual-block based network (EDSR) without normalization layers, and introduce a multi-scale architecture (MDSR) to handle multiple scales for various SR tasks.
All the afore-mentioned CNN models are trained on LR images synthesized from the matched HR images. However, it is difficult to obtain a realistic LR image by directly down-sampling a HR image. To model LR images in real cases, Zhang et al. propose a multiple degradations super-resolution network (SRMD) that takes degradation maps and LR images as input to jointly consider noise, blur kernel, and down-sampler. Nevertheless, the noise level and blur kernel size are manually predefined, which weakens its ability to handle more general degradations and the diverse LR images found in the real world.
Since real-world LR images and artificially degraded images have different characteristics, models trained on synthesized LR images may perform poorly when applied to real-world LR images with complex combinations of noise and blur, or when the LR images are obtained using different down-sampling methods.
To address the above limitations, inspired by the success of generative adversarial networks in image style translation, we propose a novel unsupervised cycle super-resolution framework equipped with a degradation model that generates realistic patterns in LR images, which then act as input to the reconstruction network. In this way, our model is applicable to complex degradation patterns rather than simple interpolations (e.g., bicubic and nearest-neighbor). As shown in Fig. 1, we directly reconstruct a higher-resolution image from a real HR image with a scale factor of 4. Our model recovers sharper edges and more details compared with other state-of-the-art methods (DBPN, SRMD, SRGAN).
This paper makes the following contributions:
We propose an unsupervised learning network which consists of a degradation module and a reconstruction net. The degradation module is learned to generate realistic LR images for the reconstruction net in an unsupervised way.
The process of generating LR images does not rely on the widely used down-sampling strategy. We introduce structure perceptual loss in the degradation network to preserve the structural similarity of generated LR images and the corresponding HR images.
We develop a novel bi-cycle structure, where one cycle is designed for enforcing structural consistency between the degradation and SR reconstruction networks in an unsupervised way, and the other further stabilizes the training of SR reconstruction and degradation networks.
Extensive experiments on benchmark datasets and real-world images demonstrate that the proposed algorithm performs favorably against the state-of-the-art SR methods.
2 Related Works
In this section, we briefly review non-blind SISR and related blind SISR methods.
2.1 Non-Blind SISR
Early methods [2, 19, 37] super-resolve images based on interpolation theory. However, it is difficult to reconstruct detailed textures in the super-resolved results. Dong et al. propose a pioneering 3-layer CNN (SRCNN) for bicubic up-sampled image SR, which then brings forth a series of CNN-based SISR methods with better effectiveness and higher efficiency. On one hand, more effective CNN architectures are designed to improve SR performance, including very deep CNNs with residual learning, residual and dense blocks [18, 20], recursive structures [13, 28], and channel attention. On the other hand, separate research efforts are devoted to speeding up computation, where deep features are extracted directly from the original LR image [8, 17, 25]. Taking both effectiveness and efficiency into account, this speed-up strategy has also been successively adopted in [18, 20, 36, 38].
Recently, SRGAN and ESRGAN introduce perceptual and adversarial losses into the reconstruction network. Spatial feature transform is suggested to enhance texture details for photo-realistic SISR. Furthermore, CinCGAN resorts to unsupervised learning with unpaired data. These methods, however, are all tailored to the specific bicubic down-sampler, and usually perform poorly on real-world LR images. Although SRMD can handle multiple down-samplers by taking degradation parameters as input, these parameters must be accurately provided, limiting its practical applications.
In contrast, our proposed unsupervised degradation network effectively models complex down-samplers and degradations learned from real-world LR training samples.
2.2 Blind SISR
Early blind SISR methods attempt to estimate blur kernels from LR images, in which blurring and down-sampling are considered in the degradation model. However, these methods rely on hand-crafted image priors and remain limited under diverse degradations. Recently, motivated by CycleGAN, several deep CNN-based methods have been suggested to learn blind SR from unpaired HR-LR images. Yuan et al. present a Cycle-in-Cycle network to jointly learn SISR and degradation models, but the degradation model is deterministic, limiting its ability to generate diverse, real-world LR images.
Closest to ours is the work of Bulat et al., in which the authors learn a high-to-low GAN to degrade and down-sample HR images, and then employ the resulting LR-HR pairs to train a low-to-high GAN for blind SISR. Our method differs from this work in several important ways. First, both the structural consistency between the LR and HR images and the relationship between reconstruction and degradation are exploited by our bi-cycle structure, which jointly stabilizes the training of the SR reconstruction and degradation networks. Second, since there are no pairs of LR-HR images in practice, our degradation model is trained in an unsupervised way, i.e., without using paired images. We introduce unpaired real-world LR images into the GAN model to generate realistic LR images, and also exploit them to jointly enhance the reconstruction and degradation models in a cycle.
In our bi-cycle degradation network, the bi-cycle consistency between LR and HR images stabilizes the training of both the High-to-Low GAN and the Low-to-High SR network, further boosting SR performance.
3 Proposed Method
In this section, we present the unsupervised degradation learning for single image super-resolution, which effectively learns to generate LR images with realistic noise and blur patterns. We refer to this framework as Degradation Network for Super-Resolution (DNSR).
3.1 Overview of DNSR
The proposed DNSR network architecture is illustrated in Fig. 2, and consists of three models: the degradation module, the degradation discriminator, and the reconstruction model. The degradation module aims to model the real-world degradation process from HR to LR images, and thus generates realistic LR images. The degradation discriminator is employed to ensure that the degraded patterns in the generated LR images are similar to the real case. With the generated realistic LR images and the corresponding HR images, the reconstruction model is trained to recover real structures and textures in HR images.
Specifically, given a HR image as input, the degradation model down-samples it into a LR image; accordingly, the reconstruction model tries to recover a HR image that approximates the input. This process is shown as the blue circle in Fig. 2. To fully exploit the real-world LR images, we use them in two ways. First, they are used to train the discriminator to promote the similarity between synthesized LR images and real ones. Second, as shown by the green circle, the real-world LR images are fed into the reconstruction model to generate synthesized HR images, which in turn act as input to the degradation model to reconstruct the original real-world LR images. This CycleGAN-inspired scheme further jointly strengthens the relationship between the reconstruction and degradation models.
3.2 Degradation Model
To obtain more realistic LR images, we propose to model the mapping from HR to real-world LR images by jointly using the degradation model and the degradation discriminator. The degradation discriminator aims to distinguish whether a LR image generated by the degradation model is close to real-world LR images. The degradation model in turn tries to generate more realistic images to fool the degradation discriminator. Different from the architecture proposed in SRGAN, our degradation discriminator enforces the generated LR image to be similar to real-world LR images instead of synthesized LR images.
The degradation model takes a convolution layer with an activation function as the first layer, and 8 residual blocks as the middle layers. We employ a convolution layer instead of the conventional down-sampling method, with stride sizes matching the scale factors of 2 and 4, respectively. The kernel size and number of filters are fixed for each convolution layer, with a fixed stride for the convolutional layers before the last one. We formulate the degradation model as
$I^{LR}_{g} = G_d(I^{HR})$,
where $I^{LR}_{g}$ is the LR image generated by the degradation model $G_d$.
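As a minimal sketch of why a strided convolution can replace a fixed down-sampler, the following plain NumPy convolution (with a hypothetical mean kernel; the learned layer in the model would use trained weights) reduces spatial resolution by its stride:

```python
import numpy as np

def strided_conv2d(x, kernel, stride):
    """Valid 2-D convolution with stride; a stride of 2 or 4 down-samples
    the feature map by that factor, replacing fixed interpolation."""
    kh, kw = kernel.shape
    h = (x.shape[0] - kh) // stride + 1
    w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = x[i*stride:i*stride + kh, j*stride:j*stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.random.default_rng(0).random((34, 34))
print(strided_conv2d(x, np.ones((3, 3)) / 9, stride=2).shape)  # (16, 16)
```

Unlike bicubic or nearest-neighbor interpolation, the weights of such a layer are learned, so the down-sampling itself can adapt to the target LR distribution.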
The degradation model outputs a LR image that tries to fool the discriminator, which induces the GAN loss
$\mathcal{L}_{GAN} = \frac{1}{N}\sum_{i=1}^{N} -\log D\big(G_d(I^{HR}_{i})\big)$,
where $D$ represents the degradation discriminator and $N$ is the number of input image patches. In addition, since it is difficult to preserve the structural similarity between the generated LR and HR pair using only the GAN loss, we introduce a structural perceptual loss to ensure consistency in structure,
$\mathcal{L}_{per} = \frac{1}{N}\sum_{i=1}^{N} \big\| \phi(I^{LR}_{g,i}) - \phi(I^{HR}_{i}) \big\|^{2}$,
where $I^{LR}_{g}$ is the realistic LR image generated by the degradation model, $I^{HR}$ denotes the real HR image, and $\phi$ denotes the features at a maxpooling layer of the pre-trained VGG network. To match the input size of the VGG19 network, $I^{LR}_{g}$ and $I^{HR}$ are scaled to the same size.
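A toy NumPy sketch of these two loss terms; the non-saturating adversarial form and the stand-in feature arrays (in place of VGG19 activations) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def generator_gan_loss(d_fake):
    """Non-saturating generator loss: the degradation model is rewarded
    when the discriminator scores its LR outputs as real (close to 1)."""
    eps = 1e-8
    return float(np.mean(-np.log(d_fake + eps)))

def structural_perceptual_loss(feat_lr, feat_hr):
    """Mean squared distance between feature maps of the generated LR
    image and the (resized) HR image; `feat_*` stand in for activations
    taken at a max-pooling layer of a pre-trained VGG19."""
    return float(np.mean((feat_lr - feat_hr) ** 2))

d_scores = np.array([0.9, 0.8, 0.95])   # discriminator outputs on fakes
# Lower scores (less convincing fakes) yield a larger generator loss.
print(generator_gan_loss(d_scores) < generator_gan_loss(d_scores / 2))  # True
```

The two terms pull in complementary directions: the adversarial term matches the noise/blur statistics of real LR images, while the perceptual term keeps the generated LR image structurally aligned with its HR source.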
ESRGAN shows the difference among features obtained at different layers of the VGG19 network: convolution layers before deeper maxpooling layers represent high-level features, while convolution layers before shallower maxpooling layers represent low-level features that contain more edges. The features of the generated realistic LR image should be blurrier at the edges, close to real-world LR images. So, different from the conventional perceptual loss in ESRGAN, we take a convolution layer before a maxpooling layer as the output of the feature extractor, since there is no need to recover fine details when perceiving the texture features of LR images in our degradation task.
Figure 4 compares different types of degradation with a down-sampling scale factor of 4. As shown, the LR images generated by our model contain different patterns of noise and blur compared with those of bicubic and nearest-neighbor degradations.
3.3 Reconstruction Model
The structure of the proposed reconstruction model is shown in Figure 3(b). We fix the kernel size and number of filters for each convolution layer, using residual scaling factors of 1 and 0.1 for the two SR model settings, respectively. We use a sub-pixel convolution layer for up-sampling to avoid checkerboard artifacts. Note that we use the realistic LR images generated by our degradation model as inputs, so that the reconstruction model learns to reconstruct HR images from real-world LR images. The reconstruction model is formulated as
$I^{SR} = G_r(I^{LR}_{g})$,
where $I^{SR}$ is the HR image generated by the proposed reconstruction model $G_r$. To enforce local smoothness and eliminate artifacts in restored images, we introduce a total variation loss
$\mathcal{L}_{TV} = \frac{1}{N}\sum_{i=1}^{N} \big( \| \nabla_{h} I^{SR}_{i} \|^{2} + \| \nabla_{v} I^{SR}_{i} \|^{2} \big)$,
where $\nabla_{h}$ and $\nabla_{v}$ are the gradients of $I^{SR}$ in the horizontal and vertical directions, respectively. In our model, we employ a pixel-wise reconstruction loss with the following formulation:
$\mathcal{L}_{rec} = \frac{1}{N}\sum_{i=1}^{N} \big\| I^{SR}_{i} - I^{HR}_{i} \big\|$,
where $I^{HR}$ is the ground-truth HR image.
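Both loss terms can be sketched in NumPy as follows; the squared-gradient TV form and the L1 reconstruction distance are illustrative choices:

```python
import numpy as np

def total_variation_loss(img):
    """Total variation: mean squared horizontal and vertical gradients,
    which encourages local smoothness and suppresses artifacts."""
    dh = img[:, 1:] - img[:, :-1]   # horizontal gradients
    dv = img[1:, :] - img[:-1, :]   # vertical gradients
    return float((dh ** 2).mean() + (dv ** 2).mean())

def reconstruction_loss(sr, hr):
    """Pixel-wise distance between the super-resolved image and the
    ground-truth HR image (L1 chosen here for illustration)."""
    return float(np.abs(sr - hr).mean())

flat = np.ones((8, 8))
noisy = flat + np.random.default_rng(0).normal(0, 0.1, (8, 8))
print(total_variation_loss(flat))            # 0.0 for a constant image
print(total_variation_loss(noisy) > 0)       # True
```

A constant image incurs zero TV penalty, so minimizing this term pushes the reconstruction toward piecewise-smooth outputs without touching the data-fidelity term.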
3.4 Degradation and Reconstruction Consistency
To further jointly improve the reconstruction and degradation models, we introduce a cycle consistency loss, shown as the green circle in Figure 2. In this circle, a real-world LR image $I^{LR}_{r}$ is taken as the input of our reconstruction model to generate a HR image $G_r(I^{LR}_{r})$. Then, the degradation model degrades this generated HR image back to a realistic LR image. To ensure the generated realistic LR image is similar to the real-world LR image, the cycle consistency loss is formulated as
$\mathcal{L}_{cyc} = \frac{1}{N}\sum_{i=1}^{N} \big\| G_d\big(G_r(I^{LR}_{r,i})\big) - I^{LR}_{r,i} \big\|$.
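A toy sketch of this cycle consistency term, with trivial stand-ins (nearest-neighbor up-sampling and stride-2 down-sampling) for the learned reconstruction and degradation networks:

```python
import numpy as np

def cycle_consistency_loss(lr_real, reconstruct, degrade):
    """Pass a real-world LR image through the reconstruction model and
    back through the degradation model; penalize the mean absolute
    distance to the original LR input."""
    lr_cycled = degrade(reconstruct(lr_real))
    return float(np.abs(lr_cycled - lr_real).mean())

# Toy stand-ins: nearest-neighbor 2x up-sampling and stride-2 down-sampling.
up = lambda x: np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
down = lambda x: x[::2, ::2]

lr = np.random.default_rng(0).random((8, 8))
print(cycle_consistency_loss(lr, up, down))  # 0.0: down(up(x)) == x here
```

In the toy case the two stand-ins are exact inverses, so the loss is zero; during training the loss is nonzero and its gradient couples the two learned networks.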
To jointly achieve the effects mentioned above, the loss for the degradation model combines the GAN, structural perceptual, and cycle consistency terms,
$\mathcal{L}_{D} = \mathcal{L}_{GAN} + \alpha \mathcal{L}_{per} + \beta \mathcal{L}_{cyc}$,
where $\alpha$ and $\beta$ are tradeoff factors.
Considering the cycle consistency, the loss for the reconstruction model combines the reconstruction, cycle consistency, and total variation terms,
$\mathcal{L}_{R} = \mathcal{L}_{rec} + \gamma \mathcal{L}_{cyc} + \delta \mathcal{L}_{TV}$,
where $\gamma$ and $\delta$ are tradeoff factors. Accordingly, the proposed DNSR model optimizes the overall objective, i.e., the sum of the degradation and reconstruction losses,
$\mathcal{L}_{total} = \mathcal{L}_{D} + \mathcal{L}_{R}$.
Table 1: Average PSNR / SSIM under bicubic degradation with scale factors of 2 and 4.

| Dataset | Scale | Bicubic | LapSRN | DBPN | SRMD | SRGAN | ESRGAN | DNSR (ours) |
|---|---|---|---|---|---|---|---|---|
| Set5 | 2 | 33.65 / 0.930 | 37.52 / 0.959 | 38.09 / 0.960 | 37.53 / 0.959 | 37.22 / 0.926 | 37.81 / 0.953 | 38.05 / 0.961 |
| Set5 | 4 | 28.42 / 0.810 | 31.54 / 0.885 | 31.75 / 0.898 | 31.59 / 0.887 | 29.40 / 0.847 | 31.40 / 0.871 | 31.76 / 0.891 |
| Set14 | 2 | 30.34 / 0.870 | 33.08 / 0.913 | 33.85 / 0.919 | 33.12 / 0.914 | 32.14 / 0.886 | 33.62 / 0.915 | 33.83 / 0.922 |
| Set14 | 4 | 26.00 / 0.703 | 28.19 / 0.772 | 28.34 / 0.775 | 28.15 / 0.772 | 26.64 / 0.710 | 27.98 / 0.762 | 28.33 / 0.776 |
| Urban100 | 2 | 26.88 / 0.841 | 30.41 / 0.910 | 33.02 / 0.931 | 31.33 / 0.920 | 31.02 / 0.895 | 32.01 / 0.913 | 32.99 / 0.928 |
| Urban100 | 4 | 23.14 / 0.658 | 25.21 / 0.756 | 25.68 / 0.785 | 25.34 / 0.761 | 25.11 / 0.725 | 25.31 / 0.756 | 25.69 / 0.788 |
| BSD100 | 2 | 29.56 / 0.844 | 31.80 / 0.895 | 32.27 / 0.900 | 32.05 / 0.898 | 31.89 / 0.876 | 31.99 / 0.887 | 32.24 / 0.901 |
| BSD100 | 4 | 25.96 / 0.668 | 27.32 / 0.728 | 27.64 / 0.740 | 27.34 / 0.728 | 25.16 / 0.668 | 27.21 / 0.712 | 27.61 / 0.742 |
| DIV2K | 2 | 31.01 / 0.939 | 34.35 / 0.942 | 34.82 / 0.947 | 34.73 / 0.940 | 33.51 / 0.939 | 33.69 / 0.941 | 34.83 / 0.944 |
| DIV2K | 4 | 26.66 / 0.852 | 28.75 / 0.859 | 28.94 / 0.869 | 28.72 / 0.856 | 28.09 / 0.821 | 28.68 / 0.853 | 28.87 / 0.865 |
Table 2: Average PSNR / SSIM under nearest-neighbor degradation with scale factors of 2 and 4.

| Dataset | Scale | LapSRN | DBPN | SRMD | SRGAN | ESRGAN | DNSR (ours) |
|---|---|---|---|---|---|---|---|
| Set5 | 2 | 26.23 / 0.826 | 26.12 / 0.813 | 26.18 / 0.819 | 26.19 / 0.806 | 22.56 / 0.697 | 26.25 / 0.828 |
| Set5 | 4 | 22.34 / 0.716 | 22.15 / 0.680 | 22.28 / 0.712 | 21.79 / 0.713 | 21.53 / 0.479 | 22.37 / 0.718 |
| Set14 | 2 | 25.19 / 0.779 | 25.15 / 0.777 | 25.17 / 0.778 | 25.16 / 0.763 | 21.45 / 0.649 | 25.21 / 0.782 |
| Set14 | 4 | 21.62 / 0.657 | 21.56 / 0.651 | 21.57 / 0.654 | 21.02 / 0.587 | 17.12 / 0.361 | 21.65 / 0.661 |
| Urban100 | 2 | 21.18 / 0.715 | 20.99 / 0.703 | 21.12 / 0.712 | 20.94 / 0.698 | 17.47 / 0.583 | 21.22 / 0.719 |
| Urban100 | 4 | 16.97 / 0.455 | 16.37 / 0.439 | 16.95 / 0.438 | 16.03 / 0.398 | 12.65 / 0.204 | 17.01 / 0.457 |
| BSD100 | 2 | 24.13 / 0.725 | 24.02 / 0.718 | 24.11 / 0.726 | 23.87 / 0.705 | 20.15 / 0.624 | 24.19 / 0.732 |
| BSD100 | 4 | 19.01 / 0.483 | 18.53 / 0.467 | 18.85 / 0.474 | 18.29 / 0.421 | 13.90 / 0.183 | 19.06 / 0.486 |
| DIV2K | 2 | 26.88 / 0.814 | 26.16 / 0.798 | 26.89 / 0.818 | 26.25 / 0.789 | 21.56 / 0.661 | 26.91 / 0.826 |
| DIV2K | 4 | 22.13 / 0.579 | 21.65 / 0.569 | 22.25 / 0.587 | 21.41 / 0.531 | 15.54 / 0.216 | 22.27 / 0.593 |
4 Experiments

4.1 Training Data
We train the proposed DNSR with unpaired real-world HR and LR images. Specifically, the HR images are from the DIV2K dataset (800 training images) and the Flickr2K dataset (2650 training images from flickr.com), while the low-quality images are collected from the Widerface dataset, which consists of various LR images of urban human life with unknown degradation and noise. We select 1600 real-world LR images from Widerface, and randomly crop each to the same size as the generated realistic LR images instead of manually scaling it, which preserves the original characteristics of real-world LR images.
4.2 Training Details
As shown in Figure 2, the training process of our algorithm is divided into three subproblems that are trained iteratively. First, we train the degradation model and degradation discriminator with unpaired real-world HR and LR images. For computing the structural perceptual loss, we scale the inputs to 224×224, the input size of the first layer of the VGG19 network. Second, we train the reconstruction model using the generated realistic LR images. Finally, we take a real-world LR image as the input of the reconstruction model, and the generated HR image is subsequently degraded to a realistic LR image, which is enforced to be similar to the real-world LR input. For the parameters in Eq. (9) and Eq. (10), we set the four tradeoff factors to 1, 0.5, 1, and 0.01, respectively. The minibatch size is set to 16, and the HR images are cropped to a fixed size. The size of LR images depends on the scale factor, which is set to 2 or 4. The learning rate is decreased by a factor of 2 after a fixed number of minibatch updates. We optimize the total loss function with the ADAM optimizer, setting $\beta_1$ = 0.9 and applying weight decay.
We implement the proposed method on the TensorFlow platform with NVIDIA TITAN X GPUs, and it takes about 2 days to train our model with a scale factor of 2.
4.3 Evaluation of Bicubic Degradation
Although our main goal is to learn a reconstruction model that can deal with real-world image super-resolution, it is difficult to obtain the ground-truth HR images needed to evaluate such results. Therefore, to verify the effectiveness of our method, we first compare our reconstruction model with other CNN-based SISR methods that are specifically designed for bicubic degradation. The experiments are conducted on five benchmark datasets: Set5, Set14, Urban100, BSD100, and DIV2K (100 validation images). Each image is down-sampled by bicubic degradation with scale factors of 2 and 4. Table 1 presents the quantitative results of our method and 5 state-of-the-art methods: LapSRN, DBPN, SRMD, SRGAN, and ESRGAN. As shown, our model obtains performance competitive with DBPN, the winner of the NTIRE2018 classic bicubic track, which is designed specifically for bicubic degradation.
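For reference, the PSNR values reported in Table 1 can be computed as below (SSIM is more involved and omitted here); images are assumed normalized to [0, 1]:

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image
    and a reconstruction."""
    mse = np.mean((ref - img) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(peak**2 / mse))

ref = np.zeros((4, 4))
print(round(psnr(ref, ref + 0.1), 6))  # 20.0 (mse = 0.01, 10*log10(1/0.01))
```

Because PSNR is purely a mean-squared-error statistic, smooth interpolations can score well on it while still looking blurry, which is why the qualitative comparisons in the figures matter alongside the tables.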
4.4 Evaluation of Nearest-Neighbor Degradation
To further evaluate the proposed method on LR images obtained by a different degradation method, we generate LR images by nearest-neighbor degradation with scale factors of 2 and 4, and evaluate on the five benchmark datasets. Table 2 shows the average performance in terms of PSNR and SSIM. Compared with LapSRN, DBPN, SRMD, SRGAN, and ESRGAN, our DNSR achieves the best performance on all datasets. As shown in Fig. 5, although bicubic/bilinear/nearest-neighbor interpolation obtains higher PSNR than the state-of-the-art SISR methods, the results contain obviously blurry edges and textures. There are obvious artifacts around object edges in the high-resolution images reconstructed by DBPN, LapSRN, and SRMD. Although ESRGAN generates more realistic and natural textures than SRGAN under bicubic degradation, its generated textures tend to be unreal under other types of degradation. In contrast, the super-resolved images generated by our algorithm have fewer artifacts and sharper textures.
4.5 Evaluation of Real Images
In this section, we evaluate our reconstruction model on the real-world LR images chip and cat. Similar to the image shown in Fig. 1, both the high-resolution image and the degradation pattern for chip and cat are unknown, which makes the task rather challenging. As shown in Fig. 6, we compare our method with bicubic interpolation (as a reference), DBPN, ESRGAN, and SRMD. For the real-world image chip, our DNSR recovers sharper edges of the characters. The real-world LR image cat is a JPEG, which contains various artifacts with unknown degradation and noise. As shown in Fig. 6 (g)-(i), the models trained with synthetic images generate more artifacts at the edge of the cat's whiskers. Benefiting from training with generated realistic LR images, our reconstruction model performs better than the compared ones, restoring sharper edges with fewer artifacts.
5 Analysis and Discussion
5.1 Ablation Study
To thoroughly investigate the effectiveness of the proposed method, we conduct ablation experiments by removing specific components, which yields three different frameworks as shown in Fig. 7.
The first framework, DNSR w/o DM, trains the reconstruction model using bicubic degraded LR images, which corresponds to minimizing only the reconstruction loss (without the cycle consistency term) in Eq. (10). The results of DNSR w/o DM still show some artifacts at edges compared with the results of our full reconstruction model, as shown in Fig. 8(b) and (e). The second framework, DNSR w/o D, removes the degradation discriminator and trains the reconstruction model using the realistic LR images generated by the degradation model together with the real-world LR images; it minimizes the total loss without the GAN term in Eq. (11). The results are clearly worse, as shown in Fig. 8(c): without the degradation discriminator it is difficult for the degradation model to generate realistic LR images, so the reconstruction model performs poorly. The third framework, DNSR w/o C, removes the cycle and trains the reconstruction model accordingly; we minimize the total loss without the cycle consistency term in both Eq. (9) and Eq. (10). As shown in Fig. 8(d), the generated HR images differ noticeably from the original images in color, which validates the importance of the cycle training strategy in guaranteeing consistency across the reconstruction and degradation models.
5.2 Robustness to Noise
Our proposed DNSR is robust to noisy images. To evaluate this robustness, we down-sample the test images of the Set5 dataset and randomly add Gaussian noise with noise levels from 1% to 7%. Fig. 9 shows quantitative results of several state-of-the-art methods on this test set with a scale factor of 4. As the noise level increases, the performance of all methods decreases to different extents. ESRGAN is especially sensitive to noise; a possible reason is that ESRGAN emphasizes textures over noise suppression when generating realistic images, so noise is amplified as texture. Thanks to the unsupervised degradation network that generates realistic LR images, our method degrades much more gracefully as the noise level increases.
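The noise injection used in this test can be sketched as follows; the constant test image is a placeholder:

```python
import numpy as np

def add_gaussian_noise(img, level_percent, seed=0):
    """Add zero-mean Gaussian white noise whose standard deviation is a
    percentage of the intensity range [0, 1], mimicking the 1%-7% noise
    levels in the robustness test."""
    rng = np.random.default_rng(seed)
    sigma = level_percent / 100.0
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

img = np.full((16, 16), 0.5)
for level in (1, 3, 5, 7):
    deviation = float(np.abs(add_gaussian_noise(img, level) - img).mean())
    print(level, round(deviation, 4))
```

The mean absolute deviation grows with the noise level, matching the setup in which each method's PSNR is measured at increasing corruption strengths.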
5.3 Running Time
Table 3 shows the time cost of different methods on the benchmark dataset DIV2K. We test all methods on NVIDIA TITAN X GPUs with a scale factor of 4. Although the implementation platforms (i.e., Matlab, PyTorch, and TensorFlow) differ in computational efficiency, our method achieves rather promising efficiency.
6 Conclusion

In this paper, we propose an unsupervised degradation network for single image super-resolution that does not need paired, manually generated low-resolution images as ground-truth. The proposed method jointly learns to generate realistic low-resolution images with a degradation module and to super-resolve high-resolution images with a reconstruction network trained on the generated realistic low-resolution images, which endows the proposed DNSR with the ability to super-resolve real-world low-resolution images. In extensive experiments, our model outperforms state-of-the-art algorithms in reconstructing real-world images. In the future, we will design more powerful networks and employ more diverse data to further investigate the potential for performance improvement.
-  E. Agustsson and R. Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In CVPR, 2017.
-  J. Allebach and P. W. Wong. Edge-directed interpolation. In ICIP, 1996.
-  M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. BMVC, 2012.
-  A. Bulat, J. Yang, and G. Tzimiropoulos. To learn image super-resolution, use a gan to learn how to do image degradation first. In ECCV, 2018.
-  D. Capel and A. Zisserman. Super-resolution enhancement of text image sequences. In PR, 2000.
-  C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
-  C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 2016.
-  C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In ECCV, 2016.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
-  M. Haris, G. Shakhnarovich, and N. Ukita. Deep backprojection networks for super-resolution. In CVPR, 2018.
-  J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
-  J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
-  J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016.
-  J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
-  D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
-  W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep laplacian pyramid networks for fast and accurate superresolution. In CVPR, 2017.
-  C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
-  X. Li and M. T. Orchard. New edge-directed interpolation. TIP, 2001.
-  B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In CVPR, 2017.
-  D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
-  T. Michaeli and M. Irani. Nonparametric blind super-resolution. In ICCV, 2013.
-  A. Odena, V. Dumoulin, and C. Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
-  W.-Z. Shao and M. Elad. Simple, accurate, and robust nonparametric blind super-resolution. In ICIG, 2015.
-  W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.
-  W. Shi, J. Caballero, C. Ledig, X. Zhuang, W. Bai, K. Bhatia, A. M. S. M. de Marvao, T. Dawes, D. O’Regan, and D. Rueckert. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch. In MICCAI, 2013.
-  K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
-  Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In CVPR, 2017.
-  R. Timofte, S. Gu, J. Wu, and L. Van Gool. Ntire 2018 challenge on single image super-resolution: Methods and results. In CVPR, 2018.
-  Q. Wang, X. Tang, and H. Shum. Patch based blind image super resolution. In ICCV, 2005.
-  X. Wang, K. Yu, C. Dong, and C. C. Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR, 2018.
-  X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C. Loy, Y. Qiao, and X. Tang. Esrgan: Enhanced super-resolution generative adversarial networks. In ECCV, 2018.
-  S. Yang, P. Luo, C.-C. Loy, and X. Tang. Wider face: A face detection benchmark. In CVPR, 2016.
-  Y. Yuan et al. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In CVPR, 2018.
-  R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International conference on curves and surfaces, 2010.
-  K. Zhang, W. Zuo, and L. Zhang. Learning a single convolutional super-resolution network for multiple degradations. In CVPR, 2018.
-  L. Zhang and X. Wu. An edge-guided image interpolation algorithm via directional filtering and data fusion. TIP, 2006.
-  Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.
-  Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In CVPR, 2018.
-  J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
-  W. W. Zou and P. C. Yuen. Very low resolution face recognition problem. TIP, 2012.