Unsupervised Degradation Learning for Single Image Super-Resolution

12/11/2018 · by Tianyu Zhao, et al.

Deep convolutional neural networks (CNNs) have recently achieved impressive performance on single image super-resolution (SR). However, existing CNN-based methods use artificially synthesized low-resolution (LR) and high-resolution (HR) image pairs to train networks, and therefore cannot handle real-world cases, where the degradation from HR to LR is much more complex than any manually designed one. To solve this problem, we propose a bi-cycle network for single image super-resolution guided by real-world LR images, in which bidirectional structural consistency is exploited to train both the degradation and SR reconstruction networks in an unsupervised way. Specifically, we propose a degradation network that models the real-world degradation process from HR to LR via generative adversarial networks, and the generated realistic LR images paired with real-world HR images are exploited for training the SR reconstruction network, forming the first cycle. In the second, reverse cycle, the consistency of real-world LR images is exploited to further stabilize the training of the SR reconstruction and degradation networks. Extensive experiments on both synthetic and real-world images demonstrate that the proposed algorithm performs favorably against state-of-the-art single image SR methods.


1 Introduction

Single image super-resolution (SISR) aims to restore a high-resolution (HR) image from its single low-resolution (LR) counterpart, and has been successfully used in many computer vision applications (e.g., medical imaging [26], security monitoring [41], and image enhancement [5]). Generally, a low-resolution image can be modeled as

$$y = (x \otimes k)\downarrow_s + n, \qquad (1)$$

where $\otimes$ is the convolution operation between the HR image $x$ and the blur kernel $k$, $\downarrow_s$ represents the operation of down-sampling with scale factor $s$, and $n$ denotes the Gaussian white noise.
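As a concrete illustration, the following minimal NumPy/SciPy sketch simulates Eq. (1) for a single-channel image; the Gaussian kernel width, scale factor, and noise level are illustrative choices, not values prescribed by the paper.

```python
# A minimal sketch of the degradation model in Eq. (1): the LR image is a
# blurred, down-sampled, and noise-corrupted version of the HR image.
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=7, sigma=1.5):
    """Isotropic Gaussian blur kernel k, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(x, k, s=4, noise_sigma=0.01):
    """y = (x conv k) down-sampled by s, plus Gaussian noise, for x in [0, 1]."""
    blurred = convolve2d(x, k, mode="same", boundary="symm")  # x (conv) k
    down = blurred[::s, ::s]                                  # down-sample by s
    n = np.random.normal(0.0, noise_sigma, down.shape)        # Gaussian white noise
    return np.clip(down + n, 0.0, 1.0)

x = np.random.rand(128, 128)          # stand-in HR image
y = degrade(x, gaussian_kernel())     # synthetic LR counterpart
print(y.shape)                        # (32, 32) for s = 4
```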

Figure 1: SR result for the real HR image ‘0879’ (DIV2K). We directly reconstruct a higher-resolution image based on a real image. Our model (DNSR) can recover sharper edges and more details compared with other state-of-the-art methods.

Recently, a great number of methods have been proposed to learn the mapping between HR images and LR inputs [7, 8, 13, 25, 18, 20, 39]. Dong et al. [6] propose a CNN-based image SR framework (SRCNN), which first upsamples the LR input with bicubic interpolation and then directly learns an end-to-end mapping to restore the HR image. Kim et al. [14] stack pairs of convolutional and nonlinear layers and adopt gradient clipping to speed up training, outperforming SRCNN by a large margin thanks to stacked small filters and residual learning. Lim et al. [20] present an enhanced residual-block-based network (EDSR) without normalization layers, and introduce a multi-scale architecture (MDSR) to handle multiple scales for various SR tasks.

All the aforementioned CNN models are trained using LR images synthesized from the matched HR images. However, it is difficult to obtain a realistic LR image by directly down-sampling an HR image. To model LR images in real cases, Zhang et al. [36] propose a multiple-degradation super-resolution network (SRMD) that takes degradation maps and LR images as input to jointly consider noise, blur kernel, and down-sampler. Nevertheless, the noise level and blur kernel size are manually predefined, which weakens its ability to handle more general degradations and diverse LR images in the real world.

Since real-world LR images and artificially degraded images have different characteristics, models trained on synthesized LR images may perform poorly when applied to real-world LR images with complex combinations of noise and blur, or when the LR images are obtained with different down-sampling methods.

To address the above limitations, inspired by the success of generative adversarial networks [9] in image style translation [40], we propose a novel unsupervised cycle super-resolution framework equipped with a degradation model that generates realistic degradation patterns in LR images, which serve as input to the reconstruction network. In this way, our model is applicable to complex degradation patterns rather than simple interpolations (e.g., bicubic and nearest-neighbor). As shown in Fig. 1, we directly reconstruct a higher-resolution image from a real HR image with a scale factor of 4. Our model recovers sharper edges and more details compared with other state-of-the-art methods (DBPN [10], SRMD [36], SRGAN [18]).

This paper makes the following contributions:

  • We propose an unsupervised learning network which consists of a degradation module and a reconstruction net. The degradation module is learned to generate realistic LR images for the reconstruction net in an unsupervised way.

  • The process of generating LR images does not rely on the widely used down-sampling strategy. We introduce a structural perceptual loss in the degradation network to preserve the structural similarity between generated LR images and the corresponding HR images.

  • We develop a novel bi-cycle structure, where one cycle is designed for enforcing structural consistency between the degradation and SR reconstruction networks in an unsupervised way, and the other further stabilizes the training of SR reconstruction and degradation networks.

  • Extensive experiments on benchmark datasets and real-world images demonstrate that the proposed algorithm performs favorably against state-of-the-art SR methods.

Figure 2: Overview of the proposed DNSR network. For the cycle with blue arrows, given the input HR image $x$, $y_g$ is the realistic LR image generated by the degradation model, from which the HR image $\hat{x}$ is reconstructed by the reconstruction model; $\mathcal{L}_{rec}$ is the loss for the reconstruction model. For the cycle with green arrows, given a real-world LR image $y_r$, $\hat{x}_r$ is the HR image generated by the reconstruction model and $\hat{y}_r$ is the realistic LR image degraded from $\hat{x}_r$. The degradation discriminator enhances the probability that $\hat{y}_r$ is a real LR image. For testing, only the reconstruction model is used, with real LR images as input.

2 Related Works

In this section, we briefly review non-blind SISR and related blind SISR methods.

2.1 Non-Blind SISR

Early methods [2, 19, 37] super-resolve images based on interpolation theory. However, it is difficult for them to reconstruct detailed textures in the super-resolved results. Dong et al. [7] propose a pioneering three-layer CNN (SRCNN) for SR of bicubic up-sampled images, which spawns a series of CNN-based SISR methods with better effectiveness and higher efficiency. On the one hand, more effective CNN architectures are designed to improve SR performance, including very deep CNNs with residual learning [14], residual and dense blocks [18, 20], recursive structures [13, 28], and channel attention [38]. On the other hand, separate research efforts are devoted to improving computational efficiency by extracting deep features directly from the original LR image [8, 17, 25]. Taking both effectiveness and efficiency into account, this speed-up strategy has also been successively adopted in [18, 20, 36, 38].

Recently, SRGAN [18] and ESRGAN [32] introduce perceptual and adversarial losses into the reconstruction network. A spatial feature transform [31] is suggested to enhance texture details for photo-realistic SISR. Furthermore, CinCGAN [34] resorts to unsupervised learning with unpaired data. These methods, however, are all tailored to specific bicubic down-sampling, and usually perform poorly on real-world LR images. Although SRMD [36] can handle multiple down-samplers by taking degradation parameters as input, these parameters must be provided accurately, limiting its practical applications.

In contrast, our proposed unsupervised degradation network can effectively model complex down-samplers and degradations by learning from real-world LR training samples.

2.2 Blind SISR

Although diverse degradations exist in real SISR applications, blurring is one of the vital aspects of degradation. Several successive works [30, 22, 24] estimate blur kernels from LR images, considering both blurring and down-sampling in the degradation model. But these methods rely on hand-crafted image priors and remain limited in handling diverse degradations. Recently, motivated by CycleGAN [40], several deep CNN-based methods have been suggested to learn blind SR from unpaired HR-LR images. Yuan et al. [34] present a Cycle-in-Cycle network to learn SISR and degradation models, but the degradation model is deterministic, limiting its ability to generate diverse, real-world LR images.

Closest to ours is the work of Bulat et al. [4] in which the authors learn a high-to-low GAN to degrade and down-sample HR images, and then employ the LR-HR pairs to train a low-to-high GAN for blind SISR. Our method differs from [4] in several important ways. First, both the structural consistency between the LR and HR images, and the relationship between reconstruction and degradation are explored by our bi-cycle structure, which jointly stabilizes the training of SR reconstruction and degradation networks. Second, since there are no pairs of LR-HR images in practice, our degradation model is trained in an unsupervised way, i.e., without using paired images. We introduce unpaired real-world LR images into the GAN model for generating realistic LR images, and also exploit them to enhance the reconstruction model and degradation model jointly in a cycle.

In our bi-cycle degradation network, the bi-cycle consistency of LR and HR images stabilizes the training of both the high-to-low GAN and the low-to-high SR network, further boosting SR performance.

3 Proposed Method

In this section, we present the unsupervised degradation learning for single image super-resolution, which effectively learns to generate LR images with realistic noise and blur patterns. We refer to this framework as Degradation Network for Super-Resolution (DNSR).

3.1 Overview of DNSR

The proposed DNSR network architecture is illustrated in Fig. 2 and consists of the following three models: the degradation model, the degradation discriminator, and the reconstruction model. The degradation model aims to model the real-world degradation process from HR to LR images, and thus generates realistic LR images. The degradation discriminator is employed to ensure that the degradation patterns in the generated LR images are similar to the real case. With the generated realistic LR images and the corresponding HR images, the reconstruction model is trained to recover real structures and textures in HR images.

Specifically, given an HR image $x$ as input, the degradation model down-samples it into an LR image $y_g$; accordingly, the reconstruction model tries to recover the corresponding HR image $\hat{x}$ that approximates $x$. This process is shown as the blue cycle in Fig. 2. To fully exploit the real-world LR images $y_r$, we use them in two ways. First, they are used to train the discriminator to promote the similarity between synthesized LR images and real ones. Second, as shown by the green cycle, the real-world LR images are fed into the reconstruction model to generate synthesized HR images, which in turn act as input to the degradation model to reconstruct the original real-world LR images. This CycleGAN-inspired scheme further strengthens the coupling between the reconstruction model and the degradation model.

Different from previous work [4, 34], the LR images generated by our degradation model require no paired, manually generated LR images as ground truth.

Figure 3: The architecture of the proposed degradation model (a) and reconstruction model (b). Conv, ReLU, and SP indicate the convolution layer, activation function, and sub-pixel convolution layer, respectively.

3.2 Degradation Model

To obtain more realistic LR images, we propose to model the mapping from HR to real-world LR images by jointly using the degradation model and the degradation discriminator. The degradation discriminator aims to distinguish whether an LR image generated by the degradation model is close to real-world LR images; the degradation model in turn tries to generate more realistic images to fool the discriminator. Different from the architecture proposed in SRGAN [18], our degradation discriminator enforces the generated LR image to be similar to real-world LR images instead of synthesized LR images.

Figure 3(a) shows the architecture of the proposed degradation model. Specifically, we employ one convolution layer with the ReLU [16] activation function as the first layer, and 8 residual blocks as the middle layers. Instead of a conventional down-sampling method, we employ a strided convolution layer, with stride 2 and 4 for scale factors of 2 and 4, respectively; all convolution layers share the same kernel size and number of filters, and the convolutional layers before the last one use a stride of 1. We formulate the degradation model as

$$y_g = G_D(x), \qquad (2)$$

where $y_g$ is the LR image generated by the degradation model $G_D$ from the HR image $x$.
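A hedged Keras sketch of this degradation generator follows: one Conv+ReLU head, 8 residual blocks, and a strided convolution (stride equal to the scale factor) in place of a fixed down-sampler. The kernel size (3×3) and filter count (64) are illustrative assumptions, since the paper's exact values are not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    h = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    h = layers.Conv2D(filters, 3, padding="same")(h)
    return layers.Add()([x, h])

def build_degradation_model(scale=4, filters=64):
    inp = layers.Input(shape=(None, None, 3))                 # HR image x
    h = layers.Conv2D(filters, 3, padding="same", activation="relu")(inp)
    for _ in range(8):                                        # 8 residual blocks
        h = residual_block(h, filters)
    # Strided convolution replaces bicubic/nearest down-sampling (Eq. (2)).
    h = layers.Conv2D(filters, 3, strides=scale, padding="same")(h)
    out = layers.Conv2D(3, 3, padding="same")(h)              # generated LR y_g
    return tf.keras.Model(inp, out, name="G_D")

G_D = build_degradation_model(scale=4)
```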

The degradation model outputs the LR image $y_g$, which tries to fool the discriminator and thus induces the GAN loss

$$\mathcal{L}_{GAN} = \frac{1}{N}\sum_{n=1}^{N} -\log D\big(G_D(x_n)\big), \qquad (3)$$

where $D$ represents the degradation discriminator and $N$ is the number of input image patches. In addition, since it is difficult to preserve the structural similarity between the generated LR and HR pair using only the GAN loss, we introduce a structural perceptual loss [12] to ensure consistency in structure, defined as

$$\mathcal{L}_{per} = \frac{1}{N}\sum_{n=1}^{N} \big\|\phi(y_{g,n}) - \phi(x_n)\big\|_2^2, \qquad (4)$$

where $y_g$ is the realistic LR image generated by the degradation model, $x$ denotes the real HR image, and $\phi(\cdot)$ denotes the features taken before a maxpooling layer of the pre-trained VGG network [27]. To match the input size of the VGG19 network, $y_g$ and $x$ are scaled to the same size.

ESRGAN [32] shows the differences among features obtained from different layers of the VGG19 network: the convolution layers before the later maxpooling layers represent high-level features, while those before the earlier maxpooling layers represent low-level features that contain more edges. The features of the generated realistic LR image should be blurrier at the edges, close to real-world LR images. Hence, different from the conventional perceptual loss in ESRGAN, we choose the convolution layer before a maxpooling layer as the output of $\phi$ accordingly, since there is no need to capture fine texture details when perceiving the features of LR images in our degradation task.
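A sketch of how the structural perceptual loss of Eq. (4) could be computed with a pre-trained VGG19 follows. The specific feature layer ("block2_conv2", a convolution layer before an early maxpooling layer) is an assumption for illustration, as is the use of Keras' VGG19 preprocessing.

```python
import tensorflow as tf

# Feature extractor phi: a pre-trained VGG19 truncated at an assumed layer.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
phi = tf.keras.Model(vgg.input, vgg.get_layer("block2_conv2").output)
phi.trainable = False

def structural_perceptual_loss(y_g, x):
    """MSE between VGG19 features of the generated LR y_g and the HR x.

    Both batches (assumed RGB in [0, 255]) are resized to 224 x 224 to match
    the VGG19 input convention described in the text.
    """
    prep = tf.keras.applications.vgg19.preprocess_input
    f_y = phi(prep(tf.image.resize(y_g, (224, 224))))
    f_x = phi(prep(tf.image.resize(x, (224, 224))))
    return tf.reduce_mean(tf.square(f_y - f_x))
```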

Figure 4 shows a comparison of different types of degradation with a down-sampling scale factor of 4. The LR images generated by our model contain different patterns of noise and blur compared with those of bicubic and nearest-neighbor degradations.

Figure 4: Comparison of different degradations. The patterns of noise and blur differ between the LR images degraded by our degradation model and those produced by bicubic and nearest-neighbor degradations. For all methods, the down-sampling scale factor is 4.

3.3 Reconstruction Model

The structure of the proposed reconstruction model is shown in Figure 3(b). All convolution layers share the same kernel size and number of filters; we use a residual scaling factor of 1 when training the SR model for scale factor 2, and 0.1 [20] when training the SR model for scale factor 4. Following [25], we use sub-pixel convolution layers for up-sampling to avoid checkerboard artifacts [23]. Note that we use the realistic LR images generated by our degradation model as inputs, so that the reconstruction model learns to reconstruct HR images from real-world LR images. The reconstruction model is formulated as

$$\hat{x} = G_R(y_g), \qquad (5)$$

where $\hat{x}$ is the HR image generated by the proposed reconstruction model $G_R$. To enforce local smoothness and eliminate artifacts in restored images, we introduce a total variation loss

$$\mathcal{L}_{TV} = \frac{1}{N}\sum_{n=1}^{N} \big( \|\nabla_h \hat{x}_n\|_1 + \|\nabla_v \hat{x}_n\|_1 \big), \qquad (6)$$

where $\nabla_h$ and $\nabla_v$ are the gradients of $\hat{x}$ in the horizontal and vertical directions, respectively. In our model, we employ the $L_1$ loss as the reconstruction loss:

$$\mathcal{L}_{rec} = \frac{1}{N}\sum_{n=1}^{N} \|\hat{x}_n - x_n\|_1, \qquad (7)$$

where $x$ is the ground-truth HR image.
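To make the reconstruction side concrete, here is a hedged Keras sketch of an EDSR-style reconstruction model with sub-pixel up-sampling [25], together with the total variation and $L_1$ losses of Eqs. (6) and (7). The number of residual blocks and filters are illustrative assumptions; only the residual scaling factors (1 for ×2, 0.1 for ×4) follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64, scaling=0.1):
    h = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    h = layers.Conv2D(filters, 3, padding="same")(h)
    h = layers.Lambda(lambda t: t * scaling)(h)          # residual scaling [20]
    return layers.Add()([x, h])

def subpixel_upsample(x, filters=64):
    """Sub-pixel convolution [25]: conv to 4xC channels, then 2x pixel shuffle."""
    h = layers.Conv2D(filters * 4, 3, padding="same")(x)
    return layers.Lambda(lambda t: tf.nn.depth_to_space(t, 2))(h)

def build_reconstruction_model(scale=4, filters=64, n_blocks=8):
    scaling = 1.0 if scale == 2 else 0.1                 # per the text above
    inp = layers.Input(shape=(None, None, 3))            # realistic LR image y_g
    h = layers.Conv2D(filters, 3, padding="same", activation="relu")(inp)
    for _ in range(n_blocks):
        h = residual_block(h, filters, scaling)
    for _ in range(scale // 2):                          # one 2x stage per factor of 2
        h = subpixel_upsample(h, filters)
    out = layers.Conv2D(3, 3, padding="same")(h)         # reconstructed HR x_hat
    return tf.keras.Model(inp, out, name="G_R")

def tv_loss(x_hat):
    """Anisotropic total variation of Eq. (6): L1 norms of image gradients."""
    dh = x_hat[:, :, 1:, :] - x_hat[:, :, :-1, :]        # horizontal gradient
    dv = x_hat[:, 1:, :, :] - x_hat[:, :-1, :, :]        # vertical gradient
    return tf.reduce_mean(tf.abs(dh)) + tf.reduce_mean(tf.abs(dv))

def rec_loss(x_hat, x):
    """L1 reconstruction loss of Eq. (7)."""
    return tf.reduce_mean(tf.abs(x_hat - x))
```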

3.4 Degradation and Reconstruction Consistency

To further jointly improve the reconstruction and degradation models, we introduce a cycle consistency loss, shown as the green cycle in Figure 2. In this cycle, a real-world LR image $y_r$ is taken as the input of our reconstruction model to generate an HR image $\hat{x}_r$. Then, the degradation model degrades the generated HR image $\hat{x}_r$ to a realistic LR image $\hat{y}_r$. To ensure that the generated realistic LR image $\hat{y}_r$ is similar to the real-world LR image $y_r$, the cycle consistency loss is formulated as

$$\mathcal{L}_{cyc} = \frac{1}{N}\sum_{n=1}^{N} \big\| G_D\big(G_R(y_{r,n})\big) - y_{r,n} \big\|_1. \qquad (8)$$

To jointly account for the effects mentioned above, the loss for the degradation model is

$$\mathcal{L}_D = \mathcal{L}_{GAN} + \lambda_1 \mathcal{L}_{per} + \lambda_2 \mathcal{L}_{cyc}, \qquad (9)$$

where $\lambda_1$ and $\lambda_2$ are tradeoff factors.

Considering the cycle consistency, the loss for the reconstruction model is

$$\mathcal{L}_R = \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{cyc} + \lambda_4 \mathcal{L}_{TV}, \qquad (10)$$

where $\lambda_3$ and $\lambda_4$ are tradeoff factors. Accordingly, the proposed DNSR model optimizes the overall objective

$$\mathcal{L}_{total} = \mathcal{L}_D + \mathcal{L}_R. \qquad (11)$$
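Putting the pieces together, the sketch below combines the individual loss values into the per-model objectives; the assignment of each $\lambda$ to its term is our reading of Eqs. (9)-(11), with the values taken from Sec. 4.2.

```python
import tensorflow as tf

def cycle_loss(G_D, G_R, y_r):
    """L1 consistency between a real LR image y_r and G_D(G_R(y_r)), Eq. (8)."""
    return tf.reduce_mean(tf.abs(G_D(G_R(y_r)) - y_r))

def degradation_objective(l_gan, l_per, l_cyc, lam1=1.0, lam2=0.5):
    return l_gan + lam1 * l_per + lam2 * l_cyc           # Eq. (9)

def reconstruction_objective(l_rec, l_cyc, l_tv, lam3=1.0, lam4=0.01):
    return l_rec + lam3 * l_cyc + lam4 * l_tv            # Eq. (10)

def total_objective(l_d, l_r):
    return l_d + l_r                                     # Eq. (11)
```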

Dataset | Scale | Bicubic | LapSRN | DBPN | SRMD | SRGAN | ESRGAN | DNSR
--- | --- | --- | --- | --- | --- | --- | --- | ---
Set5 | 2 | 33.65 / 0.930 | 37.52 / 0.959 | 38.09 / 0.960 | 37.53 / 0.959 | 37.22 / 0.926 | 37.81 / 0.953 | 38.05 / 0.961
Set5 | 4 | 28.42 / 0.810 | 31.54 / 0.885 | 31.75 / 0.898 | 31.59 / 0.887 | 29.40 / 0.847 | 31.40 / 0.871 | 31.76 / 0.891
Set14 | 2 | 30.34 / 0.870 | 33.08 / 0.913 | 33.85 / 0.919 | 33.12 / 0.914 | 32.14 / 0.886 | 33.62 / 0.915 | 33.83 / 0.922
Set14 | 4 | 26.00 / 0.703 | 28.19 / 0.772 | 28.34 / 0.775 | 28.15 / 0.772 | 26.64 / 0.710 | 27.98 / 0.762 | 28.33 / 0.776
Urban100 | 2 | 26.88 / 0.841 | 30.41 / 0.910 | 33.02 / 0.931 | 31.33 / 0.920 | 31.02 / 0.895 | 32.01 / 0.913 | 32.99 / 0.928
Urban100 | 4 | 23.14 / 0.658 | 25.21 / 0.756 | 25.68 / 0.785 | 25.34 / 0.761 | 25.11 / 0.725 | 25.31 / 0.756 | 25.69 / 0.788
BSD100 | 2 | 29.56 / 0.844 | 31.80 / 0.895 | 32.27 / 0.900 | 32.05 / 0.898 | 31.89 / 0.876 | 31.99 / 0.887 | 32.24 / 0.901
BSD100 | 4 | 25.96 / 0.668 | 27.32 / 0.728 | 27.64 / 0.740 | 27.34 / 0.728 | 25.16 / 0.668 | 27.21 / 0.712 | 27.61 / 0.742
DIV2K | 2 | 31.01 / 0.939 | 34.35 / 0.942 | 34.82 / 0.947 | 34.73 / 0.940 | 33.51 / 0.939 | 33.69 / 0.941 | 34.83 / 0.944
DIV2K | 4 | 26.66 / 0.852 | 28.75 / 0.859 | 28.94 / 0.869 | 28.72 / 0.856 | 28.09 / 0.821 | 28.68 / 0.853 | 28.87 / 0.865

Table 1: SR results for bicubic degradation in terms of PSNR (dB) / SSIM. The values in red and blue indicate the best and second-best performance, respectively.

Dataset | Scale | LapSRN | DBPN | SRMD | SRGAN | ESRGAN | DNSR
--- | --- | --- | --- | --- | --- | --- | ---
Set5 | 2 | 26.23 / 0.826 | 26.12 / 0.813 | 26.18 / 0.819 | 26.19 / 0.806 | 22.56 / 0.697 | 26.25 / 0.828
Set5 | 4 | 22.34 / 0.716 | 22.15 / 0.680 | 22.28 / 0.712 | 21.79 / 0.713 | 21.53 / 0.479 | 22.37 / 0.718
Set14 | 2 | 25.19 / 0.779 | 25.15 / 0.777 | 25.17 / 0.778 | 25.16 / 0.763 | 21.45 / 0.649 | 25.21 / 0.782
Set14 | 4 | 21.62 / 0.657 | 21.56 / 0.651 | 21.57 / 0.654 | 21.02 / 0.587 | 17.12 / 0.361 | 21.65 / 0.661
Urban100 | 2 | 21.18 / 0.715 | 20.99 / 0.703 | 21.12 / 0.712 | 20.94 / 0.698 | 17.47 / 0.583 | 21.22 / 0.719
Urban100 | 4 | 16.97 / 0.455 | 16.37 / 0.439 | 16.95 / 0.438 | 16.03 / 0.398 | 12.65 / 0.204 | 17.01 / 0.457
BSD100 | 2 | 24.13 / 0.725 | 24.02 / 0.718 | 24.11 / 0.726 | 23.87 / 0.705 | 20.15 / 0.624 | 24.19 / 0.732
BSD100 | 4 | 19.01 / 0.483 | 18.53 / 0.467 | 18.85 / 0.474 | 18.29 / 0.421 | 13.90 / 0.183 | 19.06 / 0.486
DIV2K | 2 | 26.88 / 0.814 | 26.16 / 0.798 | 26.89 / 0.818 | 26.25 / 0.789 | 21.56 / 0.661 | 26.91 / 0.826
DIV2K | 4 | 22.13 / 0.579 | 21.65 / 0.569 | 22.25 / 0.587 | 21.41 / 0.531 | 15.54 / 0.216 | 22.27 / 0.593

Table 2: SR results for nearest-neighbor degradation in terms of PSNR (dB) / SSIM. The values in red and blue indicate the best and second-best performance, respectively.

4 Experiments

4.1 Training Data

We train the proposed DNSR with unpaired real-world HR and LR images. Specifically, the HR images come from the DIV2K dataset (800 training images) [1] and the Flickr2K dataset (2650 training images) collected from flickr.com, while the low-quality images come from the Widerface dataset [33], which consists of various LR images of urban human life with unknown degradation and noise. We select 1600 real-world LR images from Widerface and randomly crop each of them to the same size as the generated realistic LR images instead of manually rescaling, which preserves the original characteristics of real-world LR images.
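A small sketch of this LR-side data preparation: real-world LR images are randomly cropped to the generated-LR size rather than resampled, so their original degradation characteristics are preserved. The crop size and image size below are illustrative only.

```python
import numpy as np

def random_crop(img, size):
    """Randomly crop an HxWxC image to size x size without any resampling."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

lr_real = np.random.rand(180, 240, 3)   # stand-in Widerface LR image
patch = random_crop(lr_real, 32)        # matched to the generated LR size
```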

4.2 Training Details

As shown in Figure 2, the training process of our algorithm can be divided into three subproblems that are trained iteratively. First, we train the degradation model and the degradation discriminator with real-world HR images $x$ and LR images $y_r$. For computing $\mathcal{L}_{per}$, we scale $y_g$ and $x$ to a size of 224 × 224, the input size of the first layer of the VGG19 network. Second, we train the reconstruction model using the generated realistic LR images $y_g$. Finally, we take the real-world image $y_r$ as the input of the reconstruction model, and the generated HR image $\hat{x}_r$ is subsequently degraded to a realistic LR image $\hat{y}_r$, which is enforced to be similar to the real-world LR image $y_r$. For the parameters in Eq. (9) and Eq. (10), we set $\lambda_1$ = 1, $\lambda_2$ = 0.5, $\lambda_3$ = 1, and $\lambda_4$ = 0.01. The minibatch size is set to 16, with HR patches of a fixed size; the size of the LR images depends on the scale factor, which is set to 2 and 4, respectively. The learning rate is halved at a fixed interval of minibatch updates over the course of training. We optimize the total loss function in Eq. (11) with the ADAM optimizer [15], setting $\beta_1$ = 0.9 together with weight decay.

We implement the proposed method on the TensorFlow platform with NVIDIA TITAN X GPUs; training our model with a scale factor of 2 takes about 2 days.
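A condensed sketch of one alternating training iteration follows, assuming models G_D, G_R, and a sigmoid-output discriminator D built along the lines of the earlier sketches; the learning rate (1e-4) is an illustrative guess, while $\beta_1$ = 0.9 and the λ values follow the text. Data pipelines, the discriminator update, and the halving learning-rate schedule are omitted.

```python
import tensorflow as tf

def make_optimizer(lr=1e-4):            # lr is an assumption, not from the paper
    return tf.keras.optimizers.Adam(learning_rate=lr, beta_1=0.9)

@tf.function
def train_step(G_D, G_R, D, opt_d, opt_r, x, y_r,
               per_loss, tv_loss, lam=(1.0, 0.5, 1.0, 0.01)):
    """One joint update of the degradation and reconstruction models."""
    l1, l2, l3, l4 = lam
    with tf.GradientTape(persistent=True) as tape:
        y_g = G_D(x, training=True)                           # HR -> realistic LR
        x_hat = G_R(y_g, training=True)                       # realistic LR -> HR
        y_cyc = G_D(G_R(y_r, training=True), training=True)   # LR -> HR -> LR
        l_gan = tf.reduce_mean(-tf.math.log(D(y_g, training=True) + 1e-8))
        l_per = per_loss(y_g, x)                              # Eq. (4)
        l_cyc = tf.reduce_mean(tf.abs(y_cyc - y_r))           # Eq. (8)
        l_rec = tf.reduce_mean(tf.abs(x_hat - x))             # Eq. (7)
        loss_d = l_gan + l1 * l_per + l2 * l_cyc              # Eq. (9)
        loss_r = l_rec + l3 * l_cyc + l4 * tv_loss(x_hat)     # Eq. (10)
    opt_d.apply_gradients(zip(tape.gradient(loss_d, G_D.trainable_variables),
                              G_D.trainable_variables))
    opt_r.apply_gradients(zip(tape.gradient(loss_r, G_R.trainable_variables),
                              G_R.trainable_variables))
    return loss_d, loss_r
```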

4.3 Evaluation of Bicubic Degradation

Although our main goal is to learn a reconstruction model that can deal with real-world image super-resolution, it is difficult to obtain the ground-truth HR images needed to evaluate the results. Therefore, to verify the effectiveness of our method, we first compare our reconstruction model with other CNN-based SISR methods that are specifically designed for super-resolution under bicubic degradation. The experiments are conducted on five benchmark datasets: Set5 [3], Set14 [35], Urban100 [11], BSD100 [21], and DIV2K (100 validation images) [1]. Each image is down-sampled by bicubic degradation with scale factors of 2 and 4. Table 1 presents the quantitative results of our method and five state-of-the-art methods: LapSRN [17], DBPN [10], SRMD [36], SRGAN [18], and ESRGAN [32]. As shown, our model obtains performance competitive with DBPN, the winner of the NTIRE 2018 challenge [29] on the classic bicubic track, which is designed specifically for bicubic degradation.

4.4 Evaluation of Nearest-Neighbor Degradation

To further evaluate the effectiveness of the proposed method on LR images obtained by a different degradation method, we generate LR images by nearest-neighbor degradation with scale factors of 2 and 4, and evaluate our method on the five benchmark datasets. Table 2 shows the average performance in terms of PSNR and SSIM. Compared with LapSRN [17], DBPN [10], SRMD [36], SRGAN [18], and ESRGAN [32], our DNSR achieves the best performance on all datasets. As shown in Fig. 5, although bicubic/bilinear/nearest-neighbor interpolation obtains higher PSNR than the state-of-the-art SISR methods, the results contain obviously blurry edges and textures. There are obvious artifacts around object edges in the high-resolution images reconstructed by DBPN, LapSRN, and SRMD. Although ESRGAN generates more realistic and natural textures than SRGAN under bicubic degradation, the generated textures tend to be unrealistic under other types of degradation. In contrast, the super-resolved images generated by our algorithm have fewer artifacts and sharper textures.

Figure 5: Qualitative comparison between the proposed DNSR model and other state-of-the-art methods on nearest-neighbor down-sampled images. The values in red and blue indicate the best and second-best performance, respectively. It is observed that there are fewer artifacts in the images reconstructed by our model.

4.5 Evaluation of Real Images

Figure 6: SR results on real-world images of chip and cat. It is observed that there are sharper edges and fewer artifacts in the images reconstructed by our model.

In this section, we evaluate our reconstruction model on the real-world LR images chip and cat. Similar to the image shown in Fig. 1, both the high-resolution image and the degradation pattern for chip and cat are unknown, which makes the task rather challenging. As shown in Fig. 6, we compare our method with bicubic upsampling (as a reference), DBPN [10], ESRGAN [32], and SRMD [36]. For the real-world image chip, our DNSR recovers sharper character edges. The real-world LR image cat is a JPEG, which contains various compression artifacts along with unknown degradation and noise. As shown in Fig. 6(g)-(i), the models trained with synthetic images generate more artifacts at the edges of the cat's whiskers. Benefiting from training with generated realistic LR images, our reconstruction model outperforms the compared ones in restoring sharper edges with fewer artifacts.

5 Analysis and Discussion

5.1 Ablation Study

Figure 7: Three models used in our ablation experiments. RM, DM, and D indicate the reconstruction model, degradation model, and degradation discriminator, respectively. The definitions of the notations used here are the same as those in Fig. 2.
Figure 8: SR results on ‘0882’ (DIV2K) and ‘img_095’ (BSD100) with the different frameworks shown in Fig. 7. Our DNSR outperforms the other three frameworks, with sharper edges and more realistic colors.
Figure 9: Quantitative evaluation of several SR methods on the Set5 dataset at different noise levels.

To thoroughly investigate the effectiveness of the proposed method, we conduct ablation experiments by removing specific components for comparison, which yields the three frameworks shown in Fig. 7.

The first framework, DNSR w/o DM, trains the reconstruction model using bicubic-degraded LR images, which corresponds to minimizing only the reconstruction loss (without $\mathcal{L}_{cyc}$) in Eq. (10). The results of DNSR w/o DM still show some artifacts at edges compared with the results generated by our full reconstruction model, as shown in Fig. 8(b) and (e). The second framework, DNSR w/o D, removes the degradation discriminator and trains a reconstruction model using the realistic LR images $y_g$ generated by the degradation model and the real-world LR images $y_r$; it minimizes the total loss (without $\mathcal{L}_{GAN}$) in Eq. (11). The results are clearly unpromising, as shown in Fig. 8(c): without the degradation discriminator it is difficult for the degradation model to generate realistic LR images, so the reconstruction model performs poorly. The third framework, DNSR w/o C, removes the cycle and trains the reconstruction model using $y_g$; here we minimize the total loss without the $\mathcal{L}_{cyc}$ terms in both Eq. (9) and Eq. (10). As shown in Fig. 8(d), the generated HR images differ considerably from the original images in color, which validates the importance of the cycle training strategy in guaranteeing consistency between the reconstruction and degradation models.

5.2 Robustness to Noise

Our proposed DNSR is robust to noisy images. To evaluate this robustness, we down-sample the Set5 test images and randomly add Gaussian noise with noise levels from 1% to 7%. Fig. 9 shows quantitative results of several state-of-the-art methods on this test set with a scale factor of 4. As the noise level increases, the performance of these methods decreases to different extents. ESRGAN is notably sensitive to noise; a possible reason is that ESRGAN aims to generate realistic images by emphasizing textures without suppressing noise, so noise is amplified as if it were texture. Thanks to the unsupervised degradation network for generating realistic LR images, our method performs much better as the noise level increases.
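A small sketch of this evaluation protocol, assuming images normalized to [0, 1]; the noise level is interpreted here as the standard deviation expressed as a fraction of the dynamic range.

```python
import numpy as np

def add_noise(lr, level):
    """Add zero-mean Gaussian noise; `level` is a fraction of the [0, 1] range."""
    noisy = lr + np.random.normal(0.0, level, lr.shape)
    return np.clip(noisy, 0.0, 1.0)

lr = np.random.rand(32, 32, 3)   # stand-in down-sampled test image
noisy_sets = {f"{int(p * 100)}%": add_noise(lr, p)
              for p in (0.01, 0.03, 0.05, 0.07)}
```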

5.3 Running Time

Method | Platform | Time (s)
--- | --- | ---
LapSRN | MATLAB | 8.433
SRMD | MATLAB | 0.697
DBPN | PyTorch | 21.64
ESRGAN | PyTorch | 3.31
SRGAN | TensorFlow | 0.657
DNSR | TensorFlow | 0.645

Table 3: Average running time (in seconds) on the DIV2K dataset.

Table 3 shows the time cost of different methods on the DIV2K benchmark. We test all methods on NVIDIA TITAN X GPUs with a scale factor of 4. Although the implementation platforms (i.e., MATLAB, PyTorch, and TensorFlow) differ in computing capability, our method clearly achieves promising efficiency.

6 Conclusion

In this paper, we propose an unsupervised degradation network for single image super-resolution that does not need paired, manually generated low-resolution images as ground truth. The proposed method jointly learns to generate realistic low-resolution images with a degradation model and to super-resolve high-resolution images with a reconstruction network trained on those generated low-resolution images, which endows the proposed DNSR with the ability to super-resolve real-world low-resolution images. Extensive experiments show that our model outperforms state-of-the-art algorithms in reconstructing real-world images. In the future, we will design more powerful networks and employ more diverse data to further improve performance.

References

  • [1] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR, 2017.
  • [2] J. Allebach and P. W. Wong. Edge-directed interpolation. In ICIP, 1996.
  • [3] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. BMVC, 2012.
  • [4] A. Bulat, J. Yang, and G. Tzimiropoulos. To learn image super-resolution, use a gan to learn how to do image degradation first. In ECCV, 2018.
  • [5] D. Capel and A. Zisserman. Super-resolution enhancement of text image sequences. In PR, 2000.
  • [6] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
  • [7] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 2016.
  • [8] C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In ECCV, 2016.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
  • [10] M. Haris, G. Shakhnarovich, and N. Ukita. Deep back-projection networks for super-resolution. In CVPR, 2018.
  • [11] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
  • [12] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
  • [13] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016.
  • [14] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
  • [15] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
  • [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • [17] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, 2017.
  • [18] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
  • [19] X. Li and M. T. Orchard. New edge-directed interpolation. TIP, 2001.
  • [20] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In CVPR, 2017.
  • [21] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
  • [22] T. Michaeli and M. Irani. Nonparametric blind super-resolution. In ICCV, 2013.
  • [23] A. Odena, V. Dumoulin, and C. Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
  • [24] W.-Z. Shao and M. Elad. Simple, accurate, and robust nonparametric blind super-resolution. In ICIG, 2015.
  • [25] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.
  • [26] W. Shi, J. Caballero, C. Ledig, X. Zhuang, W. Bai, K. Bhatia, A. M. S. M. de Marvao, T. Dawes, D. O’Regan, and D. Rueckert. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch. In MICCAI, 2013.
  • [27] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
  • [28] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In CVPR, 2017.
  • [29] R. Timofte, S. Gu, J. Wu, and L. Van Gool. NTIRE 2018 challenge on single image super-resolution: Methods and results. In CVPR, 2018.
  • [30] Q. Wang, X. Tang, and H. Shum. Patch based blind image super resolution. In ICCV, 2005.
  • [31] X. Wang, K. Yu, C. Dong, and C. C. Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR, 2018.
  • [32] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C. Loy, Y. Qiao, and X. Tang. ESRGAN: Enhanced super-resolution generative adversarial networks. In ECCV, 2018.
  • [33] S. Yang, P. Luo, C.-C. Loy, and X. Tang. WIDER FACE: A face detection benchmark. In CVPR, 2016.
  • [34] Y. Yuan et al. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In CVPR, 2018.
  • [35] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International conference on curves and surfaces, 2010.
  • [36] K. Zhang, W. Zuo, and L. Zhang. Learning a single convolutional super-resolution network for multiple degradations. In CVPR, 2018.
  • [37] L. Zhang and X. Wu. An edge-guided image interpolation algorithm via directional filtering and data fusion. TIP, 2006.
  • [38] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.
  • [39] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In CVPR, 2018.
  • [40] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
  • [41] W. W. Zou and P. C. Yuen. Very low resolution face recognition problem. TIP, 2012.