Natural and Realistic Single Image Super-Resolution with Explicit Natural Manifold Discrimination

11/09/2019 ∙ by Jae Woong Soh, et al. ∙ Seoul National University

Recently, many convolutional neural networks for single image super-resolution (SISR) have been proposed, which focus on reconstructing the high-resolution images in terms of objective distortion measures. However, networks trained with objective loss functions generally fail to reconstruct the realistic fine textures and details that are essential for better perceptual quality. Recovering realistic details remains a challenging problem, and only a few works aim at increasing the perceptual quality by generating enhanced textures. However, the generated fake details often introduce undesirable artifacts, and the overall image looks somewhat unnatural. Therefore, in this paper, we present a new approach to reconstructing realistic super-resolved images with high perceptual quality while maintaining the naturalness of the result. In particular, we focus on the domain prior properties of the SISR problem. Specifically, we define the naturalness prior in the low-level domain and constrain the output image to the natural manifold, which eventually generates more natural and realistic images. Our results show better naturalness compared to recent super-resolution algorithms, including perception-oriented ones.


1 Introduction

Single image super-resolution (SISR) is a classical image restoration problem which aims to recover a high-resolution (HR) image from the corresponding low-resolution (LR) image. In SISR problems, the given image is usually assumed to be a low-pass filtered and downsampled version of an HR image. Hence, recovering the HR is an ill-posed problem since multiple HR images can correspond to one LR image. That is, the SISR is a challenging one-to-many problem which attracted researchers to find many interesting solutions and applications, and thus numerous algorithms have been proposed so far.

Recently, convolutional neural networks (CNNs) have shown great success in most computer vision areas including the SISR. In typical CNN-based SISR methods, distortion-oriented loss functions are considered. Specifically, the CNNs attempt to achieve a higher peak signal-to-noise ratio (PSNR), i.e., low distortion in terms of mean squared error (MSE). There have been many distortion-oriented CNNs for SISR [5, 18, 30, 19, 21, 32, 36, 23, 42, 15, 11], and the performance of SISR keeps increasing as researchers create innovative architectures and as the feasible depth and connectivity of the networks grow. However, they yield somewhat blurry results and do not recover the fine details even with very deep and complex networks. This is because the results of distortion-oriented models are the average of possible HR images.
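One way to make the averaging argument concrete is the following short derivation. It is a sketch under an assumption not spelled out in the text: a distortion-oriented network $f$ is trained to minimize the expected squared error over all HR images $y$ that are consistent with an LR input $x$.

$$
f^*(x) = \arg\min_{\hat{y}} \mathbb{E}\left[\lVert y - \hat{y} \rVert_2^2 \mid x\right],
\qquad
\nabla_{\hat{y}}\, \mathbb{E}\left[\lVert y - \hat{y} \rVert_2^2 \mid x\right] = 2\left(\hat{y} - \mathbb{E}[y \mid x]\right) = 0
\;\Rightarrow\;
f^*(x) = \mathbb{E}[y \mid x].
$$

The minimizer is the conditional mean of the plausible HR images, so sharp but mutually inconsistent textures average out into a smooth estimate.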

To resolve the above-stated issues, perception-oriented models have also been proposed for obtaining HR images with better perceptual quality. For example, the perceptual loss was introduced in [16], which is defined as the distance in the feature domain. More recently, SRGAN [22] and EnhanceNet [29] have been proposed for producing better perceptual quality. The SRGAN employed generative models, particularly the generative adversarial nets (GAN) [8], and adopted the perceptual loss. The EnhanceNet added a texture loss [7] for better texture reconstruction. However, they sometimes generate unpleasant and unnatural artifacts along with the reconstructed details.

There have also been some methods that consider the naturalness of super-resolved images. One of these approaches is to implicitly supervise the naturalness through the refined dataset. Specifically, as the CNN is very sensitive to the training dataset, several methods [23, 42] considered using the refined dataset. For example, patches with low gradient magnitudes are discarded from the training dataset, which provides better naturalness implicitly. This approach might increase the PSNR performance by constraining the possible HR space to the rich-textured one. Another approach is to provide explicit supervision by conditioning the feature spaces. For example, the recently developed SFT-GAN [37] has shown great perceptual quality by constraining the features with its high-level semantics while adopting the adversarial loss. However, its practical usage is limited because it requires the categorical prior, and also it is limited to the categories which are included in the training process. For the out-of-category inputs, this framework is the same as SRGAN [22]. Moreover, SFT-GAN strongly relies on the ability of the adopted semantic segmentation method because the wrong designation of semantics might cause worse perceptual quality.

For obtaining realistic and natural perceptual quality HR images, we propose a new SISR approach which constrains the low-level domain prior instead of high-level semantics. For this, we first investigate the process and the domain knowledge of SISR. By exploiting the domain knowledge, we explicitly model the HR space of the corresponding LR image, and build a discriminator which determines the decision boundary between the natural manifold and the unnatural manifold. By constraining the output image to the natural manifold, our generative model can target only one of the multi-modal outputs in the desired target space. As a result, our method shows fewer artifacts than other perception-oriented methods, as shown in Figure 1.

In summary, the main contributions of this paper are as follows.

  • We model the SISR problem explicitly and investigate the desirable HR space.

  • We design a CNN-based natural manifold discriminator and show our model is reasonable.

  • We adopt a CNN structure with fractal residual learning (FRL) and demonstrate a distortion-oriented model named fractal residual super-resolution (FRSR), which achieves comparable results to recent CNNs.

  • We propose a perception-oriented SISR method named as natural and realistic super-resolution (NatSR), which generates realistic textures and natural details effectively while achieving high perceptual quality.

The rest of this paper is organized as follows. In Sec. 3, we explicitly model the LR-HR space and the SISR problem, and investigate its inherent properties. Then in Sec. 4, we divide the target HR space into three disjoint sets, two of which are in the unnatural manifold and one of which is in the natural manifold. In Sec. 5, we present our main method, the NatSR, and in Sec. 6 we discuss and analyze its feasibility in several ways. The experimental results are shown in Sec. 7.

2 Related Work

2.1 Single Image Super-Resolution

The conventional non-CNN methods mainly focused on the domain and feature priors. Early methods explored the domain priors to predict missing pixels. For example, interpolation methods such as bicubic and Lanczos generate the HR pixels by the weighted average of neighboring LR pixels. Later, priors such as edge features, gradient features [33, 31] and internal non-local similarity [14] were investigated. Also, dictionary learning and sparse coding methods were exploited for the SISR [40, 6, 39, 35]. Recently, it has been shown that CNN-based methods outperform the earlier non-CNN algorithms, showing a great breakthrough in accuracy. These CNN-based methods implicitly adopt image and domain priors which are inscribed in training datasets. The SRCNN [5] was the first CNN-based method, which uses three convolution layers, and many other works with deeper and heavier structures have been proposed afterward [18, 30, 19, 32, 21, 36, 23, 42, 15, 11]. All these methods are discriminative and distortion-oriented approaches, which aim to achieve higher PSNR.

2.2 Perception Oriented Super-Resolution

A problem of distortion-oriented models that recently drew the attention of researchers is that the super-resolved results often lack the high-frequency details and are not perceptually satisfying. Also, Blau et al. [4] showed that there is a trade-off between the perceptual quality and distortion, and some perception-oriented models have been proposed accordingly. For example, Johnson et al. [16] have shown that the loss in the pixel domain is not optimal for the perceptual quality, and instead, the loss in the feature space might be closer to the human perception model. Then, Ledig et al. [22] introduced the SRGAN, which adopted the generative model with GAN [8] and employed the perceptual loss as in [16]. Hence, unlike the distortion-oriented methods that produce the average of possible HR images, the SRGAN generates one of the candidates in the multi-modal target HR space. EnhanceNet [29] goes one step further by exploiting the texture loss [7] to better produce image details. However, due to the inherent property of the one-to-many inverse problem, it is required to consider the semantics for the generated pixels. In this respect, SFT-GAN [37] restricts the feature space by conditioning on the semantic categories of target pixels.

3 Modeling the SISR

Figure 2: A simple explanation of LR-HR relationship and SISR in the frequency domain. Panels: (a) HR, (b) LR, (c) Blurry HR, (d) Noisy HR.

In this section, we explicitly define and model the LR-HR space and the SISR problem. First of all, let us define the LR image $I_{LR}$ as the low-pass filtered and downsampled HR image $I_{HR}$. Formally, the LR-HR relation is described as

$$I_{LR} = h(I_{HR})\downarrow, \tag{1}$$

where $h(\cdot)$ denotes a low-pass filter and $\downarrow$ denotes downsampling. Figure 2(a) and Figure 2(b) show a simple explanation of HR and LR correspondence in the frequency domain where we assume that the spatial domain is infinite. Both Figure 2(c) and Figure 2(d) are possible HRs for the corresponding LR in Figure 2(b), and moreover, there can be an infinite number of possible HRs that have the same low-frequency components but different high-frequency parts (denoted noisy in Figure 2(d)). As the SISR is to find an HR for the given LR, it is usually modeled as finding the conditional likelihood $p(I_{HR} \mid I_{LR})$. Due to its one-to-many property, it is better to model it as a generative model rather than a discriminative one.
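The one-to-many property can be illustrated with a toy 1-D example. This is only a sketch: the hypothetical `degrade` helper uses a block average as a stand-in for the low-pass filter and downsampling (the paper itself uses bicubic kernels on images), and the signals are made up for illustration.

```python
def degrade(hr, scale=2):
    """Toy stand-in for low-pass filtering plus downsampling:
    average each block of `scale` samples and keep one value per block."""
    if len(hr) % scale != 0:
        raise ValueError("signal length must be a multiple of scale")
    return [sum(hr[i:i + scale]) / scale for i in range(0, len(hr), scale)]


# Two different 1-D "HR" signals (illustrative values only).
hr_sharp = [1.0, 3.0, 5.0, 7.0]
hr_flat = [2.0, 2.0, 6.0, 6.0]

lr_from_sharp = degrade(hr_sharp)
lr_from_flat = degrade(hr_flat)

# Both HR candidates collapse to the same LR observation,
# so the LR signal alone cannot tell them apart.
print(lr_from_sharp, lr_from_flat)
```

Any method that maps the LR signal back to HR has to pick one of these candidates, or some blend of them, which is the ambiguity the rest of the section formalizes.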

4 Natural Manifold Discrimination

4.1 Designing Natural Manifold

Figure 3: Our proposed LR-HR model of the natural manifold and its discrimination for SISR. $U$ is the image space, $V$ is the possible HR space, and $A$, $B$, and $N$ are three disjoint sets of $V$. $\alpha$ and $\sigma$ control the boundary between the manifolds.

We now go into the real situation to find the natural manifold. Figure 3 shows our LR-HR image space modeling, where $U$ is the overall image set with height $H$, width $W$, and channel $C$ with the normalized pixel value. For a certain $I_{LR}$, $V$ is the space whose elements all result in the same $I_{LR}$ by the low-pass filtering and downsampling. Conversely, an LR image is mapped to an element in $V$ by any SR method. We may also interpret the early CNNs with our LR-HR model. For the distortion-oriented models, the output is the average of the elements in the HR space, i.e., $\sum_i w_i I_{HR}^i$ where $I_{HR}^i \in V$, for some $i$ and weights $w_i$, and thus the result is blurry. To alleviate this problem, some methods [23, 42] refined the training set. Specifically, they discarded the training patches with low gradient magnitudes, which gives implicit constraints on the candidate $I_{HR}^i$'s to keep the resulting outputs away from the blurry images.

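To model the natural manifold, we divide $V$ into three disjoint sets as illustrated in Figure 3. The first one is the blurry set $A$, the elements of which are modeled as the convex combination of interpolated LR and the original HR. Specifically, the set is defined as

$$A = \{ I_A \mid I_A = (1-\alpha)\, h(I_{LR}\uparrow) + \alpha I_{HR} \}, \tag{2}$$

where $h(\cdot)$ is the same low-pass filter as in eq. (1), and $\uparrow$ denotes upsampling with zero insertion between original values. Hence, $h(I_{LR}\uparrow)$ corresponds to Figure 2(c), which also means the interpolation of $I_{LR}$ to the size of $I_{HR}$. Also, $\alpha \in [0, 1]$ is a hyper-parameter which decides the decision boundary between the sets $A$ and $N$, i.e., between Figure 2(c) and Figure 2(a). We can easily show that the $I_A$ defined above is also an element of $V$, i.e., $A \subset V$. To be specific, if we apply low-pass filtering and downsampling to $I_A$, it becomes an LR as follows:

$$
\begin{aligned}
h(I_A)\downarrow &= h\big((1-\alpha)\, h(I_{LR}\uparrow) + \alpha I_{HR}\big)\downarrow && (3)\\
&= \big((1-\alpha)\, h(h(I_{LR}\uparrow)) + \alpha\, h(I_{HR})\big)\downarrow && (4)\\
&= (1-\alpha)\, h(h(I_{LR}\uparrow))\downarrow + \alpha\, h(I_{HR})\downarrow && (5)\\
&= (1-\alpha)\, h(I_{LR}\uparrow)\downarrow + \alpha I_{LR} && (6)\\
&= (1-\alpha)\, I_{LR} + \alpha I_{LR} && (7)\\
&= I_{LR}. && (8)
\end{aligned}
$$

The steps use the linearity of $h$ and of downsampling, eq. (1), and the idealized frequency-domain picture of Figure 2, in which the low-pass filter is idempotent and downsampling the interpolated LR returns $I_{LR}$. Hence, from eq. (1), it is shown that $I_A \in V$. In other words, the weighted sum of Figure 2(c) and Figure 2(a) is of course in $V$.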
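The membership argument for the blurry set can be checked numerically on a toy signal. The sketch below rests on simplifying assumptions: the hypothetical `degrade` and `upsample` helpers use a block average and sample repetition in place of the bicubic low-pass filter and zero-insertion interpolation, and `alpha` plays the role of the mixing hyper-parameter.

```python
import math


def degrade(hr, scale=2):
    """Toy low-pass + downsample: one block average per `scale` samples."""
    return [sum(hr[i:i + scale]) / scale for i in range(0, len(hr), scale)]


def upsample(lr, scale=2):
    """Toy interpolation: repeat each LR sample `scale` times."""
    return [v for v in lr for _ in range(scale)]


def blurry_sample(hr, alpha, scale=2):
    """Convex combination of the interpolated LR and the original HR."""
    interp = upsample(degrade(hr, scale), scale)
    return [(1.0 - alpha) * b + alpha * h for b, h in zip(interp, hr)]


hr = [1.0, 3.0, 5.0, 7.0]
lr = degrade(hr)

# For every mixing weight, degrading the blurry sample returns the same LR.
for alpha in (0.0, 0.5, 0.8, 1.0):
    i_a = blurry_sample(hr, alpha)
    consistent = all(math.isclose(a, b) for a, b in zip(degrade(i_a), lr))
    print(alpha, i_a, consistent)
```

With `alpha = 0` the sample is the fully blurred interpolation and with `alpha = 1` it is the original HR, so the weight moves the sample along the segment between the two.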

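The second set to consider is the noisy set $B$, which contains the images like Figure 2(d). Specifically, we can model the set as:

$$B = \{ I_B \mid I_B = I_{HR} + n \}, \tag{9}$$

where $n$ is the noise in the high-frequency band, with standard deviation $\sigma$. We can also see that $B \subset V$, because

$$
\begin{aligned}
h(I_B)\downarrow &= h(I_{HR} + n)\downarrow && (10)\\
&= \big(h(I_{HR}) + h(n)\big)\downarrow && (11)\\
&= h(I_{HR})\downarrow + h(n)\downarrow && (12)\\
&= I_{LR} + 0 && (13)\\
&= I_{LR}, && (14)
\end{aligned}
$$

where $h(n) = 0$ since the noise lies entirely in the high-frequency band removed by the low-pass filter.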
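The same kind of toy check works for the noisy set. In this sketch the hypothetical `high_freq_noise` helper builds noise whose average over each block is zero, which makes it invisible to the block-average stand-in for the low-pass filter; `sigma` only sets the scale of the noise before the block mean is removed, so it is not the exact standard deviation of the result.

```python
import math
import random


def degrade(hr, scale=2):
    """Toy low-pass + downsample: one block average per `scale` samples."""
    return [sum(hr[i:i + scale]) / scale for i in range(0, len(hr), scale)]


def high_freq_noise(n_blocks, sigma, scale=2, seed=0):
    """Noise with zero mean inside every block, so the toy filter removes it."""
    rng = random.Random(seed)
    noise = []
    for _ in range(n_blocks):
        block = [rng.gauss(0.0, sigma) for _ in range(scale)]
        mean = sum(block) / scale
        noise.extend(v - mean for v in block)
    return noise


hr = [1.0, 3.0, 5.0, 7.0]
lr = degrade(hr)

noise = high_freq_noise(n_blocks=len(lr), sigma=0.1)
noisy_hr = [h + n for h, n in zip(hr, noise)]

# The noisy signal differs from the HR signal but maps to the same LR.
print(noisy_hr)
print(degrade(noisy_hr), lr)
```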

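Also, $I_B$ can be interpreted as the convex combination of $I_{HR}$ and $I_{HR} + n/\beta$ (weighted sum of Figure 2(a) and Figure 2(d)), because

$$
\begin{aligned}
I_B &= I_{HR} + n && (15)\\
&= (1-\beta)\, I_{HR} + \beta I_{HR} + n && (16)\\
&= (1-\beta)\, I_{HR} + \beta \left(I_{HR} + \tfrac{n}{\beta}\right), && (17)
\end{aligned}
$$

where $0 < \beta \le 1$.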

Figure 4: DCT coefficients of bicubic up/downsampling kernels for the scaling factor of ×4. Panels (a) and (b) each show a DCT plot.

The blurry and noisy are used for training our natural manifold discriminator that will be explained in the next subsection. In practice, we perform the noise injection in the frequency domain using 2D-discrete cosine transform (DCT). We set the low-pass filter for up/downsampling in eq.(1) and eq.(2) as the bicubic filter, and its DCT is shown in Figure 4. To generate a wide range of noisy images, we inject the noise into the last column and row. In the experiments, we use the 2D-DCT for brevity.
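Noise injection in the frequency domain can be sketched in one dimension as follows. This is an illustration under stated assumptions, not the paper's procedure: it uses a hand-written orthonormal 1-D DCT-II instead of a 2D-DCT on image patches, and the hypothetical `num_high` parameter chooses how many of the highest-frequency coefficients receive Gaussian noise.

```python
import math
import random


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for coefficient k of n."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of the orthonormal DCT-II (a DCT-III with the same scaling)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def inject_high_freq_noise(x, sigma, num_high=2, seed=0):
    """Add Gaussian noise to the `num_high` highest-frequency DCT coefficients."""
    rng = random.Random(seed)
    coeffs = dct(x)
    for k in range(len(coeffs) - num_high, len(coeffs)):
        coeffs[k] += rng.gauss(0.0, sigma)
    return idct(coeffs)


signal = [1.0, 3.0, 5.0, 7.0, 6.0, 4.0, 2.0, 0.0]
noisy = inject_high_freq_noise(signal, sigma=0.1, num_high=2)

# Low-frequency coefficients are untouched; only the last two change.
before, after = dct(signal), dct(noisy)
print([round(a - b, 6) for a, b in zip(after, before)])
```

Because the transform is orthonormal, the perturbation stays confined to the chosen coefficients after the round trip through the spatial domain.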

4.2 Natural Manifold Discriminator

To narrow the target space to the natural manifold, we design a discriminator that differentiates the natural images (the elements that belong to $N$ in Figure 3) from the blurry/noisy ones ($A$ or $B$). For this, we design a CNN-based classifier that discriminates $N$ (natural manifold) and $A \cup B$ (unnatural manifold), which will be called the natural manifold discriminator (NMD). The training is performed with the sigmoid binary cross entropy loss function defined as

$$\mathcal{L}_{NMD} = -\mathbb{E}_{x \in A \cup B}\left[\log\left(1 - D_{NM}(x)\right)\right] - \mathbb{E}_{x \in N}\left[\log D_{NM}(x)\right], \tag{18}$$

where $D_{NM}(\cdot)$ denotes the output sigmoid value of NMD. For the expectation, we use the empirical mean of the training dataset. The network architecture of our NMD is shown in Figure 5, which is a simple VGG-style CNN. Fully-connected layers are not used for the last stage in our case. Instead, one convolution layer and global average pooling are used.
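The binary cross entropy objective for the discriminator can be written out on plain lists of sigmoid outputs. This is a minimal sketch: the function name `nmd_loss`, the clipping constant `eps`, and the example scores are assumptions for illustration, and the real training operates on image batches inside a deep learning framework.

```python
import math


def nmd_loss(scores_natural, scores_unnatural, eps=1e-12):
    """Sigmoid binary cross entropy with natural samples labeled 1
    and blurry/noisy samples labeled 0. Inputs are sigmoid outputs in (0, 1)."""
    natural_term = sum(-math.log(max(s, eps)) for s in scores_natural) / len(scores_natural)
    unnatural_term = sum(
        -math.log(max(1.0 - s, eps)) for s in scores_unnatural
    ) / len(scores_unnatural)
    return natural_term + unnatural_term


# A confident, correct discriminator has a lower loss than a hesitant one.
confident = nmd_loss([0.99, 0.98], [0.01, 0.02])
hesitant = nmd_loss([0.6, 0.55], [0.4, 0.45])
print(confident, hesitant)
```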

Figure 5: Our NMD network architecture.

For the training, we start from and . We update both hyper-parameters according to the average of 10 validation accuracies (AVA). When it reaches above , we update and following the rules below:

if (19)
(20)
if (21)
(22)

We stop training with the final and equal to and , respectively.

5 Natural and Realistic Super-Resolution

In this section, we explain the proposed natural and realistic super-resolution (NatSR) generator model and the training loss function.

5.1 Network Architecture

Figure 6: Our NatSR network architecture. We adopt fractal residual learning for mid- and long-path skip connection and employ the residual dense block (RDBlock) for short-path connection.
Figure 7: Residual Dense Block (RDBlock) that we employ for our NatSR.

The overall architecture of our NatSR is shown in Figure 6, which takes the LR image as the input and generates the SR output. As shown in the figure, our network is based on residual learning, which has long been used as a basic technique to mitigate the degradation problem in very deep networks. Typically, two types of residual learning are used: local residual learning (LRL), which bypasses the input to the output in a local range [12], and global residual learning (GRL), which provides the skip-connection between the input and the output in a global scale of the network [18]. Former approaches [18, 10] have shown that learning the sparse features is much more effective than learning the pixel domain values directly. Hence, recent models adopt both local residual learning (short-path) and global residual learning (long-path) [22, 23, 42].

Inspired by former studies, we adopt the connection scheme shown in Figure 6, named the fractal residual learning (FRL) structure in that the connection has a fractal pattern. Also, as a basic building block of our NatSR, we employ the residual dense block (RDBlock) [42] shown in Figure 7, and adopt the residual scaling [23] in our RDBlock. By using the FRL and RDBlock, skip-connections of all ranges, from short-path to long-path, can be employed.

As a discriminator for GAN, we apply a network architecture similar to that of the NMD. Instead of using only convolution layers, we adopt spectral normalization [28] to make the discriminator satisfy the Lipschitz condition. Also, we use strided convolutions instead of max-pooling layers. Specific architecture details are provided in the supplementary material.

5.2 Training Loss Function

5.2.1 Reconstruction Loss

To model the $p(I_{HR} \mid I_{LR})$, we adopt the pixel-wise reconstruction loss, specifically the mean absolute error (MAE) between the ground-truths and the super-resolved images:

$$\mathcal{L}_{Recon} = \mathbb{E}\left[\lVert I_{HR} - I_{SR} \rVert_1\right], \tag{23}$$

where $I_{SR}$ denotes the super-resolved output. Although perception-oriented models typically apply perceptual losses, we do not adopt such losses, because we found in our experiments that the perceptual loss causes undesirable artifacts. To boost high-frequency details, we instead use our NMD as a solution.

5.2.2 Naturalness Loss

We design the naturalness loss based on our pre-trained natural manifold discriminator (NMD). To concentrate the target manifold within the natural manifold, the output of NMD should be nearly 1. We may use the negative of the sigmoid output as the loss, but we use its log-scale to boost the gradients:

$$\mathcal{L}_{Natural} = \mathbb{E}\left[-\log\left(D_{NM}(I_{SR})\right)\right], \tag{24}$$

where $D_{NM}(\cdot)$ denotes the output sigmoid value of NMD.

5.2.3 Adversarial Loss

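As it is well known that GANs are hard to train and unstable, there have been many variations of GANs [43, 2, 9, 25, 17]. Recently, a GAN with a relativistic discriminator has been proposed [17], which shows quite robust results compared with the standard GAN [8] in generating fake images in terms of Fréchet Inception Distance [13]. Thus, we employ the relativistic average GAN (RaGAN) for our adversarial training, which is described as:

$$\mathcal{L}_{D} = -\mathbb{E}_{x_r \sim P_r}\left[\log \tilde{D}(x_r)\right] - \mathbb{E}_{x_f \sim P_f}\left[\log\left(1 - \tilde{D}(x_f)\right)\right], \tag{25}$$

$$\mathcal{L}_{G} = -\mathbb{E}_{x_f \sim P_f}\left[\log \tilde{D}(x_f)\right] - \mathbb{E}_{x_r \sim P_r}\left[\log\left(1 - \tilde{D}(x_r)\right)\right], \tag{26}$$

where $P_r$ and $P_f$ are distributions of HR and SR respectively, $x_r$ and $x_f$ mean real and fake data respectively, and

$$\tilde{D}(x_r) = \mathrm{sigmoid}\left(C(x_r) - \mathbb{E}_{x_f \sim P_f}\left[C(x_f)\right]\right), \tag{27}$$

$$\tilde{D}(x_f) = \mathrm{sigmoid}\left(C(x_f) - \mathbb{E}_{x_r \sim P_r}\left[C(x_r)\right]\right), \tag{28}$$

where $C(\cdot)$ denotes the output logit of the discriminator. In our case, the motivation of the RaGAN discriminator is to measure “the probability that the given image is closer to real HR images than the generated SR images on average.”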
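The relativistic average formulation can be sketched on plain lists of discriminator logits. The code below follows the standard RaGAN definition from [17] as understood here; the function names, the clipping constant `eps`, and the example logits are assumptions for illustration rather than the authors' implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean(values):
    return sum(values) / len(values)


def ragan_losses(real_logits, fake_logits, eps=1e-12):
    """Return (discriminator_loss, generator_loss) for one batch of logits."""
    mean_real, mean_fake = mean(real_logits), mean(fake_logits)
    # Each score is relative to the average logit of the opposite class.
    d_real = [sigmoid(c - mean_fake) for c in real_logits]
    d_fake = [sigmoid(c - mean_real) for c in fake_logits]

    def log(p):
        return math.log(max(p, eps))

    loss_d = -mean([log(p) for p in d_real]) - mean([log(1.0 - p) for p in d_fake])
    loss_g = -mean([log(p) for p in d_fake]) - mean([log(1.0 - p) for p in d_real])
    return loss_d, loss_g


# When real logits are clearly higher than fake ones, the discriminator
# loss is small and the generator loss is large.
loss_d, loss_g = ragan_losses([2.0, 3.0], [-2.0, -3.0])
print(loss_d, loss_g)
```

Unlike the standard GAN loss, the generator term also depends on the real samples, since their relative scores fall as the generated ones rise.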

5.2.4 Overall Loss

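The overall loss term to train our NatSR is defined as the weighted sum of loss terms defined above:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{Recon} + \lambda_2 \mathcal{L}_{Natural} + \lambda_3 \mathcal{L}_{G}. \tag{29}$$

As our baseline, we train the distortion-oriented model where $\lambda_2 = \lambda_3 = 0$, which means that the overall loss is just the reconstruction loss $\mathcal{L}_{Recon}$. We name our baseline model the fractal residual super-resolution network (FRSR). For our NatSR, which is perception-oriented, we use the full loss above with nonzero $\lambda_1$, $\lambda_2$, and $\lambda_3$.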
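How the three terms combine can be sketched with scalar toy inputs. Everything named here is illustrative: `overall_loss` and its arguments are hypothetical, the adversarial term is passed in as a precomputed number, and the weights `(1.0, 0.1, 0.1)` are placeholder values chosen only for the demonstration, not the weights used for NatSR.

```python
import math


def mae(hr, sr):
    """Mean absolute error between ground truth and super-resolved samples."""
    return sum(abs(h - s) for h, s in zip(hr, sr)) / len(hr)


def naturalness_loss(nmd_scores, eps=1e-12):
    """Mean negative log of the NMD sigmoid outputs on super-resolved images."""
    return sum(-math.log(max(s, eps)) for s in nmd_scores) / len(nmd_scores)


def overall_loss(hr, sr, nmd_scores, gen_adv_loss, weights):
    """Weighted sum of reconstruction, naturalness, and adversarial terms."""
    w_recon, w_natural, w_adv = weights
    return (
        w_recon * mae(hr, sr)
        + w_natural * naturalness_loss(nmd_scores)
        + w_adv * gen_adv_loss
    )


hr = [1.0, 2.0, 3.0]
sr = [1.5, 2.0, 2.5]
nmd_scores = [0.9, 0.8]  # toy NMD outputs for two SR images
gen_adv_loss = 0.7       # toy generator adversarial loss value

# Baseline (FRSR-style) setting: only the reconstruction term is active.
baseline = overall_loss(hr, sr, nmd_scores, gen_adv_loss, weights=(1.0, 0.0, 0.0))
# Perception-oriented setting with placeholder weights.
full = overall_loss(hr, sr, nmd_scores, gen_adv_loss, weights=(1.0, 0.1, 0.1))
print(baseline, full)
```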

6 Discussion and Analysis

6.1 Effectiveness of Proposed Discriminator

To demonstrate the meaning and effectiveness of our NMD, we test the NMD scores for the perception-oriented methods such as SRGAN variants [22], EnhanceNet, NatSR, and also for the distortion-oriented methods including our FRSR. Table 1 shows the results on BSD100 [26], where the NMD is designed to output score 1 when the input image is close to the natural original image, and output lower score when the input is blurry or noisy. We can see that previous perception-oriented methods score between and which means that they lie near the boundary of the natural and unnatural manifold in our LR-HR model. Also, the original HR scores and bicubic interpolation scores , which means that our NMD discriminates HR and LR with high confidence. Additionally, SRResNet, EDSR, and our FRSR, which are distortion-oriented, score almost . We may interpret the result that the distortion-oriented methods produce the image which also lie on the blurry manifold. On the other hand, our NatSR results in the scores close to which is much higher than the other perception-oriented algorithms. In summary, it is believed that our model of natural manifold and NMD are reasonable, and the NMD well discriminates the natural and unnatural manifold.

Method NMD Score
HR
Bicubic
SRResNet
EDSR
FRSR (Ours)
SRGAN-MSE
SRGAN-VGG22
SRGAN-VGG54
EnhanceNet-PAT
NatSR (Ours)
Table 1: Results of NMD score.

6.2 Study on the Plausibility of SR Images

As we approach the SISR by interpreting the input and output images in our LR-HR space model, we analyze the plausibility of super-resolved images of various methods according to our model. The super-resolved images must lie on the set $V$ in Figure 3, which means that the downsampling of a super-resolved image must be in the LR space, i.e., it must be similar to the input LR image as

$$h(I_{SR})\downarrow \; \approx I_{LR}. \tag{30}$$

For the analysis, we show the RGB-PSNR between $I_{LR}$ and $h(I_{SR})\downarrow$ in Table 2, tested on Set5 [3]. The results are in the ascending order of SRGAN, EnhanceNet, and our NatSR. Even though we do not give any constraints on the LR space, our NatSR results mostly lie on the feasible set $V$. On the other hand, SRGAN yields the lowest PSNR of the three, which means that the SRGAN barely reflects the LR-HR properties.

Method RGB-PSNR (dB)
SRGAN
ENet-PAT
NatSR
Table 2: Results of RGB-PSNR between LR input and downsampled SR image in LR domain.
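The consistency check behind Table 2 can be sketched on toy 1-D signals. The assumptions are the same as in any small illustration: the hypothetical `degrade` helper is a block average rather than bicubic downsampling, the signals are made up, and `peak=1.0` assumes pixel values normalized to the unit range.

```python
import math


def degrade(hr, scale=2):
    """Toy low-pass + downsample: one block average per `scale` samples."""
    return [sum(hr[i:i + scale]) / scale for i in range(0, len(hr), scale)]


def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized signals."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


lr = [0.2, 0.6]
sr_consistent = [0.1, 0.3, 0.5, 0.7]    # degrades back to roughly the LR input
sr_inconsistent = [0.3, 0.5, 0.5, 0.9]  # degrades to a different LR signal

# A higher PSNR in the LR domain means the SR output respects the LR input.
psnr_consistent = psnr(degrade(sr_consistent), lr)
psnr_inconsistent = psnr(degrade(sr_inconsistent), lr)
print(psnr_consistent, psnr_inconsistent)
```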

7 Experimental Results

7.1 Implementation details

We train both NMD and NatSR (including FRSR) with recently released DIV2K [34] dataset which consists of high-quality (2K resolution) training images, validation images, and test images. The size of the input LR patch is set to , and we only train with scaling factor . ADAM optimizer [20] is used for training with the initial learning rate of

, and halved once during the training. We implement our code with TensorFlow [1]. For the test, we evaluate our model with well-known SISR benchmarks: Set5 [3], Set14 [41], BSD100 [26], and Urban100 [14].

7.2 Evaluation Metrics and Comparisons

For the evaluation of distortion-oriented models, the popular FR-IQA (full-reference image quality assessment) measures PSNR and SSIM (structural similarity) [38] are used. But since these measures are not appropriate for measuring the quality of perceptual models, we use one of the recently proposed NR-IQA (no-reference image quality assessment) methods called NQSR [24], which is designed for SISR and is well known as Ma et al.’s score. Additionally, another NR-IQA, NIQE [27], is used to measure the naturalness of images. A higher NQSR and a lower NIQE mean better perceptual quality. However, it is questionable whether the many variants of NR-IQA methods perfectly reflect the human perceptual quality. Hence, we use the NR-IQA results only as a rough reference.

We compare our FRSR with other distortion-oriented methods such as LapSRN, SRDenseNet, DSRN, and EDSR [21, 36, 11, 23], and compare our NatSR with other perception-oriented ones such as SRGAN, ENet, and SFT-GAN [22, 29, 37] (We denote SRGAN-VGG54 as SRGAN and EnhanceNet-PAT as ENet for short).

7.3 FR-IQA Results

Dataset Scale Bicubic LapSRN SRDenseNet DSRN EDSR FRSR SRGAN ENet NatSR
Set5 4 28.42/0.8104 31.54/0.8850 32.02/0.8934 31.40/0.8830 32.46/0.8976 32.20/0.8939 29.41/0.8345 28.56/0.8093 30.98/0.8606
Set14 4 26.00/0.7027 28.19/0.7720 28.50/0.7782 28.07/0.7700 28.71/0.7857 28.54/0.7808 26.02/0.6934 25.67/0.6757 27.42/0.7329
BSD100 4 25.96/0.6675 27.32/0.7280 27.53/0.7337 27.25/0.7240 27.72/0.7414 27.60/0.7366 25.18/0.6401 24.93/0.6259 26.44/0.6827
Urban100 4 23.14/0.6577 25.21/0.7560 26.05/0.7819 25.08/0.7470 26.64/0.8029 26.21/0.7904 - 23.54/0.6926 25.46/0.7602
Parameters 4 - 0.8 M 2.0 M 1.2 M 43 M 4.9 M 1.5 M 0.8 M 4.9 M
Table 3: FR-IQA results. The average PSNR/SSIM values on benchmarks. Red color indicates the best results, and the blue indicates the second best.

In this subsection, we discuss the distortion-oriented methods and their results. The overall average PSNR/SSIM results are listed in Table 3, which shows that our FRSR achieves comparable or better results compared to the others. The EDSR [23] shows the best result; however, considering the number of parameters shown in the last row of Table 3, our FRSR is also a competitive method. As a sub-experiment, we also evaluate the FR-IQA results on the perception-oriented methods. Of course, the results are worse than the distortion-oriented algorithms, sometimes even worse than the bicubic interpolated images. Nonetheless, ours are slightly nearer to the original image in the pixel domain than the SRGAN and EnhanceNet.

7.4 NR-IQA Results

Figure 8: NR-IQA results in the sorted order (left: NIQE [27], and right: NQSR [24]). The best is at the top and the worst is at the bottom. Our NatSR result is highlighted with darker color. Panels: (a) NIQE in ascending order, (b) NQSR in descending order.

We assess the methods with the NR-IQAs and the results are summarized in Figure 8, which shows the average NIQE and NQSR tested with BSD100. As can be observed, our NatSR is not the best but yields comparable measures to other perception-oriented methods and the original HR. As expected, one of the state-of-the-art distortion-oriented methods, EDSR scores the worst in both metrics except for the bicubic interpolation. For NIQE, besides the ground-truth HR, SRGAN scores the best. Our NatSR scores the second best for this metric. For NQSR, SRGAN scores the best among all methods including the HR. Our NatSR ranks lower than SRGAN and ENet, but the scores of all the methods including the HR show a slight difference. Although the NatSR is not the best in both scores, we believe NatSR shows quite consistent results to human visual perception as shown in Figures 1 and 9, by suppressing the noisy and blurry outputs through the NMD cost.

Figure 9: Visualized results on “img031” of Urban100. Panels: HR, Bicubic, EDSR, FRSR (Ours), ENet, NatSR (Ours).

8 Subjective Assessments

8.1 Mean Opinion Score (MOS)

To better assess the perceptual quality of several results, we conduct a mean opinion score (MOS) test with DIV2K validation set [34]. For the fair comparison with recent perception-oriented methods, SFT-GAN [37] is evaluated with proper semantic segmentation mask to generate the best performance. The details are in supplementary material.

8.2 Visual Comparisons

We visualize some results in Figures 1 and 9. As shown in Figure 1, our NatSR shows the least distortion compared to other perception-oriented methods. Also, Figure 9 shows that distortion-oriented methods show blurry results while perception-oriented ones show better image details. However, ENet produces unnatural cartoony scenes, and SFT-GAN fails to produce natural details in buildings. More results can be found in the supplementary material.

9 Conclusion

In this paper, we have proposed a new approach for SISR which hallucinates natural and realistic textures. First, we start from the modeling of the LR-HR space and the SISR process. From this work, we developed a CNN-based natural manifold discriminator, which enables us to narrow the target space to the natural manifold. We have also proposed the SR generator based on the residual dense blocks and fractal residual learning. The loss function is designed such that our network works either as a distortion-oriented or perception-oriented model. From the experiments, it is shown that our distortion-oriented network (FRSR) shows considerable gain compared to models with a similar number of parameters. Also, our perception-oriented network (NatSR) shows perceptually plausible results compared to others. We expect that with a deeper and heavier network for generating better super-resolved images, and also with a better classifier as the NMD, our method would bring more naturalness and realistic details. The codes are publicly available at https://github.com/JWSoh/NatSR.

Acknowledgments

This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the “Regional Specialized Industry Development Program (R&D, P0002072)” supervised by the Korea Institute for Advancement of Technology (KIAT).

References

  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
  • [2] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
  • [3] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
  • [4] Y. Blau and T. Michaeli. The perception-distortion tradeoff. In Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, pages 6228–6237, 2018.
  • [5] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In European conference on computer vision. Springer, 2014.
  • [6] X. Gao, K. Zhang, D. Tao, and X. Li. Image super-resolution with sparse neighbor embedding. IEEE Transactions on Image Processing, 21(7):3194–3205, 2012.
  • [7] L. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, pages 262–270, 2015.
  • [8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • [9] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
  • [10] T. Guo, H. S. Mousavi, T. H. Vu, and V. Monga. Deep wavelet prediction for image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.
  • [11] W. Han, S. Chang, D. Liu, M. Yu, M. Witbrock, and T. S. Huang. Image super-resolution via dual-state recurrent networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [13] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
  • [14] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.
  • [15] Z. Hui, X. Wang, and X. Gao. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 723–731, 2018.
  • [16] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.
  • [17] A. Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard gan. arXiv preprint arXiv:1807.00734, 2018.
  • [18] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016.
  • [19] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1645, 2016.
  • [20] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [21] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep laplacian pyramid networks for fast and accurate superresolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
  • [22] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
  • [23] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE conference on computer vision and pattern recognition (CVPR) workshops, 2017.
  • [24] C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang. Learning a no-reference quality metric for single-image super-resolution. Computer Vision and Image Understanding, 158:1–16, 2017.
  • [25] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 2813–2821. IEEE, 2017.
  • [26] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 416–423. IEEE, 2001.
  • [27] A. Mittal, R. Soundararajan, and A. C. Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Process. Lett., 20(3):209–212, 2013.
  • [28] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
  • [29] M. S. Sajjadi, B. Schölkopf, and M. Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 4501–4510. IEEE, 2017.
  • [30] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1874–1883, 2016.
  • [31] J. Sun, Z. Xu, and H.-Y. Shum. Image super-resolution using gradient profile prior. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
  • [32] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
  • [33] Y.-W. Tai, S. Liu, M. S. Brown, and S. Lin. Super resolution using edge prior and single image detail synthesis. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2400–2407. IEEE, 2010.
  • [34] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, et al. NTIRE 2017 challenge on single image super-resolution: Methods and results. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pages 1110–1121. IEEE, 2017.
  • [35] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pages 111–126. Springer, 2014.
  • [36] T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In 2017 IEEE international conference on computer vision. IEEE, 2017.
  • [37] X. Wang, K. Yu, C. Dong, and C. C. Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
  • [38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  • [39] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang. Coupled dictionary training for image super-resolution. IEEE Transactions on Image Processing, 21(8):3467–3478, 2012.
  • [40] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
  • [41] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International conference on curves and surfaces, pages 711–730. Springer, 2010.
  • [42] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
  • [43] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.