Mapping Low-Resolution Images To Multiple High-Resolution Images Using Non-Adversarial Mapping

06/21/2020
by Vasileios Lioutas, et al.

Several methods have recently been proposed for the Single Image Super-Resolution (SISR) problem. Current methods assume that a single low-resolution image can only yield a single high-resolution image. In addition, all of these methods use low-resolution images that were artificially generated through simple bilinear down-sampling. We argue that, first and foremost, the problem of SISR is a one-to-many mapping problem between a low-resolution image and all of its candidate high-resolution images, and we address the challenging task of learning how to realistically degrade and down-sample high-resolution images. To address this problem, we propose SR-NAM, which utilizes the Non-Adversarial Mapping (NAM) technique. Furthermore, we propose a degradation model that learns how to transform high-resolution images into low-resolution images that resemble realistically taken low-resolution photos. Finally, some qualitative results for the proposed method along with the weaknesses of SR-NAM are included.


1. Introduction

Single Image Super-Resolution (SISR), a technique for restoring a visually pleasing high-resolution (HR) image from its low-resolution (LR) version, is still a challenging task within the computer vision research community (Caballero et al., 2016; Dong et al., 2015; Kappeler et al., 2016; Kim et al., 2015; Ledig et al., 2016; Liu et al., 2017; Sajjadi et al., 2016; Shi et al., 2016; Tao et al., 2017). Since multiple solutions exist for the mapping from LR to HR space, SISR is highly ill-posed, and a variety of algorithms, especially the current leading learning-based methods, have been proposed to address this problem.

Understanding what the SISR problem represents is crucial for developing a method capable of solving it. Given only a low-resolution image at inference time, there is no ground truth for how the high-resolution counterpart should be generated. Consequently, in order to recover a higher resolution image, assumptions need to be made that do not contradict the structure visible in the low-resolution image. The fine details added to the higher resolution image are subjective, since they only need to be consistent with the artifacts already visible in the low-resolution image. The task in SISR is to find a model that learns how to make these assumptions and generate high-resolution images that are as plausible as possible for the specific task being undertaken, such as face SISR. To this day, all current solutions to the SISR problem attempt to reconstruct a single high-resolution image from a given low-resolution input image. In other words, the process of generating a high-resolution image is deterministic: given the same low-resolution image as input multiple times, it will yield the same high-resolution image.

In this paper, we argue that a method for solving the SISR problem should yield multiple high-resolution candidates for the same low-resolution image, and we propose an approach that does so. Specifically, the proposed SR-NAM method is an unsupervised method for mapping high-resolution images to a given low-resolution image. Its advantage over other methods is that it is fast and requires optimizing only a single representation. This representation attempts to match pre-trained, fixed knowledge of both the high-resolution image space and the degradation method. To the best of our knowledge, all previous works on SISR degraded the high-resolution images artificially, using down-sampling methods such as bilinear and bicubic interpolation, in order to create a dataset of high-resolution images and their associated low-resolution images. These methods usually do not perform well when applied to real-world low-resolution images, as shown in (Shocher et al., 2017; Bulat et al., 2018). In contrast to these approaches, following the work of (Bulat et al., 2018), we propose to use a degradation model that generates, from a high-resolution image, a low-resolution image that visually appears to have been taken with a low quality camera.

Figure 1. Overall proposed architecture and training pipeline for the degradation model. Image taken from (Bulat et al., 2018).

2. Related Work

2.1. Image Super-Resolution

The problem of SISR has been widely studied. Early approaches either rely on natural image statistics (Kim and Kwon, 2010; Zhang et al., 2010) or predefined models (Irani and Peleg, 1991; Fattal, 2007; Sun et al., 2008). Later, mapping functions between LR and HR images were investigated, such as sparse coding based SR methods (Zeyde et al., 2012; Yang et al., 2010).

Recently, deep convolutional neural networks (CNNs) have been shown to be powerful and capable of improving the quality of SR results (Zhang et al., 2019; Ahn et al., 2018; Park et al., 2018; Wang et al., 2018; Tong et al., 2017). It should be highlighted that all the aforementioned image super-resolution methods can be applied to all types of images and hence do not incorporate face-specific information, as proposed in our work.

Face Super-Resolution. Many works in the literature focus specifically on applying SISR techniques to face images. The recent works of (Yu and Porikli, 2016; Yu et al., 2018; Bulat et al., 2018; Bulat and Tzimiropoulos, 2017b) use a GAN-based approach. Other works, like (Cao et al., 2017a), used reinforcement learning to progressively attend to specific parts of a face image in order to restore them sequentially. Other methods (Chen et al., 2017) introduce facial prior knowledge that can be leveraged to better super-resolve face images. The method of (Zhu et al., 2016) performs super-resolution and dense landmark localization in an alternating manner, which is shown to improve the quality of the super-resolved faces.

2.2. Unsupervised domain alignment

Due to the rise of generative adversarial networks (GANs), unsupervised translation across different domains began to generate strong results. All of the state-of-the-art unsupervised translation methods employ the GAN technique. The most popular extension to the traditional GAN approach is the use of cycle-consistency, which enforces that samples mapped between the two domains and back remain the same. This approach is widely used by DiscoGAN (Kim et al., 2017), CycleGAN (Zhu et al., 2017) and DualGAN (Yi et al., 2017). Recently, StarGAN (Choi et al., 2017) extended the approach to more than two domains. Our work is built upon the Non-Adversarial Mapping (NAM) method (Hoshen and Wolf, 2018), and the details are described in Section 3.4.

3. Super-Resolution using NAM

As mentioned in Section 1, we propose two main models: the degradation model and the Super-Resolution NAM (SR-NAM) model. The degradation model is designed to take an HR image as input and produce an LR version of it that resembles a realistically taken LR photo. The SR-NAM model then uses the pre-trained generator and degradation model to infer, with no supervision, the predicted HR image.

3.1. Datasets

This section describes the HR and LR datasets used during training and testing. In order to train the degradation model, a dataset with real-world LR images is needed. Searching the literature, the only available dataset that fulfills this requirement is the one described in (Bulat et al., 2018). Thus, we contacted the authors in order to get access to the exact subset of data that they used in their research.

HR dataset. Following (Bulat et al., 2018), the High Resolution (HR) image dataset is composed of 182,866 face images of size 64×64. The authors of the dataset aimed to make it as balanced as possible in terms of facial poses. The dataset is a combination of subsets of four popular face datasets. Specifically, from Celeb-A (Liu et al., 2014) they randomly selected 60,000 faces. Additionally, they used the whole AFLW (Köstinger et al., 2011) dataset. Finally, subsets of the LS3D-W (Bulat and Tzimiropoulos, 2017a) and VGGFace2 (Cao et al., 2017b) datasets were used. The dataset includes face images with a wide variety of poses, illuminations, expressions and occlusions.

LR dataset. The authors of (Bulat et al., 2018) created a real-world Low Resolution (LR) image dataset from the Widerface (Yang et al., 2016) face dataset. Widerface is very large in scale and diverse in terms of faces, and it contains real-world pictures with various forms of noise and degradation. The resulting dataset is composed of 50,000 images of size 16×16, of which 3,000 randomly selected images are kept for testing.

Figure 2. Training starts with both the generator (G) and discriminator (D) having a low spatial resolution of 4×4 pixels. As training advances, layers are incrementally added to G and D, thus increasing the spatial resolution of the generated images. All existing layers remain trainable throughout the process. Image taken from (Karras et al., 2017).

3.2. Degradation Model

The degradation model is inspired by (Bulat et al., 2018). The overall architecture is composed of a generator and a discriminator network, both based on the ResNet architecture (He et al., 2015). The overall architecture is shown in Figure 1.

Degradation Generator. An HR image from the HR dataset is used as input to the degradation generator. The architecture is similar to the one used in (Bulat et al., 2018). The network follows an encoder-decoder scheme and is composed of 12 residual blocks equally distributed in 6 groups. The resolution is reduced 4 times using pooling layers: from the input size of 64×64 it drops to 4×4 px. It is then increased twice, up to 16×16, using pixel shuffle layers.

In addition to the HR image, a noise vector is concatenated with the input; the vector is projected and reshaped using a fully connected layer so that it has the same size as one image channel. The intuition behind this is that degrading an HR image to an LR image is a one-to-many problem, where one HR image can have multiple corresponding LR images.
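A minimal PyTorch sketch of this encoder-decoder layout is given below; the block design, channel widths and pooling choices are our assumptions, since the exact layer definitions are not listed in the paper:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block; the exact block design is our assumption."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class DegradationGenerator(nn.Module):
    """64x64 HR image + noise vector -> 16x16 LR image (sketch)."""
    def __init__(self, ch=64, z_dim=100):
        super().__init__()
        # Project the noise vector to a 64x64 map so it can be
        # concatenated with the HR image as one extra channel.
        self.fc_z = nn.Linear(z_dim, 64 * 64)
        self.head = nn.Conv2d(3 + 1, ch, 3, padding=1)

        def group():
            return nn.Sequential(ResBlock(ch), ResBlock(ch))

        # 4 of the 6 groups form the encoder; pooling drops 64 -> 4.
        self.encoder = nn.Sequential(
            group(), nn.AvgPool2d(2),  # 64 -> 32
            group(), nn.AvgPool2d(2),  # 32 -> 16
            group(), nn.AvgPool2d(2),  # 16 -> 8
            group(), nn.AvgPool2d(2),  # 8  -> 4
        )
        # The remaining 2 groups form the decoder; two pixel-shuffle
        # steps increase the resolution 4 -> 16.
        self.decoder = nn.Sequential(
            group(), nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2),
            group(), nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2),
        )
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x_hr, z):
        z_map = self.fc_z(z).view(-1, 1, 64, 64)
        h = self.head(torch.cat([x_hr, z_map], dim=1))
        return self.tail(self.decoder(self.encoder(h)))

# Example: a batch of two HR images and two noise vectors.
# out = DegradationGenerator()(torch.randn(2, 3, 64, 64), torch.randn(2, 100))
```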

Degradation Discriminator. The discriminator is similar to the ResNet architecture and consists of 6 residual blocks without any batch normalization in between, followed by a fully connected layer. To reduce the resolution of the 16×16 image, max-pooling is used after the last two blocks.
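A corresponding sketch of the discriminator, again with the internal block details assumed:

```python
import torch.nn as nn

class ResBlockD(nn.Module):
    """Residual block without batch normalization, as specified."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.LeakyReLU(0.2), nn.Conv2d(ch, ch, 3, padding=1),
            nn.LeakyReLU(0.2), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class DegradationDiscriminator(nn.Module):
    """16x16 image -> scalar critic score (sketch)."""
    def __init__(self, ch=64):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.Sequential(
            ResBlockD(ch), ResBlockD(ch), ResBlockD(ch), ResBlockD(ch),
            ResBlockD(ch), nn.MaxPool2d(2),  # 16 -> 8
            ResBlockD(ch), nn.MaxPool2d(2),  # 8  -> 4
        )
        self.fc = nn.Linear(ch * 4 * 4, 1)

    def forward(self, x):
        h = self.blocks(self.head(x))
        return self.fc(h.flatten(1))
```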

Degradation Loss. The degradation generator and discriminator networks were trained with a total loss that combines a GAN loss and a pixel loss:

$\ell = \alpha\,\ell_{pixel} + \beta\,\ell_{GAN}$   (1)

where $\alpha$ and $\beta$ are the corresponding weights.

Following (Bulat et al., 2018), we used the Wasserstein GAN loss:

$\ell_{GAN} = \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x\sim\mathbb{P}_r}[D(x)]$   (2)

where $\mathbb{P}_r$ is the LR data distribution and $\mathbb{P}_g$ is the generator distribution defined by $\tilde{x} = G(x_{HR}, z)$. For the GAN loss, following the authors, an "unpaired" training setting is used, in which the real-world images from the LR dataset push the output of the generator (whose input is images from the HR dataset) to be contaminated with real-world noisy artifacts. According to (Arjovsky et al., 2017), using the Wasserstein distance as the GAN loss greatly improves the stability of the GAN model. In (Arjovsky et al., 2017), the authors enforced the Lipschitz constraint using weight clipping; we instead enforce it using the more recent and improved gradient penalty approach described in (Gulrajani et al., 2017).
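For reference, a minimal PyTorch sketch of the Wasserstein critic loss with gradient penalty, following (Gulrajani et al., 2017); the weighting is illustrative:

```python
import torch

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    """Gradient penalty of Gulrajani et al. (2017): push the critic's
    gradient norm towards 1 at random interpolates of real/fake images."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return gp_weight * ((grad_norm - 1.0) ** 2).mean()

def critic_loss(critic, real_lr, fake_lr):
    """Wasserstein critic loss (Eq. 2) plus the gradient penalty."""
    fake_lr = fake_lr.detach()  # do not backprop into the generator here
    return (critic(fake_lr).mean() - critic(real_lr).mean()
            + gradient_penalty(critic, real_lr, fake_lr))

# The generator's adversarial term is simply -critic(fake_lr).mean().
```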

Finally, the $\ell_{pixel}$ loss is used to enforce that the output of the generator has content (i.e. face identity, pose and expression) similar to the original HR image, and is defined as:

$\ell_{pixel} = \gamma\,\ell_2 + \delta\,\ell_{feature}$   (3)

where $\gamma$ and $\delta$ are the corresponding weights. The $\ell_2$ loss is defined as:

$\ell_2 = \lVert \pi(\tilde{x}) - x_{HR} \rVert_2^2$   (4)

where $\pi$ is an up-scaling function and $\tilde{x}$ is the generated LR image. We also use the perceptual loss (Johnson et al., 2016), which was found to give perceptually pleasing results. This is defined as:

$\ell_{feature} = \sum_i \lVert \phi_i(\pi(\tilde{x})) - \phi_i(x_{HR}) \rVert_1$   (5)

where $\phi_i$ denotes the features extracted from a deep network at the end of the $i$-th block (we use VGG (Liu and Deng, 2015)).
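A sketch of these content losses in PyTorch is shown below; the choice of bilinear interpolation for the up-scaling function $\pi$, the VGG-19 feature taps and the per-term norms are our assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG-19 feature extractor (requires a recent torchvision).
_vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)
_TAPS = {3, 8, 17, 26}  # relu1_2, relu2_2, relu3_4, relu4_4 (our choice)

def vgg_features(x):
    feats, h = [], x
    for i, layer in enumerate(_vgg):
        h = layer(h)
        if i in _TAPS:
            feats.append(h)
    return feats

def pixel_loss(fake_lr, x_hr, gamma=1.0, delta=1.0):
    """l_pixel = gamma * l_2 + delta * l_feature (Eqs. 3-5), comparing the
    up-scaled 16x16 output with the 64x64 input. Weights are placeholders."""
    up = F.interpolate(fake_lr, size=x_hr.shape[-2:],
                       mode="bilinear", align_corners=False)  # pi(.)
    l2 = F.mse_loss(up, x_hr)
    l_feat = sum(F.l1_loss(a, b)
                 for a, b in zip(vgg_features(up), vgg_features(x_hr)))
    return gamma * l2 + delta * l_feat
```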

Figure 3. Given an HR generator $G$ and training samples $x \in X$, SR-NAM jointly learns the degradation network $T$ and the latent vectors $z_i$ that give rise to samples resembling the training images in $X$.

3.3. HR Generative Model

This section describes the HR generator that is used to generate HR images given a latent representation. Pre-training a good, well-generalized face generator is crucial for the success of the SR-NAM model. For this reason, we experimented with the Progressive GAN architecture described in (Karras et al., 2017). The authors of that paper showed that their model is very effective at generating good quality HR images given enough face images. The overall architecture can be seen in Figure 2.

Progressive GAN. Following (Karras et al., 2017), the idea behind the progressive generator architecture is to start with a low-resolution image and then progressively increase the resolution by adding layers to the network, as visualized in Figure 2. This incremental procedure helps the training to first find the large-scale structure of the image distribution, and then shift attention to progressively finer scale details, instead of trying to learn everything simultaneously. The discriminator and generator networks are mirrored and grow synchronously. All previously added layers remain trainable throughout the training process. The new layers added to the network are faded in smoothly to avoid sudden changes to the already well-trained lower resolution layers.
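A simplified sketch of this fade-in, assuming `new_block` doubles the spatial resolution and the `to_rgb_*` modules map feature maps to RGB:

```python
import torch.nn.functional as F

def faded_output(x_prev, new_block, to_rgb_prev, to_rgb_new, alpha):
    """Blend the 2x-upsampled RGB output of the previous stage with the
    output of the newly added block; alpha ramps linearly from 0 to 1
    over the transition phase."""
    low = F.interpolate(to_rgb_prev(x_prev), scale_factor=2, mode="nearest")
    high = to_rgb_new(new_block(x_prev))  # new_block doubles the resolution
    return alpha * high + (1.0 - alpha) * low
```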

Since the project was implemented in PyTorch, we decided to use an existing implementation of the network (https://github.com/akanimax/pro_gan_pytorch). The reader is encouraged to learn more about this interesting network by reading the original paper (Karras et al., 2017).

3.4. SR-NAM

This section describes the Super-Resolution using Non-Adversarial Mapping approach for retrieving multiple HR images from a single LR image, which is the main focus of this paper. Let $X$ be the low resolution space and $Y$ be the high resolution space, consisting of sets of images $x$ and $y$ respectively. The objective is to find every image $y$ in the high resolution space that is analogous to an image $x$ in the low resolution domain. Each $y$ must appear to come from the high resolution space while preserving the unique content of the original image $x$.

Non-Adversarial Mapping. NAM (Hoshen and Wolf, 2018) is a method for unsupervised mapping across image domains. To use this approach, one must have a pre-trained unconditional model of one domain, which in our case is the high resolution space. In addition, one must have a set of training images from the other domain, which in our case corresponds to images $x$ from the low resolution space.

Given a pre-trained high resolution generative model $G$, a pre-trained degradation model $T$ and a set of training images $x_i \in X$, NAM estimates the latent code $z_i$ for every training image so that the image generated from this latent code maps to the low resolution image $x_i$. Figure 3 shows exactly this process. The entire optimization problem is:

$z_i^{*} = \arg\min_{z_i} \lVert T(G(z_i)) - x_i \rVert_1$   (6)

The advantages of NAM include that it does not use adversarial training to learn the mapping between high and low resolution images. In addition, the mapping can be applied in many situations, and multiple solutions can be recovered for a single low resolution input image. NAM is also able to reuse a pre-trained high resolution model as well as a pre-trained degradation model, both of which only need to be estimated once.

In contrast to (Hoshen and Wolf, 2018), we decided not to include the perceptual loss in the optimization objective. Although the perceptual loss successfully yields perceptually pleasing results (i.e. results that perceptually follow the content of the low resolution image, such as similar pose, expression and face geometry), it is of little use when the goal is to recover a higher resolution counterpart that is as close as possible to the low resolution image, since it can produce images that are perceptually correct but visually completely different. Thus, we only minimize the $\ell_1$ loss between $T(G(z_i))$ and $x_i$.

Inference. Since all the networks inside the SR-NAM model are pre-trained and fixed, only the latent codes need to be optimized each time. To infer an analogy of a new image $x$, we need to recover the latent code $z$ that yields the optimal reconstruction. The generated high resolution image $G(z)$ is the proposed solution for the low resolution image $x$.

Multiple Solutions. To produce multiple HR images from one LR image, it is sufficient to initialize the latent code $z$ differently. Because the optimization problem is non-convex, starting from a different point in the latent space can yield a different final analogy.
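Putting Equation (6), the inference procedure and the multiple-solutions trick together, a minimal sketch of SR-NAM inference could look as follows; the interfaces of `G` and `T` and all hyperparameters besides those reported in Section 4.1 are assumptions:

```python
import torch
import torch.nn.functional as F

def sr_nam_infer(G, T, x_lr, z_dim=512, steps=500, seed=None):
    """SR-NAM inference sketch. G (HR generator) and T (degradation model,
    here assumed to wrap the degradation generator with a fixed noise
    vector) are pre-trained and frozen; only the latent code z is
    optimized so that T(G(z)) matches the LR input (Eq. 6)."""
    if seed is not None:
        torch.manual_seed(seed)
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z])  # Adam with default settings, as in Sec. 4.1
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(T(G(z)), x_lr)
        loss.backward()
        opt.step()
    return G(z).detach()

# Multiple solutions: re-run with different initializations of z.
# candidates = [sr_nam_infer(G, T, x_lr, seed=s) for s in range(3)]
```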

Figure 4. Examples of different low-resolution samples produced by the degradation network for different input noise vectors.

4. Experiments

In this section, we demonstrate the effectiveness of the SR-NAM approach by reporting qualitative results on both the HR and the LR datasets. Further, we show the performance of both the degradation network and the progressive GAN.

4.1. Implementation Details

In this section, we give a detailed description of the procedure used to generate the experiments presented in this project.

Degradation Model. The intuition behind the degradation model is to create a model that can generate a realistically taken LR face image from an HR image. This model is used both to create the ground truth LR images for the HR image dataset and as the degradation model inside the SR-NAM approach, where it is responsible for converting the generated HR candidate back to an LR image so that it can be compared with the ground truth LR image. We trained the model for 500,000 iterations using the Adam (Kingma and Ba, 2015) optimizer with default settings. The discriminator was trained for 5 iterations at each step before training the generator. The gradient penalty weight was set to 10, the latent size of the input noise to 100, and the batch size to 64. Finally, a pre-trained VGG network with 19 layers was used for the perceptual loss.
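A sketch of one such training step, reusing the `critic_loss` and `pixel_loss` sketches from Section 3.2 (the loss weights are placeholders):

```python
import torch

def degradation_train_step(G, D, opt_g, opt_d, x_hr, x_lr_real, z_dim=100):
    """One training step with the settings above: the critic D is updated
    5 times before each generator update; critic_loss and pixel_loss are
    the sketches from Section 3.2."""
    for _ in range(5):
        z = torch.randn(x_hr.size(0), z_dim, device=x_hr.device)
        d_loss = critic_loss(D, x_lr_real, G(x_hr, z))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
    z = torch.randn(x_hr.size(0), z_dim, device=x_hr.device)
    fake_lr = G(x_hr, z)
    g_loss = -D(fake_lr).mean() + pixel_loss(fake_lr, x_hr)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```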

HR Generative Model. SR-NAM takes as input a pre-trained generative model of the HR image domain. As mentioned in Section 3.3, we use the ProGAN model. The resolution depth was set to 5, which translates to 64×64 images. The five resolutions were trained for 10, 20, 20, 20 and 50 epochs respectively, with batch sizes of 64, 64, 64, 32 and 16. Due to the complexity of the face generation problem, we used a latent size of 512. In addition, we used the Adam optimizer with default settings. Finally, training the model took approximately two weeks on a single NVIDIA GeForce GTX 1080 Ti GPU.

SR-NAM Model. Since SR-NAM takes both the HR generative model and the degradation model as pre-trained and fixed networks, the only optimization needed is over the latent codes $z$ for each training/testing example. Again, we used the Adam optimizer with default settings. Since the results are sensitive to how well generalized the HR generative model is, the number of iterations each example needs in order to recover a corresponding HR image from the learned HR space varies; empirically, we found 250 to 500 iterations to work well.

4.2. Degradation Model Results

Figure 4 shows the results of the trained degradation model. The model is clearly able to produce a 16×16 low resolution image given a 64×64 high resolution image. It is worth noting that the network can model a variety of image degradation styles at different levels, such as blurriness, distortion, colouring, illumination and face geometry. It thus learns the types of noise most likely to occur in a real-world setting, as if the image had been taken with a low quality camera.

4.3. Progressive Generator Results

We show examples of a variety of face images generated at 64×64 by ProGAN. Figure 5 shows faces generated at 4×4, 8×8, 16×16, 32×32 and finally at 64×64, using a fixed random noise input at each resolution. The progressive generator successfully learns to produce clear 64×64 face images.

Figure 5. Qualitative results showing the effectiveness of the progressive generator at five different resolutions: 4×4, 8×8, 16×16, 32×32 and 64×64. All examples were generated using a fixed random noise input.
Figure 6. Results of the SR-NAM method on the HR dataset described in Section 3.1. The first set of columns shows the original HR image and the corresponding image degraded by the pre-trained degradation model. The remaining three sets of columns show the multiple generated HR images, each obtained with a different random initialization, along with the LR image derived from each generated HR image.

4.4. SR-NAM Results

In this section, we evaluate the performance of SR-NAM. The details of our experiments are as follows:

Performance metrics. The scope of this paper is to create an approach capable of generating multiple HR face images that correspond to an LR input face image. To date, to the best of our knowledge, there is no quantitative metric that measures whether an image follows a given ground truth image both perceptually and visually. To overcome this, we propose the following new metric for measuring the performance of each generated HR image. The metric uses a facial landmark localization algorithm, such as (Bulat and Tzimiropoulos, 2017b), to find the facial landmarks in the generated HR image and compare them with the landmarks of the original HR image. The metric is defined as:

$d = \frac{1}{N}\sum_{n=1}^{N}\sum_{p}\left(\widetilde{M}_n(p) - M_n(p)\right)^2$   (7)

where $\widetilde{M}_n(p)$ is the heatmap corresponding to the $n$-th landmark at pixel $p$, produced by the facial landmark localization algorithm with the generated HR image as input, and $M_n(p)$ is the heatmap obtained by running the algorithm on the original HR image. Due to time constraints, we did not perform a quantitative evaluation of the proposed approach using this metric, but it is worth noting that it could be a possible new metric for this problem.
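A sketch of how this metric could be computed, assuming the landmark localizer returns one heatmap per landmark:

```python
import torch

def landmark_heatmap_distance(heatmaps_gen, heatmaps_orig):
    """Eq. (7): mean over landmarks of the summed squared difference
    between heatmaps of the generated and original HR images. Both inputs
    have shape (N_landmarks, H, W) and are assumed to come from a landmark
    localizer such as FAN (Bulat and Tzimiropoulos, 2017b)."""
    return ((heatmaps_gen - heatmaps_orig) ** 2).sum(dim=(1, 2)).mean().item()
```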

For evaluating the quality of the generated HR images, two standard metrics have mainly been used in the literature, namely PSNR and SSIM (Bovik et al., 2004). To date, these metrics have been heavily criticized by the research community (Ledig et al., 2016; Bulat and Tzimiropoulos, 2017c), as they fail to capture real image quality and are considered poor measures. Since the scope of this project is not to produce better quality HR images but to find a way of producing multiple corresponding HR images from an LR input image, we did not compute these metrics.

Qualitative Results. Figure 6 shows qualitative results for several images from the HR dataset. The first set of columns shows the original HR image and the corresponding degraded LR image. The remaining sets of columns show the generated HR image and its degraded LR counterpart for three different reconstructions, each using a different random initialization of $z$. The proposed approach is unsupervised, i.e. at inference time the latent code $z$ has to be learned with the objective of matching the input LR image and the degraded generated HR image. It is worth noting that the model successfully matches the input LR image with the generated LR images, as visualized in Figure 6. In addition, it successfully generates a new plausible reconstruction of the input LR image each time.

We also show results in Figure 7 using the LR dataset described in Section 3.1. The method clearly reconstructs a sharper HR image compared to the LR input. The faces still exhibit considerable noise and do not always exactly follow the face geometry and other artifacts, but they still closely resemble the LR input image.

Figure 7. SR-NAM results on the real-world LR dataset. In each set of examples, the top row shows the LR input image, the middle row shows the generated HR output image, and the bottom row shows the degradation of the generated HR output used to match the LR input image.

4.5. Failure Cases and Discussion

The success of this approach rests on a very well trained and generalized HR generative model. Without a generator whose learned face space is vast enough to cover practically all possible faces, this method would not work in practice. We by no means claim that our current training is sufficient to solve this problem, but as a proof of concept it is clear that, given appropriately trained and generalized models, this method can yield a different reconstruction each time. In Figure 8, we demonstrate some failure cases where the face either does not resemble the LR face or the method fails to reconstruct a face at all. Figure 8 also depicts cases where reconstructing an HR face is challenging due to distortion and illumination.

5. Conclusion and Future Work

In this paper, we presented a method for face super-resolution that does not assume a single HR image per LR input image, but instead maps the LR image to multiple candidate HR images. In addition, the presented method does not assume an artificially generated LR image as input, but aims to produce results on real-world LR images. We discussed the advantages and disadvantages of the presented method, including that its power lies mainly in the training of the HR generator, which needs to generalize over all possible faces in order to perform well on new, unseen examples. Finally, we demonstrated qualitative results of the SR-NAM method on both real-world low resolution images and degraded high resolution images.

Figure 8. SR-NAM failure cases on the real-world LR dataset. Top row shows the input LR image and bottom row shows the generated HR image.

References

  • N. Ahn, B. Kang, and K. Sohn (2018) Fast, accurate, and lightweight super-resolution with cascading residual network. CoRR abs/1803.08664. External Links: Link, 1803.08664 Cited by: §2.1.
  • M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, International Convention Centre, Sydney, Australia, pp. 214–223. External Links: Link Cited by: §3.2.
  • A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. External Links: Document, ISSN 1057-7149 Cited by: §4.4.
  • A. Bulat and G. Tzimiropoulos (2017a) How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks). CoRR abs/1703.07332. External Links: Link, 1703.07332 Cited by: §3.1.
  • A. Bulat and G. Tzimiropoulos (2017b) How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks). In International Conference on Computer Vision, Cited by: §2.1, §4.4.
  • A. Bulat and G. Tzimiropoulos (2017c) Super-fan: integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with gans. CoRR abs/1712.02765. External Links: Link, 1712.02765 Cited by: §4.4.
  • A. Bulat, J. Yang, and G. Tzimiropoulos (2018) To learn image super-resolution, use a GAN to learn how to do image degradation first. CoRR abs/1807.11458. External Links: Link, 1807.11458 Cited by: Figure 1, §1, §2.1, §3.1, §3.1, §3.1, §3.2, §3.2, §3.2.
  • J. Caballero, C. Ledig, A. P. Aitken, A. Acosta, J. Totz, Z. Wang, and W. Shi (2016) Real-time video super-resolution with spatio-temporal networks and motion compensation. CoRR abs/1611.05250. External Links: Link, 1611.05250 Cited by: §1.
  • Q. Cao, L. Lin, Y. Shi, X. Liang, and G. Li (2017a) Attention-aware face hallucination via deep reinforcement learning. CoRR abs/1708.03132. External Links: Link, 1708.03132 Cited by: §2.1.
  • Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman (2017b) VGGFace2: A dataset for recognising faces across pose and age. CoRR abs/1710.08092. External Links: Link, 1710.08092 Cited by: §3.1.
  • Y. Chen, Y. Tai, X. Liu, C. Shen, and J. Yang (2017) FSRNet: end-to-end learning face super-resolution with facial priors. CoRR abs/1711.10703. External Links: Link, 1711.10703 Cited by: §2.1.
  • Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo (2017) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. CoRR abs/1711.09020. External Links: Link, 1711.09020 Cited by: §2.2.
  • C. Dong, C. C. Loy, K. He, and X. Tang (2015) Image super-resolution using deep convolutional networks. CoRR abs/1501.00092. External Links: Link, 1501.00092 Cited by: §1.
  • R. Fattal (2007) Image upsampling via imposed edge statistics. ACM Trans. Graph. 26 (3). External Links: ISSN 0730-0301, Link, Document Cited by: §2.1.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. CoRR abs/1704.00028. External Links: Link, 1704.00028 Cited by: §3.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2015) Deep residual learning for image recognition. CoRR abs/1512.03385. External Links: Link, 1512.03385 Cited by: §3.2.
  • Y. Hoshen and L. Wolf (2018) NAM: non-adversarial unsupervised domain mapping. CoRR abs/1806.00804. External Links: Link, 1806.00804 Cited by: §2.2, §3.4, §3.4.
  • J. Sun, Z. Xu, and H. Shum (2008-06) Image super-resolution using gradient profile prior. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. External Links: Document, ISSN 1063-6919 Cited by: §2.1.
  • M. Irani and S. Peleg (1991) Improving resolution by image registration. CVGIP: Graph. Models Image Process. 53 (3), pp. 231–239. External Links: ISSN 1049-9652, Link, Document Cited by: §2.1.
  • J. Johnson, A. Alahi, and F. Li (2016) Perceptual losses for real-time style transfer and super-resolution. CoRR abs/1603.08155. External Links: Link, 1603.08155 Cited by: §3.2.
  • A. Kappeler, S. Yoo, Q. Dai, and A. K. Katsaggelos (2016) Video super-resolution with convolutional neural networks. IEEE Transactions on Computational Imaging 2, pp. 109–122. Cited by: §1.
  • T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of gans for improved quality, stability, and variation. CoRR abs/1710.10196. External Links: Link, 1710.10196 Cited by: Figure 2, §3.3, §3.3, §3.3.
  • J. Kim, J. K. Lee, and K. M. Lee (2015) Accurate image super-resolution using very deep convolutional networks. CoRR abs/1511.04587. External Links: Link, 1511.04587 Cited by: §1.
  • K. I. Kim and Y. Kwon (2010) Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (6), pp. 1127–1133. External Links: Document, ISSN 0162-8828 Cited by: §2.1.
  • T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim (2017) Learning to discover cross-domain relations with generative adversarial networks. CoRR abs/1703.05192. External Links: Link, 1703.05192 Cited by: §2.2.
  • D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Link Cited by: §4.1.
  • M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Vol. , pp. 2144–2151. External Links: Document, ISSN Cited by: §3.1.
  • C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi (2016) Photo-realistic single image super-resolution using a generative adversarial network. CoRR abs/1609.04802. External Links: Link, 1609.04802 Cited by: §1, §4.4.
  • D. Liu, Z. Wang, Y. Fan, X. Liu, Z. Wang, S. Chang, and T. Huang (2017) Robust video super-resolution with learned temporal dynamics. In 2017 IEEE International Conference on Computer Vision (ICCV), Vol. , pp. 2526–2534. External Links: Document, ISSN 2380-7504 Cited by: §1.
  • S. Liu and W. Deng (2015) Very deep convolutional neural network based image classification using small training sample size. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Vol. , pp. 730–734. External Links: Document, ISSN 2327-0985 Cited by: §3.2.
  • Z. Liu, P. Luo, X. Wang, and X. Tang (2014) Deep learning face attributes in the wild. CoRR abs/1411.7766. External Links: Link, 1411.7766 Cited by: §3.1.
  • S. Park, H. Son, S. Cho, K. Hong, and S. Lee (2018) SRFeat: single image super-resolution with feature discrimination. In The European Conference on Computer Vision (ECCV), Cited by: §2.1.
  • M. S. M. Sajjadi, B. Schölkopf, and M. Hirsch (2016) EnhanceNet: single image super-resolution through automated texture synthesis. CoRR abs/1612.07919. External Links: Link, 1612.07919 Cited by: §1.
  • W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. CoRR abs/1609.05158. External Links: Link, 1609.05158 Cited by: §1.
  • A. Shocher, N. Cohen, and M. Irani (2017) ”Zero-shot” super-resolution using deep internal learning. CoRR abs/1712.06087. External Links: Link, 1712.06087 Cited by: §1.
  • X. Tao, H. Gao, R. Liao, J. Wang, and J. Jia (2017) Detail-revealing deep video super-resolution. CoRR abs/1704.02738. External Links: Link, 1704.02738 Cited by: §1.
  • T. Tong, G. Li, X. Liu, and Q. Gao (2017) Image super-resolution using dense skip connections. In 2017 IEEE International Conference on Computer Vision (ICCV), Vol. , pp. 4809–4817. External Links: Document, ISSN 2380-7504 Cited by: §2.1.
  • X. Wang, K. Yu, C. Dong, and C. C. Loy (2018) Recovering realistic texture in image super-resolution by deep spatial feature transform. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.1.
  • J. Yang, J. Wright, T. S. Huang, and Y. Ma (2010) Image super-resolution via sparse representation. IEEE Transactions on Image Processing 19 (11), pp. 2861–2873. External Links: Document, ISSN 1057-7149 Cited by: §2.1.
  • S. Yang, P. Luo, C. C. Loy, and X. Tang (2016) WIDER FACE: a face detection benchmark. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §3.1.
  • Z. Yi, H. Zhang, P. Tan, and M. Gong (2017) DualGAN: unsupervised dual learning for image-to-image translation. CoRR abs/1704.02510. External Links: Link, 1704.02510 Cited by: §2.2.
  • X. Yu, B. Fernando, R. Hartley, and F. Porikli (2018) Super-resolving very low-resolution face images with supplementary attributes. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.1.
  • X. Yu and F. M. Porikli (2016) Ultra-resolving face images by discriminative generative networks. In ECCV, Cited by: §2.1.
  • R. Zeyde, M. Elad, and M. Protter (2012) On single image scale-up using sparse-representations. In Proceedings of the 7th International Conference on Curves and Surfaces, Berlin, Heidelberg, pp. 711–730. External Links: ISBN 978-3-642-27412-1, Link, Document Cited by: §2.1.
  • H. Zhang, J. Yang, Y. Zhang, and T. S. Huang (2010) Non-local kernel regression for image and video restoration. In ECCV, Cited by: §2.1.
  • K. Zhang, W. Zuo, and L. Zhang (2019) Deep plug-and-play super-resolution for arbitrary blur kernels. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §2.1.
  • J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, Cited by: §2.2.
  • S. Zhu, S. Liu, C. C. Loy, and X. Tang (2016) Deep cascaded bi-network for face hallucination. CoRR abs/1607.05046. External Links: Link, 1607.05046 Cited by: §2.1.