FH-GAN: Face Hallucination and Recognition using Generative Adversarial Network

05/16/2019 ∙ by Bayram Bayramli, et al. ∙ Shanghai Jiao Tong University

Many factors affect visual face recognition, such as low-resolution images, aging, illumination, and pose variance. One of the most important problems is low-resolution face images, which can degrade face recognition performance. Most general face recognition algorithms assume that face images have sufficient resolution. In practice, however, many applications do not have sufficient image resolution. Modern face hallucination models demonstrate reasonable performance in reconstructing high-resolution images from their corresponding low-resolution images. However, they do not consider identity-level information during hallucination, which directly affects the recognition of low-resolution faces. To address this issue, we propose a Face Hallucination Generative Adversarial Network (FH-GAN) which improves the quality of low-resolution face images and accurately recognizes those low-quality images. Concretely, we make the following contributions: 1) We propose the FH-GAN network, an end-to-end system that improves both face hallucination and face recognition simultaneously. The novelty of the proposed network lies in incorporating identity information into a GAN-based face hallucination algorithm by combining it with a face recognition network for identity preservation. 2) We also propose a new face hallucination network, namely the Dense Sparse Network (DSNet), which improves upon the state of the art in face hallucination. 3) We demonstrate the benefits of training the face recognition network and the GAN-based DSNet jointly by reporting good results on face hallucination and recognition.




1 Introduction

In recent years, super-resolution models [4], [34], [19], which produce high-resolution (HR) images from low-resolution (LR) images, have progressed tremendously thanks to deep learning techniques. Since super-resolution is an ill-posed problem, an LR input may correspond to many HR candidate images, which may lead to a loss of identity information. Many existing works do not consider identity information while hallucinating LR face images; as a result, they cannot produce HR faces similar to the real identity. On the other hand, the extensive use of surveillance systems and security cameras creates a challenging use case for face recognition, since faces detected in such environments are often in low resolution. Although some face recognition methods [10], [11], [8], [9] have achieved satisfactory results, these algorithms do not perform well on low-resolution images. Since an LR face image may match many HR candidates, this uncertainty may lead to distorted identity information. Based on these facts, recovering identity information can improve both low-resolution face recognition and face hallucination.

Figure 1: Hallucination example of our method.

To address this issue, we ask how to hallucinate low-resolution face images in a way that also improves face recognition performance. The goal of the proposed method, FH-GAN, is to enhance the visual quality and recognizability of low-resolution facial images by recovering identity information during the super-resolution process. The architecture of FH-GAN is illustrated in Figure 2.

Specifically, we propose an end-to-end FH-GAN network to hallucinate low-resolution faces while preserving the identity information needed for face recognition. To achieve this, we introduce:

  • a novel generator architecture for GANs which sparsely aggregates the outputs of previous layers at any given depth. It uses fewer parameters, improves the flow of information through the network, and alleviates the vanishing gradient problem.

  • a GAN-based face hallucination method that utilizes both pixel-level and feature-level information as the supervisory signal to preserve identity information.

  • an identity loss which measures the identity difference between the hallucinated HR image and the ground-truth HR image by using the face recognition network.

Figure 2: The architecture of our proposed FH-GAN consists of three associated networks: 1) the main network, which also serves as the generator, is a newly proposed Face Hallucination network (see sub-section 3.1); 2) the discriminator network is used to distinguish between HR face images and hallucinated face images (see sub-section 3.2); 3) the Face Recognition network recognizes the hallucinated face images and enhances face hallucination through an identity loss (see sub-section 3.3). FR stands for Face Recognition, and the joining symbol in the diagram denotes concatenation.

2 Related Work

In this section, we review the related work in image super-resolution, face hallucination, and face recognition.

Single Image Super Resolution (SISR). SISR aims to reconstruct an HR image from its corresponding LR input. Many super-resolution methods have been developed, including classical approaches [13], [2], [35] and deep learning based approaches [31], [24]. In recent years, advances in deep learning have led to significant improvements in image super-resolution techniques. The first work that utilized convolutional networks for super-resolution was SRCNN by Dong et al. [3], which predicts the mapping between interpolated LR and HR image pairs using three convolutional layers. This benchmark was further improved by increasing network depth: [24], [25] used deeper convolutional networks to improve reconstruction accuracy. Both used an interpolated version of the original LR image as input, which increases computation and causes information loss. Later, [31] used a sub-pixel convolutional layer to learn effective upscaling. Notably, we also use a sub-pixel layer in our network. [5] then exploited residual learning together with the sub-pixel layer. However, all these methods fail to take advantage of the information from each convolutional layer and consequently lose useful hierarchical features from the LR image. [33] introduced the basic dense block from DenseNet [14] to learn hierarchical features, but the feature maps aggregated by its dense skip connections are not fully exploited. To solve these issues, we propose sparsely aggregated skip connection blocks in our generator network (DSNet) to concatenate features at different levels.

Face Hallucination. General image SR methods can be applied to all kinds of images, but they do not incorporate face-specific information. Face hallucination is a class-specific type of image SR. [41] introduced bichannel convolutional networks to hallucinate face images in the wild. [37] introduced a two-step auto-encoder architecture to hallucinate unaligned, noisy low-resolution face images. [21] also introduced identity information recovery in their proposed method. [36] proposed a GAN-based method to super-resolve very low-resolution images without using a perceptual loss. Except for [21], which does not use a GAN-based generator, the above-mentioned methods do not consider identity information in the hallucination process, which is vital for recognition and visual quality. In our method, we use a perceptual loss to achieve more realistic results and an identity loss, computed with a face recognition model, to constrain the hallucinated face in identity space within an advanced GAN framework. Our experiments show that the method produces visually realistic images and improves the performance of low-resolution face recognition.

Figure 3: The architecture of our proposed super-resolution network, DSNet.

Face Recognition. Low-resolution face recognition is a sub-task of face recognition. It has many useful application scenarios, such as security cameras and surveillance systems, where face images are captured in the wild by cameras at a large standoff distance. Some state-of-the-art techniques [6], [12], [7] have already achieved an accuracy of over 99 percent. However, those algorithms are effective only on faces with a large region of interest, so when the resolution drops, their performance drops accordingly. [42] proposed relationship-learning-based SR between the high-resolution image space and the LR image space. [40] studied the problem of very low-resolution recognition with a deep learning based architecture.

This is one of the main motivations of our work. We employ the face recognition model of [18], ArcFace, which provides excellent face verification performance on high-resolution images as shown in [18]. In our paper, ArcFace is trained specifically to preserve the identity of low-resolution face images as well as to enhance face image quality during hallucination. As a result, one of our contributions is to demonstrate that a face recognition model, when incorporated and trained end-to-end with a super-resolution network, can still give high accuracy on low-resolution face images.

3 Method

In this section, we first describe the proposed architecture, which consists of three connected networks, together with their loss functions. The first network is a super-resolution network, the Densely connected Sparse Blocks network (DSNet), which also serves as the generator and super-resolves LR face images to HR face images. The second is an adversarial network used to distinguish super-resolved images from their HR counterparts. The third is a Face Recognition network for identity preservation on the hallucinated facial images. Finally, we describe our identity loss. The discriminator is not used at evaluation time. We call the overall algorithm FH-GAN, shown in Fig. 2.

3.1 Face Hallucination Network

Notably, we propose an architecture that aims to learn an end-to-end mapping function between a low-resolution facial image $I^{LR}$ and its corresponding high-resolution facial image $I^{HR}$. As shown in Figure 3, the Dense Sparse network (DSNet) is mainly composed of four parts: a low-level feature extractor (LLFE), sparsely aggregated CNN blocks for learning high-level features (SparseBlock, SpB), an upscaling layer for increasing the resolution, and a reconstruction layer for generating the HR output.

LLFE. We denote $I^{LR}$ and $I^{SR}$ as the input and output of DSNet. Specifically, we use two convolutional layers (hereafter Conv) to extract shallow-level features. The first Conv layer extracts features from the LR input:

$$F_{-1} = H_{LLFE1}(I^{LR}), \tag{1}$$

where $H_{LLFE1}(\cdot)$ denotes the convolution operation and $F_{-1}$ is the output of the first low-level feature extractor. The output of (1) is the input of the second Conv layer:

$$F_{0} = H_{LLFE2}(F_{-1}), \tag{2}$$

where $H_{LLFE2}(\cdot)$ denotes the convolution operation of the second low-level feature extractor and $F_{0}$ is the output of that layer.

Sparse Blocks (SpB). After the LLFE layers learn low-level features, the output $F_{0}$ of the second LLFE layer in (2) is used as input to the Sparse Blocks for learning high-level features. The sparse block structure is inspired by sparse aggregation in convolutional networks, first proposed in [27]. In SparseNet [27], feature maps from previous layers are sparsely concatenated rather than directly summed as in ResNets [22]. As shown in Figure 4, each sparse block in our network consists of multiple layers, where each layer is a convolution followed by a PReLU activation function. Within a sparse block, rather than concatenating features from all previous layers, the number of incoming links to a layer is reduced by aggregating the states of preceding layers at exponential offsets; for example, with base 2, the outputs of layers $l-1, l-2, l-4, l-8, \dots$ are concatenated as input for the $l$-th layer. The output of the $l$-th convolutional layer in an SpB is computed as:

$$x_{l} = \sigma\left(W_{l}\,[x_{l-c^{0}}, x_{l-c^{1}}, x_{l-c^{2}}, \dots, x_{l-c^{k}}]\right), \tag{3}$$

where $[\cdot]$ refers to the concatenation of feature maps, $W_{l}$ denotes the weights of the Conv layer, and $\sigma$ denotes the PReLU activation function. The bias term is omitted for simplicity. $c$ is a positive integer and $k$ is the largest non-negative integer such that $c^{k} \le l$.

The main difference between SparseNet and both DenseNet and ResNet is that the input to a particular layer is formed by aggregating only a subset of the previous outputs. Sparse Blocks retain the benefit of short gradient paths, which improve the flow of information and thereby alleviate the vanishing gradient problem. Moreover, because the number of incoming links per layer grows only logarithmically with depth, the sparse block architecture drastically reduces the number of parameters and thus requires less memory and computation to achieve high performance.
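The aggregation rule can be made concrete with a minimal sketch. It assumes base $c = 2$ and treats index 0 as the block input; the function names are hypothetical and only illustrate which earlier outputs a layer reads under sparse versus dense aggregation.

```python
def sparse_inputs(layer, base=2):
    """Indices of earlier outputs feeding `layer` (index 0 is the block input)."""
    if layer < 1 or base < 2:
        # base 1 would repeat the same offset forever, so it is rejected here
        raise ValueError("layer must be >= 1 and base >= 2")
    sources, offset = [], 1
    while offset <= layer:  # offsets base**0, base**1, ..., base**k with base**k <= layer
        sources.append(layer - offset)
        offset *= base
    return sources


def dense_inputs(layer):
    """DenseNet-style aggregation: every earlier output, including the block input."""
    return list(range(layer - 1, -1, -1))


num_layers = 6  # layers per Sparse Block in the experimental settings
sparse_links = sum(len(sparse_inputs(l)) for l in range(1, num_layers + 1))
dense_links = sum(len(dense_inputs(l)) for l in range(1, num_layers + 1))
print(sparse_inputs(6), dense_inputs(6))  # [5, 4, 2] [5, 4, 3, 2, 1, 0]
print(sparse_links, dense_links)          # 14 21
```

For a six-layer block the sparse rule already uses 14 incoming links instead of 21, and the gap widens as blocks get deeper.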

Multiple sparse blocks are joined together to form the high-level feature learner. Via skip connections, each sparse block receives as input a concatenation of the low-level features from (2) and the outputs of all preceding sparse blocks. This lets each sparse block see low-level as well as high-level feature information directly, for better reconstruction performance.

Bottleneck layer. As described above, features from the previous SpBs are passed directly to the next SpB by concatenation. This yields a very large input for the subsequent upsampling layer, so it is essential to reduce the number of features. It has been shown in [38] that a convolutional layer with a 1 x 1 kernel can be used as a bottleneck layer to reduce the number of feature maps. To improve computational efficiency, we use a bottleneck layer to reduce the number of feature maps to 128 before feeding them to the upsampling layer.

UpSampling and Reconstruction layer. We use a sub-pixel convolutional layer [31] to upscale the LR feature maps to HR feature maps. The final Conv layer in DSNet, which has a 3 x 3 kernel and 3 output channels, is used for reconstruction.
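The rearrangement step inside a sub-pixel layer can be sketched in plain Python. This is a minimal illustration, assuming the common channel ordering in which `r*r` consecutive channels form one output channel; the function name is hypothetical, and the convolution that precedes the shuffle is omitted.

```python
def pixel_shuffle(features, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels_in = len(features)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(features[0]), len(features[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(channels_out)]
    for c in range(channels_out):
        for y in range(height * r):
            for x in range(width * r):
                # The sub-pixel position (y % r, x % r) selects the input channel to read.
                src_channel = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = features[src_channel][y // r][x // r]
    return out


# Four 1x1 channels become one 2x2 channel.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

With `r = 4`, a stack of 16 feature maps of size 28 x 28 becomes a single 112 x 112 map, which matches the 4x scale factor used in the experiments.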

3.1.1 Pixel and perceptual loss

Given a set of low-resolution images $\{I_i^{LR}\}$ and their corresponding high-resolution images $\{I_i^{HR}\}$, we minimize the Mean Squared Error (MSE) in image space, which we call the pixel-wise loss:

$$L_{pixel} = \frac{1}{n}\sum_{i=1}^{n} \left\| G(I_i^{LR}) - I_i^{HR} \right\|^{2}, \tag{4}$$

where $G(\cdot)$ represents the output of the generator network and $n$ is the batch size. Although the MSE loss achieves high PSNR values, it usually results in blurry and unrealistic images. To handle this, the perceptual loss was proposed in [20] to obtain sharper, visually better images. In the perceptual loss, MSE is computed in the feature space of the hallucinated image and its corresponding HR image. We extract features of the HR image and the hallucinated image from VGG-19 [32] to calculate the following loss:

$$L_{perceptual} = \frac{1}{n}\sum_{i=1}^{n} \left\| \phi(I_i^{SR}) - \phi(I_i^{HR}) \right\|^{2}, \tag{5}$$

where $\phi$ denotes the feature maps obtained from the last convolutional layer of VGG-19 [32] and $I_i^{SR} = G(I_i^{LR})$ is the super-resolved face image.

Figure 4: The architecture of our Sparse Block.

3.2 Adversarial Network

In this subsection, we define the adversarial loss used to produce realistic super-resolved face images. The idea of using a GAN [16] is straightforward: the goal of the discriminator D is to distinguish super-resolved images generated by the generator G from the original images, while G aims to generate realistic face images that fool D. In DSNet, we use the Wasserstein GAN (WGAN) [28] in its improved form, WGAN-GP [17]. The reason for using WGAN-GP is not to enhance the quality of hallucinated face images but to stabilize training and reduce the overall training time. As the generator of WGAN-GP we use our super-resolution network, and as the discriminator we use the discriminator of DCGAN [1] without batch normalization.

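Adversarial Loss. We employ the WGAN-GP loss in our face hallucination network:

$$L_{WGAN} = \mathbb{E}_{\tilde{x}\sim P_{g}}[D(\tilde{x})] - \mathbb{E}_{x\sim P_{r}}[D(x)] + \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x})\right\|_{2} - 1\right)^{2}\right], \tag{6}$$

where $P_{r}$ is the input data distribution and $P_{g}$ is the generator distribution defined by $\tilde{x} = G(I^{LR})$. $P_{\hat{x}}$ is obtained by uniformly sampling along straight lines between pairs of samples from $P_{r}$ and $P_{g}$. $\lambda$ is a penalty coefficient, which we set to 10 in our experiments.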
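To show how the gradient penalty term is assembled, here is a small sketch of the WGAN-GP critic loss from [17]. It is not the training code: the critic is a hypothetical toy function `tanh(w . x)` with a hand-derived gradient, whereas in practice the gradient would come from automatic differentiation. All names and sample values are illustrative assumptions.

```python
import math
import random


def wgan_gp_critic_loss(critic, critic_grad, real, fake, lam=10.0, rng=None):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]."""
    rng = rng or random.Random(0)
    n = len(real)
    wasserstein = sum(critic(f) for f in fake) / n - sum(critic(r) for r in real) / n
    penalty = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # uniform point on the line between a real and a fake sample
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]
        grad_norm = math.sqrt(sum(g * g for g in critic_grad(x_hat)))
        penalty += (grad_norm - 1.0) ** 2
    return wasserstein + lam * penalty / n


# Toy critic D(x) = tanh(w . x); hypothetical, for illustration only.
W = [0.6, 0.8]


def toy_critic(x):
    return math.tanh(sum(wi * xi for wi, xi in zip(W, x)))


def toy_critic_grad(x):
    slope = 1.0 - toy_critic(x) ** 2  # derivative of tanh
    return [slope * wi for wi in W]


REAL = [[1.0, 0.0], [0.0, 1.0]]
FAKE = [[0.0, 0.0], [0.5, 0.5]]
print(wgan_gp_critic_loss(toy_critic, toy_critic_grad, REAL, FAKE, lam=10.0))
```

The penalty pushes the critic's gradient norm toward 1 at the interpolated points, which is what keeps WGAN-GP training stable.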

3.3 Face Recognition Network

Herein, we employ ArcFace as our face recognition model due to its state-of-the-art performance on identity representation. ArcFace is a ResNet-like [22] CNN model trained with the Additive Angular Margin Loss (ArcFace), which effectively enhances the discriminative power of feature embeddings. The ArcFace loss is a modification of the traditional softmax loss. The key point in ArcFace is that the margin of the classification boundary is maximized directly in angular space. More details about ArcFace can be found in [18]. The loss function of ArcFace on a training image sample is represented as:

$$L_{arc} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i} + m)}}{e^{s\cos(\theta_{y_i} + m)} + \sum_{j \ne y_i} e^{s\cos\theta_{j}}}, \tag{7}$$

where $x_i$ is the $i$-th sample with class label $y_i$, $\theta_{j}$ is the angle between the feature of $x_i$ and the weight vector of class $j$, and $N$ is the batch size. $m$ is the angular margin hyperparameter and $s$ is the feature scale. Given a mini-batch, we compute $L_{arc}$ on the concatenation of non-paired $I^{HR}$ and $I^{SR}$ face images. We train ArcFace using the following loss:

$$L_{FR} = L_{arc}(\{I^{HR}, I^{SR}\}), \tag{8}$$

where $\{\,\cdot\,\}$ denotes concatenation.
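As a rough illustration of the additive angular margin loss from [18], the sketch below evaluates it for one sample given the cosines between its normalized embedding and each class weight. The function names are hypothetical, and the handling of angles where the target angle plus the margin exceeds $\pi$, which practical implementations treat specially, is left out.

```python
import math


def arcface_loss(cosines, label, s=64.0, m=0.5):
    """Additive angular margin loss for one sample.

    cosines[j] is cos(theta_j) between the normalized embedding and class weight j.
    """
    theta_y = math.acos(max(-1.0, min(1.0, cosines[label])))
    logits = [s * c for c in cosines]
    logits[label] = s * math.cos(theta_y + m)  # the margin is added to the target angle only
    peak = max(logits)                         # log-sum-exp trick for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def batch_arcface_loss(batch_cosines, labels, s=64.0, m=0.5):
    """Mean loss over a mini-batch, e.g. HR and super-resolved faces concatenated."""
    losses = [arcface_loss(c, y, s, m) for c, y in zip(batch_cosines, labels)]
    return sum(losses) / len(losses)


cosines = [0.8, 0.1, -0.3]
print(arcface_loss(cosines, 0, s=1.0, m=0.0))  # plain softmax cross-entropy
print(arcface_loss(cosines, 0, s=1.0, m=0.5))  # larger, because the margin shrinks the target logit
```

Setting `m = 0` recovers the ordinary softmax cross-entropy on scaled cosines, which makes the effect of the margin easy to see.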

3.3.1 Identity loss

Equations (4), (5), and (6) have been used in general-purpose super-resolution. Although they provide decent results for facial super-resolution, identity information is easily lost during the super-resolution process because these losses do not incorporate any information related to face identity. We observed that when these losses are used alone, identity details may be missing and face recognition performance decreases (see Table 1).

To alleviate this issue, we enforce facial identity consistency between the low- and high-resolution face images by integrating a face recognition network. In other words, we add a constraint at the identity level. To better preserve the identity of the super-resolved faces, the identity-wise feature representation from the face recognition network is used as a supervisory signal. The identity loss is defined as follows:

$$L_{id} = \frac{1}{n}\sum_{i=1}^{n} \left\| FR(I_i^{SR}) - FR(I_i^{HR}) \right\|_{2}^{2}, \tag{9}$$

where $FR(I_i^{SR})$ and $FR(I_i^{HR})$ are the identity features extracted from the fully connected layer of our face recognition model, and $I_i^{SR}$ represents the $i$-th generated facial image.
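A minimal sketch of an identity loss of this kind is shown below. It assumes the distance is the squared Euclidean distance between paired embeddings, averaged over the batch; the exact norm is an assumption here, and the function name is hypothetical.

```python
def identity_loss(sr_embeddings, hr_embeddings):
    """Mean squared Euclidean distance between paired identity embeddings."""
    if len(sr_embeddings) != len(hr_embeddings) or not sr_embeddings:
        raise ValueError("expected two non-empty batches of equal size")
    total = 0.0
    for sr, hr in zip(sr_embeddings, hr_embeddings):
        total += sum((a - b) ** 2 for a, b in zip(sr, hr))
    return total / len(sr_embeddings)


# Identical embeddings give zero loss; the loss grows as identities drift apart.
print(identity_loss([[1.0, 0.0]], [[1.0, 0.0]]))              # 0.0
print(identity_loss([[3.0, 4.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]))  # 12.5
```

In the full system the embeddings would be the 512-dimensional features produced by the face recognition network for the super-resolved and ground-truth images.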

3.4 Overall training loss

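In summary, the overall loss used for training FH-GAN is a weighted sum of the above loss functions:

$$L = \alpha_{1} L_{pixel} + \alpha_{2} L_{perceptual} + \alpha_{3} L_{WGAN} + \alpha_{4} L_{id}, \tag{10}$$

where $\alpha_{1}$, $\alpha_{2}$, $\alpha_{3}$, $\alpha_{4}$ are the corresponding loss weights.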

4 Experiments

In this section, we first describe the training and testing setup, followed by the implementation details. We then discuss comparisons with other methods and the benefits of our method, and present the effectiveness of the identity loss. Furthermore, we report the standard super-resolution metrics, PSNR and SSIM, for the proposed FH-GAN. According to [5], PSNR and SSIM are not indicative of visual quality. To compensate for this weakness, we also propose an indirect way to evaluate face super-resolution quality based on face recognition results, and we report face verification accuracies for the different methods. In particular, we trained ArcFace on high-resolution and hallucinated face images and then used it for face verification on low-resolution images.

4.1 Experimental Settings

Dataset. VGGFACE2 [29] is a large-scale dataset for face recognition and synthesis which covers a large range of pose, age, and ethnicity. Its 9000 identities span a wide range of ethnicities, accents, professions, and ages. We use 8631 identities with 3.31 million images for training the face recognition model. To train the face hallucination network, we randomly select 1.2M images from the VGGFACE2 dataset.

We use two different datasets to evaluate our proposed method. The first is the LFW [15] dataset, used for testing both face verification and face hallucination performance in the wild. LFW contains 13,233 images from 5,749 identities. The second is the CFP [30] dataset, used to evaluate face verification. CFP contains 7000 images from 500 identities. Both datasets are collected in unconstrained settings. We compare our approach with several state-of-the-art models: SRGAN [5], SRDenseNet [33], and RDN [39].

Data Preprocessing. To conduct a fair comparison with other methods, faces in the training data are detected by MTCNN [23] and aligned to a canonical view of size 112 x 112.

Implementation details. HR images are cropped and aligned to 112 x 112, and LR input images are obtained by downsampling the HR images with a bilinear kernel at a scale factor of 4x.
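The downsampling step can be sketched as follows. This is a minimal pure-Python version that assumes half-pixel sampling centres and no anti-aliasing filter; which library and options were used to produce the LR inputs is not stated, so those choices are assumptions, and the function name is hypothetical.

```python
def bilinear_downsample(image, scale):
    """Downsample a 2-D list of floats by an integer factor using bilinear sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = src_h // scale, src_w // scale
    out = []
    for i in range(dst_h):
        # Map the output pixel centre back to source coordinates (half-pixel convention).
        sy = min(max((i + 0.5) * scale - 0.5, 0.0), src_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        wy = sy - y0
        row = []
        for j in range(dst_w):
            sx = min(max((j + 0.5) * scale - 0.5, 0.0), src_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            wx = sx - x0
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


hr = [[0.5] * 112 for _ in range(112)]  # stand-in for one channel of a 112 x 112 face crop
lr = bilinear_downsample(hr, 4)
print(len(lr), len(lr[0]))  # 28 28
```

At a 4x factor each LR pixel is the average of the four HR pixels nearest its centre, so a 112 x 112 crop becomes a 28 x 28 input.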

To train ArcFace, we employed ResNet34 [22] and set the embedding feature size to 512. We follow [22] to set the feature scale $s$ to 64 and choose the angular margin $m$ of ArcFace as 0.5. We set the batch size to 256; the learning rate starts at 0.01 and is divided by 10 after 15 and 18 epochs. Training finishes at epoch 20.
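The step schedule described for ArcFace can be written out as a short sketch. It assumes zero-indexed epochs and that each drop takes effect once the stated number of epochs has completed; the function name is hypothetical.

```python
def arcface_learning_rate(epoch, base_lr=0.01, milestones=(15, 18), gamma=0.1):
    """Step schedule: multiply the rate by `gamma` once each milestone epoch is reached."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** drops


# One learning rate per epoch for the 20-epoch run.
schedule = [arcface_learning_rate(epoch) for epoch in range(20)]
print(schedule[0], schedule[15], schedule[18])
```

Under these assumptions, epochs 0 to 14 use 0.01, epochs 15 to 17 use roughly 0.001, and the last two epochs use roughly 0.0001.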

To train the GAN-based DSNet, we used 6 Sparse Blocks, each with 6 convolutional layers. In total, the face hallucination network is 41 layers deep, including the sparse blocks, low-level feature extractors, bottleneck, upsampling, and reconstruction layers. Within each Sparse Block, we used a growth rate of 32. The low-level feature extractors have 64 filters, and the kernel size of all convolutional layers is 3x3 except for the bottleneck layer, where it is 1x1. The parametric rectified linear unit (PReLU) was used as the activation function. All networks were optimized using Adam with a mini-batch size of 128. The learning rate was set to 1e-3 and gradually decreased to 1e-5. Training finished at 56k iterations.
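The stated depth of 41 layers can be checked with a small tally. The sketch assumes one convolution each for the bottleneck, upsampling, and reconstruction stages, which is an assumption made here to reproduce the count rather than a detail given in the text; the function name is hypothetical.

```python
def dsnet_depth(num_blocks=6, layers_per_block=6, num_llfe=2,
                num_bottleneck=1, num_upsampling=1, num_reconstruction=1):
    """Count convolutional layers in a DSNet-style generator."""
    sparse_layers = num_blocks * layers_per_block
    return sparse_layers + num_llfe + num_bottleneck + num_upsampling + num_reconstruction


print(dsnet_depth())  # 41
```

Thirty-six of the 41 layers sit inside the Sparse Blocks, so the per-block aggregation rule dominates the parameter budget.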

For end-to-end training of FH-GAN, all networks (DSNet, discriminator, and ArcFace) were trained jointly for 4 epochs with a learning rate of 1e-4. The face hallucination model and ArcFace were trained using Adam [26] and SGD, respectively. All models are implemented in PyTorch.

4.2 Discussions

We compare our method with SRDenseNet [33], RDN [39], and SRGAN [5] to demonstrate its effectiveness.

Difference to SRDenseNet. First and foremost, SRDenseNet uses local dense connections from DenseNet [14], which concatenate the outputs of all previous layers and thus over-burden the model. Concatenation gives every subsequent layer a clean view of all previous features, but dense concatenation means that a large portion of the model is dedicated to processing previously seen features. Consequently, it is hard for the model to make full use of the dense skip connections and all of its parameters. In contrast, we use local sparse connections in our proposed network, inspired by SparseNets [27], so that the number of concatenated features grows logarithmically rather than linearly with depth. This property allows us to use a larger growth rate (the number of filters per layer) and to enlarge our model with more layers. By using the sparse aggregation topology, we reduce the parameter size by half and achieve faster convergence compared to SRDenseNet. Another difference is that SRDenseNet uses only the MSE loss, whereas we use multiple losses to make the model more robust and obtain better hallucinated face images. As a result, our method achieves better performance and generates visually pleasing face images.

Difference to SRGAN and RDN. Our method differs from SRGAN and RDN mainly in the choice of loss functions. RDN uses only a pixel-wise $L_1$ loss, whereas we use not only pixel-level information but also feature-level information. Using only a pixel-wise loss results in blurry images and loses identity information, which is crucial for face recognition. SRGAN uses a feature-level loss (perceptual loss) to make super-resolved images sharper, but its outputs sometimes contain artifacts, such as white and red spots on the face. Additionally, SRGAN does not preserve identity information in metric space, which leads to missing identity information and additional artifacts in super-resolved images. In our method, we use a perceptual loss as well as an identity loss to impose an identity-level constraint by jointly training the face hallucination model with the face recognition model.

Benefit of our model. In summary, sparse blocks let us increase both the model size and the growth rate. Very deep networks benefit the super-resolution task in two ways: 1) a large amount of contextual information can be extracted from LR images; 2) the high nonlinearity generated by the PReLU layers can model the sophisticated mapping between LR and HR images. Sparse blocks also give us better flexibility and parameter efficiency. As can be seen in Fig. 5, our method produces sharper and more detailed results and performs well across different kinds of face images.

4.3 Effectiveness of Identity Loss

Identity Loss. Table 1 shows an ablation study on the effect of the identity loss. We find that face recognition performance decreases when the identity loss is not included in our proposed method. As noted earlier, because face hallucination is ill-posed, identity information is easily lost during hallucination.

As shown in Table 1, we obtain better accuracy when we train FH-GAN jointly with the face recognition network. We constrain identity-level information by adding the face recognition loss. The identity-level difference can be measured by a robust face recognition model, whose identity-wise feature representation is used as a supervisory signal. This helps to preserve identity information and increases face verification performance.

Method    Identity Loss    Accuracy
FH-GAN    ✗                99.00 %
FH-GAN    ✓                99.14 %

Table 1: Effectiveness of identity loss on Face Verification Performance.

4.4 Super Resolution Results

Figure 5: From top to bottom: LR image, bilinear interpolation, SRGAN [5], RDN [39], SRDenseNet [33], our model, and HR.

We compared the PSNR and SSIM results of the proposed method with those of other state-of-the-art super-resolution methods and of bilinear interpolation (Table 3). Our model achieves the best PSNR and SSIM among the compared methods. Note, however, that standard super-resolution metrics such as PSNR and SSIM are in most cases not reliable indicators of visual quality.
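For reference, PSNR follows directly from the mean squared error between a reference image and its estimate. The sketch below uses the standard definition and assumes 8-bit pixel values with a peak of 255; the helper names are hypothetical.

```python
import math


def mean_squared_error(reference, estimate):
    """MSE between two equally sized 2-D lists of pixel values."""
    diffs = [(r - e) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for r, e in zip(ref_row, est_row)]
    return sum(diffs) / len(diffs)


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mean_squared_error(reference, estimate)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


reference = [[10.0, 20.0], [30.0, 40.0]]
estimate = [[11.0, 21.0], [31.0, 41.0]]  # every pixel is off by one grey level
print(round(psnr(reference, estimate), 2))  # 48.13
```

Because PSNR depends only on per-pixel error, a blurry image that is close on average can outscore a sharper, more plausible one, which is why the verification results are reported alongside it.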

Method           LFW        CFP
FR-Bilinear      98.62 %    92.3 %
FR-SRGAN         99.03 %    93.08 %
FR-RDN           98.92 %    92.6 %
FR-SrDenseNet    98.87 %    92.16 %
FH-GAN           99.16 %    93.36 %
FR-HR images     99.47 %    95.05 %

Table 2: Face verification results on the LFW and CFP datasets. FR stands for the Face Recognition model used in all our experiments. The results in this case are indicative of visual quality. FR-Bilinear means that the face image was super-resolved using bilinear interpolation and the Face Recognition model was then run on the result; the other methods are named similarly.

Method               PSNR     SSIM
Bilinear upsample    20.3     0.76
SR-GAN               20.78    0.77
SRDenseNet           20.26    0.79
RDN                  21.26    0.81
Ours                 21.35    0.83

Table 3: PSNR and SSIM based Face Hallucination performance on LFW. The results are not indicative of visual quality.

Although the bilinear method is fast and very lightweight, the face images it generates are blurry and contain artifacts, so it fails to super-resolve low-resolution images. Face images generated by RDN and SRDenseNet are over-smoothed because these models learn only pixel-wise information; consequently, they do not fully preserve facial features. As shown in Fig. 5, faces hallucinated by SRGAN contain white-dot artifacts. Thanks to the effectiveness of our generator network and the identity loss, our method produces comparatively better-looking images.

A few failure cases of our method can be seen in Fig. 6. These failures are primarily caused by large occlusions and multiple faces. In these cases our super-resolved images still preserve the identity but are distorted. Improving these results and investigating real low-quality images are left for future work.

4.5 Face Recognition Results

The proposed FH-GAN aims to recognize low-resolution human faces. Therefore, to verify the identity-preserving capacity of different super-resolution models, we study face recognition on two benchmark datasets. We evaluate face verification performance on the LFW and CFP datasets using the ArcFace features extracted from hallucinated face images.
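A face verification accuracy of this kind can be computed as in the following sketch. It assumes pairs are compared by cosine similarity of their embeddings against a fixed threshold, which is a common protocol but an assumption here; in practice the threshold is usually selected on held-out folds. The names and toy embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, threshold):
    """pairs holds (embedding_a, embedding_b, same_identity) triples."""
    correct = 0
    for emb_a, emb_b, same_identity in pairs:
        predicted_same = cosine_similarity(emb_a, emb_b) >= threshold
        correct += int(predicted_same == same_identity)
    return correct / len(pairs)


toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], True),    # similar embeddings, same person
    ([1.0, 0.0], [0.0, 1.0], False),   # orthogonal embeddings, different people
    ([0.0, 1.0], [0.7, 0.7], True),    # moderately similar, same person
    ([1.0, 1.0], [1.0, -1.0], True),   # same person but orthogonal: a miss
]
print(verification_accuracy(toy_pairs, threshold=0.5))  # 0.75
```

A hallucination method that distorts identity moves genuine pairs apart in embedding space, which shows up directly as a lower accuracy in this measure.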

Face Verification on low resolution LFW and CFP. Face verification performance, measured by recognition accuracy (ACC) in the wild, is shown in Table 2. The results show that RDN and SRDenseNet perform worse because they are not designed to preserve identity. Even though SRGAN uses a perceptual loss, its face verification accuracy is still lower because it does not consider identity preservation in the identity metric space. Our model achieves the best face verification results on both datasets, and these results are very close to those obtained on HR face images. This indicates the superiority of our face hallucination method.

Figure 6: Hallucinated examples of visually bad results produced by our method. These images include large occlusions.

5 Conclusion

This paper has addressed how to hallucinate and recognize faces simultaneously when the face image resolution is insufficient. Specifically, we proposed FH-GAN, an end-to-end system for super-resolving face images and recognizing them. Our method incorporates facial identity information into a newly proposed generator architecture using WGAN for face hallucination. The face recognition model improves identity preservation and the quality of hallucinated images. We show improvements on both face hallucination and low-resolution face recognition.


  • [1] S. C. Alec Radford, Luke Metz. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434, 2015.
  • [2] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. CVPR, 2004.
  • [3] K. H. X. T. Chao Dong, Chen Change Loy. Learning a deep convolutional network for image super-resolution. ECCV, 2014.
  • [4] K. H. X. T. Chao Dong, Chen Change Loy. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
  • [5] F. H. J. C. A. A. A. T.-J. T. Z. W. W. S. Christian Ledig, Lucas Theis. Photo-realistic single image super-resolution using a generative adversarial network. arXiv, 2016.
  • [6] F. S. et al. Facenet: A unified embedding for face recognition and clustering. arXiv preprint arXiv:1503.03832, 2015.
  • [7] X. Z. et al. Range loss for deep face recognition with long-tail. ICCV, 2017.
  • [8] Y. S. et al. Deep learning face representation from predicting 10,000 classes. CVPR, 2014.
  • [9] Y. S. et al. Deepid3: Face recognition with very deep neural networks. CoRR, abs/1502.00873, 2015.
  • [10] Y. T. et al. Deepface: Closing the gap to human-level performance in face verification. In Proc. Conference on Computer Vision and Pattern Recognition, 1701–1708, 2014.
  • [11] Y. T. et al. Web-scale training for face identification. CVPR, 2015.
  • [12] Y. W. et al. A discriminative feature learning approach for deep face recognition. ECCV, 2016.
  • [13] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. IJCV, 2000.
  • [14] K. W. G. Huang, Z. Liu and L. van der Maaten. Densely connected convolutional networks. ICCV, 2017.
  • [15] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
  • [16] M. M. B. X. D. W.-F. S. O. A. C. Y. B. Ian J. Goodfellow, Jean Pouget-Abadie. Generative adversarial networks. NIPS, 2014.
  • [17] M. A. V. D. Ishaan Gulrajani, Faruk Ahmed and A. C. Courville. Improved training of wasserstein gans. CoRR, abs/1704.00028, 2017.
  • [18] J. G. J. Deng and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698, 2018.
  • [19] Q. Y. W. S. Jimmy SJ. Ren, Li Xu. Shepard convolutional neural networks. NIPS, 2015.
  • [20] L. F.-F. Justin Johnson, Alexandre Alahi. Perceptual losses for real-time style transfer and super-resolution. ECCV, 2016.
  • [21] C.-W. C. W. H. Y. Q. W. L. K. Zhang, Z. Zhang and T. Zhang. Super-identity convolutional neural network for face hallucination. ECCV, 2018.
  • [22] S. R. J. S. Kaiming He, Xiangyu Zhang. Deep residual learning for image recognition. ICCV, 2017.
  • [23] Z. L. Y. Q. Kaipeng Zhang, Zhanpeng Zhang. Joint face detection and alignment using multi-task cascaded convolutional networks. SPL, 2016.
  • [24] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. CVPR, 2016.
  • [25] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. CVPR, 2016.
  • [26] D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2014.
  • [27] M. M. Z. D. G. M. P. T. Ligeng Zhu, Ruizhi Deng. Sparsely aggregated convolutional networks. ECCV, 2018.
  • [28] L. B. Martin Arjovsky, Soumith Chintala. Wasserstein gan. arXiv:1701.07875, 2017.
  • [29] W. X. O. M. P. A. Z. Qiong Cao, Li Shen. Vggface2: A dataset for recognising faces across pose and age. FG, 2018.
  • [30] C. C. P. V. M. C. R. J. D. W. Sengupta Soumyadip, Chen Jun-Cheng. Frontal to profile face verification in the wild. WACV, 2016.
  • [31] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. CVPR, 2016.
  • [32] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv, 2014.
  • [33] X. L. Q. G. Tong Tong, Gen Li. Image super-resolution using dense skip connections. ICCV, 2017.
  • [34] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang. Deep networks for image super-resolution with sparse prior. ICCV, 2015.
  • [35] G. S. Weisheng Dong, Lei Zhang and X. Wu. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. TIP, 2011.
  • [36] F. P. Xin Yu. Ultra-resolving face images by discriminative generative networks. ECCV, 2016.
  • [37] F. P. Xin Yu. Hallucinating very low-resolution unaligned and noisy face images by transformative discriminative autoencoders. CVPR, 2017.
  • [38] X. L. C. X. Ying Tai, Jian Yang. Memnet: A persistent memory network for image restoration. CVPR, 2016.
  • [39] Y. K. B. Z. Y. F. Yulun Zhang, Yapeng Tian. Residual dense network for image super-resolution. CVPR, 2018.
  • [40] Y. Y. D. L. Z. Wang, S. Chang and T. S. Huang. Studying very low resolution recognition using deep networks. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [41] H. C. Z. J. Y. Q. Zhou, Fan. Learning face hallucination in the wild. AAAI, 2015.
  • [42] W. Zou and P. C. Yuen. Very low resolution face recognition in parallel environment. IEEE Transactions on Image Processing, 2012.