1 Introduction
Face hallucination, which generates high-resolution (HR) facial images from low-resolution (LR) inputs, has attracted great interest in the past few years. However, most existing works do not take the recovery of identity information into consideration, so they cannot generate faces close to the real identity. Fig. 1 shows some examples of hallucinated facial images generated by bicubic interpolation and several state-of-the-art methods. Though they generate clearer facial images than bicubic interpolation, the identity similarities are still low, which means that they cannot recover accurate identity-related facial details. On the other hand, human perception of faces heavily relies on identity information [3]; pixel-level cues cannot fully account for the perception process of the brain. These facts suggest that recovering identity information may improve both the recognizability and the performance of hallucination.
Motivated by the above observations, this paper proposes the Super-Identity Convolutional Neural Network (SICNN) for identity-enhanced face hallucination. Different from previous methods, we additionally minimize the identity difference between the hallucinated face and its corresponding high-resolution face. To do so, (i) we introduce a robust identity metric space into the training process; (ii) we define a super-identity loss to measure the identity difference; (iii) we propose a novel training approach to efficiently utilize the super-identity loss. More details are as follows:
For the identity metric space, we use a hypersphere space [20] due to its state-of-the-art performance in facial identity representation. Specifically, our SICNN is composed of a face hallucination network cascaded with a recognition network that extracts identity-related features, followed by a Euclidean normalization operation that projects the features onto the hypersphere space.
For the loss function, the perceptual loss [12], computed by the Euclidean distance between features, can produce convincing HR images. Differently, in our work, we need to minimize the identity distance of face pairs in the metric space. We therefore modify the perceptual loss into the super-identity loss, calculated by the normalized Euclidean distance (equivalent to geodesic distance) between the hallucinated face and its corresponding high-resolution face in the hypersphere identity metric space. This also facilitates our analysis of the training process (see Sec. 3.5).
For the training approach, directly training the model with the super-identity loss using conventional approaches is difficult due to the large margin between the hallucination domain and the HR domain in the hypersphere identity metric space. This is critical during the early training stage, when the face hallucination network cannot yet predict high-quality hallucinated face images. Moreover, the hallucination domain keeps changing as the hallucination network learns, which makes training with the super-identity loss unstable. We summarize this challenge as a dynamic domain divergence problem. To overcome it, we propose a Domain-Integrated Training algorithm that alternately updates the face recognition network and the hallucination network by minimizing a different loss in each iteration. In this alternating optimization, the hallucinated face and the HR face gradually move closer to each other in the hypersphere identity metric space while the discriminative power of this metric space is preserved.
The main contributions of this paper are summarized as follows:

- We propose the Super-Identity Convolutional Neural Network (SICNN) for enhancing the identity information in face hallucination.
- We propose the Domain-Integrated Training method to overcome the problem caused by dynamic domain divergence when training SICNN.
- Compared with existing state-of-the-art hallucination methods, SICNN achieves superior visual quality and identity recognizability when super-resolving a facial image of size 12×14 pixels with an 8× upscaling factor.
2 Related Works
Single image super-resolution (SR) aims at recovering an HR image from an LR one. Face hallucination is a kind of class-specific image SR, which exploits the statistical properties of facial images. We classify face hallucination methods into two categories: classical approaches and deep learning approaches.
Classical Approaches. Subspace-based and facial-component-based methods are the two main kinds of classical face hallucination approaches [16, 23, 15, 31, 37, 18, 17, 19].
For subspace-based methods, Liu et al. [16] employed a Principal Component Analysis (PCA) based global appearance model to hallucinate LR faces and a local non-parametric model to enhance the details. Ma et al. [23] used multiple local exemplar patches sampled from aligned HR facial images to hallucinate LR faces. Li et al. [15] resorted to sparse representation on local face patches. These subspace-based methods require precisely aligned reference HR and LR facial images with the same pose and facial expression.
Facial-component-based methods super-resolve facial parts rather than entire faces to address various poses and expressions. Tappen et al. [31] used SIFT flow to align LR images and then deformed the reference HR images. However, the global structure is not preserved due to the use of local mappings. Yang et al. [37] presented a structured face hallucination method that can maintain the facial structure; however, it relies on accurate facial landmarks.
Deep Learning Approaches. Recently, deep convolutional neural networks (DCNNs) have achieved remarkable progress in a variety of face analysis tasks, such as face recognition [33, 35, 20], face detection [41, 42], and facial attribute recognition [40, 22, 30, 34]. Zhou et al. [43] proposed a bi-channel CNN to hallucinate blurry facial images in the wild. For unaligned faces, Zhu et al. [44] proposed to jointly learn face hallucination and facial dense spatial correspondence field estimation. The approach of [39] is a GAN-based method that generates realistic facial images. These works ignore the recovery of identity information, which is important for recognizability and hallucination quality. Johnson et al. [12] and Bruna et al. [2] relied on a perceptual loss function, closer to perceptual similarity, to recover visually more convincing HR images for general image SR. In this paper, we modify the perceptual loss to suit the hypersphere identity metric space and propose a novel training approach to overcome the challenges of using this loss.
3 Super-Identity CNN
In this section, we first describe the architecture of our face hallucination network. Then we introduce the proposed super-resolution loss and super-identity loss for identity recovery. After that, we analyze the key challenge in super-identity training, the dynamic domain divergence problem. Finally, we introduce the proposed domain-integrated training algorithm to overcome this challenge.
3.1 Face Hallucination Network Architecture
As shown in Fig. 2 (a), the face hallucination network can be decomposed into four stages: feature extraction, deconvolution, mapping, and reconstruction.
We use a dense block [10] to extract semantic features from LR inputs. More specifically, in the dense block, we set the growth rate to 32 and the kernel size to 3×3. The deconvolution layer consists of learnable upscaling filters that enlarge the resolution of the input features. Mapping is implemented by a convolutional layer that reduces the feature dimension to cut computational cost. Reconstruction also exploits a convolutional layer to predict HR images from the semantic features.
Here, we denote a convolutional layer as Conv($k$, $c$) and a deconvolutional layer as DeConv($k$, $c$), where $k$ and $c$ represent the filter size and the number of channels, respectively. In addition, the PReLU [8] activation function achieves promising performance in CNN-based super-resolution [6], so we use it after each layer except the reconstruction stage.
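To make the four-stage pipeline concrete, below is a minimal PyTorch sketch of such a hallucination network. The layer counts, channel widths, and the single 2× deconvolution stage are illustrative assumptions; the actual network stacks several dense-block/deconvolution stages to reach the 8× upscaling factor.

```python
import torch
import torch.nn as nn

class HallucinationNet(nn.Module):
    """Illustrative sketch: feature extraction (dense block) ->
    deconvolution -> mapping -> reconstruction."""

    def __init__(self, growth_rate=32, num_dense_layers=4):
        super().__init__()
        layers, channels = [], 3
        for _ in range(num_dense_layers):         # simplified dense block [10]
            layers.append(nn.Sequential(
                nn.Conv2d(channels, growth_rate, 3, padding=1), nn.PReLU()))
            channels += growth_rate                # dense connectivity grows width
        self.dense = nn.ModuleList(layers)
        self.deconv = nn.Sequential(               # learnable 2x upscaling filters
            nn.ConvTranspose2d(channels, 256, 4, stride=2, padding=1), nn.PReLU())
        self.mapping = nn.Sequential(              # 1x1 conv reduces feature dimension
            nn.Conv2d(256, 64, 1), nn.PReLU())
        self.reconstruction = nn.Conv2d(64, 3, 3, padding=1)  # no PReLU here

    def forward(self, x):
        for layer in self.dense:
            x = torch.cat([x, layer(x)], dim=1)    # concatenate all previous features
        return self.reconstruction(self.mapping(self.deconv(x)))
```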
3.2 Super-Resolution Loss
We use a pixel-wise Euclidean loss, called the super-resolution loss, to constrain the overall visual appearance. For an LR face input $I_{LR}$, we penalize the pixel-wise Euclidean distance between the hallucinated face and its corresponding HR face:

$$\mathcal{L}_{SR} = \frac{1}{N}\sum_{i=1}^{N}\left\|CNN_{H}(I_{LR}^{i}) - I_{HR}^{i}\right\|_{2}^{2} \qquad (1)$$

where $I_{LR}^{i}$ and $I_{HR}^{i}$ are the $i$-th LR and HR facial image pair in the training data, respectively, and $CNN_{H}(I_{LR}^{i})$ represents the output of the hallucination network with input $I_{LR}^{i}$. For better understanding, we also denote $CNN_{H}(I_{LR}^{i})$ as $I_{SR}^{i}$ in the following text.
3.3 Hypersphere Identity Metric Space
Super-resolution loss constrains the pixel-level appearance, and we further impose a constraint at the identity level. To measure the identity-level difference, the first step is to find a robust identity metric space. Here we employ the hypersphere space [20] due to its state-of-the-art performance on identity representation. As shown in Fig. 2 (b), our hallucination network is cascaded with a face recognition network (i.e., $CNN_{R}$) and a Euclidean normalization operation that projects faces into the constructed hypersphere identity metric space.
$CNN_{R}$ is a ResNet-like [9] CNN (see Tab. 1). It is trained with the A-Softmax loss function [20], which encourages the CNN to learn discriminative identity features (i.e., maximizing inter-class distance and minimizing intra-class distance) through an angular margin. In this paper, we denote this loss function as the recognition loss $\mathcal{L}_{FR}$. For a face input belonging to the $y_i$-th identity, with deep feature $x_i$, the face recognition loss is represented as:
$$\mathcal{L}_{FR} = \frac{1}{N}\sum_{i=1}^{N} -\log\frac{e^{\|x_i\|\,\psi(\theta_{y_i,i})}}{e^{\|x_i\|\,\psi(\theta_{y_i,i})} + \sum_{j \neq y_i} e^{\|x_i\|\cos(\theta_{j,i})}} \qquad (2)$$

where $\theta_{j,i}$ denotes the learned angle between feature $x_i$ and the weight vector of identity $j$, $\psi(\theta_{y_i,i})$ is a monotonically decreasing function generalized from $\cos(m\theta_{y_i,i})$, and $m$ is the hyper-parameter of the angular margin constraint. More details can be found in SphereFace [20].
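For concreteness, the following is a hedged PyTorch sketch of this A-Softmax recognition loss. The class name is ours, and the annealing strategy used in practice by SphereFace [20] is omitted; treat it as a sketch of Eq. 2 rather than the authors' exact implementation.

```python
import math
import torch
import torch.nn.functional as F

class ASoftmaxLoss(torch.nn.Module):
    """Sketch of the A-Softmax loss (Eq. 2): normalized class weights,
    zero biases, and an angular margin m on the target class."""

    def __init__(self, feat_dim, num_classes, m=4):
        super().__init__()
        self.m = m
        self.weight = torch.nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, x, labels):
        w = F.normalize(self.weight, dim=1)          # ||W_j|| = 1, no bias
        x_norm = x.norm(dim=1, keepdim=True)         # ||x_i||, shape (N, 1)
        cos = F.normalize(x, dim=1) @ w.t()          # cos(theta_{j,i}), shape (N, C)
        idx = labels.view(-1, 1)
        theta = torch.acos(cos.gather(1, idx).clamp(-1 + 1e-7, 1 - 1e-7))
        # psi(theta) = (-1)^k cos(m*theta) - 2k  for theta in [k*pi/m, (k+1)*pi/m]
        k = (self.m * theta / math.pi).floor()
        sign = 1.0 - 2.0 * (k % 2)                   # (-1)^k
        psi = sign * torch.cos(self.m * theta) - 2.0 * k
        logits = x_norm * cos                        # ||x_i|| cos(theta_{j,i})
        logits = logits.scatter(1, idx, x_norm * psi)  # margin on the target class
        return F.cross_entropy(logits, labels)
```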
| Layer Name | Output Size | Structure |
| --- | --- | --- |
| Input | 96×112 | – |
| Conv1a | 94×110 | 3×3, 64, pad 0 |
| Conv1b | 92×108 | 3×3, 64, pad 0 |
| Avepool1 | 46×54 | 3×3, stride 2 |
| Residual_block1 | 46×54 | – |
| Conv2 | 44×52 | 3×3, 128, pad 0 |
| Avepool2 | 22×26 | 3×3, stride 2 |
| Residual_block2 | 22×26 | – |
| Conv3 | 20×24 | 3×3, 256, pad 0 |
| Avepool3 | 10×12 | 3×3, stride 2 |
| Residual_block3 | 10×12 | – |
| Conv4 | 8×10 | 3×3, 512, pad 0 |
| Avepool4 | 4×5 | 3×3, stride 2 |
| Residual_block4 | 4×5 | – |
| FC1 | 512 | 4×5, 512 |
3.4 Super-Identity Loss
To impose identity information in the training process, one choice is to use a loss computed by the Euclidean distance between features of face pairs, such as the perceptual loss [12]. However, since our goal is to minimize the identity distance in a hypersphere metric space, the original perceptual loss, computed by the L2 distance, is not the best choice for our task. Therefore, we propose a modified perceptual loss, called the Super-Identity (SI) loss, which computes the normalized Euclidean distance (equivalent to geodesic distance). This modification makes the loss directly related to identity in the hypersphere space and facilitates our investigation in Sec. 3.5.
For an LR face input $I_{LR}^{i}$, we penalize the normalized Euclidean distance between the hallucinated face and its corresponding HR face in the constructed hypersphere identity metric space:

$$\mathcal{L}_{SI} = \frac{1}{N}\sum_{i=1}^{N}\left\|\tilde{F}(I_{SR}^{i}) - \tilde{F}(I_{HR}^{i})\right\|_{2}^{2} \qquad (3)$$

where $F(I_{SR}^{i})$ and $F(I_{HR}^{i})$ are the identity features extracted by the face recognition model ($CNN_{R}$) from facial images $I_{SR}^{i}$ and $I_{HR}^{i}$, respectively, and $\tilde{F}(I) = F(I)/\|F(I)\|_{2}$ is the identity representation projected onto the unit hypersphere.
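As a minimal sketch, Eq. 3 can be implemented as below, assuming (N, D) batches of identity features; the function name and interface are our assumptions.

```python
import torch.nn.functional as F

def super_identity_loss(feat_sr, feat_hr):
    """Sketch of the super-identity loss (Eq. 3): project both identity
    features onto the unit hypersphere, then take the mean squared
    Euclidean distance between the normalized feature pairs."""
    feat_sr = F.normalize(feat_sr, p=2, dim=1)  # F~ = F / ||F||_2
    feat_hr = F.normalize(feat_hr, p=2, dim=1)
    return ((feat_sr - feat_hr) ** 2).sum(dim=1).mean()
```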
In addition to $\mathcal{L}_{SI}$, we offer some discussion of the perceptual loss beyond our work. In general, the perceptual loss is computed by the L2 distance. However, in most CNNs, the inner-product operation is used in fully-connected and convolutional layers, so their outputs depend on the feature norm, the weight norm, and the angle between them. Therefore, for different tasks and different metric spaces (e.g., [21, 5, 25]), modifications to the metric in which the perceptual loss is computed are necessary ($\mathcal{L}_{SI}$ is one such case).
3.5 Challenges of Training with Super-Identity Loss
Super-identity loss imposes an identity-level constraint. We examine different training methods as follows:
Baseline training approach I. A straightforward way to train our framework is to jointly use $\mathcal{L}_{SR}$, $\mathcal{L}_{SI}$, and $\mathcal{L}_{FR}$ to train both $CNN_{H}$ and $CNN_{R}$ from scratch. The optimization objective can be represented as:

$$\min_{W_{H},\,W_{R}}\ \mathcal{L}_{SR} + \lambda\,\mathcal{L}_{SI} + \beta\,\mathcal{L}_{FR} \qquad (4)$$

where $\lambda$ and $\beta$ denote the loss weights of $\mathcal{L}_{SI}$ and $\mathcal{L}_{FR}$ respectively, and $W_{H}$ and $W_{R}$ denote the learnable parameters of $CNN_{H}$ and $CNN_{R}$.
Observation I. This training approach generates artifacts (see Fig. 3, first column) and the loss is difficult to converge. The reasons may be: (1) In the early training stage, the hallucinated faces are quite different from HR faces, so $\mathcal{L}_{SI}$ is too difficult to optimize from scratch. (2) The objective of $\mathcal{L}_{FR}$ (i.e., minimizing the intra-class variance) differs from the objective of $\mathcal{L}_{SI}$ and $\mathcal{L}_{SR}$ (minimizing the pairwise distance), which is disadvantageous to the learning of both networks. Thus, we cannot use $\mathcal{L}_{FR}$ in $CNN_{H}$ learning, and we also cannot use $\mathcal{L}_{SI}$ in $CNN_{R}$ learning.
Baseline training approach II. To solve the above problems, one possible training approach, used with the perceptual loss [12], can be adopted. In particular, we train a $CNN_{R}$ using HR faces and then jointly use $\mathcal{L}_{SR}$ and $\mathcal{L}_{SI}$ to train $CNN_{H}$. The joint objective can be represented as:
$$\min_{W_{H}}\ \mathcal{L}_{SR} + \lambda\,\mathcal{L}_{SI} \qquad (5)$$
Observation II. We have two observations when using this training approach: (1) $\mathcal{L}_{SI}$ is difficult to converge. (2) The visual results are noisy (see Fig. 3, second column). To investigate these challenges, we first visualized the learned identity features (after Euclidean normalization, as shown in Fig. 4) and found that a large margin exists between the hallucination domain and the HR domain. We formulate this challenge as the domain divergence problem. It reflects the failure of $CNN_{R}$, trained only on HR faces, to project faces from the hallucination domain into a measurable hypersphere identity metric space. In other words, this face recognition model cannot extract effective identity representations for hallucinated faces. This makes $\mathcal{L}_{SI}$ very difficult to converge and prone to getting stuck in local minima (i.e., much noise appears in the hallucination results).
Baseline training approach III. To overcome the domain divergence challenge, a straightforward alternating training strategy can be used. In particular, we first train a $CNN_{H}$ using only $\mathcal{L}_{SR}$. Then we train a $CNN_{R}$ using hallucinated faces and HR faces. Finally, we fine-tune $CNN_{H}$ jointly using $\mathcal{L}_{SR}$ and $\mathcal{L}_{SI}$, following baseline training approach II.
Observation III. Although this alternating training strategy seems able to overcome the domain divergence problem, it still produces artifacts (as shown in Fig. 3, third column). The reason is that the hallucination domain keeps changing while $CNN_{H}$ is being updated. Once the hallucination domain has changed, the face recognition model can no longer extract effective and measurable identity representations of hallucinated faces.
In short, the above observations can be summarized as a dynamic domain divergence problem: a large margin exists between the hallucination domain and the HR domain, and the hallucination domain keeps changing as long as the hallucination model keeps learning.
3.6 Domain-Integrated Training Algorithm
To overcome the dynamic domain divergence problem, we propose a new training procedure. From the above observations, we see that an alternating training strategy (baseline training approach III) can alleviate the dynamic domain divergence problem. We therefore propose to perform this alternating training within each iteration.
More specifically, we first train a $CNN_{R}$ using HR facial images and a $CNN_{H}$ using $\mathcal{L}_{SR}$. Then, we use the domain-integrated training approach (Algorithm 1) to fine-tune $CNN_{H}$ and $CNN_{R}$ alternately in each iteration.
In particular, in each iteration, we first update $CNN_{R}$ using the recognition loss, which allows $CNN_{R}$ to produce accurate identity representations for the current mini-batch of faces from both domains. Then, we jointly use $\mathcal{L}_{SR}$ and $\mathcal{L}_{SI}$ to update $CNN_{H}$. This training approach encourages $CNN_{R}$ to maintain a robust mapping from faces to the measurable hypersphere identity metric space at every iteration, no matter how $CNN_{H}$ is changing. The alternating optimization is conducted until convergence. Some hallucination examples are shown in Fig. 3, fourth column, where we observe much better visual results with this training approach.
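A hedged sketch of this two-step update is shown below. The optimizer choices, the helper names (recognition_loss, and super_identity_loss from the sketch in Sec. 3.4), and the data interface are illustrative assumptions, not the authors' exact settings.

```python
import torch

def domain_integrated_training(cnn_h, cnn_r, loader, si_weight=8.0, steps=9000):
    """Sketch of Domain-Integrated Training: both networks are assumed
    pretrained; si_weight is the super-identity weight (lambda in Tab. 2)."""
    opt_h = torch.optim.SGD(cnn_h.parameters(), lr=0.01)
    opt_r = torch.optim.SGD(cnn_r.parameters(), lr=0.01)
    for step, (lr_face, hr_face, label) in enumerate(loader):
        if step >= steps:
            break
        # Step 1: update CNN_R with the recognition loss on faces from both
        # the HR domain and the current hallucination domain.
        sr_face = cnn_h(lr_face).detach()          # freeze CNN_H in this step
        # recognition_loss is an assumed helper, e.g. the A-Softmax sketch above
        loss_fr = recognition_loss(cnn_r, torch.cat([sr_face, hr_face]),
                                   torch.cat([label, label]))
        opt_r.zero_grad()
        loss_fr.backward()
        opt_r.step()
        # Step 2: update CNN_H with super-resolution + super-identity losses.
        sr_face = cnn_h(lr_face)
        loss_sr = ((sr_face - hr_face) ** 2).mean()                    # Eq. 1
        loss_si = super_identity_loss(cnn_r(sr_face), cnn_r(hr_face))  # Eq. 3
        opt_h.zero_grad()
        (loss_sr + si_weight * loss_si).backward()
        opt_h.step()
```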
3.7 Comparison to Adversarial Training
Domain-Integrated (DI) training and adversarial training [7] are related through their alternating learning strategy, but they differ in several aspects:
(1) Generally speaking, DI training is essentially a cooperative process in which $CNN_{R}$ collaborates with $CNN_{H}$ to minimize the identity difference; the learning objective is the same in each sub-iteration. In adversarial training, by contrast, the generator and discriminator compete against each other to improve performance, and the learning objectives oppose each other as the two models learn.
(2) The loss functions and optimization styles are different. In DI training, we minimize $\mathcal{L}_{FR}$ to construct a margin-based identity metric space and then minimize $\mathcal{L}_{SI}$ to reduce the pairwise identity difference. Differently, in adversarial training, the classification loss is minimized for discriminator learning and maximized for generator learning.
4 Experiments
In this section, we first describe the training and testing details. Then we perform an ablation study to evaluate the effectiveness of the proposed Super-Identity loss and Domain-Integrated training. Further, we compare our proposed method with other state-of-the-art methods. After that, we evaluate our method on higher input resolutions. Finally, we evaluate the benefit of our method for low-resolution face recognition.
4.1 Training Details
Training data. For a fair comparison with other state-of-the-art methods, we perform face alignment on all facial images. In particular, we use a similarity transformation based on five landmarks detected by MTCNN [41]. We removed the images and identities that overlap between training and testing.
For face recognition training, we use web-collected facial images including CASIA-WebFace [38], CACD2000 [4], CelebA [22], and VGG Faces [24] as Set A. This amounts to roughly 1.5M images of 17,680 unique persons.
For face hallucination training, we select 1.1M HR facial images (larger than 96×112 pixels) from the same 1.5M images as Set B.
Training details. For recognition model training, we use Set A with a batch size of 512 and $m$ (the angular margin constraint in Eq. 2) set to 4. The learning rate starts from 0.1 and is divided by 10 at 20K and 30K iterations. The training process finishes at 35K iterations.
For hallucination model training, we use Set B with a batch size of 128. The learning rate starts from 0.02 and is divided by 10 at 30K and 60K iterations. A complete training finishes at 80K iterations.
For domain-integrated training, we use Set B with a batch size of 128 for $CNN_{H}$ and 256 for $CNN_{R}$. The learning rate starts from 0.01 and is divided by 10 at 6K iterations. A complete training finishes at 9K iterations.
4.2 Testing Details
Testing data. We randomly select 1,000 identities with 10,000 HR facial images (larger than 96×112 pixels) from the UMDFaces [1] dataset as Set C. This set is used for face hallucination and identity recovery evaluation.
Evaluation protocols. In this section, we perform three kinds of evaluations: (1) visual quality, (2) identity recovery, and (3) identity recognizability. For visual quality evaluation, we report several visual examples on Set C.
For identity recovery, we evaluate the performance of recovering identity information while super-resolving faces. In particular, we use the $CNN_{R}$ trained on Set A as the identity feature extractor, taking the identity features from the output of the first fully-connected layer. Then we compute the identity similarity (i.e., cosine similarity) between each hallucinated face and its corresponding HR face on Set C. The average similarity over the testing set is reported.
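A minimal sketch of this metric, assuming cnn_r exposes the first fully-connected layer's output as its return value:

```python
import torch
import torch.nn.functional as F

def identity_similarity(cnn_r, sr_faces, hr_faces):
    """Sketch of the identity-recovery metric: mean cosine similarity
    between recognition features of hallucinated and HR face pairs."""
    with torch.no_grad():
        f_sr = cnn_r(sr_faces)   # identity features of hallucinated faces
        f_hr = cnn_r(hr_faces)   # identity features of ground-truth HR faces
    return F.cosine_similarity(f_sr, f_hr, dim=1).mean().item()
```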
For identity recognizability, we evaluate the recognizability of hallucinated faces. In particular, we first downsample Set A to 12×14 pixels as Set A-LR. Then we use different methods to super-resolve Set A-LR to 96×112 pixels, producing different versions of Set A-SR. Finally, we use each Set A-SR to train a $CNN_{R}$ and evaluate it on LFW [11] and YTF [36].
4.3 Ablation Experiment
Loss weight. The hyper-parameter $\lambda$ (see Algorithm 1) dominates the identity recovery. To verify the effectiveness of the proposed Super-Identity loss, we vary $\lambda$ from 0 (i.e., using only the super-resolution loss) to 32 to learn different models. From Tab. 2 and Fig. 5, we observe that a larger $\lambda$ makes the facial images sharper with more details and brings better identity recovery and recognizability, but a too-large $\lambda$ also makes the texture look slightly unnatural. Since the identity recovery and recognizability performance is stable once $\lambda$ is larger than 8, we fix $\lambda$ to 8 in the other experiments.
| $\lambda$ | 0 | 2 | 4 | 8 | 16 | 32 |
| --- | --- | --- | --- | --- | --- | --- |
| Identity Similarity | 0.4418 | 0.5134 | 0.5639 | 0.5978 | 0.6041 | 0.6101 |
| LFW Accuracy | 97.61% | 97.88% | 98.05% | 98.25% | 98.23% | 98.16% |
| YTF Accuracy | 93.20% | 93.48% | 93.56% | 93.82% | 93.84% | 93.76% |
Training approach. We evaluate the different training approaches introduced in Sec. 3.5 and Sec. 3.6. Some visual results are shown in Fig. 3; Domain-Integrated training achieves the best visual results. Moreover, from Tab. 3, Domain-Integrated training also achieves the best identity recovery and identity recognizability.
| Training Approach | I | II | III | Domain-Integrated Training |
| --- | --- | --- | --- | --- |
| Identity Similarity | 0.3875 | 0.4829 | 0.5132 | 0.5978 |
| LFW Accuracy | 97.16% | 97.46% | 97.58% | 98.25% |
| YTF Accuracy | 92.98% | 93.32% | 93.34% | 93.84% |
4.4 Evaluation on Face Hallucination
We compare SICNN with other state-of-the-art methods and bicubic interpolation on Set C for face hallucination. In particular, following EnhanceNet [26], we train another URDGN, called URDGN*, with an additional perceptual loss computed at the end of the second and the last ResBlock of $CNN_{R}$. All methods are retrained on the same training set (Set B).
Some visual examples are shown in Fig. 6; more visual results are included in our supplementary material. We also report the average Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) in Tab. 4. However, as other works argue [12, 26, 14], PSNR and SSIM are poor indicators for semantic super-resolution evaluation, while visual quality and recognizability are more valuable.
From the visual results, it is clear that our method achieves the best results. We analyze the results as follows:
(1) For Ma et al.'s method, which is based on exemplar patches, the results are over-smooth and suffer from obvious blocking artifacts for such a low-resolution input with a large upsampling scale.
(2) For LapSRN [13], since it is based on an L2 pixel-wise loss, the hallucinated faces are over-smooth.
(3) For URDGN [39], it jointly uses a pixel-wise Euclidean loss and an adversarial loss to generate a realistic facial image closest to the average of all potential images. Thus, although the generated facial images look realistic, they are quite different from the original HR images.
(4) For URDGN*, it uses an additional perceptual loss, computed in our $CNN_{R}$, as a pairwise semantic loss for identity recovery. Although this pixel-wise + adversarial + perceptual loss combination is the state-of-the-art super-resolution training approach (i.e., EnhanceNet [26]), it still achieves inferior results to ours.
| Method | Bicubic | Ma et al. | LapSRN | URDGN | URDGN* | SICNN |
| --- | --- | --- | --- | --- | --- | --- |
| PSNR (dB) | 23.1323 | 23.8606 | 26.1451 | 24.1857 | 25.2859 | 26.8945 |
| SSIM | 0.6093 | 0.6571 | 0.7417 | 0.6764 | 0.7224 | 0.7689 |
| Method | Bicubic | Ma et al. | LapSRN | URDGN | URDGN* | SICNN |
| --- | --- | --- | --- | --- | --- | --- |
| Identity Similarity | 0.2913 | 0.3823 | 0.4361 | 0.3682 | 0.5267 | 0.5978 |
| LFW Acc. | 97.51% | 97.58% | 97.46% | 97.20% | 98.01% | 98.25% |
| YTF Acc. | 93.08% | 93.26% | 93.10% | 92.78% | 93.54% | 93.82% |
4.5 Evaluation on Higher Input Resolution
For a more comprehensive analysis, in this section we train our model for 24×28 inputs with a 4× upscaling factor. Specifically, we modify the hallucination network (i.e., $CNN_{H}$) by removing the first DB, DeConv, and Conv layers. As shown in Fig. 7, our method achieves very good visual quality on these higher-resolution inputs with a 4× upscaling factor.
For identity recovery and identity recognizability, our method also achieves very good results: an average identity similarity of 0.8868, LFW accuracy of 99.21%, and YTF accuracy of 94.86%, which are very close to the performance on HR faces.
4.6 Evaluation on Identity Recovery
We perform an evaluation of identity recovery against other state-of-the-art methods. All models used for evaluation are the same as in the previous experiment (i.e., Sec. 4.4).
From Tab. 5, we observe that our method achieves the best performance. We also observe that URDGN, trained with pixel-wise and adversarial losses, shows performance even inferior to LapSRN despite sharper visual results (see Sec. 4.4). This means that URDGN loses some identity information while super-resolving a face, because the adversarial loss is not a pairwise loss. Adding the perceptual loss (i.e., URDGN*), a pairwise semantic loss, improves the results, but they remain inferior to our method.
4.7 Evaluation on Identity Recognizability
Following the previous two experiments (i.e., Secs. 4.4 and 4.6), we further evaluate identity recognizability against other state-of-the-art methods.
From Tab. 5, we observe that our method achieves the best performance, with observations similar to the previous experiment. We also observe that although several methods (LapSRN, Ma et al., and URDGN) obtain better visual results than bicubic interpolation, the identity recognizability of their super-resolved faces is similar or even inferior. This means that these methods cannot generate discriminative faces with better identity recognizability.
4.8 Evaluation on Low-Resolution Face Recognition
To evaluate the benefit of our method for low-resolution face recognition, we compare our method (SICNN followed by $CNN_{R}$) with other state-of-the-art recognition methods on the LFW [11] and YTF [36] benchmarks.
From the results in Tab. 6, we find that these methods' input sizes are relatively large (from 15.3× to 298× the input area of our method). Nevertheless, using our face hallucination method, the recognition model can still achieve reasonable results at such an ultra-low resolution. We also tried using unaligned faces in training and testing, and our proposed method still achieves a similar performance improvement.
| Method | Ours | $CNN_{R}$ (HR input) | Human | [29] | [28] | [27] | [24] | [35] | [20] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Input Size | 12×14 | 96×112 | Original | 152×152 | 47×55 | 224×224 | 224×224 | 96×112 | 96×112 |
| LFW Acc. | 98.25% | 99.48% | 97.53% | 97.35% | 98.70% | 99.63% | 98.95% | 99.28% | 99.42% |
| YTF Acc. | 93.82% | 95.38% | – | 91.4% | 93.2% | 95.1% | 97.3% | 94.9% | 95.0% |
5 Conclusion
In this paper, we presented the Super-Identity CNN (SICNN) to enhance identity information when super-resolving face images of size 12×14 pixels with an 8× upscaling factor. Specifically, SICNN aims to minimize the identity difference between the hallucinated face and its corresponding HR face. In addition, we proposed a domain-integrated training approach to overcome the dynamic domain divergence problem when training SICNN. Extensive experiments demonstrate that SICNN not only achieves superior hallucination results but also significantly improves the performance of low-resolution face recognition.
6 Acknowledgement
This work was supported in part by MediaTek Inc. and the Ministry of Science and Technology, Taiwan, under Grant MOST 107-2634-F-002-007. We also benefit from grants from NVIDIA and the NVIDIA DGX-1 AI Supercomputer.
References
 [1] Bansal, A., Nanduri, A., Castillo, C., Ranjan, R., Chellappa, R.: Umdfaces: An annotated face dataset for training deep networks. arXiv:1611.01484 (2016)
 [2] Bruna, J., Sprechmann, P., LeCun, Y.: Super-resolution with deep convolutional sufficient statistics. ICLR (2016)
 [3] Chang, L., Tsao, D.Y.: The code for facial identity in the primate brain. Cell 169(6), 1013–1028 (2017)
 [4] Chen, B.C., Chen, C.S., Hsu, W.H.: Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset. TMM 17(6), 804–815 (2015)
 [5] Chunjie, L., Qiang, Y., et al.: Cosine normalization: Using cosine similarity instead of dot product in neural networks. arXiv (2017)
 [6] Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: ECCV. pp. 391–407 (2016)
 [7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. pp. 2672–2680 (2014)
 [8] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: ICCV. pp. 1026–1034 (2015)
 [9] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. pp. 770–778 (2016)
 [10] Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: CVPR (2017)
 [11] Huang, G.B., Learned-Miller, E.: Labeled faces in the wild: Updates and new reporting procedures. Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Tech. Rep. 14-003 (2014)
 [12] Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: ECCV. pp. 694–711 (2016)
 [13] Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep laplacian pyramid networks for fast and accurate super-resolution. In: CVPR (2017)
 [14] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: CVPR (2017)
 [15] Li, Y., Cai, C., Qiu, G., Lam, K.M.: Face hallucination based on sparse local-pixel structure. PR 47(3), 1261–1270 (2014)
 [16] Liu, C., Shum, H.Y., Freeman, W.T.: Face hallucination: Theory and practice. IJCV 75(1), 115–134 (2007)
 [17] Liu, W., Lin, D., Tang, X.: Hallucinating faces: Tensor-patch super-resolution and coupled residue compensation. In: CVPR. vol. 2, pp. 478–484. IEEE (2005)
 [18] Liu, W., Lin, D., Tang, X.: Neighbor combination and transformation for hallucinating faces. In: ICME. IEEE (2005)
 [19] Liu, W., Tang, X., Liu, J.: Bayesian tensor inference for sketch-based facial photo hallucination. In: IJCAI. pp. 2141–2146 (2007)
 [20] Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: Deep hypersphere embedding for face recognition. In: CVPR (2017)
 [21] Liu, W., Zhang, Y.M., Li, X., Yu, Z., Dai, B., Zhao, T., Song, L.: Deep hyperspherical learning. In: NIPS. pp. 3953–3963 (2017)
 [22] Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV. pp. 3730–3738 (2015)
 [23] Ma, X., Zhang, J., Qi, C.: Hallucinating face by positionpatch. PR 43(6), 2224–2236 (2010)
 [24] Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: BMVC. vol. 1, p. 6 (2015)
 [25] Rippel, O., Paluri, M., Dollar, P., Bourdev, L.: Metric learning with adaptive density discrimination. ICLR (2016)
 [26] Sajjadi, M.S., Scholkopf, B., Hirsch, M.: Enhancenet: Single image super-resolution through automated texture synthesis. In: ICCV. pp. 4491–4500 (2017)
 [27] Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: CVPR. pp. 815–823 (2015)
 [28] Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selective, and robust. In: CVPR. pp. 2892–2900 (2015)
 [29] Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: Deepface: Closing the gap to human-level performance in face verification. In: CVPR. pp. 1701–1708 (2014)
 [30] Tan, L., Zhang, K., Wang, K., Zeng, X., Peng, X., Qiao, Y.: Group emotion recognition with individual facial emotion cnns and global image based cnns. In: ICMI. pp. 549–552. ACM (2017)
 [31] Tappen, M.F., Liu, C.: A bayesian approach to alignment-based image hallucination. In: ECCV. pp. 236–249 (2012)
 [32] Van Der Maaten, L.: Accelerating t-SNE using tree-based algorithms. JMLR 15(1), 3221–3245 (2014)
 [33] Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., Liu, W.: Cosface: Large margin cosine loss for deep face recognition. CVPR (2018)
 [34] Wang, K., Zeng, X., Yang, J., Meng, D., Zhang, K., Peng, X., Qiao, Y.: Cascade attention networks for group emotion recognition with face, body and image cues. In: ICMI. pp. 640–645. ACM (2018)
 [35] Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: ECCV. pp. 499–515 (2016)
 [36] Wolf, L., Hassner, T., Maoz, I.: Face recognition in unconstrained videos with matched background similarity. In: CVPR. pp. 529–534 (2011)
 [37] Yang, C.Y., Liu, S., Yang, M.H.: Structured face hallucination. In: CVPR. pp. 1099–1106 (2013)
 [38] Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv:1411.7923 (2014)
 [39] Yu, X., Porikli, F.: Ultra-resolving face images by discriminative generative networks. In: ECCV. pp. 318–333 (2016)
 [40] Zhang, K., Tan, L., Li, Z., Qiao, Y.: Gender and smile classification using deep convolutional neural networks. In: CVPR Workshops. pp. 34–38 (2016)
 [41] Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task cascaded convolutional networks. SPL 23(10), 1499–1503 (2016)
 [42] Zhang, K., Zhang, Z., Wang, H., Li, Z., Qiao, Y., Liu, W.: Detecting faces using inside cascaded contextual cnn. In: ICCV. pp. 3171–3179 (2017)
 [43] Zhou, E., Fan, H., Cao, Z., Jiang, Y., Yin, Q.: Learning face hallucination in the wild. In: AAAI. pp. 3871–3877 (2015)
 [44] Zhu, S., Liu, S., Loy, C.C., Tang, X.: Deep cascaded binetwork for face hallucination. In: ECCV. pp. 614–630 (2016)