Automatic face understanding is critical for problems in human perception (e.g., super-resolution (SR), visual understanding, and style transfer) and applied machine vision (e.g., landmark localization, identity recognition, and face detection). Modern-day models for face-based tasks tend to break down when applied to low-resolution (LR) images. In practice, face-based systems are frequently confronted with such scenarios (e.g., LR cameras used for surveillance). Recent studies revealed that a decrease in resolution yields an increase in error for facial landmark localization models (Bulat et al. 2018). To address this problem, face SR, also known as face hallucination, aims to generate high-resolution (HR) faces from LR imagery. The recovered faces provide more detailed information (e.g., sharper edges, clearer shapes, and finer skin details) and are often used for improved analysis and perception. However, most existing methods (e.g., SuperFAN) rely heavily on the quality of the recovered images. Since SR methods usually suffer from blurriness, using SR images for face-related tasks can hinder the final prediction or conclusion.
On the other hand, facial prior knowledge can be used to recover SR faces of higher quality [1, 16]. In single image super-resolution (SISR), face SR utilizes prior knowledge to improve the accuracy of the inferred images and, thus, to yield results of higher quality. For example, one can leverage low-level information (i.e., smoothness in color), facial landmark heatmaps, and face parsing maps to provide additional mid-level information (i.e., face structure) for recovering sharper edges and shapes. Also, high-level information can be extracted from identity labels and other face attributes (e.g., gender, age, and pose), and then leveraged to reduce the ambiguity of the hallucinated faces [33, 14]. Hence, additional face information is beneficial for SR, especially for tiny faces.
Previous work in face SR either super-resolved LR images using prior information (e.g., FSRNet) or directly localized the landmarks on the super-resolved images (e.g., SuperFAN). Figure 2 compares these frameworks with the proposed method. Specifically, SuperFAN only uses SR to help localize the landmarks of tiny faces, but not vice-versa. In contrast, our model does not process a recovered SR output that suffers from blurriness; instead, we dedicate an encoding module to maximizing the amount of information captured from LR faces. As for FSRNet, landmarks are only used as facial prior knowledge to super-resolve faces, which suffers from the same problem of detecting landmarks on a coarse, recovered SR image. Furthermore, SuperFAN and FSRNet address the two tasks separately, leading to redundant feature maps. Since face SR and landmark localization could each benefit from the other, we aim to extract the maximum amount of information from LR faces by addressing the two tasks simultaneously. Thus, we propose a multi-task framework that allows these tasks to benefit from one another, improving the performance of both (see Figure 1).
The main contributions of this paper are as follows:
In this paper, we propose a network that performs SR and landmark detection on tiny faces jointly, which we dub JASRNet (code available at: https://github.com/YuYin1/JASRNet). To the best of our knowledge, we are the first to train a multi-task model that jointly learns landmark localization and SR. Specifically, and unlike existing two-step approaches, we leverage the complementary information of the two tasks. This allows more accurate landmark predictions to be made in LR space and improves reconstruction from LR to HR.
Novel deep feature extraction and fusion modules are used to maximize the amount of information captured from the LR faces, which is done at intermediate layers of the encoder to exploit the deep hierarchical machinery.
We show large improvements for both SR and landmark localization on tiny faces. Moreover, our JASRNet yields results for landmark localization on LR faces that are comparable to those of existing methods evaluated on the corresponding HR faces. Furthermore, the proposed method recovers HR faces with sharper edges and shapes compared with state-of-the-art SR methods.
Typical SISR methods do not exploit facial prior information and can be used to super-resolve images of arbitrary type. By introducing face-specific information, Yu [34, 35] proposed a GAN-based model to recover HR images from tiny faces. Chen et al. used a separate branch to estimate facial landmark heatmaps and parsing maps, which were then used as face-specific information to super-resolve tiny face images. FaceAttr validated that knowledge of facial attributes can also significantly reduce the ambiguity in face SR. It is worth noting that our method not only utilizes facial prior information to super-resolve tiny faces with better quality, but also achieves state-of-the-art performance on landmark alignment by benefiting from SR.
Modern-day approaches for face alignment have been successful on HR faces [5, 18, 19, 20]. However, most suffer from performance degradation with decreasing image resolution, especially on tiny faces. The first to address landmark detection on LR faces was SuperFAN, which super-resolved tiny faces and fed the output images to a landmark localization model. Although the landmark localization error provides gradients that back-propagate through the SR module, it is, in essence, a 2-step process. We argue that the facial prior information is not fully utilized for SR. To address this problem, we present a novel synergistic multi-task framework that learns facial landmark localization and SR jointly.
Multi-task learning is commonly used to jointly address correlated tasks. HyperFace proposed a multi-task learning framework for face detection, face alignment, gender recognition, and pose estimation. Its joint learning tasks were based on regression or classification (i.e., a special case of regression), so similar architectures were adopted for all tasks. In our case, however, face SR and alignment are based on generation and regression, respectively. Thus, one of the main architectural differences between the proposed method and HyperFace is that we include task-specific modules, while HyperFace used only fully connected layers after feature fusion.
Super-resolution (SR) and landmark localization of tiny faces are highly correlated tasks that can benefit from each other. Previous work either uses SR to help align tiny faces or vice-versa, but not both. We argue that the amount of information extracted from the LR image is not maximized when only one task is used to help the other. Hence, we propose a deep joint alignment and super-resolution network (JASRNet) to super-resolve and localize landmarks for tiny faces simultaneously, with information from each task boosting the performance of the other. As shown in Figure 3, the proposed JASRNet consists of four parts: (1) a shared shallow encoder module for extracting shallow, shared features for both tasks; (2) a deep feature extraction and fusion module for obtaining better feature representations; (3-4) task-specific modules for super-resolution and face alignment, respectively.
Let {(x_i^LR, x_i^HR)} denote the training samples. The original LR faces are passed into the shared encoder, which then feeds the feature extraction module to extract features for both tasks. To exploit the representative power of different grains, the intermediate features of the shared encoder branch out to fuse with the output of the deep feature extraction module. This feature fusion forms a more efficient feature representation, as demonstrated later in the ablation study. The fused features are then fed to both task-specific modules. Thus, the super-resolved images x_i^SR and the probability maps of the landmark estimates H_i are produced simultaneously.
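As a shape-level illustration of this four-part pipeline, the sketch below traces a 128x128 upscaled LR input through stand-in modules. All function names and the zero-filled "conv" placeholders are our own illustrative assumptions, not the authors' implementation; only the 128-channel width, the 8x total downsampling, and the 68 landmark heatmaps follow the text.

```python
import numpy as np

CH = 128  # channel width used throughout the network (per the paper)

def conv_stub(x, out_ch):
    """Placeholder for a conv(+ReLU) layer: only tracks tensor shape."""
    return np.zeros((out_ch,) + x.shape[1:])

def maxpool2(x):
    """2x2 max pooling with stride 2."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def shared_encoder(x):
    """Conv + residual block, then three maxpool + residual-block steps.
    Returns the three pre-maxpool branches and the encoder output."""
    f1 = conv_stub(x, CH)              # (CH, 128, 128)
    f2 = conv_stub(maxpool2(f1), CH)   # (CH, 64, 64)
    f3 = conv_stub(maxpool2(f2), CH)   # (CH, 32, 32)
    out = conv_stub(maxpool2(f3), CH)  # (CH, 16, 16)
    return [f1, f2, f3], out

def deep_extract_and_fuse(branches, enc_out):
    """Deep residual stack, then fuse the downsampled encoder branches."""
    fused = conv_stub(enc_out, CH)     # deep features, (CH, 16, 16)
    for k, f in enumerate(branches):
        for _ in range(3 - k):         # stride-2 steps to match 16x16
            f = conv_stub(maxpool2(f), CH)
        fused = fused + f              # element-wise fusion
    return conv_stub(fused, CH)

def sr_module(f):
    """Three x2 upscaling steps (conv + pixel shuffle), then RGB conv."""
    for _ in range(3):
        c, h, w = f.shape
        f = np.zeros((c, 2 * h, 2 * w))
    return conv_stub(f, 3)             # (3, 128, 128)

def align_module(f):
    """Stacked prediction stages emitting one heatmap per landmark."""
    return conv_stub(f, 68)            # (68, 16, 16)

def jasrnet(x):
    branches, enc = shared_encoder(x)
    fused = deep_extract_and_fuse(branches, enc)
    return sr_module(fused), align_module(fused)
```

Running `jasrnet(np.zeros((3, 128, 128)))` returns tensors of shape (3, 128, 128) and (68, 16, 16); the heatmap resolution here is an assumption, since the paper only states that the alignment module keeps the feature-map size constant.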
Usually, there are sharper edges or sudden changes around the contours of facial components. For face alignment, the SR module recovers the image at better resolution, which helps the model detect more accurate landmarks. In parallel, the alignment module locates the edges and structure of the face, forcing more attention to the high-frequency content (i.e., edges). Since both tasks, face SR and landmark localization, are suited to benefit from one another, the aim of this work is to extract the maximum amount of information from the LR faces. This is done by combining the loss functions of the two tasks. For the SR task, the L1 loss is minimized, as it provides better convergence than L2 [15, 37]. For the alignment task, a heatmap loss is used, as in prior work. Together, the loss function of JASRNet can be expressed as

L_total = L_SR + lambda * L_align,

where L_total denotes the total loss, and L_SR and L_align denote the loss for super-resolution and the heatmap loss for alignment, respectively. The weight of L_align is lambda, and the estimated heatmap of the i-th image is H_i. As mentioned above, x_i^SR is the super-resolved image recovered from x_i^LR.
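A minimal numpy sketch of this combined objective follows; the weight name `lam` and the mean reductions are our assumptions, not details taken from the paper.

```python
import numpy as np

def sr_loss(sr, hr):
    """L1 reconstruction loss between super-resolved and HR images."""
    return np.abs(sr - hr).mean()

def heatmap_loss(pred, gt):
    """Squared-error heatmap loss for the alignment branch."""
    return ((pred - gt) ** 2).mean()

def total_loss(sr, hr, pred_hm, gt_hm, lam=1.0):
    """Combined objective: SR loss plus weighted alignment loss."""
    return sr_loss(sr, hr) + lam * heatmap_loss(pred_hm, gt_hm)
```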
Shared feature extraction and fusion
Shallow encoder. Previous work in face SR and alignment usually addressed the two tasks separately, leading to redundant feature maps. To efficiently extract features from LR images, a shared encoder is designed to extract shallow features that capture complementary information for the two tasks. It consists of a convolutional layer, a residual block, and then three transformations made up of a maxpooling operation and residual blocks (Figure 3). Intermediate layers of the encoder are later fused to form features richer in geometry and semantics.
All convolution layers of JASRNet use kernels of the same size, and each is followed by a ReLU layer. The number of channels is set to 128 throughout, except for the last convolutional layers of the reconstruction and alignment modules, which are set to 3 and the number of landmarks (namely 68 for 300W), respectively. There are three maxpooling layers in the network, each downsampling the feature maps, which in total reduces the size of the feature maps by a factor of 8. The structure of the residual blocks is the same as in the original residual nets (ResNets), except that we omit the batch normalization (BN) layers, as BN reduces the variation of feature ranges: the ResNets used for SISR (EDSR) performed best with all BN layers removed. We also found that BN layers slow down the convergence of the network while reducing its overall performance, which was especially true for the SR task. Since we aim to preserve as much information as possible when passing through the shared encoder module (i.e., during feature extraction), we follow EDSR and remove all BN layers from the residual blocks.
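A small numpy sketch of such a BN-free residual block is shown below; the explicit convolution loop and weight shapes are illustrative only (real implementations use optimized conv layers).

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims
    out = np.zeros((w.shape[0], h, wd))
    for o in range(w.shape[0]):
        for i in range(c_in):
            for dy in range(3):
                for dx in range(3):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + wd]
    return out

def res_block(x, w1, w2):
    """EDSR-style residual block: conv-ReLU-conv plus identity, no BN."""
    return x + conv3x3(np.maximum(conv3x3(x, w1), 0.0), w2)
```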
Deep feature extraction and fusion. Deeper networks have been shown to perform better in many computer vision tasks, including SR [3, 4, 7, 15, 26]. Increased depth is also a tactic used in this work. Shallow features extracted from the shared encoder are passed to the deep feature extraction module, which consists of a stack of residual blocks. A deeper network not only recovers sharper edges and shapes for super-resolved face images, but also achieves higher accuracy for landmark localization.
Inspired by HyperFace, we fuse intermediate layers to exploit the representative power of features at different levels of the hierarchical model. Considering the similarity of features from adjacent layers, not all features of the shared encoder are fused to compose the new feature representation. Since each maxpooling layer downsamples the feature map by a factor of 2, the output of the layer that precedes each maxpooling layer branches out via a skip connection, and these branches are later fused to form richer features with geometric information. To match the sizes of the feature maps, a convolutional layer with stride 2 is applied to downsample each branched feature by a factor of 2 for every maxpooling layer applied in parallel to the skip connection.
The outputs before each maxpooling layer are denoted f_1, f_2, and f_3; the output of the last residual block in the feature extraction module is f_e (see Figure 3). Given an LR image x^LR as input, we have

f_1 = phi_1(x^LR),  f_2 = phi_2(f_1),  f_3 = phi_3(f_2),  f_e = phi_e(f_3),

where the mappings phi transform the signal during feature extraction. Hence, phi_1 is the mapping of the first convolution layer and residual block, phi_2 and phi_3 are the mappings of the first and second steps combining maxpooling and residual blocks, respectively, and phi_e is the mapping for the remaining residual blocks making up the feature extraction module. Mathematically speaking, the fused feature output f_fuse can be written as

f_fuse = Conv( D_1(f_1) + D_2(f_2) + D_3(f_3) + f_e ),

where D_k denotes the stride-2 convolutional downsampling used to match feature-map sizes, and the final convolution operation fuses the intermediate features.
Super-resolution reconstruction. The super-resolution reconstruction module reconstructs the HR image from the shared features. First, the shared feature maps are fed to two residual blocks to extract task-specific features. Next, three convolutional layers, each followed by a pixel shuffle layer, upscale the feature maps by a factor of 2 each (8x in total). Finally, a convolutional layer maps the features to the HR RGB image space.
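The pixel shuffle rearrangement used in each upscaling step can be written compactly in numpy; this is an illustrative standalone version (the real module pairs it with learned convolutions).

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r), as in sub-pixel conv."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

Three successive x2 shuffles (with convs in between to restore the channel count) yield the overall x8 upscaling described above.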
Inspired by EDSR and RDN, the first and last residual blocks of the shared encoder and the SR reconstruction module are linked by a long skip connection. This recovers HR images with finer details (i.e., sharper edges and shapes). The skip connection directly provides low-frequency information to the super-resolved images; hence, it forces the network to focus on learning the high-frequency information, as opposed to the low-frequency information already provided. Since the output of the first convolution layer and the feature maps of the last residual block in the reconstruction module differ in size, we downsample the skipped feature maps with three convolution and three maxpooling layers (see Figure 3).
Unlike SuperFAN, where the long skip connection is reported to have minimal impact on overall performance, our model benefits largely from the skip connection. This is because the extracted features include high-frequency information and, thus, are more efficient for recovering sharp and accurate edges. Furthermore, since super-resolution and face alignment share the deep features, a byproduct of this long skip connection is boosted performance on the landmark localization task as well.
Face alignment. Like the SR reconstruction module, the shared features are fed through consecutive residual blocks to extract features specific to face alignment. Inspired by the success of convolutional pose machines (CPM) on face alignment, we also utilize a sequential framework made up of residual blocks for estimating landmark locations. In the first stage, two residual blocks predict coarse heatmaps H^(1). In the second stage, the heatmaps predicted in the first stage are concatenated with the fused feature maps f_fuse and fed to the second prediction module, composed of three sequential residual blocks, which predicts heatmaps H^(2). The third stage then concatenates f_fuse and H^(2) to produce the final estimate, expressed as

H^(t) = psi_t([f_fuse, H^(t-1)]),  t = 2, 3,

where psi_t denotes the t-th prediction module. Note that the size of the feature maps is constant throughout the face alignment module. During training, a heatmap regression loss is used to localize landmarks, as opposed to directly predicting pixel coordinates. Thus, an argmax over the heatmaps predicted in the final stage determines the landmarks: the location of the maximum value of each heatmap is taken as the predicted landmark.
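The argmax decoding of heatmaps into landmark coordinates can be sketched as:

```python
import numpy as np

def heatmaps_to_landmarks(heatmaps):
    """Take the argmax of each (H, W) heatmap as the (x, y) landmark.
    heatmaps: (num_landmarks, H, W) -> (num_landmarks, 2) coordinates."""
    n, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat, (h, w))
    return np.stack([xs, ys], axis=1)
```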
We now review the experimental settings and results. Specifically, the datasets, implementation details, and metrics are first described. Then, we show results comparing with state-of-the-art methods for the face SR and alignment tasks separately. We also highlight the benefits of the proposed feature fusion and joint training. Finally, we conduct an ablation study as a deep dive revealing the contributions of the components introduced in this work.
Datasets. We evaluated the proposed approach on several datasets, which are listed as follows:
Implementation details. We first cropped the facial images about the head region and resized them to 128×128; these were designated as the HR images. Then, LR images were generated by applying bicubic downsampling (8×) to the HR images, yielding a resolution of 16×16. The input LR images were then upscaled back to the size of the HR faces: each was up-scaled 8× using bicubic interpolation, resulting in images of size 128×128. The training images were augmented using random scaling, rotation, and horizontal flipping; specifically, these augmentation transformations were used to make fifteen copies. Optimization was done with ADAM, with a learning rate that was dropped by 0.5 at scheduled epochs. The model was trained with a batch size of 8 for a total of 40 epochs. Implementation was done using PyTorch. Training took about 7 hours on Helen with an NVIDIA TITAN Xp GPU.
Evaluation metrics. The metric used to evaluate landmark localization was the NMSE (i.e., the normalized Euclidean distance between ground-truth and predicted landmarks). Following [2, 5, 24], the normalization factor is set to the inter-ocular distance for 300W and to the area of the ground-truth bounding box for the AFLW dataset.
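A minimal numpy sketch of this metric (the function name and the mean reduction over landmarks are our assumptions):

```python
import numpy as np

def nmse(pred, gt, norm):
    """Mean Euclidean landmark error divided by a normalization factor
    (e.g., the inter-ocular distance for 300W).
    pred, gt: (num_landmarks, 2) arrays of (x, y) coordinates."""
    return np.linalg.norm(pred - gt, axis=1).mean() / norm
```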
For SR, we evaluated using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM): PSNR is computed from the mean squared error (MSE) between the SR and HR images, while SSIM accounts for the noise and edges (i.e., the high-frequency content) of an image. In our experiments, we converted the RGB images to the YCbCr color space and calculated the PSNR only for the Y channel. To focus on the face region while ignoring the background, only the face region within the bounding box was measured when evaluating the SR images.
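The Y-channel PSNR computation can be sketched as follows; the BT.601 luma coefficients are the standard ones commonly used in SR evaluations, and the SSIM computation (windowed means, variances, and covariances) is more involved and omitted here.

```python
import numpy as np

def rgb_to_y(img):
    """Luma (Y) channel of an RGB image in [0, 255] (ITU-R BT.601)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.257 * r + 0.504 * g + 0.098 * b + 16.0

def psnr(sr, hr, peak=255.0):
    """PSNR derived from the MSE between two (Y-channel) images."""
    mse = np.mean((np.asarray(sr, float) - np.asarray(hr, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```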
Comparison with state-of-the-art methods
Comparisons were made with state-of-the-art methods in both SR and face alignment. It is important to note that most existing methods perform only a single task, while the proposed model performs both, and performs best in both. The methods that do both tasks, SuperFAN and FSRNet, were used to compare the two tasks simultaneously.
Face super-resolution results. We compared with methods used for SISR (i.e., VDSR, SRRes, and EDSR), as well as methods for face SR (i.e., URDGN, TDAE, SuperFAN, and FSRNet). For a fair comparison, we retrained the aforementioned models with the same training and testing data used in the respective experiment. Qualitative comparisons clearly show that the proposed JASRNet recovers HR images with relatively more details (i.e., sharper edges, more accurate facial component shapes, and finer textures), while other methods tend to produce face images with more blur and inaccuracies (see Figure 4). Quantitative results for face SR are shown in Table 1. The proposed model achieved the highest PSNR and SSIM on the 300W and Helen datasets. Since some methods only support an upscaling factor of 4, we added an additional 2× upscaling module to get the equivalent factor of 8. For this, we incorporated the commonly used pixel shuffle followed by a convolutional layer.
Face alignment results. We present face alignment results for the 300W and AFLW datasets, summarized in Table 2. First, we compare the results on LR images (see the bottom part of Table 2). Since only a few works address the tiny-face alignment problem, we only compare the performance of the proposed model with SuperFAN, FSRNet, and another state-of-the-art method, CPM+SBR. Note that CPM+SBR is applied on images super-resolved using bicubic interpolation. Compared with other state-of-the-art methods, we show a large improvement for landmark localization on tiny faces.
Furthermore, we present results of JASRNet on higher-resolution faces (see Table 2, top). Note that existing methods detect landmarks on HR images. Still, the proposed framework's landmark localization on LR images is comparable to that of the others on HR images.
Comparison on both tasks. To the best of our knowledge, FSRNet and SuperFAN were the only prior attempts to report results on both tasks (i.e., SR and face alignment). Thus, we compared results on both tasks with these two methods. Since one of the primary tasks for "enhancing" faces is to improve facial recognition capabilities, we also measured face verification performance on the super-resolved images. Additionally, the number of parameters used in each model is listed in Table 3. In this section, models were trained on the 300W training set and tested on the 300W test set and the entire LFW dataset. The SR and alignment results for the 300W test set are shown in Tables 1 and 2, respectively. For the LFW dataset, the results for SR and facial recognition are listed in Table 3. Performance was measured using verification accuracy (ACC), PSNR, and SSIM. We did not include LFW in the test for landmark localization, since it does not support the 68 landmarks used as prior knowledge in all three methods. We show that our JASRNet significantly outperforms SuperFAN and FSRNet in face SR and landmark localization (see Tables 1, 2, and 3). Qualitatively, the proposed method also produces more accurate landmark estimates for the alignment task and much more detailed appearance and texture for the SR task than the other two methods (see Figures 1 and 4). Note that our model also has fewer parameters than SuperFAN and FSRNet (see Table 3).
We next measured the contributions of feature fusion, joint training, and the long skip connection. Table 4 lists the four additional variants used. The baseline (BL) consisted only of an encoder, a feature extraction module, and either an SR or an alignment module. In other words, the BL omitted the feature fusion at the intermediate layers, removed the long skip connection, and was only able to handle a single task per pass (i.e., either SR or face alignment, but not both). BL_F is BL with feature fusion. The joint training (JT) network was constructed by adding both task-specific modules to the baseline, and JT with feature fusion is JT_F. Finally, JT_F with the long skip connection forms the proposed JASRNet. The training set used in this section is 300W. Note that our baseline model performs better than SuperFAN while using fewer parameters. The reasons are three-fold: 1) batch normalization is omitted from the residual blocks, which speeds up training and boosts performance; 2) pixel shuffle layers are used in the reconstruction module instead of the deconvolutional layers used in SuperFAN; 3) SuperFAN uses two independent modules, i.e., SR and face alignment are handled separately, which yields redundant feature maps and, hence, degrades performance.
Effects of the feature fusion. Fusing the features at the intermediate layers yields richer and more efficient feature representations for SR, with BL_F and JT_F outperforming BL and JT, respectively, in SR (see Table 4). However, feature fusion has less impact on face alignment. This is because SR uses both low- and high-frequency information to recover HR from LR images, while landmark localization mostly depends on the high-frequency content.
Effects of the joint-task mechanism. To highlight the importance of training the two tasks jointly, we compared JT to BL and JT_F to BL_F (see Table 4). Results on both tasks (i.e., SR and face alignment) show that the joint-task variants (i.e., JT and JT_F) significantly outperform BL and BL_F, respectively. This validates that the joint training, in itself, contributes to the state-of-the-art performance of JASRNet.
Effects of the long skip connection. The impact of the long skip connection is evident from the results: JASRNet, which is JT_F with the added skip connection, outperforms all others in both SR and landmark localization. The impact on SR stems from the skip connection forcing the network to encode sharper and more precise edges in the feature representation, as expected. The boosted accuracy for face alignment was less expected, yet supports the narrative: we believe the shared features for SR and face alignment carry additional information that complements both tasks.
Baseline variations. We also show variations of the vanilla baseline for insight into the effects of different fusion methods (i.e., concatenation vs. element-wise addition), the number of residual blocks in the feature extraction module (i.e., 16 vs. 32), and the number of stages in the face alignment module (i.e., 1 vs. 2). Table 5 lists the results for the different settings. Clearly, element-wise addition is better for the feature fusion module in our model. Also, more residual blocks and stages improve the performance. The deeper structure and, thus, higher capacity capture more information for the SR and face alignment tasks: as the network grows, so does its potential to learn.
We proposed JASRNet to exploit the maximum amount of information from tiny face images by simultaneously addressing the alignment and super-resolution tasks. Extensive experiments demonstrated that the proposed method significantly outperforms the previous state-of-the-art in SR by recovering faces with sharper edges (i.e., finer details) from LR inputs. We also showed large improvements for landmark localization on tiny faces. Furthermore, the proposed framework yields results for landmark localization on lower-resolution faces comparable to those of existing methods on higher-resolution faces.
- (2000) Hallucinating faces. IEEE.
- (2017) How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). In ICCV.
- (2018) Super-FAN: integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs. In CVPR.
- (2018) FSRNet: end-to-end learning face super-resolution with facial priors. In CVPR.
- (2018) Supervision-by-registration: an unsupervised approach to improve the precision of facial landmark detectors. In CVPR.
- (2017) Reconstructing perceived faces from brain activations with deep adversarial neural decoding. In NeurIPS, pp. 4246–4257.
- (2016) Deep residual learning for image recognition. In CVPR.
- (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
- (2016) Accurate image super-resolution using very deep convolutional networks. In CVPR.
- (2011) Annotated facial landmarks in the wild. In ICCVW.
- (2012) Interactive facial feature localization. In ECCV.
- (2014) Labeled faces in the wild: updates and new reporting procedures. Technical Report UM-CS-2014-003, University of Massachusetts, Amherst.
- (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR.
- Attribute augmented convolutional neural network for face hallucination. In CVPRW.
- (2017) Enhanced deep residual networks for single image super-resolution. In CVPRW.
- (2007) Face hallucination: theory and practice. IJCV.
- Unsupervised image-to-image translation networks. In NeurIPS, pp. 700–708.
- (2017) A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In CVPR.
- (2019) Face alignment with expression- and pose-based adaptive initialization. IEEE Transactions on Multimedia 21 (4), pp. 943–956.
- (2019) HyperFace: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. TPAMI.
- (2014) Face alignment at 3000 FPS via regressing local binary features. In CVPR.
- (2019) Laplace landmark localization. arXiv preprint arXiv:1903.11633.
- (2016) 300 faces in-the-wild challenge: database and results. Image and Vision Computing.
- (2013) 300 faces in-the-wild challenge: the first facial landmark localization challenge. In ICCVW.
- (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR.
- (2017) MemNet: a persistent memory network for image restoration. In ICCV, pp. 4539–4547.
- (2016) Mnemonic descent method: a recurrent process applied for end-to-end face alignment. In CVPR.
- (2018) Recurrent convolutional shape regression. TPAMI.
- (2004) Image quality assessment: from error visibility to structural similarity. TIP.
- (2016) Convolutional pose machines. In CVPR.
- (2016) Deep convolutional neural network with independent softmax for large scale face recognition. In Proceedings of the 24th ACM International Conference on Multimedia, pp. 1063–1067.
- (2013) Supervised descent method and its applications to face alignment. In CVPR.
- (2018) Super-resolving very low-resolution face images with supplementary attributes. In CVPR.
- (2016) Ultra-resolving face images by discriminative generative networks. In ECCV.
- Hallucinating very low-resolution unaligned and noisy face images by transformative discriminative autoencoders. In CVPR.
- (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters.
- (2018) Residual dense network for image super-resolution. In CVPR.
- (2015) Face alignment by coarse-to-fine shape searching. In CVPR.