Heterogeneous Face Frontalization via Domain Agnostic Learning

07/17/2021
by Xing Di, et al.

Recent advances in deep convolutional neural networks (DCNNs) have shown impressive performance improvements on thermal to visible face synthesis and matching problems. However, current DCNN-based synthesis models do not perform well on thermal faces with large pose variations. In order to deal with this problem, heterogeneous face frontalization methods are needed in which a model takes a thermal profile face image and generates a frontal visible face. This is an extremely difficult problem due to the large domain as well as large pose discrepancies between the two modalities. Despite its applications in biometrics and surveillance, this problem is relatively unexplored in the literature. We propose a domain agnostic learning-based generative adversarial network (DAL-GAN) which can synthesize frontal views in the visible domain from thermal faces with pose variations. DAL-GAN consists of a generator with an auxiliary classifier and two discriminators which capture both local and global texture discriminations for better synthesis. A contrastive constraint is enforced in the latent space of the generator with the help of a dual-path training strategy, which improves the feature vector discrimination. Finally, a multi-purpose loss function is utilized to guide the network in synthesizing identity preserving cross-domain frontalization. Extensive experimental results demonstrate that DAL-GAN can generate better quality frontal views compared to the other baseline methods.


I Introduction

Fig. 1: An overview of the proposed heterogeneous face frontalization method. The frontalization flow aims to reconstruct a visible frontal face from a thermal profile face. The feature-learning flow aims to learn domain-agnostic features that mitigate the domain discrepancy between the input and output images. To enhance feature discrimination, the latent features are regularized by a contrastive constraint during training.

Face recognition is a challenging problem which has been actively researched over the past few decades. In particular, deep convolutional neural network (DCNN) based methods have shown impressive performance improvements on various visible face recognition datasets. Numerous DCNN-based models have been proposed in the literature which can address key challenges that include expression, illumination and pose variations, image resolution, aging, etc. However, existing methods are specifically designed for recognizing face images that are collected in the visible spectrum. In applications such as night-time surveillance, we are faced with the scenario of identifying a face image acquired in the thermal domain by comparing it with a gallery of face images that are acquired in the visible domain. Existing DCNN-based visible face recognition methods will not perform well when directly applied to the problem of thermal to visible face recognition due to the significant distributional shift between the thermal and visible domains. In order to bridge this gap, various cross-domain face recognition algorithms have been proposed [7, 14, 13, 43, 44, 47, 29, 27, 5, 36, 37]. In particular, synthesis-based methods have gained a lot of traction in recent years [7]. Given a thermal face image, the idea is to synthesize the corresponding face image in the visible domain. Once the visible image is synthesized, any DCNN-based visible face recognition network can be leveraged for identification.

One of the main limitations of the existing synthesis-based models for cross-modal face recognition is that they do not perform well on thermal faces with large pose variations [7]. Face frontalization is an extensively studied problem in the computer vision and biometrics communities, and various methods have been developed for frontalization [12, 51, 38, 3, 39, 16, 17, 41, 42, 49, 50, 40, 48]. However, the performance of existing face frontalization methods degrades significantly on thermal faces since they are specifically designed for frontalizing visible face images. In order to deal with this problem, heterogeneous face frontalization methods are needed in which a model takes a thermal profile face image and generates a frontal visible face. This is an extremely difficult problem due to the large domain as well as large pose discrepancies between the two modalities. Despite its applications in biometrics and surveillance, this problem is relatively unexplored in the literature.

We propose a domain agnostic learning-based generative adversarial network (DAL-GAN) [10] with dual-path training architecture which can synthesize frontal views in the visible domain from thermal faces with pose variations. Fig. 1 gives a simplified overview of the proposed heterogeneous frontalization framework. The model is trained on two flows: frontalization flow and feature-learning flow. The frontalization flow aims to synthesize a visible frontal face image from a thermal profile face image. The feature learning flow aims to learn domain-agnostic features that help in reconstructing better visible faces. DAL-GAN consists of a generator with an auxiliary classifier and two discriminators which capture both local and global texture discriminations for better synthesis. A contrastive constraint is enforced in the latent space of the generator with the help of a dual-path training strategy, which improves the feature vector’s discrimination. Finally, a multi-purpose loss function is utilized to guide the network in synthesizing identity-preserving cross-domain frontalization. We conduct extensive experiments to demonstrate that DAL-GAN can generate better quality frontal views compared to the other baseline methods. Fig. 1 shows sample outputs from the proposed network.

In summary, this paper makes the following contributions.


  • We propose a cross-spectrum face frontalization model via learning domain-agnostic features. To the best of our knowledge, this is one of the first models for cross-spectrum face frontalization.

  • We introduce a dual-path network architecture for learning discriminative and domain-agnostic features. The features are learned using both a gradient reversal layer and a contrastive learning strategy.

  • Extensive experiments and ablation studies are conducted to demonstrate the effectiveness of the proposed face frontalization method.

Fig. 2: Illustration of the proposed dual-path architecture. Two weight-shared identical generators are employed, one in each path. Both the face frontalization flow and the domain-agnostic feature-learning flow are implemented during training. For frontalization, a combination of multiple losses is utilized, containing the multi-scale pixel loss, identity loss, global and local adversarial losses, total variation loss, and contrastive loss. Additionally, a domain classifier network is utilized for learning domain-agnostic features, optimized by a classification loss with a gradient reversal layer (GRL).

II Related Work

Face Frontalization: Recent face frontalization methods utilize either 2D/3D warping [12, 51, 20, 50], stochastic modeling [38, 2, 1] or generative models [3, 39, 16, 17, 41, 42, 49, 40, 48, 21]. For instance, Hassner et al. [12] proposed a single unmodified 3D surface model for frontalization. Sagonas et al. [38] proposed a model that jointly performs frontalization and landmark localization by solving a low-rank optimization problem. Huang et al. [17] developed TP-GAN with a two-path architecture for capturing both global and local contextual information. Tran et al. [39] introduced the DR-GAN model which frontalizes face images via disentangled representations. Hu et al. [16] introduced CAPG-GAN to synthesize frontal face images guided by target landmarks. Zhao et al. [49] proposed the PIM model based on a domain adaptation strategy for pose invariant face recognition. Cao et al. [3] proposed the HF-PIM model which includes dense correspondence field estimation and facial texture map recovery. Similarly, Zhang et al. [48] developed the A3F-CNN model with an appearance flow constraint. Li et al. [26] proposed a frontalization model using a series of discriminators optimized by segmented face images. Yin et al. [42] proposed a self-attention-based generator to integrate local features with their long-range dependencies for obtaining better frontalized faces. Wei et al. [40] proposed the FFWM model to overcome the illumination issue via flow-based feature warping.

Thermal-to-visible Synthesis: Various approaches have been proposed in the literature for thermal-to-visible face synthesis and matching [7, 14, 13, 43, 44, 47, 29, 27, 5, 36, 37, 6, 34, 18]. For instance, Hu et al. [14] developed a model based on partial least squares for heterogeneous face recognition. Riggan et al. [36] combined feature estimation and image reconstruction steps for matching heterogeneous faces. Zhang et al. [44] introduced the GAN-VFS model for cross-spectrum synthesis. Zhang et al. [47] introduced the TV-GAN model, which leverages identity information via a classification loss. Riggan et al. [37] enhanced the quality of the synthesized images by leveraging both global and local facial regions. Moreover, some models [45, 5] synthesize high-quality images by leveraging complementary information in multiple modalities. Recently, Mallat et al. [29] proposed a cascaded refinement network (CRN) model to synthesize visible images using a contextual loss. Litvin et al. [27] developed the FusionNet architecture to overcome the issue of model over-fitting on RGB-T images [30]. Very recently, Di et al. introduced the Multi-AP-GAN [7] and AP-GAN [8] models for improving the quality of the synthesized visible faces using facial attributes.

Note that the proposed model is different from the models discussed above. In our approach, a feature normalization-based generative adversarial network (GAN) is utilized to learn the frontalization mapping. The generator is also optimized by learning domain-agnostic features through a gradient reversal layer (GRL) [9]. However, we find that simply learning domain-agnostic features is not enough for cross-spectrum frontalization; learning discriminative features is also important. Therefore, semi-supervised contrastive learning is also implemented via a dual-path training strategy.

III Proposed Method

Given a thermal face image with a significant pose variation, our objective is to synthesize the corresponding frontal face image in the visible domain. The generated frontal face image should be photo-realistic and identity-preserving. In order to address this cross-spectrum face frontalization problem, we propose a dual-path network architecture which learns domain-agnostic features via a contrastive learning strategy. In what follows, we present the proposed network in detail.

III-A Network Architecture

Fig. 2 gives an overview of the proposed dual-path architecture. Each path contains a generator and two discriminators (one global and one local). The two generators share weights. An additional domain classifier network is utilized for learning domain-agnostic features. More details about the network architecture can be found in the supplementary file.
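
The weight sharing between the two paths can be realized simply by passing both inputs of a pair through a single generator instance. The following is a minimal PyTorch sketch of this idea; the wrapper name and the assumption that the generator returns both an image and a latent feature are illustrative, not the authors' released code.

```python
import torch.nn as nn

class DualPathWrapper(nn.Module):
    """Sketch of the dual-path setup: the *same* generator (hence shared
    weights) processes both inputs of a pair; only one copy of the
    parameters exists."""
    def __init__(self, generator):
        super().__init__()
        self.generator = generator          # single instance => weight sharing

    def forward(self, x1, x2):
        out1, feat1 = self.generator(x1)    # assumed to return (image, latent)
        out2, feat2 = self.generator(x2)
        return (out1, feat1), (out2, feat2)
```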

III-A1 Generator

Each generator consists of an encoder-decoder structure, where the encoder aims to extract domain-agnostic features from the input thermal face while the decoder aims to reconstruct a frontal face image in the visible domain. In this work, we adopt the generator architecture from DA-GAN [42] with the following modifications. We remove all batch normalization layers in DA-GAN because they were originally introduced to eliminate covariate shift; however, recent studies have shown that covariate shift often does not exist in GANs [23, 24, 25]. Instead, we use feature vector equalization to prevent the escalation of feature magnitudes. Feature equalization also helps the network converge smoothly during training. Given the original feature vector $a_{x,y}$ at pixel location $(x, y)$, the equalization is defined as follows:

$$b_{x,y} = \frac{a_{x,y}}{\sqrt{\frac{1}{N}\sum_{j=0}^{N-1}\left(a_{x,y}^{j}\right)^{2} + \epsilon}}, \qquad (1)$$

where $b_{x,y}$ is the normalized vector, $N$ is the total number of features, and $\epsilon$ is a small constant for numerical stability. In our experiments, we find that feature equalization also allows us to use a higher initial learning rate, which in turn helps in accelerating training.
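
A minimal PyTorch sketch of this pixel-wise feature equalization, in the spirit of [23]; the module name and the value of the stability constant are our own choices for illustration.

```python
import torch
import torch.nn as nn

class FeatureEqualization(nn.Module):
    """Pixel-wise feature vector normalization: each spatial location's
    feature vector is rescaled to roughly unit average magnitude, preventing
    feature magnitudes from escalating during training."""
    def __init__(self, eps: float = 1e-8):  # eps value is an assumption
        super().__init__()
        self.eps = eps

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # a: (batch, N, H, W); normalize across the N feature maps per pixel
        return a / torch.sqrt(a.pow(2).mean(dim=1, keepdim=True) + self.eps)

# usage sketch
x = torch.randn(2, 64, 32, 32)
y = FeatureEqualization()(x)   # same shape, per-pixel normalized features
```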

III-A2 Discriminator

A single discriminator may not be able to capture both global and local facial textures [17, 49]. Therefore, we employ two identical discriminators (a global one and a local one) to learn global and local discrimination, respectively, as shown in Fig. 3. In particular, the local components of a frontal face are extracted by a predefined off-the-shelf model [28]. In this work, the key components (eyes, nose, lips, brows) are extracted to learn the local facial structure, and the entire face image is used for learning the global texture. Mathematically, we define a mask $M$ to extract the key components of a ground-truth visible image, where $M_{i,j} = 1$ if pixel $(i, j)$ belongs to a key component and $M_{i,j} = 0$ otherwise. With the help of this mask, we can obtain the local regions of a real/fake image by the element-wise product between the mask and the real/fake image. The local and global discriminators are each optimized separately by a loss function of the following form:

$$\mathcal{L}_{D} = \mathbb{E}\left[D(y)\right] - \mathbb{E}\left[D(\hat{y})\right] - \lambda_{gp}\,\mathbb{E}_{\tilde{y}}\left[\left(\left\|\nabla_{\tilde{y}} D(\tilde{y})\right\|_{2} - 1\right)^{2}\right], \qquad (2)$$

where $y$ is the real image, $\hat{y}$ is the generated image, and $\tilde{y}$ is sampled uniformly along a straight line between a pair of real and generated images [11]. The discriminators attempt to maximize this objective, while the generator attempts to minimize it. The gradient-penalty weight $\lambda_{gp}$ is fixed in our experiments.
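
As a rough sketch (not the authors' exact code), the gradient-penalty term of [11] together with mask-based local cropping might look like this in PyTorch; `critic`, `mask`, and the default penalty weight are illustrative placeholders.

```python
import torch

def gradient_penalty(critic, real, fake, weight=10.0):
    """WGAN-GP penalty [11]: sample points uniformly on the line between real
    and generated images and push the critic's gradient norm toward 1."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return weight * ((grad_norm - 1.0) ** 2).mean()

def local_region(image, mask):
    """Keep only the key facial components (eyes, nose, lips, brows)
    selected by a binary mask from an off-the-shelf parser [28]."""
    return image * mask

def critic_loss(critic, real, fake):
    """Critic maximizes E[D(real)] - E[D(fake)] - GP; we minimize the negative."""
    return -(critic(real).mean() - critic(fake).mean()) + gradient_penalty(critic, real, fake)
```

The same helper can be applied to either the global discriminator (full images) or the local discriminator (images multiplied by the component mask).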

Fig. 3: Illustration of the global and local discriminators.

III-B Objective Function

The loss function we use to train the proposed network consists of a combination of contrastive loss, gradient reversal layer-based domain adaptive loss and frontalization loss.

Frontalization Loss: To capture texture information between the real and fake images at multiple scales, we utilize a multi-scale pixel loss as follows:

$$\mathcal{L}_{pix} = \sum_{s=1}^{S} \left\| \hat{y}_{s} - y_{s} \right\|_{1}, \qquad (3)$$

where $s$ corresponds to the resolution scale and $\hat{y}_{s}$, $y_{s}$ denote the synthesized and ground-truth images at scale $s$. In our work, we utilize three different resolution scales.
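
A small PyTorch sketch of one way such a multi-scale pixel loss could be computed; the use of average pooling for downsampling, the L1 distance, and the specific scale factors are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def multiscale_pixel_loss(fake, real, scales=(1, 2, 4)):
    """L1 loss between synthesized and ground-truth images computed at several
    resolutions. `scales` are downsampling factors (illustrative values)."""
    loss = 0.0
    for s in scales:
        f = F.avg_pool2d(fake, kernel_size=s) if s > 1 else fake
        r = F.avg_pool2d(real, kernel_size=s) if s > 1 else real
        loss = loss + F.l1_loss(f, r)
    return loss
```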

Identity-preserving synthesis is important for heterogeneous face recognition. To achieve this, we utilize an identity loss as follows:

$$\mathcal{L}_{id} = \left\| \phi(\hat{y}) - \phi(y) \right\|, \qquad (4)$$

where $\phi(\cdot)$ is the pre-trained VGGFace [32] model used to extract features, and $\hat{y}$ and $y$ are the synthesized and real images. By minimizing the feature distance between the real and fake images, the generator is optimized to synthesize identity-preserving images.
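
A hedged PyTorch sketch of this perceptual identity term; `vgg_face` is assumed to be a frozen, pre-trained feature extractor returning per-image embeddings, and the choice of the L1 distance is our own.

```python
import torch
import torch.nn.functional as F

def identity_loss(vgg_face, fake, real):
    """Distance between VGGFace features of the synthesized and ground-truth
    images. Only the synthesized branch receives gradients."""
    with torch.no_grad():
        target = vgg_face(real)                 # real-image features, no grad
    return F.l1_loss(vgg_face(fake), target)    # L1 distance is an assumption
```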

In order to reduce the artifacts in the synthesized images, a total variation regularization $\mathcal{L}_{tv}$ is also applied, as in [22]. Hence, the total loss for the frontalization flow is defined as follows:

$$\mathcal{L}_{front} = \lambda_{1}\mathcal{L}_{pix} + \lambda_{2}\mathcal{L}_{id} + \lambda_{3}\mathcal{L}_{adv}^{g} + \lambda_{4}\mathcal{L}_{adv}^{l} + \lambda_{5}\mathcal{L}_{tv}, \qquad (5)$$

where $\lambda_{1}, \ldots, \lambda_{5}$ are regularization parameters and $\mathcal{L}_{adv}^{g}$, $\mathcal{L}_{adv}^{l}$ denote the global and local adversarial losses.

Domain Adaptive Loss: To learn domain-agnostic features, a domain classifier network with a gradient reversal layer [9] is employed in our network. The domain classifier network is a simple three-layer multi-layer perceptron (MLP) with the same equalization as in Eq. (1). Given both thermal and visible face images, the classifier aims to correctly estimate which spectrum the input image belongs to. When back-propagating, the gradient reversal layer flips the gradients, which leads to domain-agnostic features in the encoder. Following [9], the encoder parameters are updated as follows:

$$\theta_{E}' = \theta_{E} - \mu\left(\frac{\partial \mathcal{L}_{front}^{i}}{\partial \theta_{E}} - \lambda \frac{\partial \mathcal{L}_{d}^{i}}{\partial \theta_{E}}\right), \qquad (6)$$

where $i$ indexes the $i$-th input sample, $\mu$ is the learning rate, $\mathcal{L}_{d}$ is the domain classification loss, and $\theta_{E}'$ and $\theta_{E}$ are the updated and original parameters, respectively. The domain index $d \in \{0, 1\}$ indicates whether the input is thermal or visible. The value of $\lambda$ is fixed in our experiments.
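
The gradient reversal layer of [9] is straightforward to implement as a custom autograd function. Below is a minimal PyTorch sketch; the two-layer classifier dimensions and the default lambda are illustrative (the paper's classifier is a three-layer MLP with feature equalization).

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; flips (and scales by lambda) the
    gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # no gradient w.r.t. lam

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Illustrative domain classifier: predicts thermal vs. visible from
# reversed encoder features (layer sizes are placeholders).
domain_classifier = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                  nn.Linear(256, 2))

def domain_loss(encoder_feat, domain_label, lam=1.0):
    logits = domain_classifier(grad_reverse(encoder_feat, lam))
    return nn.functional.cross_entropy(logits, domain_label)
```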

Contrastive Loss: Finally, a contrastive loss [4] is used to enhance the discriminability of the latent features. It is defined as follows:

$$\mathcal{L}_{ctr} = \frac{1}{2}\left[\, y_{c}\, D^{2} + (1 - y_{c})\max(0,\, m - D)^{2} \right], \quad D = \left\| E(x_{1}) - E(x_{2}) \right\|_{2}, \qquad (7)$$

where $x_{1}$ and $x_{2}$ are the two profile face images input to the dual-path architecture and $E(\cdot)$ denotes the latent encoder features. The label $y_{c}$ indicates whether the two inputs contain the same ($y_{c} = 1$) or different ($y_{c} = 0$) identity. The hyper-parameter $m$ indicates the margin.
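
A short sketch of this contrastive term in PyTorch; the labeling convention (1 = same identity) and the default margin are assumptions consistent with Eq. (7) above.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feat1, feat2, same_identity, margin=1.0):
    """Contrastive loss [4] on the latent features of the two paths:
    same-identity pairs are pulled together, different-identity pairs are
    pushed at least `margin` apart."""
    same_identity = same_identity.float()
    dist = F.pairwise_distance(feat1, feat2)                   # Euclidean distance
    pos = same_identity * dist.pow(2)                          # attract positives
    neg = (1 - same_identity) * F.relu(margin - dist).pow(2)   # repel negatives
    return 0.5 * (pos + neg).mean()
```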

Total Objective: The overall loss function used to train the proposed heterogeneous frontalization network is as follows:

$$\mathcal{L} = \mathcal{L}_{front} + \lambda_{d}\,\mathcal{L}_{d} + \lambda_{c}\,\mathcal{L}_{ctr}, \qquad (8)$$

where $\lambda_{d}$ and $\lambda_{c}$ are regularization parameters.
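
Putting the pieces together, one generator update of the dual-path training could be sketched as below, reusing the helper sketches given earlier in this section. The function signature, the omission of the adversarial and total-variation terms, and the default weights are all illustrative, not the authors' released implementation.

```python
def generator_step(encoder, decoder, vgg_face, x1, x2, y1, y2, d1, d2, same_id,
                   lam_dom=1.0, lam_ctr=1.0):
    """One dual-path update (sketch). x*: thermal profile inputs, y*: visible
    frontal targets, d*: domain labels, same_id: identity-match label."""
    f1, f2 = encoder(x1), encoder(x2)            # shared-weight encoder, both paths
    fake1, fake2 = decoder(f1), decoder(f2)

    # Frontalization losses (pixel + identity shown; adversarial/TV terms omitted)
    l_front = (multiscale_pixel_loss(fake1, y1) + multiscale_pixel_loss(fake2, y2)
               + identity_loss(vgg_face, fake1, y1) + identity_loss(vgg_face, fake2, y2))

    # Domain-agnostic feature learning via the GRL, plus contrastive regularization
    l_dom = domain_loss(f1.flatten(1), d1) + domain_loss(f2.flatten(1), d2)
    l_ctr = contrastive_loss(f1.flatten(1), f2.flatten(1), same_id)

    return l_front + lam_dom * l_dom + lam_ctr * l_ctr
```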

IV Experiments

Datasets: We evaluate the proposed heterogeneous face frontalization network on three publicly available datasets: DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF) [35], ARL Multimodal Face Database (ARL-MMFD) [7, 15, 45] and TUFTS Face [31]. Sample images from these datasets are shown in Fig. 4.


Fig. 4: Input profile thermal images sampled from the ARL-VTF [35], ARL-MMFD [7], and TUFTS [31] datasets, respectively.

ARL-VTF Dataset: The ARL-VTF dataset [35] consists of 500,000 images from 395 subjects (295 for training, 100 for testing). This dataset contains a large collection of paired visible and thermal faces, with variations in baseline condition, expression, pose, and eye-wear. We first crop the images based on the given ground-truth bounding-box annotations. The proposed model is trained on this dataset using the predefined development and testing splits [35]. The final experimental results are based on the average over the five predefined splits.

ARL-MMFD Dataset: Additionally, we evaluate the proposed model on Volume III of the ARL-MMFD dataset [7]. This dataset was collected by ARL across 11 sessions over 6 days. It contains 5419 paired polarimetric thermal and visible images from 121 subjects with significant variations in expression, illumination, pose, glasses, etc. We select the polarimetric thermal profile and visible frontal face image pairs corresponding to both neutral and expressive faces for conducting experiments on this dataset. In particular, we randomly select images from 90 identities for training and images from the remaining identities for testing, so there is no identity overlap between the training and testing sets. We use the original aligned visible and polarimetric thermal images for training and testing without any preprocessing. The final experimental results are based on the average of five random splits.
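
For clarity, an identity-disjoint split of this kind can be generated as in the following Python sketch; the helper name, data structure, and seed handling are our own illustration of the protocol, not the authors' code.

```python
import random

def identity_disjoint_split(image_paths_by_id, num_train_ids=90, seed=0):
    """Split a dict {identity: [image paths]} into identity-disjoint train/test
    sets, as in the ARL-MMFD protocol (90 training identities)."""
    ids = sorted(image_paths_by_id)
    random.Random(seed).shuffle(ids)
    train_ids, test_ids = ids[:num_train_ids], ids[num_train_ids:]
    train = [p for i in train_ids for p in image_paths_by_id[i]]
    test = [p for i in test_ids for p in image_paths_by_id[i]]
    return train, test
```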

TUFTS Face Dataset: Finally, the proposed model is also evaluated on the recently published TUFTS Face Dataset [31]. TUFTS is a multi-modal dataset which contains more than 10,000 images from 113 individuals from more than 15 countries. For conducting experiments, we only select the thermal and visible images from this multi-modal dataset, giving more than 1000 images from over 100 subjects with different pose and expression variations. This dataset is very challenging due to the large number of pose and expression variations and the fact that only a few images per variation are available. Images from 89 randomly selected individuals are used for training and images from the remaining 23 subjects are used for testing. The raw images are used for training and testing without any pre-processing. In particular, profile images in the thermal domain and frontal images (both neutral and expressive) in the visible domain are utilized for training the models.

Implementation Details: All the images in this work are resized to a fixed resolution and the image intensity is scaled into [0, 1]. Features from the average pooling layer of VGGFace [33] are extracted from the synthesized visible image, and the similarity for verification is calculated based on the cosine distance. In all the experiments, the hyper-parameters are kept fixed. The learning rate is initially set to 0.01 and the batch size is 8. We train our model for 10, 100, and 400 epochs on the ARL-VTF, ARL-MMFD, and TUFTS datasets, respectively.
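
The verification protocol described above can be summarized in a few lines of PyTorch; `vgg_face` is again assumed to be a frozen pre-trained embedding network.

```python
import torch
import torch.nn.functional as F

def verification_score(vgg_face, synthesized_visible, gallery_visible):
    """Extract VGGFace pooling features from the synthesized probe and the
    gallery image, then compare with cosine similarity
    (higher score = more likely the same identity)."""
    probe_feat = vgg_face(synthesized_visible)     # (batch, dim) embeddings
    gallery_feat = vgg_face(gallery_visible)
    return F.cosine_similarity(probe_feat, gallery_feat, dim=1)
```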

We compare the proposed method with the following state-of-the-art facial frontalization methods: TP-GAN [17], PIM [49], M2FPA [26], and DA-GAN [42]. In addition, we include pix2pix [19] as a baseline, which is a general image-to-image translation method between two domains. We follow the same network settings as mentioned in the original papers and tune the training parameters to the best of our ability. Note that TP-GAN [17] and PIM [49] require landmarks on the input profile thermal face images. We manually label the landmarks on thermal faces in the TUFTS Face Dataset [31] and use the officially provided landmarks in the ARL-VTF dataset [35]. For the ARL-MMFD dataset [7], we estimate the landmarks using MTCNN [46].

IV-A Experimental Results

ARL Visible-Thermal Face Dataset: The evaluation is based on the following two protocols: (1) Gallery G_VB0- to Probe P_PTP0, and (2) Gallery G_VB0- to Probe P_PTP-. The images in Gallery G_VB0- are the baseline face images in the visible domain without eye-glasses occlusion. The images in Probe P_PTP0 are the profile face images in the thermal domain without eye-glasses occlusion. The images in Probe P_PTP- are the profile face images in the thermal domain with eye-glasses taken off.

We evaluate our model against the other baseline methods both qualitatively and quantitatively. Visual frontalization results are shown in Fig. 5. As can be seen from this figure, the proposed method is able to synthesize more photo-realistic and identity-preserving images than the other methods, which fail to synthesize high-quality and identity-preserving images. Additionally, Table I quantitatively compares the verification performance of the different methods. Our method achieves around 2% and 5% improvements in the AUC and EER scores in the two protocols, respectively, when compared with the recent DA-GAN [42]. Additionally, the proposed model also achieves improvements in the True Accept Rate (TAR) at False Accept Rates (FAR) of 1% and 5%.

Probe                        00_pose                             10_pose
Method           AUC     EER    FAR=1%  FAR=5%       AUC     EER    FAR=1%  FAR=5%
Raw             54.88   46.38    2.16    8.33       56.10   45.98    1.06    8.53
Pix2Pix [19]     5.04   46.84    2.17    8.90       57.95   44.84    1.79   14.47
TP-GAN [17]     64.21   40.12    3.31   12.11       67.41   36.78    3.89   13.84
PIM [49]        68.69   36.58    5.43   16.56       73.31   32.70    5.96   20.42
M2FPA [26]      74.99   32.33    5.73   20.26       76.99   29.84    9.51   23.35
DA-GAN [42]     75.58   31.18    6.85   22.23       75.76   30.69    8.40   23.62
Ours            77.48   29.08    8.20   25.89       82.18   25.11   10.82   30.64
TABLE I: Verification performance comparison on the ARL-VTF dataset [35].
Probe                        00_pose                             10_pose
Method           AUC     EER    FAR=1%  FAR=5%       AUC     EER    FAR=1%  FAR=5%
Baseline        56.21   45.99    2.07    9.04       59.20   43.13    2.73    9.71
w/ Multi-scale  58.07   44.82    2.13   10.27       62.01   40.91    3.84   11.06
w/              70.26   34.26    4.90   16.26       76.56   29.67    6.70   23.84
w/ self-attn    71.99   34.05    5.22   19.72       76.60   29.94    6.62   24.00
w/              72.58   33.54    6.21   20.07       76.58   28.44    6.66   24.18
w/ Eq. (1)      77.13   29.16    6.41   21.35       80.05   26.26    7.11   29.40
w/              77.18   29.07    7.43   22.20       80.56   26.46    9.24   29.90
w/ (ours)       77.48   29.08    8.20   25.89       82.18   25.11   10.82   30.64
TABLE II: Verification performance comparison corresponding to the ablation study.

Fig. 5: Cross-domain face frontalization comparison on the ARL-VTF [35] dataset. Columns (left to right): Input, Pix2Pix [19], TP-GAN [17], PIM [49], M2FPA [26], DA-GAN [42], Ours, Reference.

ARL Multimodal Face Database: We evaluate the proposed model on Volume III of ARL-MMFD [7]. Qualitative and quantitative results corresponding to this dataset are shown in Fig. 6 and Table III, respectively. As can be seen from Table III, the proposed model surpasses the best performing baseline model by 2.4% and 1.5% in the AUC and EER scores, respectively. Furthermore, the proposed model surpasses DA-GAN [42] by 1.3% and 7.7% at FAR=1% and FAR=5%, respectively.

From Fig. 6, we can see that our model is able to synthesize more photo-realistic images while preserving the identity better than the other frontalization models. For each subject in this dataset, the pose variations cover a wide range of yaw angles and the illumination varies considerably. Hence, methods like PIM [49] fail to synthesize photo-realistic images due to illumination artifacts. In addition, our proposed model is trained with a contrastive loss, which helps the network provide better results even on this smaller dataset.


Fig. 6: Cross-domain face frontalization comparison on the ARL-MMFD [7] dataset. Columns (left to right): Input, Pix2Pix [19], TP-GAN [17], PIM [49], M2FPA [26], DA-GAN [42], Ours, Reference.
Method AUC EER FAR=1% FAR=5%
Raw 64.12 40.51 4.21 17.74
Pix2Pix [19] 73.60 31.96 11.16 26.45
TP-GAN [17] 76.15 30.89 6.26 19.03
PIM [49] 80.89 26.86 9.83 27.11
M2FPA [26] 85.58 22.27 13.32 37.58
DA-GAN [42] 86.26 21.56 14.74 33.17
Ours 88.61 20.21 16.07 40.93
TABLE III: Verification performance comparison on the ARL-MMFD dataset [7].

TUFTS Face Database: Finally, the proposed model is also evaluated on the TUFTS Face dataset [31]. Quantitative results are reported in Table IV. We observe that our proposed method surpasses the previous baselines and achieves clear improvements in both the AUC and EER scores.

Method AUC EER FAR=1% FAR=5%
Raw 67.55 37.88 5.11 16.11
Pix2Pix [19] 69.71 35.31 5.44 21.66
TP-GAN [17] 70.93 35.32 6.46 18.77
PIM [49] 72.84 34.10 8.77 21.00
M2FPA [26] 75.07 31.22 8.33 23.44
DA-GAN [42] 75.24 31.14 10.44 26.22
Ours 78.68 28.38 10.44 27.11
TABLE IV: Verification performance comparison on the TUFTS Face Database [31].

IV-B Ablation Study

In this section, we analyze how each component of the proposed model contributes to the final performance. We choose the global UNet generator [17] and the discriminator as the baseline model. In particular, we analyze the contribution of each loss function and of the equalization in Eq. (1). We conduct these ablation experiments on the ARL-VTF dataset and report the results qualitatively and quantitatively in Fig. 7 and Table II, respectively.

As shown in Table II, the quantitative verification results improve as the different components are added consecutively. Fig. 7 shows the corresponding synthesized samples. Based on this ablation study, we can see that the identity loss significantly helps preserve the identity in the synthesized images, and the feature equalization in Eq. (1) significantly improves the image quality. Finally, the remaining loss terms further improve the verification performance.


Fig. 7: Results corresponding to the ablation study. Columns (left to right): Profile input, Baseline, models with successively added components (including self-attention and the equalization in Eq. (1)), Ours, and the ground-truth Frontal reference.

IV-C Pose Invariant Representation

In Fig. 9, we show the frontal faces synthesized by the proposed model for a range of yaw poses on the ARL-MMFD dataset, which covers a wide range of yaw angles. Given polarimetric thermal images at arbitrary yaw poses, we observe that the synthesized visible frontal images maintain a good pose-invariant representation. Additionally, we conduct the same analysis on the ARL-VTF dataset, as shown in Fig. 8. From these two figures, we can see that the proposed model learns a pose-invariant frontal representation in the visible domain when given thermal input images at arbitrary poses.

Fig. 8: Synthesized frontal images corresponding to a range of yaw poses on the ARL-VTF dataset.
Fig. 9: Synthesized frontal images corresponding to a range of yaw poses on the ARL-MMFD dataset.

V Conclusion

In this work, we proposed a novel heterogeneous face frontalization model which generates frontal visible faces from profile thermal faces. The generator contains a gradient reversal layer-based classifier for domain-agnostic feature learning and a pair of local and global discriminators for better synthesis. Additionally, a contrastive constraint is enforced through a dual-path training strategy for learning discriminative latent features. Quantitative and visual experiments conducted on three real thermal-visible datasets demonstrate the superiority of the proposed method over other existing methods. Additionally, an ablation study was conducted to demonstrate the improvements obtained from the different modules and loss functions.

References

  • [1] J. Booth, A. Roussos, A. Ponniah, D. Dunaway, and S. Zafeiriou (2018) Large scale 3D morphable models. International Journal of Computer Vision 126 (2), pp. 233–254.
  • [2] J. Booth, A. Roussos, S. Zafeiriou, A. Ponniah, and D. Dunaway (2016) A 3D morphable model learnt from 10,000 faces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5543–5552.
  • [3] J. Cao, Y. Hu, H. Zhang, R. He, and Z. Sun (2018) Learning a high fidelity pose invariant model for high-resolution face frontalization. In Advances in Neural Information Processing Systems, pp. 2872–2882.
  • [4] S. Chopra, R. Hadsell, and Y. LeCun (2005) Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 1, pp. 539–546.
  • [5] T. de Freitas Pereira, A. Anjos, and S. Marcel (2019) Heterogeneous face recognition using domain specific units. IEEE Transactions on Information Forensics and Security 14 (7), pp. 1803–1816.
  • [6] X. Di, B. S. Riggan, S. Hu, N. J. Short, and V. M. Patel (2019) Polarimetric thermal to visible face verification via self-attention guided synthesis. In 2019 International Conference on Biometrics (ICB), pp. 1–8.
  • [7] X. Di, B. S. Riggan, S. Hu, N. J. Short, and V. M. Patel (2021) Multi-scale thermal to visible face verification via attribute guided synthesis. IEEE Transactions on Biometrics, Behavior, and Identity Science 3 (2), pp. 266–280.
  • [8] X. Di, H. Zhang, and V. M. Patel (2018) Polarimetric thermal to visible face verification via attribute preserved synthesis. In 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp. 1–10.
  • [9] Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pp. 1180–1189.
  • [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  • [11] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, Vol. 30.
  • [12] T. Hassner, S. Harel, E. Paz, and R. Enbar (2015) Effective face frontalization in unconstrained images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4295–4304.
  • [13] R. He, J. Cao, L. Song, Z. Sun, and T. Tan (2019) Adversarial cross-spectral face completion for NIR-VIS face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [14] S. Hu, J. Choi, A. L. Chan, and W. R. Schwartz (2015) Thermal-to-visible face recognition using partial least squares. JOSA A 32 (3), pp. 431–442.
  • [15] S. Hu, N. J. Short, B. S. Riggan, C. Gordon, K. P. Gurton, M. Thielke, P. Gurram, and A. L. Chan (2016) A polarimetric thermal database for face recognition research. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 119–126.
  • [16] Y. Hu, X. Wu, B. Yu, R. He, and Z. Sun (2018) Pose-guided photorealistic face rotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8398–8406.
  • [17] R. Huang, S. Zhang, T. Li, and R. He (2017) Beyond face rotation: global and local perception GAN for photorealistic and identity preserving frontal view synthesis. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2439–2448.
  • [18] R. Immidisetti, S. Hu, and V. M. Patel (2021) Simultaneous face hallucination and translation for thermal to visible face verification using Axial-GAN. arXiv preprint arXiv:2104.06534.
  • [19] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [20] L. A. Jeni and J. F. Cohn (2016) Person-independent 3D gaze estimation using face frontalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 87–95.
  • [21] S. Jiang, Z. Tao, and Y. Fu (2021) Geometrically editable face image translation with adversarial networks. IEEE Transactions on Image Processing.
  • [22] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pp. 694–711.
  • [23] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations.
  • [24] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410.
  • [25] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2020) Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119.
  • [26] P. Li, X. Wu, Y. Hu, R. He, and Z. Sun (2019) M2FPA: a multi-yaw multi-pitch high-quality database and benchmark for facial pose analysis. arXiv preprint arXiv:1904.00168.
  • [27] A. Litvin, K. Nasrollahi, S. Escalera, C. Ozcinar, T. B. Moeslund, and G. Anbarjafari (2019) A novel deep network architecture for reconstructing RGB facial images from thermal for face recognition. Multimedia Tools and Applications 78 (18), pp. 25259–25271.
  • [28] S. Liu, J. Yang, C. Huang, and M. Yang (2015) Multi-objective convolutional learning for face labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3451–3459.
  • [29] K. Mallat, N. Damer, F. Boutros, A. Kuijper, and J. Dugelay (2019) Cross-spectrum thermal to visible face recognition based on cascaded image synthesis. In 2019 12th IAPR International Conference on Biometrics (ICB).
  • [30] O. Nikisins, K. Nasrollahi, M. Greitans, and T. B. Moeslund (2014) RGB-D-T based face recognition. In 2014 22nd International Conference on Pattern Recognition, pp. 1716–1721.
  • [31] K. Panetta, Q. Wan, S. Agaian, S. Rajeev, S. Kamath, R. Rajendran, S. P. Rao, A. Kaszowska, H. A. Taylor, A. Samani, et al. (2018) A comprehensive database for benchmarking imaging systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (3), pp. 509–520.
  • [32] O. M. Parkhi, A. Vedaldi, and A. Zisserman (2015) Deep face recognition. In Proceedings of the British Machine Vision Conference (BMVC).
  • [33] O. M. Parkhi, A. Vedaldi, and A. Zisserman (2015) Deep face recognition. In British Machine Vision Conference.
  • [34] D. Poster, M. Thielke, R. Nguyen, S. Rajaraman, X. Di, C. N. Fondje, V. M. Patel, N. J. Short, B. S. Riggan, N. M. Nasrabadi, et al. (2021) A large-scale, time-synchronized visible and thermal face dataset. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1559–1568.
  • [35] D. Poster, M. Thielke, R. Nguyen, S. Rajaraman, X. Di, C. N. Fondje, V. M. Patel, N. J. Short, B. S. Riggan, N. M. Nasrabadi, and S. Hu (2021) A large-scale, time-synchronized visible and thermal face dataset. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1559–1568.
  • [36] B. S. Riggan, N. J. Short, S. Hu, and H. Kwon (2016) Estimation of visible spectrum faces from polarimetric thermal faces. In 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp. 1–7.
  • [37] B. S. Riggan, N. J. Short, and S. Hu (2018) Thermal to visible synthesis of face images using multiple regions. In IEEE Winter Conference on Applications of Computer Vision (WACV).
  • [38] C. Sagonas, Y. Panagakis, S. Zafeiriou, and M. Pantic (2015) Robust statistical face frontalization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3871–3879.
  • [39] L. Tran, X. Yin, and X. Liu (2017) Disentangled representation learning GAN for pose-invariant face recognition. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1283–1292.
  • [40] Y. Wei, M. Liu, H. Wang, R. Zhu, G. Hu, and W. Zuo (2020) Learning flow-based feature warping for face frontalization with illumination inconsistent supervision. In Proceedings of the European Conference on Computer Vision.
  • [41] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker (2017) Towards large-pose face frontalization in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3990–3999.
  • [42] Y. Yin, S. Jiang, J. Robinson, and Y. Fu (2020) Dual-attention GAN for large-pose face frontalization. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 24–31.
  • [43] A. Yu, H. Wu, H. Huang, Z. Lei, and R. He (2021) LAMP-HQ: a large-scale multi-pose high-quality database and benchmark for NIR-VIS face recognition. International Journal of Computer Vision 129 (5), pp. 1467–1483.
  • [44] H. Zhang, V. M. Patel, B. S. Riggan, and S. Hu (2017) Generative adversarial network-based synthesis of visible faces from polarimetric thermal faces. In 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 100–107.
  • [45] H. Zhang, B. S. Riggan, S. Hu, N. J. Short, and V. M. Patel (2019) Synthesis of high-quality visible faces from polarimetric thermal faces using generative adversarial networks. International Journal of Computer Vision: Special Issue on Deep Learning for Face Analysis.
  • [46] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23 (10), pp. 1499–1503.
  • [47] T. Zhang, A. Wiliem, S. Yang, and B. Lovell (2018) TV-GAN: generative adversarial network based thermal to visible face recognition. In 2018 International Conference on Biometrics (ICB), pp. 174–181.
  • [48] Z. Zhang, X. Chen, B. Wang, G. Hu, W. Zuo, and E. R. Hancock (2018) Face frontalization using an appearance-flow-based convolutional neural network. IEEE Transactions on Image Processing 28 (5), pp. 2187–2199.
  • [49] J. Zhao, Y. Cheng, Y. Xu, L. Xiong, J. Li, F. Zhao, K. Jayashree, S. Pranata, S. Shen, J. Xing, et al. (2018) Towards pose invariant face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2207–2216.
  • [50] J. Zhao, L. Xiong, Y. Cheng, Y. Cheng, J. Li, L. Zhou, Y. Xu, J. Karlekar, S. Pranata, S. Shen, J. Xing, S. Yan, and J. Feng (2018) 3D-aided deep pose-invariant face recognition. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1184–1190.
  • [51] X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li (2015) High-fidelity pose and expression normalization for face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 787–796.