Unconstrained Periocular Recognition: Using Generative Deep Learning Frameworks for Attribute Normalization

02/10/2020 · Luiz A. Zanlorensi et al. · Universidade da Beira Interior

Ocular biometric systems working in unconstrained environments usually face the problem of low within-class compactness caused by the multiple factors that jointly degrade the quality of the obtained data. In this work, we propose an attribute normalization strategy based on deep learning generative frameworks that reduces the variability of the samples used in pairwise comparisons without reducing their discriminability. The proposed method can be seen as a preprocessing step that contributes to data regularization and improves recognition accuracy, while being fully agnostic to the recognition strategy used. As a proof of concept, we consider the "eyeglasses" and "gaze" factors, comparing the performance of five different recognition methods with and without the proposed normalization strategy. We also introduce a new dataset for unconstrained periocular recognition, composed of images acquired by mobile devices and particularly suited to assessing the impact of "wearing eyeglasses" on recognition effectiveness. Our experiments, performed on two different datasets, support the usefulness of our attribute normalization scheme in improving recognition performance.


1 Introduction

The development of ocular biometric systems operating in unconstrained environments is challenging, since the collected images may be degraded by noise, blur, motion blur, occlusion, eye gaze, off-angle views, eyeglasses, contact lenses, and makeup, among other factors. These factors generate high within-class variability, degrading the uniqueness of the features extracted from the biometric trait.

With the recent advancement of deep learning techniques, several approaches applying Convolutional Neural Networks (CNN) to periocular recognition have been developed [15, 22, 30, 8, 27, 9]. An advantage of deep learning based approaches is that, unlike handcrafted features, they involve a representation learning process. This process can produce feature extraction models that are invariant to some within-class factors, depending on the image samples present in the training set. Nevertheless, new approaches are still being developed using handcrafted features, achieving top-ranked results in ocular recognition competitions [23, 5, 1, 26]. The main advantage of these approaches is their lower computational cost compared with methods based on deep learning techniques.

Figure 1: Cohesive perspective of the proposed attribute normalization scheme: images feed an encoder/decoder deep model for automatic image editing, removing the eyeglasses and correcting deviated gazes before the recognition step. This contributes to reducing the within-class variability without significantly reducing the discriminability between classes, which is the key to the observed improvements in performance.

Even though CNN approaches can handle within-class variability, several factors present in images captured in unconstrained environments still affect periocular recognition, both in biometric systems based on deep learning and, especially, in those based on handcrafted features. To address these problems, we propose an image preprocessing method that normalizes the most common image attributes that can decrease the recognition effectiveness of periocular biometric systems. The proposed attribute normalization preprocessing consists of removing or correcting, through deep models for image editing, attributes that differ in a pairwise image comparison, as shown in Fig. 1. For example, in a dataset containing images of the same subject wearing and not wearing eyeglasses, the proposed preprocessing normalizes all the images by removing the eyeglasses. Another contribution is a new dataset for mobile periocular recognition under a real and slightly constrained environment. This dataset, called UFPR-Eyeglasses, is composed of images captured by mobile devices from subjects wearing and not wearing eyeglasses. The rest of this paper is organized as follows. In Sec. 2, we discuss related work describing deep models for attribute editing. In Sec. 3, we explain the proposed normalization and how it was performed. The experimental protocol is described in Sec. 4 and the results are reported in Sec. 5. Finally, we state conclusions in Sec. 6.

2 Related Work

Recently, several methods have been developed for automatic facial attribute editing. Approaches based on Generative Adversarial Network (GAN) [6] and Variational Autoencoder (VAE) [11] architectures have reported promising results on these tasks [13, 21, 3, 12, 7, 24, 31, 10, 25]. Models for face attribute editing can be divided according to their ability to manipulate a single attribute [24, 31] or multiple attributes [13, 21, 3, 12, 7], such as eyeglasses, hair color, age, mustache, gender, and beard, among others. There are also strategies for image attribute editing that transfer face attributes [3, 25, 10]; the goal of this task is to modify a face image based on attributes contained in another image while preserving the subject identity. As stated by He et al. [7], one advantage of models based on an encoder/decoder architecture is that they can handle the manipulation of multiple attributes with a single trained model. In such models, the attributes are manipulated through modifications of the latent representation generated by the encoder; with these modifications, the decoder can generate images with attributes that differ from those of the input.
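As a minimal illustration of this encoder/decoder idea (a toy sketch, not any specific published architecture), attribute editing amounts to shifting the latent code before decoding:

```python
import numpy as np

def edit_attribute(encode, decode, image, attr_direction, strength=1.0):
    """Generic encoder/decoder attribute editing: encode the image,
    shift the latent code along an attribute direction, and decode."""
    z = encode(image)
    return decode(z + strength * attr_direction)

# Toy stand-ins so the sketch runs end-to-end (real models are deep CNNs).
encode = lambda x: x.reshape(-1)     # "latent code" = flattened pixels
decode = lambda z: z.reshape(8, 8)   # inverse of the toy encoder
image = np.random.rand(8, 8)
direction = np.zeros(64)
direction[0] = 1.0                   # a fabricated attribute axis
edited = edit_attribute(encode, decode, image, direction)
```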

Regarding image attribute manipulation, each model proposes a different strategy to relate the latent representation to the face attributes. The model proposed by Shen and Liu [24] consists of two networks performing inverse attribute manipulations, e.g., one network to remove a mustache and another one to add it. The attribute manipulation is performed by a pixel-wise addition of the input image and a residual image containing the required attribute; this approach handles a single attribute manipulation per trained model. The IcGAN [21] is composed of an encoder and a conditional GAN generator, using a normal distribution independent of the attributes to generate the latent image representation. The input image is also encoded into an attribute information vector; the attribute manipulation is then performed by modifying this vector and feeding it, together with the latent representation, to the GAN generator. The VAE/GAN [13] generates a vector for each attribute by computing the difference between the mean latent representations of images with and without that attribute. Thus, face attributes can be manipulated by adding the generated attribute vectors to a latent representation. Also based on an encoder/decoder network with an attribute vector, the Fader network [12] produces a latent representation invariant to the attributes through an adversarial process introduced in the architecture. As stated by He et al. [7], this process may cause information loss, which could compromise its use for our proposed attribute normalization, since discriminative information in the periocular image might be lost.
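To make the latent-space arithmetic of the VAE/GAN [13] concrete, here is a toy sketch in which random vectors stand in for the latent codes a trained encoder would produce:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for latent codes produced by a trained encoder.
z_with = rng.normal(1.0, 0.5, size=(500, 128))     # e.g., images with eyeglasses
z_without = rng.normal(0.0, 0.5, size=(500, 128))  # images without eyeglasses

# Attribute vector: difference between the two mean latent codes.
attr_vec = z_with.mean(axis=0) - z_without.mean(axis=0)

# "Removing" the attribute from a new latent code before decoding it:
z_query = rng.normal(1.0, 0.5, size=128)   # code of an image with the attribute
z_normalized = z_query - attr_vec          # decoding this should drop the attribute
```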

The SaGAN model [29] is composed of a generator, built from an attribute manipulation network (AMN) and a spatial attention network (SAN), and a discriminator that determines whether the generated image is real and also performs attribute classification. The SAN and AMN are combined in the generator to restrict the manipulation to the attribute region. The authors also evaluated the proposed attribute editing model on face recognition, using the generated images with edited attributes for data augmentation and improving the verification results on two datasets. As one can see, there are several models for facial attribute editing. For biometric system applications, it is crucial that the model modifies only the desired attribute, without removing or changing any other information that may be discriminative for the subject.

3 Proposed Attribute Normalization Method

The proposed attribute normalization preprocessing consists of applying generative deep models for image attribute editing to a pair of ocular images, aiming at the correction/removal of differing attributes. Regarding the within-class variability in periocular images caused by aspects such as eyeglasses and eye gaze, the hypothesis considered in this work is that this variability can be decreased by an attribute normalization preprocessing step.

To perform this normalization, we employed the AttGAN model [7] since, compared with other state-of-the-art methods, it demonstrated a better capacity to change facial attributes while keeping the subject identity information (see Fig. 2), which is a crucial factor for a biometric system.

Figure 2: Comparison of facial attribute editing results from state-of-the-art methods. Adapted from [7].

The AttGAN [7] is a deep model based on an encoder/decoder architecture. Compared with other facial attribute editing models, its main difference is an attribute classification constraint, which enforces the correct attribute manipulation in the generated images. Regarding the problem of information loss, the architecture includes reconstruction learning, used to preserve the remaining attribute details, i.e., to change only the required attribute. The model is trained using three learning components (sketched below): reconstruction, attribute classification, and adversarial learning. These components ensure the visual and reconstruction quality of the generated images along with the correct attribute manipulation. Due to all these features, and mainly its ability to reduce information loss, we chose the AttGAN network to perform the proposed attribute normalization. As the generative model receives as input an image and the attributes to be changed, we perform the attribute normalization by feeding the model with the images and requesting it to remove the eyeglasses and to correct the eye gaze.
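The combined generator objective can be sketched as follows (a simplified PyTorch-style sketch: the loss weights and the WGAN-style adversarial term are illustrative simplifications of the formulation in [7], not its exact values):

```python
import torch
import torch.nn.functional as F

def attgan_generator_loss(G_enc, G_dec, D_adv, D_cls, x, a_orig, a_target,
                          lambda_rec=100.0, lambda_cls=10.0):
    """AttGAN-style generator objective: reconstruct under the original
    attributes, and make the edited image both realistic and correctly
    classified with the target attributes."""
    z = G_enc(x)
    x_rec = G_dec(z, a_orig)      # decode with the original attributes
    x_edit = G_dec(z, a_target)   # decode with the edited attributes

    loss_rec = F.l1_loss(x_rec, x)                    # preserve identity/details
    loss_cls = F.binary_cross_entropy_with_logits(    # enforce the requested edit
        D_cls(x_edit), a_target)
    loss_adv = -D_adv(x_edit).mean()                  # WGAN-style realism term

    return lambda_rec * loss_rec + lambda_cls * loss_cls + loss_adv
```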

The AttGAN can handle multiple attribute editing, i.e., it can change more than one attribute with a single model. However, as we had to use a different dataset for each attribute normalization in our experiments, we trained two models, one per attribute. We validate the proposed normalization by comparing the results of biometric systems based on handcrafted features and on deep learning approaches, using the original and the normalized images, as sketched below.
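In a pairwise comparison, the proposed preprocessing therefore reduces to editing both images toward the same attribute state before matching (the `edit` interface below is hypothetical and stands for a trained AttGAN forward pass):

```python
def normalize_pair(edit, img_a, img_b, target_attrs):
    """Apply the trained generative editor to both images of a pairwise
    comparison so they share the same attribute state before matching."""
    return edit(img_a, target_attrs), edit(img_b, target_attrs)

# Usage sketch: score = match(*normalize_pair(attgan_edit, a, b, {"eyeglasses": 0}))
```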

4 Experiments

4.1 Datasets

We carried out the experiments using two datasets: the UFPR-Eyeglasses, collected for this work and used for the eyeglasses attribute normalization, and the UBIPr [19], used for the eye gaze normalization. Both datasets are detailed below.

4.1.1 UFPR-Eyeglasses

We collected a new challenging dataset to evaluate the effect of the occlusion caused by eyeglasses on periocular recognition, using images captured by mobile devices in real uncontrolled environments. The dataset contains periocular images (showing both eyes), all taken by the subjects themselves using their own smartphones, at visible wavelength, in distinct sessions. We manually annotated the iris bounding box of each image and used these annotations to normalize the images with respect to rotation and scale, and also to crop the periocular region of each eye to a fixed size. The within-class variations are mainly caused by differences in illumination, occlusion, distance, reflections, eyeglasses, and image quality. The UFPR-Eyeglasses dataset (images and annotations) is available to the research community, upon request, at https://web.inf.ufpr.br/vri/databases/ufpr-eyeglasses/.
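A sketch of how such annotation-driven normalization can be implemented (our own illustration: the crop size and the target inter-iris distance are placeholder values, not the ones used to build the dataset):

```python
import cv2
import numpy as np

def normalize_periocular(img, iris_box_l, iris_box_r, crop=128, eye_dist=200.0):
    """Rotate/scale the image so both iris centers lie on a horizontal line
    at a fixed distance apart, then crop one periocular region per eye.
    Boxes are (x, y, w, h) in pixels."""
    center = lambda b: np.array([b[0] + b[2] / 2.0, b[1] + b[3] / 2.0])
    cl, cr = center(iris_box_l), center(iris_box_r)
    dx, dy = cr - cl
    angle = np.degrees(np.arctan2(dy, dx))   # rotation that levels the eyes
    scale = eye_dist / np.hypot(dx, dy)      # scale to the target iris distance
    mid = (cl + cr) / 2.0
    M = cv2.getRotationMatrix2D((float(mid[0]), float(mid[1])), angle, scale)
    aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

    def crop_eye(c):
        cx, cy = (M[:, :2] @ c + M[:, 2]).astype(int)  # center after the affine
        half = crop // 2
        return aligned[max(0, cy - half):cy + half, max(0, cx - half):cx + half]

    return crop_eye(cl), crop_eye(cr)
```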

4.1.2 UBIPr

The UBIPr dataset [19] is composed of ocular images captured in an uncontrolled environment by a Canon EOS 5D camera at visible wavelength. The main challenges of this dataset are the several variability factors in the images, such as different distances, scales, occlusions, poses, eye gazes, and eyeglasses. Unlike the UFPR-Eyeglasses, this dataset does not contain images from the same subject with and without eyeglasses. Instead, it contains images from the same subject with frontal and deviated gazes. Thus, we used this dataset to evaluate the eye gaze normalization.

4.2 Baseline methods

We evaluated the proposed ocular normalization scheme using handcrafted features [20, 1] and deep representations based on approaches that recently reported state-of-the-art performance in periocular and iris recognition [15, 28]. These methods are detailed below.

4.2.1 Handcrafted features approaches

For the evaluation of the handcrafted features-based methods, we employed three approaches. The first is one of the earliest periocular recognition methods in the literature, proposed by Park et al. [20], which combines Local Binary Patterns (LBP) [16, 17], Histogram of Oriented Gradients (HOG) [4], and Scale-Invariant Feature Transform (SIFT) [14] features. The second is the winning approach of the MICHE-II contest [5, 1]. This method also includes an iris recognition scheme, but in our experiments we used only its periocular recognition module, which relies on Multi-Block Transitional Local Binary Patterns (MB-TLBP) features [1]. Finally, we combined the following features through score-level fusion: LBP, Local Phase Quantization (LPQ) [18], HOG, and SIFT. All features were extracted from a grayscale representation of the images (the intensity channel). The normalized LBP and LPQ features were extracted from fixed-size patches cropped from each image, and the features of all patches were then concatenated into one feature vector per descriptor. The HOG features were extracted from the entire image, producing a single feature vector.
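As an illustration of this kind of multi-descriptor pipeline, the sketch below computes patch-wise LBP and global HOG descriptors and fuses their cosine similarities at score level (the patch grid, histogram binning, and averaging fusion are our assumptions; LPQ and SIFT matching are omitted for brevity, and both images are assumed to share the same size):

```python
import cv2
import numpy as np
from skimage.feature import hog, local_binary_pattern

def patch_lbp(gray, grid=(4, 4), n_points=8, radius=1):
    """LBP histograms computed per patch and concatenated into one vector."""
    h, w = gray.shape
    ph, pw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            patch = gray[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            codes = local_binary_pattern(patch, n_points, radius, method="uniform")
            hist, _ = np.histogram(codes, bins=n_points + 2,
                                   range=(0, n_points + 2), density=True)
            feats.append(hist)
    return np.concatenate(feats)

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fused_score(img_a, img_b):
    """Score-level fusion: average the per-descriptor cosine similarities."""
    ga, gb = (cv2.cvtColor(x, cv2.COLOR_BGR2GRAY) for x in (img_a, img_b))
    scores = [
        cosine_score(patch_lbp(ga), patch_lbp(gb)),
        cosine_score(hog(ga), hog(gb)),
    ]
    return float(np.mean(scores))
```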

4.2.2 Deep learning based approaches

Recent works reported promising results in the development of biometric systems based on deep representations of the periocular region [15, 22, 8, 27, 9]. These approaches generally consist of a CNN model with a softmax layer at the top, trained using the cross-entropy loss function. After the training stage, the softmax layer is removed and the deep representations are extracted from the new last layer. To evaluate the attribute normalization with this kind of model, we employed two state-of-the-art methods to extract deep representations [15, 27]. These methods are based on the VGG16 and ResNet50 architectures pre-trained for face recognition [2], and each generates a fixed-size feature vector per image. We report results from multiple runs (repetitions) of each model.
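A sketch of this train-then-truncate recipe in Keras (illustrative only: weights=None and the class count are placeholders, whereas the actual methods [15, 27] fine-tune VGG16/ResNet50 models pre-trained for face recognition [2]):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_classes = 100   # placeholder number of training identities

# Backbone + softmax head, trained for identity classification.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, pooling="avg", input_shape=(224, 224, 3))
head = layers.Dense(n_classes, activation="softmax")(backbone.output)
clf = models.Model(backbone.input, head)
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# clf.fit(train_images, train_ids, ...)   # supervised identity training stage

# After training, drop the softmax head and read the pooled features directly.
extractor = models.Model(clf.input, backbone.output)
feature = extractor.predict(np.zeros((1, 224, 224, 3)))  # one deep representation
```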

5 Results and Discussion

The first step of our proposed normalization strategy is training the AttGAN model for ocular attribute editing using periocular images. For the eyeglasses normalization (removal), we employed the entire UBIPr dataset in the training stage and then normalized all images from the UFPR-Eyeglasses dataset by removing the eyeglasses. For the eye gaze normalization, we trained the AttGAN using images from the first half of the subjects of the UBIPr dataset and normalized all images from the second half by correcting the eye gaze. The deep learning based approaches were trained using the first half of the subjects of both datasets. The second half of the subjects was used to evaluate and compare the handcrafted features and deep learning approaches using original and normalized images. Some qualitative results of the attribute normalization using the AttGAN model are shown in Fig. 3.

Figure 3: Examples of original and normalized images from the UFPR-Eyeglasses (eyeglasses removal) and UBIPr (eye gaze correction) datasets.

For the recognition performance evaluation, following the conclusions we previously drew about distance measures in ocular representations [15, 27], we chose the cosine distance metric to match both the deep learning based and the handcrafted approaches. For the SIFT feature matching, we used the ratio test proposed by Lowe [14].
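As an illustration, the two matching rules can be sketched as follows (cv2.SIFT_create requires OpenCV >= 4.4; the 0.75 ratio is Lowe's suggested threshold, assumed here):

```python
import cv2
import numpy as np

def cosine_distance(a, b):
    """Cosine distance used to match both deep and handcrafted vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def sift_ratio_matches(gray_a, gray_b, ratio=0.75):
    """Lowe's ratio test [14]: keep a match only when the best neighbor is
    clearly closer than the second best."""
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(gray_a, None)
    _, des_b = sift.detectAndCompute(gray_b, None)
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    return [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
```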

We started by generating pairwise comparisons considering only images with different attributes, i.e., eyeglasses/no-eyeglasses pairs in the UFPR-Eyeglasses dataset and pairs with different gazes in the UBIPr dataset. Using the second half of the subjects of each dataset, we applied the all-against-all protocol to generate the genuine and impostor pairs of each dataset.
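A sketch of this pair-generation protocol (the (subject_id, attribute_flag, image_path) sample layout is our assumption):

```python
from itertools import combinations

def cross_attribute_pairs(samples):
    """All-against-all pairs, keeping only those with different attribute
    states (e.g., eyeglasses vs. no-eyeglasses). Each sample is a tuple
    (subject_id, attribute_flag, image_path)."""
    genuine, impostor = [], []
    for a, b in combinations(samples, 2):
        if a[1] == b[1]:          # same attribute state: skip the pair
            continue
        (genuine if a[0] == b[0] else impostor).append((a[2], b[2]))
    return genuine, impostor
```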

Considering a verification task, we used the Decidability index and the Area Under the Curve (AUC) as evaluation metrics. The Decidability index measures how separated the genuine and impostor score distributions are. Since the proposed normalization aims to decrease the within-class variability, we considered the Decidability as the primary metric. The AUC summarizes the quality of the predictions across different decision thresholds. The results achieved with the proposed attribute normalization are shown in Table 1 for the UFPR-Eyeglasses and UBIPr datasets. Note that we compare the results of the methods using the original and the normalized images, in order to better assess the performance improvements provided by the solution described in this paper.
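Both metrics can be computed directly from the two score distributions. A minimal sketch (the Decidability follows the usual d' definition; roc_auc_score is from scikit-learn):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def decidability(genuine_scores, impostor_scores):
    """Decidability (d'): distance between the genuine and impostor score
    means in units of their pooled standard deviation."""
    g, i = np.asarray(genuine_scores), np.asarray(impostor_scores)
    return abs(g.mean() - i.mean()) / np.sqrt((g.var() + i.var()) / 2.0)

def auc(genuine_scores, impostor_scores):
    """AUC over all thresholds, with genuine pairs as the positive class."""
    labels = np.r_[np.ones(len(genuine_scores)), np.zeros(len(impostor_scores))]
    scores = np.r_[genuine_scores, impostor_scores]
    return roc_auc_score(labels, scores)
```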

Method - Features        Att. Normalization   AUC (%)   Decidability
Ahmed et al. [1]         -                      /           /
Ahmed et al. [1]         Proposed               /           /
Park et al. [20]         -                      /           /
Park et al. [20]         Proposed               /           /
LBP + LPQ + HOG + SIFT   -                      /           /
LBP + LPQ + HOG + SIFT   Proposed               /           /
Luz et al. [15]          -                      /           /
Luz et al. [15]          Proposed               /           /
Zanlorensi et al. [27]   -                      /           /
Zanlorensi et al. [27]   Proposed               /           /
Table 1: Comparison of results (AUC and Decidability, reported as UFPR-Eyeglasses / UBIPr) using original and normalized images.
Figure 4: Comparison of genuine scores from original and normalized images. Higher scores mean that a periocular image pair is more likely to be genuine.

The results show that the proposed normalization preprocessing consistently improves the verification results on the UFPR-Eyeglasses dataset, increasing the Decidability both for the features from the method proposed by Park et al. [20] and for the proposed handcrafted feature fusion. Using the deep learning based approaches, the attribute normalization also improved the Decidability for the methods proposed by Luz et al. [15] and Zanlorensi et al. [27]. Unlike the experiments performed on the UFPR-Eyeglasses dataset, on the UBIPr the attribute normalization consists of eye gaze correction. Since this correction affects only a small portion of the periocular image (the eyeball region), we can observe that, in general, its impact is smaller than that obtained on the UFPR-Eyeglasses images. Nevertheless, the highest Decidability index on the UBIPr dataset, for both handcrafted features and deep learning based models, was achieved using the normalized images.

Fig. 4 shows some qualitative results in which wrong genuine matches between original images were corrected by the proposed attribute normalization. One can also observe that, in the UFPR-Eyeglasses dataset, even when the eyeglasses were not entirely removed, the generative model was able to smooth them, such that the biometric system correctly classified the pair as genuine. Investigating other wrong genuine matches, we observed that pose and illumination are among the most significant factors increasing the within-class variability in the UBIPr dataset.

6 Conclusion

This paper proposed an attribute normalization scheme that can be used as a preprocessing step to reduce the within-class variability in unconstrained periocular recognition. The idea is to use a state-of-the-art generative model to normalize specific factors of all samples before they are fed to the recognition algorithm. Since our solution is fully agnostic to the recognition method used, our proof of concept was conducted on two datasets with five different baseline methods, comparing the performance attained by the recognition methods when using the raw data and when receiving the images preprocessed by our solution. The observed results corroborate our hypothesis that the proposed attribute normalization is highly effective in reducing within-class variability without compromising the discriminability between classes, which is the root of the observed improvements in performance.

Acknowledgment: This work was supported by grants from the National Council for Scientific and Technological Development (CNPq) (#313423/2017-2 and #428333/2016-8) and the Coordination for the Improvement of Higher Education Personnel (CAPES), Brazilian funding agencies. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. The second author's work is funded by FCT/MEC through national funds and co-funded by FEDER - PT2020 partnership agreement under the projects UID/EEA/50008/2019 and POCI-01-0247-FEDER-033395.

References

  • [1] N. U. Ahmed, S. Cvetkovic, E. H. Siddiqi, A. Nikiforov, and I. Nikiforov (2017) Combining iris and periocular biometric for matching visible spectrum eye images. Pattern Recognition Letters 91, pp. 11–16.
  • [2] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman (2017) VGGFace2: a dataset for recognising faces across pose and age. arXiv preprint arXiv:1710.08092.
  • [3] Y. Choi et al. (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In CVPR.
  • [4] N. Dalal and B. Triggs (2005) Histograms of oriented gradients for human detection. In CVPR, Vol. 1, pp. 886–893.
  • [5] M. De Marsico and H. Proença (2017) Results from MICHE II - Mobile Iris CHallenge Evaluation II. Pattern Recognition Letters 91, pp. 3–10.
  • [6] I. Goodfellow et al. (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680.
  • [7] Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen (2019) AttGAN: facial attribute editing by only changing what you want. IEEE Transactions on Image Processing 28 (11), pp. 5464–5478.
  • [8] K. Hernandez-Diaz, F. Alonso-Fernandez, and J. Bigun (2018) Periocular recognition using CNN features off-the-shelf. In BIOSIG, pp. 1–5.
  • [9] K. Hernandez-Diaz, F. Alonso-Fernandez, and J. Bigun (2019) Cross spectral periocular matching using ResNet features. In ICB, pp. 1–6.
  • [10] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In CVPR.
  • [11] D. P. Kingma and M. Welling (2014) Auto-encoding variational Bayes. In International Conference on Learning Representations.
  • [12] G. Lample et al. (2017) Fader networks: manipulating images by sliding attributes. In Advances in Neural Information Processing Systems, pp. 5967–5976.
  • [13] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther (2016) Autoencoding beyond pixels using a learned similarity metric. In ICML, Vol. 48, pp. 1558–1566.
  • [14] D. G. Lowe (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, pp. 91–110.
  • [15] E. Luz, G. Moreira, L. A. Zanlorensi Junior, and D. Menotti (2018) Deep periocular representation aiming video surveillance. Pattern Recognition Letters 114, pp. 2–12.
  • [16] T. Ojala, M. Pietikäinen, and D. Harwood (1994) Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In ICPR, Vol. 1, pp. 582–585.
  • [17] T. Ojala, M. Pietikäinen, and D. Harwood (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recognition 29 (1), pp. 51–59.
  • [18] V. Ojansivu and J. Heikkilä (2008) Blur insensitive texture classification using local phase quantization. In International Conference on Image and Signal Processing, pp. 236–243.
  • [19] C. N. Padole and H. Proença (2012) Periocular recognition: analysis of performance degradation factors. In International Conference on Biometrics (ICB), pp. 439–445.
  • [20] U. Park, R. R. Jillela, A. Ross, and A. K. Jain (2011) Periocular biometrics in the visible spectrum. IEEE Transactions on Information Forensics and Security 6 (1), pp. 96–106.
  • [21] G. Perarnau, J. van de Weijer, B. Raducanu, and J. M. Álvarez (2016) Invertible conditional GANs for image editing. In NIPS Workshop on Adversarial Training.
  • [22] H. Proença and J. C. Neves (2018) Deep-PRWIS: periocular recognition without the iris and sclera using deep learning frameworks. IEEE Transactions on Information Forensics and Security 13 (4), pp. 888–896.
  • [23] A. F. Sequeira et al. (2017) Cross-Eyed 2017: cross-spectral iris/periocular recognition competition. In IJCB, pp. 725–732.
  • [24] W. Shen and R. Liu (2017) Learning residual images for face attribute manipulation. In CVPR.
  • [25] T. Xiao, J. Hong, and J. Ma (2018) ELEGANT: exchanging latent encodings with GAN for transferring multiple face attributes. In ECCV.
  • [26] L. A. Zanlorensi, R. Laroca, E. Luz, A. S. Britto Jr., L. S. Oliveira, and D. Menotti (2019) Ocular recognition databases and competitions: a survey. arXiv preprint arXiv:1911.09646, pp. 1–20.
  • [27] L. A. Zanlorensi, D. R. Lucio, A. S. Britto Jr., H. Proença, and D. Menotti (2019) Deep representations for cross-spectral ocular biometrics. IET Biometrics.
  • [28] L. A. Zanlorensi, E. Luz, R. Laroca, A. S. Britto Jr., L. S. Oliveira, and D. Menotti (2018) The impact of preprocessing on deep representations for iris recognition on unconstrained environments. In Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 289–296.
  • [29] G. Zhang, M. Kan, S. Shan, and X. Chen (2018) Generative adversarial network with spatial attention for face attribute editing. In ECCV.
  • [30] Z. Zhao and A. Kumar (2018) Improving periocular recognition by explicit attention to critical regions in deep neural network. IEEE Transactions on Information Forensics and Security 13 (12), pp. 2937–2952.
  • [31] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.