Multi-view Deep Features for Robust Facial Kinship Verification

Oualid Laiadi, et al. · 06/01/2020

Automatic kinship verification from facial images is an emerging research topic in the machine learning community. In this paper, we propose an effective facial feature extraction model based on multi-view deep features. We use four pre-trained deep learning models through eight feature layers (the FC6 and FC7 layers of each of the VGG-F, VGG-M, VGG-S and VGG-Face models) to train the proposed Multilinear Side-Information based Discriminant Analysis integrating Within Class Covariance Normalization (MSIDA+WCCN) method. Furthermore, we show how integrating WCCN into metric learning methods improves on the Simple Scoring Cosine similarity (SSC) method, which we used in the RFIW'20 competition with the concatenation of the eight deep features. The integration of WCCN into the metric learning methods reduces the effect of intra-class variations introduced by the deep feature weights. We evaluate our proposed method on two kinship benchmarks, the KinFaceW-I and KinFaceW-II databases, using four parent-child relations (Father-Son, Father-Daughter, Mother-Son and Mother-Daughter). The proposed MSIDA+WCCN method improves the SSC method by about 12.80% and 14.65% on the KinFaceW-I and KinFaceW-II databases, respectively. The obtained results compare favorably with recent methods, including those based on deep learning.


I Introduction

The basic idea of automatic kinship verification from facial images is to check whether two given facial images belong to members of the same family or not. Several applications can benefit from automatic kinship verification, e.g. forensics, finding missing children, social media understanding and image annotation. Although a DNA test is the most reliable means of kinship verification, it unfortunately cannot be used in many situations, such as video surveillance.

Many authors feed their methods with different or multiple features (multi-view data) to represent facial images for kinship verification. Lu et al. used the Multiview Neighborhood Repulsed Metric Learning (MNRML) method [16] to train four multi-view features: Local Binary Patterns (LBP), the Learning-based descriptor (LE), the Scale-Invariant Feature Transform (SIFT) and Three-Patch LBP (TPLBP). Yan et al. [26] employed three different feature descriptors, LBP, Spatial Pyramid Learning (SPLE) and SIFT, to extract different and complementary information from each face image through the DMML method. Yan et al. [27] applied the same three descriptors (LBP, SPLE and SIFT) to train the MPDFL method. Lu et al. [15] used four features, LBP, Dense SIFT (DSIFT), the Histogram of Oriented Gradients (HOG) and LPQ, to train the DDMML method. Hu et al. [9] used MvDML to train four multi-view features: LBP, LE, SIFT and TPLBP. Laiadi et al. [14] used three features, LPQ, BSIF and CoALBP, to train the SIEDA method. Dornaika et al. [7] used MNRML to train two features, the FC7 layers of VGG-F and VGG-Face, for kinship verification. Laiadi et al. [13] proposed the TXQDA method to train LPQ and BSIF features over ten scales.

In this work, we propose a new framework for kinship verification from facial images using eight deep features drawn from four pre-trained deep learning networks. To this end, we extract the FC6 and FC7 layers of the VGG-F, VGG-M, VGG-S and VGG-Face models to train the proposed Multilinear Side-Information based Discriminant Analysis integrating Within Class Covariance Normalization (MSIDA+WCCN) method. We report our preliminary experimental investigations on the KinFaceW-I and KinFaceW-II benchmarks over the four relation subsets (Father-Son, Father-Daughter, Mother-Son and Mother-Daughter), showing very high performance compared to state-of-the-art methods.

II Proposed Framework

Figure 1 depicts an overview of our proposed framework. The input is a pair of face images, e.g. a parent and a child. Each image of the input test pair is represented by eight deep features. We then compute the cosine similarity between the two facial representations to obtain the final matching score. The similarity scores are used to plot the ROC curve for performance evaluation.

Fig. 1: Block diagram of the proposed face pair matching system.

II-A Extracting Multi-view Deep Features

Many methods suggested in the literature on automatic kinship verification have focused mainly on analyzing deep features trained on facial images (i.e. VGG-Face), thus ignoring deep features trained on object images (i.e. VGG-F, VGG-M and VGG-S). Recently, deep facial features have shown greater performance than their shallow counterparts for verifying kinship relations (e.g. [30]). When considering deep facial information, the problem usually consists in learning a discriminative metric under which the classification task (kinship verification in our case) becomes easier, and this task is further eased when the facial features are combined with object deep features. As suggested in [7], we therefore also consider object deep features for kinship verification. Facial and object features exhibit a complementarity that is exploited by the MSIDA+WCCN method, in contrast to using facial or object deep feature information separately. We thus extract the facial deep features with the VGG-Face model [18] and the object deep features [4] with the VGG-F, VGG-M and VGG-S models, and fuse them through the MSIDA+WCCN method. Figure 2 depicts the multi-view deep feature extraction (Mv-VGG) and tensor design. In this figure, the different block colors represent the different architecture types. Among the object models, VGG-Fast (VGG-F) has fewer parameters than VGG-Medium (VGG-M), which in turn has fewer parameters than VGG-Slow (VGG-S); these three architectures were trained on the ILSVRC-2012 object recognition database. For the deep face features, the VGG-Face architecture was trained on the VGG Face database [18], which contains 2.6M facial images of 2,622 identities. Furthermore, in the tensor representation, all data stacked along a tensor mode must have the same length; this property is satisfied here since each of the eight used fully connected layers (the FC6 and FC7 layers of the four pre-trained models) has 4,096 neurons.

Fig. 2: Multi-view deep features extraction (Mv-VGG) and tensor design.
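To make the tensor design concrete, the following sketch (a minimal illustration, not the authors' code; the extract_fc helper and its model/layer names are hypothetical placeholders) stacks the eight 4096-dimensional FC6/FC7 activations into a third-order tensor of shape (samples, 4096, 8):

```python
# Sketch of the multi-view tensor design: the eight 4096-d deep features
# (FC6 and FC7 of VGG-F, VGG-M, VGG-S and VGG-Face) are stacked as the
# views of a third-order tensor (samples x 4096 x 8). The extract_fc()
# helper is a stand-in for any framework exposing the pre-trained models'
# fully connected activations.
import numpy as np

N_VIEWS = 8          # FC6 + FC7 for each of the four pre-trained models
FEATURE_DIM = 4096   # all eight fully connected layers share this length

def extract_fc(model_name, layer_name, images):
    """Hypothetical feature extractor: returns an (n_images, 4096) array
    of activations of `layer_name` in the pre-trained `model_name`."""
    rng = np.random.default_rng(0)          # placeholder for real activations
    return rng.standard_normal((len(images), FEATURE_DIM))

def build_multiview_tensor(images):
    views = []
    for model in ("vgg-f", "vgg-m", "vgg-s", "vgg-face"):
        for layer in ("fc6", "fc7"):
            views.append(extract_fc(model, layer, images))
    # Stack the eight views along the last mode: (n_images, 4096, 8).
    return np.stack(views, axis=-1)

tensor = build_multiview_tensor(images=list(range(10)))
print(tensor.shape)  # (10, 4096, 8)
```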

III Multilinear Side-Information based Discriminant Analysis integrating Within Class Covariance Normalization (MSIDA+WCCN)

III-A Side-Information based Linear Discriminant Analysis (SILD)

The positive-class image pairs are used directly to compute the within-class scatter matrix, and the negative-class image pairs to compute the between-class scatter matrix. Let $S = \{(x_{i1}, x_{i2}) \mid l(x_{i1}) = l(x_{i2})\}$ denote the collection of positive-class image pairs and $D = \{(x_{j1}, x_{j2}) \mid l(x_{j1}) \neq l(x_{j2})\}$ the collection of negative-class image pairs, where $l(x)$ is the class label of image $x$. The within-class and between-class scatter matrices of the Side-Information based Linear Discriminant Analysis (SILD) method [17] can then be written as:

$$S_w = \sum_{i=1}^{|S|} (x_{i1} - x_{i2})(x_{i1} - x_{i2})^T \qquad (3)$$

$$S_b = \sum_{j=1}^{|D|} (x_{j1} - x_{j2})(x_{j1} - x_{j2})^T \qquad (4)$$

The objective function of SILD is:

$$W^* = \arg\max_{W} \frac{\left|W^T S_b W\right|}{\left|W^T S_w W\right|} \qquad (5)$$

The problem in (5) can be solved by a two-step method [23]. Firstly, $S_w$ is diagonalized:

$$H^T S_w H = \Lambda \qquad (6)$$

$$\tilde{S}_b = \Lambda^{-1/2} H^T S_b H \Lambda^{-1/2} \qquad (7)$$

Secondly, $\tilde{S}_b$ is also diagonalized:

$$Z^T \tilde{S}_b Z = E \qquad (8)$$

Finally, the projection matrix can be computed as:

$$W = H \Lambda^{-1/2} Z \qquad (9)$$

where $H$ and $Z$ are orthogonal matrices and $\Lambda$ and $E$ are diagonal matrices. Equivalently, a solution to the optimization problem in (5) is obtained by solving the generalized eigenvalue problem $S_b w = \lambda S_w w$. The projection matrix of SILD is formed by the first $d$ eigenvectors in (9), ordered by decreasing eigenvalue.
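A minimal NumPy sketch of SILD under the above notation, assuming pair arrays of shape (n_pairs, 2, d); it solves the generalized eigenvalue problem directly rather than via the explicit two-step diagonalization, and the small ridge on $S_w$ is our own regularization choice:

```python
# Minimal sketch of SILD: S_w is built from positive (kin) pairs, S_b from
# negative (non-kin) pairs, and W collects the top generalized eigenvectors
# of the pair (S_b, S_w). Feature dimension and data are illustrative.
import numpy as np
from scipy.linalg import eigh

def sild(pos_pairs, neg_pairs, n_components):
    """pos_pairs, neg_pairs: arrays of shape (n_pairs, 2, d)."""
    d = pos_pairs.shape[-1]
    Sw = np.zeros((d, d))
    for x1, x2 in pos_pairs:                 # within-class scatter, Eq. (3)
        diff = (x1 - x2)[:, None]
        Sw += diff @ diff.T
    Sb = np.zeros((d, d))
    for x1, x2 in neg_pairs:                 # between-class scatter, Eq. (4)
        diff = (x1 - x2)[:, None]
        Sb += diff @ diff.T
    # Generalized eigenproblem Sb w = lambda Sw w; eigh returns ascending
    # eigenvalues, so take the last n_components columns. The small ridge
    # keeps Sw positive definite.
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
    return vecs[:, ::-1][:, :n_components]   # projection matrix W, Eq. (9)

rng = np.random.default_rng(1)
W = sild(rng.standard_normal((50, 2, 32)), rng.standard_normal((50, 2, 32)), 8)
print(W.shape)  # (32, 8)
```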

III-B Proposed Multilinear Side-Information based Discriminant Analysis integrating Within Class Covariance Normalization

Let $\{(\check{\mathcal{X}}_i, \hat{\mathcal{X}}_i)\}$ be a tensor training set, where $\check{\mathcal{X}}_i \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ are the parent samples and $\hat{\mathcal{X}}_i$ the children samples. The goal of MSIDA [2] is the calculation of $N$ projection matrices $U^{(k)}$, $k = 1, \dots, N$; thus, we calculate one projection matrix for each tensor mode. The objective function of the MSIDA method is defined, for each mode $k$, as:

$$U^{(k)*} = \arg\max_{U^{(k)}} \frac{\left|U^{(k)T} S_b^{(k)} U^{(k)}\right|}{\left|U^{(k)T} S_w^{(k)} U^{(k)}\right|} \qquad (10)$$

We calculate the two scatter matrices $S_w^{(k)}$ and $S_b^{(k)}$ for each mode by:

$$S_w^{(k)} = \sum_{p=1}^{\prod_{o \neq k} I_o} S_w^{(k),p}, \qquad S_w^{(k),p} = \sum_{i=1}^{C_1} \left((\check{\xi}_i^1)^{k,p} - (\hat{\xi}_i^1)^{k,p}\right)\left((\check{\xi}_i^1)^{k,p} - (\hat{\xi}_i^1)^{k,p}\right)^T$$

$$S_b^{(k)} = \sum_{p=1}^{\prod_{o \neq k} I_o} S_b^{(k),p}, \qquad S_b^{(k),p} = \sum_{i=1}^{C_0} \left((\check{\xi}_i^0)^{k,p} - (\hat{\xi}_i^0)^{k,p}\right)\left((\check{\xi}_i^0)^{k,p} - (\hat{\xi}_i^0)^{k,p}\right)^T$$

where $C_1$ and $C_0$ are the numbers of positive and negative pairs, and $\check{\xi}_i$ and $\hat{\xi}_i$ denote parent and child samples projected on all modes other than $k$.

With the solution for a single mode known, the optimization problem in Equation (10) can be solved iteratively. The projection matrices are first initialized to identity. At each iteration, $U^{(1)}, \dots, U^{(k-1)}, U^{(k+1)}, \dots, U^{(N)}$ are assumed known and $U^{(k)}$ is estimated: the training samples are projected on all modes other than $k$ and substituted into Equation (10), whose solution is then given by the generalized eigenvalue decomposition:

$$S_b^{(k)} U^{(k)} = S_w^{(k)} U^{(k)} \Lambda \qquad (11)$$

where $U^{(k)}$ is the eigenvector matrix and $\Lambda$ the eigenvalue matrix.

The iterative process of MSIDA terminates when one of the following conditions is met: i) the number of iterations reaches a predefined maximum; or ii) the difference between the estimated projections of two consecutive iterations falls below a threshold, where the threshold is scaled by $I_k$, the dimension of mode $k$. As depicted in Fig. 1, the block diagram of the proposed approach consists of three essential components: feature extraction, tensor subspace transformation and comparison. In this work we focus on the tensor subspace transformation and on feature extraction based on multi-view deep features.
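The alternating scheme can be sketched as follows for the two-mode (matrix-data) case. This is a simplified illustration under our own assumptions about initialization, unfolding and dimensions, not the authors' implementation; for brevity it uses a fixed iteration count instead of the convergence test above:

```python
# Schematic of the alternating MSIDA optimization: each mode-k projection
# is updated in turn from the mode-k scatter of projected pair differences,
# with the other projection held fixed (two-mode sketch only).
import numpy as np
from scipy.linalg import eigh

def mode_k_scatter(parents, children, U_other, k):
    """Scatter of projected pair differences along mode k (k in {0, 1})."""
    d = parents.shape[1 + k]
    S = np.zeros((d, d))
    for P, C in zip(parents, children):
        D = P - C
        # Project the *other* mode, then unfold along mode k.
        M = (U_other.T @ D) if k == 1 else (D @ U_other)
        M = M if k == 0 else M.T
        S += M @ M.T
    return S

def msida(pos_parents, pos_children, neg_parents, neg_children,
          dims=(8, 4), n_iters=5):
    I1, I2 = pos_parents.shape[1:]
    U = [np.eye(I1)[:, :dims[0]], np.eye(I2)[:, :dims[1]]]  # identity init
    for _ in range(n_iters):
        for k in (0, 1):                       # update one mode at a time
            other = U[1 - k]
            Sw = mode_k_scatter(pos_parents, pos_children, other, k)
            Sb = mode_k_scatter(neg_parents, neg_children, other, k)
            vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(len(Sw)))
            U[k] = vecs[:, ::-1][:, :dims[k]]  # Eq. (11), top eigenvectors
    return U

rng = np.random.default_rng(2)
shape = (40, 16, 8)                            # 40 pairs of 16x8 feature maps
U1, U2 = msida(rng.standard_normal(shape), rng.standard_normal(shape),
               rng.standard_normal(shape), rng.standard_normal(shape))
print(U1.shape, U2.shape)  # (16, 8) (8, 4)
```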

III-C Within-Class Covariance Normalization

Within-Class Covariance Normalization (WCCN) was first used in the speaker recognition community, where Dehak et al. [5] found it to be the best technique for projecting the reduced vectors of the LDA method onto a new subspace defined by the square root of the inverse of the within-class covariance matrix. We propose a new variant of MSIDA integrating WCCN. For each mode $k$, the within-class covariance of the MSIDA-projected positive pairs is:

$$G^{(k)} = \sum_{p=1}^{\prod_{o \neq k} I_o} G^{(k),p}, \qquad G^{(k),p} = \sum_{i=1}^{C_1} \left(W^{(k)T}(\check{\xi}_i^1)^{k,p} - W^{(k)T}(\hat{\xi}_i^1)^{k,p}\right)\left(W^{(k)T}(\check{\xi}_i^1)^{k,p} - W^{(k)T}(\hat{\xi}_i^1)^{k,p}\right)^T$$

where $W^{(k)}$ is the MSIDA projection matrix found in Eq. (11). The WCCN projection matrix $B^{(k)}$ is obtained by the Cholesky decomposition [11, 28] of the inverse of $G^{(k)}$, i.e. $G^{(k)-1} = B^{(k)} B^{(k)T}$, and the new projection matrix is obtained as $A^{(k)} = W^{(k)} B^{(k)}$. By imposing upper bounds on the classification error metric [1], WCCN reduces the effect of within-class variations by minimizing the expected classification error on the training set.
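A short sketch of the WCCN step under the above definitions: the within-class covariance G of the projected positive-pair differences is estimated, and the Cholesky factor B of its inverse gives the normalization; the ridge term and data shapes are illustrative assumptions:

```python
# Hedged sketch of WCCN: estimate the within-class covariance G of
# MSIDA-projected positive-pair differences, then factor G^{-1} = B B^T.
import numpy as np

def wccn(projected_parents, projected_children):
    """Inputs: (n_pos_pairs, d) arrays of MSIDA-projected features."""
    diffs = projected_parents - projected_children
    G = (diffs.T @ diffs) / len(diffs)          # within-class covariance
    G += 1e-6 * np.eye(G.shape[0])              # regularize for inversion
    B = np.linalg.cholesky(np.linalg.inv(G))    # G^{-1} = B @ B.T
    return B

rng = np.random.default_rng(3)
parents, children = rng.standard_normal((2, 100, 32))
B = wccn(parents, children)
# Compose with an (assumed) MSIDA projection W to get the final mapping
# A = W @ B into the normalized subspace.
print(B.shape)  # (32, 32)
```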

IV Experimental Analysis

For the experimental evaluation, we considered the KinFaceW-I and KinFaceW-II databases, which were gathered from the Internet and include some public figures together with their parents and/or children. In the KinFaceW-I dataset, there are 156, 134, 116, and 127 pairs for the F-S, F-D, M-S, and M-D relations, respectively. In the KinFaceW-II dataset, each kin relation contains 250 pairs. In total, KinFaceW-I contains 1,066 face images and KinFaceW-II contains 2,000.

Method             |         KinFaceW-I                |        KinFaceW-II
                   | F-S    F-D    M-S    M-D    Mean  | F-S    F-D    M-S    M-D    Mean
-------------------+-----------------------------------+----------------------------------
MNRML [16]         | 72.50  66.50  66.20  72.00  69.90 | 76.90  74.30  77.40  77.60  76.50
DMML [26]          | 74.50  69.50  69.50  75.50  72.25 | 78.50  76.50  78.50  79.50  78.25
MPDFL [27]         | 73.50  67.50  66.10  73.10  70.10 | 77.30  74.70  77.80  78.00  77.00
MMTL [19]          | N.A    N.A    N.A    N.A    73.70 | N.A    N.A    N.A    N.A    77.20
DDMML [15]         | 86.40  79.10  81.40  87.00  83.50 | 87.40  83.80  83.20  83.00  84.30
NRCML [25]         | 66.10  61.10  66.90  73.00  66.30 | 79.80  76.10  79.80  80.00  78.70
MKSM [29]          | 83.65  81.35  79.69  81.16  81.46 | 83.80  81.20  82.40  82.40  82.45
MvDML [9]          | N.A    N.A    N.A    N.A    N.A   | 80.40  79.80  78.80  81.80  80.20
Deep+Shallow [3]   | 68.80  68.80  70.50  65.50  68.40 | 66.50  68.80  65.40  65.40  66.50
L2M3L [10]         | N.A    N.A    N.A    N.A    N.A   | 82.40  78.20  78.80  80.40  80.00
ResNet + CF [21]   | 78.00  83.70  87.00  80.80  82.40 | 87.70  86.00  86.70  87.40  86.60
RDML [6]           | 76.20  74.20  76.90  82.20  77.30 | 79.30  72.30  77.40  78.30  76.80
MNRML+SVM [7]      | 85.90  79.85  86.20  86.62  84.55 | 87.20  82.60  88.40  89.40  86.90
SILD+WCCN/LR [12]  | N.A    N.A    N.A    N.A    N.A   | 88.40  84.20  85.80  86.40  86.20
KML [30]           | N.A    N.A    N.A    N.A    82.80 | N.A    N.A    N.A    N.A    85.70
MvGMML [8]         | 69.25  73.12  69.40  72.76  71.13 | 70.40  73.40  65.80  69.20  69.70
SSC                | 71.57  70.83  77.12  79.88  74.85 | 72.80  69.00  73.80  73.80  72.35
SILD               | 73.75  71.25  76.25  77.49  74.69 | 72.80  69.20  74.00  74.00  72.50
MSIDA              | 73.00  72.96  78.41  77.91  75.57 | 75.00  69.40  75.80  74.40  73.95
SILD+WCCN          | 75.72  72.39  79.80  80.74  77.16 | 77.40  75.60  75.80  78.40  76.80
MSIDA+WCCN         | 85.98  85.93  90.05  88.62  87.65 | 89.40  82.80  87.80  88.00  87.00

TABLE I: Performance comparisons (%) with state-of-the-art methods on the KinFaceW-I and KinFaceW-II databases (N.A: result not available).
Fig. 7: ROC curves of the different methods (SSC, SILD, MSIDA, SILD+WCCN and MSIDA+WCCN) on the KinFaceW-I database, obtained on the (a) F-S, (b) F-D, (c) M-S and (d) M-D sets.
Fig. 12: ROC curves of the different methods (SSC, SILD, MSIDA, SILD+WCCN and MSIDA+WCCN) on the KinFaceW-II database, obtained on the (a) F-S, (b) F-D, (c) M-S and (d) M-D sets.

IV-A Experimental Setup

The numbers of positive and negative pairs used in the experiments are the same for each relation on the four subsets. We use a five-fold cross-validation strategy for the evaluation and report the mean accuracy over the five folds. The negative pairs and the folds are predefined for all four relations. For the facial and object deep features, we extracted the VGG-Face, VGG-F, VGG-M and VGG-S representations, as these have been shown to perform better than shallow methods [30, 7]. The tensor features are then projected by the proposed MSIDA+WCCN method.
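For illustration, a hedged sketch of the five-fold protocol; the fold generation and the threshold-selection rule shown here are placeholders for the predefined KinFaceW folds, and the scores and labels are synthetic:

```python
# Sketch of the five-fold protocol: each fold is held out once, a threshold
# on the cosine score is picked on the training folds, and the mean
# accuracy over the five test folds is reported.
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
scores = rng.uniform(size=250)                  # one cosine score per pair
labels = rng.integers(0, 2, size=250)           # 1 = kin, 0 = non-kin

accs = []
for train_idx, test_idx in KFold(n_splits=5).split(scores):
    # Choose the threshold maximizing accuracy on the training folds.
    cands = np.sort(scores[train_idx])
    thr = max(cands, key=lambda t: np.mean((scores[train_idx] >= t)
                                           == labels[train_idx]))
    accs.append(np.mean((scores[test_idx] >= thr) == labels[test_idx]))
print(f"mean accuracy: {np.mean(accs):.4f}")
```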

IV-B Results and Analysis

IV-B1 Results on the RFIW’20 Challenge

For the RFIW’20 Challenge [20, 24, 22, 21], we used the eight fully connected layers of the four pre-trained (facial and object) models. Specifically, we applied the Simple Scoring Cosine similarity (SSC) method: the eight deep features are concatenated to form one feature vector for each facial image of a pair, and the cosine similarity is computed between the two vectors. This method shows that the raw weights of deep features already perform well in kinship verification, acting as excellent facial image features without applying any learning method.
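A minimal sketch of this SSC scoring (the per-view L2 normalization is our assumption; the text specifies only concatenation followed by cosine similarity):

```python
# Minimal sketch of SSC scoring: concatenate the eight deep features of
# each image and score a pair by the cosine similarity of the two vectors.
import numpy as np

def ssc_score(parent_views, child_views):
    """Each input: list of eight (4096,) deep feature vectors."""
    def concat(views):
        # Normalizing each view first keeps all eight layers on a common
        # scale before concatenation (a plausible choice, not mandated
        # by the paper).
        return np.concatenate([v / np.linalg.norm(v) for v in views])
    p, c = concat(parent_views), concat(child_views)
    return float(p @ c / (np.linalg.norm(p) * np.linalg.norm(c)))

rng = np.random.default_rng(5)
pair = [[rng.standard_normal(4096) for _ in range(8)] for _ in range(2)]
print(ssc_score(*pair))
```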

IV-B2 Results on the KinFaceW databases

We ran the experiments on the four relations of the two databases, KinFaceW-I and KinFaceW-II, using the SSC, SILD, MSIDA, SILD+WCCN and MSIDA+WCCN methods. The results of these experiments are reported in Table I. The ROC curves comparing SSC, SILD, MSIDA, SILD+WCCN and MSIDA+WCCN are provided in Figures 7 and 12 for the four relations of the KinFaceW-I and KinFaceW-II databases, respectively. As can be seen from the figures, the performance of MSIDA+WCCN is much better than that of the other methods in all cases.

Our proposed method is compared against recent state-of-the-art methods in Table I. Note that some of these methods, such as MvDML, DDMML, ResNet+CF and MNRML+SVM, use combinations of different features to describe a face image, while others are based on deep learning. On the four relations of the KinFaceW-I and KinFaceW-II databases, our approach yields the best mean result over the four kinship subsets of each database. These results are promising and demonstrate that our proposed approach performs better than recent methods for kinship verification. Furthermore, MSIDA+WCCN and SILD+WCCN improve on their counterparts (i.e. MSIDA and SILD) by a large margin. For the linear (vector-based) methods, SILD+WCCN improves on the SILD method by about 2.47% and 4.30% on the KinFaceW-I and KinFaceW-II databases, respectively. Likewise, for the multilinear (tensor-based) methods, MSIDA+WCCN improves on the MSIDA method by about 12.08% and 13.05% on the KinFaceW-I and KinFaceW-II databases, respectively. Thus, the integration of WCCN yields stable and robust performance gains across metric learning methods for kinship verification.
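For concreteness, these margins follow directly from the Mean columns of Table I:

```latex
% Sanity check of the reported WCCN gains against the Mean columns of Table I.
\begin{align*}
\text{KinFaceW-I:}  \quad & 77.16 - 74.69 = 2.47  && \text{(SILD+WCCN vs. SILD)}\\
                          & 87.65 - 75.57 = 12.08 && \text{(MSIDA+WCCN vs. MSIDA)}\\
\text{KinFaceW-II:} \quad & 76.80 - 72.50 = 4.30  && \text{(SILD+WCCN vs. SILD)}\\
                          & 87.00 - 73.95 = 13.05 && \text{(MSIDA+WCCN vs. MSIDA)}
\end{align*}
```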

V Conclusion

In this paper, we presented an effective approach based on multi-view deep (facial and object) features for the problem of kinship verification. To obtain a low-dimensional and discriminative subspace, we proposed the MSIDA+WCCN method. We also studied the effect of WCCN on different metric learning methods, showing that the within-class variability introduced by the training data (multi-view deep features in our case) can be reduced to a large extent. The performance was thereby improved, and the metric learning methods learn better metrics through WCCN integration. The results obtained by the MSIDA+WCCN method outperform the state of the art on four parent-child relations of two databases, KinFaceW-I and KinFaceW-II.

References

  • [1] O. Barkan, J. Weill, L. Wolf, and H. Aronowitz. Fast high dimensional vector multiplication face recognition. In 2013 IEEE International Conference on Computer Vision, pages 1960–1967, Dec 2013.
  • [2] M. Bessaoudi, A. Ouamane, M. Belahcene, A. Chouchane, E. Boutellaa, and S. Bourennane. Multilinear side-information based discriminant analysis for face and kinship verification in the wild. Neurocomputing, 329:267 – 278, 2019.
  • [3] M. Bordallo Lopez, A. Hadid, E. Boutellaa, J. Goncalves, V. Kostakos, and S. Hosio. Kinship verification from facial images and videos: human versus machine. Machine Vision and Applications, 29(5):873–890, Jul 2018.
  • [4] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. CoRR, abs/1405.3531, 2014.
  • [5] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4):788–798, May 2011.
  • [6] Z. Ding, M. Shao, W. Hwang, S. Suh, J. Han, C. Choi, and Y. Fu. Robust discriminative metric learning for image representation. IEEE Transactions on Circuits and Systems for Video Technology, 29(11):3173–3183, Nov 2019.
  • [7] F. Dornaika, I. Arganda-Carreras, and O. Serradilla. Transfer learning and feature fusion for kinship verification. Neural Computing and Applications, Apr 2019.
  • [8] J. Hu, J. Lu, L. Liu, and J. Zhou. Multi-view geometric mean metric learning for kinship verification. In 2019 IEEE International Conference on Image Processing (ICIP), pages 1178–1182, Sep. 2019.
  • [9] J. Hu, J. Lu, and Y. Tan. Sharable and individual multi-view metric learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(9):2281–2288, Sep. 2018.
  • [10] J. Hu, J. Lu, Y. Tan, J. Yuan, and J. Zhou. Local large-margin multi-metric learning for face and kinship verification. IEEE Transactions on Circuits and Systems for Video Technology, 28(8):1875–1891, Aug 2018.
  • [11] R. L. Iman and W. J. Conover. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics - Simulation and Computation, 11(3):311–334, 1982.
  • [12] O. Laiadi, A. Ouamane, A. Benakcha, A. Taleb-Ahmed, and A. Hadid. Learning multi-view deep and shallow features through new discriminative subspace for bi-subject and tri-subject kinship verification. Applied Intelligence, 49(11):3894–3908, Nov 2019.
  • [13] O. Laiadi, A. Ouamane, A. Benakcha, A. Taleb-Ahmed, and A. Hadid. Tensor cross-view quadratic discriminant analysis for kinship verification in the wild. Neurocomputing, 2019.
  • [14] O. Laiadi, A. Ouamane, E. Boutellaa, A. Benakcha, A. Taleb-Ahmed, and A. Hadid. Kinship verification from face images in discriminative subspaces of color components. Multimedia Tools and Applications, 78(12):16465–16487, Jun 2019.
  • [15] J. Lu, J. Hu, and Y. P. Tan. Discriminative deep metric learning for face and kinship verification. IEEE Transactions on Image Processing, 26(9):4269–4282, Sept 2017.
  • [16] J. Lu, X. Zhou, Y.-P. Tan, Y. Shang, and J. Zhou. Neighborhood repulsed metric learning for kinship verification. IEEE Trans. Pattern Anal. Mach. Intell., 36(2):331–345, Feb. 2014.
  • [17] M. Kan, S. Shan, D. Xu, and X. Chen. Side-information based linear discriminant analysis for face recognition. In Proc. BMVC, pages 125.1–125.10, 2011. http://dx.doi.org/10.5244/C.25.125.
  • [18] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In British Machine Vision Conference, 2015.
  • [19] X. Qin, X. Tan, and S. Chen. Mixed bi-subject kinship verification via multi-view multi-task learning. Neurocomputing, 214:350 – 357, 2016.
  • [20] J. P. Robinson, M. Shao, Y. Wu, and Y. Fu. Families in the wild (fiw): Large-scale kinship image database and benchmarks. In Proceedings of the 2016 ACM on Multimedia Conference, pages 242–246. ACM, 2016.
  • [21] J. P. Robinson, M. Shao, Y. Wu, H. Liu, T. Gillis, and Y. Fu. Visual kinship recognition of families in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(11):2624–2637, Nov 2018.
  • [22] J. P. Robinson, M. Shao, H. Zhao, Y. Wu, T. Gillis, and Y. Fu. Recognizing families in the wild (RFIW): Data challenge workshop in conjunction with ACM MM 2017. In RFIW ’17: Proceedings of the 2017 Workshop on Recognizing Families In the Wild, pages 5–12, New York, NY, USA, 2017. ACM.
  • [23] D. L. Swets and J. J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):831–836, Aug 1996.
  • [24] S. Wang, J. P. Robinson, and Y. Fu. Kinship verification on families in the wild with marginalized denoising metric learning. In 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), 2017.
  • [25] H. Yan. Kinship verification using neighborhood repulsed correlation metric learning. Image and Vision Computing, 60:91–97, 2017.
  • [26] H. Yan, J. Lu, W. Deng, and X. Zhou. Discriminative multimetric learning for kinship verification. IEEE Transactions on Information Forensics and Security, 9(7):1169–1178, July 2014.
  • [27] H. Yan, J. Lu, and X. Zhou. Prototype-based discriminative feature learning for kinship verification. IEEE Transactions on Cybernetics, 45(11):2535–2545, Nov 2015.
  • [28] H. Yu, C. Y. Chung, K. P. Wong, H. W. Lee, and J. H. Zhang. Probabilistic load flow evaluation with hybrid latin hypercube sampling and cholesky decomposition. IEEE Transactions on Power Systems, 24(2):661–667, May 2009.
  • [29] Y.-G. Zhao, Z. Song, F. Zheng, and L. Shao. Learning a multiple kernel similarity metric for kinship verification. Information Sciences, 430-431:247 – 260, 2018.
  • [30] X. Zhou, K. Jin, M. Xu, and G. Guo. Learning deep compact similarity metric for kinship verification from face images. Information Fusion, 48:84 – 94, 2019.