Kinship Identification through Joint Learning Using Kinship Verification Ensemble

04/14/2020 ∙ by Wei Wang, et al. ∙ University of Amsterdam 1

Kinship verification, a well-explored task, only determines whether two people are kin. Kinship identification goes further and determines the particular type of kinship, but it has received far less attention. We found that a naive extension of kinship verification cannot solve identification properly, because existing verification networks are trained individually on specific kinship types and do not consider the context between different types. In addition, existing kinship verification datasets have a biased positive-negative distribution that differs from the real-world distribution. To address this, we propose a novel kinship identification approach based on the joint training of kinship verification ensembles and a Joint Identification Module. We also propose rebalancing the training dataset to make it realistic. Rigorous experiments demonstrate superior performance on the kinship identification task, as well as a significant improvement in kinship verification performance when trained on the same unbiased data.




1 Introduction

Kinship is the relationship between people who are biologically related with overlapping genes [19, 20], such as parent-children, sibling-sibling, and grandparent-grandchildren [1, 22, 24, 29] etc.

In the computer vision community, image-based kinship verification and identification enable a variety of applications: searching for missing children [29], family album organization, forensic investigations [24], automatic image annotation [19], social media analysis [35, 5, 3], social behavior analysis [14, 36, 21, 10], historical and genealogical research [16, 5], crime scene investigation [17], etc. Kinship verification is a well-explored task that aims to determine whether two people are kin or not. Kinship identification, however, has received less attention. Existing kinship verification methods usually train and test a model for each type of kinship independently [26, 22, 29] and therefore do not fully use the complementary information among different types. The existing datasets also have unrealistic positive-negative sample distributions. Both issues lead to significant limitations in practice. Since there is no prior knowledge of the kinship type of an image pair at test time, all independently trained models must be applied to determine the kinship type of a specific image pair. Fig. 1.a shows an example in which an image pair is given to four individually trained verification networks from a recent method [34]. Surprisingly, the networks produce contradictory outputs, according to which the test subjects are simultaneously father-daughter, father-son, mother-daughter and mother-son.

Figure 1: (a) Outputs from individual verification models based on the Attention Network [34] lead to contradictory results. (b) The output of the proposed joint learning method, also based on the Attention Network, is a precise softmaxed result.

Considering these limitations of kinship verification methods, we focus our work on kinship identification and propose a new approach that combines kinship verification with kinship identification by learning the verification and identification labels jointly. Specifically, we ensemble all verification models and combine their binary outputs to form a multi-class output for identification. The multi-class and binary outputs are leveraged in a multi-task learning manner during training to improve both generalization and representation. Unlike existing methods, the proposed method can use more contextual information from the training samples. We test our proposed kinship identification method on the KinfaceWI and KinfaceWII datasets and demonstrate superior performance on the kinship identification task. We also demonstrate that the proposed method significantly improves kinship verification performance when trained on the same unbiased data. To summarize, the contributions of our work are:

  • We propose a jointly learned network that simultaneously optimizes the performance of kinship verification and kinship identification.

  • We rebalance the data from KinfaceWI and KinfaceWII to match the real-life distribution for the kinship identification task.

  • The proposed method achieves superior performance on kinship identification with unbiased datasets derived from KinfaceWI and KinfaceWII.

2 Related Work

Kinship Verification

In 2010, Fang et al. [9] first attempted to use handcrafted feature descriptors to test the relation between different facial parts for the kinship verification task. Later, Xia et al. collected a dataset with images of young and old parents to exploit the intermediate distribution through transfer learning [30, 31]. Lu et al. [20, 37] proposed a series of metric learning methods [32, 18, 12] aiming at pulling the features of intraclass samples as close as possible and pushing interclass samples as far apart as possible. Other handcrafted feature-based methods can be found in [33, 28, 30, 38, 6, 19, 32, 8]. With the rapid development of neural networks, deep learning-based methods [36, 34] can use pre-trained neural networks in an off-the-shelf way and enjoy the advantages of deep feature representation. To our knowledge, Zhang et al. first attempted to use deep convolutional neural networks [36] for the kinship verification task, and Yan et al. [34] were the first group to add an attention mechanism to a deep network for kinship verification. Both achieved higher performance than handcrafted methods. In recent years, there has been a trend of combining features from both traditional descriptors [33, 37] and deep neural networks [4, 13, 25] to improve feature representations [2]. For example, (m)DML [7, 27] combined a denoising auto-encoder with metric learning and preserved more explicit intrinsic representations. However, as illustrated in Fig. 1.a, most of these methods still focus on each specific type of kinship. They train and test independent models separately on data of the same kinship type, which may not be feasible in reality.

Kinship Identification

Compared with kinship verification, kinship identification has received less attention [1]. The works in [1, 22] explain kinship identification but pay only modest attention to it and do not propose any solutions. Guo et al. [11] proposed a pairwise kinship identification method using a multi-class linear logistic regressor. They also proposed a method that uses the graph information from a single image. However, in reality there is not enough data with family information, and this method is limited by its need for multi-input information. In kinship-based applications such as searching for missing children, we need to handle each potential pair online and find the most likely pairs for specific kinship types. In this case, we need to filter the online data and test the most likely candidates after filtering. For family photo arrangement or social media analysis, we need to understand the relationships among the people in a photo. Since a photo usually contains many faces and different kinship relations, determining the most likely kinship type becomes crucial. Previous work could not handle this. In practice, kinship verification is closely related to kinship identification, and the two processes can influence and improve each other. As shown in Fig. 1.b, we propose an approach that jointly learns all independent models with both kinship verification and kinship identification information. Our aim is to combine the structure of kinship verification with the advantages of kinship identification to improve performance on the kinship identification task.

3 Kinship Identification Through Joint Learning with Kinship Verification

In this section, we first explicitly introduce the relationships among three terms from the literature: kinship verification, kinship identification, and kinship classification [23]. We then describe the current challenge of kinship identification and propose three basic ideas for conducting kinship identification.

3.1 Definition of Kinship Verification and Kinship Identification and Kinship Classification

In the computer vision literature, kinship recognition refers to the general task of kinship analysis based on visual information. It has three main sub-tasks [22, 1]: kinship verification, kinship identification, and kinship classification (family recognition). Kinship verification aims to authenticate the relationship between a pair of images of people and to determine whether they are blood-related or not. Kinship identification aims at estimating the degree of kinship relation between people, that is, it tries to identify the type of kinship relation. Kinship classification [29, 22] aims to recognize which family an individual belongs to. In the literature, kinship verification and kinship identification are closely related but studied independently, and few works pay attention to the identification process. In this paper, we focus on kinship identification, which is an important but not yet well-explored topic.

3.2 Methods for Kinship Identification and Explanation in Metric Feature Space

As depicted in Fig. 2, there are mainly three methods that can be used for kinship identification task: ensembles of kinship verification models, multi-classification network and our proposed joint learning method.

Figure 2: Methods for kinship identification. Different from Fig. 1.a, (a) ensembles the binary results and outputs the label with the maximum value.

3.2.1 Ensembles of Kinship verification models

As shown in Fig. 2.a, we define each individual kinship verification network for a given kinship type as a binary classification problem. Let $\mathcal{S} = \{(x_i^{p}, x_i^{c}) \mid i = 1, \dots, N\}$ be the training set of $N$ pairs of images with $K$ types of kinship relation, where $x_i^{p}$ and $x_i^{c}$ are the images of the $i$-th parent and the $i$-th child, with height $H$ and width $W$ respectively. Assuming that $k = 1, 2, 3, 4$ corresponds to the kinship types father-daughter, father-son, mother-daughter and mother-son respectively, the network can be defined as:

$$y^{k} = f_k(x_i^{p}, x_i^{c}; \theta_k) \tag{1}$$

where $\theta_k$ represents the parameters of the learned network and $k$ represents the type of kinship relation that the verification model focuses on. The output $y^{k}$ of each kinship verification network is a $1 \times 2$ vector. Additionally, a softmax loss function is applied after $y^{k}$ for the training process, given by Equation 2:

$$\sigma(y^{k})_j = \frac{e^{y^{k}_j}}{\sum_{l=1}^{2} e^{y^{k}_l}} \tag{2}$$

where $y^{k}_j$ represents the $j$-th element of the input vector $y^{k}$. Each kinship verification model is trained independently. Different from the structure in Fig. 1.a, to solve the kinship identification task, the ensemble method feeds the test data into the four kinship verification models simultaneously and ensembles the four binary outputs to form a concrete result. The output class is described in Equation 3:

$$\hat{c} = \arg\max_{k} \, \sigma\big(f_k(x_i^{p}, x_i^{c}; \theta_k)\big)_2 \tag{3}$$

where $x_i^{p}$ is the $i$-th parent image and $x_i^{c}$ is the $i$-th child image, each drawn from the data set of one kinship type. On the one hand, due to the vast difference in inherited features among images of different kin-types, the kinship verification models can obtain a better representation of a specific kin-type than the multi-classification model. On the other hand, since each verification model is trained independently, it easily overfits and is affected by images from other distributions, which causes ambiguity. The ensemble method cannot handle this problem well.
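The ensemble rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the logits are made-up values, each model is assumed to emit a (non-kin, kin) pair, and `ensemble_identify` is a hypothetical helper name, not code from the paper.

```python
import math

KIN_TYPES = ["F-D", "F-S", "M-D", "M-S"]

def softmax(logits):
    """Numerically stable softmax over a sequence of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_identify(binary_logits):
    """Pick the kin-type whose verification model is most confident.

    binary_logits: one (non-kin, kin) logit pair per verification model
    (assumed ordering). Returns the winning label and all kin probabilities.
    """
    kin_probs = [softmax(pair)[1] for pair in binary_logits]
    best = max(range(len(kin_probs)), key=kin_probs.__getitem__)
    return KIN_TYPES[best], kin_probs

# Hypothetical logits in which every independently trained model votes "kin",
# mirroring the contradictory situation of Fig. 1.a.
logits = [(-1.0, 1.0), (-0.5, 0.8), (-0.2, 0.3), (0.0, 0.4)]
label, probs = ensemble_identify(logits)
print(label, [round(p, 3) for p in probs])
```

Because each probability comes from a separately normalized softmax, all four models can exceed 0.5 at once; the maximum only breaks the tie and does not remove the ambiguity.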

3.2.2 Multi-classification Neural Network

Kinship identification can be treated as a multi-class classification problem, which can be tackled by a multi-classification neural network. For a parent image $x_i^{p}$ and a child image $x_i^{c}$:

$$y = f(x_i^{p}, x_i^{c}; \theta) \tag{4}$$

where $\theta$ represents the parameters of the network and the output $y$ is a $1 \times 5$ vector. The basic structure is depicted in Fig. 2.b. The multi-classification network generalizes better than a normal binary-classification network (kinship verification network), but it may obtain a weaker representation of a specific kin-type because of the limitations of the dataset and the network structure.

3.2.3 Representation of Kinship Relationships in Metric Feature Space and Proposed Methods

(a) Feature learning of Ensembles of Kinship Verification Models
(b) Feature learning of Multi-classification Neural Network
(c) Feature learning strategy of proposed Joint Learning Method
Figure 3: Possible feature spaces of the different methods during the training process. Samples whose features are drawn with the same shape come from the same family. In (a), each verification model is trained independently.

Considering the limitations of the above methods, we propose a joint learning approach. Compared to separately trained models, a joint learning method can make better use of the cues from different tasks and improve generalization [15]. The feature learning process can be described as a metric learning process. Ideally, the learned metric space represents closer kinship by a smaller distance. However, the existing kinship verification models only consider a specific kinship type and ignore the impact of the other types on the model. As shown in Fig. 3(a), when the father-daughter verification model is being trained, the features of matched father and daughter samples are pulled together and the negative daughter images are pushed away. However, because the negative samples of father-son pairs are not included in the training data, the features of son images are less affected by the training process. Since the model does not thoroughly learn the other types of negative samples, the separately trained models can easily conflict with each other and produce ambiguous results. In contrast, as Fig. 3(b) shows, the multi-classification neural network makes use of different types of images and considers the interaction between different types. During training, the multi-classification neural network learns the representations of the different types simultaneously. The features of sons are learned as negative features for the father-daughter feature space, and the features of daughters are taken as negative features for the father-son space. The yellow arrow in Fig. 3(b) indicates that the negative samples are pushed away from the unmatched feature space. However, this is highly dependent on the structure of the network.
Although training different types of samples at the same time improves the generalization ability of the multi-classification network, it also weakens the model's representation of each specific type. Addressing the limitations of the above two methods, the joint learning method combines the generalization of multi-class training with the representation capability of verification models. Fig. 2.c shows the pipeline of our proposed method. The images are fed into a Basic-feature Extraction Module, which is followed by four Individual Verification Modules. The outputs of the four modules form a multi-classification output for kinship identification. As shown in Fig. 2(c), the learning strategy of our proposed joint learning model can be explained in two phases. In the first phase, the shared layers learn the global representation and extract mid-level features of facial images using training images of all types. In the second phase, each Individual Verification Module focuses on one specific type. Under the constraint of the multiple output, these modules pull samples of the target type nearer and push samples of different types further away. Since all the models in the joint learning method are learned together, the method generalizes better than ensembles of kinship verification models by learning the shared domain information between complementary verification tasks [15]. Also, each jointly learned model focuses on one specific kin-type, resulting in a better representation than the multi-classification neural network. As a consequence, the joint learning method improves both representation and generalization by sharing information from complementary datasets.

In Fig. 2.c, the Individual Kinship Verification models are ensembled together. Each model has a binary classification output. On top of them, a multiple classification output is introduced as the kinship identification output by logically combining the kinship verification outputs. The output of kinship identification is a $1 \times 5$ vector $z$ and can be described as:

$$z_j = \begin{cases} \min_{k} \, y^{k}_1, & j = 0 \\ y^{j}_2, & j = 1, \dots, 4 \end{cases} \tag{5}$$

where $z_j$ represents the $j$-th item of the vector $z$ and $y^{k}_j$ represents the $j$-th item of the output vector $y^{k}$ of the $k$-th verification model $f_k$. The output class can be described as:

$$\hat{c} = \arg\max_{j} \, \sigma(z)_j \tag{6}$$

where $\sigma$ denotes the softmax function.

As a consequence, we aim to utilize the representation capability of the kinship verification models while also exploiting the advantages of kinship identification. This approach consists of two major phases: the combination of different types of images and joint learning. Our main ideas of the approach can be summarized as follows:

  1. We utilize all different kin-types of image pairs to train each kinship model, not based on a specific type.

  2. Different models are trained jointly, which will learn better shared representations among different types of subjects.

Notice that naively using a single classification network (Fig. 4.a) or naively combining multiple verification networks (Fig. 4.b) will not provide optimal performance. As described above, our proposed network (Fig. 4.c) can best utilize the advantages of both tasks. Without loss of generality, we introduce our implementation on four relationships: father-daughter (F-D), father-son (F-S), mother-daughter (M-D), mother-son (M-S).

3.2.4 Real World Kinship Distribution and Dataset Bias

It is also important to notice that the distribution of current kinship verification datasets is unrealistic compared with real practice. Specifically, the proportion of positive and negative samples is highly unbalanced in real life. For example, in online family picture organization, we want to find the matched pairs of images with a specific kinship relationship in a gallery of thousands of images, in which the kin-related samples are only a small portion of the data. Another example is searching for missing children, where we want to retrieve the picture that looks most like the child of the given parents from a set in which the majority of samples are negative. In contrast, the existing datasets for verification have a 1:1 distribution of positive and negative samples.

4 Joint Learning of Kinship Identification and Kinship Verification

Figure 4: Structure of proposed JLNet

4.1 Network Architecture of Proposed Joint Learning Method

We propose a jointly learned network (JLNet) based on the learning strategy shown in Fig. 2(c), which utilizes the better representation of kinship verification models on specific kin-type pairs and the better generalization of the Multi-classification Network. The architecture of the proposed joint learning network (JLNet) is portrayed in Fig. 4. JLNet consists of three modules: the Basic-feature Extraction Module, the Individual Verification Modules and the Joint Identification Module. Following [34], we rely on the Attention Network since it achieves outstanding performance among deep convolutional networks on the kinship verification task.

4.1.1 Basic-feature Extraction Module

We took the Attention Network as the basic framework of the kinship verification model. The Attention Network uses a bottom-up top-down structure consisting of three attention stages. Each stage consists of one attention module and one residual structure. In order to make use of the shared information between complementary tasks, the parameters of the first two stages of the Attention Network are shared to learn low-level and mid-level features from the input images, forming the Basic-feature Extraction Module. This module extracts general basic features of facial images to improve the feature representation, and these basic features are further refined according to the kinship type in the following layers.

4.1.2 Individual Verification Module

After that, we focus on high-level feature extraction for the different types of images. Four separate branches are added after the last layer (a max pool layer) of the Basic-feature Extraction Module. Each branch focuses on one specific kin-type separately, resulting in four Individual Verification Modules. Every Individual Verification Module has the same structure as the third stage of the Attention Network but focuses on a different kinship type. The separate branches improve the representation capability for the different kinship types of images.

4.1.3 Joint Identification Module

After the Individual Verification Modules, a multiple output in the form of a $1 \times 5$ vector is derived from the concatenation of the second items of the binary outputs and the minimum of the first items of the binary outputs, which forms the Joint Identification Module. The multiple output is described in Eq. 5. The logical concatenation of the last layers of the Individual Verification Modules is taken as the final layer, i.e. the kinship identification output. This places a constraint over all the verification outputs, forcing them to converge on both the verification tasks and the identification task. Overall, the Basic-feature Extraction Module maps the input pairs into a mid-level feature space, and the Individual Verification Modules then extract the high-level features and learn the specific representation of each kin-type individually. The Joint Identification Module uses information from the four binary outputs. Since a softmax function is added after the multiple output, each verification model learns more mutually exclusive information during training, which avoids confusion.
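To make the construction of the multiple output concrete, the following minimal sketch builds the five-way vector from four binary outputs as described in this section. The placement of the non-kin score at index 0, the example logits, and the name `joint_identification` are assumptions made for illustration only.

```python
import math

# Assumed class order: index 0 is non-kin, followed by the four kin-types.
CLASSES = ["non-kin", "F-D", "F-S", "M-D", "M-S"]

def softmax(logits):
    """Numerically stable softmax over a sequence of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_identification(binary_outputs):
    """Combine four (first item, second item) binary outputs into one 5-way output.

    The minimum of the first (non-kin) items becomes the non-kin score and the
    second (kin) items are concatenated after it; a single softmax then makes
    the five classes mutually exclusive.
    """
    non_kin = min(pair[0] for pair in binary_outputs)
    z = [non_kin] + [pair[1] for pair in binary_outputs]
    probs = softmax(z)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[pred], probs

# Hypothetical outputs of the four Individual Verification Modules.
kin_like = [(-1.0, 1.0), (-0.5, 0.8), (-0.2, 0.3), (0.0, 0.4)]
unrelated = [(2.0, -1.0), (1.5, -0.5), (1.8, -2.0), (2.2, -1.2)]
label_kin, probs_kin = joint_identification(kin_like)
label_neg, probs_neg = joint_identification(unrelated)
print(label_kin, label_neg)
```

Unlike four separately normalized binary outputs, the shared softmax forces the class probabilities to sum to one, so at most one kin-type can dominate.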

4.2 Model Training

Taking the unbalanced positive and negative samples into account, we used the Weighted Cross Entropy as the loss function. Let $\hat{z}_j$ correspond to the $j$-th element of the one-hot encoded label for the kinship identification task and $\hat{y}^{k}_j$ be the $j$-th element of the one-hot encoded label for the $k$-th type of kinship verification task, where $k = 1, 2, 3, 4$ corresponds to the types father-daughter, father-son, mother-daughter and mother-son respectively. Denoting $L_{id}$ as the Weighted Cross Entropy loss of the Joint Identification Module and $L_{k}$ as the Weighted Cross Entropy loss of the $k$-th Individual Verification Module, the loss of the whole JLNet can be written as a weighted summation of the kinship verification losses and the kinship identification loss:

$$L = \lambda_0 L_{id} + \sum_{k=1}^{4} \lambda_k L_k, \qquad L_k = -\sum_{j=1}^{2} w^{k}_j \, \hat{y}^{k}_j \log \sigma(y^{k})_j, \qquad L_{id} = -\sum_{j=0}^{4} w_j \, \hat{z}_j \log \sigma(z)_j \tag{7}$$

where $w^{k}_j$ is the $j$-th weight of $L_k$ and $\sigma(y^{k})_j$ is the $j$-th item of the softmaxed output of the $k$-th kinship verification model; $w_j$ and $\sigma(z)_j$ are defined analogously for the Joint Identification Module, and $\lambda_0, \dots, \lambda_4$ weight the individual losses. As for model training, it is crucial for the Individual Verification Modules to obtain better representations of each specific type. Since the weight selection can influence the model performance, we biased the weight updates toward the verification task to ensure that the verification branches have better representations. During training, the parameter updating process is divided into two steps in each epoch. In the first step, all Individual Verification Modules update their parameters individually. In the second step, the parameters of the whole structure are updated based on the weighted summation loss. The parameter updating procedure is given in Algorithm 1.


Input: The th parent image and child image from type training set
Output: The parameters of JLNet
initialization(JLNet);
while epoch < epoch numbers do
      if epoch in matched epoch_lambda_milestone then
            choose matched weight
      end if
      During each epoch:
      step 1: parameter updating based on binary outputs losses:
            learning rate of Optimizer: ;
            parameter update;
      step 2: parameter updating based on multi loss:
            learning rate of Optimizer:
            parameter update;
end while
Algorithm 1 Parameter Updating During Training
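As an illustration of the weighted summation loss used in step 2, the sketch below evaluates a weighted cross entropy per output and sums the terms for one training pair. All numeric values (logits, class weights, and the loss weights called `lambdas` here) are placeholders, and `jlnet_loss` is a hypothetical helper; the values used in the paper's experiments are not reproduced.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a sequence of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits, one_hot, class_weights):
    """Weighted cross entropy of one sample: -sum_j w_j * t_j * log(softmax(logits)_j)."""
    probs = softmax(logits)
    return -sum(w * t * math.log(p) for w, t, p in zip(class_weights, one_hot, probs))

def jlnet_loss(id_logits, id_label, id_weights, ver_logits, ver_labels, ver_weights, lambdas):
    """Weighted sum of the identification loss and the four verification losses.

    lambdas[0] scales the identification loss, lambdas[1:] the verification losses
    (this indexing is an assumption of the sketch).
    """
    loss_id = weighted_cross_entropy(id_logits, id_label, id_weights)
    loss_ver = [weighted_cross_entropy(l, t, ver_weights) for l, t in zip(ver_logits, ver_labels)]
    total = lambdas[0] * loss_id + sum(lam * lv for lam, lv in zip(lambdas[1:], loss_ver))
    return total, loss_id, loss_ver

# Hypothetical father-daughter pair: only the first verification branch is positive.
ver_logits = [(-1.0, 1.0), (0.5, -0.5), (0.2, -0.3), (0.4, 0.0)]
ver_labels = [(0, 1), (1, 0), (1, 0), (1, 0)]
id_logits = [min(p[0] for p in ver_logits)] + [p[1] for p in ver_logits]
id_label = [0, 1, 0, 0, 0]  # assumed order: non-kin, F-D, F-S, M-D, M-S
total, loss_id, loss_ver = jlnet_loss(
    id_logits, id_label, [1.0] * 5, ver_logits, ver_labels, [1.0, 1.0], [1.0] * 5
)
print(round(total, 4))
```

In step 1 of the algorithm only the verification terms would drive the update, while step 2 uses the full sum; raising a class weight makes errors on that class more costly, which is how the imbalance between positive and negative pairs is compensated.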
Figure 5: Testing Mode of proposed JLNet by combining binary output and multiple output

4.3 Model Testing

JLNet can handle both the kinship verification and the kinship identification tasks, and we test each task in a different way. For the kinship identification task, to make full use of the information from both the kinship verification results and the kinship identification results, we combine the results from the binary outputs and the multiple output logically. The combined result is based on the confidence of these two types of outputs. The combination process is illustrated in Fig. 5. For the kinship verification task, we use the matched single verification model for testing, since a single verification model has a better representation of its specific type.

5 Experiment

We generated three datasets and evaluated our proposed Joint Learning method (JLNet) against two comparative approaches: Ensembles of Kinship Verification models and the Multi-Classification Network. All three experiments used the same network and were trained with the same batch size and the same data augmentation methods.

5.1 Unbiased Dataset for Training and Testing

Three benchmark datasets were derived from KinfaceWI and KinfaceWII [19, 20], which are the most widely used public datasets for the kinship verification task. In KinfaceWI, there are 156 pairs of father-son, 134 pairs of father-daughter, 116 pairs of mother-daughter, and 127 pairs of mother-son. In KinfaceWII, there are 250 pairs of pictures for each kinship relation. Both datasets consist of four kinship types: father-daughter (F-D), father-son (F-S), mother-daughter (M-D), mother-son (M-S). To create an unbiased dataset in the experiment environment, we utilized the four types of data and created three image sets. The ratio of each type of positive samples and negative samples becomes . To this end, we formed three benchmark datasets:

  1. Independent Verification Image Set: This dataset has four independent subsets, and each subset has one specific kinship type. This dataset is the same as the KinfaceWI or KinfaceWII. The positive samples are the parent-children pairs with the same type of kinship. The negative samples are the pairs of unrelated parents and children within the same kin-type distribution. The positive and negative ratio is for both training and testing.

  2. Mixed-type Image Set: This dataset combines four different kin-type images with a ratio of . This dataset is used for both training and testing. Image pairs with the kinship relation are formed to be the positive samples. The negative samples are random image pairs without kinship relation within the same type distribution.

  3. Real-scenario Image Set: This dataset simulates the data distribution in a real scenario (e.g., retrieval of missing children). All the images in KinfaceWI or KinfaceWII are paired one by one, which leads to a highly unbalanced positive-negative rate. Taking KinfaceWII as an example, in each cross-validation fold there are 400 images (200 positive pairs). All these images are paired one by one and result in pairs to be tested. The ratio of positive pairs and negative pairs is around .
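The imbalance of the Real-scenario Image Set can be illustrated with a small counting sketch. It assumes that "paired one by one" means every parent image is paired with every child image of the fold; this protocol, the identifiers, and the helper name `real_scenario_pairs` are assumptions made for illustration, not details confirmed by the text.

```python
from itertools import product

def real_scenario_pairs(kin_pairs):
    """Pair every parent with every child and label the true kin pairs.

    kin_pairs: list of (parent_id, child_id) tuples that are genuinely related.
    Returns a list of ((parent_id, child_id), label) with label 1 for kin.
    """
    parents = [p for p, _ in kin_pairs]
    children = [c for _, c in kin_pairs]
    positives = set(kin_pairs)
    return [((p, c), int((p, c) in positives)) for p, c in product(parents, children)]

# One KinfaceWII test fold as described in the text: 200 kin pairs, i.e. 400 images.
fold = [(f"parent_{i}", f"child_{i}") for i in range(200)]
pairs = real_scenario_pairs(fold)
n_pos = sum(label for _, label in pairs)
n_neg = len(pairs) - n_pos
print(len(pairs), n_pos, n_neg)
```

Under this assumed protocol a fold yields 200 x 200 = 40,000 candidate pairs, of which only 200 are positive, so negatives outnumber positives by 199 to 1.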

5.2 Experiment Design

5.2.1 Proposed Joint Learning Method

Joint Learning

In the training process of the Joint Learning approach, we used the Mixed-type Image Set for training. The dataset was divided into five folds and evaluated by 5-fold cross-validation. We used Adam as the optimizer, and the learning rate was set to be . The data was augmented by randomly changing the brightness, contrast, and saturation of the image. Random conversion to grayscale, random horizontal flipping, random perspective changes, and random resizing and cropping were also included. All the images have the same size , and the batch size is set to 64. Each epoch is divided into two phases during training, as shown in Algorithm 1. The first phase updates the four separate models without the constraint of the multiple output. The weight list of the weighted cross-entropy loss was set to be . The second phase updates the network parameters jointly by using both the binary outputs and the multiple output. The weight matrix of the cross-entropy of the multiple output was set to be . The of the total loss were set to be respectively. Since there is no public code for the Attention Network, we re-implemented it from scratch. The trained JLNet models are tested on the Independent Verification Image Set, the Mixed-type Image Set and the Real-scenario Image Set in turn. The results of JLNet are listed in Tables 1-5. The confusion matrices are depicted in Figures 6 and 7.

5.2.2 Ablation study

Joint Learning without the Backpropagation of Multiple Output Results (JLNet without multi-class constraint)

In order to study whether the additional multi-classification output is helpful to the model as a constraint, we designed this control experiment for the ablation study. This variant has exactly the same structure as JLNet and is trained in the same way as the proposed Joint Learning method, but it does not take the multiple output results into account during parameter updating.

Joint Learning Using Only the Multiple Output for Kinship Identification (JLNet, multi-class output only)

In this ablation study, we used the trained JLNet model directly, but only the multiple output was used as the final result during testing.

5.2.3 Comparative Experiments

Ensembles of Verification Models

For the Ensembles of Kinship Verification models, we conducted two ways to train the models:

  • Ensemble Verification*: Each verification model is trained separately on Independent Verification Image Set, which was the same as [34].

  • Ensemble Verification: Each verification model is trained on Mixed-type Image Set, which was the same as the training data of JLNet and Multi-class Net.

Adam was used as the optimizer, and the learning rate was set to be .

Multi-Classification Network(Multi-class Net)

As for the Multi-class Net approach, the Mixed-type Image Set was used as the training data. We used Adam as optimizer. The learning rate was set to be . A weight matrix of was used in weighted Cross-Entropy loss while training.

5.3 Results & Evaluation

5.3.1 Results on Independent Verification Image Set

| Methods | Set1 F-D | Set1 F-S | Set1 M-D | Set1 M-S | Set1 Mean | Set2 F-D | Set2 F-S | Set2 M-D | Set2 M-S | Set2 Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| Ensemble Verification* | 0.7017 | 0.7506 | 0.741 | 0.615 | 0.7021 | 0.746 | 0.744 | 0.752 | 0.732 | 0.7435 |
| Multi-class Net | 0.6463 | 0.6797 | 0.665 | 0.577 | 0.642 | 0.588 | 0.624 | 0.62 | 0.592 | 0.606 |
| Ensemble Verification | 0.6425 | 0.6321 | 0.6382 | 0.577 | 0.6224 | 0.606 | 0.600 | 0.586 | 0.626 | 0.6045 |
| JLNet (without multi-class constraint) | 0.6534 | 0.6991 | 0.6539 | 0.5772 | 0.6459 | 0.616 | 0.61 | 0.600 | 0.650 | 0.619 |
| JLNet (proposed) | 0.6947 | 0.7469 | 0.7004 | 0.6025 | 0.6861 | 0.7 | 0.744 | 0.718 | 0.728 | 0.7225 |

Table 1: The accuracy of different methods through 5-fold cross-validation on Independent Verification Image Set1 (based on KinfaceWI) and Independent Verification Image Set2 (based on KinfaceWII). Ensemble Verification* is trained on the matched Independent Verification Image Set. JLNet (without multi-class constraint) is trained without the constraint of the multi-class output.

| Methods | Set1 F-D | Set1 F-S | Set1 M-D | Set1 M-S | Set1 Mean | Set2 F-D | Set2 F-S | Set2 M-D | Set2 M-S | Set2 Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| Ensemble Verification* | 0.6915 | 0.7472 | 0.7566 | 0.6648 | 0.715 | 0.7671 | 0.7589 | 0.769 | 0.7607 | 0.7639 |
| Multi-class Net | 0.6084 | 0.6563 | 0.6767 | 0.5766 | 0.6295 | 0.5629 | 0.6000 | 0.6143 | 0.5062 | 0.5709 |
| Ensemble Verification | 0.6639 | 0.6737 | 0.6735 | 0.6083 | 0.6548 | 0.6213 | 0.6439 | 0.6051 | 0.6399 | 0.6276 |
| JLNet (without multi-class constraint) | 0.6301 | 0.6952 | 0.6496 | 0.5816 | 0.6391 | 0.6396 | 0.6166 | 0.6061 | 0.6191 | 0.6203 |
| JLNet (proposed) | 0.684 | 0.7374 | 0.6902 | 0.6074 | 0.6798 | 0.7189 | 0.7588 | 0.7275 | 0.7526 | 0.7394 |

Table 2: The F1 score of different methods through 5-fold cross-validation on Independent Verification Image Set1 (based on KinfaceWI) and Independent Verification Image Set2 (based on KinfaceWII). Ensemble Verification* is trained on the matched Independent Verification Image Set. JLNet (without multi-class constraint) is trained without the constraint of the multi-class output.

Table 1 shows the verification accuracy of the different methods tested on each Independent Verification Image Set built from KinfaceWI or KinfaceWII, and Table 2 shows the corresponding F1 scores. All methods are trained on the Mixed-type Image Set except Ensemble Verification*. The results show that, when trained on the same dataset (the Mixed-type Image Set), the Joint Learning approach surpasses all the other approaches. On Independent Verification Image Set2, JLNet(proposed) surpasses the Multi-class Net method by around 11.6 percentage points and the Ensemble Verification approach by around 11.8 percentage points in mean accuracy; on Independent Verification Image Set1 the corresponding gains are about 4.4 and 6.4 points. In terms of mean F1 score on Set2, JLNet(proposed) surpasses the Multi-class Net method by around 16.8 points and the Ensemble Verification approach by around 11.2 points. Comparing JLNet(proposed) with the JLNet variant trained without the multi-class output constraint, we can see that the additional multi-class output improves the results of the ensembled models. Compared with Ensemble Verification*, the Joint Learning approach still achieves comparable results. Since Ensemble Verification* is trained and tested on the same Independent Verification Image Set, it may overfit to that set. JLNet generalizes better than Ensemble Verification*, as the results in the next section show.
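The quoted margins can be checked directly against the Mean columns of Tables 1 and 2. The short sketch below, with values copied from those tables, suggests that the figures of roughly 11.6, 11.8, 16.8 and 11.2 correspond to absolute differences (percentage points) on Independent Verification Image Set2. The dictionary and function names are illustrative only.

```python
# Mean accuracy (Table 1) and mean F1 (Table 2) on Independent Verification Image Set2.
mean_acc_set2 = {
    "JLNet(proposed)": 0.7225,
    "Multi-class Net": 0.606,
    "Ensemble Verification": 0.6045,
}
mean_f1_set2 = {
    "JLNet(proposed)": 0.7394,
    "Multi-class Net": 0.5709,
    "Ensemble Verification": 0.6276,
}

def margin_in_points(scores, ours, baseline):
    """Absolute gap between two methods, expressed in percentage points."""
    return round((scores[ours] - scores[baseline]) * 100, 2)

acc_vs_multiclass = margin_in_points(mean_acc_set2, "JLNet(proposed)", "Multi-class Net")
acc_vs_ensemble = margin_in_points(mean_acc_set2, "JLNet(proposed)", "Ensemble Verification")
f1_vs_multiclass = margin_in_points(mean_f1_set2, "JLNet(proposed)", "Multi-class Net")
f1_vs_ensemble = margin_in_points(mean_f1_set2, "JLNet(proposed)", "Ensemble Verification")

for name, value in [
    ("accuracy vs Multi-class Net", acc_vs_multiclass),
    ("accuracy vs Ensemble Verification", acc_vs_ensemble),
    ("F1 vs Multi-class Net", f1_vs_multiclass),
    ("F1 vs Ensemble Verification", f1_vs_ensemble),
]:
    print(f"{name}: {value} points")
```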

5.3.2 Results on Mixed-type Image Set

| Methods | Set1 macro F1 | Set1 acc | Set2 macro F1 | Set2 acc |
|---|---|---|---|---|
| Ensemble Verification* | 0.3240 | 0.3723 | 0.2846 | 0.3319 |
| Multi-class Net | 0.5291 | 0.5494 | 0.4861 | 0.5225 |
| Ensemble Verification | 0.4837 | 0.4887 | 0.4464 | 0.4564 |
| JLNet | 0.5139 | 0.5467 | 0.4611 | 0.4850 |
| JLNet | 0.5377 | 0.5898 | 0.5003 | 0.5285 |
| JLNet(proposed) | 0.5392 | 0.6002 | 0.5143 | 0.5685 |

Table 3: Macro F1 score and accuracy of different methods for the kinship identification task on Mixed-type Image Set1 (Set1, based on KinfaceWI) and Mixed-type Image Set2 (Set2, based on KinfaceWII). Ensemble Verification* is trained on the Independent Verification Image Set. The two rows labeled JLNet are variants: one is trained without the constraint of multi-class output, and the other uses only the multi-class output as the final result for kinship identification.

Table 3 shows the macro F1 score and accuracy for the kinship identification task on the Mixed-type Image Set. The results show that JLNet outperforms both the Ensemble Verification method and the Multi-class Net method. As shown in Fig. 6 and Fig. 7, the Ensemble Verification method suffers from severe confusion between kinship types: the independently trained verification models can overfit and have weak generalization capability. In contrast, JLNet achieves the highest performance, and Fig. 6 shows that the Joint Learning method produces a less confused result. The Joint Learning method therefore achieves superior performance for kinship identification on the Mixed-type Image Set.
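For reference, both metrics reported in Table 3 can be derived from a confusion matrix like those in Fig. 6 and Fig. 7. The sketch below uses the standard definitions of macro F1 (unweighted mean of per-class F1) and accuracy; the function name and the toy 3-class counts are made up for illustration and are not taken from the figures.

```python
def macro_f1_and_accuracy(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    f1_scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n, correct / total

# Toy confusion matrix with made-up counts (rows: true class, columns: predicted class).
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
macro_f1, acc = macro_f1_and_accuracy(toy)
```

Because macro F1 averages per-class scores without weighting by class size, a method that confuses one kinship type heavily is penalized even if overall accuracy stays moderate.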

5.3.3 Results on Real-scenario Image Set

| Methods | F-D | F-S | M-D | M-S | mean | F10 all | acc |
|---|---|---|---|---|---|---|---|
| Ensemble Verification* | 0.0886 | 0.1179 | 0.1236 | 0.1003 | 0.1076 | 0.1830 | 0.4807 |
| Multi-class Net | 0.1548 | 0.2951 | 0.3047 | 0.1539 | 0.2271 | 0.2947 | 0.5618 |
| Ensemble Verification | 0.1508 | 0.2791 | 0.2740 | 0.1378 | 0.2104 | 0.2596 | 0.4537 |
| JLNet | 0.1510 | 0.2951 | 0.2899 | 0.1593 | 0.2238 | 0.2980 | 0.5916 |
| JLNet | 0.1670 | 0.3169 | 0.2986 | 0.1590 | 0.2354 | 0.3280 | 0.6949 |
| JLNet(proposed) | 0.1762 | 0.3272 | 0.2935 | 0.1710 | 0.2420 | 0.3464 | 0.7606 |

Table 4: Macro F10 score and accuracy of different methods on the Real-scenario Image Set based on the KinfaceWI dataset. Ensemble Verification* is trained on the Independent Verification Image Set. The two rows labeled JLNet are variants: one is trained without the constraint of multi-class output, and the other uses only the multi-class output as the final result for kinship identification. F10 all represents the average of the macro F10 scores over all labels (the negative label is also included).

| Methods | F-D | F-S | M-D | M-S | mean | F10 all | acc |
|---|---|---|---|---|---|---|---|
| Ensemble Verification* | 0.0469 | 0.0713 | 0.0726 | 0.0904 | 0.0703 | 0.1498 | 0.4647 |
| Multi-class Net | 0.1468 | 0.1972 | 0.1853 | 0.1076 | 0.1592 | 0.2528 | 0.6240 |
| Ensemble Verification | 0.1399 | 0.1681 | 0.1496 | 0.0900 | 0.1369 | 0.2075 | 0.4874 |
| JLNet | 0.1409 | 0.1743 | 0.1595 | 0.0974 | 0.1430 | 0.2297 | 0.5738 |
| JLNet | 0.1496 | 0.1930 | 0.2030 | 0.1158 | 0.1654 | 0.2561 | 0.6161 |
| JLNet(proposed) | 0.1708 | 0.2111 | 0.2345 | 0.1226 | 0.1847 | 0.2937 | 0.7267 |

Table 5: Macro F10 score and accuracy of different methods on the Real-scenario Image Set based on the KinfaceWII dataset. Ensemble Verification* is trained on the Independent Verification Image Set. The two rows labeled JLNet are variants: one is trained without the constraint of multi-class output, and the other uses only the multi-class output as the final result for kinship identification. F10 all represents the average of the macro F10 scores over all labels (the negative label is also included).

Figure 6: Confusion matrices of different experiments on the mixed-type sample set based on the KinfaceWI dataset. Panels: (a) Multi-class Net, (b) Ensemble Verification, (c) JLNet.

Figure 7: Confusion matrices of different experiments on the mixed-type sample set based on the KinfaceWII dataset. Panels: (a) Multi-class Net, (b) Ensemble Verification, (c) JLNet.

Table 4 and Table 5 show the F10 score and accuracy for the kinship identification task in the real scenario. In practice, recall matters more than precision here, so we used the F10 score to emphasize recall in the real scenario. The results show that JLNet(proposed) obtained the best mean F10, F10 all, and accuracy on both the KinfaceWI-based and the KinfaceWII-based Real-scenario Image Sets, surpassing all the other experiments conducted above. On the KinfaceWII-based real-scenario data, JLNet(proposed) surpasses the best competing baseline, Multi-class Net, by about 4.1 percentage points on the F10 all score (0.2937 versus 0.2528).
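To make the recall emphasis concrete, here is a minimal sketch of the general F-beta score, assuming that "F10" denotes F-beta with beta = 10. The function name and the precision/recall values are illustrative and are not taken from Tables 4 or 5.

```python
def f_beta(precision, recall, beta=10.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denom

# Illustrative values only: a classifier with low precision but high recall.
p, r = 0.2, 0.9
f1 = f_beta(p, r, beta=1.0)    # balances precision and recall
f10 = f_beta(p, r, beta=10.0)  # weights recall far more heavily than precision
```

With a large beta the score stays close to the recall value, so a method that misses few true kin pairs scores well even when it produces many false positives.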

5.3.4 Results of Selected Samples

We selected several representative samples and inspected the results. Fig. 8 shows the identification results of the different approaches. The results show that independently trained verification models tend to give ambiguous results ("yes" represents the positive output of a specific model). JLNet yields a more precise result than the other methods.

Figure 8: Validation of each kinship identification method on selected samples
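The ambiguity of independently trained verifiers can be illustrated with a small sketch. Each of four hypothetical binary verifiers returns a score for one image pair; thresholding them separately can yield several simultaneous "yes" answers. The scores, the threshold, and both function names are made up for illustration, and the arg-max rule is only a simple stand-in for a single-label decision, not the Joint Identification Module of JLNet.

```python
KINSHIP_TYPES = ["F-D", "F-S", "M-D", "M-S"]

def ensemble_decisions(scores, threshold=0.5):
    """Return every kinship type whose independent verifier says 'yes'."""
    return [k for k in KINSHIP_TYPES if scores[k] >= threshold]

def single_label(scores, threshold=0.5):
    """Pick at most one label: the top-scoring type, or 'negative' if none passes."""
    best = max(KINSHIP_TYPES, key=lambda k: scores[k])
    return best if scores[best] >= threshold else "negative"

# Made-up verifier scores for one image pair (not outputs of any model in the paper).
scores = {"F-D": 0.81, "F-S": 0.77, "M-D": 0.64, "M-S": 0.58}

yes_votes = ensemble_decisions(scores)  # all four verifiers answer "yes"
label = single_label(scores)            # exactly one label is returned
```

A single-label output of this kind removes the contradiction, but it does nothing to improve the underlying scores, which is where joint training is intended to help.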

Based on the results shown above, the following conclusions can be drawn:

  1. We proposed a new network JLNet for the kinship identification task in real scenarios.

  2. Our proposed approach achieves superior performance for the kinship identification task on the Mixed-type Image Set and the Real-scenario Image Set (based on the KinfaceWI and KinfaceWII datasets).

  3. The experiment shows that kinship identification and kinship verification can provide complementary information. The combination of kinship identification and kinship verification can further improve performance.

  4. Since our approach is not restricted to a specific network, it may be improved further by using a better neural network architecture.

6 Conclusion

In this paper, we investigated the kinship identification task, which has so far received little attention. First, three benchmark datasets were generated from KinfaceWI and KinfaceWII. Second, a neural network, JLNet, was proposed, which achieved superior performance on the proposed datasets. Experimental results demonstrated that joint learning of kinship verification and identification can improve the performance of kinship identification. Since this approach is not restricted to any particular neural network, a better architecture may further improve the performance of kinship identification.


References

  • [1] M. Almuashi, S. Z. M. Hashim, D. Mohamad, M. H. Alkawaz, and A. Ali (2017) Automated kinship verification and identification through human facial images: a survey. Multimedia Tools and Applications 76 (1), pp. 265–307.
  • [2] E. Boutellaa, M. B. López, S. Ait-Aoudia, X. Feng, and A. Hadid (2017) Kinship verification from videos using spatio-temporal texture features and deep learning. arXiv preprint arXiv:1708.04069.
  • [3] R. L. Burch and G. G. Gallup Jr (2000) Perceptions of paternal resemblance predict family violence. Evolution and Human Behavior 21 (6), pp. 429–435.
  • [4] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou (2012) Deep nonlinear metric learning with independent subspace analysis for face verification. In Proceedings of the 20th ACM international conference on Multimedia, pp. 749–752.
  • [5] L. M. DeBruine, F. G. Smith, B. C. Jones, S. C. Roberts, M. Petrie, and T. D. Spector (2009) Kin recognition signals in adult faces. Vision research 49 (1), pp. 38–43.
  • [6] H. Dibeklioglu, A. Ali Salah, and T. Gevers (2013) Like father, like son: facial expression dynamics for kinship verification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1497–1504.
  • [7] Z. Ding, S. Suh, J. Han, C. Choi, and Y. Fu (2015) Discriminative low-rank metric learning for face recognition. In 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1, pp. 1–6.
  • [8] R. Fang, A. C. Gallagher, T. Chen, and A. Loui (2013) Kinship classification by modeling facial feature heredity. In 2013 IEEE International Conference on Image Processing, pp. 2983–2987.
  • [9] R. Fang, K. D. Tang, N. Snavely, and T. Chen (2010) Towards computational models of kinship verification. In 2010 IEEE International conference on image processing, pp. 1577–1580.
  • [10] D. M. Fessler and C. D. Navarrete (2004) Third-party attitudes toward sibling incest: evidence for westermarck’s hypotheses. Evolution and Human Behavior 25 (5), pp. 277–294.
  • [11] Y. Guo, H. Dibeklioglu, and L. Van der Maaten (2014) Graph-based kinship recognition. In 2014 22nd international conference on pattern recognition, pp. 4287–4292.
  • [12] J. Hu, J. Lu, J. Yuan, and Y. Tan (2014) Large margin multi-metric learning for face and kinship verification in the wild. In Asian Conference on Computer Vision, pp. 252–267.
  • [13] G. B. Huang, H. Lee, and E. Learned-Miller (2012) Learning hierarchical representations for face verification with convolutional deep belief networks. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2518–2525.
  • [14] G. Kaminski, S. Dridi, C. Graff, and E. Gentaz (2009) Human ability to detect kinship in strangers’ faces: effects of the degree of relatedness. Proceedings of the Royal Society B: Biological Sciences 276 (1670), pp. 3193–3200.
  • [15] A. Kendall, Y. Gal, and R. Cipolla (2018) Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7482–7491.
  • [16] M. J. Khoury, B. H. Cohen, E. L. Diamond, G. A. Chase, and V. A. McKusick (1987) Inbreeding and prereproductive mortality in the old order amish. i. genealogic epidemiology of inbreeding. American journal of epidemiology 125 (3), pp. 453–461.
  • [17] N. Kohli, D. Yadav, M. Vatsa, R. Singh, and A. Noore (2018) Supervised mixed norm autoencoder for kinship verification in unconstrained videos. IEEE Transactions on Image Processing 28 (3), pp. 1329–1341.
  • [18] J. Lu, J. Hu, and Y. Tan (2017) Discriminative deep metric learning for face and kinship verification. IEEE Transactions on Image Processing 26 (9), pp. 4269–4282.
  • [19] J. Lu, J. Hu, X. Zhou, Y. Shang, Y. Tan, and G. Wang (2012) Neighborhood repulsed metric learning for kinship verification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2594–2601.
  • [20] J. Lu, X. Zhou, Y. Tan, Y. Shang, and J. Zhou (2013) Neighborhood repulsed metric learning for kinship verification. IEEE transactions on pattern analysis and machine intelligence 36 (2), pp. 331–345.
  • [21] C. Ober, T. Hyslop, and W. W. Hauck (1999) Inbreeding effects on fertility in humans: evidence for reproductive compensation. The American Journal of Human Genetics 64 (1), pp. 225–231.
  • [22] J. P. Robinson, M. Shao, Y. Wu, and Y. Fu (2016) Families in the wild (fiw): large-scale kinship image database and benchmarks. In Proceedings of the 24th ACM international conference on Multimedia, pp. 242–246.
  • [23] J. P. Robinson, M. Shao, Y. Wu, H. Liu, T. Gillis, and Y. Fu (2018) Visual kinship recognition of families in the wild. IEEE transactions on pattern analysis and machine intelligence 40 (11), pp. 2624–2637.
  • [24] J. P. Robinson, M. Shao, H. Zhao, Y. Wu, T. Gillis, and Y. Fu (2017) Recognizing families in the wild (rfiw): data challenge workshop in conjunction with acm mm 2017. In Proceedings of the 2017 Workshop on Recognizing Families in the Wild, pp. 5–12.
  • [25] Y. Sun, X. Wang, and X. Tang (2014) Deep learning face representation from predicting 10,000 classes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1891–1898.
  • [26] S. Wang, Z. Ding, and Y. Fu (2018) Cross-generation kinship verification with sparse discriminative metric. IEEE transactions on pattern analysis and machine intelligence 41 (11), pp. 2783–2790.
  • [27] S. Wang, J. P. Robinson, and Y. Fu (2017) Kinship verification on families in the wild with marginalized denoising metric learning. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 216–221.
  • [28] X. Wang, T. X. Han, and S. Yan (2009) An hog-lbp human detector with partial occlusion handling. In 2009 IEEE 12th international conference on computer vision, pp. 32–39.
  • [29] Y. Wu, Z. Ding, H. Liu, J. Robinson, and Y. Fu (2018) Kinship classification through latent adaptive subspace. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 143–149.
  • [30] S. Xia, M. Shao, and Y. Fu (2011) Kinship verification through transfer learning. In Twenty-Second International Joint Conference on Artificial Intelligence.
  • [31] S. Xia, M. Shao, J. Luo, and Y. Fu (2012) Understanding kin relationships in a photo. IEEE Transactions on Multimedia 14 (4), pp. 1046–1056.
  • [32] H. Yan, J. Lu, W. Deng, and X. Zhou (2014) Discriminative multimetric learning for kinship verification. IEEE Transactions on Information forensics and security 9 (7), pp. 1169–1178.
  • [33] H. Yan and J. Lu (2017) Facial kinship verification: a machine learning approach. Springer.
  • [34] H. Yan and S. Wang (2019) Learning part-aware attention networks for kinship verification. Pattern Recognition Letters 128, pp. 169–175.
  • [35] L. A. Zebrowitz and J. M. Montepare (2008) Social psychological face perception: why appearance matters. Social and personality psychology compass 2 (3), pp. 1497–1517.
  • [36] K. Zhang, Y. Huang, C. Song, H. Wu, and L. Wang (2015) Kinship verification with deep convolutional neural networks. In Proceedings of the British Machine Vision Conference (BMVC), G. K. L. Tam (Ed.), pp. 148.1–148.12. ISBN 1-901725-53-7.
  • [37] X. Zhou, J. Hu, J. Lu, Y. Shang, and Y. Guan (2011) Kinship verification from facial images under uncontrolled conditions. In Proceedings of the 19th ACM international conference on Multimedia, pp. 953–956.
  • [38] X. Zhou, J. Lu, J. Hu, and Y. Shang (2012) Gabor-based gradient orientation pyramid for kinship verification under uncontrolled environments. In Proceedings of the 20th ACM international conference on Multimedia, pp. 725–728.