1 Introduction
Some research [1] in biology finds that human facial appearance contains important kin-related information. Inspired by this finding, many methods [2, 3] have been proposed for kinship recognition from facial images. The goal of kinship verification is to determine whether or not a kin relation exists for a given pair of facial images. Kinship verification has attracted increasing attention in the computer vision community due to its broad applications, such as automatic album organization [4], missing-children search [2], social media-based analysis [5], and child adoption [6].
Although a variety of efforts [2, 3, 6] have been devoted to kinship verification, it is still far from ready to be deployed for any real-world use. Several challenges hinder the development of kinship recognition. First, like other face-related tasks [7, 8], facial kinship verification is confronted with large variations in pose, scale, and illumination, which makes learning discriminative features quite challenging. Second, unlike face verification, which investigates the relations between different images of the same identity, kinship verification has to discover the hidden similarity inherited through genetic relations between different identities, which naturally leads to a much larger appearance gap between intra-class samples, especially when there are significant gender differences and age gaps.
Many methods have been proposed to address these challenges over the past few years. Most of them pay attention to learning discriminative features for each facial image of a paired sample. For example, Lu et al. [2] proposed the NRML metric to pull intra-class samples as close as possible and push inter-class samples in a neighborhood as far as possible in the learned feature space. Nevertheless, these approaches usually apply a similarity metric [9] or a multilayer perceptron (MLP) [6] to the extracted features to obtain the probability of kinship between two facial images, which cannot fully exploit the genetic relations between the two features.
In this paper, we focus on how to compare and fuse the two extracted features of a paired sample to reason about genetic relations. We hypothesize that when people reason about kinship relations, they usually first compare the genetically related attributes of two individuals, such as cheekbone shape, eye color, and nose size, and then make a comprehensive judgment based on these comparison results. For two given features of a paired sample, we consider that each dimension of the feature encodes one kind of kin-related information. Therefore, we explicitly model the human reasoning process by comparing the features dimension by dimension and then fusing the comparisons. More specifically, we build a star graph, named the kinship relational graph, for the two features to perform relational reasoning, where each peripheral node models one dimension of the features and the central node is utilized as a bridge of communication. We further propose a graph-based kinship reasoning (GKR) network on this graph to effectively exploit the hidden kin relations of the extracted features. The key differences between our method and most existing methods are visualized in Figure 1. We validate the proposed GKR for kinship verification on two benchmarks, the KinFaceW-I [2] and KinFaceW-II [2] datasets, and the results illustrate that our method outperforms state-of-the-art approaches.
2 Related Work
Kinship Verification: In the past few years, many papers [3, 6, 10, 11]
have been published for kinship verification, and most of them focus on extracting discriminative features for each image. They can be divided into three categories: handcrafted methods, distance metric-based methods, and deep learning-based methods.
Handcrafted methods require researchers to design feature extractors by hand. For example, as one of the earliest works, Fang et al. [10] proposed extracting color, facial parts, facial distances, and gradient histograms as the features for classification. Zhou et al. [4] further presented a Gabor-based gradient orientation pyramid (GGOP) feature representation method to make better use of multiple feature information. Distance metric-based methods [12, 13] are the most popular methods for kinship verification; they aim to learn a distance metric such that the distance between positive face pairs is reduced and that between negative pairs is enlarged. Yan et al. [3] first extracted multiple features with different descriptors and then learned multiple distance metrics to exploit complementary and discriminative information. A discriminative deep metric learning method was introduced in [9], which learned a set of hierarchical nonlinear transformations with deep neural networks. Zhou et al. [11] explicitly considered the cross-generation discrepancy and proposed a kinship metric learning method with a coupled deep neural network to improve performance. Recent years have witnessed the great success of deep learning; however, few deep learning-based works have addressed kinship verification. Zhang et al. [6] made the first attempt at kinship verification with deep CNNs and demonstrated the effectiveness of their method. Dibeklioğlu [14] further studied video-based kinship verification with deep learning. All these methods only focus on learning good feature representations and ignore how to reason about kin relations with the extracted embeddings.
Graph Neural Networks: A variety of practical applications deal with the complex non-Euclidean structure of graph data. Graph neural networks (GNNs) were proposed to handle these kinds of data by learning features on graphs. Li et al. [15] proposed gated graph neural networks (GGNNs) with gated recurrent units, which can be trained with modern optimization techniques. Motivated by the success of convolutions on image data, Kipf and Welling introduced graph convolutional networks (GCNs) [16] by applying the convolutional architecture to graph-structured data. A layer-wise propagation rule was utilized in GCNs, and both local graph structure and node features were encoded for the task of semi-supervised learning. Veličković et al. [17] further proposed graph attention networks (GATs) to assign different weights to different nodes in a self-attentional way. The generated weights do not require any prior knowledge of the graph structure, and GATs are computationally efficient with a larger model capacity due to the attention mechanism. GNNs have proven to be a good tool for relational reasoning. For example, Sun et al. [18] constructed a recurrent graph to jointly model the temporal and spatial interactions among different individuals with GNNs for action forecasting.
3 Proposed Approach
In this section, we first present the problem formulation. Then we illustrate the details of the kinship relational graph building process. Lastly, we introduce the proposed graph-based kinship reasoning (GKR) network.
3.1 Problem Formulation
We use $\mathcal{S} = \{(I_i^p, I_i^c)\}_{i=1}^{N}$ to denote the training set of paired images with kin relations, where $I_i^p$ and $I_i^c$ are the parent image and child image, respectively, and $N$ is the total number of positive pairs. The negative training set is built as $\mathcal{D} = \{(I_i^p, I_j^c) \mid i \neq j\}$, where each parent image and each unrelated child image form a negative sample. However, the size of the negative training set is much larger than that of the positive training set, given that $|\mathcal{D}| = N(N-1)$ and $|\mathcal{S}| = N$. We therefore randomly select negative samples from $\mathcal{D}$ to build a balanced negative training set $\mathcal{D}'$ such that $|\mathcal{D}'| = |\mathcal{S}| = N$. The whole training set is then constructed as the union of the positive and negative training sets: $\mathcal{T} = \mathcal{S} \cup \mathcal{D}'$.
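The class-balanced sampling described above can be sketched as follows; the function name and the list-based data layout are illustrative, not from the paper:

```python
import random

def build_training_pairs(parents, children, seed=0):
    """Build a class-balanced training set of (parent, child, label) triples.

    parents[i] and children[i] form the i-th positive kin pair; every
    mismatched pair (i, j) with i != j is a candidate negative sample.
    Exactly n negatives are drawn so positives and negatives are balanced.
    """
    assert len(parents) == len(children)
    n = len(parents)
    rng = random.Random(seed)
    positives = [(parents[i], children[i], 1) for i in range(n)]
    # Sample n negatives out of the n*(n-1) mismatched pairs.
    candidates = [(i, j) for i in range(n) for j in range(n) if i != j]
    negatives = [(parents[i], children[j], 0)
                 for i, j in rng.sample(candidates, n)]
    return positives + negatives
```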
The goal of kinship verification can be formulated as learning a mapping function $g$, where the input is a paired sample $(I^p, I^c)$ and the output is the probability of a kin relation. Most existing methods aim to learn a good feature extractor $f(\cdot)$. Handcrafted methods design shallow features by hand to implement $f$, whereas deep learning-based methods learn a deep neural network as the extractor. Metric learning-based methods usually first use handcrafted features or deep features $f(I^p)$ and $f(I^c)$ as the initial sample features, and then learn a distance metric:

$$d_M(f(I^p), f(I^c)) = \sqrt{(f(I^p) - f(I^c))^\top M (f(I^p) - f(I^c))}, \quad (1)$$

where $M = W^\top W$ is a positive semi-definite matrix. In the end, we obtain the projected features $x_p = W f(I^p)$ and $x_c = W f(I^c)$ with $x_p, x_c \in \mathbb{R}^D$, where $D$ denotes the feature dimension.
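As a concrete illustration of the metric in (1), the sketch below uses a hand-picked projection W in place of a learned one; it also shows that with M = WᵀW the metric reduces to the Euclidean distance between the projected features:

```python
import numpy as np

def mahalanobis_distance(fp, fc, W):
    """Distance of Eq. (1) with M = W^T W.

    fp, fc : raw feature vectors of the parent and child images.
    W      : projection matrix (illustrative here; learned in practice).
    """
    diff = fp - fc
    M = W.T @ W
    return float(np.sqrt(diff @ M @ diff))
```

With `W` equal to the identity, this is just the Euclidean distance; in general it equals `np.linalg.norm(W @ (fp - fc))`.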
Having obtained the features $x_p$ and $x_c$, we still need to learn a mapping function $g$ to map the features to a probability of kin relation between $I^p$ and $I^c$. Most current methods mainly focus on the feature extractor $f$ and usually neglect the design of $g$. One choice is to simply concatenate the two features and send them to a multilayer perceptron (MLP):

$$g(x_p, x_c) = \mathrm{MLP}([x_p; x_c]), \quad (2)$$

where $[\cdot\,;\cdot]$ represents the concatenation operation. Another commonly used way is to calculate the cosine similarity of the two features:

$$g(x_p, x_c) = \frac{x_p^\top x_c}{\|x_p\|_2 \, \|x_c\|_2}. \quad (3)$$

Neither method fully exploits the relations between the two features. In this paper, we aim to design a new $g$ to effectively perform relational reasoning on the two extracted features.
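The two baseline mapping functions of (2) and (3) can be sketched as follows; the MLP weights and shapes are illustrative stand-ins for learned parameters:

```python
import numpy as np

def cosine_similarity(x_p, x_c):
    """Cosine similarity of two feature vectors, as in Eq. (3)."""
    return float(x_p @ x_c / (np.linalg.norm(x_p) * np.linalg.norm(x_c)))

def mlp_score(x_p, x_c, W1, b1, W2, b2):
    """One-hidden-layer MLP on the concatenated features, Eq. (2) style.

    The weights are passed in for illustration; in practice they are
    learned jointly with the feature extractor.
    """
    h = np.maximum(0.0, W1 @ np.concatenate([x_p, x_c]) + b1)  # ReLU hidden layer
    return float(W2 @ h + b2)
```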
3.2 Building a Kinship Relational Graph
In recent years, deep CNNs have achieved great success in many computer vision tasks, such as image classification, object detection, and scene understanding, which demonstrates their superior ability for feature representation. Therefore, we utilize a deep CNN as the feature extractor $f$ in this paper.
Having obtained the deeply learned sample features $x_p$ and $x_c$, we consider how to perform relational reasoning on them. To this end, we first observe how humans reason about kin relations. As genetic traits are usually exhibited by facial characteristics, humans reason about kin relations by comparing genetically related attributes to discover the hidden similarity. For example, if we find that the persons in two facial images have the same eye color and similar cheekbones, the probability that they are related will be higher. After comparing a variety of informative facial attributes of the two persons, humans make the final decision by combining and analyzing all the information.
We explicitly model the above reasoning process by constructing a kinship relational graph and performing relational reasoning on this graph. We consider that each dimension of the extracted features encodes one kind of genetic information, so we can reason about the kin relations by comparing and fusing all the genetic information. Since we use the same CNN to extract features for both images, the values of the two features in the same dimension represent the comparison of the kinship-related information encoded in that dimension. We use one node in the kinship relational graph to denote the comparison of one feature dimension, so we have $D$ nodes describing the comparisons in all dimensions. To fuse these comparisons, we need to define the interactions of these nodes. One intuitive way is to connect all the nodes, given that any two nodes may have a relation. However, such a graph greatly increases the computational complexity of subsequent operations. Therefore, we create a super node that is connected to all other nodes, while all other nodes are connected only to the super node. The super node is the central node of the star-structured kinship relational graph and plays an important role in the interaction and information communication of the surrounding nodes. In this way, we build the kinship relational graph; we elaborate on the reasoning process with the proposed graph-based kinship reasoning network in the following subsection.
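For illustration, the star-shaped connectivity can be written down as an adjacency matrix; the function below is a minimal sketch, not code from the paper:

```python
import numpy as np

def star_adjacency(d):
    """Adjacency matrix of a star graph with one central node (index 0)
    and d peripheral nodes (indices 1..d). Only the central node is
    connected to every peripheral node; peripheral nodes do not touch
    each other, keeping the edge count linear in d."""
    a = np.zeros((d + 1, d + 1), dtype=int)
    a[0, 1:] = 1
    a[1:, 0] = 1
    return a
```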
3.3 Reasoning on the Kinship Relational Graph
Having built the kinship relational graph, we consider how to perform relational reasoning on it. Recently, graph neural networks (GNNs) have attracted increasing attention for representation learning on graphs. Generally speaking, GNNs employ a recursive message-passing scheme, where each node aggregates the messages sent by its neighbors to update its own feature. We follow this scheme and propose the graph-based kinship reasoning (GKR) network to perform relational reasoning on the kinship relational graph.
Formally, let $G = (V, E)$ denote the kinship relational graph with the node set $V$ and the edge set $E$. Each node in the graph has a feature vector, and we have $V = \{h_0, h_1, \ldots, h_D\}$, where $h_0$ represents the feature vector of the central node and $h_i$ ($1 \le i \le D$) is that of the $i$-th surrounding node. The edge set of this graph is formulated as $E = \{e_{0,i} \mid 1 \le i \le D\}$, where $e_{0,i}$ denotes the edge between node $0$ and node $i$. The proposed GKR propagates messages according to the graph structure defined by $E$, and the aggregated messages are utilized to update the node features. As mentioned above, we set the initial node features as the values of the two extracted image features in one dimension. Mathematically, the initial node features are set as follows:

$$h_i^{(0)} = [x_{p,i}, \, x_{c,i}]^\top, \quad 1 \le i \le D, \quad (4)$$

where $h_i^{(0)}$ denotes the initial feature of node $i$, and $x_{p,i}$ and $x_{c,i}$ represent the values in the $i$-th dimension of the features $x_p$ and $x_c$, respectively. In this way, each node encodes one kind of kinship-related information. The initialization of the central node is studied in the ablation experiments.
The proposed GKR consists of $L$ layers, where each layer represents one time step of the message passing phase. The $l$-th layer transforms the node features $h_i^{(l-1)} \in \mathbb{R}^{d_{l-1}}$ into $h_i^{(l)} \in \mathbb{R}^{d_l}$ with message passing to perform relational reasoning, where $d_{l-1}$ and $d_l$ represent the corresponding feature dimensions. Having obtained the node features of the $(l-1)$-th layer, we first generate the message of each node, which is going to be sent out in the following message passing process. The message of the $i$-th surrounding node is generated following:

$$m_i^{(l)} = F_m^{(l)}(h_i^{(l-1)}), \quad 1 \le i \le D, \quad (5)$$

where $F_m^{(l)}$ is employed to transform the node features into messages. We apply the same operation to the central node with the same parameters $F_m^{(l)}$:

$$m_0^{(l)} = F_m^{(l)}(h_0^{(l-1)}). \quad (6)$$

With these messages, we propagate and aggregate them according to the graph structure, and then update the node features with the aggregated messages. For the peripheral nodes, since the central node is the only neighbor, aggregation is implemented by concatenating the message of the central node with the node's own message. We then use the aggregated messages to update the node feature as follows:

$$h_i^{(l)} = F_u^{(l)}([m_i^{(l)}; m_0^{(l)}]), \quad 1 \le i \le D, \quad (7)$$

where $F_u^{(l)}$ is used to fuse all information to generate the new feature vector. For the central node, we first aggregate all the incoming messages:

$$\hat{m}^{(l)} = \mathrm{AGGREGATE}(\{m_i^{(l)} \mid 1 \le i \le D\}), \quad (8)$$

where the function AGGREGATE is implemented by a pooling operation. Then the feature of the central node is updated as follows:

$$h_0^{(l)} = F_c^{(l)}([\hat{m}^{(l)}; m_0^{(l)}]), \quad (9)$$

where $F_c^{(l)}$ is utilized to update the feature of the central node. In this way, we obtain the updated features by message passing.
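A minimal sketch of one such message-passing layer (Eqs. (5)-(9)) is given below. It assumes plain linear maps for the learned transforms and max-pooling for AGGREGATE; all names and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

def gkr_layer(h, Wm, Wu, Wc):
    """One message-passing step on the star-shaped kinship relational graph.

    h  : (D+1, d) node features; row 0 is the central node.
    Wm : shared message transform, stands in for F_m (Eqs. 5-6).
    Wu : peripheral-node update, stands in for F_u (Eq. 7).
    Wc : central-node update, stands in for F_c (Eq. 9).
    """
    m = h @ Wm.T                       # messages of all nodes, Eqs. (5)-(6)
    center_msg = m[0]
    n_periph = m.shape[0] - 1
    # Peripheral update: own message concatenated with the center's, Eq. (7)
    periph_in = np.concatenate([m[1:], np.tile(center_msg, (n_periph, 1))], axis=1)
    periph = periph_in @ Wu.T
    # Central update: max-pooled incoming messages plus own message, Eqs. (8)-(9)
    agg = m[1:].max(axis=0)
    center = np.concatenate([agg, center_msg]) @ Wc.T
    return np.vstack([center, periph])
```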
We repeat the above process $L$ times and obtain the final node feature vectors $h_0^{(L)}, h_1^{(L)}, \ldots, h_D^{(L)}$. To make the final decision, we first combine all these features and send them to an MLP, which outputs a scalar value. Therefore, the mapping function $g$ of our proposed method can be formulated as:

$$g(x_p, x_c) = \mathrm{MLP}([h_0^{(L)}; h_1^{(L)}; \ldots; h_D^{(L)}]). \quad (10)$$

Lastly, we obtain the probability $p$ of a kin relation between $I^p$ and $I^c$ by applying a sigmoid function to the scalar value: $p = \sigma(g(x_p, x_c))$. Note that the proposed GKR and the feature extractor network are trained end-to-end. We employ the binary cross-entropy loss as the objective function:

$$\mathcal{L} = -\frac{1}{|\mathcal{T}|} \sum_{(I^p, I^c, y) \in \mathcal{T}} \left[ y \log p + (1 - y) \log (1 - p) \right], \quad (11)$$

where $y \in \{0, 1\}$ is the ground-truth label. Since $|\mathcal{S}| = |\mathcal{D}'|$, our method is optimized in a class-balanced setting. We depict the above pipeline in Figure 2.
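The objective of (11) is a standard mean binary cross-entropy over the training pairs; a minimal sketch (with a small epsilon added to guard against log(0), an implementation detail not in the paper):

```python
import math

def bce_loss(probs, labels):
    """Mean binary cross-entropy over a batch of predicted kin
    probabilities, as in Eq. (11)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)
```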
Table 1: Verification accuracy (%) on the KinFaceW-I and KinFaceW-II datasets.

Method         | KinFaceW-I                    | KinFaceW-II
               | FS    FD    MS    MD    Mean  | FS    FD    MS    MD    Mean
MNRML [2]      | 72.5  66.5  66.2  72.0  69.9  | 76.9  74.3  77.4  77.6  76.5
DMML [3]       | 74.5  69.5  69.5  75.5  72.3  | 78.5  76.5  78.5  79.5  78.3
CNN-Basic [6]  | 75.7  70.8  73.4  79.4  74.8  | 84.9  79.6  88.3  88.5  85.3
CNN-Point [6]  | 76.1  71.8  78.0  84.1  77.5  | 89.4  81.9  89.9  92.4  88.4
D-CBFD [19]    | 79.0  74.2  75.4  77.3  78.5  | 81.0  76.2  77.4  79.3  78.5
WGEML [20]     | 78.5  73.9  80.6  81.9  78.7  | 88.6  77.4  83.4  81.6  82.8
GKR (ours)     | 79.5  73.2  78.0  86.2  79.2  | 90.8  86.0  91.2  94.4  90.6
4 Experiments
In this section, we conducted extensive experiments on two widely used kinship verification datasets to illustrate the effectiveness of the proposed GKR.
4.1 Datasets and Implementation Details
We employ two widely used databases, KinFaceW-I [2] and KinFaceW-II [2], for evaluation, both collected from the Internet. Four types of kinship relations are considered in these two datasets: Father-Son (FS), Father-Daughter (FD), Mother-Son (MS), and Mother-Daughter (MD). There are 156, 134, 116, and 127 pairs of facial images for these four relations in KinFaceW-I, respectively, while KinFaceW-II contains 250 pairs of parent-child facial images for each kin relation. The main difference between the two databases is that each image pair with a kin relation in KinFaceW-I comes from different photos, whereas those in KinFaceW-II are collected from the same photo.
We employed ResNet-18 as the feature extractor network $f$, which was initialized with ImageNet-pretrained weights. Naturally, the dimension $D$ of the extracted image features was 512. Since both databases are relatively small, data augmentation is a crucial step to improve performance. We performed data augmentation by first resizing the facial images to 73 × 73 pixels and then randomly cropping a 64 × 64 patch. Following the design choice of most GNN methods [16], we used a two-layer ($L = 2$) GKR. The Adam optimizer was utilized with a learning rate of 0.0005. The batch size was set to 16 and 32 for KinFaceW-I and KinFaceW-II, respectively, given that the KinFaceW-I database is only about half the size of KinFaceW-II. For a fair comparison, we performed five-fold cross-validation following the standard protocol provided in [2].

Table 2: Mean verification accuracy (%) with different initializations of the central node.

Dataset      | Mean   Max   | 0     0.5   1
KinFaceW-I   | 77.4   73.5  | 77.5  79.2  78.1
KinFaceW-II  | 79.1   80.6  | 79.5  90.6  87.5
Table 3: Verification accuracy (%) with different pooling operations for AGGREGATE.

Dataset | Pool | FS    FD    MS    MD    Mean
KFW-I   | Mean | 78.2  76.8  73.8  86.4  78.8
KFW-I   | Max  | 79.5  73.2  78.0  86.2  79.2
KFW-II  | Mean | 90.2  87.0  90.3  92.5  90.0
KFW-II  | Max  | 90.8  86.0  91.2  94.4  90.6
4.2 Comparison with the StateoftheArt Methods
We first compare our GKR with several state-of-the-art methods, including metric learning-based methods and deep learning-based methods. Table 1 shows the comparison results on the KinFaceW-I and KinFaceW-II datasets. We observe that our method achieves an average verification accuracy of 79.2% on KinFaceW-I and 90.6% on KinFaceW-II, outperforming the state-of-the-art methods. Some early metric learning-based methods, such as MNRML [2] and DMML [3], learn the proposed metric with handcrafted features, which leads to unsatisfactory results. WGEML [20] achieves state-of-the-art results with deep features, which demonstrates the superiority of deep learning. Compared with WGEML, our method improves the mean accuracy by 0.5% and 7.8% on KinFaceW-I and KinFaceW-II, respectively, which shows the superior relational reasoning ability of the proposed GKR. Zhang et al. [6] proposed CNN-Basic and CNN-Point, which directly learn deep neural networks for kinship verification to exploit the power of deep learning. Our method, which is also deep learning-based, outperforms CNN-Point by 1.7% and 2.2% on KinFaceW-I and KinFaceW-II, respectively. Note that CNN-Point contains 10 CNN backbones whereas our approach employs only one, which further illustrates the effectiveness of the proposed GKR.
4.3 Ablation Study
To investigate the influence of individual design choices and validate the effectiveness of the proposed GKR, we further conducted ablation experiments in this subsection.
Initialization of the Central Node:
The initialization of the central node is an important design choice given that the central node is the bridge of the kinship relational graph. One strategy is to aggregate the initial values of all other nodes by mean or max pooling. Another is to initialize the central node with a constant value, such as 0, 0.5, or 1. The results are listed in Table 2; the initialization with the constant value 0.5 gives the best performance and is employed in the following experiments.

Pooling Operations of AGGREGATE: Two different pooling operations, max-pooling and mean-pooling, are considered to implement the function AGGREGATE. Table 3 tabulates the verification accuracy of the two pooling operations. We observe that max-pooling achieves better results, perhaps because it better selects the most important information while mean-pooling treats all messages equally.
Table 4: Comparison of different mapping functions (accuracy, %).

Dataset | Method | FS    FD    MS    MD    Mean
KFW-I   | Cos    | 60.6  66.4  64.2  70.5  65.4
KFW-I   | MLP    | 70.5  71.5  71.2  79.2  73.1
KFW-I   | Ours   | 79.5  73.2  78.0  86.2  79.2
KFW-II  | Cos    | 78.2  73.8  76.6  81.0  77.4
KFW-II  | MLP    | 80.6  82.0  80.8  79.4  80.7
KFW-II  | Ours   | 90.8  86.0  91.2  94.4  90.6
Mapping Function $g$: To validate the effectiveness of the proposed GKR, we compare it with other widely used design choices for $g$: an MLP as formulated in (2) and the cosine similarity as formulated in (3). For a fair comparison, all of them employ ResNet-18 to extract image features. Table 4 shows the results on the two datasets. Our method outperforms the MLP and the cosine similarity by a large margin on both databases, which demonstrates that our method can better exploit the relations between the two extracted features and perform relational reasoning with the kinship relational graph.
5 Conclusion
In this paper, we have proposed a graph-based kinship reasoning (GKR) network to effectively exploit the genetic relations between the two features of a paired sample. Different from other methods, the proposed GKR focuses on how to compare and fuse the two extracted features to perform relational reasoning. Our method first builds a kinship relational graph for the two extracted features and then performs relational reasoning on this graph with message passing. Extensive experimental results on the KinFaceW-I and KinFaceW-II databases demonstrate the effectiveness of our approach.
References
 [1] M. F. Dal Martello and L. T. Maloney, “Lateralization of kin recognition signals in the human face,” Journal of Vision, vol. 10, no. 8, pp. 9–9, 2010.
 [2] J. Lu, X. Zhou, Y.-P. Tan, Y. Shang, and J. Zhou, “Neighborhood repulsed metric learning for kinship verification,” TPAMI, vol. 36, no. 2, pp. 331–345, 2014.
 [3] H. Yan, J. Lu, W. Deng, and X. Zhou, “Discriminative multimetric learning for kinship verification,” TIFS, vol. 9, no. 7, pp. 1169–1178, 2014.
 [4] X. Zhou, J. Lu, J. Hu, and Y. Shang, “Gabor-based gradient orientation pyramid for kinship verification under uncontrolled environments,” in ACM MM, 2012, pp. 725–728.

 [5] A. Dehghan, E. G. Ortiz, R. Villegas, and M. Shah, “Who do I look like? Determining parent-offspring resemblance via gated autoencoders,” in CVPR, 2014, pp. 1757–1764.
 [6] K. Zhang, Y. Huang, C. Song, H. Wu, and L. Wang, “Kinship verification with deep convolutional neural networks,” in BMVC, 2015, pp. 148.1–148.12.
 [7] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “SphereFace: Deep hypersphere embedding for face recognition,” in CVPR, 2017, pp. 212–220.
 [8] W. Li, J. Lu, J. Feng, C. Xu, J. Zhou, and Q. Tian, “BridgeNet: A continuity-aware probabilistic network for age estimation,” in CVPR, 2019, pp. 1145–1154.
 [9] J. Lu, J. Hu, and Y.-P. Tan, “Discriminative deep metric learning for face and kinship verification,” TIP, vol. 26, no. 9, pp. 4269–4282, 2017.
 [10] R. Fang, K. D. Tang, N. Snavely, and T. Chen, “Towards computational models of kinship verification,” in ICIP, 2010, pp. 1577–1580.
 [11] X. Zhou, K. Jin, M. Xu, and G. Guo, “Learning deep compact similarity metric for kinship verification from face images,” Information Fusion, vol. 48, pp. 84–94, 2019.
 [12] Y.G. Zhao, Z. Song, F. Zheng, and L. Shao, “Learning a multiple kernel similarity metric for kinship verification,” Information Sciences, vol. 430, pp. 247–260, 2018.
 [13] S. Mahpod and Y. Keller, “Kinship verification using multiview hybrid distance learning,” CVIU, vol. 167, pp. 28–36, 2018.
 [14] H. Dibeklioğlu, “Visual transformation aided contrastive learning for video-based kinship verification,” in ICCV, 2017, pp. 2459–2468.
 [15] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph sequence neural networks,” in ICLR, 2016.
 [16] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR, 2017.
 [17] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in ICLR, 2018.
 [18] C. Sun, A. Shrivastava, C. Vondrick, R. Sukthankar, K. Murphy, and C. Schmid, “Relational action forecasting,” in CVPR, 2019, pp. 273–283.
 [19] H. Yan, “Learning discriminative compact binary face descriptor for kinship verification,” PRL, vol. 117, pp. 146–152, 2019.
 [20] J. Liang, Q. Hu, C. Dang, and W. Zuo, “Weighted graph embedding-based metric learning for kinship verification,” TIP, vol. 28, no. 3, pp. 1149–1162, 2019.