Recognizing Families In the Wild (RFIW): The 4th Edition

by Joseph P. Robinson, et al.

Recognizing Families In the Wild (RFIW) is an annual large-scale, multi-track automatic kinship recognition evaluation that supports various visual kin-based problems on scales much larger than ever before. Organized in conjunction with the 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG) as a Challenge, RFIW provides a platform for publishing original work and gathering experts to discuss next steps. This paper summarizes the supported tasks (i.e., kinship verification, tri-subject verification, and search and retrieval of missing children) and their evaluation protocols, which include the practical motivation, technical background, data splits, metrics, and benchmark results. Furthermore, top submissions (i.e., leader-board stats) are listed and reviewed as a high-level analysis of the state of the problem. In the end, the purpose of this paper is to describe the 2020 RFIW challenge, end-to-end, along with forecasts of promising future directions.




I Introduction

Automatic kinship recognition has numerous uses: for instance, as an aid in forensic investigations, automated photo library management, historical lineage and genealogical studies, social-media-based analysis, cases of missing children and human trafficking, and immigration and border patrol. Nonetheless, the challenges of face-based tasks (i.e., fine-grained classification in unconstrained settings) are only amplified in kin-based problem sets: the data exhibits a high degree of variability in pose, illumination, background, and clarity, and the soft biometric target labels (i.e., kinship) become harder still once directional relationships are considered. Hence, the practical benefits of enhancing kinship-based technology are matched by the challenges posed by the problem of automatic kinship understanding. This motivated the launch of the RFIW challenge series: a large-scale data challenge in support of multiple tasks, with the aim of advancing kinship recognition technologies. We intend for RFIW to serve as a platform for expert and junior researchers to present and share thoughts in an open forum.

The Families In the Wild (FIW) dataset [17, 18, 21], a large-scale, multi-task image set for kinship recognition, supports the annual RFIW (see the FIW project page). The aim of the RFIW challenge is to bridge the gap between research and reality using the large scale, variation, and rich label information of FIW. This makes modern-day data-driven approaches possible, as has been seen since its release in 2016 [1, 3, 6, 9, 23].

We summarize the evaluation protocols (practical motivation, technical background, data splits, metrics, and benchmarks) of the 2020 RFIW challenge. Specifically, this manuscript serves as a white paper of the RFIW held in conjunction with FG. Additional and supplemental information is available on the challenge website (RFIW2020 webpage).

The remainder of the paper is organized as follows. The three tasks that make up RFIW2020 are introduced separately (Sections III-B, III-C, and III-D). For each task, a clear problem statement, the intended use, data splits, task protocols (i.e., evaluation settings and metrics), and benchmark results are provided. From there, we discuss broader impacts and potential next steps (Section IV). Then, we conclude (Section IV-B).

II Related Works

Kinship recognition in machine vision started with [5], where minimal data and low-level features set the stage for the task of kinship verification between parents and children. Soon thereafter, [24] took a gender-specific view of the problem; moreover, the problem was viewed as a low-rank transfer subspace problem, where the source and target are set as faces of the parent at younger and older ages, respectively [20]. Family101 [4] was the first facial image dataset with family tree labels; at about the same time, KinWild [12] was released and used to organize data challenges [11]. The task of tri-subject kinship verification (i.e., Track 2) was inspired by the work that came next [16], for which data (i.e., TS-Kin) and benchmarks were released. Until the release of FIW in 2016 [17], deep learning models were not widely applied to the kin-based domain, with minimal exception (i.e., [25]), as the data capacity of their more complex machinery was not met by previous datasets. As part of the first RFIW [19], FIW was further extended [18, 21], making ever more kin-based problems possible to approach [6, 8]. A major focus of this edition (i.e., RFIW 2020) is to establish a record of the state of the art for the latest version of the FIW image set.

              BB      SS    SIBS      FD      FS      MD      MS   GFGD   GFGS   GMGD   GMGS    Total
train  P     991   1,029   1,588     712     721     736     716    136    124    116    114    6,983
train  F     303     304     286     401     404     399     402     81     73     71     66    2,790
train  S  39,608  27,844  35,337  30,746  46,583  29,778  46,969  2,003  2,097  1,741  1,834  264,540
val    P     433     433     206     220     261     200     234     53     48     56     42    2,186
val    F      74      57      90     134     135     124     130     32     29     36     27      868
val    S   8,340   5,982  21,204   7,575   9,399   8,441   7,587    762    879    714    701   71,584
test   P     469     469     217     202     257     230     237     40     31     36     33    2,221
test   F     149     150      89     126     133     136     132     22     21     20     22    1,190
test   S   3,459   2,956     967   3,019   3,273   3,184   2,660    121     96     71     84   39,743

TABLE I: Counts for T-1: number of unique pairs (P), families (F), and face samples (S).
Fig. 1: Sample pairs for the categories of T-1, kinship verification. For each category, pairs with similarity scores near the threshold (i.e., hard (H) samples) are shown along with highly confident predictions (i.e., easy (E) samples).
FM-S FM-D Total
train
P 662 639 1,331
F 375 364 739
S 8,575 8,588 17,163
val
P 202 177 379
F 116 117 233
S 2,859 2,493 5,352
test
P 205 178 383
F 116 114 230
S 2,805 2,400 5,205
TABLE II: Counts for T-2. No. of pairs (P), families (F), face samples (S).
Sphereface 0.61 0.66 0.69 0.62 0.66 0.71 0.73 0.68 0.57 0.64 0.50 0.64
TABLE III: Verification accuracy scores for baseline experiments set for Task I of FIW automatic kinship recognition challenge.

III Task Evaluations, Protocols, Benchmarks

RFIW 2020 supported three tasks: kinship verification (T-1), tri-subject verification (T-2), and search & retrieval of family members for missing children (T-3). We next describe each task separately, following the same outline: the problem statement and motivation, data splits and protocols, and benchmark experiments (i.e., baselines). A brief section on experimental settings common to all tasks precedes the detailed descriptions of each task in separate subsections.

III-A Experimental settings

The FIW dataset provides the most extensive set of face pairs for kin-based face recognition, and with it the data needed to train modern-day data-driven deep models [2, 9, 21, 23]. FIW was split into three parts: train, val, and test. Specifically, 60% of the families were assigned to the train set; the remaining 40% was split evenly between val and test. The three sets are completely disjoint in family and identity. Labeled train and unlabeled val were first released, with servers open for scoring (Phase 1). Then, ground truth for val was made available (Phase 2). Finally, the “blind” test set was released at the start of Phase 3. Phase 3 lasted for ten days to allow teams to process the data and make final submissions for scoring. Teams were asked to process the test set only when generating submissions; any attempt to analyze or understand the test pairs was prohibited.
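As an illustration of a family-disjoint 60/20/20 split of this kind, a minimal sketch follows. The function name `split_families`, the fixed seed, and the family ID strings are hypothetical and are not taken from the challenge tooling; the sketch only shows that splitting at the family level keeps the three sets disjoint.

```python
import random

def split_families(family_ids, train_frac=0.6, seed=0):
    """Assign whole families to train/val/test so that no family spans two sets."""
    fids = sorted(set(family_ids))
    rng = random.Random(seed)            # fixed seed (assumption) for reproducibility
    rng.shuffle(fids)
    n_train = round(train_frac * len(fids))
    n_val = (len(fids) - n_train) // 2   # the remainder is split evenly
    return {
        "train": set(fids[:n_train]),
        "val": set(fids[n_train:n_train + n_val]),
        "test": set(fids[n_train + n_val:]),
    }

# Hypothetical family IDs, for illustration only.
splits = split_families([f"F{i:04d}" for i in range(1000)])
print({name: len(ids) for name, ids in splits.items()})
```

Because identities belong to exactly one family, a split made over family IDs is also disjoint in identity.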

As preprocessing, faces for all three sets were encoded via the Sphereface CNN [10] (i.e., 512-D features). All preprocessing steps and the model weights were taken from the original work.

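Also common is the use of cosine similarity to determine the closeness of a pair of facial features $p_1$ and $p_2$ [14]. This is defined as

$$CS(p_1, p_2) = \frac{p_1 \cdot p_2}{\|p_1\| \, \|p_2\|}.$$

Scores were then compared to a threshold $\gamma$ (i.e., $CS(p_1, p_2) > \gamma$ infers KIN; else, NON-KIN) or sorted (i.e., T-3).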
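A minimal sketch of this scoring step is given below, assuming each face is already encoded as a list of floats. The names `cosine_similarity` and `predict_kin`, and the threshold value in the usage line, are illustrative assumptions rather than part of the released baseline code.

```python
import math

def cosine_similarity(p1, p2):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(p1, p2))
    norm1 = math.sqrt(sum(a * a for a in p1))
    norm2 = math.sqrt(sum(b * b for b in p2))
    return dot / (norm1 * norm2)

def predict_kin(p1, p2, threshold):
    """KIN (True) if the similarity exceeds the threshold; else NON-KIN (False)."""
    return cosine_similarity(p1, p2) > threshold

# Toy 2-D features; real encodings would be 512-D. The threshold is hypothetical.
print(predict_kin([1.0, 2.0], [2.0, 4.0], threshold=0.5))
```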

III-B Kinship verification

The goal of kinship verification is to determine whether a pair of faces are blood relatives. This classical Boolean problem has two possible outcomes, KIN or NON-KIN (i.e., true or false, respectively). Hence, this is the one-to-one view of kin-based problems. The classical problem can be further extended by considering the type of kin relation between a pair of faces, rather than treating all kin relations equally.

Prior research mainly considered parent-child kinship types, i.e., father-daughter (FD), father-son (FS), mother-daughter (MD), and mother-son (MS). Less attention has been given to sibling pairs, i.e., sister-sister (SS), brother-brother (BB), and brother-sister (SIBS). Research in psychology and computer vision has found that different relationship types share different familial features [13]. Hence, each relationship type can be modeled and evaluated independently. Thus, additional kinship types would further both our understanding and capabilities of automatic kinship recognition. With FIW, the number of facial pairs accessible for kinship verification has dramatically increased, with a subset of the pair types and face pairs listed in Table I. Additionally, benchmarks now include grandparent-grandchild types, i.e., grandfather-granddaughter (GFGD), grandfather-grandson (GFGS), grandmother-granddaughter (GMGD), and grandmother-grandson (GMGS).

III-B1 Data splits

FIW supports eleven different relationship types that were used in RFIW (Table I). The test set had an equal number of positive and negative pairs, with no family (and, hence, no subject identity) overlap between sets.

III-B2 Settings and metrics

Conventional face verification supports different modes [7], which are followed here:

  1. Unsupervised: No labels are provided, i.e., no prior knowledge about kinship or subject IDs.

  2. Image-restricted: Kinship labels (i.e., KIN/NON-KIN) are provided for a training set that is completely disjoint from the “blind” evaluation set, i.e., no subject or family overlap between training and evaluation sets.

  3. Image-unrestricted: Along with the kinship labels, subject IDs are provided. This makes it possible to generate additional negative pair-wise samples.

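Verification accuracy is used to evaluate. Specifically,

$$\text{Accuracy}_j = \frac{\#\ \text{correct predictions for pair type}\ j}{\text{total}\ \#\ \text{of pairs of type}\ j},$$

where $j \in \{1, \dots, 11\}$ indexes the relationship types. Then, the overall accuracy is calculated as a weighted sum (i.e., weighted by the pair count to determine the average accuracy).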
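The per-type accuracy and its pair-count-weighted average can be sketched as follows. The record layout `(pair_type, predicted, truth)` and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def verification_accuracy(records):
    """records: iterable of (pair_type, predicted_kin, true_kin) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair_type, pred, truth in records:
        total[pair_type] += 1
        correct[pair_type] += int(pred == truth)
    per_type = {t: correct[t] / total[t] for t in total}
    # Weight each type's accuracy by its pair count, then normalize.
    overall = sum(per_type[t] * total[t] for t in total) / sum(total.values())
    return per_type, overall

toy = [("FD", True, True), ("FD", False, True),
       ("BB", True, True), ("BB", False, False),
       ("BB", True, False), ("BB", True, True)]
print(verification_accuracy(toy))
```

Weighting by pair count makes the overall figure equal to the fraction of all pairs predicted correctly, so large categories contribute more than small ones.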

III-B3 Baseline results

The threshold was set to the value that maximizes the accuracy on the val set. Results are listed in Table III, with samples in Fig. 1.
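One simple way to pick such a threshold is an exhaustive search over candidate values on the validation scores. The sketch below is an assumption about how this could be done, not the organizers' procedure; candidates are the midpoints between neighboring scores plus one value below the minimum and one above the maximum.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy of the rule 'score > threshold'."""
    uniq = sorted(set(scores))
    candidates = ([uniq[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
                  + [uniq[-1] + 1.0])
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((s > t) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:               # keep the first (lowest) best threshold
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy validation scores with boolean KIN labels.
print(best_threshold([0.1, 0.2, 0.8, 0.9], [False, False, True, True]))
```

The selected value would then be held fixed when scoring the test set.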

III-C Tri-subject verification

Tri-subject verification focuses on a different view of kinship verification: the goal is to decide if a child is related to a pair of parents. First introduced in [16], it makes a more realistic assumption, as having knowledge of one parent often means the other potential parent(s) can be easily inferred.

Triplets consist of Father (F) / Mother (M) - Child (C) (FMC) combinations, where the child C could be either a Son (S) or a Daughter (D) (i.e., the triplet types are FMS and FMD).

Baseline 0.68 0.68 0.68
TABLE IV: Tri-subject verification accuracy scores for T-II benchmark.
Fig. 2: Tri-subject pairs near the threshold, and for correct and incorrect predictions. Each shows FMS (top rows) and FMD (bottom).

III-C1 Data splits

Following the procedure in [16], we create positive (have kin relation) triplets by matching each husband-wife spouse pair with their biological children, and negative (no kin relation) triplets by shuffling the positive triplets until every spouse pair is matched with a child which is not theirs (Table II). Because the number of potential negative samples far exceeds the number of potential positive examples, we only generate one negative triplet for each positive triplet, again following the procedure of [16].
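The shuffling step can be sketched as repeated random permutation of the children until no couple is paired with a child from its own family, which yields exactly one negative per positive. The data layout, the `family_of` lookup, and the retry limit are assumptions for illustration.

```python
import random

def make_negative_triplets(positives, family_of, seed=0, max_tries=1000):
    """positives: list of (father, mother, child); family_of: identity -> family ID."""
    rng = random.Random(seed)
    children = [c for _, _, c in positives]
    for _ in range(max_tries):
        rng.shuffle(children)
        # Accept the shuffle only if every child lands outside its own family.
        if all(family_of[c] != family_of[f]
               for (f, _, _), c in zip(positives, children)):
            return [(f, m, c) for (f, m, _), c in zip(positives, children)]
    raise RuntimeError("no valid shuffle found; too few families?")

pos = [("f1", "m1", "c1"), ("f2", "m2", "c2"), ("f3", "m3", "c3")]
fam = {"f1": "A", "m1": "A", "c1": "A",
       "f2": "B", "m2": "B", "c2": "B",
       "f3": "C", "m3": "C", "c3": "C"}
print(make_negative_triplets(pos, fam))
```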

We post-process the positive triplets before generating negatives to ensure balance among individuals, families, and spouse pairs. A naive selection procedure that weights every face sample equally would leave some individuals and families severely over-represented, because some identities and families have an abundance of face samples. The post-processing first limits the number of samples of any triplet $(F, M, C)$, where $F$, $M$, and $C$ are the identities of a father, mother, and child, to 5; then limits the appearances of each spouse pair to 15; and finally limits the number of triplet samples from each family to 30. The test set has an equal number of positive and negative triplets. Lastly, note that there is no family or subject identity overlap between any of the sets.
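The three successive limits (5 per identity triplet, 15 per spouse pair, 30 per family) can be sketched as three filtering passes. Which samples survive each cap is not stated in the text; keeping the first ones encountered is an assumption of this sketch, as are the tuple layout and function names.

```python
from collections import Counter

def cap(items, key, limit):
    """Keep at most `limit` items per key, in order of first appearance."""
    seen, kept = Counter(), []
    for it in items:
        k = key(it)
        if seen[k] < limit:
            seen[k] += 1
            kept.append(it)
    return kept

def cap_triplets(samples):
    """samples: list of (family, father, mother, child) identity tuples, one per face triplet."""
    samples = cap(samples, lambda s: (s[1], s[2], s[3]), 5)   # per identity triplet
    samples = cap(samples, lambda s: (s[1], s[2]), 15)        # per spouse pair
    return cap(samples, lambda s: s[0], 30)                   # per family

# Toy family "A": 3 couples, 4 children each, 8 face triplets per child.
toy = [("A", f"f{k}", f"m{k}", f"c{k}{j}")
       for k in range(3) for j in range(4) for _ in range(8)]
print(len(toy), len(cap_triplets(toy)))
```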

III-C2 Settings and metrics

Per convention in face verification, we offer three modes (i.e., the same as in T-1, listed in Section III-B2). The metric used is, again, verification accuracy, which is first calculated per triplet type (i.e., FMD and FMS). Then, the weighted sum (i.e., average accuracy) determines the leader-board.

III-C3 Baseline results

Baseline results are shown in Table IV. A score was assigned to each triplet $(F_i, M_i, C_i)$ in the validation and test sets using the formula

$$\text{Score}_i = \mathrm{avg}\big(\cos(F_i, C_i),\ \cos(M_i, C_i)\big),$$

where $F_i$, $M_i$, and $C_i$ are the feature vectors of the father, mother, and child images, respectively, from the $i$-th triplet. Scores were compared to a threshold $\gamma$ to infer a label (i.e., predict KIN if the score was above the threshold; else, NON-KIN). The threshold was found experimentally on the val set and then applied to the test set (Table IV).
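A minimal sketch of one such scoring rule follows, assuming the triplet score is the average of the father-child and mother-child cosine similarities. The function names and the toy vectors are hypothetical.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_score(father, mother, child):
    """Average of father-child and mother-child similarity (assumed scoring rule)."""
    return (cos_sim(father, child) + cos_sim(mother, child)) / 2

def predict_triplet(father, mother, child, threshold):
    """KIN (True) if the triplet score is above the threshold; else NON-KIN."""
    return triplet_score(father, mother, child) > threshold

print(triplet_score([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```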

III-D Search and retrieval

T-3 is posed as a many-to-many task, i.e., one-to-many samples per subject (Fig. 3). Thus, we imitate template-based evaluations on the probe side, but faces in the gallery are not labeled by subject. The goal is to find relatives of search subjects (i.e., probes) in a search pool (i.e., gallery).

Fig. 3: Depiction of the protocol of T-3. Given a probe, consisting of one-to-many samples of that family member (Fig. 4), the task is to find all family members in a gallery of faces. Specifically, provided the probe, the output is the ranked list of the subjects in the gallery. Here we show Brandon Lee, son of the legendary actor Bruce Lee: the ranked list correctly returns rank 1 (his father, Bruce) and rank 2 (his sister, Shannon), misses at rank 3 (red), but then hits at rank 4 with a face of his grandfather (green).
Probe Gallery Total
train
I 3,021 3,021
F 571 571
S 15,845 15,845
val
I 192 802 994
F 192 192 192
S 1,086 4,030 5,116
test
I 190 783 973
F 190 190 190
S 1,487 31,787 33,274
TABLE V: Counts for T-3: individuals (I), families (F), face samples (S).

Kin information, as a search cue, can be leveraged to improve conventional FR search systems, or even as prior knowledge for mining social or family relationships in industry. However, the task is most directly related to missing persons. Thus, we formulate it as such.

T-3 mimics finding parents and other relatives of unknown, missing children. The gallery contains 31,787 facial images from 190 families (Fig. 4): inputs are subject labels (i.e., probes), and outputs are ranked lists of all faces in the gallery. The number of relatives varies for each subject, ranging anywhere from 0 to 20+. Furthermore, probes have one-to-many samples; the means of fusing samples of probes is an open research question. This many-to-many task is currently set up in closed form (i.e., all probes have relative(s)).
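Since the fusion of probe samples is left open, the sketch below shows only one simple option, assumed here for illustration: average the probe's feature vectors into a single template and rank gallery faces by cosine similarity to it. All names and the toy data are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors (simple template fusion)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_gallery(probe_features, gallery):
    """gallery: dict face_id -> feature vector. Returns face IDs, most similar first."""
    template = mean_vector(probe_features)
    return sorted(gallery, key=lambda g: cos_sim(template, gallery[g]), reverse=True)

probe = [[1.0, 0.0], [0.8, 0.2]]
gallery = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}
print(rank_gallery(probe, gallery))
```

Other choices, such as scoring each probe face separately and fusing the scores afterwards, fit the same interface.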

III-D1 Data splits

This task is composed of search subjects (i.e., probes) from different families. Probes are supported by several samples of the query subject, a text description of the family (e.g., ethnicity, some relationships between selected members, etc.), and a list of relatives present in the gallery. The test set consists only of sets of images for the probes. Again, three disjoint sets were split (Table V).

Run ID Network(s) mAP Rank@5
Baseline-2 Sphereface 0.016 0.098
TABLE VI: Performance ratings for Track 3.
Fig. 4: Plot showing the face counts for each family in test set of T-3. The probes have about 8 faces on average, while the number of family members in the gallery nears 20 on average, with an average of 170 faces in total.

III-D2 Evaluation settings

Each subject (i.e., probe) gets searched independently, with 190 in total: hence, 190 families make up the test set. Probes have one-to-many faces. Following template conventions of other many-to-many face evaluations, facial images for unique subjects are separated by identity, with a gallery containing a variable number of relatives, each with a variable number of faces [22].

Teams were allowed to submit up to six final submissions, with each submission being a ranked list of all subjects in the gallery for every probe in the test set. Submissions were accompanied by a brief (text) description of the system used to generate results. Per RFIW rules, participants were not permitted to analyze test results, as this was the purpose of the 192 families provided as the val set.

Evaluation Metric

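Mean average precision (MAP) was the underlying metric used for comparisons. Mathematically speaking, scores for each of the $N$ missing children are calculated as follows:

$$AP(f) = \frac{1}{P_F} \sum_{tp=1}^{P_F} Prec(tp) = \frac{1}{P_F} \sum_{tp=1}^{P_F} \frac{tp}{rank(tp)},$$

where average precision (AP) is a function of family $f$ with a total of $P_F$ true positives (i.e., relatives in the gallery), and $rank(tp)$ is the position of the $tp$-th true positive in the ranked list. We then average all AP scores to determine the overall MAP score as follows:

$$MAP = \frac{1}{N} \sum_{f=1}^{N} AP(f).$$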
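The metric can be sketched with the standard definition of average precision over a ranked list. The function names and input layout are assumptions; the sketch also assumes the ranked list covers the whole gallery, so every relative appears somewhere in it.

```python
def average_precision(ranked_ids, relatives):
    """ranked_ids: gallery IDs, best first; relatives: set of true-positive IDs."""
    hits, precisions = 0, []
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid in relatives:
            hits += 1
            precisions.append(hits / rank)   # precision at this true positive
    return sum(precisions) / len(relatives) if relatives else 0.0

def mean_average_precision(results):
    """results: list of (ranked_ids, relatives) pairs, one per probe."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

toy = [(["a", "x", "b"], {"a", "b"}),   # AP = (1/1 + 2/3) / 2
       (["x", "a"], {"a"})]             # AP = 1/2
print(mean_average_precision(toy))
```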

III-D3 Baseline results

Baseline results are listed in Table VI, with sample results shown in Fig. 5.

Fig. 5: T-3 sample results (Rank 10). For each query (row), one or more faces of the probe returned the corresponding gallery samples as the top 10. Here, x (red) depicts false predictions, while true predictions display the relationship type (in green): P for parent; C for child; S for sibling.

IV Discussion

IV-A A broader impact

The fourth RFIW gained fair attention. Task 1, kinship verification, saw the most participation (10+ submissions). Tracks 2 (i.e., tri-subject verification) and 3 (i.e., search and retrieval) were both supported for the first time by RFIW; both are more complex than the classic task of T-1 and are practically motivated. Submissions for all tracks passed baselines by notable margins (leader-board coming soon).

The scope of kin-based problems spans much wider than RFIW. Specifically, work on applications (e.g., generative-based tasks [6, 15]) and experimental settings [8] focuses on particular views of the visual kinship recognition problem. The tasks of RFIW were thought to be appropriate, given their difficulty and practicality; how best to formulate the problem is, in itself, an open research question.

IV-B Conclusion

This paper presented the 2020 RFIW challenge, organized in conjunction with FG. The 2020 challenge is the fourth edition of the RFIW annual evaluation. For this edition, we added two new tracks, tri-subject verification and search & retrieval of missing children; the traditional kinship verification task continued to be supported as well. The FIW dataset was used to pose each of the challenge tracks. Challenging as the tasks may be, many entries outperformed the “vanilla” baselines in all tasks. Regardless, in all three cases, there still exists much room for improvement. Accuracy on the Verification and Tri-subject Verification tasks has just begun to approach the 80% mark, with Search & Retrieval further behind. Code and baselines are available online. RFIW supports research efforts. As we see it, the story of FIW is still in its infancy.


  • [1] Q. Duan and L. Zhang. Advnet: Adversarial contrastive residual net for 1 million kinship recognition. In Proceedings on RFIW Workshop in ACM MM, 2017.
  • [2] Q. Duan and L. Zhang. Advnet: Adversarial contrastive residual net for 1 million kinship recognition. In Proceedings on RFIW Workshop in ACM MM, pages 21–29, 2017.
  • [3] I. Ö. Ertugrul and H. Dibeklioglu. What will your future child look like? modeling and synthesis of hereditary patterns of facial dynamics. In IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2017.
  • [4] R. Fang, A. Gallagher, T. Chen, and A. Loui. Kinship classification by modeling facial feature heredity. In International Conference on Image Processing (ICIP). IEEE, 2013.
  • [5] R. Fang, K. D. Tang, N. Snavely, and T. Chen. Towards computational models of kinship verification. In International Conference on Image Processing (ICIP). IEEE, 2010.
  • [6] P. Gao, S. Xia, J. Robinson, J. Zhang, C. Xia, M. Shao, and Y. Fu. What will your child look like? dna-net: Age and gender aware kin face synthesizer. arXiv:1911.07014, 2019.
  • [7] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, UMass, Amherst, 2007.
  • [8] C. Kumar, R. Ryan, and M. Shao. Adversary for social good: Protecting familial privacy through joint adversarial attacks. In Conference on Artificial Intelligence (AAAI), 2020.
  • [9] Y. Li, J. Zeng, J. Zhang, A. Dai, M. Kan, S. Shan, and X. Chen. Kinnet: Fine-to-coarse deep metric learning for kinship verification. In Proceedings on RFIW Workshop in ACM MM, pages 13–20, 2017.
  • [10] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song. Sphereface: Deep hypersphere embedding for face recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [11] J. Lu, J. Hu, V. E. Liong, X. Zhou, A. Bottino, I. Ul Islam, T. Figueiredo Vieira, X. Qin, X. Tan, S. Chen, et al. The fg 2015 kinship verification in the wild evaluation. In IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2015.
  • [12] J. Lu, X. Zhou, Y.-P. Tan, Y. Shang, and J. Zhou. Neighborhood repulsed metric learning for kinship verification. IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 36(2), 2014.
  • [13] M. Shao, S. Xia, and Y. Fu. Genealogical face recognition based on ub kinface database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2011.
  • [14] H. V. Nguyen and L. Bai. Cosine similarity metric learning for face verification. In Asian Conference on Computer Vision. Springer, 2010.
  • [15] S. Ozkan and A. Ozkan. Kinshipgan: Synthesizing of kinship faces from family photos by regularizing a deep face network. In International Conference on Image Processing (ICIP), 2018.
  • [16] X. Qin, X. Tan, and S. Chen. Tri-subject kinship verification: Understanding the core of a family. CoRR, abs/1501.02555, 2015.
  • [17] J. P. Robinson, M. Shao, Y. Wu, and Y. Fu. Families in the wild (fiw): Large-scale kinship image database and benchmarks. In ACM on International Conference on Multimedia (MM), 2016.
  • [18] J. P. Robinson, M. Shao, Y. Wu, H. Liu, T. Gillis, and Y. Fu. Visual kinship recognition of families in the wild. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2018.
  • [19] J. P. Robinson, M. Shao, H. Zhao, Y. Wu, T. Gillis, and Y. Fu. Recognizing families in the wild (rfiw). In Proceedings on RFIW Workshop in ACM MM, 2017.
  • [20] M. Shao, C. Castillo, Z. Gu, and Y. Fu. Low-rank transfer subspace learning. In 2012 IEEE 12th International Conference on Data Mining, pages 1104–1109. IEEE, 2012.
  • [21] S. Wang, J. P. Robinson, and Y. Fu. Kinship verification on families in the wild with marginalized denoising metric learning. In Conference on Automatic Face and Gesture Recognition (FG), 2017.
  • [22] C. Whitelam, E. Taborsky, A. Blanton, B. Maze, J. Adams, T. Miller, N. Kalka, A. K. Jain, J. A. Duncan, K. Allen, et al. Iarpa janus benchmark-b face dataset. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, pages 90–98, 2017.
  • [23] Y. Wu, Z. Ding, H. Liu, J. Robinson, and Y. Fu. Kinship classification through latent adaptive subspace. In Conference on Automatic Face and Gesture Recognition. IEEE, 2018.
  • [24] S. Xia, M. Shao, J. Luo, and Y. Fu. Understanding kin relationships in a photo. IEEE Trans. on Multimedia, 14(4):1046–1056, 2012.
  • [25] K. Zhang, Y. Huang, C. Song, H. Wu, and L. Wang. Kinship verification with deep convolutional neural networks. In Proceedings of the British Machine Vision Conference (BMVC), pages 148.1–148.12. BMVA Press, September 2015.