It Runs in the Family: Searching for Similar Names using Digitized Family Trees

12/09/2019
by Aviad Elyashar, et al.

Searching for a person's name is a common online activity. However, web search engines return few accurate results for queries containing names. The underlying cause is that a given name often has multiple legitimate spelling variations, whereas regular text typically has a single correct spelling. Today, most techniques suggest related names using pattern matching and phonetic encoding, but these approaches frequently perform poorly. Here, we propose a novel approach to the problem of similar name suggestion. Our algorithm combines historical data collected from genealogy websites with graph algorithms. In contrast to previous approaches that suggest similar names based on encoded representations or patterns, we propose a general approach that suggests similar names based on the construction and analysis of family trees. Combining this historical information with network algorithms yields a large name-based graph that offers many suggestions derived from historical ancestors. Similar names are extracted from the graph using generic ordering functions, which outperform algorithms that suggest names based on a single dimension, a restriction that limits their performance. Using a large-scale online genealogy dataset with over 17M profiles and more than 200K unique first names, we constructed a large name-based graph. Using this graph along with 7,399 labeled given names and their true synonyms, we evaluated our approach and showed that it outperforms other algorithms, including phonetic and string similarity algorithms, in terms of accuracy, F1, and precision. We suggest our algorithm as a useful tool for suggesting similar names based on a name-based graph constructed from family trees.
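The general idea described above, linking first names that co-occur along ancestor lines and then ranking a query's graph neighbors with an ordering function, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the function names (`build_name_graph`, `hop_distances`, `suggest_similar_names`), and the particular ordering function (string similarity divided by hop distance) are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher

# Hypothetical toy records: (descendant first name, ancestor first name).
# A real dataset would be derived from digitized family trees.
TOY_EDGES = [
    ("John", "Jon"),
    ("Jon", "Johan"),
    ("Johan", "Johannes"),
    ("John", "William"),
    ("William", "Wilhelm"),
]


def build_name_graph(edges):
    """Build an undirected graph whose nodes are first names."""
    graph = defaultdict(set)
    for descendant, ancestor in edges:
        if descendant != ancestor:
            graph[descendant].add(ancestor)
            graph[ancestor].add(descendant)
    return graph


def hop_distances(graph, source, max_hops):
    """Breadth-first search: hop distance to every name within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        name = queue.popleft()
        if dist[name] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[name]:
            if neighbor not in dist:
                dist[neighbor] = dist[name] + 1
                queue.append(neighbor)
    del dist[source]  # the query itself is not a suggestion
    return dist


def suggest_similar_names(graph, query, max_hops=3, top_k=3):
    """Rank candidates by an assumed ordering function:
    string similarity to the query divided by hop distance."""
    if query not in graph:
        return []
    candidates = hop_distances(graph, query, max_hops)

    def score(name):
        similarity = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        return similarity / candidates[name]

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(candidates, key=lambda n: (-score(n), n))[:top_k]


if __name__ == "__main__":
    name_graph = build_name_graph(TOY_EDGES)
    print(suggest_similar_names(name_graph, "John"))
```

In this sketch the graph supplies the candidates and the ordering function only ranks them, so a name with no ancestral link to the query is never suggested, however similar its spelling. Other ordering functions (for example, edit distance alone, or edge weights based on co-occurrence counts) could be swapped in without changing the graph construction.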


research
05/24/2020

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Searching for information about a specific person is an online activity ...
research
12/29/2017

Personal Names in Modern Turkey

We analyzed the most common 5000 male and 5000 female Turkish names base...
research
09/07/2022

When Are Names Similar Or the Same? Introducing the Code Names Matcher Library

Program code contains functions, variables, and data structures that are...
research
12/02/2019

An Investigation of Biases in Web Search Engine Query Suggestions

Survey-based studies suggest that search engines are trusted more than s...
research
04/16/2020

Deep Generation of Coq Lemma Names Using Elaborated Terms

Coding conventions for naming, spacing, and other essentially stylistic ...
research
03/01/2021

Roosterize: Suggesting Lemma Names for Coq Verification Projects Using Deep Learning

Naming conventions are an important concern in large verification projec...
research
09/02/2022

A Novel Approach for Pill-Prescription Matching with GNN Assistance and Contrastive Learning

Medication mistaking is one of the risks that can result in unpredictabl...
