Ranking in Genealogy: Search Results Fusion at Ancestry

02/27/2019
by Peng Jiang, et al.

Genealogy research is the study of family history using available resources such as historical records. Ancestry provides its customers with one of the world's largest online genealogical indexes, containing billions of records from a wide range of sources. These include vital records such as birth and death certificates, census records, and court and probate records, among many others. Search at Ancestry aims to return relevant records of various record types, allowing our subscribers to build their family trees, research their family history, and make meaningful discoveries about their ancestors from diverse perspectives. In a modern search engine designed for genealogical study, ranking search results so that they provide highly relevant information is a daunting challenge. In particular, the disparity among historical records makes it inherently difficult to score records equitably. Herein, we provide an overview of our solutions to this record disparity problem in the Ancestry search engine. Specifically, we introduce customized coordinate ascent (customized CA) to speed up ranking within a specific record type. We then propose stochastic search (SS), which linearly combines ranked results federated across content from various record types. Furthermore, we propose a novel information retrieval metric, normalized cumulative entropy (NCE), to measure the diversity of results. We demonstrate the effectiveness of these two algorithms in offline experiments on real customer data at Ancestry, measuring relevance by NDCG and, where applicable, diversity by NCE.
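The idea of fusing per-record-type result lists with a linear combination, tuned for NDCG, can be illustrated with a small Python sketch. This is a hedged illustration, not the paper's implementation. It assumes that each record type returns `(record_id, score)` pairs on comparable scales and that stochastic search can be approximated as random sampling of per-type weights, keeping the weights with the best mean NDCG@k on labeled queries. It also uses the common `2^rel - 1` gain form of NDCG. The names `fuse`, `mean_ndcg`, and `stochastic_search` are hypothetical.

```python
import math
import random


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def fuse(results_by_type, weights):
    """Hypothetical linear fusion: rank every record by weights[record_type] * score."""
    merged = [
        (weights[rtype] * score, record_id)
        for rtype, results in results_by_type.items()
        for record_id, score in results
    ]
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [record_id for _, record_id in merged]


def mean_ndcg(queries, weights, k):
    """Average NDCG@k of the fused ranking over (results_by_type, labels) queries."""
    total = 0.0
    for results_by_type, labels in queries:
        ranking = fuse(results_by_type, weights)
        total += ndcg_at_k([labels.get(r, 0) for r in ranking], k)
    return total / len(queries)


def stochastic_search(queries, record_types, k=10, n_iter=200, seed=0):
    """Assumed form of SS: sample normalized weight vectors, keep the best by mean NDCG@k."""
    rng = random.Random(seed)
    best_w, best_score = None, -1.0
    for _ in range(n_iter):
        raw = [rng.random() for _ in record_types]
        total = sum(raw)
        w = {t: r / total for t, r in zip(record_types, raw)}
        score = mean_ndcg(queries, w, k)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Toy usage: the highly relevant vital records should outrank census hits,
# which requires the "vital" weight to exceed three times the "census" weight.
queries = [
    (
        {"census": [("c1", 0.9), ("c2", 0.4)], "vital": [("v1", 0.8), ("v2", 0.3)]},
        {"c1": 1, "v1": 3, "v2": 2},
    ),
]
weights, score = stochastic_search(queries, ["census", "vital"], k=3)
print(weights, score)
```

Random sampling is used here only because it is the simplest search over a small number of fusion weights. With many record types, a more directed search, such as coordinate ascent over the weights, would likely need fewer NDCG evaluations.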

research
02/15/2023

A Case Study on Record Matching of Individuals in Historical Archives of Indigenous Databases

Digitization of historical records has produced a significant amount of ...
research
08/13/2021

An ML-style record calculus with extensible records

In this work, we develop a polymorphic record calculus with extensible r...
research
02/24/2021

Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions

A way of finding interesting or exceptional records from instant-stamped...
research
04/13/2021

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation

Understanding voluminous historical records provides clues on the past i...
research
07/06/2018

Temporal graph-based clustering for historical record linkage

Research in the social sciences is increasingly based on large and compl...
research
11/10/2017

Arrhythmia Classification from the Abductive Interpretation of Short Single-Lead ECG Records

In this work we propose a new method for the rhythm classification of sh...