
Ranking in Genealogy: Search Results Fusion at Ancestry

by Peng Jiang, et al.

Genealogy research is the study of family history using available resources such as historical records. Ancestry provides its customers with one of the world's largest online genealogical indexes, with billions of records from a wide range of sources, including vital records such as birth and death certificates, census records, and court and probate records, among many others. Search at Ancestry aims to return relevant records from various record types, allowing our subscribers to build their family trees, research their family history, and make meaningful discoveries about their ancestors from diverse perspectives. In a modern search engine designed for genealogical study, ranking search results so that highly relevant information surfaces first is a daunting challenge. In particular, disparities among historical records make it inherently difficult to score records equitably. Herein, we provide an overview of our solutions to these record disparity problems in the Ancestry search engine. Specifically, we introduce customized coordinate ascent (customized CA) to speed up ranking within a specific record type. We then propose stochastic search (SS), which linearly combines ranked results federated across content from various record types. Furthermore, we propose a novel information retrieval metric, normalized cumulative entropy (NCE), to measure the diversity of results. We demonstrate the effectiveness of these two algorithms in offline experiments on real customer data at Ancestry, measuring relevance by NDCG and, where applicable, diversity by NCE.
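The abstract names linear fusion of per-record-type rankings and evaluation by NDCG but gives no formulas. As a rough illustration only, the sketch below merges per-type result lists by weighted score and chooses the weights by plain random search, scoring each candidate with the common (2^rel - 1) / log2(rank + 1) form of NDCG@k. The record-type names, the (record_id, score, relevance) layout, uniform weight sampling, and all function names are hypothetical assumptions. This is a generic random search, not the paper's SS procedure, and it does not reproduce customized CA or the NCE diversity metric.

```python
import math
import random


def dcg_at_k(relevances, k):
    """DCG@k with gain (2^rel - 1) and discount log2(rank + 1), rank starting at 1."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given order divided by DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def fuse(results_by_type, weights):
    """Linearly combine per-record-type scores into one ranked list.

    Hypothetical input layout: results_by_type maps a record type to a list of
    (record_id, score, relevance) tuples. Each score is multiplied by its type's
    weight, and the merged list is sorted by weighted score, highest first.
    Returns (weighted_score, relevance, record_id) tuples.
    """
    merged = []
    for rtype, results in results_by_type.items():
        w = weights[rtype]
        merged.extend((w * score, rel, rid) for rid, score, rel in results)
    merged.sort(key=lambda t: t[0], reverse=True)
    return merged


def random_search_weights(results_by_type, k=10, trials=200, seed=0):
    """Sample per-type weights uniformly in [0, 1) and keep the best by NDCG@k.

    Assumption: a generic random search standing in for "stochastic search";
    the paper's actual SS algorithm may differ.
    """
    rng = random.Random(seed)
    types = list(results_by_type)
    best_weights, best_ndcg = None, -1.0
    for _ in range(trials):
        weights = {t: rng.random() for t in types}
        ranked = fuse(results_by_type, weights)
        score = ndcg_at_k([rel for _, rel, _ in ranked], k)
        if score > best_ndcg:
            best_weights, best_ndcg = weights, score
    return best_weights, best_ndcg


if __name__ == "__main__":
    # Hypothetical toy data: two record types with made-up scores and graded labels.
    toy = {
        "census": [("c1", 0.9, 0), ("c2", 0.4, 2)],
        "birth": [("b1", 0.7, 3), ("b2", 0.2, 0)],
    }
    weights, best = random_search_weights(toy, k=3)
    print(weights, round(best, 4))
```

Scaling each record type by its own weight lets a type whose raw scores run systematically higher or lower be moved up or down as a block, which is one simple way to compensate for score disparity across record types. Note that within a single type the original order is preserved, so in the toy data "c1" always outranks "c2" and a perfect NDCG is unreachable.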

A Case Study on Record Matching of Individuals in Historical Archives of Indigenous Databases

Digitization of historical records has produced a significant amount of ...

An ML-style record calculus with extensible records

In this work, we develop a polymorphic record calculus with extensible r...

Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions

A way of finding interesting or exceptional records from instant-stamped...

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation

Understanding voluminous historical records provides clues on the past i...

Improving information retrieval from electronic health records using dynamic and multi-collaborative filtering

Due to the rapid growth of information available about individual patien...

Temporal graph-based clustering for historical record linkage

Research in the social sciences is increasingly based on large and compl...

Bayesian Non-Exhaustive Classification for Active Online Name Disambiguation

The name disambiguation task partitions a collection of records pertaini...