Effect of forename string on author name disambiguation

02/05/2021
by Jinseok Kim, et al.

In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contribution of forenames to author name disambiguation using multiple labeled datasets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forename (homonyms). Results show that increasing the ratio of full forenames substantially improves the performance of both heuristic and machine-learning-based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared to the performance of string matching. Using only a small portion of the forename string reduces the performance of both heuristic and algorithmic disambiguation only slightly compared to using full-length strings. These findings yield practical suggestions, such as restoring initialized forenames to full-string format via record linkage to improve disambiguation performance.
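To make the string-matching heuristic concrete, the following minimal Python sketch groups name instances by a key built from the surname and the forename string. It is an illustration only, not the authors' implementation: the helper names `name_key` and `group_by_key`, the `mode` and `prefix_len` parameters, and the toy records are all hypothetical, and truncating to a leading prefix is an assumption about what "a small portion of the forename string" means.

```python
from collections import defaultdict


def name_key(surname, forename, mode="full", prefix_len=None):
    """Build a matching key from a surname and a forename string.

    mode="full":    use the whole forename (optionally only its first
                    prefix_len characters).
    mode="initial": use only the first initial of the forename.
    """
    surname = surname.strip().lower()
    forename = forename.strip().lower()
    if mode == "initial":
        forename = forename[:1]
    elif prefix_len is not None:
        forename = forename[:prefix_len]
    return (surname, forename)


def group_by_key(instances, **key_options):
    """Group record ids whose name keys are identical (exact string match)."""
    groups = defaultdict(list)
    for record_id, surname, forename in instances:
        groups[name_key(surname, forename, **key_options)].append(record_id)
    return dict(groups)


# Toy data (hypothetical): record 4 carries only an initialized forename.
instances = [
    (1, "Kim", "Jinseok"),
    (2, "Kim", "Jinseok"),
    (3, "Kim", "Jinwoo"),
    (4, "Kim", "J"),
]

by_full = group_by_key(instances, mode="full")
by_initial = group_by_key(instances, mode="initial")
by_prefix = group_by_key(instances, mode="full", prefix_len=4)

print(len(by_full), len(by_initial), len(by_prefix))
```

In this toy case, matching on the first initial merges all four instances into one group, so the two distinct forenames become homonyms. Matching on the full string, or on a four-character prefix, keeps them apart but leaves the initialized record 4 unmatched, which is the kind of instance that restoring full forenames via record linkage is meant to address.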

