Author name disambiguation of bibliometric data: A comparison of several unsupervised approaches

04/29/2019
by   Alexander Tekles, et al.

Adequately disambiguating author names in bibliometric databases is a precondition for conducting reliable analyses at the author level. In bibliometric studies that include many researchers, it is not feasible to disambiguate every researcher manually. Several approaches have been proposed for author name disambiguation, but they have not yet been compared under controlled conditions. In this study, we compare a set of unsupervised disambiguation approaches. Unsupervised approaches specify a model for assessing the similarity of author mentions a priori instead of training a model with labelled data. To evaluate the approaches, we applied them to a set of author mentions annotated with a ResearcherID, an author identifier maintained by the researchers themselves. Apart from comparing the overall performance, we take a more detailed look at the role of the parametrization of the approaches and analyse how the results depend on the complexity of the disambiguation task. All of the evaluated approaches produce better results than those that can be obtained by using only author names. In the context of this study, the approach proposed by Caron and van Eck (2014) produced the best results.
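
To make the setup described in the abstract concrete, the following is a minimal sketch, not the implementation of any of the compared approaches. It assumes a toy set of author mentions, a hand-specified similarity rule (one point per shared coauthor), single-link clustering within blocks of identical names, and pairwise precision, recall and F1 against the ResearcherID labels as the evaluation measure. The data, the function names, the scoring rule and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (mention_id, author_name, coauthors, researcher_id).
# The researcher_id plays the role of the gold-standard label.
MENTIONS = [
    (0, "smith, j", {"lee, k", "wong, a"}, "R1"),
    (1, "smith, j", {"lee, k"}, "R1"),
    (2, "smith, j", {"garcia, m"}, "R2"),
    (3, "smith, j", {"garcia, m", "patel, s"}, "R2"),
    (4, "chen, l", {"wong, a"}, "R3"),
]


def name_only(mentions):
    """Baseline: all mentions that share a name form one cluster."""
    blocks = defaultdict(set)
    for mention_id, name, _, _ in mentions:
        blocks[name].add(mention_id)
    return list(blocks.values())


def similarity(a, b):
    """Assumed a priori rule: one point per shared coauthor."""
    return len(a[2] & b[2])


def disambiguate(mentions, threshold=1):
    """Single-link clustering within each name block, using union-find."""
    parent = {m[0]: m[0] for m in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(mentions, 2):
        # Only mentions with the same name are candidates for merging.
        if a[1] == b[1] and similarity(a, b) >= threshold:
            parent[find(a[0])] = find(b[0])

    clusters = defaultdict(set)
    for m in mentions:
        clusters[find(m[0])].add(m[0])
    return list(clusters.values())


def pairs(clusters):
    """All unordered pairs of mentions that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def pairwise_scores(predicted, mentions):
    """Pairwise precision, recall and F1 against the researcher_id labels."""
    gold = defaultdict(set)
    for mention_id, _, _, researcher_id in mentions:
        gold[researcher_id].add(mention_id)
    pred_pairs, gold_pairs = pairs(predicted), pairs(gold.values())
    tp = len(pred_pairs & gold_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 1.0
    recall = tp / len(gold_pairs) if gold_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


baseline = pairwise_scores(name_only(MENTIONS), MENTIONS)
rule_based = pairwise_scores(disambiguate(MENTIONS), MENTIONS)
print("name only :", baseline)
print("rule based:", rule_based)
```

On this toy data the name-only baseline merges two different researchers called "smith, j" and loses precision, while the coauthor rule separates them. Raising `threshold` makes merging stricter, which illustrates in miniature why the parametrization of such a rule matters.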

