Effective Unsupervised Author Disambiguation with Relative Frequencies

08/10/2018
by Tobias Backes, et al.

This work addresses author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher ID available for a subset of the Web of Science, we evaluate this measure in the agglomerative clustering of author mentions. Our evaluation is concise and shows clearly for which problem setups, and at which point in the clustering process, our approach works best. In contrast to most other work in this field, we are sceptical of the performance of author name disambiguation methods in general and compare our approach to the trivial single-cluster baseline. We present our results separately for each correct clustering size because, as we explain, the trivial baseline and more sophisticated approaches are hardly distinguishable in terms of evaluation results when all cases are treated together. Our model shows state-of-the-art performance for all correct clustering sizes without any discriminative training and with only one convergence parameter to tune.
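To make the setup in the abstract concrete, the following is a minimal sketch of agglomerative clustering of author mentions by feature overlap, scored against the single-cluster baseline. It is not the paper's similarity measure: the weighting (one minus a feature's relative frequency), the function names `cluster` and `pairwise_f1`, the pairwise F1 metric, the threshold value, and the toy mentions are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_f1(predicted, truth):
    """Pairwise F1 between two clusterings given as lists of labels."""
    pairs = list(combinations(range(len(truth)), 2))
    pred_pos = {(i, j) for i, j in pairs if predicted[i] == predicted[j]}
    true_pos = {(i, j) for i, j in pairs if truth[i] == truth[j]}
    hits = len(pred_pos & true_pos)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_pos)
    recall = hits / len(true_pos)
    return 2 * precision * recall / (precision + recall)


def cluster(mentions, threshold):
    """Greedy agglomerative clustering over sets of features.

    Assumed stand-in similarity: sum over shared features of
    (1 - relative frequency), so rare shared features count more.
    `threshold` plays the role of a single stopping parameter.
    """
    n = len(mentions)
    freq = Counter(f for feats in mentions for f in feats)
    weight = {f: 1.0 - count / n for f, count in freq.items()}
    # Each cluster is (member indices, union of member features).
    clusters = [({i}, set(feats)) for i, feats in enumerate(mentions)]
    while len(clusters) > 1:
        best, a, b = max(
            (sum(weight[f] for f in x[1] & y[1]), i, j)
            for (i, x), (j, y) in combinations(enumerate(clusters), 2)
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        merged = (clusters[a][0] | clusters[b][0],
                  clusters[a][1] | clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    labels = [0] * n
    for label, (members, _) in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Hypothetical mentions of one ambiguous name, written by two real authors.
mentions = [
    {"coauthor:meier", "venue:jcdl", "keyword:retrieval"},
    {"coauthor:meier", "venue:sigir", "keyword:retrieval"},
    {"coauthor:chen", "venue:jcdl", "keyword:vision"},
    {"coauthor:chen", "venue:cvpr", "keyword:vision"},
]
truth = [0, 0, 1, 1]

clustered_f1 = pairwise_f1(cluster(mentions, threshold=0.75), truth)
baseline_f1 = pairwise_f1([0] * len(mentions), truth)  # single cluster
```

On this toy input the sketch recovers both authors, while the single-cluster baseline reaches a pairwise F1 of 0.5. If `truth` were `[0, 0, 0, 0]`, the baseline would score a perfect 1.0, which illustrates why results pooled over all correct clustering sizes can make the baseline look competitive.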

