Ethnicity sensitive author disambiguation using semi-supervised learning

08/31/2015
by Gilles Louppe, et al.

Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms. Among solutions to this problem, digital libraries are increasingly offering tools for authors to manually curate their publications and claim those that are theirs. Indirectly, these tools allow for the inexpensive collection of large annotated training data, which can be further leveraged to build a complementary automated disambiguation system capable of inferring patterns for identifying publications written by the same person. Building on more than 1 million publicly released crowdsourced annotations, we propose an automated author disambiguation solution exploiting this data (i) to learn an accurate classifier for identifying coreferring authors and (ii) to guide the clustering of scientific publications by distinct authors in a semi-supervised way. To the best of our knowledge, our analysis is the first to be carried out on data of this size and coverage. With respect to the state of the art, we validate the general pipeline used in most existing solutions, and improve on it by: (i) proposing phonetic-based blocking strategies, thereby increasing recall; and (ii) adding strong ethnicity-sensitive features for learning a linkage function, thereby tailoring disambiguation to non-Western author names whenever necessary.
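
To make the idea of phonetic-based blocking concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which phonetic algorithm is used, so classic Soundex stands in here, and the function names `soundex` and `block_signatures` and the "Surname, Given" signature format are hypothetical. The point is only that surnames with variant spellings fall into the same block, so the later pairwise linkage step can still compare them.

```python
from collections import defaultdict

# Soundex consonant classes (assumption: Soundex chosen only for illustration).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex key of a name ('' if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (key + "000")[:4]


def block_signatures(signatures):
    """Group 'Surname, Given' signatures by the phonetic key of the surname."""
    blocks = defaultdict(list)
    for sig in signatures:
        surname = sig.split(",")[0]
        blocks[soundex(surname)].append(sig)
    return dict(blocks)


signatures = ["Muller, H.", "Mueller, Hans", "Smith, J.", "Smyth, John", "Louppe, G."]
blocks = block_signatures(signatures)
```

In this sketch "Muller" and "Mueller" share a block, whereas blocking on the exact surname string would separate them and the pair would never be compared. That is the sense in which phonetic blocking can increase recall, at the cost of larger blocks.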
