Developing a Temporal Bibliographic Data Set for Entity Resolution

06/20/2018
by Yichen Hu, et al.

Entity resolution is the process of identifying groups of records within or across data sets where each group represents a real-world entity. Novel techniques that use temporal features to improve the quality of entity resolution have recently attracted significant attention. However, no large data sets are currently available that contain both temporal information and the ground truth information needed to evaluate temporal entity resolution approaches. In this paper, we describe the preparation of a temporal data set based on author profiles extracted from the Digital Bibliography and Library Project (DBLP). We completed missing links between publications and author profiles in the DBLP data set using the DBLP public API. We then used the Microsoft Academic Graph (MAG) to link temporal affiliation information to DBLP authors. We selected around 80K (1 million (50 names and personal web profile to improve the reliability of the resulting ground truth, while at the same time keeping the data set challenging for temporal entity resolution research.
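As a rough illustration of why ground truth matters for such a data set, the following minimal sketch scores a naive resolver with pairwise precision and recall. It is not the authors' code or data format: the `Record` fields, the helper names, and the four sample records are all hypothetical, and pairwise precision/recall is only one common way of evaluating entity resolution.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Record:
    # Hypothetical schema: one publication record with temporal context.
    record_id: str
    author_name: str
    affiliation: str
    year: int        # temporal feature
    entity_id: str   # ground-truth author profile (assumed label)


def matching_pairs(records, key):
    """Return the unordered record-id pairs whose records share the same key."""
    return {
        frozenset((a.record_id, b.record_id))
        for a, b in combinations(records, 2)
        if key(a) == key(b)
    }


def pairwise_precision_recall(records, predicted_cluster):
    """Compare predicted clusters with ground truth on record pairs."""
    truth = matching_pairs(records, lambda r: r.entity_id)
    predicted = matching_pairs(records, lambda r: predicted_cluster[r.record_id])
    true_positives = len(truth & predicted)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Invented sample: one author (e1) changes name form and affiliation over time,
# while a different author (e2) shares the name "J. Smith".
records = [
    Record("r1", "J. Smith", "Univ A", 2005, "e1"),
    Record("r2", "John Smith", "Univ B", 2012, "e1"),
    Record("r3", "J. Smith", "Univ A", 2006, "e2"),
    Record("r4", "John Smith", "Univ B", 2013, "e1"),
]

# Naive resolver: records belong together only if the name strings are identical.
predicted = {r.record_id: r.author_name for r in records}
precision, recall = pairwise_precision_recall(records, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case exact name matching both merges two different authors and splits one author whose name form and affiliation change over time, which is the kind of error a temporal approach would aim to reduce and that ground truth makes measurable.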


