S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications

04/22/2022
by Shaurya Rohatgi, et al.

Mentorship is a critical component of academia, but it is less visible than publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, few large, representative mentorship datasets are available. We contribute two datasets to the study of mentorship. The first has over 300,000 ground-truth academic mentor-mentee pairs obtained from multiple diverse, manually curated sources and linked to the Semantic Scholar (S2) knowledge graph. We use this dataset to train an accurate classifier for predicting mentorship relations from bibliographic features, achieving a held-out area under the ROC curve of 0.96. Our second dataset is formed by applying the classifier to the complete co-authorship graph of S2. The result is an inferred graph with 137 million weighted mentorship edges among 24 million nodes. We release this first-of-its-kind dataset to the community to help accelerate the study of scholarly mentorship: <https://github.com/allenai/S2AMP-data>
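
To make the evaluation metric concrete, here is a minimal sketch of how an area under the ROC curve can be computed for scored co-author pairs. Everything in it is an assumption for illustration: the two features (seniority gap and shared papers), the hand-picked weights in `toy_score`, and the six toy pairs are hypothetical and are not taken from the paper or the released data.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy pairs: (seniority gap in years, shared papers, is_mentorship)
pairs = [
    (12, 6, 1),
    (8, 4, 1),
    (15, 2, 1),
    (1, 5, 0),
    (0, 1, 0),
    (10, 4, 0),
]


def toy_score(gap, shared):
    # Hand-picked weights for illustration only; not the paper's trained model.
    return 0.1 * gap + 0.2 * shared


labels = [y for _, _, y in pairs]
scores = [toy_score(gap, shared) for gap, shared, _ in pairs]
auc = roc_auc(labels, scores)
print(f"toy AUC: {auc:.3f}")
```

The pairwise formulation is quadratic in the number of examples, which is fine for a toy set; a rank-based implementation would be the usual choice at the scale the abstract describes.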

research
03/17/2022

EMAKG: An Enhanced Version Of The Microsoft Academic Knowledge Graph

Scholarly knowledge graphs are valuable sources of information in severa...
research
06/21/2020

Enriching Large-Scale Eventuality Knowledge Graph with Entailment Relations

Computational and cognitive studies suggest that the abstraction of even...
research
10/15/2019

From Academia to Software Development: Publication Citations in Source Code Comments

Academic publications have been evaluated with the impact on research co...
research
08/20/2020

VisualSem: a high-quality knowledge graph for vision and language

We argue that the next frontier in natural language understanding (NLU) ...
research
08/07/2023

SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples

We present SemOpenAlex, an extensive RDF knowledge graph that contains o...
research
02/27/2023

Soft-Search: Two Datasets to Study the Identification and Production of Research Software

Software is an important tool for scholarly work, but software produced ...
research
06/17/2021

A Bibliography of Combinators

A categorized bibliography of combinators is given, providing what is be...
