Assigning credit to scientific datasets using article citation networks

01/16/2020
by Tong Zeng, et al.

A citation is a well-established mechanism for connecting scientific artifacts. Citation analysis uses citation networks for a variety of purposes, most prominently to give credit to scientists' work. However, because of current citation practices, scientists tend to cite only publications, leaving out other types of artifacts such as datasets. As a result, datasets do not receive appropriate credit even though they are increasingly reused and experimented with. We develop a network flow measure, called DataRank, aimed at closing this gap. DataRank assigns a relative value to each node in the network based on how citations flow through the graph, differentiating publication and dataset flow rates. We evaluate the quality of DataRank by estimating its accuracy at predicting the usage of real datasets: web visits to GenBank and downloads of Figshare datasets. We show that DataRank predicts this usage better than the alternatives while offering additional interpretable outcomes. We discuss improvements to citation behavior and algorithms to properly track and assign credit to datasets.
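To make the idea of type-dependent citation flow concrete, here is a minimal sketch in Python. It is not the authors' DataRank: the abstract does not give the formula, so this assumes a simple PageRank-style iteration in which each node passes a fraction of its score to the nodes it cites, and that fraction depends on whether the node is a publication or a dataset. The function name `flow_scores`, the `rate` values, and the toy graph are all hypothetical.

```python
def flow_scores(citations, node_type, rate, iterations=200):
    """PageRank-style scores with a per-type flow rate (illustrative only).

    citations: dict mapping a node to the list of nodes it cites
    node_type: dict mapping every node to "publication" or "dataset"
    rate:      dict mapping a node type to the fraction of score that
               flows along outgoing citations (an assumed parameter)
    """
    nodes = sorted(node_type)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        leftover = 0.0
        for v in nodes:
            targets = citations.get(v, [])
            # Nodes that cite nothing pass nothing along citations.
            alpha = rate[node_type[v]] if targets else 0.0
            for t in targets:
                nxt[t] += alpha * score[v] / len(targets)
            # The remaining score is pooled and shared uniformly.
            leftover += (1.0 - alpha) * score[v]
        for v in nodes:
            nxt[v] += leftover / n
        score = nxt
    return score


# Hypothetical toy graph: two papers and one dataset.
citations = {"p1": ["d1"], "p2": ["p1", "d1"], "d1": []}
node_type = {"p1": "publication", "p2": "publication", "d1": "dataset"}
rate = {"publication": 0.85, "dataset": 0.5}  # assumed values

scores = flow_scores(citations, node_type, rate)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

In this toy graph the dataset cited by both papers ends up with the highest score, which is the kind of credit that a publication-only citation count would miss.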

research
11/01/2019

Practice meets Principle: Tracking Software and Data Citations to Zenodo DOIs

Data and software citations are crucial for the transparency of research...
research
07/27/2018

On Good and Bad Intentions behind Anomalous Citation Patterns among Journals in Computer Sciences

Scientific journals are an important choice of publication venue for mos...
research
10/30/2018

The dispersion of the citation distribution of top scientists' publications

This work explores the distribution of citations for the publications of...
research
04/24/2020

Jupyter notebooks as discovery mechanisms for open science: Citation practices in the astronomy community

Citing data and software is a means to give scholarly credit and to faci...
research
10/07/2019

SentiCite: An Approach for Publication Sentiment Analysis

With the rapid growth in the number of scientific publications, year aft...
research
10/27/2020

Shapley Flow: A Graph-based Approach to Interpreting Model Predictions

Many existing approaches for estimating feature importance are problemat...
