Computational reproducibility of Jupyter notebooks from biomedical publications

09/09/2022
by Sheeba Samuel, et al.

Jupyter notebooks make it possible to bundle executable code with its documentation and output in one interactive environment, and they are a popular mechanism for documenting and sharing computational workflows, including for research publications. Here, we analyze the computational reproducibility of 9625 Jupyter notebooks from 1117 GitHub repositories associated with 1419 publications indexed in the biomedical literature repository PubMed Central. Of these notebooks, 8160 were written in Python, including 4169 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 2684 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 396 notebooks ran through without any errors, including 245 that produced results identical to those reported in the original notebook. Running the other notebooks resulted in exceptions. We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
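The counts in the abstract form a filtering funnel, and the share of notebooks surviving each stage is easier to see when computed explicitly. The following is a minimal sketch of that arithmetic in Python. It uses only the numbers stated above; the stage labels are paraphrases, and the `retention` helper is a hypothetical name introduced for illustration, not part of the authors' pipeline.

```python
# Counts are copied from the abstract; labels are paraphrases of its wording.
FUNNEL = [
    ("notebooks analyzed", 9625),
    ("written in Python", 8160),
    ("dependencies declared in standard requirement files", 4169),
    ("all declared dependencies installed", 2684),
    ("ran without errors", 396),
    ("results identical to the original", 245),
]


def retention(funnel):
    """Return (label, count, share_of_previous_stage, share_of_first_stage) per stage.

    Hypothetical helper for illustration only.
    """
    first = funnel[0][1]
    prev = first
    rows = []
    for label, count in funnel:
        rows.append((label, count, count / prev, count / first))
        prev = count
    return rows


if __name__ == "__main__":
    for label, count, of_prev, of_first in retention(FUNNEL):
        print(f"{count:>5}  {of_prev:6.1%} of previous  {of_first:6.1%} of all  {label}")
```

Read this way, roughly 14.8% of the 2684 re-executed notebooks (396) ran without errors, and roughly 9.1% (245) reproduced the original results.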


research
06/22/2020

ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks

Computational notebooks have gained widespread adoption among researcher...
research
05/26/2020

Reconciler: A Workflow for Certifying Computational Research Reproducibility

Previous work in reproducibility focused on providing frameworks to make...
research
11/23/2022

: a Python "smuggler" for constructing lightweight reproducible notebooks

Reproducibility is a core requirement of modern scientific research. For...
research
04/13/2018

Exploration of Reproducibility Issues in Scientometric Research Part 2: Conceptual Reproducibility

This is the second part of a small-scale explorative study in an effort ...
research
04/13/2018

Exploration of reproducibility issues in scientometric research Part 1: Direct reproducibility

This is the first part of a small-scale explorative study in an effort t...
research
05/08/2020

Literature Triage on Genomic Variation Publications by Knowledge-enhanced Multi-channel CNN

Background: To investigate the correlation between genomic variation and...
research
08/09/2022

The Rise of GitHub in Scholarly Publications

The definition of scholarly content has expanded to include the data and...
