Ten Simple Rules for Reproducible Research in Jupyter Notebooks

10/13/2018
by Adam Rule, et al.

Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific progress. Since many experimental studies rely on computational analyses, biologists need guidance on how to set up and document reproducible data analyses or simulations. In this paper, we address several questions about reproducibility. For example, what are the technical and non-technical barriers to reproducible computational studies? What opportunities and challenges do computational notebooks offer to overcome some of these barriers? What tools are available and how can they be used effectively? We have developed a set of rules to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications. Notebooks combine detailed workflows with narrative text and visualization of results. Combined with software repositories and open source licensing, notebooks are powerful tools for transparent, collaborative, reproducible, and reusable data analyses.

research
05/31/2022

Computational Reproducibility Within Prognostics and Health Management

Scientific research frequently involves the use of computational tools a...
research
03/24/2021

SCHeMa: Scheduling Scientific Containers on a Cluster of Heterogeneous Machines

In the era of data-driven science, conducting computational experiments ...
research
09/26/2018

Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

Evaluating the computational reproducibility of data analysis pipelines ...
research
03/15/2018

Sharing and Preserving Computational Analyses for Posterity with encapsulator

Open data and open source software have been proposed as the primary sol...
research
06/25/2021

SnakeLines: integrated set of computational pipelines for sequencing reads

Background: With the rapid growth of massively parallel sequencing techn...
research
03/19/2018

Data provenance tracking as the basis for a biomedical virtual research environment

In complex data analyses it is increasingly important to capture informa...
research
10/01/2021

Album: a framework for scientific data processing with software solutions of heterogeneous tools

Album is a decentralized distribution platform for solutions to specific...
