A Framework to capture and reproduce the Absolute State of Jupyter Notebooks

03/31/2022
by Dimuthu Wannipurage, et al.

Jupyter Notebooks are an enormously popular tool for creating and narrating computational research projects. They also have great potential for creating reproducible scientific research artifacts. Capturing the complete state of a notebook has additional benefits: for instance, notebook execution may be split between local and remote resources, where the remote resources may have more powerful processing capabilities or may store large or access-limited data. Examined in detail, making notebooks fully reproducible poses several challenges. The notebook code must be replicated exactly, and the underlying Python runtime environments must be identical. More subtle problems arise in replicating referenced data, external library dependencies, and runtime variable states. This paper presents solutions to these problems that use Jupyter's standard extension mechanisms to create an archivable system state for a running notebook. We show that these additional mechanisms, which involve interacting with the underlying Linux kernel, do not introduce substantial execution-time overhead, demonstrating the approach's feasibility.
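To make "runtime variable states" and "external library dependencies" concrete, here is a minimal sketch of a naive user-level snapshot. It is not the authors' method, which works through Jupyter extensions and the Linux kernel. It swaps in only the standard-library `pickle` and `importlib.metadata` modules, and the function names `capture_state` and `restore_variables` are hypothetical. It records the Python version, pickles every variable it can, lists names it cannot serialize, and records installed package versions.

```python
import pickle
import sys
import types
from importlib import metadata


def capture_variables(namespace: dict) -> tuple[dict, list]:
    """Pickle each user variable; return (serialized, names_that_failed)."""
    saved, unpicklable = {}, []
    for name, value in namespace.items():
        # Skip private/IPython-internal names and modules (modules are re-imported).
        if name.startswith("_") or isinstance(value, types.ModuleType):
            continue
        try:
            saved[name] = pickle.dumps(value)
        except Exception:
            # e.g. generators, lambdas, open files, sockets
            unpicklable.append(name)
    return saved, unpicklable


def capture_dependencies() -> dict:
    """Map each installed distribution name to its version string."""
    return {dist.metadata["Name"]: dist.version for dist in metadata.distributions()}


def capture_state(namespace: dict) -> dict:
    """Hypothetical snapshot: interpreter version, variables, and packages."""
    saved, unpicklable = capture_variables(namespace)
    return {
        "python": sys.version,
        "variables": saved,
        "unpicklable": unpicklable,
        "packages": capture_dependencies(),
    }


def restore_variables(snapshot: dict) -> dict:
    """Rebuild the picklable part of the namespace from a snapshot."""
    return {name: pickle.loads(blob) for name, blob in snapshot["variables"].items()}


# Usage: in a notebook you would pass globals(); here a small stand-in namespace.
namespace = {"x": 42, "data": [1, 2, 3], "gen": (i for i in range(3)), "_tmp": 0, "sys": sys}
snapshot = capture_state(namespace)
restored = restore_variables(snapshot)
```

The `unpicklable` list shows where this naive approach stops. State such as generators, open files, or sockets cannot be captured by serializing objects one at a time, which illustrates why a complete snapshot needs support below the Python level.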

research
05/18/2022

Transparent Serverless execution of Python multiprocessing applications

Access transparency means that both local and remote resources are acces...
research
03/04/2021

Restoring Execution Environments of Jupyter Notebooks

More than ninety percent of published Jupyter notebooks do not state dep...
research
06/17/2022

WaTZ: A Trusted WebAssembly Runtime Environment with Remote Attestation for TrustZone

WebAssembly (Wasm) is a novel low-level bytecode format that swiftly gai...
research
07/20/2020

MKLpy: a python-based framework for Multiple Kernel Learning

Multiple Kernel Learning is a recent and powerful paradigm to learn the ...
research
03/08/2021

DepGraph: Localizing Performance Bottlenecks in Multi-Core Applications Using Waiting Dependency Graphs and Software Tracing

This paper addresses the challenge of understanding the waiting dependen...
research
03/19/2022

No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing

Serverless platforms essentially face a tradeoff between container start...
research
08/24/2021

The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows

To support the growing demands of neuroscience applications, researchers...
