Fine-Grained Lineage for Safer Notebook Interactions

by Stephen Macke, et al.

Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks allow users to execute their workflows interactively and enjoy particularly tight feedback. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates in a way that is not necessarily correlated with the code visible in the notebook's cells, making execution behavior difficult to reason about, and leading to errors and lack of reproducibility. We present NBSafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. NBSafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate NBSafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, NBSafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that NBSafety identified as resolving safety issues were more than 7× more likely to be selected by users for re-execution compared to a random baseline, even though the users were not using NBSafety and were therefore not influenced by its suggestions.
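The hidden-state problem described above can be made concrete with a small sketch. This is not NBSafety's implementation: it is a toy model, assuming that cells run against one shared namespace and that a cell is "stale" when a name it reads was rewritten after the cell last ran. The names `ToyNotebook`, `names_read_and_written`, and `stale_cells` are hypothetical, and the analysis is purely static and name-based, whereas the abstract describes NBSafety as combining runtime tracing with static analysis.

```python
import ast


def names_read_and_written(source):
    """Return (reads, writes): the variable names a cell loads and stores."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:
                writes.add(node.id)
    return reads, writes


class ToyNotebook:
    """Hypothetical stand-in for a notebook kernel with one shared namespace."""

    def __init__(self):
        self.namespace = {}   # in-memory state shared by every cell
        self.cells = {}       # cell id -> most recently executed source
        self.last_run = {}    # cell id -> logical time of last execution
        self.updated_at = {}  # variable name -> logical time of last write
        self.counter = 0      # logical clock, incremented on each execution

    def run(self, cell_id, source):
        """Execute a cell and record which names it wrote, and when."""
        self.counter += 1
        self.cells[cell_id] = source
        exec(source, self.namespace)
        self.last_run[cell_id] = self.counter
        _, writes = names_read_and_written(source)
        for name in writes:
            self.updated_at[name] = self.counter

    def stale_cells(self):
        """Cells that read a name written after the cell last executed."""
        stale = []
        for cell_id, source in self.cells.items():
            reads, _ = names_read_and_written(source)
            ran_at = self.last_run[cell_id]
            if any(self.updated_at.get(name, 0) > ran_at for name in reads):
                stale.append(cell_id)
        return sorted(stale)


nb = ToyNotebook()
nb.run("c1", "x = 1")
nb.run("c2", "y = x + 1")
nb.run("c1", "x = 10")          # the user edits and reruns the first cell
y_before = nb.namespace["y"]    # y still reflects the old x
stale_before = nb.stale_cells()
nb.run("c2", "y = x + 1")       # rerunning the flagged cell resolves it
y_after = nb.namespace["y"]
stale_after = nb.stale_cells()
```

After the first cell is rerun, `y` still holds the value computed from the old `x`, even though the visible code suggests otherwise, and the sketch flags `c2` as the cell to re-execute. Tracking whole cells by variable name is far coarser than the fine-grained lineage named in the title; it is used here only to keep the example short.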




