A Static Analysis Framework for Data Science Notebooks

by Pavle Subotić, et al.

Notebooks provide an interactive environment for programmers to develop code, analyse data and inject interleaved visualizations in a single environment. Despite their flexibility, a major pitfall that data scientists encounter is unexpected behaviour caused by the unique out-of-order execution model of notebooks. As a result, data scientists face challenges ranging from notebook correctness to reproducibility and cleaning. In this paper, we propose a framework that performs static analysis on notebooks, incorporating their unique execution semantics. Our framework is general in the sense that it accommodates a wide range of analyses, useful for various notebook use cases. We have instantiated our framework on a diverse set of analyses and have evaluated them on 2211 real-world notebooks. Our evaluation demonstrates that the vast majority (98.7%) of notebooks can be analysed well within the time frame required by interactive notebook clients.




