A Static Analysis Framework for Data Science Notebooks

10/15/2021
by Pavle Subotić, et al.

Notebooks provide an interactive environment in which programmers can develop code, analyse data and interleave visualizations. Despite this flexibility, a major pitfall that data scientists encounter is unexpected behaviour caused by the unique out-of-order execution model of notebooks. As a result, data scientists face challenges ranging from notebook correctness to reproducibility and cleaning. In this paper, we propose a framework that performs static analysis on notebooks, incorporating their unique execution semantics. Our framework is general in the sense that it accommodates a wide range of analyses, useful for various notebook use cases. We have instantiated our framework on a diverse set of analyses and have evaluated them on 2211 real-world notebooks. Our evaluation demonstrates that the vast majority (98.7%) of notebooks can be analysed well within the time frame required by interactive notebook clients.
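To make the out-of-order execution pitfall concrete, here is a minimal Python sketch of a cell-level def/use check. It is a toy illustration only, not the framework proposed in the paper: the helper names `defs_and_uses` and `unbound_reads` are hypothetical, and the sketch assumes cells contain only straight-line, top-level assignments and expressions (no imports, functions, or control flow).

```python
import ast
import builtins


def defs_and_uses(source):
    """Return (names bound by the cell, names the cell reads from earlier state).

    Assumption: the cell is straight-line top-level code, so statements
    can be processed in order without any control-flow analysis.
    """
    defs, uses = set(), set()
    for stmt in ast.parse(source).body:
        loads, stores = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stores.add(node.id)
                elif not hasattr(builtins, node.id):
                    loads.add(node.id)
        # A read counts as coming from outside the cell only if the cell
        # has not already bound that name in an earlier statement.
        uses |= loads - defs
        defs |= stores
    return defs, uses


def unbound_reads(cells, order):
    """Simulate running cells in `order`; report (cell index, name) pairs
    where a cell reads a name that no previously executed cell has bound."""
    bound, problems = set(), []
    for idx in order:
        defs, uses = defs_and_uses(cells[idx])
        problems += [(idx, name) for name in sorted(uses - bound)]
        bound |= defs
    return problems


cells = [
    "df = [3, 1, 2]",      # cell 0: defines df
    "df = sorted(df)",     # cell 1: reads and redefines df
    "total = sum(df)",     # cell 2: reads df
]

print(unbound_reads(cells, [0, 1, 2]))  # top-to-bottom execution
print(unbound_reads(cells, [2, 0, 1]))  # cell 2 executed first
```

Running the cells top to bottom reports no problems, while executing cell 2 first flags its read of `df`. A real analysis would also need to handle redefinitions that leave later cells stale, which this sketch does not attempt.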


04/26/2019

Evaluating the Success of a Data Analysis

A fundamental problem in the practice and teaching of data science is ho...
08/04/2017

VisAR: Bringing Interactivity to Static Data Visualizations through Augmented Reality

Static visualizations have analytic and expressive value. However, many ...
03/08/2022

Model Positionality and Computational Reflexivity: Promoting Reflexivity in Data Science

Data science and machine learning provide indispensable techniques for u...
10/06/2017

Data science for urban equity: Making gentrification an accessible topic for data scientists, policymakers, and the community

The University of Washington eScience Institute runs an annual Data Scie...
09/05/2020

Polyphorm: Structural Analysis of Cosmological Datasets via Interactive Physarum Polycephalum Visualization

This paper introduces Polyphorm, an interactive visualization and model ...
03/15/2018

Sharing and Preserving Computational Analyses for Posterity with encapsulator

Open data and open source software have been proposed as the primary sol...