Error Identification Strategies for Python Jupyter Notebooks

03/30/2022
by Derek Robinson, et al.

Computational notebooks – such as Jupyter or Colab – combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors of these different kinds might also differ. In this paper, we present an exploratory, observational study on how Python Jupyter notebook users find and understand potential errors in notebooks. Through a conceptual replication of a study design investigating the error identification strategies of R notebook users, we presented users with Python Jupyter notebooks pre-populated with common notebook errors – errors rooted in the statistical data analysis, the knowledge of domain concepts, or the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes that help data scientists discover errors more easily in the notebooks they write.
