Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time

09/06/2022
by   Deepthi Raghunandan, et al.

Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none of it measures how sensemaking behaviors shift over time between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process, and thus our ability to design tools that fully support it. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning each notebook a score representing its position within the sensemaking spectrum. Third, we use our regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors participate in a diverse range of sensemaking tasks over time, such as annotation, branching analysis, and documentation. Based on these results, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.
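To make the idea of "shifts in notebook scores across versions" concrete, the following is a minimal sketch and not the paper's method. The abstract does not describe the regression models or their features, so the sketch substitutes a deliberately simple, hypothetical proxy: the share of non-empty cells that are markdown, with 0 standing in for the exploration end and 1 for the explanation end. The function names `explanation_share` and `score_shifts`, and the toy notebooks, are assumptions made for illustration; only the Jupyter notebook JSON layout (`cells`, `cell_type`, `source`) is taken as given.

```python
def explanation_share(notebook: dict) -> float:
    """Hypothetical proxy score (NOT the paper's regression model):
    fraction of non-empty cells that are markdown cells."""
    cells = [
        cell for cell in notebook.get("cells", [])
        # "source" may be a string or a list of strings in notebook JSON
        if "".join(cell.get("source", "")).strip()
    ]
    if not cells:
        return 0.0
    markdown = sum(1 for cell in cells if cell.get("cell_type") == "markdown")
    return markdown / len(cells)


def score_shifts(versions: list) -> list:
    """Score each version in commit order and return consecutive differences."""
    scores = [explanation_share(nb) for nb in versions]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def make_notebook(cell_types: list) -> dict:
    """Build a toy notebook dict with one non-empty cell per given type."""
    return {
        "cells": [{"cell_type": t, "source": ["x = 1\n"]} for t in cell_types]
    }


# Three toy versions of one notebook, oldest first (illustrative data only).
versions = [
    make_notebook(["code", "code", "code", "code"]),
    make_notebook(["markdown", "code", "code", "code"]),
    make_notebook(["markdown", "code", "markdown", "code"]),
]

shifts = score_shifts(versions)
print(shifts)  # positive values mean a move toward the explanation end
```

Under this proxy, a positive shift between two versions indicates that the author added explanatory text relative to code, and a negative shift indicates a return to exploration. A real analysis would replace `explanation_share` with a fitted model.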

