Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time

09/06/2022
by Deepthi Raghunandan, et al.
University of Maryland
University of Washington

Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measures how sensemaking behaviors shift over time between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process, and thus our ability to design tools that fully support it. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning each a score representing its position within the sensemaking spectrum. Third, we use our regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors participate in a diverse range of sensemaking tasks over time, such as annotation, branching analysis, and documentation. Finally, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.
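The idea of scoring each notebook version and then measuring shifts between versions can be illustrated with a minimal sketch. This is not the authors' regression model: the two features (share of markdown cells, share of comment lines in code cells), the weights in `HYPOTHETICAL_WEIGHTS`, and the function names are all assumptions made for illustration. The only thing taken as given is the standard Jupyter notebook JSON layout, in which a notebook holds a `cells` list and each cell has a `cell_type` and a `source`.

```python
# Hypothetical weights, chosen for illustration only (not taken from the paper).
# Under this assumption a higher score leans toward "explanation" and a lower
# score toward "exploration".
HYPOTHETICAL_WEIGHTS = {"markdown_ratio": 0.7, "comment_ratio": 0.3}


def _source_lines(cell):
    """Return the non-blank source lines of a cell (source may be str or list)."""
    src = cell.get("source", "")
    text = "".join(src) if isinstance(src, list) else src
    return [line for line in text.splitlines() if line.strip()]


def extract_features(notebook):
    """Compute two illustrative features from a notebook given as a dict."""
    cells = notebook.get("cells", [])
    if not cells:
        return {"markdown_ratio": 0.0, "comment_ratio": 0.0}
    markdown = sum(1 for c in cells if c.get("cell_type") == "markdown")
    code_lines = [
        line
        for c in cells
        if c.get("cell_type") == "code"
        for line in _source_lines(c)
    ]
    comments = sum(1 for line in code_lines if line.lstrip().startswith("#"))
    return {
        "markdown_ratio": markdown / len(cells),
        "comment_ratio": comments / len(code_lines) if code_lines else 0.0,
    }


def score_notebook(notebook):
    """Weighted sum of the features: a stand-in for a fitted regression model."""
    feats = extract_features(notebook)
    return sum(HYPOTHETICAL_WEIGHTS[k] * feats[k] for k in HYPOTHETICAL_WEIGHTS)


def score_shifts(versions):
    """Score each version in commit order and return consecutive differences."""
    scores = [score_notebook(v) for v in versions]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


if __name__ == "__main__":
    # Two made-up versions of the same notebook, oldest first.
    v1 = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "y = 2\n"]}]}
    v2 = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Load data\n"]},
            {"cell_type": "code", "source": ["# read input\n", "x = 1\n"]},
        ]
    }
    print(score_shifts([v1, v2]))
```

In practice the versions would be loaded with `json.load` from `.ipynb` files checked out at successive commits, and the hand-picked weights would be replaced by coefficients fitted on labeled notebooks.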
