Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks

03/23/2023
by Frederick Choi, et al.

Data science workflows are human-centered processes involving on-demand programming and analysis. Programmable, interactive interfaces such as widgets embedded within computational notebooks suit these workflows, but they lack robust state management and do not support user-defined customization of their interactive components. These gaps hinder workflow reusability and transparency and limit the scope of exploration available to end users. In response, we developed MAGNETON, a framework for authoring interactive widgets within computational notebooks that enables transparent, reusable, and customizable data science workflows. The framework enhances existing widgets to support fine-grained interaction history management, reusable states, and user-defined customizations. We conducted three case studies on a real-world knowledge graph construction and serving platform to evaluate the effectiveness of these widgets. Based on our observations, we discuss the future implications of employing MAGNETON widgets in general-purpose data science workflows.
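
To make "fine-grained interaction history management" and "reusable states" concrete, here is a minimal Python sketch. It is not MAGNETON's API: the class `HistoryWidget` and its methods `register`, `dispatch`, and `restore` are hypothetical names chosen for illustration. The sketch assumes each interaction is a named action that transforms the widget state, and that each resulting state is recorded so it can be restored later.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A handler takes (copy of current state, payload) and returns the new state.
Handler = Callable[[Dict[str, Any], Any], Dict[str, Any]]


@dataclass
class HistoryWidget:
    """Hypothetical widget model (illustration only, not MAGNETON's API)."""

    state: Dict[str, Any] = field(default_factory=dict)
    history: List[Dict[str, Any]] = field(default_factory=list)
    handlers: Dict[str, Handler] = field(default_factory=dict)

    def register(self, action: str, handler: Handler) -> None:
        # User-defined customization: attach a handler to a named action.
        self.handlers[action] = handler

    def dispatch(self, action: str, payload: Any = None) -> Dict[str, Any]:
        # Apply the handler to a copy, so earlier snapshots are never mutated.
        new_state = self.handlers[action](deepcopy(self.state), payload)
        # Record the interaction and the state it produced.
        self.history.append(
            {"action": action, "payload": payload, "state": deepcopy(new_state)}
        )
        self.state = new_state
        return self.state

    def restore(self, index: int) -> Dict[str, Any]:
        # Reuse a previously recorded state without replaying interactions.
        self.state = deepcopy(self.history[index]["state"])
        return self.state


# Usage: select two items, clear the selection, then return to the second state.
widget = HistoryWidget(state={"selected": []})
widget.register("select", lambda s, p: {**s, "selected": s["selected"] + [p]})
widget.register("clear", lambda s, p: {**s, "selected": []})
widget.dispatch("select", "node_a")
widget.dispatch("select", "node_b")
widget.dispatch("clear")
widget.restore(1)
```

Storing a snapshot per interaction keeps the log inspectable (transparency) and lets any earlier state be reapplied (reusability). A real framework would likely also need to handle serialization and notebook kernel restarts, which this sketch leaves out.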

