Code Duplication and Reuse in Jupyter Notebooks

05/27/2020
by Andreas Koenzen, et al.

Duplicating one's own code makes it faster to write software. This expediency is particularly valuable for users of computational notebooks, because duplication lets them quickly test hypotheses and iterate over data. In this paper, we explore how much, how, and from where code duplication occurs in computational notebooks, and identify potential barriers to code reuse. Previous work on computational notebooks describes developers' motivations for reuse and duplication, but does not show how much reuse occurs or which barriers developers face when reusing code. To address this gap, we first analyzed GitHub repositories for code duplicates contained in a repository's Jupyter notebooks, and then conducted an observational user study of code reuse, in which participants solved specific tasks using notebooks. Our findings reveal that repositories in our sample have a mean self-duplication rate of 7.6%. In the user study, however, participants tended not to duplicate their own code, preferring to reuse code from online sources.
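To make the idea of a repository-level self-duplication rate concrete, the following is a minimal sketch and not the authors' method. It assumes a simple definition that the abstract does not give: the fraction of non-empty code cells whose whitespace-normalized source exactly repeats another code cell in the same repository. The function names (`normalize`, `code_cells`, `self_duplication_rate`) are hypothetical, and the notebooks are assumed to be nbformat 4 JSON strings.

```python
import json
from collections import Counter


def normalize(source):
    """Drop trailing whitespace and blank lines so trivially different cells compare equal."""
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(line for line in lines if line)


def code_cells(notebook_json):
    """Return the normalized source of every non-empty code cell in one notebook."""
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows "source" to be a string or a list of line strings.
        if isinstance(src, list):
            src = "".join(src)
        text = normalize(src)
        if text:
            cells.append(text)
    return cells


def self_duplication_rate(notebooks):
    """Assumed definition: share of code cells that repeat another cell in the repository."""
    counts = Counter()
    for nb_json in notebooks:
        counts.update(code_cells(nb_json))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each group of n identical cells contributes n - 1 duplicates.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / total
```

Exact matching after normalization only catches verbatim copies; detecting near-duplicates (renamed variables, reordered statements) would need a dedicated clone-detection approach, which this sketch does not attempt.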
