ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells

12/29/2021
by Sergey Titov, et al.

Jupyter notebooks represent a unique format for programming: a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a programming function. Similar to functions, Jupyter cells should strive to contain singular, self-contained actions. At the same time, research shows that real-world notebooks fail to do so and suffer from a lack of proper structure. To combat this, we propose ReSplit, an algorithm for automatically re-splitting cells in Jupyter notebooks. The algorithm analyzes definition-usage chains in the notebook and consists of two parts: merging and splitting the cells. We ran the algorithm on a large corpus of notebooks to evaluate its performance and its overall effect on notebooks, and had human experts evaluate it: we showed them several notebooks in their original and re-split forms. In 29.5% of cases, a re-split notebook was selected as the preferred way of perceiving the code. We analyze what influenced this decision and describe several individual cases in detail.
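
To make the idea of definition-usage chains between cells concrete, here is a minimal Python sketch. It is not the ReSplit algorithm itself, whose merging and splitting rules are not given in this abstract. The function names `names_defined_and_used` and `def_use_links` are hypothetical, and the analysis is deliberately coarse: it works at whole-cell granularity, treats every assigned, imported, or declared name as a definition, and ignores scoping.

```python
import ast


def names_defined_and_used(source):
    """Return (defined, used) sets of names for one cell's source code."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def def_use_links(cells):
    """Map each cell index to the earlier cells whose definitions it uses.

    Assumption (illustrative only): a use is linked to the most recent
    earlier cell that defined the same name.
    """
    last_def = {}  # name -> index of the most recent defining cell
    links = {}
    for i, src in enumerate(cells):
        defined, used = names_defined_and_used(src)
        links[i] = sorted({last_def[n] for n in used if n in last_def})
        for n in defined:
            last_def[n] = i
    return links


# Hypothetical four-cell notebook, one string per code cell.
cells = [
    "import pandas as pd",
    "df = pd.DataFrame({'a': [1, 2]})",
    "total = df['a'].sum()",
    "print(total)",
]
print(def_use_links(cells))
```

Links of this kind are one plausible input to a merge or split decision: a cell whose only dependency is the cell directly before it is a candidate for merging, while a cell containing statements with no links between them is a candidate for splitting. Whether ReSplit uses exactly these criteria is not stated here.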
