ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells

12/29/2021
by   Sergey Titov, et al.
0

Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a programming function. Similar to functions, Jupyter cells should strive to contain singular, self-contained actions. At the same time, research shows that real-world notebooks fail to do so and suffer from the lack of proper structure. To combat this, we propose ReSplit, an algorithm for an automatic re-splitting of cells in Jupyter notebooks. The algorithm analyzes definition-usage chains in the notebook and consists of two parts - merging and splitting the cells. We ran the algorithm on a large corpus of notebooks to evaluate its performance and its overall effect on notebooks, and evaluated it by human experts: we showed them several notebooks in their original and the re-split form. In 29.5 preferred way of perceiving the code. We analyze what influenced this decision and describe several individual cases in detail.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset