ElasticNotebook: Enabling Live Migration for Computational Notebooks (Technical Report)

09/20/2023
by Zhaoheng Li, et al.

Computational notebooks (e.g., Jupyter, Google Colab) are widely used for interactive data science and machine learning. In these frameworks, users start a session and then execute cells (i.e., sets of statements) to create variables, train models, visualize results, etc. Unfortunately, existing notebook systems do not offer live migration: when a notebook launches on a new machine, it loses its state, preventing users from continuing their tasks from where they left off. This is because, unlike a DBMS, notebook sessions rely directly on underlying kernels (e.g., Python/R interpreters) without an additional data management layer. Existing techniques for preserving state, such as copying all variables or OS-level checkpointing, are unreliable (they often fail), inefficient, and platform-dependent. Also, re-running code from scratch can be highly time-consuming. In this paper, we introduce a new notebook system, ElasticNotebook, that offers live migration via checkpointing/restoration using a novel mechanism that is reliable, efficient, and platform-independent. Specifically, by observing all cell executions via transparent, lightweight monitoring, ElasticNotebook can find a reliable and efficient way (i.e., a replication plan) to reconstruct the original session state, considering variable-cell dependencies, observed runtime, variable sizes, etc. To this end, our new graph-based optimization problem finds how to reconstruct all variables efficiently from a subset of variables that can be transferred across machines. We show that ElasticNotebook reduces end-to-end migration and restoration times by 85% [...] (Kaggle, JWST, and Tutorial) of notebooks, with negligible runtime and memory overheads of <2.5% [...].
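To make the idea of a replication plan concrete, the following is a minimal sketch and not the paper's algorithm: it replaces the graph-based optimization with an exhaustive search over a tiny, hypothetical session. The cell names, runtimes, variable sizes, bandwidth, and the cost model (transfer time plus re-execution time) are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy session: each cell records its observed runtime (seconds),
# the variables it reads, and the variables it creates.
CELLS = {
    "c1": {"runtime": 2.0, "reads": [], "writes": ["df"]},
    "c2": {"runtime": 60.0, "reads": ["df"], "writes": ["model"]},
    "c3": {"runtime": 1.0, "reads": ["df"], "writes": ["gen"]},
}
# Variable sizes in MB; None marks a variable assumed to be unserializable,
# so it can only be rebuilt by re-running the cell that created it.
SIZES = {"df": 500.0, "model": 10.0, "gen": None}
BANDWIDTH = 100.0  # MB/s between machines (assumed)


def cells_to_rerun(stored):
    """Return the cells that must be re-executed to rebuild every variable not in `stored`."""
    producer = {v: c for c, info in CELLS.items() for v in info["writes"]}
    needed = [v for v in SIZES if v not in stored]
    rerun = set()
    while needed:
        cell = producer[needed.pop()]
        if cell in rerun:
            continue
        rerun.add(cell)
        # Re-running a cell requires its inputs, which may themselves need rebuilding.
        needed.extend(v for v in CELLS[cell]["reads"] if v not in stored)
    return rerun


def plan_cost(stored):
    """Estimated migration time: transfer the stored variables, then re-run the required cells."""
    transfer = sum(SIZES[v] for v in stored) / BANDWIDTH
    recompute = sum(CELLS[c]["runtime"] for c in cells_to_rerun(stored))
    return transfer + recompute


def best_plan():
    """Try every subset of serializable variables and keep the cheapest one."""
    storable = [v for v, size in SIZES.items() if size is not None]
    candidates = (
        frozenset(subset)
        for r in range(len(storable) + 1)
        for subset in combinations(storable, r)
    )
    return min(candidates, key=plan_cost)


if __name__ == "__main__":
    plan = best_plan()
    print("store:", sorted(plan))
    print("re-run:", sorted(cells_to_rerun(plan)))
    print("estimated seconds:", plan_cost(plan))
```

In this toy session, the cheapest plan transfers only the small but expensive-to-train `model` and re-runs the cheap cells that rebuild `df` and the unserializable `gen`. Exhaustive search grows exponentially with the number of variables, which is one reason a graph-based formulation such as the one the abstract describes is needed at realistic scale.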

