CHEX: Multiversion Replay with Ordered Checkpoints

02/17/2022
by Naga Nithin Manne, et al.

In scientific computing and data science disciplines, it is often necessary to share application workflows and repeat results. Current tools containerize application workflows and share the resulting container for repeating results. Containerization improves the sharing of results, but it does not improve the efficiency of replay. In this paper, we present the multiversion replay problem, which arises when multiple versions of an application are containerized and each version must be replayed to repeat results. To avoid executing each version separately, we develop CHEX, which checkpoints program state and determines when it is permissible to reuse program state across versions. It does so using system call-based execution lineage. Our capability to identify common computations across versions enables us to consider optimizing replay using an in-memory cache, based on a checkpoint-restore-switch system. We show that the multiversion replay problem is NP-hard, and propose efficient heuristics for it. CHEX reduces overall replay time by sharing common computations but avoids storing a large number of checkpoints. We demonstrate that CHEX maintains lightweight package sharing, and improves the total time of multiversion replay by 50
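The idea of sharing common computations across versions can be illustrated with a minimal sketch. This is not CHEX's implementation: it assumes each version is given as a list of hypothetical `(step_id, cost)` pairs, that two steps with the same `step_id` reached through the same prefix produce identical program state (the property CHEX establishes through system call-based lineage), and that every checkpoint fits in the cache. The names `Node`, `build_execution_tree`, `naive_cost`, and `shared_cost` are illustrative only. With a bounded cache, choosing which checkpoints to keep is the hard part of the problem, and this sketch does not address it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One computation step in the merged execution tree (hypothetical structure)."""
    cost: float = 0.0
    children: dict = field(default_factory=dict)


def build_execution_tree(versions):
    """Merge versions into a prefix tree so that shared prefixes appear once.

    versions: list of versions, each a list of (step_id, cost) pairs.
    """
    root = Node()
    for steps in versions:
        node = root
        for step_id, cost in steps:
            # Reuse the existing child when this prefix was already seen.
            node = node.children.setdefault(step_id, Node(cost))
    return root


def naive_cost(versions):
    """Total cost when every version is replayed from scratch."""
    return sum(cost for steps in versions for _, cost in steps)


def shared_cost(node):
    """Total cost when each shared step runs once (unbounded cache assumed)."""
    return node.cost + sum(shared_cost(child) for child in node.children.values())


# Three hypothetical versions that share a common prefix.
versions = [
    [("load", 10), ("clean", 5), ("fit_a", 20)],
    [("load", 10), ("clean", 5), ("fit_b", 30)],
    [("load", 10), ("plot", 2)],
]
tree = build_execution_tree(versions)
print(naive_cost(versions), shared_cost(tree))
```

In this toy input, separate replay costs 92 units while replay over the merged tree costs 67, because `load` and `clean` are executed once and their state is reused at each branch point.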


research
07/11/2011

Efficient Deterministic Replay Using Complete Race Detection

Data races can significantly affect the executions of multi-threaded pro...
research
08/13/2023

Towards Efficient Record and Replay: A Case Study in WeChat

WeChat, a widely-used messenger app boasting over 1 billion monthly acti...
research
12/06/2021

Virtual Replay Cache

Return caching is a recent strategy that enables efficient minibatch tra...
research
04/09/2022

A Study of Using Cepstrogram for Countermeasure Against Replay Attacks

In this paper, we investigate the properties of the cepstrogram and demo...
research
05/19/2022

Transformer with Memory Replay

Transformers achieve state-of-the-art performance for natural language p...
research
07/04/2022

Progressive Latent Replay for efficient Generative Rehearsal

We introduce a new method for internal replay that modulates the frequen...
research
04/15/2020

Implementing Software Resiliency in HPX for Extreme Scale Computing

Exceptions and errors occurring within mission critical applications due...
