Engineering Record And Replay For Deployability: Extended Technical Report

05/16/2017
by Robert O'Callahan et al.

The ability to record and replay program executions with low overhead enables many applications, such as reverse-execution debugging, debugging of hard-to-reproduce test failures, and "black box" forensic analysis of failures in deployed systems. Existing record-and-replay approaches limit deployability by recording an entire virtual machine (heavyweight), modifying the OS kernel (adding deployment and maintenance costs), requiring pervasive code instrumentation (imposing significant performance and complexity overhead), or modifying compilers and runtime systems (limiting generality). We investigated whether it is possible to build a practical record-and-replay system that avoids all of these issues. The answer turns out to be yes, provided that the CPU and operating system meet certain non-obvious constraints. Fortunately, modern Intel CPUs, Linux kernels, and user-space frameworks do meet these constraints, although this has only recently become true. With some novel optimizations, our system, rr, records and replays real-world low-parallelism workloads with low overhead, using an entirely user-space implementation on stock hardware, compilers, runtimes, and operating systems. rr forms the basis of an open-source reverse-execution debugger that sees significant use in practice. We present the design and implementation of rr, describe its performance on a variety of workloads, and identify the constraints on hardware and operating system design required to support our approach.

