Checkpoint Placement for Systematic Fault-Injection Campaigns

08/10/2023
by Christian Dietrich, et al.

Shrinking hardware structures and decreasing operating voltages lead to an increasing number of transient hardware faults, which thus become a core problem for safety-critical systems. Systematic fault injection (FI), in which a program under test is systematically stressed with faults, provides an in-depth analysis of its resilience in the presence of faults. However, FI campaigns require many independent injection experiments, which together add up to long run times, especially if we aim for high coverage of the fault space. One cost factor is the forwarding phase: the time required to bring the system under test into the fault-free state at injection time. A common technique to speed up forwarding is to take checkpoints of the fault-free system state at fixed points in time. In this paper, we show that the placement of checkpoints has a significant influence on the required forwarding cycles, especially if faults are placed non-uniformly on the time axis. To this end, we discuss the checkpoint-selection problem in general, formalize it as a maximum-weight reward path problem in graphs, propose an ILP formulation and a dynamic programming algorithm that find the optimal solution, and provide a heuristic checkpoint-selection method based on a genetic algorithm. Applied to the MiBench benchmark suite, our approach consistently reduces the forwarding-phase cycles by at least 88 percent and by up to 99.934 percent when placing 16 checkpoints.
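To illustrate why placement matters, the following is a minimal sketch of a dynamic program for a simplified version of the checkpoint-selection problem. It is not the paper's formulation: the cost model and all names (`place_checkpoints`, `injections`, `k`) are assumptions made for this example. It assumes that each experiment injects at a known cycle, that the boot state at cycle 0 is always available for free, and that the forwarding cost of one experiment is the number of cycles between the nearest earlier (or equal) checkpoint and the injection cycle. Under this model, shifting a checkpoint forward to the next injection cycle never increases the cost, so the sketch only considers injection cycles as candidate positions.

```python
from itertools import accumulate


def place_checkpoints(injections, k):
    """Choose at most k checkpoint cycles that minimize total forwarding cycles.

    injections: dict mapping injection cycle -> number of experiments (assumed model)
    k:          number of checkpoints in addition to the free boot state at cycle 0
    Returns (total_forwarding_cycles, sorted list of checkpoint cycles).
    """
    times = sorted(injections)
    n = len(times)
    weights = [injections[t] for t in times]

    # Prefix sums of the weights and of weight * cycle, for O(1) segment costs.
    w_sum = [0] + list(accumulate(weights))
    wt_sum = [0] + list(accumulate(w * t for w, t in zip(weights, times)))

    def segment_cost(start, i, j):
        # Forwarding cycles for injections times[i:j] when resuming from `start`.
        return (wt_sum[j] - wt_sum[i]) - start * (w_sum[j] - w_sum[i])

    inf = float("inf")
    # best[c][j]: minimal cost of the first j injection cycles with c checkpoints.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    choice = [[None] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        best[0][j] = segment_cost(0, 0, j)

    for c in range(1, k + 1):
        for j in range(n + 1):
            best[c][j] = best[c - 1][j]  # option: leave the c-th checkpoint unused
            for i in range(j):
                # Option: last checkpoint at times[i], serving injections i..j-1.
                cand = best[c - 1][i] + segment_cost(times[i], i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    choice[c][j] = i

    # Walk the recorded choices backwards to recover the checkpoint cycles.
    checkpoints = []
    c, j = k, n
    while c > 0:
        i = choice[c][j]
        if i is not None:
            checkpoints.append(times[i])
            j = i
        c -= 1
    return best[k][n], sorted(checkpoints)


# Hypothetical campaign: one experiment early, twenty clustered late.
campaign = {100: 1, 1000: 10, 1010: 10}
print(place_checkpoints(campaign, 0))
print(place_checkpoints(campaign, 1))
```

In this hypothetical campaign, a single checkpoint placed at the start of the late cluster removes almost all forwarding work, whereas a checkpoint at a fixed early point would help very little. This is the non-uniform case the abstract refers to. The sketch runs in O(k n^2) time for n distinct injection cycles and ignores the cost of creating and restoring checkpoints.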

