ZOFI: Zero-Overhead Fault Injection Tool for Fast Transient Fault Coverage Analysis

06/22/2019
by Vasileios Porpodas, et al.

The experimental evaluation of fault-tolerance studies relies on tools that inject errors while programs are running and then monitor the execution and the output for faulty behavior. In particular, the established methodology in software-based transient-fault reliability studies involves running each workload hundreds or thousands of times and injecting a random bit flip in each run. The majority of such studies rely on custom-built fault-injection tools that are based on either a modified processor simulator or a code instrumentation framework. Such tools are non-trivial to develop and are usually orders of magnitude slower than native execution. In this paper we present ZOFI, a novel timing-based fault-injection tool intended for fault-coverage studies of transient faults. ZOFI is a zero-overhead tool, meaning that the analyzed workload runs at native speed. This is orders of magnitude faster than common approaches that are designed around simulators or code instrumentation tools. ZOFI is free software and is available at https://github.com/vporpo/zofi.
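The campaign methodology described above (many runs, one random bit flip per run, outcome compared against a fault-free reference) can be illustrated with a small sketch. This is not ZOFI's implementation: ZOFI injects faults into a natively running binary, whereas the toy below simulates the fault inside a pure-Python workload. The names `TABLE`, `workload`, and `run_campaign`, the 16-bit fault width, and the three outcome labels are all assumptions made for illustration.

```python
import random

# Hypothetical lookup table: maps a byte value (0..255) to one of 4 buckets.
TABLE = [v // 64 for v in range(256)]


def workload(data, fault=None):
    """Toy workload: sum the bucket index of every element in `data`.

    `fault` is an optional (step, bit) pair. When given, bit `bit` of the
    element processed at iteration `step` is inverted before it is used,
    which models a single transient bit flip.
    """
    total = 0
    for step, x in enumerate(data):
        if fault is not None and fault[0] == step:
            x ^= 1 << fault[1]  # inject the bit flip
        total += TABLE[x]  # raises IndexError if x left the 0..255 range
    return total


def run_campaign(data, runs, seed=0):
    """Run `runs` experiments with one random bit flip each and classify them."""
    rng = random.Random(seed)
    golden = workload(data)  # fault-free reference output
    outcomes = {"masked": 0, "corrupted": 0, "crashed": 0}
    for _ in range(runs):
        # Pick a random injection point and a random bit (assumed 16-bit value).
        fault = (rng.randrange(len(data)), rng.randrange(16))
        try:
            result = workload(data, fault)
        except IndexError:
            outcomes["crashed"] += 1
            continue
        if result == golden:
            outcomes["masked"] += 1  # fault had no visible effect
        else:
            outcomes["corrupted"] += 1  # output differs from the reference
    return outcomes


if __name__ == "__main__":
    print(run_campaign([10, 200, 64, 130], runs=1000))
```

In this toy, flips in the low six bits leave the bucket unchanged and are masked, flips in bits 6 and 7 change the output, and flips in higher bits push the index out of range and crash the run. A real campaign would draw the same kind of per-outcome statistics from actual program executions.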


