TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications

04/03/2020
by Zitao Chen, et al.

As machine learning (ML) sees increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has grown in importance. Prior studies have proposed efficient error-resilience techniques (e.g., selective instruction duplication), but applying these techniques requires a detailed understanding of the application's resilience. In this work, we present TensorFI, a high-level fault injection (FI) framework for TensorFlow-based applications. TensorFI can inject both hardware and software faults into general TensorFlow programs. It is configurable, flexible, easy to use, and portable, and it can be integrated into existing TensorFlow programs to assess their resilience to different fault types (e.g., faults in particular operators). We use TensorFI to evaluate the resilience of 12 ML programs, including DNNs used in the autonomous vehicle domain. Our tool is publicly available at https://github.com/DependableSystemsLab/TensorFI.
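To make the idea of injecting a hardware fault into an operator's output concrete, the following is a minimal sketch in plain Python. It does not use TensorFI's API or TensorFlow; the names `flip_bit` and `inject_fault` are hypothetical, and the single-bit-flip fault model over a float32 value is an assumption chosen for illustration, not a description of how TensorFI is implemented.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of value's float32 encoding."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles the chosen bit; reinterpret the result as float32 again.
    (faulty,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return faulty


def inject_fault(outputs, rng):
    """Copy an operator's output and flip one random bit in one random element.

    Returns the faulty copy plus the element index and bit position chosen,
    so an experiment can log where the fault landed.
    """
    faulty = list(outputs)
    idx = rng.randrange(len(faulty))
    bit = rng.randrange(32)
    faulty[idx] = flip_bit(faulty[idx], bit)
    return faulty, idx, bit


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    clean = [1.0, 2.0, 3.0]  # stand-in for the output of some operator
    faulty, idx, bit = inject_fault(clean, rng)
    print("clean: ", clean)
    print("faulty:", faulty, "(element", idx, "bit", bit, ")")
```

A resilience experiment would then run the rest of the program on the faulty values and compare the final prediction with the fault-free one. Note that a flip in the exponent bits can produce very large values, infinity, or NaN, which is part of why different bit positions can affect a program so differently.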

research
07/01/2019

ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection

The safety and resilience of fully autonomous vehicles (AVs) are of sign...
research
07/01/2019

Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors

Fully autonomous vehicles (AVs), i.e., AVs with autonomy level 5, are ex...
research
12/07/2018

PARIS: Predicting Application Resilience Using Machine Learning

Extreme-scale scientific applications can be more vulnerable to soft err...
research
09/20/2023

Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework

Creating resilient machine learning (ML) systems has become necessary to...
research
09/05/2019

TFCheck : A TensorFlow Library for Detecting Training Issues in Neural Network Programs

The increasing inclusion of Machine Learning (ML) models in safety criti...
research
03/13/2023

CHESS: A Framework for Evaluation of Self-adaptive Systems based on Chaos Engineering

There is an increasing need to assess the correct behavior of self-adapt...
research
10/17/2021

Characterizing and Improving the Resilience of Accelerators in Autonomous Robots

Motion planning is a computationally intensive and well-studied problem ...
