Characterizing a Neutron-Induced Fault Model for Deep Neural Networks

Evaluating the reliability of Deep Neural Networks (DNNs) executed on Graphics Processing Units (GPUs) is a challenging problem, since the hardware architecture is highly complex and the software frameworks are composed of many layers of abstraction. While software-level fault injection is a common and fast way to evaluate the reliability of complex applications, it may produce unrealistic results, since it has limited access to the hardware resources and the adopted fault models may be too naive (i.e., single and double bit flips). Conversely, physical fault injection with a neutron beam provides realistic error rates but lacks visibility into fault propagation. This paper proposes a characterization of the DNN fault model that combines neutron beam experiments with fault injection at the software level. We exposed GPUs running General Matrix Multiplication (GEMM) and DNNs to a neutron beam to measure their error rate. On DNNs, we observe that the percentage of critical errors can be up to 61%, and we show that ECC is ineffective in reducing critical errors. We then performed a complementary software-level fault injection, using fault models derived from RTL simulations. Our results show that, when complex fault models are injected, the YOLOv3 misdetection rate is very close to the rate measured in beam experiments, which is 8.66x higher than the rate measured with fault injection using only single-bit flips.
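To make the "naive" fault models mentioned in the abstract concrete, the following is a minimal sketch of a single- and double-bit flip applied to one 32-bit floating-point value. It is an illustration only: the function name `flip_bits` and the bit numbering (0 is the least significant mantissa bit, 31 is the sign bit) are assumptions made here, not the authors' injector, and the RTL-derived fault models used in the paper are more complex and are not reproduced.

```python
import struct


def flip_bits(value: float, bits: list[int]) -> float:
    """Return `value` (rounded to float32) with the given bit positions flipped.

    Hypothetical helper for illustration. Bit 0 is the mantissa LSB,
    bits 23-30 are the exponent, and bit 31 is the sign.
    """
    mask = 0
    for bit in bits:
        if not 0 <= bit < 32:
            raise ValueError("bit positions must be in [0, 31]")
        mask ^= 1 << bit
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Apply the fault with one XOR, then reinterpret back as float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ mask))
    return faulty


if __name__ == "__main__":
    print(flip_bits(1.0, [31]))      # single-bit flip in the sign bit
    print(flip_bits(0.5, [23]))      # single-bit flip in the exponent LSB
    print(flip_bits(1.0, [30, 31]))  # double-bit flip: exponent MSB and sign
```

A flip in a high exponent bit can turn an ordinary weight into an infinity, while a flip in a low mantissa bit barely changes the value, which is one reason the choice of fault model strongly affects the measured error rate.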


research
05/31/2023

Special Session: Approximation and Fault Resiliency of DNN Accelerators

Deep Learning, and in particular, Deep Neural Network (DNN) is nowadays ...
research
06/17/2022

Experimental evaluation of neutron-induced errors on a multicore RISC-V platform

RISC-V architectures have gained importance in the last years due to the...
research
07/31/2022

enpheeph: A Fault Injection Framework for Spiking and Compressed Deep Neural Networks

Research on Deep Neural Networks (DNNs) has focused on improving perform...
research
04/28/2020

Estimating Silent Data Corruption Rates Using a Two-Level Model

High-performance and safety-critical system architects must accurately e...
research
06/20/2023

MRFI: An Open Source Multi-Resolution Fault Injection Framework for Neural Network Processing

To ensure resilient neural network processing on even unreliable hardwar...
research
02/03/2020

Towards Explainable Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks

Non-volatile memory, such as resistive RAM (RRAM), is an emerging energy...
research
03/13/2023

DeepVigor: Vulnerability Value Ranges and Factors for DNNs' Reliability Assessment

Deep Neural Networks (DNNs) and their accelerators are being deployed ev...
