Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs

04/19/2021
by Jack Kosaian, et al.

Neural networks (NNs) are increasingly employed in domains that require high reliability, such as scientific computing and safety-critical systems, as well as in environments more prone to unreliability (e.g., soft errors), such as on spacecraft. As recent work has shown that faults in NN inference can lead to mispredictions and safety hazards, it is critical to impart fault tolerance to NN inference. Algorithm-based fault tolerance (ABFT) is emerging as an appealing approach for efficient fault tolerance in NNs. In this work, we identify new, unexploited opportunities for low-overhead ABFT for NN inference: current inference-optimized GPUs have high compute-to-memory-bandwidth ratios, while many layers of current and emerging NNs have low arithmetic intensity. This leaves many convolutional and fully-connected layers in NNs memory-bandwidth-bound. These layers thus exhibit stalls in computation that could be filled by redundant execution, but that current approaches to ABFT for NN inference cannot exploit. To reduce execution-time overhead for such memory-bandwidth-bound layers, we first investigate thread-level ABFT schemes for inference-optimized GPUs that exploit this fine-grained compute underutilization. We then propose intensity-guided ABFT, an adaptive, arithmetic-intensity-guided approach to ABFT that selects the best ABFT scheme for each individual layer between traditional approaches to ABFT, which are suitable for compute-bound layers, and thread-level ABFT, which is suitable for memory-bandwidth-bound layers. Through this adaptive approach, intensity-guided ABFT reduces execution-time overhead by 1.09–5.3× across a variety of NNs, lowering the cost of fault tolerance for current and future NN inference workloads.
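To make the two ingredients of the abstract concrete, here is a minimal NumPy sketch (not from the paper; names, tolerances, and the machine-balance figure below are illustrative assumptions): a roofline-style test for whether a layer is memory-bandwidth-bound, and the classic checksum-based ABFT for matrix multiplication that "traditional approaches to ABFT" refers to, in which checksum rows/columns are appended to the operands so that errors in the product can be detected by comparing the result's checksums against its actual row and column sums.

```python
import numpy as np

# Roofline-style check: a layer is memory-bandwidth-bound when its
# arithmetic intensity (FLOPs per byte moved) falls below the GPU's
# compute-to-memory-bandwidth ratio ("machine balance").
def is_memory_bound(flops, bytes_moved, machine_balance):
    return (flops / bytes_moved) < machine_balance

# Classic checksum-based ABFT for C = A @ B: append a column-checksum
# row to A and a row-checksum column to B, multiply once, then verify
# that the checksum row/column of the result match the sums of C.
def abft_matmul(A, B, atol=1e-6):
    Ac = np.vstack([A, A.sum(axis=0, keepdims=True)])   # (m+1) x k
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # k x (n+1)
    C_full = Ac @ Br                                    # (m+1) x (n+1)
    C = C_full[:-1, :-1]
    ok = (np.allclose(C_full[-1, :-1], C.sum(axis=0), atol=atol)
          and np.allclose(C_full[:-1, -1], C.sum(axis=1), atol=atol))
    return C, ok
```

For instance, with an assumed machine balance of roughly 200 FLOPs/byte (in the ballpark of an inference-optimized GPU at reduced precision), a layer performing 10^6 FLOPs while moving 10^5 bytes has an arithmetic intensity of 10 and would be classified memory-bandwidth-bound; intensity-guided ABFT would then prefer a thread-level scheme for that layer, since its redundant work can hide in the layer's compute stalls.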


