Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs

by Jack Kosaian, et al.

Neural networks (NNs) are increasingly employed in domains that require high reliability, such as scientific computing and safety-critical systems, as well as in environments more prone to unreliability (e.g., soft errors), such as on spacecraft. Because recent work has shown that faults in NN inference can lead to mispredictions and safety hazards, it is critical to impart fault tolerance to NN inference. Algorithm-based fault tolerance (ABFT) is emerging as an appealing approach for efficient fault tolerance in NNs. In this work, we identify new, unexploited opportunities for low-overhead ABFT for NN inference: current inference-optimized GPUs have high compute-to-memory-bandwidth ratios, while many layers of current and emerging NNs have low arithmetic intensity. This leaves many convolutional and fully-connected layers in NNs memory-bandwidth-bound. These layers thus exhibit stalls in computation that could be filled by redundant execution, but that current approaches to ABFT for NN inference cannot exploit. To reduce execution-time overhead for such memory-bandwidth-bound layers, we first investigate thread-level ABFT schemes for inference-optimized GPUs that exploit this fine-grained compute underutilization. We then propose intensity-guided ABFT, an adaptive, arithmetic-intensity-guided approach that selects, for each individual layer, the better of two kinds of ABFT scheme: traditional approaches to ABFT, which are suitable for compute-bound layers, and thread-level ABFT, which is suitable for memory-bandwidth-bound layers. Through this adaptive approach, intensity-guided ABFT reduces execution-time overhead by 1.09–5.3× across a variety of NNs, lowering the cost of fault tolerance for current and future NN inference workloads.
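
As a rough illustration of the two ideas in the abstract, the sketch below shows a checksum-style ABFT check for a matrix multiplication and a per-layer choice between schemes based on arithmetic intensity. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`arithmetic_intensity`, `choose_scheme`, `checksum_ok`), the roofline-style threshold rule, the 2-byte element size, and the device numbers in the usage lines are all hypothetical. The abstract does not say how the selection is actually made or how thread-level ABFT is implemented on the GPU.

```python
import numpy as np


def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m, n] = A[m, k] @ B[k, n].

    Assumes each matrix is moved to or from memory exactly once.
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def choose_scheme(m, n, k, peak_flops, peak_bandwidth, bytes_per_elem=2):
    """Pick an ABFT scheme for one layer (assumed threshold rule).

    A layer whose arithmetic intensity is below the device's
    compute-to-memory-bandwidth ratio is treated as bandwidth-bound.
    """
    device_ratio = peak_flops / peak_bandwidth
    intensity = arithmetic_intensity(m, n, k, bytes_per_elem)
    return "thread-level" if intensity < device_ratio else "global"


def checksum_ok(a, b, c, rtol=1e-6, atol=1e-6):
    """Checksum ABFT check for c = a @ b.

    The sum of all entries of a @ b equals the dot product of the
    column sums of a and the row sums of b, so a mismatch signals a fault.
    """
    expected = a.sum(axis=0) @ b.sum(axis=1)
    return bool(np.isclose(c.sum(), expected, rtol=rtol, atol=atol))


# Illustrative device numbers only (hypothetical placeholders).
PEAK_FLOPS = 65e12       # FLOP/s
PEAK_BANDWIDTH = 320e9   # bytes/s

# A batch-1 fully-connected layer has low arithmetic intensity.
small_layer = choose_scheme(1, 1000, 1000, PEAK_FLOPS, PEAK_BANDWIDTH)
# A large square matrix multiplication has high arithmetic intensity.
large_layer = choose_scheme(8192, 8192, 8192, PEAK_FLOPS, PEAK_BANDWIDTH)

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
c = a @ b
clean_result = checksum_ok(a, b, c)

c_faulty = c.copy()
c_faulty[0, 0] += 1.0    # simulate a single corrupted output value
faulty_result = checksum_ok(a, b, c_faulty)
```

Under these assumptions the small layer falls below the device ratio and is assigned the thread-level scheme, while the large one is assigned the traditional (here called "global") scheme. The checksum passes on the clean product and fails once a single output is corrupted.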



