Algorithm-Based Fault Tolerance for Convolutional Neural Networks

03/27/2020
by Kai Zhao, et al.

Convolutional neural networks (CNNs) are increasingly important for solving challenging and critical problems in many fields. CNN inference applications have been deployed in safety-critical systems, which may suffer from soft errors caused by high-energy particles, high temperature, or abnormal voltage. Ensuring the stability of the CNN inference process against soft errors is therefore critical. Traditional fault tolerance methods are not suitable for CNN inference: error-correcting codes cannot protect computational components, instruction duplication techniques incur high overhead, and existing algorithm-based fault tolerance (ABFT) schemes cannot protect all convolution implementations. In this paper, we focus on how to protect the CNN inference process against soft errors as efficiently as possible, with the following three contributions. (1) We propose several systematic ABFT schemes based on checksum techniques and analyze their pros and cons thoroughly. Unlike traditional ABFT based on matrix-matrix multiplication, our schemes support any convolution implementation. (2) We design a novel workflow integrating all the proposed schemes to obtain a high detection/correction ability with limited total runtime overhead. (3) We perform our evaluation using ImageNet with well-known CNN models including AlexNet, VGG-19, ResNet-18, and YOLOv2. Experimental results demonstrate that our implementation can handle soft errors with very limited runtime overhead (4
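To make the idea of a checksum-based check on a convolution concrete, here is a minimal sketch in Python with NumPy. It is not one of the paper's schemes, which the abstract does not spell out. It only illustrates the general principle that convolution is linear in its filters, so the sum of all output channels should equal the convolution of the input with the sum of all filters. The names `conv2d` and `checksum_ok`, the tensor layouts, and the tolerances are assumptions made for this illustration.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' convolution (cross-correlation), stride 1, no padding.

    x: input of shape (C, H, W); w: filters of shape (M, C, R, S).
    Returns an output of shape (M, H-R+1, W-S+1).
    """
    C, H, W = x.shape
    M, _, R, S = w.shape
    out = np.zeros((M, H - R + 1, W - S + 1))
    for i in range(H - R + 1):
        for j in range(W - S + 1):
            patch = x[:, i:i + R, j:j + S]                 # shape (C, R, S)
            out[:, i, j] = np.tensordot(w, patch, axes=3)  # one value per filter
    return out

def checksum_ok(x, w, out, rtol=1e-9, atol=1e-8):
    """Illustrative filter-checksum test (hypothetical helper).

    Compares the channel-wise sum of the output with the convolution
    of the input and the summed filters. A mismatch beyond the
    floating-point tolerance suggests a corrupted output value.
    """
    w_sum = w.sum(axis=0, keepdims=True)   # checksum filter, shape (1, C, R, S)
    expected = conv2d(x, w_sum)[0]         # shape (H-R+1, W-S+1)
    observed = out.sum(axis=0)
    return bool(np.allclose(observed, expected, rtol=rtol, atol=atol))
```

In this sketch the check costs one extra single-filter convolution, regardless of how the main convolution was computed, which is why a check of this kind does not depend on a matrix-multiplication-based implementation. It only detects a mismatch; locating and correcting the faulty value would need additional checksums.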

research
09/02/2019

Algorithm-Based Fault Tolerance for Parallel Stencil Computations

The increase in HPC systems size and complexity, together with increasin...
research
04/19/2021

Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs

Neural networks (NNs) are increasingly employed in domains that require ...
research
06/08/2020

Making Convolutions Resilient via Algorithm-Based Error Detection Techniques

The ability of Convolutional Neural Networks (CNNs) to accurately proces...
research
08/16/2021

Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural Networks Using Activation Range Supervision

Convolutional neural networks (CNNs) have become an established part of ...
research
04/02/2021

FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance

Basic Linear Algebra Subprograms (BLAS) is a core library in scientific ...
research
02/22/2020

HarDNN: Feature Map Vulnerability Evaluation in CNNs

As Convolutional Neural Networks (CNNs) are increasingly being employed ...
research
10/31/2019

In-Place Zero-Space Memory Protection for CNN

Convolutional Neural Networks (CNN) are being actively explored for safe...
