Making Convolutions Resilient via Algorithm-Based Error Detection Techniques

06/08/2020
by Siva Kumar Sastry Hari, et al.

The ability of Convolutional Neural Networks (CNNs) to accurately process real-time telemetry has boosted their use in safety-critical and high-performance computing systems. As such systems require high levels of resilience to errors, CNNs must execute correctly in the presence of hardware faults. Full duplication provides the needed assurance but incurs a prohibitive 100% overhead. [...] but the practical feasibility and performance of such techniques have never been studied for CNN deployment platforms (e.g., TensorFlow or TensorRT on GPUs). In this paper, we focus on algorithmically verifying convolutions, which are the most resource-demanding operations in CNNs. We use checksums to verify convolutions, adding a small amount of redundancy, far less than full duplication. We first identify the challenges that arise in employing Algorithm-Based Error Detection (ABED) for convolutions in optimized inference platforms that fuse multiple network layers and use reduced-precision operations, and demonstrate how to overcome them. We propose and evaluate variations of ABED techniques that offer different trade-offs among implementation complexity, runtime overhead, and coverage. Results show that ABED can detect all transient hardware errors that might otherwise corrupt output and does so while incurring low runtime overheads of 6-23% [...] workloads compared to full duplication.
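As a rough illustration of how a checksum can verify a convolution, the sketch below relies on the linearity of convolution: convolving the input with the sum of all filters should equal the sum of the output feature maps. This is a generic filter-checksum check written for this summary, not the paper's implementation. The function names (`conv2d`, `filter_checksum_ok`), the tensor shapes, and the use of integer arithmetic (chosen so the comparison is exact, loosely mirroring reduced-precision integer inference) are all assumptions.

```python
import numpy as np


def conv2d(x, w):
    """Naive 'valid' convolution (cross-correlation form).

    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Returns an int64 array of shape (K, H - R + 1, W - S + 1).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    out = np.zeros((K, H - R + 1, W - S + 1), dtype=np.int64)
    for i in range(H - R + 1):
        for j in range(W - S + 1):
            patch = x[:, i:i + R, j:j + S]
            # Contract each filter with the patch over (C, R, S).
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def filter_checksum_ok(x, w, y):
    """Check y against a filter checksum (hypothetical helper).

    By linearity, conv(x, sum_k w[k]) must equal sum_k y[k].
    The extra cost is one additional filter, not a full duplicate.
    """
    w_sum = w.sum(axis=0, keepdims=True)   # checksum filter, shape (1, C, R, S)
    expected = conv2d(x, w_sum)[0]         # one redundant output map
    observed = y.sum(axis=0)               # sum over output channels
    return bool(np.array_equal(expected, observed))


rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(3, 8, 8), dtype=np.int64)
w = rng.integers(-128, 128, size=(4, 3, 3, 3), dtype=np.int64)

y = conv2d(x, w)
clean_ok = filter_checksum_ok(x, w, y)

# Emulate a transient hardware error: flip one bit of one output element.
y_faulty = y.copy()
y_faulty[2, 1, 4] ^= 1 << 10
faulty_ok = filter_checksum_ok(x, w, y_faulty)

print("clean output passes:", clean_ok)
print("faulty output passes:", faulty_ok)
```

With exact integer arithmetic, any single corrupted output element changes the channel-wise sum and is flagged. With floating-point data the equality test would need a tolerance, which is one of the practical complications a real deployment has to handle.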


