Reduce: A Framework for Reducing the Overheads of Fault-Aware Retraining

05/21/2023
by   Muhammad Abdullah Hanif, et al.

Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining incurs substantial overheads, especially when used to fine-tune large DNNs designed for complex problems. Moreover, because each fabricated chip can have a distinct fault pattern, fault-aware retraining must be performed for each chip individually, considering its unique fault map, which further aggravates the problem. To reduce the overall retraining cost, in this work we introduce the concept of resilience-driven retraining amount selection. To realize this concept, we propose a novel framework, Reduce, which first computes the resilience of the given DNN to faults at different fault rates and with different amounts of retraining. Then, based on this resilience, it computes the amount of retraining required for each chip, considering its unique fault map. We demonstrate the effectiveness of our methodology for a systolic array-based DNN accelerator experiencing permanent faults in the computational array.
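The selection step described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' implementation): it assumes a precomputed resilience table mapping (fault rate, retraining epochs) to recovered accuracy, and picks the smallest retraining amount that meets an accuracy target for a chip's fault map. All names, table values, and the nearest-rate lookup are illustrative assumptions.

```python
# Hypothetical resilience table (illustrative values only): for each fault
# rate, the accuracy recovered after a given number of retraining epochs.
RESILIENCE = {
    0.01: {0: 0.88, 1: 0.90, 2: 0.91},
    0.05: {0: 0.70, 1: 0.85, 2: 0.90, 4: 0.91},
    0.10: {0: 0.40, 1: 0.75, 2: 0.86, 4: 0.90, 8: 0.91},
}

def fault_rate_of(fault_map):
    """Fraction of faulty processing elements in a chip's fault map
    (here modeled as a flat list of 0/1 flags)."""
    return sum(fault_map) / len(fault_map)

def retraining_amount(fault_map, target_accuracy):
    """Smallest tabulated number of retraining epochs whose predicted
    accuracy meets the target, using the nearest tabulated fault rate."""
    rate = fault_rate_of(fault_map)
    nearest = min(RESILIENCE, key=lambda r: abs(r - rate))
    for epochs in sorted(RESILIENCE[nearest]):
        if RESILIENCE[nearest][epochs] >= target_accuracy:
            return epochs
    # If the target is unreachable, fall back to the maximum tabulated amount.
    return max(RESILIENCE[nearest])

# Example: a chip with 5% faulty PEs and a 0.90 accuracy target.
chip_fault_map = [1] * 5 + [0] * 95
epochs = retraining_amount(chip_fault_map, target_accuracy=0.90)
```

The key point the sketch captures is that the expensive resilience characterization is done once per DNN, while the cheap per-chip step only indexes into it, so chips with benign fault patterns receive little or no retraining.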


Related research

- eFAT: Improving the Effectiveness of Fault-Aware Training for Mitigating Permanent Faults in DNN Hardware Accelerators (04/20/2023)
- FAQ: Mitigating the Impact of Faults in the Weight Memory of DNN Accelerators through Fault-Aware Quantization (05/21/2023)
- Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator (02/11/2018)
- MoRS: An Approximate Fault Modelling Framework for Reduced-Voltage SRAMs (10/12/2021)
- ISimDL: Importance Sampling-Driven Acceleration of Fault Injection Simulations for Evaluating the Robustness of Deep Learning (03/14/2023)
- Bayesian Assessment of a Connectionist Model for Fault Detection (03/27/2013)
- HyCA: A Hybrid Computing Architecture for Fault Tolerant Deep Learning (06/09/2021)
