SoftSNN: Low-Cost Fault Tolerance for Spiking Neural Network Accelerators under Soft Errors

Specialized hardware accelerators have been designed and employed to maximize the performance efficiency of Spiking Neural Networks (SNNs). However, such accelerators are vulnerable to transient faults (i.e., soft errors), which occur due to high-energy particle strikes and manifest as bit flips at the hardware layer. These errors can change the weight values and neuron operations in the compute engine of SNN accelerators, thereby leading to incorrect outputs and accuracy degradation. Yet, the impact of soft errors on the compute engine and the respective mitigation techniques have not been thoroughly studied for SNNs. A potential solution is to employ redundant executions (re-execution) to ensure correct outputs, but this incurs huge latency and energy overheads. To address this, we propose SoftSNN, a novel methodology that mitigates soft errors in the weight registers (synapses) and neurons of SNN accelerators without re-execution, thereby maintaining accuracy with low latency and energy overheads. Our SoftSNN methodology employs the following key steps: (1) analyzing the SNN characteristics under soft errors to identify faulty weights and neuron operations, which is required for recognizing faulty SNN behavior; (2) employing a Bound-and-Protect technique that leverages this analysis to improve SNN fault tolerance by bounding the weight values and protecting the neurons from faulty operations; and (3) devising lightweight hardware enhancements for the neural hardware accelerator to efficiently support the proposed technique. The experimental results show that, for a 900-neuron network even under a high fault rate, our SoftSNN keeps the accuracy degradation below 3% while substantially reducing latency and energy compared to the re-execution technique.
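As a rough illustration of the weight-bounding idea behind the Bound-and-Protect technique, the sketch below clamps weights that fall outside a fault-free range so that a bit flip in a weight register cannot inject an arbitrarily large value into the membrane-potential accumulation. This is a minimal software sketch, not the paper's hardware mechanism: the bounds (w_min, w_max), the clip-based implementation, and the example values are assumptions for illustration (the actual bounds would be profiled from the fault-free network offline).

```python
import numpy as np

def bound_weights(weights, w_min, w_max):
    """Clamp each weight to the fault-free range [w_min, w_max].

    A soft error that flips a high-order bit typically pushes a weight far
    outside the range observed during fault-free operation; clamping it
    back limits the error's effect on the neuron's accumulated potential.
    """
    return np.clip(weights, w_min, w_max)

# Hypothetical example: a bit flip corrupts one weight from 12 to 140
# (most-significant bit of an 8-bit value flipped).
weights = np.array([12, -5, 140, 7], dtype=np.int16)
print(bound_weights(weights, w_min=-64, w_max=63))  # -> [12 -5 63  7]
```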
