A low-overhead soft-hard fault-tolerant architecture, design and management scheme for reliable high-performance many-core 3D-NoC systems

03/21/2020
by Khanh N. Dang, et al.

The Network-on-Chip (NoC) paradigm has been proposed as a favorable solution for handling the strict communication requirements among the increasingly large number of cores on a single chip. However, NoC systems are subject to aggressive transistor scaling, low operating voltages, and high integration and power densities, which make them vulnerable to permanent (hard) faults and transient (soft) errors. A hard fault in a NoC can lead to external blocking, causing congestion across the whole network. A soft error is more challenging because it silently corrupts data, which can spread erroneous data over a large area through error propagation and can cause packet re-transmission and deadlock. In this paper, we present the architecture and design of a comprehensive soft-error and hard-fault tolerant 3D-NoC system, named 3D-Hard-Fault-Soft-Error-Tolerant-OASIS-NoC (3D-FETO). With the aid of efficient mechanisms and algorithms, 3D-FETO detects and recovers from soft errors that occur in the routing pipeline stages, and it leverages reconfigurable components to handle permanent faults in links, input buffers, and crossbars. In-depth evaluation results show that the 3D-FETO system is able to work around different kinds of hard faults and soft errors, ensuring graceful performance degradation while minimizing additional hardware complexity and remaining power efficient.


