FASHION: Fault-Aware Self-Healing Intelligent On-chip Network

02/08/2017
by   Pengju Ren, et al.
0

To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dynamically adapt to component failures. First, we introduce a distributed intelligence unit, called Self-Awareness Module (SAM), which allows the router to detect permanent component failures and build a network connectivity map. Using local information, SAM adapts to faults, guarantees connectivity and deadlock-free routing inside the maximal connected subgraph and keeps routing tables up-to-date. Next, to reconfigure network links or virtual channels around faulty/power-gated components, we add bidirectional link and unified virtual channel structure features to the Fashion router. This version of the router, named Ex-Fashion, further mitigates the negative system performance impacts, leads to larger maximal connected subgraph and sustains a relatively high degree of fault-tolerance. To support the router, we develop a fault diagnosis and recovery algorithm executed by the Built-In Self-Test, self-monitoring, and self-reconfiguration units at runtime to provide fault-tolerant system functionalities. The Fashion router places no restriction on topology, position or number of faults. It drops 54.3-55.4 fewer nodes for same number of faults (between 30 and 60 faults) in an 8x8 2D-mesh over other state-of-the-art solutions. It is scalable and efficient. The area overheads are 2.311 2D-meshes using the TSMC 65nm library at 1.38GHz clock frequency.

READ FULL TEXT
research
10/25/2018

A General, Fault tolerant, Adaptive, Deadlock-free Routing Protocol for Network-on-chip

The paper presents a topology-agnostic greedy protocol for network-on-ch...
research
12/16/2021

DeFT: A Deadlock-Free and Fault-Tolerant Routing Algorithm for 2.5D Chiplet Networks

By interconnecting smaller chiplets through an interposer, 2.5D integrat...
research
06/19/2020

Design of a Near-Ideal Fault-Tolerant Routing Algorithm for Network-on-Chip-Based Multicores

With relentless CMOS technology downsizing Networks-on-Chips (NoCs) are ...
research
06/24/2022

Relative Survivable Network Design

One of the most important and well-studied settings for network design i...
research
03/21/2020

Soft-Error and Hard-fault Tolerant Architecture and Routing Algorithm for Reliable 3D-NoC Systems

Network-on-Chip (NoC) paradigm has been proposed as an auspicious soluti...
research
08/01/2019

Runtime Mitigation of Packet Drop Attacks in Fault-tolerant Networks-on-Chip

Fault-tolerant routing (FTR) in Networks-on-Chip (NoCs) has become a com...
research
08/30/2023

On-Chip Sensors Data Collection and Analysis for SoC Health Management

Data produced by on-chip sensors in modern SoCs contains a large amount ...

Please sign up or login with your details

Forgot password? Click here to reset