Design of a Near-Ideal Fault-Tolerant Routing Algorithm for Network-on-Chip-Based Multicores

06/19/2020
by Costas Iordanou, et al.

With relentless CMOS technology downsizing, Networks-on-Chip (NoCs) are inescapably becoming more susceptible to wearout and less reliable. While faults in processors and memories may be masked via redundancy, or mitigated via techniques such as task migration, NoCs are especially vulnerable to hardware faults: a single link breakdown may cause inter-tile communication to halt indefinitely, rendering the whole multicore chip inoperable. As such, NoCs risk becoming the pivotal point of failure in the chip multicores that use them. Aiming at seamless NoC operation in the presence of faulty links, we propose Hermes, a near-ideal fault-tolerant routing algorithm that exhibits high levels of robustness, operates in a distributed mode, guarantees freedom from deadlocks, and evens out traffic, among other objectives. Hermes is a limited-overhead, deadlock-free hybrid routing algorithm: it uses load-balancing routing on fault-free paths to sustain high throughput, while providing pre-reconfigured escape path selection in the vicinity of faults. Under such online mechanisms, Hermes's performance degrades gracefully as the number of faulty links increases, a crucially desirable response lacking in prior art. Additionally, Hermes identifies non-communicating network partitions in scenarios where faulty links are topologically densely distributed, so that packets routed to physically isolated regions cause no network stagnation from indefinite chained blockages starting at sub-network boundaries. An extensive experimental evaluation, including traffic workloads gathered from full-system chip multiprocessor simulations, shows that Hermes improves network throughput by up to 3× compared with the state of the art. Further, hardware synthesis results confirm Hermes's efficacy.
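The notion of non-communicating partitions mentioned in the abstract can be illustrated with a small sketch. This is not Hermes's mechanism, which operates in a distributed mode and is not detailed here; it is a centralized breadth-first labeling, assuming a 2D mesh topology with bidirectional links. The names `mesh_partitions` and `reachable` are hypothetical and chosen only for illustration.

```python
from collections import deque


def mesh_partitions(width, height, faulty_links):
    """Label every router of a width x height mesh with a partition id.

    faulty_links is an iterable of ((x1, y1), (x2, y2)) pairs, each naming
    a broken bidirectional link between two neighbouring routers.
    Assumption: a plain 2D mesh; the abstract does not fix the topology.
    """
    broken = {frozenset(link) for link in faulty_links}
    label = {}
    next_id = 0
    for y in range(height):
        for x in range(width):
            if (x, y) in label:
                continue
            # Breadth-first search over healthy links from an unlabeled router.
            label[(x, y)] = next_id
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if not (0 <= nx < width and 0 <= ny < height):
                        continue  # neighbour lies outside the mesh
                    if frozenset(((cx, cy), (nx, ny))) in broken:
                        continue  # link is faulty, cannot be traversed
                    if (nx, ny) not in label:
                        label[(nx, ny)] = next_id
                        queue.append((nx, ny))
            next_id += 1
    return label


def reachable(label, src, dst):
    """True when src and dst lie in the same partition."""
    return label[src] == label[dst]
```

With such labels available, a packet whose destination lies in a different partition than its source could be dropped or rejected at injection rather than left to block buffers at the partition boundary; how Hermes itself handles such packets is described in the full paper.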


