On-Demand Redundancy Grouping: Selectable Soft-Error Tolerance for a Multicore Cluster

05/25/2022
by   Michael Rogenmoser, et al.

With the shrinking of technology nodes and the use of parallel processor clusters in hostile and critical environments, such as space, run-time faults caused by radiation are a serious cross-cutting concern that also affects architectural design. This paper introduces an architectural approach to run-time configurable soft-error tolerance at the core level, augmenting a six-core open-source RISC-V cluster with a novel On-Demand Redundancy Grouping (ODRG) scheme. ODRG allows the cluster to operate either as two fault-tolerant cores or as six individual cores for high performance, with limited overhead when switching between these modes at run time. The ODRG unit adds less than 11% of a core's area for a three-core group, or a total of 1% [...] shows negligible timing increase, which compares favorably to a commercial state-of-the-art implementation, and is 2.5× faster in fault recovery re-synchronization. Furthermore, unlike other implementations, ODRG lets the redundant cores be used for independent computation when redundancy is not necessary, allowing up to a 2.96× increase in performance for selected applications.
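The two operating modes can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not describe how the ODRG unit compares core outputs, so the bitwise 2-of-3 majority vote below is an assumed, textbook form of triple redundancy, and the names `majority_vote`, `faulty_cores`, and `logical_cores` are hypothetical, not taken from the paper.

```python
from typing import List

GROUP_SIZE = 3  # three-core group, as stated in the abstract


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three core outputs (assumed voting scheme)."""
    return (a & b) | (a & c) | (b & c)


def faulty_cores(outputs: List[int]) -> List[int]:
    """Return the indices of cores whose output disagrees with the voted value."""
    voted = majority_vote(*outputs)
    return [i for i, value in enumerate(outputs) if value != voted]


def logical_cores(num_cores: int, redundant: bool) -> int:
    """Number of cores visible to software in each mode."""
    return num_cores // GROUP_SIZE if redundant else num_cores


if __name__ == "__main__":
    # Core 2 suffers a single bit flip; the vote masks it and flags the core.
    outputs = [0b1010, 0b1010, 0b0010]
    print(bin(majority_vote(*outputs)), faulty_cores(outputs))
    print(logical_cores(6, redundant=True), logical_cores(6, redundant=False))
```

In this model, a six-core cluster exposes two logical cores in redundant mode and six in independent mode, matching the configuration described above; a flagged core would then need to be re-synchronized with its group, a step the sketch does not model.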
