The Study of Transient Faults Propagation in Multithread Applications

07/28/2016
by Navid Khoshavi, et al.

Because contemporary Error Correcting Code (ECC) designs occupy a significant fraction of the total die area in chip multiprocessors (CMPs), approaches are sought to address the growing vulnerability of CMP architectures to Single Event Upsets (SEUs) and Multi-Bit Upsets (MBUs). In this paper, we assess the reliability of multithreaded applications running on CMPs in order to propose an adaptive, application-relevant architecture design that accommodates the impact of both SEUs and MBUs across the entire CMP architecture. The work leverages the intrinsic soft-error immunity of Spin-Transfer Torque RAM (STT-RAM) as an alternative to SRAM-based storage and operation components. We target a specific portion of the working set for reallocation to improve the reliability of the CMP architecture design. Instructions in a multithreaded program that are referenced at a high rate while modifying memory the least are ideal candidates to be stored and executed in STT-RAM-based components. We explain why STT-RAM cannot be used for the global storage and operation counterparts, and describe the resiliency obtained relative to the baseline setup. In addition, a detailed study of the impact of SEUs and MBUs on multithreaded programs is presented in the Appendix.
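
The selection criterion described in the abstract (a high rate of referencing combined with the lowest memory modification) can be illustrated with a minimal sketch. This is not the authors' method: the `BlockStats` record, the `select_stt_ram_candidates` function, the scoring formula, and the capacity budget are all hypothetical, chosen only to show one way such a ranking might work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockStats:
    """Hypothetical profile of one instruction block in the working set."""
    name: str
    references: int      # how often the block is referenced
    modifications: int   # how many memory modifications it performs


def score(block: BlockStats) -> float:
    # Assumed heuristic: reward references, penalize modifications.
    # The +1 avoids division by zero for blocks that never modify memory.
    return block.references / (1 + block.modifications)


def select_stt_ram_candidates(blocks, capacity):
    """Return up to `capacity` blocks, best score first.

    `capacity` stands in for a limited STT-RAM budget (an assumption),
    reflecting that only a portion of the working set is reallocated.
    """
    ranked = sorted(blocks, key=lambda b: (-score(b), b.name))
    return ranked[:max(capacity, 0)]
```

Under this assumed scoring, a block referenced 1000 times with no modifications ranks ahead of one referenced 1000 times with 99 modifications, which matches the intent of favoring frequently referenced, rarely modifying instructions.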


research
07/20/2021

MUSE: Multi-Use Error Correcting Codes

In this work we present a new set of error correcting codes – Multi-Use ...
research
03/22/2023

A Cycle-Accurate Soft Error Vulnerability Analysis Framework for FPGA-based Designs

Many aerospace and automotive applications use FPGAs in their designs du...
research
03/21/2020

Reliability Assessment and Quantitative Evaluation of Soft-Error Resilient 3D Network-on-Chip Systems

Three-Dimensional Networks-on-Chips (3D-NoCs) have been proposed as an a...
research
01/18/2023

Chip Guard ECC: An Efficient, Low Latency Method

Chip Guard is a new approach to symbol-correcting error correction codes...
research
12/23/2021

Dependability Analysis of Data Storage Systems in Presence of Soft Errors

In recent years, high availability and reliability of Data Storage Syste...
research
09/25/2022

Navigating the dynamic noise landscape of variational quantum algorithms with QISMET

Transient errors from the dynamic NISQ noise landscape are challenging t...
research
12/25/2016

Neutron induced strike: On the likelihood of multiple bit-flips in logic circuits

High energy particles from cosmic rays or packaging materials can genera...
