On-Chip Sensors Data Collection and Analysis for SoC Health Management

08/30/2023
by Konstantin Shibin, et al.

Data produced by on-chip sensors in modern SoCs carries a large amount of information, including fault occurrences, aging status, accumulated radiation dose, performance characteristics, and environmental and other operational parameters. This information provides insight into the overall health of a system's hardware as well as the operability of individual modules. It makes it possible to mitigate faults and avoid using faulty units, thus enabling hardware health management. However, raw data from embedded sensors cannot be used directly for health management tasks. In most cases, information about a fault has to be analyzed in the context of previously reported fault events and other collected statistics. For this purpose, we propose a dedicated structure called the Health Map (HM), which holds information about functional resources and occurring faults and maps the relationships between them. In addition, we propose algorithms for the aggregation and classification of data received from on-chip sensors. The proposed Health Map contains detailed information for a particular system level (e.g., module, SoC, board). This information can be compiled into a summary of hardware health status, which in turn enables distributed hierarchical health management when the summary is used at a higher level of the system hierarchy, thus increasing the system's availability and effective lifetime.
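
To make the idea of a Health Map more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the structure's fields or the classification algorithm, so the class name `HealthMap`, its methods, the status labels (`healthy`, `suspect`, `faulty`), and the `window` and `threshold` parameters are all assumptions chosen for illustration. The sketch only shows the general pattern of recording fault events per functional resource, classifying a resource from its fault history, and compiling a summary for a higher level of the hierarchy.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HealthMap:
    """Hypothetical Health Map: functional resources and their fault history."""
    resources: set = field(default_factory=set)
    # resource name -> list of (timestamp, sensor_id) fault events
    events: dict = field(default_factory=lambda: defaultdict(list))

    def add_resource(self, name):
        self.resources.add(name)

    def report_fault(self, resource, timestamp, sensor_id):
        # Aggregation: keep the event history per affected resource.
        if resource not in self.resources:
            raise KeyError(f"unknown resource: {resource}")
        self.events[resource].append((timestamp, sensor_id))

    def classify(self, resource, window=100, threshold=3):
        # Assumed rule (not from the paper): several faults close together
        # in time suggest a persistent problem; fewer are treated as suspect.
        history = self.events.get(resource, [])
        if not history:
            return "healthy"
        latest = max(t for t, _ in history)
        recent = [t for t, _ in history if latest - t <= window]
        return "faulty" if len(recent) >= threshold else "suspect"

    def summary(self):
        # Compact status that a higher hierarchy level could consume.
        return {name: self.classify(name) for name in sorted(self.resources)}


hm = HealthMap()
for name in ("core0", "core1", "dma"):
    hm.add_resource(name)
hm.report_fault("core0", 10, "ecc0")
hm.report_fault("core0", 40, "ecc0")
hm.report_fault("core0", 90, "ecc0")
hm.report_fault("core1", 500, "parity1")
status = hm.summary()
```

In this sketch the summary is a plain dictionary so that a board-level manager could merge the summaries of several SoCs; a real design would likely also need to track the relationships between faults and shared resources, which the sketch omits.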


