Calculation of the High-Energy Neutron Flux for Anticipating Errors and Recovery Techniques in Exascale Supercomputer Centres

12/15/2022
by Hernán Asorey, et al.

The age of exascale computing has arrived, and the risks posed by neutrons and other atmospheric radiation become more critical as computing power increases: this radiation is expected to reduce the Mean Time Between Failures. In this work, a new and detailed calculation of the neutron flux for energies above 50 MeV is presented. The calculation uses state-of-the-art Monte Carlo astroparticle techniques and includes real atmospheric profiles at each of the next 23 exascale supercomputing facilities. The atmospheric impact on the flux and its seasonal variations were observed and characterised, and the barometric coefficient for high-energy neutrons was obtained at each site. With these coefficients, the risk of errors associated with an increase in the flux of energetic neutrons, such as single event upsets or transients, and the corresponding failure-in-time rates can be anticipated from the atmospheric pressure alone, before resources are assigned to critical tasks at each exascale facility. For clarity, examples of how cosmic rays affect the failure rate are included, so that administrators can better anticipate which more or less restrictive actions to take to overcome errors.
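As an illustration of how a barometric coefficient could be used in practice, the sketch below applies the standard exponential barometric correction to estimate the relative change in neutron flux from a pressure reading, and then scales a failure-in-time (FIT) rate by that ratio. It is only a sketch under stated assumptions: the function names, the coefficient value, the reference pressure, and the device counts are all hypothetical placeholders and are not taken from the paper, and the proportionality between FIT rate and flux is assumed here for simplicity.

```python
import math

def flux_ratio(pressure_hpa, ref_pressure_hpa, beta_per_hpa):
    """Relative flux I/I0 from the exponential barometric relation
    I = I0 * exp(beta * (P - P0)). beta is the fractional flux change
    per hPa and is normally negative (higher pressure, lower flux)."""
    return math.exp(beta_per_hpa * (pressure_hpa - ref_pressure_hpa))

def scaled_fit(reference_fit, ratio):
    """Scale a FIT rate (failures per 1e9 device-hours), assuming it is
    proportional to the high-energy neutron flux (an assumption)."""
    return reference_fit * ratio

def mtbf_hours(fit_per_device, n_devices):
    """System-level MTBF in hours for n identical devices,
    from the definition of FIT as failures per 1e9 device-hours."""
    return 1e9 / (fit_per_device * n_devices)

# Hypothetical placeholder values, NOT results from the paper.
BETA_PER_HPA = -0.007      # assumed barometric coefficient (per hPa)
REF_PRESSURE_HPA = 1010.0  # assumed site reference pressure
REFERENCE_FIT = 100.0      # assumed per-device FIT at reference pressure
N_DEVICES = 10_000         # assumed number of devices in the system

# A low-pressure day: the flux, and hence the assumed FIT rate, goes up.
ratio = flux_ratio(990.0, REF_PRESSURE_HPA, BETA_PER_HPA)
fit_today = scaled_fit(REFERENCE_FIT, ratio)

print(f"flux ratio: {ratio:.3f}")
print(f"MTBF at reference pressure: {mtbf_hours(REFERENCE_FIT, N_DEVICES):.1f} h")
print(f"MTBF today: {mtbf_hours(fit_today, N_DEVICES):.1f} h")
```

With site-specific coefficients in place of the placeholders, a scheduler could compare the estimated MTBF against the expected duration of a critical job before assigning resources.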

research
07/27/2018

A Model for Android and iOS Applications Risk Calculation: CVSS Analysis and Enhancement Using Case-Control Studies

Various researchers have shown that the Common Vulnerability Scoring Sys...
research
12/09/2019

High performance computing and energy efficiency: focus on OpenFOAM

High performance calculation is increasingly used within society. Previo...
research
11/21/2019

Predicting Failures in Multi-Tier Distributed Systems

Many applications are implemented as multi-tier software systems, and ar...
research
06/14/2017

Towards Adaptive Resilience in High Performance Computing

Failure rates in high performance computers rapidly increase due to the ...
research
10/25/2017

Exhaustive Exploration of the Failure-oblivious Computing Search Space

High-availability of software systems requires automated handling of cra...
research
12/31/2020

Heterogeneous recovery from large scale power failures

Large-scale power failures are induced by nearly all natural disasters f...
research
05/26/2018

Evaluating Impact of Human Errors on the Availability of Data Storage Systems

In this paper, we investigate the effect of incorrect disk replacement s...
