Influence of A-Posteriori Subcell Limiting on Fault Frequency in Higher-Order DG Schemes

10/16/2018
by Anne Reinarz, et al.

Soft error rates are rising as modern architectures move to ever smaller feature sizes at low voltages. Because HPC architectures use a very large number of components, they are particularly vulnerable to soft errors. Applications that run for long periods on large machines must therefore be designed with algorithmic resilience in mind. In this paper we analyse the inherent resilience of a-posteriori limiting procedures in the context of ExaHyPE, an explicit ADER DG solver for hyperbolic PDEs. The a-posteriori limiter checks element-local high-order DG solutions for physical admissibility, and can thus be expected to detect hardware-induced errors as well. Algorithmically, it can be interpreted as element-local checkpointing: when a check fails, the solver restarts on that element with a more robust finite volume scheme on a fine subgrid. We show that the limiter does increase the resilience of the DG algorithm, detecting and correcting in particular those faults that would otherwise lead to a fatal failure.
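The detect-and-recompute pattern described in the abstract can be illustrated with a minimal Python sketch. It is not ExaHyPE code: the function names, the tolerance, and the admissibility test (finite values plus a relaxed min/max bound taken from the previous time step) are assumptions chosen for illustration, and both "solvers" are trivial stand-ins for the ADER DG update and the subcell finite volume update.

```python
import math
import struct


def flip_bit(value, bit):
    """Flip one bit (0..63) of the IEEE-754 double encoding of `value`."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def is_admissible(candidate, neighbourhood, rel_tol=1e-3):
    """Hypothetical admissibility check: every value must be finite and lie
    within the (slightly relaxed) min/max of the previous-step values of the
    element and its neighbours."""
    lo, hi = min(neighbourhood), max(neighbourhood)
    slack = max(rel_tol * (hi - lo), 1e-12)
    return all(
        math.isfinite(v) and lo - slack <= v <= hi + slack for v in candidate
    )


def limited_step(checkpoint, neighbourhood, high_order_step, fallback_step):
    """Try the high-order update; if the result is rejected, recompute the
    element from its checkpoint with the fallback scheme.
    Returns (new_values, was_limited)."""
    candidate = high_order_step(checkpoint)
    if is_admissible(candidate, neighbourhood):
        return candidate, False
    return fallback_step(checkpoint), True


# Previous-step values of one element (the checkpoint) and of its neighbours.
checkpoint = [1.0, 1.1, 1.2, 1.1]
neighbourhood = checkpoint + [0.9, 1.3]


def clean_step(u):
    # Stand-in for a fault-free high-order update (identity, for illustration).
    return list(u)


def faulty_step(u):
    # Same update, but one value suffers a bit flip in its exponent.
    out = list(u)
    out[1] = flip_bit(out[1], 62)
    return out


def fallback_step(u):
    # Stand-in for the robust finite volume update on the subgrid.
    return list(u)


_, clean_limited = limited_step(checkpoint, neighbourhood, clean_step, fallback_step)
_, faulty_limited = limited_step(checkpoint, neighbourhood, faulty_step, fallback_step)
print(clean_limited, faulty_limited)  # prints "False True"
```

In this sketch a flip of a low-order mantissa bit would pass the check unnoticed, since it changes the value only marginally, whereas flips in the exponent or sign bit produce non-finite or out-of-bounds values and trigger the fallback.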
