Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations

12/14/2020
by Li Xu, et al.

Performance variability is an important measure for a reliable high-performance computing (HPC) system. It is affected by complicated interactions among numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. Predicting HPC variability is a challenging problem in the engineering of HPC systems, and there is little statistical work on this problem to date. Although many methods are available in the computer experiment literature, their applicability to HPC performance variability needs investigation, especially when the objective is to predict performance variability in both interpolation and extrapolation settings. We develop a data analytic framework to model data collected from large-scale experiments, and we use various promising methods to build predictive models for the variability of HPC systems. We evaluate the methods by measuring their prediction accuracy at previously unseen system configurations. We also discuss a methodology for optimizing system configurations that uses the estimated variability map. The findings from the method comparisons and the tool sets developed in this paper yield new insights into existing statistical methods and can benefit the practice of HPC variability management. This paper has supplementary materials online.

research
01/24/2022

Design Strategies and Approximation Methods for High-Performance Computing Variability Management

Performance variability management is an active research area in high-pe...
research
05/19/2022

Prediction for Distributional Outcomes in High-Performance Computing I/O Variability

Although high-performance computing (HPC) systems have been scaled to me...
research
03/10/2020

In Datacenter Performance, The Only Constant Is Change

All computing infrastructure suffers from performance variability, be it...
research
03/21/2021

Understanding performance variability in standard and pipelined parallel Krylov solvers

In this work, we collect data from runs of Krylov subspace methods and p...
research
12/18/2018

A Preliminary Study of Neural Network-based Approximation for HPC Applications

Machine learning, as a tool to learn and model complicated (non)linear r...
research
02/22/2019

Online Anomaly Detection in HPC Systems

Reliability is a cumbersome problem in High Performance Computing System...
research
04/19/2017

Testing Docker Performance for HPC Applications

The main goal for this article is to compare performance penalties when ...
