Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach

07/06/2021
by Eric Rutten, et al.

Production high-performance computing (HPC) systems continue to grow in size and complexity. As applications struggle to make use of increasingly heterogeneous compute nodes, maintaining high efficiency (performance per watt) across the whole platform becomes a challenge. Alongside the growing complexity of scientific workloads, this extreme heterogeneity is also an opportunity: as applications undergo dynamic variations in workload, due to phases or to data and compute movement between devices, power can be adjusted dynamically across compute elements to save energy without impacting performance. Aiming for an autonomous and dynamic power management strategy for current and future HPC architectures, this paper explores the use of control theory to design a dynamic power regulation method. Structured as a feedback loop, our approach, which is novel in computing resource management, periodically monitors application progress and chooses a suitable processor power cap at runtime. From a preliminary offline identification process, we derive a model of the system's dynamics and a proportional-integral (PI) controller. We evaluate our approach on top of an existing resource management framework, the Argo Node Resource Manager, deployed on several clusters of Grid'5000, using a standard memory-bound HPC benchmark.
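To make the monitor, decide, and actuate loop concrete, here is a short simulation. It is a sketch under stated assumptions, not the authors' implementation. It assumes a linear first-order model of progress versus power cap, and the gain, time constant, cap limits, and controller gains are hypothetical placeholders rather than values from the paper's offline identification. It also uses a PI controller written in incremental (velocity) form with output clamping. The names `FirstOrderPlant`, `IncrementalPI`, and `run` are invented for illustration.

```python
"""Sketch of a feedback loop that picks a processor power cap with a PI controller.

All numbers (model gain, time constant, controller gains, cap limits) are
hypothetical placeholders, not values identified in the paper.
"""

from dataclasses import dataclass


@dataclass
class FirstOrderPlant:
    """Hypothetical linear first-order model of progress versus power cap."""

    gain: float      # steady-state progress per watt of cap (assumed)
    tau: float       # time constant in seconds (assumed)
    progress: float  # current progress rate, e.g. heartbeats per second

    def step(self, cap: float, dt: float) -> float:
        # Forward-Euler discretization: y[k+1] = y[k] + dt/tau * (K*u[k] - y[k])
        self.progress += dt / self.tau * (self.gain * cap - self.progress)
        return self.progress


@dataclass
class IncrementalPI:
    """PI controller in incremental (velocity) form with output clamping."""

    kp: float
    ki: float
    cap_min: float            # lowest power cap allowed (W)
    cap_max: float            # highest power cap allowed (W)
    cap: float                # last power cap applied (W)
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        # u[k] = u[k-1] + Kp*(e[k] - e[k-1]) + Ki*dt*e[k]
        cap = self.cap + self.kp * (error - self.prev_error) + self.ki * dt * error
        # Storing the clamped output keeps the integral action from winding up.
        self.cap = min(max(cap, self.cap_min), self.cap_max)
        self.prev_error = error
        return self.cap


def run(setpoint: float, steps: int = 20, dt: float = 1.0):
    """Simulate the periodic loop; returns a list of (cap, progress) pairs."""
    plant = FirstOrderPlant(gain=0.1, tau=2.0, progress=12.0)  # at steady state for the max cap
    ctrl = IncrementalPI(kp=5.0, ki=5.0, cap_min=40.0, cap_max=120.0, cap=120.0)
    history = []
    for _ in range(steps):
        measured = plant.progress                  # monitor application progress
        cap = ctrl.update(setpoint, measured, dt)  # choose a power cap
        plant.step(cap, dt)                        # apply it for one control period
        history.append((cap, plant.progress))
    return history


if __name__ == "__main__":
    for cap, progress in run(setpoint=10.0):
        print(f"cap={cap:6.1f} W  progress={progress:8.5f}")
```

With these placeholder values the closed-loop characteristic polynomial is $z^2 - z + 0.25 = (z - 0.5)^2$, a double pole at $z = 0.5$. In this run the cap settles at 100 W, where the model delivers the setpoint of 10 progress units (about 83% of the 12 reachable at 120 W), and the progress error halves each period. Writing the controller in incremental form means that clamping the cap to the hardware limits also stops the integral term from winding up. For example, with an unreachable setpoint of 13 the cap simply stays at 120 W.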



research
08/15/2023

A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes

As Exascale computing becomes a reality, the energy needs of compute nod...
research
09/29/2016

DynIMS: A Dynamic Memory Controller for In-memory Storage on HPC Systems

In order to boost the performance of data-intensive computing on HPC sys...
research
08/22/2020

Online Adaptive Learning for Runtime Resource Management of Heterogeneous SoCs

Dynamic resource management has become one of the major areas of researc...
research
10/03/2018

Robust online identification of thermal models for in-production HPC clusters with machine learning-based data selection

Power and thermal management are critical components of high performance...
research
09/26/2018

dynamicMF: A Matrix Factorization Approach to Monitor Resource Usage in High Performance Computing Systems

High performance computing (HPC) facilities consist of a large number of...
research
05/11/2023

A Data-Driven Approach to Lightweight DVFS-Aware Counter-Based Power Modeling for Heterogeneous Platforms

Computing systems have shifted towards highly parallel and heterogeneous...
research
04/25/2018

Challenges Towards Deploying Data Intensive Scientific Applications on Extreme Heterogeneity Supercomputers

Shrinking transistors, which powered the advancement of computing in the...
