Design of an energy aware petaflops class high performance cluster based on power architecture

07/11/2023
by W. A. Ahmad, et al.

In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy-efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM's POWER8-NVLink CPUs, NVIDIA Tesla P100 GPUs, Mellanox InfiniBand EDR 100 Gb/s networking) plus custom hardware and innovative system middleware. D.A.V.I.D.E. features (i) a dedicated power monitoring interface, built around the BeagleBone Black board, that allows high-frequency sampling directly from the power backplane and scalable integration with the internal node telemetry and system-level power management software; (ii) a custom-built chassis, based on the OpenRack form factor, with liquid cooling that allows the system to be used in modern, energy-efficient datacenters; (iii) software components designed to enable fine-grained power monitoring, power management (i.e. power capping and energy-aware job scheduling) and application power profiling, based on dedicated machine learning components. Software APIs are offered to developers and users to tune computing node performance and power consumption around the application requirements. The first pilot system, which we will deploy at the beginning of 2017, will demonstrate key HPC applications from different fields ported and optimized for this innovative platform.
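As a rough illustration of what fine-grained power monitoring and power capping involve, the sketch below integrates timestamped power samples into energy and checks the average power against a cap. It is only a minimal sketch under stated assumptions: the function names, the sample layout (seconds and watts), and the cap check are hypothetical and are not taken from the D.A.V.I.D.E. software APIs, which this abstract does not describe.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule.

    Hypothetical helper: the sample format is an assumption, not the
    format produced by the D.A.V.I.D.E. power monitoring interface.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def average_power_w(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Average power over the sampled window (energy divided by duration)."""
    duration = timestamps_s[-1] - timestamps_s[0]
    if duration <= 0:
        raise ValueError("need at least two samples spanning a positive duration")
    return energy_joules(timestamps_s, power_w) / duration


def exceeds_cap(timestamps_s: Sequence[float], power_w: Sequence[float], cap_w: float) -> bool:
    """Return True if the average power over the window is above the cap (assumed policy)."""
    return average_power_w(timestamps_s, power_w) > cap_w


if __name__ == "__main__":
    # Illustrative samples only; not measured data.
    t = [0.0, 0.5, 1.0]
    p = [100.0, 200.0, 100.0]
    print(energy_joules(t, p), average_power_w(t, p), exceeds_cap(t, p, cap_w=120.0))
```

A higher sampling rate reduces the error of the trapezoidal estimate when power changes quickly, which is one reason sampling directly from the power backplane is useful for application power profiling.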

research
05/03/2023

Prediction of Performance and Power Consumption of GPGPU Applications

Graphics Processing Units (GPUs) have become an integral part of High-Pe...
research
06/07/2018

Dwarf in a Giant: Enabling Scalable, High-Resolution HPC Energy Monitoring for Real-Time Profiling and Analytics

Energy efficiency, predictive maintenance and security are today key cha...
research
09/28/2021

Power Consumption Analysis of Parallel Algorithms on GPUs

Due to their highly parallel multi-cores architecture, GPUs are being in...
research
04/14/2018

Optimization-in-the-Loop for Energy-Efficient 5G

We consider the problem of energy-efficient network management in 5G sys...
research
05/02/2018

Energy-Optimal Configurations for Single-Node HPC Applications

Energy efficiency is a growing concern for modern computing, especially ...
research
06/16/2023

A Testbed for Carbon-Aware Applications and Systems

To mitigate the growing carbon footprint of computing systems, there has...
research
05/17/2022

Cluster on Wheels

This paper presents a very compact 16-node cluster that is the core of a...