Performance Models for Data Transfers: A Case Study with Molecular Chemistry Kernels

04/15/2019
by Suraj Kumar, et al.

With the increasing complexity of hardware, systems with several memory nodes are ubiquitous in High Performance Computing (HPC). To exploit the full potential of these systems, it is essential to develop strategies that overlap data transfers between memory nodes with computations. In this article, we consider the problem of deciding the order of data transfers between two memory nodes for a set of independent tasks, with the objective of minimizing the makespan. We prove that, with limited memory capacity, obtaining the optimal order of data transfers is an NP-complete problem. We propose several heuristics for this problem and describe the situations in which each of them performs well. We analyze our heuristics on traces obtained by running two molecular chemistry kernels, namely Hartree-Fock (HF) and Coupled Cluster Single Double (CCSD), on 10 nodes of an HPC system. Our results show that some of our heuristics achieve significant overlap for moderate memory capacities and are very close to the lower bound of the makespan.
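To make the ordering problem concrete, the following is a minimal sketch of how the makespan of a given transfer order could be evaluated. It is not the model or any heuristic from the article. It assumes a single transfer link, a single processing unit, tasks that compute in the same order in which their data is transferred, and memory that is held from the start of a task's transfer until the end of its computation. The names `Task`, `makespan`, and `best_order` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Task:
    comm: float  # time to transfer the task's input data (assumed model)
    comp: float  # time to compute once the data has arrived
    mem: float   # memory held from transfer start to computation end


def makespan(order, capacity):
    """Simulate one link and one processing unit for a given transfer order."""
    link_free = 0.0     # time at which the link becomes idle
    cpu_free = 0.0      # time at which the processing unit becomes idle
    resident = deque()  # (finish_time, mem) of tasks still holding memory
    used = 0.0
    for t in order:
        if t.mem > capacity:
            raise ValueError("task does not fit in memory")
        start = link_free
        # Release the memory of tasks that have already finished computing.
        while resident and resident[0][0] <= start:
            used -= resident.popleft()[1]
        # Otherwise delay the transfer until enough memory is released.
        while resident and used + t.mem > capacity:
            finish, mem = resident.popleft()
            start = max(start, finish)
            used -= mem
        arrive = start + t.comm
        cpu_free = max(arrive, cpu_free) + t.comp
        link_free = arrive
        resident.append((cpu_free, t.mem))
        used += t.mem
    return cpu_free


def best_order(tasks, capacity):
    """Exhaustive search over all orders; only feasible for tiny instances."""
    return min(permutations(tasks), key=lambda o: makespan(o, capacity))


a = Task(comm=3, comp=1, mem=2)
b = Task(comm=1, comp=3, mem=2)
print(makespan([b, a], capacity=4))  # both tasks fit: transfers overlap compute
print(makespan([a, b], capacity=4))  # same tasks, worse order
print(makespan([b, a], capacity=2))  # one task fits at a time: no overlap
```

Under these assumptions, the two-task example gives makespans of 5 and 7 for the two orders when both tasks fit in memory, and 8 for either order when only one task fits, which illustrates why both the order and the memory capacity matter. The exhaustive `best_order` search grows factorially with the number of tasks, so it is only a reference point for very small inputs.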


