On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

07/07/2020
by   David B. Williams-Young, et al.

The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large, experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations that can leverage the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards an increasing reliance on heterogeneous, accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance that have come to be expected of these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing it to the existing scalable CPU XC integration in NWChem.
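
To make the idea of batched level-3 BLAS operations concrete, the following is a minimal NumPy sketch, not the NWChemEx implementation. It assumes the simplest (LDA-like) quadrature form of the XC potential matrix, V[mu, nu] = sum_i w_i * v_xc(r_i) * phi_mu(r_i) * phi_nu(r_i), and assumes the grid is split into batches of identical size, whereas a production code would likely handle batches of varying dimensions and only the basis functions that are non-negligible on each batch. The function name `xc_matrix_batched` and the array layouts are hypothetical, chosen for illustration only.

```python
import numpy as np

def xc_matrix_batched(phi, weights, vxc):
    """Accumulate an XC potential matrix from batches of grid points.

    Hypothetical layout (an assumption of this sketch):
      phi     : (nbatch, nbf, npts) basis function values on each batch
      weights : (nbatch, npts)      quadrature weights
      vxc     : (nbatch, npts)      XC potential evaluated at the grid points
    """
    # Scale each basis function value by (weight * v_xc) at its grid point.
    z = phi * (weights * vxc)[:, None, :]
    # One batched matrix product over all batches at once:
    # (nbatch, nbf, npts) @ (nbatch, npts, nbf) -> (nbatch, nbf, nbf).
    v_batches = np.matmul(phi, z.transpose(0, 2, 1))
    # Reduce the per-batch contributions into the final (nbf, nbf) matrix.
    return v_batches.sum(axis=0)

# Small random example: 3 batches, 4 basis functions, 5 points per batch.
rng = np.random.default_rng(0)
phi = rng.standard_normal((3, 4, 5))
weights = rng.random((3, 5))
vxc = rng.standard_normal((3, 5))
V = xc_matrix_batched(phi, weights, vxc)
```

Stacking the per-batch matrices lets a single `np.matmul` call stand in for a batched GEMM, which is the pattern a GPU BLAS library would execute as one kernel launch instead of many small ones.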


research
10/30/2020

DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia

The demand for high-performance computing (HPC) is ever-increasing for e...
research
05/13/2020

Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use

In this report, I discuss the history and current state of GPU HPC syste...
research
08/23/2022

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems

Scientists are increasingly exploring and utilizing the massive parallel...
research
12/20/2021

Fast and Green Computing with Graphics Processing Units for solving Sparse Linear Systems

In this paper, we aim to introduce a new perspective when comparing high...
research
08/26/2020

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

Performance optimization can be a daunting task especially as the hardwa...
research
01/10/2023

GPU-based high-precision orbital propagation of large sets of initial conditions through Picard-Chebyshev augmentation

The orbital propagation of large sets of initial conditions under high a...
