Batched matrix operations on distributed GPUs with application in theoretical physics

03/17/2022
by Nenad Mijić, et al.

Matrix-matrix multiplication (GEMM) is one of the most important and commonly used operations in linear algebra, and a key component in the performance of many scientific codes. It requires O(n^3) operations for n×n matrices, and this high computational intensity makes it well suited to GPU acceleration. Today, many research problems require computing a very large number of relatively small GEMM operations, none of which can utilise the entire GPU on its own. To overcome this bottleneck, special functions have been developed that pack several GEMM operations into one and compute them simultaneously on a GPU; this is called a batch operation. In this work, we propose a different approach based on linking multiple GEMM operations to MPI ranks and then binding multiple MPI ranks to a single GPU. To increase GPU utilisation, more MPI ranks (i.e. GEMM operations) are added. We implement and test this approach in the field of theoretical physics to compute entanglement properties through simulated annealing Monte Carlo simulation of quantum spin chains. For this use case, we were able to simulate a much larger spin system and achieve a speed-up of up to 35× compared to the parallel CPU-only version.
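
The idea of binding multiple MPI ranks to a single GPU can be illustrated with a minimal sketch. The abstract does not state the binding policy, so the round-robin rule below and the helper names `gpu_for_rank` and `ranks_per_gpu` are assumptions made purely for illustration, not the authors' implementation. The sketch uses plain Python with no MPI or GPU calls and only shows which node-local ranks would share which device.

```python
from collections import defaultdict


def gpu_for_rank(local_rank: int, gpus_per_node: int) -> int:
    """Return the GPU index for a node-local rank.

    Assumed policy (not taken from the paper): round-robin, so that
    several ranks end up sharing each GPU when ranks > GPUs.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return local_rank % gpus_per_node


def ranks_per_gpu(ranks_per_node: int, gpus_per_node: int) -> dict[int, list[int]]:
    """Group node-local ranks by the GPU they would be bound to."""
    binding: dict[int, list[int]] = defaultdict(list)
    for local_rank in range(ranks_per_node):
        binding[gpu_for_rank(local_rank, gpus_per_node)].append(local_rank)
    return dict(binding)


if __name__ == "__main__":
    # Hypothetical node: 8 MPI ranks, 2 GPUs -> 4 ranks (GEMM streams) per GPU.
    for gpu, ranks in ranks_per_gpu(8, 2).items():
        print(f"GPU {gpu}: ranks {ranks}")
```

In a real MPI program the node-local rank would typically come from the launcher or from a shared-memory communicator split, and each rank would then select its device before issuing GEMM calls. Raising `ranks_per_node` while keeping `gpus_per_node` fixed corresponds to adding more ranks per GPU to increase utilisation.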

research
09/12/2021

H2Opus: A distributed-memory multi-GPU software package for non-local operators

Hierarchical ℋ^2-matrices are asymptotically optimal representations for...
research
09/20/2023

Matrix-based implementation and GPU acceleration of linearized ordinary state-based peridynamic models in MATLAB

Ordinary state-based peridynamic (OSB-PD) models have an unparalleled ca...
research
09/01/2021

Accelerating an Iterative Eigensolver for Nuclear Structure Configuration Interaction Calculations on GPUs using OpenACC

To accelerate the solution of large eigenvalue problems arising from man...
research
03/04/2022

Machine Learning for CUDA+MPI Design Rules

We present a new strategy for automatically exploring the design space o...
research
03/27/2019

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

Graph Convolutional Networks (GCNs) are recently getting much attention ...
research
09/20/2023

An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

Realistic reservoir simulation is known to be prohibitively expensive in...
research
03/27/2020

Dielectric breakdown prediction with GPU-accelerated BEM

The prediction of a dielectric breakdown in a high-voltage device is bas...