H2Opus: A distributed-memory multi-GPU software package for non-local operators

09/12/2021
by   Stefano Zampini, et al.
0

Hierarchical ℋ^2-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their O(N) complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices to allow large-scale problems to be represented. In this paper, we present high-performance, distributed-memory GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the ℋ^2 format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of ℋ^2-matrix operations on CPUs and GPUs. Performance in the distributed GPU setting is achieved by marshaling the tree data of the hierarchical matrix representation to allow batched kernels to be executed on the individual GPUs. MPI is used for inter-process communication. We optimize the communication data volume and hide much of the communication cost with local compute phases of the algorithms. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs on Summit, with performance exceeding 2.3 Tflop/s/GPU for the matrix-vector multiplication, and 670 Gflops/s/GPU for matrix compression, which involves batched QR and SVD operations. We illustrate the flexibility and efficiency of the library by solving a 2D variable diffusivity integral fractional diffusion problem with an algebraic multigrid-preconditioned Krylov solver and demonstrate scalability up to 16M degrees of freedom problems on 64 GPUs.

READ FULL TEXT

page 3

page 6

research
02/05/2019

Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression

Hierarchical matrices are space and time efficient representations of de...
research
03/17/2022

Batched matrix operations on distributed GPUs with application in theoretical physics

One of the most important and commonly used operations in many linear al...
research
06/20/2018

A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

In this work, we consider the solution of boundary integral equations by...
research
08/01/2016

A survey of sparse matrix-vector multiplication performance on large matrices

We contribute a third-party survey of sparse matrix-vector (SpMV) produc...
research
03/23/2021

A fast and oblivious matrix compression algorithm for Volterra integral operators

The numerical solution of dynamical systems with memory requires the eff...
research
06/18/2020

Kernel methods through the roof: handling billions of points efficiently

Kernel methods provide an elegant and principled approach to nonparametr...
research
07/29/2019

PyLops – A Linear-Operator Python Library for large scale optimization

Linear operators and optimisation are at the core of many algorithms use...

Please sign up or login with your details

Forgot password? Click here to reset