Hierarchical Jacobi Iteration for Structured Matrices on GPUs using Shared Memory

06/30/2020
by   Mohammad Shafaet Islam, et al.

High-fidelity scientific simulations of physical phenomena typically require solving large linear systems of equations that arise from discretizing a partial differential equation (PDE) with some numerical method. This step often consumes a large share of the total computational time and is therefore a bottleneck in simulation work. Solving these linear systems efficiently requires massively parallel hardware with high computational throughput, together with algorithms that respect the memory hierarchy of that hardware in order to achieve high memory bandwidth. In this paper, we present an algorithm that accelerates Jacobi iteration for structured problems on graphics processing units (GPUs) using a hierarchical approach in which multiple iterations are performed within on-chip shared memory every cycle. We adopt a domain-decomposition-style procedure: the problem domain is partitioned into subdomains, and the data of each subdomain is copied to the shared memory of a GPU block. Jacobi iterations are then performed within each block's shared memory, which avoids expensive global memory accesses at every step. We test the algorithm on the linear systems arising from the discretization of Poisson's equation in 1D and 2D. Compared with a conventional GPU Jacobi implementation that uses only global memory, the shared memory approach converges about 8x faster on the 1D problem and nearly 6x faster on the 2D problem.
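The cycle structure described in the abstract can be illustrated with a small CPU-side sketch. This is not the paper's GPU implementation: it is a plain-Python emulation under several assumptions that the abstract does not state, namely non-overlapping subdomains, one halo point per side held fixed during the inner sweeps, a 1D Poisson problem -u'' = 1 with zero Dirichlet boundaries, and a per-subdomain list standing in for a block's shared memory. The names `jacobi_sweep`, `hierarchical_cycle`, `run`, and `max_error` are hypothetical.

```python
def jacobi_sweep(u, f, h2):
    """One Jacobi sweep for the 1D Poisson stencil; u[0] and u[-1] stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h2 * f[i])
    return new


def hierarchical_cycle(u, f, h2, block, inner_iters):
    """One cycle: each subdomain iterates on a private copy, then writes back."""
    n = len(u)
    result = u[:]
    for start in range(1, n - 1, block):
        stop = min(start + block, n - 1)
        # Private copy of the subdomain plus one halo point per side.
        # On a GPU this copy would live in a block's shared memory (assumption).
        local_u = u[start - 1:stop + 1]
        local_f = f[start - 1:stop + 1]
        # Inner sweeps touch only the private copy; halo values are frozen.
        for _ in range(inner_iters):
            local_u = jacobi_sweep(local_u, local_f, h2)
        # Write the subdomain's interior points back to the "global" array.
        result[start:stop] = local_u[1:-1]
    return result


def run(n_interior, block, inner_iters, cycles):
    """Run a fixed number of cycles on -u'' = 1, u(0) = u(1) = 0."""
    h = 1.0 / (n_interior + 1)
    u = [0.0] * (n_interior + 2)
    f = [1.0] * (n_interior + 2)
    for _ in range(cycles):
        u = hierarchical_cycle(u, f, h * h, block, inner_iters)
    return u


def max_error(u):
    """Max nodal error against u(x) = x(1 - x)/2, the solution for f = 1."""
    n = len(u) - 1
    return max(abs(u[i] - 0.5 * (i / n) * (1 - i / n)) for i in range(len(u)))


if __name__ == "__main__":
    # inner_iters=1 reduces to ordinary Jacobi: one global update per sweep.
    print("plain Jacobi error:", max_error(run(31, 8, 1, 200)))
    # Several inner sweeps per cycle do more work between global updates.
    print("hierarchical error:", max_error(run(31, 8, 4, 200)))
```

With `inner_iters=1` the cycle is identical to a standard Jacobi sweep, so the two runs differ only in how many sweeps are performed between updates of the global array. The sketch says nothing about wall-clock speedup, which on a GPU depends on memory traffic rather than on cycle counts alone.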
