Distributed Parallelization of xPU Stencil Computations in Julia

11/28/2022
by Samuel Omlin, et al.

We present a straightforward approach to the distributed parallelization of stencil-based xPU applications on a regular staggered grid, implemented in the package ImplicitGlobalGrid.jl. The approach can leverage remote direct memory access and enables close to ideal weak scaling of real-world applications on thousands of GPUs. Communication costs can be easily hidden behind computation.

