High-performance xPU Stencil Computations in Julia

11/28/2022
by Samuel Omlin et al.

We present an efficient approach for writing architecture-agnostic parallel high-performance stencil computations in Julia, which is instantiated in the package ParallelStencil.jl. Powerful metaprogramming, costless abstractions and multiple dispatch enable writing a single code that is suitable for both productive prototyping on a single CPU thread and production runs on multi-GPU or CPU workstations or supercomputers. We demonstrate performance close to the theoretical upper bound on GPUs for a 3-D heat diffusion solver, a massive improvement over the performance reachable with CUDA.jl array programming.
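As an illustration of what such a single code can look like, below is a minimal Julia sketch of a 3-D heat diffusion step written in the style of the ParallelStencil.jl miniapps. It assumes the public ParallelStencil.jl API (@init_parallel_stencil, @parallel, the FiniteDifferences3D macros and the @ones/@rand allocators); the switch USE_GPU, the kernel name diffusion3D_step! and the physical and numerical values are illustrative choices, not taken from the paper.

    const USE_GPU = false                     # flip to true to target an NVIDIA GPU
    using ParallelStencil
    using ParallelStencil.FiniteDifferences3D
    @static if USE_GPU
        @init_parallel_stencil(CUDA, Float64, 3)    # GPU backend (recent versions also need `using CUDA`)
    else
        @init_parallel_stencil(Threads, Float64, 3) # multi-threaded CPU backend
    end

    # One explicit diffusion step on the inner points: T2 = T + dt*lam*Ci*lap(T)
    @parallel function diffusion3D_step!(T2, T, Ci, lam, dt, dx, dy, dz)
        @inn(T2) = @inn(T) + dt*(lam*@inn(Ci)*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2))
        return
    end

    function diffusion3D()
        # Physics (illustrative values)
        lam = 1.0                             # thermal conductivity
        c0  = 2.0                             # heat capacity
        lx = ly = lz = 1.0                    # domain extents
        # Numerics (illustrative values)
        nx = ny = nz = 64                     # grid resolution
        nt  = 100                             # number of time steps
        dx, dy, dz = lx/(nx-1), ly/(ny-1), lz/(nz-1)
        # Arrays are allocated on the CPU or the GPU according to the init above
        T  = @rand(nx, ny, nz)                # initial temperature field
        T2 = copy(T)                          # second buffer for the update
        Ci = @ones(nx, ny, nz)./c0            # inverse heat capacity
        dt = min(dx, dy, dz)^2*c0/lam/6.1     # stable explicit time step
        for it = 1:nt
            @parallel diffusion3D_step!(T2, T, Ci, lam, dt, dx, dy, dz)
            T, T2 = T2, T                     # swap buffers
        end
        return T
    end

    diffusion3D()

Only the @init_parallel_stencil call differs between the CPU and GPU variants; the kernel and the time loop stay identical, which is what makes the code architecture-agnostic.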


Related research

09/15/2020 - Term Rewriting on GPUs
We present a way to implement term rewriting on a GPU. We do this by let...

08/01/2016 - TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization
We have developed a task-parallel runtime system, called TREES, that is ...

11/20/2022 - A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration
The simplex algorithm has been successfully used for many years in solvi...

10/30/2020 - DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia
The demand for high-performance computing (HPC) is ever-increasing for e...

07/27/2020 - HeAT – a Distributed and GPU-accelerated Tensor Framework for Data Analytics
To cope with the rapid growth in available data, the efficiency of data ...

08/09/2023 - __host__ __device__ – Generic programming in Cuda
We present patterns for Cuda/C++ to write save generic code which works ...

05/21/2020 - Signal Processing for a Reverse-GPS Wildlife Tracking System: CPU and GPU Implementation Experiences
We present robust high-performance implementations of signal-processing ...
