High-performance xPU Stencil Computations in Julia
We present an efficient approach for writing architecture-agnostic, parallel, high-performance stencil computations in Julia, implemented in the package ParallelStencil.jl. Powerful metaprogramming, zero-cost abstractions, and multiple dispatch make it possible to write a single code that is suitable both for productive prototyping on a single CPU thread and for production runs on multi-GPU or CPU workstations or supercomputers. For a 3-D heat diffusion solver, we demonstrate performance close to the theoretical upper bound on GPUs, which is a massive improvement over the performance achievable with CUDA.jl array programming.
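To make the term "stencil computation" concrete, the following is a minimal sketch of one explicit time step of 3-D heat diffusion, written in plain serial Julia. It does not use ParallelStencil.jl and is not the authors' implementation. The function names `diffusion3d_step!` and `run_diffusion`, the constant diffusivity `D`, the grid size, and the point-source initial condition are all assumptions chosen for illustration.

```julia
# Plain-Julia sketch (assumption: not the ParallelStencil.jl API).
# One explicit step of dT/dt = D * laplacian(T) using a 7-point stencil.
# Boundary cells are left untouched (fixed-value boundary).
function diffusion3d_step!(T2, T, D, dt, dx, dy, dz)
    nx, ny, nz = size(T)
    # ix is the innermost loop, which matches Julia's column-major layout.
    for iz in 2:nz-1, iy in 2:ny-1, ix in 2:nx-1
        @inbounds T2[ix, iy, iz] = T[ix, iy, iz] + dt * D * (
            (T[ix+1, iy, iz] - 2 * T[ix, iy, iz] + T[ix-1, iy, iz]) / dx^2 +
            (T[ix, iy+1, iz] - 2 * T[ix, iy, iz] + T[ix, iy-1, iz]) / dy^2 +
            (T[ix, iy, iz+1] - 2 * T[ix, iy, iz] + T[ix, iy, iz-1]) / dz^2)
    end
    return nothing
end

# Hypothetical driver: diffuse a unit point source on an n^3 grid for nt steps.
function run_diffusion(; n = 32, nt = 10)
    D = 1.0
    dx = dy = dz = 1.0
    # Slightly below the explicit stability limit dt <= dx^2 / (6 D).
    dt = min(dx, dy, dz)^2 / D / 6.1
    T = zeros(n, n, n)
    T[n ÷ 2, n ÷ 2, n ÷ 2] = 1.0
    T2 = copy(T)
    for _ in 1:nt
        diffusion3d_step!(T2, T, D, dt, dx, dy, dz)
        T, T2 = T2, T   # swap buffers instead of copying
    end
    return T
end
```

Each output cell depends only on its six nearest neighbours from the previous step, so all interior cells can be updated independently. That independence is the property that lets this kind of kernel be mapped onto CPU threads or GPU threads.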