Fast parallel multidimensional FFT using advanced MPI

04/25/2018
by Lisandro Dalcin, et al.

We present a new method for performing global redistributions of multidimensional arrays, an operation essential to parallel fast Fourier (or similar) transforms. Traditional methods use standard all-to-all collective communication of contiguous memory buffers, and therefore necessarily require local data realignment steps interleaved between the redistribution and transform steps. Instead, our method takes advantage of subarray datatypes and generalized all-to-all scatter/gather from the MPI-2 standard to communicate discontiguous memory buffers, effectively eliminating the need for local data realignments. Although generalized all-to-all communication of discontiguous data is generally slower, our proposal saves on local work. For a range of strong and weak scaling tests, we found the overall performance of our method to be on par with, and often better than, well-established libraries like MPI-FFTW, P3DFFT, and 2DECOMP FFT. We provide compact routines implemented at the highest possible level using the MPI bindings for the C programming language. These routines apply to any global redistribution, over any two directions of a multidimensional array, decomposed on arbitrary Cartesian processor grids (1D slabs, 2D pencils, or even higher-dimensional decompositions). The high-level implementation makes the code easy to read, maintain, and eventually extend. Our approach also allows for future speedups from optimizations in the internal datatype handling engines within MPI implementations.
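To make the idea of communicating discontiguous blocks concrete, the following is a minimal serial sketch, not the authors' routines. It makes no MPI calls, so it runs anywhere: `M` simulated processes redistribute a row-major `N0 x N1` array from a split along axis 0 to a split along axis 1. The names `decompose`, `Block2D`, `peer_block`, `copy_block`, and `simulate` are hypothetical and chosen for illustration. In an MPI program, the `sizes`, `subsizes`, and `starts` triples computed here are the kind of arguments one would pass to `MPI_Type_create_subarray`, and the strided copy would be performed by `MPI_Alltoallw` rather than by an explicit loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Balanced block split of N items over M parts: part p owns n items from offset s. */
void decompose(int N, int M, int p, int *n, int *s)
{
    int q = N / M, r = N % M;
    *n = q + (p < r ? 1 : 0);
    *s = q * p + (p < r ? p : r);
}

/* A rectangular block inside a row-major 2D local array (hypothetical helper type). */
typedef struct { int sizes[2], subsizes[2], starts[2]; } Block2D;

/* Block of a local array of shape sizes[] paired with peer p when `axis` is split M ways. */
Block2D peer_block(const int sizes[2], int axis, int M, int p)
{
    Block2D b;
    for (int d = 0; d < 2; d++) {
        b.sizes[d] = sizes[d];
        b.subsizes[d] = sizes[d];
        b.starts[d] = 0;
    }
    decompose(sizes[axis], M, p, &b.subsizes[axis], &b.starts[axis]);
    return b;
}

/* Strided copy straight from the source block into the destination block.
   Stands in for the MPI datatype engine: no intermediate packing or transposition. */
void copy_block(const double *src, const Block2D *sb, double *dst, const Block2D *db)
{
    assert(sb->subsizes[0] == db->subsizes[0] && sb->subsizes[1] == db->subsizes[1]);
    for (int i = 0; i < sb->subsizes[0]; i++)
        for (int j = 0; j < sb->subsizes[1]; j++)
            dst[(db->starts[0] + i) * db->sizes[1] + db->starts[1] + j] =
                src[(sb->starts[0] + i) * sb->sizes[1] + sb->starts[1] + j];
}

/* Simulate M processes; returns the number of misplaced elements (0 on success). */
int simulate(int N0, int N1, int M)
{
    int errors = 0;
    double **A = malloc((size_t)M * sizeof *A);   /* locals split along axis 0 */
    double **B = malloc((size_t)M * sizeof *B);   /* locals split along axis 1 */
    for (int p = 0; p < M; p++) {
        int n0, s0, n1, s1;
        decompose(N0, M, p, &n0, &s0);
        decompose(N1, M, p, &n1, &s1);
        A[p] = malloc((size_t)n0 * (size_t)N1 * sizeof **A);
        B[p] = calloc((size_t)N0 * (size_t)n1, sizeof **B);
        for (int i = 0; i < n0; i++)              /* store the global linear index */
            for (int j = 0; j < N1; j++)
                A[p][i * N1 + j] = (double)((s0 + i) * N1 + j);
    }
    for (int p = 0; p < M; p++) {                 /* sender p */
        for (int q = 0; q < M; q++) {             /* receiver q */
            int n0, s0, n1, s1;
            decompose(N0, M, p, &n0, &s0);
            decompose(N1, M, q, &n1, &s1);
            int ashape[2] = { n0, N1 }, bshape[2] = { N0, n1 };
            Block2D sb = peer_block(ashape, 1, M, q);  /* columns destined for q */
            Block2D rb = peer_block(bshape, 0, M, p);  /* rows arriving from p */
            copy_block(A[p], &sb, B[q], &rb);
        }
    }
    for (int q = 0; q < M; q++) {                 /* verify against global indices */
        int n1, s1;
        decompose(N1, M, q, &n1, &s1);
        for (int i = 0; i < N0; i++)
            for (int j = 0; j < n1; j++)
                if (B[q][i * n1 + j] != (double)(i * N1 + s1 + j))
                    errors++;
        free(A[q]);
        free(B[q]);
    }
    free(A);
    free(B);
    return errors;
}
```

Calling `simulate(5, 7, 3)` exercises uneven block sizes on both axes. Each sender/receiver pair touches one block on each side, and the data lands in its final layout without a separate local transpose, which is the local work the abstract says the method avoids.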


