A Hybrid MPI-CUDA Approach for Nonequispaced Discrete Fourier Transformation

01/01/2020
by Sheng-Chun Yang, et al.

Nonequispaced discrete Fourier transformation (NDFT) is widely applied across computational science and engineering. The computational efficiency and accuracy of NDFT have long been critical issues that hinder its broader and deeper application in scientific computing. In our previous work (2018, S.-C. Yang et al., Appl. Comput. Harmon. Anal. 44, 273), a CUNFFT method was proposed, and it showed outstanding performance in handling NDFT at intermediate scale based on CUDA (Compute Unified Device Architecture) technology. In the current work, we further improve the computational efficiency of the CUNFFT method using an efficient MPI-CUDA hybrid parallelization (HP) scheme of NFFT to achieve a cutting-edge treatment of NDFT at super-extended scale. Within this HP-NFFT method, the spatial domain of NDFT is decomposed into several parts according to the additive (accumulative) property of NDFT and the number of available CPU and GPU nodes. These decomposed NDFT subcells are calculated independently on different CPU nodes using an MPI process-level parallelization mode, and on different GPU nodes using a CUDA thread-level parallelization mode and the CUNFFT algorithm. Extensive benchmarking of the HP-NFFT method indicates that it exhibits a dramatic improvement in computational efficiency for handling NDFT at super-extended scale without loss of computational precision. Furthermore, the HP-NFFT method is validated by calculating the Madelung constant of the fluorite crystal structure, which verifies that the method is robust for calculating electrostatic interactions between charged ions in molecular dynamics simulation systems.
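
The decomposition described in the abstract relies on the NDFT being a plain sum over nonequispaced nodes, so partial sums over disjoint subsets of nodes can be computed independently and then added. The following is a minimal sketch of that property only, not the authors' HP-NFFT implementation: it uses a direct O(NM) sum in place of NFFT/CUNFFT, a serial loop in place of MPI ranks and GPUs, and an assumed sign convention and node range of [-0.5, 0.5). The function names `ndft_adjoint`, `split_chunks`, and `ndft_decomposed` are hypothetical.

```python
import cmath
import random


def ndft_adjoint(points, weights, freqs):
    """Direct sum f_hat[k] = sum_j weights[j] * exp(-2*pi*i*k*x_j).

    The sign convention is an assumption made for this sketch.
    """
    return [
        sum(w * cmath.exp(-2j * cmath.pi * k * x) for x, w in zip(points, weights))
        for k in freqs
    ]


def split_chunks(n_items, n_parts):
    """Return contiguous (start, stop) index ranges covering range(n_items)."""
    base, extra = divmod(n_items, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def ndft_decomposed(points, weights, freqs, n_parts):
    """Sum independent partial transforms, one per chunk of nodes.

    Each chunk stands in for the work one process or device would do;
    the final accumulation stands in for a reduction across processes.
    """
    total = [0j] * len(freqs)
    for start, stop in split_chunks(len(points), n_parts):
        partial = ndft_adjoint(points[start:stop], weights[start:stop], freqs)
        total = [t + p for t, p in zip(total, partial)]
    return total


random.seed(0)
points = [random.uniform(-0.5, 0.5) for _ in range(200)]
weights = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
freqs = list(range(-8, 8))

full = ndft_adjoint(points, weights, freqs)
parts = ndft_decomposed(points, weights, freqs, n_parts=4)
max_err = max(abs(a - b) for a, b in zip(full, parts))
```

Here `max_err` should differ from zero only by floating-point rounding, because the chunked result is the same sum evaluated in a different order. In a distributed setting, the accumulation step would typically be a sum reduction across processes.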

research
11/22/2022

Improved Multi-GPU parallelization of a Lagrangian Transport Model

This report highlights our work on improving GPU parallelization by supp...
research
10/26/2020

Parallelizing multiple precision Taylor series method for integrating the Lorenz system

A hybrid MPI+OpenMP strategy for parallelizing multiple precision Taylor...
research
08/27/2021

Optimizing the hybrid parallelization of BHAC

We present our experience with the modernization on the GR-MHD code BHAC...
research
12/02/2022

Fast gap-filling of massive data by local-equilibrium conditional simulations on GPU

The ever-growing size of modern space-time data sets, such as those coll...
research
08/25/2019

OpenMP parallelization of multiple precision Taylor series method

OpenMP parallelization of multiple precision Taylor series method is pro...
research
10/31/2017

Performance Optimization and Parallelization of a Parabolic Equation Solver in Computational Ocean Acoustics on Modern Many-core Computer

As one of open-source codes widely used in computational ocean acoustics...
research
03/10/2015

Parallel Statistical Multi-resolution Estimation

We discuss several strategies to implement Dykstra's projection algorith...