Parallel Algorithms for Successive Convolution

07/06/2020
by Andrew J. Christlieb, et al.

In this work, we consider alternative discretizations for PDEs that use expansions involving integral operators to approximate spatial derivatives. These constructions use explicit information within the integral terms but treat boundary data implicitly, which contributes to the overall speed of the method. The approach is provably unconditionally stable for linear problems, and stability has been demonstrated experimentally for nonlinear problems. It is also matrix-free in the sense that it does not require inverting linear systems, and nonlinear terms require no iteration. Moreover, the scheme employs a fast summation algorithm that yields a computational complexity of 𝒪(N), where N is the number of mesh points along a direction. While much work has been done to explore the theory behind these methods, their practicality in large-scale computing environments remains largely unexplored. We examine the performance of these methods by developing a domain decomposition algorithm suitable for distributed memory systems, along with shared memory algorithms. As a first pass, we derive an artificial CFL condition that enforces a nearest-neighbor communication pattern and briefly discuss possible generalizations. We also analyze several approaches for implementing the parallel algorithms by optimizing predominant loop structures and maximizing data reuse. Using a hybrid design that employs MPI and Kokkos for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate exceeding 1×10^8 DOF/node/s. We provide results that demonstrate the scalability and versatility of our algorithms on several PDE test problems, including a nonlinear example that employs an adaptive time-stepping rule.
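
The abstract does not spell out the kernel or quadrature behind the 𝒪(N) fast summation, so the following is only a minimal sketch under stated assumptions: a one-sided convolution with an exponential kernel, α∫ₐˣ e^{-α(x-y)} v(y) dy, on a uniform mesh, with each cell integral approximated by the trapezoidal rule. The function names and the parameters `alpha` and `dx` are hypothetical and are not taken from the paper. Under these assumptions the exponential kernel lets each value be built from its left neighbor, so one sweep over the mesh replaces the 𝒪(N²) direct sum.

```python
import math


def local_integrals(v, alpha, dx):
    """Trapezoidal approximation of the convolution integral over each cell.

    J[i] approximates alpha * integral over [x_{i-1}, x_i] of
    exp(-alpha * (x_i - y)) * v(y) dy. J[0] is zero (no cell to the left).
    The exponential kernel and the trapezoidal rule are assumptions.
    """
    decay = math.exp(-alpha * dx)
    cells = [0.5 * alpha * dx * (decay * v[i - 1] + v[i]) for i in range(1, len(v))]
    return [0.0] + cells


def fast_left_convolution(v, alpha, dx):
    """O(N) sweep: I[i] = exp(-alpha * dx) * I[i-1] + J[i]."""
    decay = math.exp(-alpha * dx)
    J = local_integrals(v, alpha, dx)
    I = [0.0] * len(v)
    for i in range(1, len(v)):
        I[i] = decay * I[i - 1] + J[i]
    return I


def direct_left_convolution(v, alpha, dx):
    """O(N^2) reference: sum every cell contribution with its own decay."""
    J = local_integrals(v, alpha, dx)
    return [
        sum(math.exp(-alpha * dx * (i - k)) * J[k] for k in range(1, i + 1))
        for i in range(len(v))
    ]


if __name__ == "__main__":
    n, alpha, dx = 101, 2.0, 0.01
    v = [1.0] * n  # constant data, so the exact integral is 1 - exp(-alpha * x)
    fast = fast_left_convolution(v, alpha, dx)
    direct = direct_left_convolution(v, alpha, dx)
    print("max |fast - direct|:", max(abs(a - b) for a, b in zip(fast, direct)))
    print("error vs. exact at x = 1:", abs(fast[-1] - (1.0 - math.exp(-alpha))))
```

The recurrence and the direct sum use the same cell integrals, so they should agree up to rounding; the sweep simply reuses the accumulated value instead of re-summing it at every mesh point.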


research
08/24/2023

Experience with Distributed Memory Delaunay-based Image-to-Mesh Conversion Implementation

This paper presents some of our findings on the scalability of parallel ...
research
04/01/2022

Computational stability analysis of PDEs with integral terms using the PIE framework

The Partial Integral Equation (PIE) framework was developed to computati...
research
09/09/2020

KNN-DBSCAN: a DBSCAN in high dimensions

Clustering is a fundamental task in machine learning. One of the most su...
research
03/02/2020

High Performance Parallel Sort for Shared and Distributed Memory MIMD

We present four high performance hybrid sorting methods developed for va...
research
09/12/2019

The Fast and Free Memory Method for the efficient computation of convolution kernels

We introduce the Fast Free Memory method (FFM), a new fast method for th...
research
02/10/2015

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

In this era of large-scale data, distributed systems built on top of clu...
research
03/02/2018

Specialized Interior Point Algorithm for Stable Nonlinear System Identification

Estimation of nonlinear dynamic models from data poses many challenges, ...
