
Parallel solution of saddle point systems with nested iterative solvers based on the Golub-Kahan Bidiagonalization
We present a scalability study of Golub-Kahan bidiagonalization for the ...

Accelerating linear solvers for Stokes problems with C++ metaprogramming
The efficient solution of large sparse saddle point systems is very impo...

High-order matrix-free incompressible flow solvers with GPU acceleration and low-order refined preconditioners
We present a matrix-free flow solver for high-order finite element discr...

Textbook efficiency: massively parallel matrix-free multigrid for the Stokes system
We employ textbook multigrid efficiency (TME), as introduced by Achi Bra...

AMG preconditioners for Linear Solvers towards Extreme Scale
Linear solvers for large and sparse systems are a key element of scienti...

Analytical Estimation of the Scalability of Iterative Numerical Algorithms on Distributed Memory Multiprocessors
This article presents a new high-level parallel computational model name...

Recent Developments in Iterative Methods for Reducing Synchronization
On modern parallel architectures, the cost of synchronization among proc...
A quantitative performance analysis for Stokes solvers at the extreme scale
This article presents a systematic quantitative performance analysis for large finite element computations on extreme scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low-order tetrahedral elements, are compared with respect to their numerical efficiency and their scalability when running on up to 786 432 parallel threads. A genuine multigrid method for the saddle point system using an Uzawa-type smoother provides the best overall performance with respect to memory consumption and time-to-solution. The largest system solved on a Blue Gene/Q system has more than ten trillion (1.1 · 10^13) unknowns and requires about 13 minutes of compute time. Despite the matrix-free and highly optimized implementation, the memory requirement for the solution vector and the auxiliary vectors is about 200 TByte. Brandt's notion of "textbook multigrid efficiency" is employed to study the algorithmic performance of iterative solvers. A recent extension of this paradigm to "parallel textbook multigrid efficiency" also makes it possible to assess the efficiency of parallel iterative solvers for a given hardware architecture in absolute terms. The efficiency of the method is demonstrated by simulating incompressible fluid flow in a pipe filled with spherical obstacles.
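
To put the quoted memory figure in perspective, the following back-of-envelope sketch relates the 1.1 · 10^13 unknowns to the roughly 200 TByte reported for the solution and auxiliary vectors. It is not taken from the paper: the 8-byte double-precision storage per unknown and the decimal TByte (10^12 bytes) are assumptions, and the function name `vector_size_tbyte` is hypothetical.

```python
# Back-of-envelope memory estimate for the largest run quoted in the abstract.
# ASSUMPTIONS (not stated in the abstract): each unknown is stored as an
# 8-byte double, and 1 TByte = 10**12 bytes.

UNKNOWNS = 1.1e13            # from the abstract
BYTES_PER_VALUE = 8          # assumption: IEEE 754 double precision
TBYTE = 10**12               # assumption: decimal terabyte
REPORTED_TOTAL_TBYTE = 200   # from the abstract (solution plus auxiliary vectors)


def vector_size_tbyte(unknowns: float, bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Memory needed for one full-length vector, in TByte."""
    return unknowns * bytes_per_value / TBYTE


one_vector = vector_size_tbyte(UNKNOWNS)
equivalent_vectors = REPORTED_TOTAL_TBYTE / one_vector

print(f"one full-length vector: {one_vector:.0f} TByte")
print(f"reported total is about {equivalent_vectors:.1f} full-length vectors")
```

Under these assumptions a single vector already occupies a large share of the reported total, which is consistent with the abstract's emphasis on memory consumption even for a matrix-free implementation.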