On the performance of various parallel GMRES implementations on CPU and GPU clusters

06/10/2019
by E. I. Ioannidis, et al.

As the need for computational power and efficiency rises, parallel systems are becoming increasingly popular across scientific fields. While multi-core architectures have been the center of attention for many years, the rapid development of general-purpose GPU-based architectures is taking high-performance computing to the next level. In this work, different implementations of a parallel version of the preconditioned GMRES method, an established iterative solver for large, sparse systems of linear equations, are presented, each on a different computing architecture: from distributed- and shared-memory core-based architectures to GPU-based ones. The computational experiments arise from the discretization of a benchmark boundary value problem with the finite element method. The major advantages and drawbacks of the various implementations are discussed in terms of parallel speedup, execution time and memory issues. Among other findings, a comparison of the results across the different architectures shows the high potential of GPU-based architectures.
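The abstract does not include the authors' code, so the following is only a minimal serial sketch of the kind of kernel being parallelized: restarted GMRES(m) with right preconditioning, Arnoldi via modified Gram-Schmidt and Givens rotations. It assumes a Jacobi (diagonal) preconditioner and a 1D Poisson problem with linear finite elements as the benchmark; the paper's actual preconditioner and test problem may differ, and the names `fem_poisson_1d` and `gmres_jacobi` are hypothetical. A dense matrix is used for brevity, whereas a real solver would use sparse storage. In a parallel version, the matrix-vector product `A @ v` and the inner products in the Arnoldi loop are the natural candidates for distribution across cores or GPU threads.

```python
import numpy as np


def fem_poisson_1d(n):
    """Hypothetical benchmark: stiffness matrix of -u'' = f on (0, 1) with
    linear elements, a uniform mesh of n interior nodes and u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h


def gmres_jacobi(A, b, restart=30, tol=1e-8, max_restarts=50):
    """Restarted GMRES(m), right-preconditioned with M = diag(A) (Jacobi)."""
    n = b.shape[0]
    d_inv = 1.0 / np.diag(A)              # action of M^{-1}
    x = np.zeros(n)
    b_norm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * b_norm:          # converged on the true residual
            return x
        m = restart
        V = np.zeros((n, m + 1))          # Krylov basis vectors
        H = np.zeros((m + 1, m))          # Hessenberg matrix (rotated in place)
        cs, sn = np.zeros(m), np.zeros(m) # Givens rotation coefficients
        g = np.zeros(m + 1)               # rotated right-hand side beta*e1
        g[0] = beta
        V[:, 0] = r / beta
        k = 0
        for j in range(m):
            w = A @ (d_inv * V[:, j])     # w = A M^{-1} v_j (parallel SpMV)
            for i in range(j + 1):        # modified Gram-Schmidt (dot products)
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            h_next = np.linalg.norm(w)
            H[j + 1, j] = h_next
            for i in range(j):            # apply earlier rotations to column j
                tmp = cs[i] * H[i, j] + sn[i] * H[i + 1, j]
                H[i + 1, j] = -sn[i] * H[i, j] + cs[i] * H[i + 1, j]
                H[i, j] = tmp
            denom = np.hypot(H[j, j], H[j + 1, j])
            cs[j], sn[j] = H[j, j] / denom, H[j + 1, j] / denom
            H[j, j], H[j + 1, j] = denom, 0.0
            g[j + 1] = -sn[j] * g[j]      # |g[j+1]| is the current residual norm
            g[j] = cs[j] * g[j]
            k = j + 1
            if abs(g[j + 1]) <= tol * b_norm or h_next == 0.0:
                break
            V[:, j + 1] = w / h_next
        y = np.linalg.solve(H[:k, :k], g[:k])   # upper-triangular k x k system
        x += d_inv * (V[:, :k] @ y)             # x = x0 + M^{-1} V_k y
    return x
```

For f = 1, the nodal values of the linear finite element solution coincide with the exact solution u(t) = t(1 - t)/2, which makes a convenient check, e.g. `x = gmres_jacobi(fem_poisson_1d(40), np.ones(40) / 41, restart=40, tol=1e-10)`. Right preconditioning is chosen here because the residual minimized by GMRES is then the true residual b - Ax, so the stopping test needs no extra correction.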
