Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

11/23/2015
by Bogdan Oancea, et al.

The paradigm of parallel computing is changing. CUDA is now a popular programming model for general-purpose computations on GPUs, and many applications have been ported to CUDA, obtaining speedups of orders of magnitude compared to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model for parallel computing are a solution for very large applications. We considered a heterogeneous cluster that combines CPU and GPU computations using MPI and CUDA for developing a high performance linear algebra library. Our library targets solvers for large linear systems because such systems are common in science and engineering. Direct methods for computing the solution of such systems can be very expensive due to high memory requirements and computational cost. Iterative methods, which compute only an approximation of the solution, are an efficient alternative. In this paper we present an implementation of a library that uses a hybrid MPI/CUDA model of computation and implements both direct and iterative linear system solvers. Our library implements solvers based on LU and Cholesky factorization and some of the non-stationary iterative methods using the MPI/CUDA combination. We compared the performance of our MPI/CUDA implementation with classic programs written to run on a single CPU.
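
To make the idea of an iterative solver concrete, the following is a minimal single-process sketch of the conjugate gradient method, a well-known non-stationary iterative method for symmetric positive definite systems. The abstract does not say which non-stationary methods the library implements, so the choice of conjugate gradient is an assumption, as are the function names and the small test system; the sketch uses plain Python and does not reflect the library's MPI/CUDA code.

```python
# Illustrative sketch only: conjugate gradient for a symmetric positive
# definite system A x = b. Names and the example system are assumptions,
# not taken from the paper or its library.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Return (x, iterations) with x approximating the solution of A x = b."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for k in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small
            return x, k
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = conjugate_gradient(A, b)
```

In a hybrid setting, the matrix-vector product and the inner products are the natural candidates for distribution across MPI processes and offloading to the GPU, since they dominate the cost of each iteration.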


