A 3D Parallel Algorithm for QR Decomposition

05/14/2018
by Grey Ballard, et al.

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
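
The tradeoff described in the abstract can be illustrated with the standard alpha-beta communication model, in which each message costs a fixed latency `alpha` and each word costs `beta`. The sketch below is a minimal illustration, not the paper's algorithm or analysis: `toy_costs`, its constants, the parameter name `delta`, and the machine values are all hypothetical, chosen only so that the message count grows and the communication volume shrinks as the parameter increases.

```python
def comm_time(messages, words, alpha, beta):
    """Alpha-beta model: alpha seconds per message plus beta seconds per word."""
    return alpha * messages + beta * words


def toy_costs(delta):
    """Hypothetical cost functions (NOT the paper's formulas).

    Returns (messages, words): the message count grows with delta
    while the communication volume shrinks.
    """
    return 10.0 * delta, 1.0e6 / delta


def best_parameter(candidates, cost_fn, alpha, beta):
    """Pick the candidate parameter with the lowest modeled communication time."""
    return min(candidates, key=lambda d: comm_time(*cost_fn(d), alpha, beta))


candidates = [1, 2, 4, 8, 16, 32, 64]

# Assumed machine with expensive messages: favors few messages (small delta).
high_latency_choice = best_parameter(candidates, toy_costs, alpha=1e-3, beta=1e-9)

# Assumed machine with cheap messages: can afford more messages to move fewer words.
low_latency_choice = best_parameter(candidates, toy_costs, alpha=1e-6, beta=1e-9)

print(high_latency_choice, low_latency_choice)
```

Under these assumed costs, the machine with expensive messages selects a smaller parameter value than the machine with cheap messages, which is the kind of machine-specific tuning the abstract refers to.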

research
12/23/2016

Node Aware Sparse Matrix-Vector Multiplication

The sparse matrix-vector multiply (SpMV) operation is a key computationa...
research
10/19/2020

Evaluating the Cost of Atomic Operations on Modern Architectures

Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-...
research
12/17/2017

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization

Parallel computing has played an important role in speeding up convex op...
research
09/30/2020

Communication-Optimal Parallel Standard and Karatsuba Integer Multiplication in the Distributed Memory Model

We present COPSIM a parallel implementation of standard integer multipli...
research
02/11/2020

Parallel Direct Domain Decomposition Methods (D3M) for Finite Elements

A parallel direct solution approach based on domain decomposition method...
research
05/27/2019

Parallel and Communication Avoiding Least Angle Regression

We are interested in parallelizing the Least Angle Regression (LARS) alg...
research
07/30/2020

A Core Calculus for Static Latency Tracking with Placement Types

Developing efficient geo-distributed applications is challenging as prog...
