A 3D Parallel Algorithm for QR Decomposition

05/14/2018
by Grey Ballard, et al.
Wake Forest University
Inria
NYU college
berkeley college
Berkeley Lab

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
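
To make the bandwidth/latency tradeoff concrete, here is a minimal sketch using the standard alpha-beta communication model, in which time is approximated as (cost per message) × (number of messages) + (cost per word) × (number of words). The names `Machine`, `comm_time`, `best_parameter`, and the two `toy_*` cost curves are hypothetical and chosen only for illustration; in particular, the toy curves are placeholders and are not the cost bounds derived in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Machine:
    """Assumed machine model: one latency term and one bandwidth term."""
    alpha: float  # assumed cost per message (seconds per message)
    beta: float   # assumed cost per word moved (seconds per word)


def comm_time(machine: Machine, messages: float, words: float) -> float:
    """Alpha-beta estimate of communication time."""
    return machine.alpha * messages + machine.beta * words


def best_parameter(
    machine: Machine,
    candidates: Iterable[float],
    messages_of: Callable[[float], float],
    words_of: Callable[[float], float],
) -> float:
    """Return the candidate parameter with the lowest modeled time."""
    return min(
        candidates,
        key=lambda t: comm_time(machine, messages_of(t), words_of(t)),
    )


# Placeholder cost curves (NOT the paper's bounds): raising the tuning
# parameter t sends more messages but moves fewer words in total.
def toy_messages(t: float) -> float:
    return 100.0 * t


def toy_words(t: float) -> float:
    return 1.0e6 / t


candidates = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Two hypothetical machines with different communication costs.
latency_bound = Machine(alpha=1e-3, beta=1e-9)    # messages are expensive
bandwidth_bound = Machine(alpha=1e-7, beta=1e-8)  # messages are cheap

choice_latency = best_parameter(latency_bound, candidates, toy_messages, toy_words)
choice_bandwidth = best_parameter(bandwidth_bound, candidates, toy_messages, toy_words)
print(choice_latency, choice_bandwidth)
```

Under these toy curves, the machine with expensive messages prefers the smallest parameter value, while the machine with cheap messages prefers a larger one that cuts communication volume. That is the kind of per-machine tuning the abstract describes, though the actual cost expressions should be taken from the full text.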


12/23/2016

Node Aware Sparse Matrix-Vector Multiplication

The sparse matrix-vector multiply (SpMV) operation is a key computationa...
10/19/2020

Evaluating the Cost of Atomic Operations on Modern Architectures

Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-...
12/17/2017

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization

Parallel computing has played an important role in speeding up convex op...
07/30/2020

A Core Calculus for Static Latency Tracking with Placement Types

Developing efficient geo-distributed applications is challenging as prog...
02/11/2020

Parallel Direct Domain Decomposition Methods (D3M) for Finite Elements

A parallel direct solution approach based on domain decomposition method...
05/27/2019

Parallel and Communication Avoiding Least Angle Regression

We are interested in parallelizing the Least Angle Regression (LARS) alg...
06/28/2019

Polynomial Preconditioned GMRES to Reduce Communication in Parallel Computing

Polynomial preconditioning with the GMRES minimal residual polynomial ha...