
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

by Wajih Halim Boukaram et al.

We present high-performance implementations of the QR decomposition and the singular value decomposition (SVD) for batches of small matrices hosted on the GPU, with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism, and serves as a building block for the SVD of low-rank blocks via randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
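To make the two building blocks named above concrete, here is a minimal NumPy sketch of (a) a one-sided Jacobi SVD, which orthogonalizes the columns of the working matrix with plane rotations until the column norms are the singular values, and (b) the standard randomized low-rank SVD (sketch, QR, project, small SVD) that uses it for the small projected matrix. This is an illustrative CPU reimplementation under my own function names and defaults, not the authors' batched CUDA kernels.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=50):
    """SVD of an m x n matrix (m >= n) via one-sided Jacobi rotations.

    Columns of a working copy of A are rotated pairwise until mutually
    orthogonal; their norms are then the singular values (simple and
    parallelism-friendly, which is why it suits batched GPU kernels).
    """
    U = np.array(A, dtype=float, copy=True)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta  = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                 # pair already orthogonal
                converged = False
                # Rotation angle that zeroes gamma for this column pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                sgn = 1.0 if zeta >= 0 else -1.0
                t = sgn / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                # Apply the rotation to columns p, q of both U and V.
                Up, Uq = U[:, p].copy(), U[:, q].copy()
                U[:, p], U[:, q] = c * Up - s * Uq, s * Up + c * Uq
                Vp, Vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * Vp - s * Vq, s * Vp + c * Vq
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)
    order = np.argsort(sigma)[::-1]          # descending singular values
    sigma = sigma[order]
    U = U[:, order] / np.where(sigma > 0, sigma, 1.0)
    return U, sigma, V[:, order].T

def randomized_low_rank_svd(A, k, oversample=5, rng=None):
    """Approximate rank-k SVD: random sketch + QR + small Jacobi SVD."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    r = min(n, k + oversample)
    Y = A @ rng.standard_normal((n, r))      # sketch the column space
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for range(Y)
    B = Q.T @ A                              # small r x n projected matrix
    # Jacobi SVD of the tall B.T gives B = Vbt.T @ diag(s) @ Ub.T,
    # so A ~= Q @ B = (Q @ Vbt.T) @ diag(s) @ Ub.T.
    Ub, s, Vbt = one_sided_jacobi_svd(B.T)
    U = Q @ Vbt.T
    return U[:, :k], s[:k], Ub[:, :k].T
```

For a matrix whose numerical rank is at most k, the randomized routine recovers it essentially to machine precision; in the H-matrix setting this is applied block by block, which is where batching over many small problems pays off.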


A Hierarchical Singular Value Decomposition Algorithm for Low Rank Matrices

Singular value decomposition (SVD) is a widely used technique for dimens...

Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression

Hierarchical matrices are space and time efficient representations of de...

Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs

A parallel, blocked, one-sided Hari–Zimmermann algorithm for the general...

Matrix compression along isogenic blocks

A matrix-compression algorithm is derived from a novel isogenic block de...

Equal bi-Vectorized (EBV) method to high performance on GPU

Due to importance of reducing of time solution in numerical codes, we pr...

H2OPUS-TLR: High Performance Tile Low Rank Symmetric Factorizations using Adaptive Randomized Approximation

Tile low rank representations of dense matrices partition them into bloc...

Language model compression with weighted low-rank factorization

Factorizing a large matrix into small matrices is a popular strategy for...