High Accuracy Low Precision QR Factorization and Least Square Solver on GPU with TensorCore

12/11/2019
by Shaoshuai Zhang, et al.

Driven by the insatiable need to process ever larger amounts of data with more complex models, modern computer processors and accelerators are beginning to offer half precision floating point arithmetic support, along with extremely optimized special units, such as NVIDIA TensorCore on GPU and the Google Tensor Processing Unit (TPU), that perform half precision matrix-matrix multiplication exceptionally efficiently. In this paper we present a large scale mixed precision linear least squares solver that achieves high accuracy using the low precision TensorCore GPU. The mixed precision system consists of both innovative algorithms and implementations. At large scale it is shown to be up to 14x faster than single precision cuSOLVER at QR matrix factorization, with slightly lower accuracy, and up to 10x faster than a double precision direct QR least squares solver, with comparable accuracy.
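The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea of recovering high accuracy from a low precision QR factorization, not the authors' method. It assumes NumPy on the CPU, uses float32 as a stand-in for TensorCore half precision, and refines the solution in float64 with a simple iteration that uses the low precision R factor as a preconditioner for the normal equations. The function name `lstsq_mixed` and the parameter `iters` are hypothetical.

```python
import numpy as np

def lstsq_mixed(A, b, iters=5):
    """Sketch: low precision QR plus high precision refinement (illustrative only)."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Factor in low precision. float32 is an assumption standing in for
    # the half precision arithmetic a TensorCore would provide.
    R32 = np.linalg.qr(A64.astype(np.float32), mode="r")
    R = R32.astype(np.float64)

    x = np.zeros(A64.shape[1])
    for _ in range(iters):
        # Residual and its projection are computed in double precision.
        r = b64 - A64 @ x
        g = A64.T @ r
        # Apply (R^T R)^{-1}, an approximation of (A^T A)^{-1}, to the
        # normal-equation residual and correct the current iterate.
        y = np.linalg.solve(R.T, g)
        x += np.linalg.solve(R, y)
    return x

# Small usage example on a well-conditioned random problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
b = rng.standard_normal(200)
x_mixed = lstsq_mixed(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_mixed - x_ref))
```

Because the fixed point of this iteration satisfies the exact normal equations, the accuracy of the result is governed by the double precision refinement rather than by the low precision factorization, provided the matrix is well enough conditioned for the iteration to converge.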

research
08/03/2022

A Hybrid Factorization Algorithm for Sparse Matrix with Mixed Precision Arithmetic

A new hybrid algorithm for LDU-factorization for large sparse matrix com...
research
01/14/2019

Faster arbitrary-precision dot product and matrix multiplication

We present algorithms for real and complex dot product and matrix multip...
research
10/14/2015

Sapporo2: A versatile direct N-body library

Astrophysical direct N-body methods have been one of the first productio...
research
03/02/2021

Square Root Bundle Adjustment for Large-Scale Reconstruction

We propose a new formulation for the bundle adjustment problem which rel...
research
11/03/2020

Improving the Performance of the GMRES Method using Mixed-Precision Techniques

The GMRES method is used to solve sparse, non-symmetric systems of linea...
research
08/21/2023

Hierarchical Lowrank Arithmetic with Binary Compression

With lowrank approximation the storage requirements for dense data are r...
research
08/01/2023

Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU

High-performance GPU-accelerated particle filter methods are critical fo...
