Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems

01/09/2023
by Jieyang Chen, et al.

One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are key components of scientific computing in many fields. Although their design has been highly optimized for modern processors, they still consume a considerable amount of energy. Because CPU-GPU heterogeneous systems are commonly used for matrix decompositions, this work aims to further improve the energy saving of one-sided matrix decompositions on such systems. We first build an Algorithm-Based Fault Tolerance protected overclocking technique (ABFT-OC) that lets us exploit reliable overclocking for key matrix decomposition operations. We then design an energy-saving matrix decomposition framework, Bi-directional Slack Reclamation (BSR), that intelligently combines the capabilities provided by ABFT-OC and dynamic voltage and frequency scaling (DVFS) to maximize energy saving while maintaining performance and reliability. Experiments show that BSR is able to save up to 11.7% [...] energy saving optimization approach with no performance degradation and up to 14.1% [...] performance-energy trade-off, which is able to provide up to 1.43x performance improvement without costing extra energy.
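As a rough illustration of the general idea behind algorithm-based fault tolerance, the sketch below checks a matrix product against a column checksum. It is not the paper's ABFT-OC implementation: the function names, the tolerance, and the pure-Python matrix product are all assumptions made for this example. It relies only on the identity that the column sums of C = A·B equal (column sums of A)·B, so a corrupted entry of C shows up as a checksum mismatch.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists (illustrative, not optimized)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for row in a]


def column_sums(mat):
    """Sum each column of a matrix; this is the checksum vector e^T * mat."""
    return [sum(col) for col in zip(*mat)]


def abft_check(a, b, c, tol=1e-9):
    """Return True if the column checksums of c match (e^T a) * b.

    `tol` is a hypothetical relative tolerance for floating-point round-off.
    """
    expected = matmul([column_sums(a)], b)[0]
    actual = column_sums(c)
    return all(abs(x - y) <= tol * max(1.0, abs(x))
               for x, y in zip(expected, actual))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

c = matmul(a, b)
ok_clean = abft_check(a, b, c)    # checksums agree for the correct product

c[1][0] += 1.0                    # simulate a silent error in one entry of c
ok_faulty = abft_check(a, b, c)   # the checksum mismatch exposes the error
```

A check of this kind costs one extra matrix-vector product per update, far less than recomputing the product, which is why checksum schemes are a common way to detect silent errors when hardware runs outside its nominal operating range.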


