Explicit caching HYB: a new high-performance SpMV framework on GPGPU

04/13/2022
by Chong Chen, et al.

Sparse Matrix-Vector Multiplication (SpMV) is a critical operation for iterative solvers of Finite Element Methods in computer simulation. Because SpMV is a memory-bound algorithm, the efficiency of data movement heavily influences its performance on GPUs. In recent years, much research has been conducted on accelerating SpMV on graphics processing units (GPUs). The optimization methods in existing studies focus on improving load balancing between GPU processors and reducing execution divergence between GPU threads. Although some studies have made preliminary optimizations to input-vector fetching, the effect of explicitly caching the input vector in GPU-based SpMV has not yet been studied in depth. In this study, we aim to minimize the data-movement cost of GPU-based SpMV with a new framework named "explicit caching Hybrid" (EHYB). The EHYB framework achieves significant performance improvements through two methods: 1. improving the speed of data movement by partitioning the input vector and explicitly caching each partition in the shared memory of the CUDA kernel, and 2. reducing the volume of data movement by storing the major part of the column indices in a compact format. We tested our implementation with sparse matrices derived from FEM applications in different areas. The experimental results show that our implementation outperforms state-of-the-art implementations with significant speedups and achieves higher FLOPS than the theoretical performance upper bound of existing GPU-based SpMV implementations.
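To make the two methods concrete, below is a minimal CUDA sketch of the general idea only, not the authors' EHYB code: each thread block stages the slice of the input vector that its rows reference into shared memory, and column indices are stored as 16-bit offsets relative to that slice (the compact-index idea). The layout, the constants TILE_COLS, ROWS_PER_BLOCK, and MAX_NNZ_PER_ROW, and the ELL-style padding are assumptions made for illustration; the hybrid format's handling of entries that fall outside the cached partition is omitted.

// Illustrative sketch only (not the authors' EHYB implementation).
#include <cuda_runtime.h>

constexpr int TILE_COLS       = 1024; // width of the x-slice cached per block (assumed)
constexpr int ROWS_PER_BLOCK  = 128;  // rows handled by each block (assumed)
constexpr int MAX_NNZ_PER_ROW = 16;   // ELL-style padding per row within a block (assumed)

__global__ void spmv_cached_tile(const float* val,              // padded nonzero values, block-major
                                 const unsigned short* col_off, // 16-bit column offsets within the cached slice
                                 const int* tile_base,          // first global column referenced by each block
                                 const float* x,                // input vector
                                 float* y,                      // output vector
                                 int n_rows)
{
    __shared__ float x_tile[TILE_COLS];

    // 1. Explicit caching: cooperatively copy the slice of x this block needs into shared memory.
    int base = tile_base[blockIdx.x];
    for (int c = threadIdx.x; c < TILE_COLS; c += blockDim.x)
        x_tile[c] = x[base + c];
    __syncthreads();

    int row = blockIdx.x * ROWS_PER_BLOCK + threadIdx.x;
    if (threadIdx.x >= ROWS_PER_BLOCK || row >= n_rows) return;

    // 2. Compact indices: offsets address the shared slice, so 16-bit values
    //    replace full 32-bit global column indices for the major part of the matrix.
    size_t off = (size_t)blockIdx.x * ROWS_PER_BLOCK * MAX_NNZ_PER_ROW
               + (size_t)threadIdx.x * MAX_NNZ_PER_ROW;
    float sum = 0.0f;
    for (int k = 0; k < MAX_NNZ_PER_ROW; ++k)
        sum += val[off + k] * x_tile[col_off[off + k]]; // padded entries: value 0, offset 0
    y[row] = sum;
}

// Launch sketch: one block per group of ROWS_PER_BLOCK rows, e.g.
//   spmv_cached_tile<<<(n_rows + ROWS_PER_BLOCK - 1) / ROWS_PER_BLOCK, ROWS_PER_BLOCK>>>(...);

The point of the sketch is the data-movement argument from the abstract: once the x-slice sits in shared memory, every reuse of an input-vector element is served on chip, and the smaller index type cuts the bytes read per nonzero.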


