Optimizing Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures

05/29/2018
by Shizhao Chen, et al.

Sparse matrix-vector multiplication (SpMV) is one of the most common operations in scientific and high-performance applications, and is often the application's performance bottleneck. While the sparse matrix representation has a significant impact on the resulting application performance, choosing the right representation typically relies on expert knowledge and trial and error. This paper provides the first comprehensive study of the impact of sparse matrix representations on two emerging many-core architectures: Intel's Knights Landing (KNL) XeonPhi and the ARM-based FT-2000Plus (FTP). Our large-scale experiments involved over 9,500 distinct profiling runs performed on 956 sparse datasets and five mainstream SpMV representations. We show that the best sparse matrix representation depends on the underlying architecture and the program input. To help developers choose the optimal matrix representation, we employ machine learning to develop a predictive model. Our model is first trained offline using a set of training examples. The learned model can then be used to predict the best matrix representation for any unseen input on a given architecture. We show that our model delivers on average 95% and 91% of the best available performance on KNL and FTP respectively, and it achieves this with no runtime profiling overhead.
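To make the notion of a "sparse matrix representation" concrete, the following is a minimal sketch of SpMV over two widely used storage formats, CSR (compressed sparse row) and ELL (ELLPACK). The abstract does not name the five representations it studies, so these two are chosen here only for illustration. The helper names (`to_csr`, `spmv_csr`, `to_ell`, `spmv_ell`) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Dense = List[List[float]]


def to_csr(a: Dense) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR: (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x: List[float]) -> List[float]:
    """Compute y = A @ x with A stored in CSR."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def to_ell(a: Dense) -> Tuple[List[List[float]], List[List[int]]]:
    """Convert a dense matrix to ELL: every row is padded to the same width."""
    width = max((sum(1 for v in row if v != 0) for row in a), default=0)
    vals, cols = [], []
    for row in a:
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nz)
        # Padding entries carry value 0.0 and a harmless column index of 0.
        cols.append([j for j, _ in nz] + [0] * pad)
        vals.append([v for _, v in nz] + [0.0] * pad)
    return vals, cols


def spmv_ell(vals, cols, x: List[float]) -> List[float]:
    """Compute y = A @ x with A stored in ELL (padding contributes zero)."""
    return [sum(v * x[j] for v, j in zip(rv, rc)) for rv, rc in zip(vals, cols)]


# Small illustrative input (assumed data, not from the paper's datasets).
A = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 5, 6],
]
x = [1, 2, 3, 4]
y_csr = spmv_csr(*to_csr(A), x)
y_ell = spmv_ell(*to_ell(A), x)
```

Both routines return the same product, but they lay the data out differently. For this input, CSR stores exactly the 6 nonzeros, while ELL pads every row to the longest row's length and stores 12 slots. ELL's regular layout tends to suit vectorized hardware when row lengths are similar, and wastes memory and bandwidth when they are not. This illustrates why the best format can depend on both the input matrix and the architecture.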


research
01/09/2018

Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

Sparse Matrix-Matrix multiplication is a key kernel that has application...
research
11/20/2019

Characterizing Scalability of Sparse Matrix-Vector Multiplications on Phytium FT-2000+ Many-cores

Understanding the scalability of parallel programs is crucial for softwa...
research
06/30/2020

Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity

Despite numerous efforts for optimizing the performance of Sparse Matrix...
research
04/14/2023

SpChar: Characterizing the Sparse Puzzle via Decision Trees

Sparse matrix computation is crucial in various modern applications, inc...
research
03/05/2020

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach

This article presents an automatic approach to quickly derive a good sol...
research
11/15/2017

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

This paper presents a low-overhead optimizer for the ubiquitous sparse m...
research
02/08/2018

Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach

Many-core accelerators, as represented by the XeonPhi coprocessors and G...
