Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX

by Christie L. Alappat, et al.

The A64FX CPU powers the current number one supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival those of accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. We also identify architectural peculiarities and derive optimization hints. Applying the ECM model to sparse matrix-vector multiplication (SpMV), we explain why the CRS matrix storage format is inappropriate and show how the SELL-C-sigma format, combined with suitable code optimizations, can achieve bandwidth saturation for SpMV.
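To make the contrast between the two storage formats concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and says nothing about A64FX performance; it only illustrates the data layouts. The function names (`spmv_crs`, `crs_to_sell`, `spmv_sell`) are hypothetical, `C` denotes the chunk height, and `sigma` the sorting window. It assumes the usual definition of SELL-C-sigma: rows are sorted by length within windows of `sigma` rows, grouped into chunks of `C` rows, padded to the longest row in each chunk, and stored column by column within the chunk.

```python
def spmv_crs(row_ptr, col_idx, values, x):
    """y = A*x with A in CRS format: one short inner loop per row."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def crs_to_sell(row_ptr, col_idx, values, C, sigma):
    """Convert CRS to a SELL-C-sigma layout (illustrative, unoptimized)."""
    n = len(row_ptr) - 1
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(n)]
    # Sort rows by descending length inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -lengths[r])
        perm.extend(window)
    chunk_ptr, chunk_len, sell_val, sell_col = [0], [], [], []
    for c0 in range(0, n, C):
        rows = perm[c0:c0 + C]
        width = max(lengths[r] for r in rows)
        chunk_len.append(width)
        # Column-major storage inside the chunk; pad short rows with zeros.
        for j in range(width):
            for lane in range(C):
                if lane < len(rows) and j < lengths[rows[lane]]:
                    k = row_ptr[rows[lane]] + j
                    sell_val.append(values[k])
                    sell_col.append(col_idx[k])
                else:
                    sell_val.append(0.0)
                    sell_col.append(0)  # padding: any valid column index
        chunk_ptr.append(len(sell_val))
    return perm, chunk_ptr, chunk_len, sell_val, sell_col


def spmv_sell(perm, chunk_ptr, chunk_len, sell_val, sell_col, x, C):
    """y = A*x with A in SELL-C-sigma format."""
    n = len(perm)
    y = [0.0] * n
    for c, width in enumerate(chunk_len):
        acc = [0.0] * C
        for j in range(width):
            # The loop over lanes touches C consecutive entries,
            # which is the part a SIMD unit would process at once.
            for lane in range(C):
                k = chunk_ptr[c] + j * C + lane
                acc[lane] += sell_val[k] * x[sell_col[k]]
        for lane in range(C):
            idx = c * C + lane
            if idx < n:
                y[perm[idx]] = acc[lane]  # undo the row permutation
    return y
```

In the CRS kernel the inner loop length equals the number of nonzeros in a row, which is often short and irregular. In the SELL-C-sigma kernel the innermost loop always runs over `C` contiguous entries, at the cost of some zero padding that sorting within the `sigma` window tries to keep small.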




