Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX

09/29/2020
by   Christie L. Alappat, et al.
0

The A64FX CPU powers the current number one supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. We also identify architectural peculiarities and derive optimization hints. Applying the ECM model to sparse matrix-vector multiplication (SpMV), we motivate why the CRS matrix storage format is inappropriate and how the SELL-C-sigma format with suitable code optimizations can achieve bandwidth saturation for SpMV.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/04/2021

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX

The A64FX CPU is arguably the most powerful Arm-based processor design t...
research
12/23/2011

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse matrix-vector multiplication (spMVM) is the dominant operation in...
research
12/14/2018

Impact of Traditional Sparse Optimizations on a Migratory Thread Architecture

Achieving high performance for sparse applications is challenging due to...
research
07/24/2021

An FPGA cached sparse matrix vector product (SpMV) for unstructured computational fluid dynamics simulations

Field Programmable Gate Arrays generate algorithmic specific architectur...
research
05/24/2023

Model-Based Performance Analysis of the HyTeG Finite Element Framework

In this work, we present how code generation techniques significantly im...
research
04/02/2018

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures : Algorithms and Experiments

Architectures with multiple classes of memory media are becoming a commo...
research
01/21/2020

Lattice QCD on a novel vector architecture

The SX-Aurora TSUBASA PCIe accelerator card is the newest model of NEC's...

Please sign up or login with your details

Forgot password? Click here to reset