Short reasons for long vectors in HPC CPUs: a study based on RISC-V

09/13/2023
by Pablo Vizcaino, et al.

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially available SIMD units process up to 8 double-precision elements with one instruction. The optimal vector width, and how memory latency and bandwidth shape its impact on CPU throughput, remain challenging research questions. This study examines the behavior of four computational kernels on a RISC-V core connected to a customizable vector unit capable of operating on up to 256 double-precision elements per instruction. The four codes were purposefully selected to represent non-dense workloads: SpMV, BFS, PageRank, and FFT. The experimental setup allows us to measure their performance while varying the vector length, the memory latency, and the memory bandwidth. Our results show that longer vectors tolerate limitations in the memory subsystem better, which offers hope to code developers working beyond dense linear algebra.
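The kernels themselves are not reproduced here. As an illustration only, the sketch below assumes a sparse matrix stored in CSR format and emulates, in plain scalar Python, how a vector unit with a given maximum vector length would strip-mine each row of SpMV: an indexed load (gather) of `x`, a multiply, and a reduction per chunk. The function names `spmv_csr_stripmined` and `count_strips` and the parameter `vl_max` are hypothetical; this is not RISC-V vector code and not the authors' implementation.

```python
from typing import List


def spmv_csr_stripmined(row_ptr: List[int], col: List[int], val: List[float],
                        x: List[float], vl_max: int = 8) -> List[float]:
    """Compute y = A @ x for a CSR matrix, processing each row in chunks
    of at most vl_max nonzeros (hypothetical emulation of a vector unit)."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        acc = 0.0
        for j in range(start, end, vl_max):
            vl = min(vl_max, end - j)  # the last chunk of a row may be shorter
            # Indexed load (gather): irregular accesses to x driven by col[].
            gathered = [x[c] for c in col[j:j + vl]]
            # Element-wise multiply followed by a reduction over the chunk.
            acc += sum(v * g for v, g in zip(val[j:j + vl], gathered))
        y.append(acc)
    return y


def count_strips(row_ptr: List[int], vl_max: int) -> int:
    """Number of vector chunks needed: sum over rows of ceil(nnz / vl_max)."""
    total = 0
    for i in range(len(row_ptr) - 1):
        nnz = row_ptr[i + 1] - row_ptr[i]
        total += -(-nnz // vl_max)  # ceiling division; empty rows add 0
    return total


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 3, 0],
    #      [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col = [0, 2, 1, 0, 2]
    val = [1.0, 2.0, 3.0, 4.0, 5.0]
    x = [1.0, 1.0, 1.0]
    print(spmv_csr_stripmined(row_ptr, col, val, x))
    for vl in (1, 2, 256):
        print(vl, count_strips(row_ptr, vl))
```

In this toy emulation, a longer maximum vector length reduces the number of chunks only while rows hold more nonzeros than one chunk; the gather of `x` remains irregular regardless, which is one reason such kernels stress the memory subsystem rather than the arithmetic units.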
