Case Study for Running Memory-Bound Kernels on RISC-V CPUs

05/16/2023
by Valentin Volokitin, et al.

The emergence of a new, open, and free instruction set architecture, RISC-V, has heralded a new era in microprocessor architectures. Starting with low-power, low-performance prototypes, the RISC-V community has a good chance of moving towards fully functional high-end microprocessors suitable for high-performance computing. Progress in this direction requires comprehensive development of the software environment: operating systems, compilers, mathematical libraries, and approaches to performance analysis and optimization. In this paper, we analyze the performance of two available RISC-V devices when executing three memory-bound applications: the widely used STREAM benchmark, an in-place dense matrix transposition algorithm, and a Gaussian Blur algorithm. We show that, compared to x86 and ARM CPUs, RISC-V devices are, as expected, still inferior in terms of computation time but make very good use of the available resources. We also demonstrate that memory optimization techniques that are well developed for x86 CPUs improve performance on RISC-V CPUs as well. Overall, the paper shows the potential of RISC-V as an alternative architecture for high-performance computing.
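To make two of the kernels named in the abstract concrete, the sketch below shows the STREAM "triad" operation and an in-place transposition of a square matrix that walks the data tile by tile. It is only an illustration of the access patterns, not the authors' implementation: the function names, the flat row-major layout, and the tile size of 2 are assumptions chosen for the example, and pure Python says nothing about actual performance.

```python
def stream_triad(b, c, scalar):
    """STREAM 'triad' kernel: a[i] = b[i] + scalar * c[i].

    Each output element needs two loads, one store, and only two
    floating-point operations, which is why the kernel is memory-bound.
    """
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def transpose_in_place(m, n, block=2):
    """Transpose an n x n row-major matrix stored in the flat list m, in place.

    Hypothetical helper: only tiles on or above the diagonal are visited,
    and every element above the diagonal is swapped with its mirror image.
    Working tile by tile keeps both swapped regions small, which is the
    usual cache-friendly way to organise this loop nest.
    """
    for ib in range(0, n, block):
        for jb in range(ib, n, block):
            for i in range(ib, min(ib + block, n)):
                # In a diagonal tile, start just right of the diagonal so
                # that each pair is swapped exactly once.
                j_start = max(jb, i + 1)
                for j in range(j_start, min(jb + block, n)):
                    m[i * n + j], m[j * n + i] = m[j * n + i], m[i * n + j]
    return m


if __name__ == "__main__":
    print(stream_triad([1.0, 2.0], [3.0, 4.0], 2.0))
    print(transpose_in_place(list(range(9)), 3))
```

In a compiled language the tile size would typically be tuned to the cache hierarchy of the target CPU; the value used here is arbitrary.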


research
05/25/2018

Architectures for High Performance Computing and Data Systems using Byte-Addressable Persistent Memory

Non-volatile, byte addressable, memory technology with performance close...
research
06/30/2021

Improving the Efficiency of Transformers for Resource-Constrained Devices

Transformers provide promising accuracy and have become popular and used...
research
04/02/2021

FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance

Basic Linear Algebra Subprograms (BLAS) is a core library in scientific ...
research
02/08/2023

Feature-based SpMV Performance Analysis on Contemporary Devices

The SpMV kernel is characterized by high performance variation per input...
research
03/04/2019

Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs

The Winograd or Cook-Toom class of algorithms help to reduce the overall...
research
04/30/2021

QDOT: Quantized Dot Product Kernel for Approximate High-Performance Computing

Approximate computing techniques have been successful in reducing comput...
research
12/09/2021

High performance computing on Android devices – a case study

High performance computing for low power devices can be useful to speed ...
