A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation

06/27/2021
by Joseph Huber, et al.

This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical cluster approximation) application for the new ARM A64FX processor. The goal is to describe the changes required for the new architecture and to generate efficient single instruction/multiple data (SIMD) instructions that target the new Scalable Vector Extension (SVE) instruction set. During manual tuning, the authors used the LLVM tools to improve code parallelization with OpenMP SIMD, refactored the code and applied transformations that enabled SIMD optimizations, and ensured that the correct libraries were used to achieve optimal performance. With these code changes, code speed increased by 1.98x and 78 GFlops were achieved on the A64FX processor. The authors aim to automate parts of this effort in the OpenMP Advisor tool, which is built on top of existing and newly introduced LLVM tooling.
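
As a rough illustration of the kind of change the abstract describes, the sketch below annotates a simple loop with an OpenMP `simd` directive so that the compiler is asked to vectorize it. The `axpy` kernel is hypothetical and is not taken from DCA++; it only assumes that the two arrays have the same length and do not overlap in memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel (not from DCA++): computes y[i] += a * x[i].
// Assumption: x and y have the same length and do not alias.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const double* xp = x.data();
    double* yp = y.data();
    const std::size_t n = y.size();

    // Ask the compiler to vectorize this loop. The directive takes effect
    // only when OpenMP SIMD support is enabled at compile time; otherwise
    // the pragma is ignored and the loop still runs correctly as scalar code.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}
```

One plausible way to check the result with Clang is to compile with `-O3 -fopenmp-simd -mcpu=a64fx -Rpass=loop-vectorize -Rpass-missed=loop-vectorize`, which prints optimization remarks showing whether the loop was vectorized. These are standard Clang options, but the abstract does not state which flags or tools the authors actually used.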

