Autovesk: Automatic vectorization of unstructured static kernels by graph transformations

01/03/2023
by Hayfa Tayeb, et al.

Leveraging the SIMD capability of modern CPU architectures is mandatory to benefit fully from their increasing performance. To exploit this feature, binary executables must be explicitly vectorized, either by the developers or by an automatic vectorization tool. This is why the compilation research community has created several strategies to transform scalar code into a vectorized implementation. However, the majority of these approaches focus on regular algorithms, such as affine loops, that can be vectorized with few data transformations. In this paper, we present a new approach that allows automatic vectorization of scalar codes with chaotic data accesses, as long as their operations can be statically inferred. We describe how our method transforms a graph of scalar instructions into a vectorized one, using different heuristics with the aim of reducing the number or the cost of the instructions. Finally, we demonstrate the benefit of our approach on various computational kernels using Intel AVX-512 and ARM SVE.
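To make the idea of turning a graph of scalar instructions into a vectorized one more concrete, here is a minimal sketch. It is not the paper's algorithm or any of its heuristics. It assumes a toy representation in which every scalar instruction is a named node with an opcode and a list of operands, and it uses a simple, hypothetical rule: instructions with the same opcode at the same depth in the dependency graph are mutually independent, so they can be packed into groups of at most `width` lanes. The names `pack_by_level`, `nodes`, and `width` are illustrative only.

```python
from collections import defaultdict


def pack_by_level(nodes, width):
    """Group independent scalar instructions into vector-sized packs.

    nodes: dict mapping an instruction name to (opcode, [operand names]).
           Leaves use the opcode "input" and an empty operand list.
           Every non-input node is assumed to have at least one operand.
    width: number of lanes in a vector register (assumed, e.g. 8 doubles
           for a 512-bit register).
    Returns a list of (opcode, [instruction names]) packs in dependency order.
    """
    depth = {}

    def level(name):
        # Depth = longest path from any input; nodes sharing a depth
        # cannot depend on one another.
        if name not in depth:
            op, args = nodes[name]
            depth[name] = 0 if op == "input" else 1 + max(level(a) for a in args)
        return depth[name]

    groups = defaultdict(list)
    for name, (op, _) in nodes.items():
        if op != "input":
            groups[(level(name), op)].append(name)

    packs = []
    for (_, op), names in sorted(groups.items()):
        # Split each group into chunks that fit one vector instruction.
        for i in range(0, len(names), width):
            packs.append((op, names[i:i + width]))
    return packs


# A two-term dot product: two independent multiplications, then one addition.
example = {
    "a0": ("input", []), "a1": ("input", []),
    "b0": ("input", []), "b1": ("input", []),
    "m0": ("mul", ["a0", "b0"]),
    "m1": ("mul", ["a1", "b1"]),
    "s0": ("add", ["m0", "m1"]),
}
print(pack_by_level(example, width=2))
```

In this sketch the two multiplications end up in one pack and the addition in another. A real vectorizer would also have to decide how operands are gathered into registers (loads, permutations, blends), which is where a cost-driven choice between alternatives, such as the heuristics the abstract mentions, would come into play.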

research
07/27/2023

SPC5: an efficient SpMV framework vectorized using ARM SVE and x86 AVX-512

The sparse matrix/vector product (SpMV) is a fundamental operation in sc...
research
01/21/2021

UNIT: Unifying Tensorized Instruction Compilation

Because of the increasing demand for computation in DNN, researchers dev...
research
10/01/2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Useful models of loop kernel runtimes on out-of-order architectures requ...
research
04/11/2021

A Deep Learning Based Cost Model for Automatic Code Optimization

Enabling compilers to automatically optimize code has been a longstandin...
research
07/16/2021

A method for decompilation of AMD GCN kernels to OpenCL

Introduction: Decompilers are useful tools for software analysis and sup...
research
02/06/2020

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

At the heart of deep learning training and inferencing are computational...
research
05/29/2021

Examiner: Automatically Locating Inconsistent Instructions Between Real Devices and CPU Emulators for ARM

Emulator is widely used to build dynamic analysis frameworks due to its ...
