A Variable Vector Length SIMD Architecture for HW/SW Co-designed Processors

02/26/2021
by   Rakesh Kumar, et al.

Hardware/Software (HW/SW) co-designed processors provide a promising solution to the power and complexity problems of modern microprocessors by keeping their hardware simple. Moreover, they employ several runtime optimizations to improve performance. One of the most potent optimizations, vectorization, has been utilized by modern microprocessors to exploit data-level parallelism through SIMD accelerators. Due to their hardware simplicity, these accelerators have grown in width from the 64-bit vectors of Intel MMX to the 512-bit vector units of Intel Xeon Phi and AVX-512. Although SIMD accelerators are simple in terms of hardware design, code generation for them has always been a challenge, and increasing vector lengths with each new generation add to this complexity. This paper explores the scalability of SIMD accelerators from the code generation point of view. We discover that SIMD accelerators remain underutilized at higher vector lengths, mainly due to: a) reduced dynamic instruction stream coverage for vectorization and b) an increase in the number of permutation instructions. Both of these factors can be attributed to the rigidity of the SIMD architecture. We propose a novel SIMD architecture that possesses the flexibility needed to support higher vector lengths. Furthermore, we propose Variable Length Vectorization and Selective Writing in a HW/SW co-designed environment to transparently target the flexibility of the proposed architecture. We evaluate our proposals using a set of SPECFP2006 and Physicsbench applications. Our experimental results show an average dynamic instruction reduction of 31% for SPECFP2006 and Physicsbench, at a 512-bit vector length, over the scalar baseline code.
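To make the coverage problem concrete, the following minimal C sketch is offered purely as an illustration on an existing ISA; it is not the paper's proposed architecture or toolchain. It vectorizes an array addition with fixed 512-bit AVX-512 vectors: iterations that do not fill a full 16-element vector would normally fall back to scalar code, which is exactly the coverage loss that grows with vector length. A per-element mask lets one vector instruction handle the leftover elements, acting as a software stand-in for the variable vector length the paper proposes to support in hardware. The function name, array sizes, and build flags below are our own assumptions for the example.

```c
/* Illustrative sketch only: NOT the paper's proposed ISA.
 * Shows why fixed 512-bit vectors lose coverage on loops whose trip
 * count is not a multiple of the vector length, and how per-element
 * masking recovers it. Build e.g. with: gcc -mavx512f example.c
 */
#include <immintrin.h>
#include <stdio.h>

/* c[i] = a[i] + b[i] for 0 <= i < n */
static void add_arrays(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    /* Fixed-width body: 16 floats (512 bits) per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(c + i, _mm512_add_ps(va, vb));
    }

    /* Remainder: without masking, these iterations would execute as
     * scalar code, shrinking vectorized coverage as the vector width
     * grows. A mask of (n - i) ones lets a single vector instruction
     * handle any leftover length, mimicking a variable vector length. */
    if (i < n) {
        __mmask16 tail = (__mmask16)((1u << (n - i)) - 1);
        __m512 va = _mm512_maskz_loadu_ps(tail, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(tail, b + i);
        _mm512_mask_storeu_ps(c + i, tail, _mm512_add_ps(va, vb));
    }
}

int main(void)
{
    float a[21], b[21], c[21];
    for (int i = 0; i < 21; i++) { a[i] = (float)i; b[i] = 2.0f * i; }
    add_arrays(a, b, c, 21);          /* 16-wide body plus a 5-element tail */
    printf("c[20] = %.1f\n", c[20]);  /* expect 60.0 */
    return 0;
}
```

The contrast the sketch is meant to highlight: with a rigid fixed-length SIMD ISA and no masking, every tail and every short loop stays scalar, so the vectorized share of the dynamic instruction stream shrinks as vectors widen; a vector length that can adapt per loop (or per iteration group) keeps those instructions inside the SIMD unit.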

