Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine

06/02/2020
by Prasanth Chatarasi, et al.

Xilinx's AI Engine is a recent industry example of energy-efficient vector processing that includes novel support for 2D SIMD datapaths and a shuffle interconnection network. The current approach to programming the AI Engine relies on a C/C++ API for vector intrinsics. While this is an advance over assembly-level programming, it requires the programmer to specify a number of low-level operations based on detailed knowledge of the hardware. To address these challenges, we introduce Vyasa, a new programming system that extends the Halide DSL compiler to automatically generate code for the AI Engine. We evaluated Vyasa on 36 CONV2D and 6 CONV3D workloads, and achieved geometric means of 7.6 and 23.3 MACs/cycle for 32-bit and 16-bit operands (which represent 95.9 [...]). For [...] workloads for which expert-written codes were available to us, Vyasa demonstrated a geometric mean performance improvement of 1.10x with 50x smaller code relative to the expert-written codes.


