Integration of a systolic array-based hardware accelerator into a DNN operator auto-tuning framework

12/06/2022
by F. N. Peccia et al.

The deployment of neural networks on heterogeneous SoCs coupled with custom accelerators is a challenging task because of the lack of end-to-end software tools provided for these systems. Moreover, the low-level schedules and mapping strategies already provided by accelerator developers for typical tensor operations are not necessarily optimal for every use case. Frameworks that automatically test the performance of the generated code on a specific hardware configuration are therefore of special interest. In this work, the integration between the code generation framework TVM and the systolic array-based accelerator Gemmini is presented. A generic schedule to offload the GEneral Matrix Multiply (GEMM) tensor operation onto Gemmini is detailed, and its suitability is tested by executing the AutoTVM tuning process on it. Our generated code achieves a peak throughput of 46 giga-operations per second (GOPs) under a 100 MHz clock on a Xilinx ZCU102 FPGA, outperforming previous work. Furthermore, the code generated by this integration was able to surpass the default hand-tuned schedules provided by the Gemmini developers in real-world workloads.
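To make the approach concrete, the sketch below shows the general shape of an AutoTVM-tunable GEMM template of the kind the abstract describes, written against TVM's TE/AutoTVM Python APIs. The template name, knob names, and int8/int32 datatypes are illustrative assumptions; the paper's actual Gemmini schedule and its tensorize intrinsics are not reproduced here.

```python
# A minimal sketch of an AutoTVM-tunable GEMM template, assuming TVM's
# TE/AutoTVM APIs. Names and dtypes are hypothetical; the paper's
# Gemmini-specific intrinsics are not reproduced.
import tvm
from tvm import te, autotvm

@autotvm.template("gemm_gemmini_sketch")  # hypothetical template name
def gemm_sketch(M, K, N):
    A = te.placeholder((M, K), name="A", dtype="int8")
    B = te.placeholder((K, N), name="B", dtype="int8")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute(
        (M, N),
        lambda i, j: te.sum(A[i, k].astype("int32") * B[k, j].astype("int32"), axis=k),
        name="C",
    )
    s = te.create_schedule(C.op)

    # Expose the tile sizes as knobs so AutoTVM can search the schedule space.
    cfg = autotvm.get_config()
    i, j = s[C].op.axis
    (kr,) = s[C].op.reduce_axis
    cfg.define_split("tile_i", i, num_outputs=2)
    cfg.define_split("tile_j", j, num_outputs=2)
    io, ii = cfg["tile_i"].apply(s, C, i)
    jo, ji = cfg["tile_j"].apply(s, C, j)
    s[C].reorder(io, jo, ii, ji, kr)
    # In an integration like the paper's, the inner (ii, ji, kr) block would
    # be mapped onto the systolic array, e.g. via a tensorize intrinsic.
    return s, [A, B, C]
```

A tuning run would then create a task from such a template (e.g. with `autotvm.task.create`) and measure candidate tile configurations on the target hardware, which is how an auto-tuned schedule can end up beating hand-tuned defaults.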


