Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

06/19/2021
by Gordon E. Moon et al.

There is growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. Their efficiency comes from optimized dataflow strategies, i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling, which maximize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) of tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.
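To make the tiling concrete, here is a minimal sketch of a tiled GEMM in Python/NumPy. The tile sizes `Ti`, `Tj`, and `Tk` are hypothetical parameters chosen for illustration; a mapping framework like the one described above would search over such tile sizes (together with dataflow choices such as loop order) using a cost model, rather than fixing them as done here.

```python
import numpy as np

def tiled_gemm(A, B, Ti=32, Tj=32, Tk=32):
    """Compute C = A @ B tile by tile.

    Ti, Tj, Tk are tile sizes along the M, N, and K dimensions.
    On a spatial accelerator, each tile of A, B, and C would be
    staged in on-chip buffers so the inner products reuse data
    locally instead of re-fetching it from off-chip memory.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, Ti):              # tiles of C's rows
        for j0 in range(0, N, Tj):          # tiles of C's columns
            for k0 in range(0, K, Tk):      # reduction dimension, accumulated
                C[i0:i0+Ti, j0:j0+Tj] += (
                    A[i0:i0+Ti, k0:k0+Tk] @ B[k0:k0+Tk, j0:j0+Tj]
                )
    return C
```

NumPy clips slices that run past an array's end, so the sketch also handles matrix dimensions that are not multiples of the tile sizes; the edge tiles are simply smaller.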


Related research

- Toward matrix multiplication for deep learning inference on the Xilinx Versal (02/15/2023)
- Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures (11/22/2019)
- DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator (05/31/2023)
- Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators (09/15/2021)
- POAS: A high-performance scheduling framework for exploiting Accelerator Level Parallelism (09/21/2022)
- Engineering Boolean Matrix Multiplication for Multiple-Accelerator Shared-Memory Architectures (09/04/2019)
- Power-Based Attacks on Spatial DNN Accelerators (08/28/2021)
