KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow

06/09/2023
by Zhiyao Li, et al.

Dataflow scheduling decisions are of vital importance to neural network (NN) accelerators. Recent scalable NN accelerators support a rich set of advanced dataflow techniques, so the problems of comprehensively representing and quickly finding optimized dataflow schemes become significantly more complicated and challenging. In this work, we first propose comprehensive and pragmatic dataflow representations for temporal and spatial scheduling on scalable multi-node NN architectures. An informal hierarchical taxonomy highlights the tight coupling across different levels of the dataflow space as the major difficulty for fast design exploration. A set of formal tensor-centric directives accurately expresses various inter-layer and intra-layer schemes, and allows their validity and efficiency to be determined quickly. We then build a generic, optimized, and fast dataflow solver, KAPLA, which uses these pragmatic directives to explore the design space with effective validity checks and efficiency estimation. KAPLA decouples the upper inter-layer level for fast pruning, and solves the lower intra-layer schemes with a novel bottom-up cost-descending method. Compared to the exhaustively searched optimal schemes, the dataflows KAPLA finds incur energy overheads of only 2.2% for training and 7.7% for inference. KAPLA also outperforms random and machine-learning-based approaches, delivering more optimized results with orders-of-magnitude faster search.
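The two mechanisms the abstract names, tensor-centric scheduling directives with cheap validity checking, and a bottom-up cost-descending solver, can be illustrated with a small sketch. The Python below is a hypothetical reconstruction under simplified assumptions (a single layer, divisible tile sizes, a one-number cost proxy); the names Directive, valid, cost, and cost_descend are invented for illustration and are not KAPLA's actual interface.

# Illustrative sketch of tensor-centric dataflow directives and a
# bottom-up cost-descending search, loosely modeled on the ideas in the
# abstract. All names and the cost model are hypothetical stand-ins.
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Directive:
    """One scheduling directive for a single loop dimension of a layer.
    `tile` is the per-node on-chip tile size; `spatial=True` marks the
    dimension as unrolled across the multi-node array instead of looped
    in time (a stand-in for the paper's spatial-scheduling directives)."""
    dim: str
    tile: int
    spatial: bool = False


def valid(scheme, dims, buffer_words, nodes):
    """Validity check: tiles must divide the loop bounds, the spatial
    unroll must fit the node array, and the tile footprint must fit the
    per-node buffer (a crude proxy for a real capacity model)."""
    if any(dims[d.dim] % d.tile for d in scheme):
        return False
    spatial_factor = prod(dims[d.dim] // d.tile for d in scheme if d.spatial)
    footprint = prod(d.tile for d in scheme)
    return spatial_factor <= nodes and footprint <= buffer_words


def cost(scheme, dims):
    """Toy efficiency estimate: the number of temporal tile passes, i.e.
    off-chip transfer rounds. KAPLA's real estimator models per-level
    access costs; this proxy merely rewards larger on-chip tiles."""
    return prod(dims[d.dim] // d.tile for d in scheme if not d.spatial)


def cost_descend(scheme, dims, buffer_words, nodes):
    """Bottom-up cost descent: start from minimal tiles and greedily take
    the tile-growing move that lowers estimated cost the most, stopping
    at a local optimum instead of enumerating the whole space."""
    while True:
        moves = []
        for i, d in enumerate(scheme):
            for t in range(d.tile + 1, dims[d.dim] + 1):
                cand = scheme[:i] + (Directive(d.dim, t, d.spatial),) + scheme[i + 1:]
                if valid(cand, dims, buffer_words, nodes) and cost(cand, dims) < cost(scheme, dims):
                    moves.append(cand)
        if not moves:
            return scheme
        scheme = min(moves, key=lambda s: cost(s, dims))


if __name__ == "__main__":
    dims = {"N": 16, "C": 64, "K": 64}            # toy loop bounds for one layer
    start = tuple(Directive(d, 1) for d in dims)  # minimal, trivially valid scheme
    best = cost_descend(start, dims, buffer_words=4096, nodes=16)
    print(best, "passes:", cost(best, dims))
    # A spatial directive would instead unroll a dimension across nodes,
    # e.g. Directive("K", tile=4, spatial=True) for a 64/4 = 16-way unroll.

The point of the sketch is the search structure: because each directive's validity and cost are cheap to evaluate, the solver can grow tiles greedily from a trivially valid starting scheme rather than exhaustively enumerating the space, which is what makes separately pruning the coupled inter-layer level worthwhile in the real solver.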

