FLAT: An Optimized Dataflow for Mitigating Attention Performance Bottlenecks

07/13/2021
by Sheng-Chun Kao, et al.

Attention mechanisms form the backbone of state-of-the-art machine learning models for a variety of tasks. Deploying them on deep neural network (DNN) accelerators, however, is prohibitively challenging, especially under long sequences, as this work identifies. This is because operators in attention layers exhibit limited reuse opportunities and a quadratic growth in memory footprint, leading to severe memory-boundedness. To address this, we introduce a new attention-tailored dataflow, termed FLAT, which identifies fusion opportunities within the attention layer and implements an on-chip memory-aware interleaved execution and tiling mechanism. FLAT increases the effective memory bandwidth by efficiently utilizing the high-bandwidth, low-capacity on-chip buffer, and thus achieves better run time and compute resource utilization. In our evaluation, FLAT achieves 1.94x and 1.76x speedup and 49% energy reduction over state-of-the-art edge and cloud accelerators.
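The core idea can be sketched in software: fuse the logit (QK^T), softmax, and attend (xV) operators and stream the sequence through on-chip-sized tiles, so the quadratic L x L score matrix is never materialized in full. The minimal NumPy sketch below illustrates that tiling-and-fusion concept only; it is not the paper's accelerator dataflow, and the function name `tiled_fused_attention`, the tile size, and the running-softmax bookkeeping are illustrative assumptions rather than details taken from FLAT.

```python
import numpy as np

def tiled_fused_attention(Q, K, V, tile=128):
    """Illustrative fused/tiled attention (not FLAT's hardware dataflow).

    Computes softmax(Q K^T / sqrt(d)) V one query tile at a time, keeping a
    running (online) softmax over key/value tiles so the full L x L score
    matrix is never materialized.
    """
    L, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(Q)

    for qs in range(0, L, tile):                      # stream over query tiles
        q = Q[qs:qs + tile] * scale
        acc = np.zeros((q.shape[0], d))               # weighted-value accumulator
        row_max = np.full(q.shape[0], -np.inf)        # running max for stability
        row_sum = np.zeros(q.shape[0])                # running softmax denominator

        for ks in range(0, L, tile):                  # stream over key/value tiles
            s = q @ K[ks:ks + tile].T                 # partial logits (tile x tile)
            new_max = np.maximum(row_max, s.max(axis=1))
            correction = np.exp(row_max - new_max)    # rescale earlier partial sums
            p = np.exp(s - new_max[:, None])
            row_sum = row_sum * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ V[ks:ks + tile]
            row_max = new_max

        out[qs:qs + tile] = acc / row_sum[:, None]
    return out

# Quick check against a naive reference that materializes the full L x L scores.
rng = np.random.default_rng(0)
L, d = 512, 64
Q, K, V = rng.standard_normal((3, L, d))
scores = (Q @ K.T) / np.sqrt(d)
ref = np.exp(scores - scores.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_fused_attention(Q, K, V), ref, atol=1e-6)
```

In this formulation, peak intermediate storage grows with the tile size rather than the sequence length, which is the property that lets a high-bandwidth, low-capacity on-chip buffer stand in for off-chip memory traffic.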


