AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers

02/28/2023
by Shikhar Tuli, et al.

Self-attention-based transformer models have achieved tremendous success in the domain of natural language processing. Despite their efficacy, accelerating the transformer is challenging due to its quadratic computational complexity and large activation sizes. Existing transformer accelerators attempt to prune tokens to reduce memory access, albeit with high compute overheads. Moreover, previous works directly operate on the large matrices involved in the attention operation, which limits hardware utilization. To address these challenges, this work proposes a novel dynamic inference scheme, DynaTran, which prunes activations at runtime with low overhead, substantially reducing the number of ineffectual operations. This improves the throughput of transformer inference. We further propose tiling the matrices in transformer operations along with diverse dataflows to improve data reuse, thus enabling higher energy efficiency. To effectively implement these methods, we propose AccelTran, a novel accelerator architecture for transformers. Extensive experiments with different models and benchmarks demonstrate that DynaTran achieves higher accuracy than the state-of-the-art top-k hardware-aware pruning strategy while attaining up to 1.2× higher sparsity. One of our proposed accelerators, AccelTran-Edge, achieves 330K× higher throughput with 93K× lower energy requirement when compared to a Raspberry Pi device. On the other hand, AccelTran-Server achieves 5.73× higher throughput and 3.69× lower energy consumption compared to the state-of-the-art transformer co-processor, Energon. The simulation source code is available at https://github.com/jha-lab/acceltran.

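To give a rough sense of the runtime activation pruning the abstract describes, the sketch below shows a simple magnitude-threshold pruning pass applied to an attention-score matrix. It is a minimal software illustration of the idea, not the authors' DynaTran implementation; the threshold value, tensor shapes, and function name are assumptions made for this example (see the linked repository for the actual simulator).

```python
# Minimal sketch of threshold-based activation pruning, in the spirit of
# DynaTran's runtime pruning. Threshold, shapes, and names are illustrative
# assumptions, not the paper's implementation.
import numpy as np

def prune_activations(x: np.ndarray, threshold: float):
    """Zero out activations whose magnitude falls below `threshold`.

    Returns the pruned tensor and the induced sparsity (fraction of zeros).
    A sparsity-aware accelerator can exploit the zeros by skipping the
    corresponding ineffectual multiply-accumulate operations.
    """
    mask = np.abs(x) >= threshold        # cheap elementwise comparison at runtime
    pruned = np.where(mask, x, 0.0)      # keep large-magnitude activations only
    sparsity = 1.0 - float(mask.mean())  # fraction of zeroed entries
    return pruned, sparsity

# Example: prune one head's attention-score matrix before multiplying it
# with the value matrix (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
scores = 0.1 * rng.standard_normal((128, 128)).astype(np.float32)
values = rng.standard_normal((128, 64)).astype(np.float32)

pruned_scores, sparsity = prune_activations(scores, threshold=0.05)
context = pruned_scores @ values  # zeroed operands can be skipped in hardware
print(f"induced activation sparsity: {sparsity:.1%}")
```

In hardware, the comparison produces a mask that lets the compute units skip the zeroed operands, which is where the throughput and energy gains come from; the threshold can be tuned per model to trade accuracy against sparsity, the trade-off behind the abstract's comparison with top-k pruning.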

Related research

03/27/2023
TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
Automated co-design of machine learning models and evaluation hardware i...

10/18/2021
Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
In recent years, transformer models have revolutionized Natural Language...

12/06/2022
Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications
Many state-of-the-art deep learning models for computer vision tasks are...

05/09/2022
Row-wise Accelerator for Vision Transformer
Following the success of the natural language processing, the transforme...

10/18/2022
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on...

03/13/2023
X-Former: In-Memory Acceleration of Transformers
Transformers have achieved great success in a wide variety of natural la...

03/24/2023
EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
Automated design of efficient transformer models has recently attracted ...
