Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications

12/06/2022
by Kavya Sreedhar, et al.

Many state-of-the-art deep learning models for computer vision tasks are based on the transformer architecture. Such models can be computationally expensive and are typically configured statically for a given deployment scenario. However, in real-time applications, the resources available for each inference can vary considerably and can be smaller than what state-of-the-art models require. Dynamic models can adapt their execution to meet these real-time resource constraints. While prior work on dynamic models primarily minimized resource utilization for less complex input images, we adapt vision transformers to meet dynamic system resource constraints, independent of the input image. We find that, unlike early transformer models, recent state-of-the-art vision transformers rely heavily on convolution layers. We show that pretrained models are fairly resilient to skipping computation in the convolution and self-attention layers, which enables a low-overhead system for dynamic real-time inference without extra training. Finally, we explore compute organizations and memory sizes to find settings that execute dynamic vision transformers efficiently. We find that wider vector sizes produce a better energy-accuracy tradeoff across dynamic configurations despite limiting the granularity of dynamic execution, but scaling accelerator resources for larger models does not significantly improve the latency-area-energy tradeoffs. Our accelerator saves 20 accuracy with the pretrained SegFormer B2 model in our dynamic inference approach and 57 accuracy with the Once-For-All approach.
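The core idea above, skipping part of a pretrained layer's computation so that each inference fits whatever compute is available at that moment, can be illustrated for a self-attention layer. The following is a minimal NumPy sketch under our own assumptions, not the authors' implementation: it computes only the first `active_heads` attention heads and uses only the matching rows of the output projection, which is equivalent to feeding zeros for the skipped heads, so the pretrained weights are reused without retraining. The function names `dynamic_self_attention` and `heads_for_budget`, the linear cost model, and the choice to skip the trailing heads are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_self_attention(x, w_q, w_k, w_v, w_o, num_heads, active_heads):
    """Multi-head self-attention that computes only the first `active_heads` heads.

    x has shape (tokens, dim); every weight matrix has shape (dim, dim) and is
    reused unchanged (no retraining). Assumes dim is divisible by num_heads.
    Skipped heads are never computed.
    """
    tokens, dim = x.shape
    head_dim = dim // num_heads
    cols = active_heads * head_dim  # projection columns owned by the active heads

    def project(w):
        # Project only onto the active heads' columns, then split into heads:
        # (tokens, cols) -> (active_heads, tokens, head_dim)
        return (x @ w[:, :cols]).reshape(tokens, active_heads, head_dim).transpose(1, 0, 2)

    q, k, v = project(w_q), project(w_k), project(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, tokens, tokens)
    heads = softmax(scores) @ v                             # (heads, tokens, head_dim)
    concat = heads.transpose(1, 0, 2).reshape(tokens, cols)
    # Using only the matching rows of w_o is equivalent to zero outputs for skipped heads.
    return concat @ w_o[:cols, :]


def heads_for_budget(budget_fraction: float, num_heads: int) -> int:
    """Map the fraction of full compute available for this inference to a head count.

    Hypothetical cost model: attention cost is assumed to scale linearly with heads.
    At least one head is always kept.
    """
    return max(1, min(num_heads, int(budget_fraction * num_heads)))


# Usage: the available budget may change from one inference to the next.
rng = np.random.default_rng(0)
tokens, dim, num_heads = 16, 64, 8
x = rng.standard_normal((tokens, dim))
w_q, w_k, w_v, w_o = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))
for budget in (1.0, 0.5, 0.25):
    h = heads_for_budget(budget, num_heads)
    y = dynamic_self_attention(x, w_q, w_k, w_v, w_o, num_heads, h)
    print(f"budget={budget:.2f} -> {h} of {num_heads} heads, output shape {y.shape}")
```

Skipping trailing heads is only one possible policy. Which heads or convolution channels to drop, and at what granularity (for example, in multiples of an accelerator's vector width), would depend on the model and the hardware.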

Related research

AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers (02/28/2023)
Self-attention-based transformer models have achieved tremendous success...

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers (02/26/2020)
Since hardware resources are limited, the objective of training deep lea...

TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference (03/27/2023)
Automated co-design of machine learning models and evaluation hardware i...

Multi-Exit Vision Transformer for Dynamic Inference (06/29/2021)
Deep neural networks can be converted to multi-exit architectures by ins...

Efficient Sparsely Activated Transformers (08/31/2022)
Transformer-based neural networks have achieved state-of-the-art task pe...

Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead (05/19/2021)
Deploying deep learning models in time-critical applications with limite...

Learning to Grow Pretrained Models for Efficient Transformer Training (03/02/2023)
Scaling transformers has led to significant breakthroughs in many domain...