TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference

03/27/2023
by Shikhar Tuli, et al.

Automated co-design of machine learning models and evaluation hardware is critical for efficiently deploying such models at scale. Despite the state-of-the-art performance of transformer models, they are not yet ready for execution on resource-constrained hardware platforms. The high memory requirements and low parallelizability of the transformer architecture exacerbate this problem. Recently proposed accelerators attempt to optimize the throughput and energy consumption of transformer models. However, such works are either limited to a one-sided search of the model architecture or to a restricted set of off-the-shelf devices. Furthermore, previous works only accelerate model inference and not training, which requires substantially more memory and compute resources, making the problem even more challenging. To address these limitations, this work proposes a dynamic training framework, called DynaProp, that speeds up the training process and reduces memory consumption. DynaProp is a low-overhead pruning method that prunes activations and gradients at runtime. To effectively execute this method on hardware for a diverse set of transformer architectures, we propose ELECTOR, a framework that simulates transformer inference and training on a design space of accelerators. We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain the best-performing models with high accuracy on the given task while minimizing latency, energy consumption, and chip area. The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair while incurring 5.2× lower latency and 3.0× lower energy consumption.
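As a rough illustration of the runtime-pruning idea behind DynaProp, the sketch below zeroes out small-magnitude activations in the forward pass and small-magnitude gradients in the backward pass. The PyTorch class, the fixed magnitude threshold, and the function names are illustrative assumptions, not the paper's actual method or implementation.

```python
# Hypothetical sketch of runtime activation/gradient pruning, in the spirit of
# DynaProp. The threshold heuristic and all names here are assumptions made
# for illustration only.
import torch


class MagnitudePrune(torch.autograd.Function):
    """Zero out small-magnitude activations (forward) and gradients (backward)."""

    @staticmethod
    def forward(ctx, x, threshold):
        ctx.threshold = threshold
        # Prune activations whose magnitude falls below the threshold.
        return torch.where(x.abs() < threshold, torch.zeros_like(x), x)

    @staticmethod
    def backward(ctx, grad_out):
        # Prune small gradients on the backward pass as well.
        pruned = torch.where(grad_out.abs() < ctx.threshold,
                             torch.zeros_like(grad_out), grad_out)
        return pruned, None  # no gradient w.r.t. the threshold


def prune_runtime(x, threshold=1e-2):
    return MagnitudePrune.apply(x, threshold)


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    y = prune_runtime(torch.relu(x))   # sparsified activations feed the next layer
    y.sum().backward()                 # gradients are sparsified on the way back
    print((y == 0).float().mean(), (x.grad == 0).float().mean())
```

The induced sparsity in both activations and gradients is what a sparsity-aware accelerator can exploit to reduce memory traffic and compute during training as well as inference.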


Related research

EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms (03/24/2023)
Automated design of efficient transformer models has recently attracted ...

AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers (02/28/2023)
Self-attention-based transformer models have achieved tremendous success...

CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework (12/07/2022)
Recently, automated co-design of machine learning (ML) models and accele...

Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications (12/06/2022)
Many state-of-the-art deep learning models for computer vision tasks are...

Full Stack Optimization of Transformer Inference: a Survey (02/27/2023)
Recent advances in state-of-the-art DNN architecture design have been mo...

Training Transformers Together (07/07/2022)
The infrastructure necessary for training state-of-the-art models is bec...

Energy-Latency Attacks via Sponge Poisoning (03/14/2022)
Sponge examples are test-time inputs carefully-optimized to increase ene...
