Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

02/26/2020
by Zhuohan Li, et al.

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
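As an illustration of the compression step described above, the sketch below applies magnitude pruning and post-training dynamic quantization to a large Transformer using standard PyTorch utilities. This is only a minimal sketch, not the authors' implementation; the model dimensions, 60% sparsity level, and int8 precision are assumptions chosen for the example.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a large Transformer encoder (hypothetical sizes),
# assumed to have already been trained for a small number of iterations.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, dim_feedforward=4096),
    num_layers=24,
)

# 1) Magnitude pruning: zero out the smallest 60% of weights
#    in every Linear layer (sparsity level is an assumption).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")  # make the pruning permanent

# 2) Post-training dynamic quantization: int8 weights for Linear layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The heavily compressed large model is what gets used at inference time.
x = torch.randn(10, 2, 1024)  # (sequence, batch, d_model)
with torch.no_grad():
    out = quantized(x)
print(out.shape)

In the setting studied in the paper, this compressed large model is deployed for inference in place of a lightly compressed small model, trading training-time compute for a better accuracy-efficiency point at inference.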

