E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models

03/01/2022
by Mohammad Akbari, et al.

Building huge and highly capable language models has been a trend in recent years. Despite their great performance, they incur high computational cost. A common solution is to apply model compression or choose light-weight architectures, which often require a separate fixed-size model for each desirable computational budget and may lose performance under heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes inference between large, accurate Super models and light-weight Swift models. To this end, a decision-making module routes each input to the Super or Swift model based on the energy characteristics of its representation in the latent space. The method is easily adoptable and architecture-agnostic; as such, it can be applied to black-box pre-trained models without architectural manipulation, reassembly of modules, or re-training. Unlike existing methods that are only applicable to encoder-only backbones and classification tasks, our method also works for encoder-decoder structures and sequence-to-sequence tasks such as translation. E-LANG's performance is verified through a set of experiments with T5 and BERT backbones on GLUE, SuperGLUE, and WMT. In particular, we outperform T5-11B with an average computation speed-up of 3.3× on GLUE and 2.9× on SuperGLUE. We also achieve BERT-based state-of-the-art results on GLUE with 3.2× less computation. Code and demo are available in the supplementary materials.
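The abstract only states that a decision module routes each input to the Swift or Super model based on energy characteristics of its latent representation. The sketch below is a minimal illustration of such a routing step, assuming a logsumexp-based free-energy score computed over the Swift model's output logits and a hypothetical decision threshold; the paper's actual energy module and decision rule may differ.

```python
# Minimal, hypothetical sketch of energy-based Swift/Super routing in PyTorch.
# The energy function and the threshold value below are illustrative assumptions,
# not the paper's exact formulation.
import torch


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Free-energy-style confidence score: lower energy ~ more confident prediction.
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)


@torch.no_grad()
def e_lang_route(x, swift_model, super_model, threshold: float = -5.0):
    # Run the light-weight Swift model first.
    swift_logits = swift_model(x)                 # assumed shape: (num_classes,)
    if energy_score(swift_logits) < threshold:    # low energy -> trust the Swift output
        return swift_logits.argmax(dim=-1), "swift"
    # Otherwise defer to the large, accurate Super model.
    return super_model(x).argmax(dim=-1), "super"
```

In such a scheme, the threshold would control the accuracy/computation trade-off: a looser threshold routes more inputs to the Super model, increasing accuracy at higher average cost.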

