When in Doubt, Summon the Titans: Efficient Inference with Large Models

10/19/2021
by Ankit Singh Rawat, et al.

Scaling neural networks to "large" sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of large models while largely preserving the computational benefits of inference with more lightweight models. In a nutshell, we use the large teacher models to guide the lightweight student models to make correct predictions only on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher. Such an approach allows us to efficiently employ large models in practical scenarios where easy examples are much more frequent than rare hard examples. Our proposed use of distillation to handle only the easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference and achieving better accuracy than standard distillation. Empirically, we demonstrate the benefits of our approach on both image classification and natural language processing benchmarks.
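To make the student-then-teacher fallback concrete, the minimal PyTorch sketch below shows one common way such a cascade can be run at inference time: the student answers the inputs it is confident about, and the remaining inputs are routed to the teacher. The softmax-confidence threshold, the `cascade_predict` helper, and the toy models are illustrative assumptions, not the paper's exact routing rule or architecture.

```python
import torch
import torch.nn.functional as F

def cascade_predict(student, teacher, x, threshold=0.9):
    """Predict with the student, deferring to the teacher whenever the
    student's top softmax probability falls below `threshold`."""
    with torch.no_grad():
        student_probs = F.softmax(student(x), dim=-1)
        confidence, preds = student_probs.max(dim=-1)

        # "Hard" examples: the student is not confident enough.
        defer = confidence < threshold
        if defer.any():
            # Only the deferred subset pays the teacher's inference cost.
            preds = preds.clone()
            preds[defer] = teacher(x[defer]).argmax(dim=-1)
    return preds, defer

# Toy usage with stand-in models (32-dim inputs, 10 classes).
student = torch.nn.Sequential(torch.nn.Linear(32, 10))
teacher = torch.nn.Sequential(torch.nn.Linear(32, 256), torch.nn.ReLU(),
                              torch.nn.Linear(256, 10))
x = torch.randn(8, 32)
preds, deferred = cascade_predict(student, teacher, x)
print(preds, deferred.float().mean())  # predictions and fraction sent to the teacher
```

Under this kind of routing, the amortized cost stays close to the student's cost whenever easy examples dominate, since the teacher is invoked only on the deferred fraction.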


Related research

02/21/2023 - MaskedKD: Efficient Distillation of Vision Transformers with Masked Images
Knowledge distillation is a popular and effective regularization techniq...

12/11/2020 - Reinforced Multi-Teacher Selection for Knowledge Distillation
In natural language processing (NLP) tasks, slow inference speed and hug...

05/13/2023 - AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference
Knowledge distillation is of key importance to launching multilingual pr...

09/08/2019 - Transformer to CNN: Label-scarce distillation for efficient text classification
Significant advances have been made in Natural Language Processing (NLP)...

08/16/2020 - Cascaded channel pruning using hierarchical self-distillation
In this paper, we propose an approach for filter-level pruning with hier...

10/16/2021 - Sparse Distillation: Speeding Up Text Classification by Using Bigger Models
Distilling state-of-the-art transformer models into lightweight student ...

10/27/2022 - QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation
Large Language Models (LLMs) have shown impressive results on a variety ...
