Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings

06/04/2023
by   Daniel Rotem, et al.
0

Adaptive inference is a simple method for reducing inference costs. The method works by maintaining multiple classifiers of different capacities, and allocating resources to each test instance according to its difficulty. In this work, we compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited. First, we observe that for models with the same architecture and size, individual Multi-Model classifiers outperform their Early-Exit counterparts by an average of 2.3 this gap is caused by Early-Exit classifiers sharing model parameters during training, resulting in conflicting gradient updates of model weights. We find that despite this gap, Early-Exit still provides a better speed-accuracy trade-off due to the overhead of the Multi-Model approach. To address these issues, we propose SWEET (Separating Weights in Early Exit Transformers), an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights, not updated by other classifiers. We compare SWEET's speed-accuracy curve to standard Early-Exit and Multi-Model baselines and find that it outperforms both methods at fast speeds while maintaining comparable scores to Early-Exit at slow speeds. Moreover, SWEET individual classifiers outperform Early-Exit ones by 1.1 both methods, paving the way for further reduction of inference costs in NLP.

READ FULL TEXT

page 4

page 12

page 14

research
06/07/2020

BERT Loses Patience: Fast and Robust Inference with Early Exit

In this paper, we propose Patience-based Early Exit, a straightforward y...
research
06/24/2022

SCAI: A Spectral data Classification framework with Adaptive Inference for the IoT platform

Currently, it is a hot research topic to realize accurate, efficient, an...
research
04/16/2020

The Right Tool for the Job: Matching Model and Instance Complexities

As NLP models become larger, executing a trained model requires signific...
research
05/19/2021

Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead

Deploying deep learning models in time-critical applications with limite...
research
05/08/2023

The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification

The use of modern Natural Language Processing (NLP) techniques has shown...
research
06/17/2022

Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices

Deep neural networks have significantly improved performance on a range ...
research
05/01/2020

When Ensembling Smaller Models is More Efficient than Single Large Models

Ensembling is a simple and popular technique for boosting evaluation per...

Please sign up or login with your details

Forgot password? Click here to reset