You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model

11/21/2022
by   Shengkun Tang, et al.
Large-scale Transformer models bring significant improvements to various downstream vision-language tasks with a unified architecture. The performance improvements come with increasing model size, resulting in slow inference speed and increased cost of serving. While certain predictions benefit from the full complexity of the large-scale model, not all inputs require the same amount of computation, so running the full model on every input potentially wastes computational resources. To handle this challenge, early exiting has been proposed to adaptively allocate computational power according to input complexity and thereby improve inference efficiency. Existing early exiting strategies usually use the output confidence of intermediate layers as a proxy for input complexity when deciding whether to skip the following layers. However, such strategies cannot be applied to the encoder in the widely used unified architecture with both encoder and decoder, because output confidence is difficult to estimate in the encoder. Ignoring early exiting in the encoder component is therefore suboptimal in terms of saving computation. To address this challenge, we propose a novel early exiting strategy for unified vision-language models, named MuE, which dynamically skips layers in both the encoder and the decoder based on layer-wise input similarities and allows multiple early exits. By decomposing the image and text modalities in the encoder, MuE is flexible and can skip different numbers of layers per modality, improving inference efficiency while minimizing the performance drop. Experiments on the SNLI-VE and MS COCO datasets show that the proposed approach MuE can reduce expected inference time by up to 50% and 40%, while maintaining 99% and 96% of performance, respectively.
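The abstract does not spell out the exact exiting criterion, but a minimal sketch of similarity-based early exiting might look like the following. The function name, the `threshold` parameter, and the cosine-similarity proxy between a layer's input and output are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def forward_with_early_exit(layers, hidden_states, threshold=0.99):
    """Run a stack of Transformer layers, exiting early once the
    representation saturates (illustrative sketch, not MuE's exact rule).

    layers: list of modules mapping (B, T, D) -> (B, T, D)
    hidden_states: input tensor of shape (B, T, D)
    threshold: assumed similarity cutoff for skipping remaining layers
    """
    for layer in layers:
        new_hidden = layer(hidden_states)
        # Cosine similarity between the layer's input and output,
        # averaged over the batch, as a proxy for how much the layer
        # still changes the representation.
        sim = F.cosine_similarity(
            new_hidden.flatten(1), hidden_states.flatten(1), dim=-1
        ).mean()
        hidden_states = new_hidden
        if sim > threshold:
            # Representations have saturated; skip the remaining layers.
            break
    return hidden_states
```

Under this sketch, the image and text streams of the encoder (and the decoder) would each run through their own `forward_with_early_exit` call, so different modalities can exit at different depths.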

