Scaling Vision-Language Models with Sparse Mixture of Experts

03/13/2023
by Sheng Shen, et al.

The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs). These models aim to bridge the gap between text and visual information, enabling a more comprehensive understanding of multimedia data. However, as these models become larger and more complex, they also become more challenging to train and deploy. One approach to addressing this challenge is the use of sparsely-gated mixture-of-experts (MoE) techniques, which divide the model into smaller, specialized sub-models that jointly solve a task. In this paper, we explore the effectiveness of MoE in scaling vision-language models, demonstrating its potential to achieve state-of-the-art performance on a range of benchmarks relative to dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute cost and performance when scaling VLMs. We hope our work will inspire further research into the use of MoE for scaling large-scale vision-language models and other multimodal machine learning applications.
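
As a rough illustration of the technique the abstract describes, the sketch below implements a sparsely-gated MoE layer with top-2 routing in PyTorch. This is a minimal sketch, not the paper's implementation: the names and hyperparameters (MoELayer, d_model, d_hidden, num_experts, k) are illustrative assumptions, and the auxiliary load-balancing loss typically used to stabilize MoE training is omitted for brevity.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        """Sparsely-gated MoE layer: each token is routed to its top-k experts."""

        def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
            super().__init__()
            self.k = k
            # Each expert is an independent feed-forward sub-network.
            self.experts = nn.ModuleList([
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.GELU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ])
            # The router scores every token against every expert.
            self.router = nn.Linear(d_model, num_experts)

        def forward(self, x):
            # x: (num_tokens, d_model)
            logits = self.router(x)                # (num_tokens, num_experts)
            weights, expert_idx = torch.topk(logits, self.k, dim=-1)
            weights = F.softmax(weights, dim=-1)   # renormalize over the chosen k
            out = torch.zeros_like(x)
            # Only k of num_experts sub-networks run per token, so per-token
            # compute stays roughly fixed as total parameters grow.
            for e, expert in enumerate(self.experts):
                # Which tokens picked expert e, and in which of their k slots.
                token_idx, slot = torch.where(expert_idx == e)
                if token_idx.numel() > 0:
                    out[token_idx] += (
                        weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
                    )
            return out

    layer = MoELayer()
    tokens = torch.randn(16, 512)   # e.g. 16 image/text tokens
    print(layer(tokens).shape)      # torch.Size([16, 512])

Because only k of the num_experts sub-networks run for each token, the parameter count can grow with the number of experts while per-token compute stays roughly constant; this is the compute/performance trade-off the abstract refers to.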

Related research

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (12/13/2021)
Scaling language models with more data, compute and parameters has drive...

Scaling Vision with Sparse Mixture of Experts (06/10/2021)
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated exce...

Designing Effective Sparse Expert Models (02/17/2022)
Scale has opened new frontiers in natural language processing – but at a...

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts (05/24/2023)
The explosive growth of language models and their applications have led ...

Do Generative Large Language Models need billions of parameters? (09/12/2023)
This paper presents novel systems and methodologies for the development ...

SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis (08/03/2021)
Deep learning has achieved great success in a wide spectrum of multimedi...

Universal Language Modelling agent (06/10/2023)
Large Language Models are designed to understand complex Human Language....
