Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts

05/24/2023
by   Sheng Shen, et al.

The explosive growth of language models and their applications has led to an increased demand for efficient and scalable methods. In this paper, we introduce Flan-MoE, a set of instruction-finetuned sparse Mixture-of-Experts (MoE) models. We show that naively finetuning MoE models on a task-specific dataset (in other words, without instruction-finetuning) often yields worse performance than dense models of the same computational complexity. However, Flan-MoE outperforms dense models under multiple experimental settings: instruction-finetuning only, and instruction-finetuning followed by task-specific finetuning. This shows that instruction-finetuning is an essential stage for MoE models. Specifically, our largest model, Flan-MoE-32B, surpasses the performance of Flan-PaLM-62B on four benchmarks while using only one-third of the FLOPs. The success of Flan-MoE encourages rethinking the design of large-scale, high-performance language models in the setting of task-agnostic learning.

